Our paper on instrument segmentation in cataract surgery videos has been accepted for presentation at the CBMS 2020 conference.
Title: Pixel-Based Tool Segmentation in Cataract Surgery Videos with Mask R-CNN
Authors: Markus Fox, Mario Taschwer, and Klaus Schoeffmann
Abstract: Automatically detecting surgical tools in recorded surgery videos is an important building block of further content-based video analysis. In ophthalmology, the results of such methods can support training and teaching of operation techniques and enable investigation of medical research questions on a dataset of recorded surgery videos. Our work applies a recent deep-learning segmentation method (Mask R-CNN) to localize and segment surgical tools used in ophthalmic cataract surgery. We add ground-truth annotations for multi-class instance segmentation to two existing datasets of cataract surgery videos and make the resulting datasets publicly available for research purposes. In the absence of comparable results from the literature, we tune and evaluate Mask R-CNN on these datasets for instrument segmentation/localization and achieve promising results (61% mean average precision at 50% intersection over union for instance segmentation, working even better for bounding box detection or binary segmentation), establishing a reasonable baseline for further research. Moreover, we experiment with common data augmentation techniques and analyze the achieved segmentation performance with respect to each class (instrument), providing evidence for future improvements of this approach.
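For readers unfamiliar with the "50% intersection over union" criterion mentioned in the abstract, the following is a minimal sketch of how a predicted instance mask might be compared against a ground-truth mask. It is not the authors' evaluation code: the function names `mask_iou` and `is_match`, the nested-list mask representation, and the use of `>=` at the threshold are assumptions made for illustration. A full mean average precision computation additionally ranks detections by confidence and averages precision over recall levels and classes, which this sketch omits.

```python
def mask_iou(pred, truth):
    """Intersection over union of two binary masks.

    Both masks are equally sized nested lists of 0/1 values
    (a hypothetical representation chosen for this illustration).
    """
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1  # pixel is in both masks
            if p or t:
                union += 1  # pixel is in at least one mask
    # Two empty masks have no overlap to measure; report 0.0 by convention.
    return intersection / union if union else 0.0


def is_match(pred, truth, threshold=0.5):
    """Count a prediction as correct if its IoU reaches the threshold.

    Assumption: the comparison is inclusive (>=) at the threshold.
    """
    return mask_iou(pred, truth) >= threshold


# Toy 3x3 masks: 3 shared pixels, 5 pixels covered in total.
predicted = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]
ground_truth = [[0, 1, 1],
                [0, 1, 0],
                [0, 1, 0]]

iou = mask_iou(predicted, ground_truth)
print(f"IoU = {iou:.2f}, match at 50%: {is_match(predicted, ground_truth)}")
```

In this toy example the masks share 3 of 5 covered pixels, so the prediction would count as a match at the 50% threshold but not at a stricter 75% threshold.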