DeepPyram – Semantic Segmentation in Cataract Surgery Videos

Posted on 2022/06/03.

Our paper has been accepted at MICCAI 2022.

Title: DeepPyram: Enabling Pyramid View and Deformable Pyramid Reception for Semantic Segmentation in Cataract Surgery Videos

Authors: Negin Ghamsarian*, Mario Taschwer, Raphael Sznitman*, Klaus Schoeffmann
* University of Bern, Switzerland

Abstract: Semantic segmentation in cataract surgery has a wide range of applications that contribute to improving surgical outcomes and reducing clinical risk. However, the varied challenges in segmenting the different relevant structures in these surgeries make designing a single network quite difficult. This paper proposes a semantic segmentation network, termed DeepPyram, that addresses these challenges with three novelties: (1) a Pyramid View Fusion module, which provides a varying-angle global view of the surrounding region centered at each pixel position in the input convolutional feature map; (2) a Deformable Pyramid Reception module, which enables a wide deformable receptive field that can adapt to geometric transformations of the object of interest; and (3) a dedicated Pyramid Loss that adaptively supervises multi-scale semantic feature maps. We show that, combined, these components effectively boost semantic segmentation performance, especially for objects exhibiting transparency, deformability, scale variation, and blunt edges. We demonstrate that our approach performs at a state-of-the-art level and outperforms a number of existing methods without imposing additional trainable parameters.

Pupil reactions in cataract surgery videos

Posted on 2021/10/01.

Our paper has been accepted for publication in the PLOS ONE journal.

Title: Automatic detection of pupil reactions in cataract surgery videos

Authors: Natalia Sokolova, Klaus Schoeffmann, Mario Taschwer, Stephanie Sarny, Doris Putzgruber-Adamitsch, Yosuf El-Shabrawi

Abstract: Post-operative analysis of cataract surgeries is becoming increasingly important, especially for detecting intraoperative complications. Some severe complications may arise from sudden pupil reactions, which can cause significant damage to ocular structures, especially when the surgeon is inexperienced. Automatic retrieval of such events can therefore greatly support post-operative analysis and help train young surgeons to deal with such situations. In this work, we automatically detect pupil reactions in cataract surgery videos. We employ the Mask R-CNN architecture as a segmentation algorithm, which allows us to segment the pupil and iris with pixel-based accuracy and then track their size changes across the entire video. We can detect pupil reactions with a harmonic mean (H) of Recall, Precision, and Ground Truth Coverage Rate (GTCR) of 60.9% and an average prediction length (PL) of 18.93 seconds. However, we consider the configuration with an H value of 59.4% and a much shorter PL of 10.2 seconds to be the best for practical use. We further investigate the generalization of this method on a slightly different dataset without retraining the model. In this evaluation, we achieve an H value of 49.3% with a PL of 18.15 seconds.
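The abstract condenses three detection metrics into a single harmonic mean H. As a minimal sketch, assuming H is the unweighted harmonic mean of Recall, Precision, and GTCR (each given as a fraction in [0, 1]), the score can be computed as follows. The function name and the sample values are hypothetical and are not results from the paper.

```python
from typing import Sequence


def harmonic_mean_score(values: Sequence[float]) -> float:
    """Unweighted harmonic mean of detection metrics given as fractions in [0, 1].

    Assumption (not confirmed by the paper): Recall, Precision and GTCR
    are weighted equally.
    """
    if not values:
        raise ValueError("need at least one metric")
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("metrics must lie in [0, 1]")
    # A single zero metric drives the harmonic mean to zero.
    if any(v == 0.0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


# Illustrative values only, not results from the paper.
recall, precision, gtcr = 0.80, 0.50, 0.60
h = harmonic_mean_score([recall, precision, gtcr])
print(f"H = {h:.3f}")
```

Because the harmonic mean is dominated by its smallest term, a configuration cannot reach a high H by excelling at only one of the three metrics.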

ReCal-Net: Semantic Segmentation in Cataract Surgery Videos

Posted on 2021/09/28.

Our paper has been accepted for publication at the ICONIP 2021 conference.

Title: ReCal-Net: Joint Region-Channel-Wise Calibrated Network for Semantic Segmentation in Cataract Surgery Videos

Authors: Negin Ghamsarian, Mario Taschwer, Doris Putzgruber-Adamitsch, Stephanie Sarny, Yosuf El-Shabrawi, Klaus Schöffmann

Abstract: Semantic segmentation in surgical videos is a prerequisite for a broad range of applications towards improving surgical outcomes and surgical video analysis. However, semantic segmentation in surgical videos involves many challenges. In particular, in cataract surgery, various features of the relevant objects such as blunt edges, color and context variation, reflection, transparency, and motion blur pose a challenge for semantic segmentation. In this paper, we propose a novel convolutional module, termed the ReCal module, which can calibrate the feature maps by employing region and channel inter-dependencies and cross-dependencies. This calibration strategy can effectively enhance semantic representation by correlating different representations of the same semantic label, considering a multi-angle local view centered around each pixel. Thus, the proposed module can deal with dissimilar visual characteristics of the same object as well as cross-similarities in the visual characteristics of different objects. Moreover, we propose a novel network architecture based on the proposed module, called ReCal-Net. Experimental results confirm the superiority of ReCal-Net compared to rival state-of-the-art approaches for all relevant objects in cataract surgery. Moreover, ablation studies reveal the effectiveness of the ReCal module in boosting semantic segmentation accuracy.

Lens Irregularity Detection in Cataract Surgery Videos

Posted on 2021/06/18.

Our work on methods supporting lens irregularity detection has been accepted for publication at MICCAI’21.

Title: LensID: A CNN-RNN-Based Framework towards Lens Irregularity Detection in Cataract Surgery Videos

Authors: Negin Ghamsarian et al.

Abstract: A critical complication after cataract surgery is the dislocation of the lens implant, leading to vision deterioration and eye trauma. To reduce the risk of this complication, it is vital to discover the risk factors during the surgery. However, studying the relationship between lens dislocation and its suspected risk factors using numerous videos is a time-consuming procedure. Hence, surgeons need an automatic approach to enable a larger-scale and, accordingly, more reliable study. In this paper, we propose a novel framework as the major step towards lens irregularity detection. In particular, we propose (I) an end-to-end recurrent neural network to recognize the lens-implantation phase and (II) a novel semantic segmentation network to segment the lens and pupil after the implantation phase. The phase recognition results reveal the effectiveness of the proposed surgical phase recognition approach. Moreover, the segmentation results confirm the proposed segmentation network’s effectiveness compared to the state-of-the-art rival approaches.

Relevance Detection in Cataract Surgery Videos

Posted on 2020/10/14.

Our paper has been accepted at ICPR 2020.

Title: Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization

Authors: Negin Ghamsarian, Mario Taschwer, Doris Putzgruber-Adamitsch, Stephanie Sarny, Klaus Schoeffmann

Abstract: In cataract surgery, the operation is performed with the help of a microscope. Since the microscope allows at most two people to watch the surgery in real time, a major part of surgical training is conducted using recorded videos. To optimize the training procedure using the video content, surgeons require an automatic relevance detection approach. In addition to relevance-based retrieval, these results can be further used for skill assessment and irregularity detection in cataract surgery videos. In this paper, a three-module framework is proposed to detect and classify the relevant phase segments in cataract videos. Using an idle frame recognition network, the video is divided into idle and action segments. To boost relevance detection performance, Mask R-CNN is used to detect the cornea in each frame, where the relevant surgical actions take place. The spatio-temporally localized segments, which contain higher-resolution information about the pupil texture and actions, together with complementary temporal information from the same phase, are fed into the relevance detection module. This module consists of four parallel recurrent CNNs, each responsible for detecting one of four relevant phases defined with medical experts. Their results are then integrated to classify each action phase as irrelevant or as one of the four relevant phases. Experimental results reveal that the proposed approach outperforms static CNNs and different configurations of feature-based and end-to-end recurrent networks.
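The abstract does not state how the outputs of the four parallel detectors are integrated. One simple possibility, sketched here purely as an assumption, is to take the phase with the highest score and fall back to "irrelevant" when no score reaches a threshold. The function, the `phase_1` to `phase_4` labels, and the 0.5 threshold are all hypothetical.

```python
from typing import Sequence

IRRELEVANT = "irrelevant"


def integrate_phase_scores(scores: Sequence[float], threshold: float = 0.5) -> str:
    """Combine four one-vs-rest phase scores into one label.

    Hypothetical rule, not the paper's: argmax over the four relevant
    phases, or "irrelevant" if the best score is below the threshold.
    """
    if len(scores) != 4:
        raise ValueError("expected one score per relevant phase")
    # max() returns the first index on ties.
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return IRRELEVANT
    return f"phase_{best + 1}"


print(integrate_phase_scores([0.10, 0.90, 0.60, 0.20]))
print(integrate_phase_scores([0.10, 0.20, 0.30, 0.40]))
```

An argmax rule assumes the four phases are mutually exclusive within an action segment; if they could overlap, a multi-label output would be the more natural design.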

Relevance-based Compression of Cataract Surgery Videos

Posted on 2020/08/03.

Our recent work on relevance-based compression of cataract surgery videos has been accepted as a full paper at ACM Multimedia 2020.

Title: Relevance-based Compression of Cataract Surgery Videos Using Convolutional Neural Networks

Authors: Negin Ghamsarian, Hadi Amirpour, Christian Timmerer, Mario Taschwer, and Klaus Schöffmann

Abstract: Recorded cataract surgery videos play a prominent role in training, in investigating the surgery, and in enhancing surgical outcomes. Due to storage limitations in hospitals, however, recordings of cataract surgeries are deleted after a short time, and this precious source of information cannot be fully utilized. Lowering the quality to reduce the required storage space is not advisable, since the degraded visual quality results in the loss of relevant information that limits the usage of these videos. To address this problem, we propose a relevance-based compression technique consisting of two modules: (i) relevance detection, which uses neural networks for semantic segmentation and classification of the videos to detect relevant spatio-temporal information, and (ii) content-adaptive compression, which restricts the amount of distortion applied to the relevant content while allocating less bitrate to irrelevant content. The proposed relevance-based compression framework is implemented considering five scenarios based on the definition of relevant information from the target audience’s perspective. Experimental results demonstrate the capability of the proposed approach in relevance detection. We further show that the proposed approach can achieve high compression efficiency by removing substantial redundant information while retaining the high quality of the relevant content.

Keywords: Video Coding, Convolutional Neural Networks, HEVC, ROI Detection, Medical Multimedia.
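The abstract describes restricting distortion on relevant content while giving irrelevant content less bitrate, but does not specify the mechanism. In an HEVC encoder, one common lever is the quantization parameter (QP), where a lower QP means finer quantization and higher quality. The sketch below assumes a hypothetical per-block scheme: each coding block's share of relevant pixels (from a segmentation mask) selects a QP below or above a base value. The function name, `base_qp`, `delta`, and `threshold` are illustrative assumptions, not the paper's settings.

```python
from typing import List, Sequence


def block_qp_map(
    relevance: Sequence[Sequence[float]],
    base_qp: int = 32,
    delta: int = 6,
    threshold: float = 0.5,
) -> List[List[int]]:
    """Map per-block relevance fractions in [0, 1] to QP values.

    Hypothetical scheme: blocks whose relevant-pixel fraction reaches the
    threshold get a lower QP (higher quality), all others a higher QP.
    """
    qp_map: List[List[int]] = []
    for row in relevance:
        qp_row: List[int] = []
        for fraction in row:
            qp = base_qp - delta if fraction >= threshold else base_qp + delta
            # HEVC QP range for 8-bit video is 0..51.
            qp_row.append(min(51, max(0, qp)))
        qp_map.append(qp_row)
    return qp_map


# 2 x 2 grid of blocks: fraction of each block covered by relevant content.
print(block_qp_map([[0.9, 0.1], [0.5, 0.0]]))
```

A binary split keeps the example simple; a real encoder could instead scale the QP offset continuously with the relevance fraction.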

Tool Segmentation in Cataract Surgery Videos

Posted on 2020/05/20.

Our conference paper on instrument segmentation in cataract surgery videos has been accepted for presentation at the CBMS 2020 conference.

Title: Pixel-Based Tool Segmentation in Cataract Surgery Videos with Mask R-CNN

Authors: Markus Fox, Mario Taschwer, and Klaus Schoeffmann

Abstract: Automatically detecting surgical tools in recorded surgery videos is an important building block of further content-based video analysis. In ophthalmology, the results of such methods can support training and teaching of operation techniques and enable investigation of medical research questions on a dataset of recorded surgery videos. Our work applies a recent deep-learning segmentation method (Mask R-CNN) to localize and segment surgical tools used in ophthalmic cataract surgery. We add ground-truth annotations for multi-class instance segmentation to two existing datasets of cataract surgery videos and make the resulting datasets publicly available for research purposes. In the absence of comparable results in the literature, we tune and evaluate Mask R-CNN on these datasets for instrument segmentation and localization and achieve promising results (61% mean average precision at 50% intersection over union for instance segmentation, working even better for bounding box detection or binary segmentation), establishing a reasonable baseline for further research. Moreover, we experiment with common data augmentation techniques and analyze the achieved segmentation performance with respect to each class (instrument), providing evidence for future improvements of this approach.

Pupil Segmentation in Cataract Surgery Videos

Posted on 2020/03/10.

Our workshop paper on iris and pupil segmentation in cataract surgery videos has been accepted for presentation at the ISBI 2020 conference.

Title: Pixel-Based Iris and Pupil Segmentation in Cataract Surgery Videos Using Mask R-CNN

Authors: Natalia Sokolova, Mario Taschwer, Klaus Schoeffmann

Abstract: Cataract surgery replaces the eye lens with an artificial one and is one of the most common surgical procedures performed worldwide. These surgeries can be recorded using a microscope camera, and the resulting videos can be stored for educational or documentary purposes as well as for automated post-operative analysis to detect adverse events or complications. As pupil reactions (dilation or constriction) may lead to complications during surgery, automatic localization of the pupil and iris in cataract surgery videos is a necessary preprocessing step for automated analysis. The problems of recognition, localization, and tracking of eyes in medical images or videos have already been studied in the literature. However, none of these approaches used pixel-based segmentation, which would allow the pupil and iris to be localized accurately enough for further automated analysis. In this work, we investigate pixel-based pupil and iris segmentation by a region-based convolutional neural network (Mask R-CNN), which, to the best of our knowledge, has not been applied to this problem before. We evaluate the performance of Mask R-CNN with different backbone networks on a manually annotated image dataset. Our method achieves at least 80% Intersection over Union (IoU) for each iris example and at least 85% IoU for each pupil example in the test dataset.
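The reported thresholds are stated per example in terms of Intersection over Union. A minimal sketch of IoU for two binary masks, plus a check that every example clears a threshold, is shown below. It assumes masks are given as equally sized nested lists of 0/1 values and treats two empty masks as a perfect match; both choices, and the function names, are illustrative assumptions rather than the paper's evaluation code.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # 1 = object pixel, 0 = background


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over Union of two equally sized binary masks."""
    if len(pred) != len(target) or any(
        len(p) != len(t) for p, t in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    # Convention (assumption): two empty masks count as a perfect match.
    if union == 0:
        return 1.0
    return intersection / union


def all_above(ious: Sequence[float], threshold: float) -> bool:
    """True if every per-example IoU reaches the threshold."""
    return all(iou >= threshold for iou in ious)


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
print(mask_iou(pred, target))
print(all_above([0.83, 0.91, 0.80], 0.80))
```

A "minimum IoU per example" criterion is stricter than a mean IoU, since a single poorly segmented frame is enough to fail it.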

Deblurring Cataract Surgery Videos

Posted on 2020/01/13.

Our recent work on deblurring surgery videos has been accepted at the ISBI 2020 conference.

Title: Deblurring Cataract Surgery Videos Using a Multi-Scale Deconvolutional Neural Network

Authors: Negin Ghamsarian, Klaus Schoeffmann, Mario Taschwer

Abstract: A common quality impairment observed in surgery videos is blur, caused by object motion or a defocused camera. Degraded image quality hampers the progress of machine-learning-based approaches in learning and recognizing semantic information in surgical video frames like instruments, phases, and surgical actions. This problem can be mitigated by automatically deblurring video frames as a preprocessing method for any subsequent video analysis task. In this paper, we propose and evaluate a multi-scale deconvolutional neural network to deblur cataract surgery videos. Experimental results confirm the effectiveness of the proposed approach in terms of the visual quality of frames as well as PSNR improvement.
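The evaluation mentions PSNR improvement. As a minimal sketch, the standard PSNR definition, 10 · log10(MAX² / MSE), can be applied to a sharp reference frame and its blurred and deblurred versions; the improvement is the difference of the two scores. This assumes 8-bit frames flattened to sequences of pixel values, and the pixel values below are made up for illustration. It is not the authors' evaluation code.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if len(reference) != len(test) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


# Illustrative flattened 8-bit frames (made-up values).
sharp = [100, 120, 140, 160]
blurred = [110, 125, 130, 150]
deblurred = [102, 119, 137, 161]

gain = psnr(sharp, deblurred) - psnr(sharp, blurred)
print(f"PSNR gain: {gain:.2f} dB")
```

PSNR requires a sharp reference frame, so in practice it is typically measured on frames with synthetically added blur, where the original serves as ground truth.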

MMM’20: Evaluating the Generalization Performance of Instrument Classification in Cataract Surgery Videos

Posted on 2019/10/01.

Our paper has been accepted for publication at the MMM 2020 Conference on Multimedia Modeling. The work was conducted in the context of the ongoing OVID project.

Authors: Natalia Sokolova, Klaus Schoeffmann, Mario Taschwer (AAU Klagenfurt); Doris Putzgruber-Adamitsch, Yosuf El-Shabrawi (Klinikum Klagenfurt)

Abstract: In the field of ophthalmic surgery, many clinicians nowadays record their microscopic procedures with a video camera and use the recorded footage for later purposes, such as forensics, teaching, or training. However, to use the video material efficiently after surgery, the video content needs to be analyzed automatically. Operation instruments are important semantic content to analyze and index in these short videos, since they provide an indication of the corresponding operation phase and surgical action. Related work has already shown that it is possible to accurately detect instruments in cataract surgery videos. However, its underlying dataset (from the CATARACTS challenge) has very good visual quality, which does not reflect the typical quality of videos acquired in general hospitals. In this paper, we therefore analyze the generalization performance of deep learning models for instrument recognition in terms of dataset change. More precisely, we trained models such as ResNet-50, Inception v3, and NASNet Mobile on a dataset of high visual quality (CATARACTS) and tested them on another dataset of low visual quality (Cataract-101), and vice versa. Our results show that generalizability is rather low in general, but clearly worse for the model trained on the high-quality dataset. Another important observation is that the trained models are able to detect similar instruments in the other dataset even if their appearance differs.

URL: https://link.springer.com/chapter/10.1007/978-3-030-37734-2_51

Bibtex:

@InProceedings{Sokolova2020,
   author    = {Sokolova, Natalia and Schoeffmann, Klaus and Taschwer, Mario and Putzgruber-Adamitsch, Doris and El-Shabrawi, Yosuf},
   title     = {Evaluating the Generalization Performance of Instrument Classification in Cataract Surgery Videos},
   booktitle = {MultiMedia Modeling},
   year      = {2020},
   editor    = {Cheng, Wen-Huang and Kim, Junmo and Chu, Wei-Ta and Cui, Peng and Choi, Jung-Woo and Hu, Min-Chun and De Neve, Wesley},
   pages     = {626--636},
   address   = {Cham},
   publisher = {Springer International Publishing},
   doi       = {10.1007/978-3-030-37734-2_51},
   isbn      = {978-3-030-37734-2}
 }
