Pupil Segmentation in Cataract Surgery Videos

Our workshop paper on iris and pupil segmentation in cataract surgery videos has been accepted for presentation at the ISBI 2020 conference.

Title: Pixel-Based Iris and Pupil Segmentation in Cataract Surgery Videos Using Mask R-CNN

Authors: Natalia Sokolova, Mario Taschwer, Klaus Schoeffmann

Abstract: Cataract surgery replaces the eye lens with an artificial one and is one of the most common surgical procedures performed worldwide. These surgeries can be recorded using a microscope camera, and the resulting videos can be stored for educational or documentary purposes as well as for automated post-operative analysis to detect adverse events or complications. As pupil reactions (dilation or constriction) may lead to complications during surgery, automatic localization of pupil and iris in cataract surgery videos is a necessary preprocessing step for automated analysis. The problems of recognition, localization and tracking of eyes in medical images or videos have already been studied in the literature. However, none of these approaches used pixel-based segmentation, which would allow pupil and iris to be localized accurately enough for further automated analysis. In this work, we investigate pixel-based pupil and iris segmentation by a region-based convolutional neural network (Mask R-CNN), which has not been applied to this problem before, to the best of our knowledge. We evaluate the performance of Mask R-CNN with different backbone networks on a manually annotated image dataset. Our method achieves an Intersection over Union (IoU) of at least 80% for each iris example and at least 85% for each pupil example in the test dataset.
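
For readers unfamiliar with the IoU metric quoted in the abstract, here is a minimal sketch of how it can be computed for one pair of binary masks. The function name `mask_iou` and the tiny 4x4 masks are made up for illustration; the sketch is not taken from the paper's evaluation code.

```python
def mask_iou(pred, truth):
    """Intersection over Union of two binary masks (equally sized 2D lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption of this sketch: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Hypothetical ground-truth and predicted pupil masks.
truth_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]

iou = mask_iou(pred_mask, truth_mask)
print(f"IoU = {iou:.3f}")
```

In this toy case 5 pixels are shared and 7 pixels are covered by at least one mask, so the IoU is 5/7.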

Deblurring Cataract Surgery Videos

Our recent work on deblurring surgery videos has been accepted at the ISBI 2020 conference.

Title: Deblurring Cataract Surgery Videos Using a Multi-Scale Deconvolutional Neural Network

Authors: Negin Ghamsarian, Klaus Schoeffmann, Mario Taschwer

Abstract: A common quality impairment observed in surgery videos is blur, caused by object motion or a defocused camera. Degraded image quality hampers the progress of machine-learning-based approaches in learning and recognizing semantic information in surgical video frames like instruments, phases, and surgical actions. This problem can be mitigated by automatically deblurring video frames as a preprocessing method for any subsequent video analysis task. In this paper, we propose and evaluate a multi-scale deconvolutional neural network to deblur cataract surgery videos. Experimental results confirm the effectiveness of the proposed approach in terms of the visual quality of frames as well as PSNR improvement.
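
The PSNR improvement mentioned above can be illustrated with a small sketch. It assumes 8-bit grayscale frames given as nested lists and uses the standard definition PSNR = 10 * log10(MAX^2 / MSE). The function name `psnr` and the 2x2 example frames are hypothetical and unrelated to the paper's data.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    squared_error = 0.0
    count = 0
    for ref_row, res_row in zip(reference, restored):
        for r, s in zip(ref_row, res_row):
            squared_error += (r - s) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical sharp frame, blurred input, and deblurred output.
sharp = [[100, 150], [200, 250]]
blurred = [[110, 140], [190, 240]]
deblurred = [[102, 148], [198, 248]]

# PSNR improvement: how much closer the deblurred frame is to the sharp one.
gain = psnr(sharp, deblurred) - psnr(sharp, blurred)
print(f"PSNR gain: {gain:.2f} dB")
```

A positive gain means the deblurred frame is closer to the sharp reference than the blurred input was.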

MMM’20: Evaluating the Generalization Performance of Instrument Classification in Cataract Surgery Videos

Our paper has been accepted for publication at the MMM 2020 Conference on Multimedia Modeling. The work was conducted in the context of the ongoing OVID project.

Authors: Natalia Sokolova, Klaus Schoeffmann, Mario Taschwer (AAU Klagenfurt); Doris Putzgruber-Adamitsch, Yosuf El-Shabrawi (Klinikum Klagenfurt)

In the field of ophthalmic surgery, many clinicians nowadays record their microscopic procedures with a video camera and use the recorded footage for later purposes, such as forensics, teaching, or training. However, in order to use the video material efficiently after surgery, the video content needs to be analyzed automatically. Important semantic content to be analyzed and indexed in these short videos is the set of operation instruments, since they indicate the corresponding operation phase and surgical action. Related work has already shown that it is possible to accurately detect instruments in cataract surgery videos. However, the underlying dataset (from the CATARACTS challenge) has very good visual quality, which does not reflect the typical quality of videos acquired in general hospitals. In this paper, we therefore analyze the generalization performance of deep learning models for instrument recognition under a change of dataset. More precisely, we train models such as ResNet-50, Inception v3, and NASNet Mobile on a dataset of high visual quality (CATARACTS) and test them on another dataset with low visual quality (Cataract-101), and vice versa. Our results show that the generalizability is rather low in general, but clearly worse for the model trained on the high-quality dataset. Another important observation is that the trained models are able to detect similar instruments in the other dataset even if their appearance is different.
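
The "train on one dataset, test on the other, and vice versa" protocol can be sketched as follows. This is only an illustration of the evaluation loop: a toy nearest-centroid classifier stands in for the CNNs used in the paper, and the dataset names, instrument labels, and feature values are invented and do not reproduce the paper's results. In practice, each dataset would also have its own held-out test split.

```python
def train_centroids(samples):
    """Fit a nearest-centroid classifier; samples are (feature_vector, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, samples):
    hits = sum(predict(centroids, features) == label for features, label in samples)
    return hits / len(samples)


def cross_dataset_accuracy(datasets):
    """Train on each dataset, test on every dataset; returns {(train, test): accuracy}."""
    results = {}
    for train_name, train_samples in datasets.items():
        model = train_centroids(train_samples)
        for test_name, test_samples in datasets.items():
            results[(train_name, test_name)] = accuracy(model, test_samples)
    return results


# Invented two-dimensional "features" for two instrument classes in two datasets.
datasets = {
    "high_quality": [([0.9, 0.1], "forceps"), ([0.8, 0.2], "forceps"),
                     ([0.1, 0.9], "knife"), ([0.2, 0.8], "knife")],
    "low_quality": [([0.5, 0.3], "forceps"), ([0.3, 0.4], "forceps"),
                    ([0.2, 0.5], "knife"), ([0.1, 0.6], "knife")],
}

results = cross_dataset_accuracy(datasets)
for (train_name, test_name), acc in results.items():
    print(f"train on {train_name:12s} test on {test_name:12s} accuracy {acc:.2f}")
```

The off-diagonal entries (training and test dataset differ) are the ones that measure generalization across datasets.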

URL: https://link.springer.com/chapter/10.1007/978-3-030-37734-2_51

Bibtex citation:


author = {Sokolova, Natalia and Schoeffmann, Klaus and Taschwer, Mario and Putzgruber-Adamitsch, Doris and El-Shabrawi, Yosuf},
title = {Evaluating the Generalization Performance of Instrument Classification in Cataract Surgery Videos},
booktitle = {MultiMedia Modeling},
year = {2020},
editor = {Cheng, Wen-Huang and Kim, Junmo and Chu, Wei-Ta and Cui, Peng and Choi, Jung-Woo and Hu, Min-Chun and De Neve, Wesley},
pages = {626--636},
address = {Cham},
publisher = {Springer International Publishing},
doi = {10.1007/978-3-030-37734-2_51},
isbn = {978-3-030-37734-2}

MediaEval 2018 Medico Task

We participated in the MediaEval 2018 Medico task and recently submitted our working notes paper. This is joint work with Oge Marques (Florida Atlantic University, USA).

Update: Our paper has been accepted and presented at the MediaEval Workshop on Oct 30, 2018. The Workshop proceedings appeared at CEUR-WS.org.

Title: Early and Late Fusion of Classifiers for the MediaEval Medico Task

Authors: Mario Taschwer, Manfred Jürgen Primus, Klaus Schoeffmann, Oge Marques

Abstract: We present our results for the MediaEval 2018 Medico task, achieved with traditional machine learning methods, such as logistic regression, support vector machines, and random forests. Before classification, we combine traditional global image features and CNN-based features (early fusion), and apply soft voting for combining the output of multiple classifiers (late fusion). Linear support vector machines turn out to provide both good classification performance and low run-time complexity for this task.
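
To make the two fusion steps concrete, here is a minimal sketch. It assumes early fusion means concatenating the feature vectors of one image, and soft voting means taking the unweighted average of the class probabilities predicted by several classifiers. The function names and all numbers are illustrative and not taken from our submission.

```python
def early_fusion(global_features, cnn_features):
    """Early fusion: concatenate two feature vectors describing the same image."""
    return list(global_features) + list(cnn_features)


def soft_vote(probabilities_per_classifier):
    """Late fusion: average class probabilities over classifiers, then take the argmax."""
    n_classifiers = len(probabilities_per_classifier)
    n_classes = len(probabilities_per_classifier[0])
    mean_probs = [
        sum(probs[c] for probs in probabilities_per_classifier) / n_classifiers
        for c in range(n_classes)
    ]
    best_class = max(range(n_classes), key=lambda c: mean_probs[c])
    return best_class, mean_probs


# Hypothetical feature vectors for one image.
fused = early_fusion([0.2, 0.7], [0.1, 0.9, 0.4])

# Hypothetical class probabilities from three classifiers for the same image.
votes = [
    [0.90, 0.10, 0.00],
    [0.40, 0.60, 0.00],
    [0.45, 0.55, 0.00],
]
label, mean_probs = soft_vote(votes)
print(fused, label, mean_probs)
```

In this example two of the three classifiers prefer class 1, but the averaged probabilities favour class 0, which shows how soft voting can differ from a simple majority vote.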

Paper: [Preprint PDF] [Official Workshop Paper]

Presentation: [Slides PDF]

Bibtex citation:

Title                    = {Early and Late Fusion of Classifiers for the {MediaEval Medico} Task},
Author                   = {Taschwer, Mario and Primus, Manfred J{\"u}rgen and Schoeffmann, Klaus and Marques, Oge},
Booktitle                = {Working Notes Proceedings of the MediaEval 2018 Workshop},
Year                     = {2018},
Editor                   = {M. Larson and P. Arora and C.H. Demarty and M. Riegler and B. Bischke and E. Dellandrea and M. Lux and A. Porter and G.J.F. Jones},
Series                   = {CEUR Workshop Proceedings},
Volume                   = {2283},
Url                      = {http://ceur-ws.org/Vol-2283/MediaEval_18_paper_23.pdf}


Course in winter term 2018/19

In the upcoming winter term (starting in October 2018), I’ll give a single lab course:

621.703 Computer Organization (PR Rechnerorganisation)

Information about the course (modalities, grading, course material) will be provided in a non-public Moodle course, accessible to attendees after the first class meeting.

OVID – Relevance Detection in Ophthalmic Surgery Videos

Our FWF research grant proposal OVID (Relevance Detection in Ophthalmic Surgery Videos) has recently been approved! The research project will start in fall 2018 and last for 3 years (3 PhD positions, 1 student assistant – applications are welcome!). The project will be conducted in cooperation with Klinikum Klagenfurt.

Authors: Klaus Schoeffmann, Mario Taschwer, Doris Putzgruber-Adamitsch, Stephanie Sarny, Yosuf El-Shabrawi, Laszlo Böszörmenyi


In this project, we want to investigate fundamental research questions in the field of postoperative analysis of ophthalmic surgery videos (OSVs), i.e. videos of surgeries on the human eye. More precisely, three research objectives are covered: (1) Classification of OSV segments – is it possible to improve upon the state of the art in automatic content classification and content segmentation of OSVs, focusing on regular and irregular operation phases? (2) Relevance prediction and relevance-driven compression – how accurately can the relevance of OSV segments be determined automatically for educational, scientific, and documentary purposes (as medical experts would do), and what compression efficiency can be achieved for OSVs when considering relevance as an additional modality? (3) Analysis of common irregularities in OSVs for medical research – we address three quantitative medical research questions related to cataract surgeries, such as: is there a statistically significant difference in duration or complication rate between cataract surgeries showing intraoperative pupil reactions and those showing no such pupil reactions?

We plan to perform these investigations using data acquisition, data modelling, video content analysis, statistical analysis, and state-of-the-art machine learning methods – such as content classifiers based on deep learning. The proposed methods will be evaluated on annotated video datasets (“ground truth”) created by medical field experts during the project.

Beyond developing novel methods for solving the above-mentioned research problems, project results are expected to have innovative effects in the emerging interdisciplinary field of automatic video-based analysis of ophthalmic surgeries. In particular, research results of this project will enable efficient permanent video documentation of ophthalmic surgeries, allowing the creation of OSV datasets relevant for medical education, training, and research. Moreover, archives of relevant OSVs will enable novel postoperative analysis methods for medical research questions, such as the causes of irregular operation phases.

The research project will be a cooperation between computer scientists of AAU Klagenfurt (conducted by Prof. Klaus Schöffmann, supported and advised by Dr. Mario Taschwer and Prof. Laszlo Böszörmenyi) and ophthalmic surgeons and researchers at Klinikum Klagenfurt (Dr. Doris Putzgruber-Adamitsch, Dr. Stephanie Sarny, Prof. Yosuf El-Shabrawi).

Video Dataset of 101 Cataract Surgeries

Our paper “Cataract-101 – Video Dataset of 101 Cataract Surgeries” has been accepted for poster presentation at the MMSys 2018 conference (Open DataSet & Software Track).

Authors: Klaus Schoeffmann, Mario Taschwer, Stephanie Sarny, Bernd Münzer, Jürgen Primus, Doris Putzgruber

Cataract surgery is one of the most frequently performed microscopic surgeries in the field of ophthalmology. The goal of this kind of surgery is to replace the lens of the human eye with an artificial one, an intervention that is often required due to aging. The entire surgery is performed under a microscope, and co-mounted cameras allow the procedure to be recorded and archived. Currently, the recorded videos are used postoperatively for documentation and training. An additional benefit of recording cataract videos is that they enable video analytics (i.e., manual and/or automatic video content analysis) to investigate medically relevant research questions (e.g., the cause of complications). This, however, necessitates a medical multimedia information system trained and evaluated on existing data, which is currently not publicly available. In this work we provide a public video dataset of 101 cataract surgeries that were performed by four different surgeons over a period of 9 months. These surgeons are grouped into moderately experienced and highly experienced surgeons (assistant vs. senior physicians), providing the basis for experience-based video analytics. All videos have been annotated with quasi-standardized operation phases by a senior ophthalmic surgeon.

Dataset: http://www.itec.aau.at/ftp/datasets/ovid/cat-101/

DOI: https://doi.org/10.1145/3204949.3208137

[Preprint PDF] [Poster]

Erratum: Table 1 of the published paper contains a systematic error in the row titled “Avg. Length / Op”. Numbers have been corrected in the poster.

Classification of Operation Phases in Cataract Surgery Videos

Our paper has been accepted for publication and oral presentation at the MMM 2018 conference:

Title: Frame-Based Classification of Operation Phases in Cataract Surgery Videos

Authors: Manfred Jürgen Primus, Doris Putzgruber-Adamitsch, Mario Taschwer, Bernd Muenzer, Yosuf El-Shabrawi, Laszlo Boeszoermenyi and Klaus Schöffmann

Abstract: Cataract surgeries are frequently performed to correct a lens opacification of the human eye, which usually appears in the course of aging. These surgeries are conducted with the help of a microscope and are typically recorded on video for later inspection and educational purposes. However, post-hoc visual analysis of video recordings is cumbersome and time-consuming for surgeons if there is no navigation support, such as bookmarks to specific operation phases. To prepare the way for automatic detection of operation phases in cataract surgery videos, we investigate the effectiveness of a deep convolutional neural network (CNN) to automatically assign video frames to operation phases, which can be regarded as a single-label multi-class classification problem. In the absence of public datasets of cataract surgery videos, we provide a dataset of 21 videos of standardized cataract surgeries and use it to train and evaluate our CNN classifier. Experimental results show a mean F1-score of about 68% for frame-based operation phase classification, which can be further improved to 75% when considering temporal information of video frames in the CNN architecture.
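
As a small illustration of the evaluation measure, the following sketch computes a mean F1-score for single-label multi-class frame classification. It assumes that "mean" refers to the unweighted average of per-class F1-scores (macro averaging), which may differ from the exact averaging used in the paper. The phase names and labels are hypothetical.

```python
def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of per-class F1-scores for single-label classification."""
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    f1_scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is positive because
        # every class here occurs in the true or the predicted labels.
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(f1_scores) / len(f1_scores)


# Hypothetical per-frame phase labels of a short video.
true_phases = ["incision", "incision", "phaco", "phaco", "phaco", "lens"]
pred_phases = ["incision", "phaco", "phaco", "phaco", "lens", "lens"]

score = macro_f1(true_phases, pred_phases)
print(f"mean F1 = {score:.3f}")
```

Here each of the three classes happens to reach an F1-score of 2/3, so the mean is 2/3 as well.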

Dataset: http://www.itec.aau.at/ftp/datasets/ovid/cat-21/

Preprint PDF

Bibtex citation:


  Title                    = {Frame-Based Classification of Operation Phases in Cataract Surgery Videos},
  Author                   = {Primus, Manfred J{\"u}rgen and Putzgruber-Adamitsch, Doris and Taschwer, Mario and M{\"u}nzer, Bernd and El-Shabrawi, Yosuf and B{\"o}sz{\"o}rmenyi, Laszlo and Schoeffmann, Klaus},
  Booktitle                = {MultiMedia Modeling},
  Year                     = {2018},

  Address                  = {Cham},
  Editor                   = {Schoeffmann, Klaus and Chalidabhongse, Thanarat H. and Ngo, Chong Wah and Aramvith, Supavadee and O'Connor, Noel E. and Ho, Yo-Sung and Gabbouj, Moncef and Elgammal, Ahmed},
  Pages                    = {241--253},
  Publisher                = {Springer International Publishing},
  ISBN                     = {978-3-319-73603-7}