MediaEval 2018 Medico Task


We participated in the MediaEval 2018 Medico task and recently submitted our working notes paper. This is joint work with Oge Marques (Florida Atlantic University, USA).

Update: Our paper has been accepted and presented at the MediaEval Workshop on Oct 30, 2018. The Workshop proceedings appeared at

Title: Early and Late Fusion of Classifiers for the MediaEval Medico Task

Authors: Mario Taschwer, Manfred Jürgen Primus, Klaus Schoeffmann, Oge Marques

Abstract: We present our results for the MediaEval 2018 Medico
task, achieved with traditional machine learning methods, such as
logistic regression, support vector machines, and random forests.
Before classification, we combine traditional global image features
and CNN-based features (early fusion), and apply soft voting for
combining the output of multiple classifiers (late fusion). Linear
support vector machines turn out to provide both good classification
performance and low run-time complexity for this task.
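
The two fusion steps named in the abstract can be illustrated with a minimal sketch. It is not the code used for the paper: the function names `early_fusion` and `soft_vote` are hypothetical, early fusion is assumed to be plain concatenation of feature vectors, and soft voting is assumed to be an unweighted average of class probabilities.

```python
from typing import List, Sequence


def early_fusion(global_features: Sequence[float],
                 cnn_features: Sequence[float]) -> List[float]:
    """Early fusion (assumed here to be concatenation of both feature vectors)."""
    return list(global_features) + list(cnn_features)


def soft_vote(probabilities: Sequence[Sequence[float]]) -> int:
    """Late fusion by soft voting.

    probabilities[k][c] is the probability classifier k assigns to class c.
    Returns the index of the class with the highest mean probability.
    """
    if not probabilities:
        raise ValueError("at least one classifier output is required")
    n_classes = len(probabilities[0])
    if any(len(p) != n_classes for p in probabilities):
        raise ValueError("all classifiers must score the same set of classes")
    mean = [sum(p[c] for p in probabilities) / len(probabilities)
            for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# Illustrative numbers only: three classifiers, three classes.
fused = early_fusion([0.1, 0.2], [0.3])
winner = soft_vote([[0.6, 0.4, 0.0],
                    [0.1, 0.5, 0.4],
                    [0.2, 0.5, 0.3]])
```

In this toy input the first classifier alone would pick class 0, but the averaged probabilities favour class 1, which is the kind of disagreement soft voting is meant to resolve.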

Paper: [Preprint PDF] [Official Workshop Paper]

Presentation: [Slides PDF]

Bibtex citation:

Title                    = {Early and Late Fusion of Classifiers for the {MediaEval Medico} Task},
Author                   = {Taschwer, Mario and Primus, Manfred J{\"u}rgen and Schoeffmann, Klaus and Marques, Oge},
Booktitle                = {Working Notes Proceedings of the MediaEval 2018 Workshop},
Year                     = {2018},
Editor                   = {M. Larson and P. Arora and C.H. Demarty and M. Riegler and B. Bischke and E. Dellandrea and M. Lux and A. Porter and G.J.F. Jones},
Series                   = {CEUR Workshop Proceedings},
Volume                   = {2283},
Url                      = {}


OVID – Relevance Detection in Ophthalmic Surgery Videos


Our FWF research grant proposal OVID (Relevance Detection in Ophthalmic Surgery Videos) has recently been approved! The research project will start in fall 2018 and last for 3 years (3 PhD positions, 1 student assistant – applications are welcome!). The project will be conducted in cooperation with Klinikum Klagenfurt.

Authors: Klaus Schoeffmann, Mario Taschwer, Doris Putzgruber-Adamitsch, Stephanie Sarny, Yosuf El-Shabrawi, Laszlo Böszörmenyi


In this project, we want to investigate fundamental research questions in the field of postoperative analysis of ophthalmic surgery videos (OSVs), i.e. videos of surgeries on the human eye. More precisely, three research objectives are covered: (1) Classification of OSV segments – is it possible to improve upon the state-of-the-art in automatic content classification and content segmentation of OSVs, focusing on regular and irregular operation phases? (2) Relevance prediction and relevance-driven compression – how accurately can the relevance of OSV segments be determined automatically for educational, scientific, and documentary purposes (as medical experts would do), and what compression efficiency can be achieved for OSVs when considering relevance as an additional modality? (3) Analysis of common irregularities in OSVs for medical research – we address three quantitative medical research questions related to cataract surgeries, such as: is there a statistically significant difference in duration or complication rate between cataract surgeries showing intraoperative pupil reactions and those showing no such pupil reactions?

We plan to perform these investigations using data acquisition, data modelling, video content analysis, statistical analysis, and state-of-the-art machine learning methods – such as content classifiers based on deep learning. The proposed methods will be evaluated on annotated video datasets (“ground truth”) created by medical field experts during the project.

Beyond developing novel methods for solving the above-mentioned research problems, project results are expected to have innovative effects in the emerging interdisciplinary field of automatic video-based analysis of ophthalmic surgeries. In particular, research results of this project will enable efficient permanent video documentation of ophthalmic surgeries, allowing the creation of OSV datasets relevant for medical education, training, and research. Moreover, archives of relevant OSVs will enable novel postoperative analysis methods for medical research questions – such as the causes of irregular operation phases.

The research project will be a cooperation between computer scientists of AAU Klagenfurt (conducted by Prof. Klaus Schöffmann, supported and advised by Dr. Mario Taschwer and Prof. Laszlo Böszörmenyi) and ophthalmic surgeons and researchers at Klinikum Klagenfurt (Dr. Doris Putzgruber-Adamitsch, Dr. Stephanie Sarny, Prof. Yosuf El-Shabrawi).

Video Dataset of 101 Cataract Surgeries


Our paper “Cataract-101 – Video Dataset of 101 Cataract Surgeries” has been accepted for poster presentation at the MMSys 2018 conference (Open DataSet & Software Track).

Authors: Klaus Schoeffmann, Mario Taschwer, Stephanie Sarny, Bernd Münzer, Jürgen Primus, Doris Putzgruber

Cataract surgery is one of the most frequently performed microscopic surgeries in the field of ophthalmology. The goal behind this kind of surgery is to replace the human eye lens with an artificial one, an intervention that is often required due to aging. The entire surgery is performed under microscopy, but co-mounted cameras allow the procedure to be recorded and archived. Currently, the recorded videos are used in a postoperative manner for documentation and training. An additional benefit of recording cataract videos is that they enable video analytics (i.e., manual and/or automatic video content analysis) to investigate medically relevant research questions (e.g., the cause of complications). This, however, necessitates a medical multimedia information system trained and evaluated on existing data, which is currently not publicly available. In this work we provide a public video dataset of 101 cataract surgeries that were performed by four different surgeons over a period of 9 months. These surgeons are grouped into moderately experienced and highly experienced surgeons (assistant vs. senior physicians), providing the basis for experience-based video analytics. All videos have been annotated with quasi-standardized operation phases by a senior ophthalmic surgeon.



[Preprint PDF] [Poster]

Erratum: Table 1 of the published paper contains a systematic error in the row titled “Avg. Length / Op”. Numbers have been corrected in the poster.

Classification of Operation Phases in Cataract Surgery Videos


Our paper has been accepted for publication and oral presentation at the MMM 2018 conference:

Title: Frame-Based Classification of Operation Phases in Cataract Surgery Videos

Authors: Manfred Jürgen Primus, Doris Putzgruber-Adamitsch, Mario Taschwer, Bernd Muenzer, Yosuf El-Shabrawi, Laszlo Boeszoermenyi and Klaus Schöffmann

Abstract: Cataract surgeries are frequently performed to correct a lens opacification of the human eye, which usually appears in the course of aging. These surgeries are conducted with the help of a microscope and are typically recorded on video for later inspection and educational purposes. However, post-hoc visual analysis of video recordings is cumbersome and time-consuming for surgeons if there is no navigation support, such as bookmarks to specific operation phases. To prepare the way for an automatic detection of operation phases in cataract surgery videos, we investigate the effectiveness of a deep convolutional neural network (CNN) to automatically assign video frames to operation phases, which can be regarded as a single-label multi-class classification problem. In the absence of public datasets of cataract surgery videos, we provide a dataset of 21 videos of standardized cataract surgeries and use it to train and evaluate our CNN classifier. Experimental results show a mean F1-score of about 68% for frame-based operation phase classification, which can be further improved to 75% when considering temporal information of video frames in the CNN architecture.
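
For readers unfamiliar with the metric, a mean F1-score for a single-label multi-class problem can be computed as sketched below. The sketch assumes that "mean" refers to the unweighted average of per-class F1 scores (macro averaging); the abstract does not state the averaging scheme, and the function name `mean_f1` and the phase labels in the example are hypothetical.

```python
from typing import Hashable, Sequence


def mean_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    classes = set(y_true)
    if not classes:
        raise ValueError("at least one labelled frame is required")
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up frame labels for two hypothetical phases "a" and "b".
score = mean_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

With these four frames, phase "a" has F1 = 2/3 and phase "b" has F1 = 0.8, so the mean is about 0.73.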


Preprint PDF


  Title                    = {Frame-Based Classification of Operation Phases in Cataract Surgery Videos},
  Author                   = {Primus, Manfred J{\"u}rgen and Putzgruber-Adamitsch, Doris and Taschwer, Mario and M{\"u}nzer, Bernd and El-Shabrawi, Yosuf and B{\"o}sz{\"o}rmenyi, Laszlo and Schoeffmann, Klaus},
  Booktitle                = {MultiMedia Modeling},
  Year                     = {2018},

  Address                  = {Cham},
  Editor                   = {Schoeffmann, Klaus and Chalidabhongse, Thanarat H. and Ngo, Chong Wah and Aramvith, Supavadee and O'Connor, Noel E. and Ho, Yo-Sung and Gabbouj, Moncef and Elgammal, Ahmed},
  Pages                    = {241--253},
  Publisher                = {Springer International Publishing},
  ISBN                     = {978-3-319-73603-7}

PhD thesis submitted


My PhD thesis was submitted on April 6 and graded as excellent (grade 1).

Title of thesis: Concept-Based and Multimodal Methods for Medical Case Retrieval

Medical case retrieval (MCR) is defined as a multimedia retrieval problem, where the document collection consists of medical case descriptions that pertain to particular diseases, patients’ histories, or other entities of biomedical knowledge. Case descriptions are multimedia documents containing textual and visual modalities (images). A query may consist of a textual description of a patient’s symptoms and related diagnostic images. This thesis proposes and evaluates methods that aim at improving MCR effectiveness over the baseline of fulltext retrieval. We hypothesize that this objective can be achieved by utilizing controlled vocabularies of biomedical concepts for query expansion and concept-based retrieval. The latter represents case descriptions and queries as vectors of biomedical concepts, which may be generated automatically from textual and/or visual modalities by concept mapping algorithms. We propose a multimodal retrieval framework for MCR by late fusion of text-based retrieval (including query expansion) and concept-based retrieval and show that retrieval effectiveness can be improved by 49% using linear fusion of practical component retrieval systems. The potential of further improvement is experimentally estimated as a 166% increase of effectiveness over fulltext retrieval using query-adaptive fusion of ideal component retrieval systems. Additional contributions of this thesis include the proposal and comparative evaluation of methods for concept mapping, query and document expansion, and automatic classification and separation of compound figures found in case descriptions.
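
As a rough illustration of what linear late fusion of two component retrieval systems looks like, consider the sketch below. It is not the thesis implementation: the min-max normalization, the single mixing weight, the treatment of missing documents as score 0, and the names `min_max_normalize` and `linear_fusion` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score list maps to all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_fusion(text_scores: Dict[str, float],
                  concept_scores: Dict[str, float],
                  weight: float) -> List[Tuple[str, float]]:
    """Rank documents by weight * text score + (1 - weight) * concept score.

    A document missing from one system contributes 0 for that system
    (an assumption of this sketch).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    t = min_max_normalize(text_scores)
    c = min_max_normalize(concept_scores)
    fused = {doc: weight * t.get(doc, 0.0) + (1.0 - weight) * c.get(doc, 0.0)
             for doc in set(t) | set(c)}
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores for three hypothetical case descriptions.
ranking = linear_fusion({"d1": 2.0, "d2": 1.0, "d3": 0.0},
                        {"d2": 0.9, "d3": 0.1},
                        weight=0.5)
```

Here "d2" is only second-best for the text system but best for the concept system, so it ends up first in the fused ranking. A query-adaptive variant would choose `weight` per query instead of using one global value.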

Keywords: multimedia information retrieval / biomedical information retrieval / biomedical concept detection / information fusion / image processing

Bibtex citation:

Title                    = {Concept-Based and Multimodal Methods for Medical Case Retrieval},
Author                   = {Taschwer, Mario W.},
School                   = {Alpen-Adria-Universit{\"a}t Klagenfurt},
Year                     = {2017},
Address                  = {Austria},
Month                    = mar,
Url                      = {}

Compound Figure Separation Journal Paper


We submitted extended work on compound figure separation to the MTAP Journal.

Update: The revised version of our paper was accepted for publication on Dec 1, 2016 and published online on Dec 29, 2016. The printed version appeared in January 2018.

Title: Automatic Separation of Compound Figures in Scientific Articles

Content-based analysis and retrieval of digital images found in scientific articles is often hindered by images consisting of multiple subfigures (compound figures). We address this problem by proposing a method (ComFig) to automatically classify and separate compound figures, which consists of two main steps: (i) a supervised compound figure classifier (ComFig classifier) discriminates between compound and non-compound figures using task-specific image features; and (ii) an image processing algorithm is applied to predicted compound images to perform compound figure separation (ComFig separation). The proposed ComFig classifier is shown to achieve state-of-the-art classification performance on a published dataset. Our ComFig separation algorithm shows superior separation accuracy on two different datasets compared to other known automatic approaches. Finally, we propose a method to evaluate the effectiveness of the ComFig chain combining classifier and separation algorithm, and use it to optimize the misclassification loss of the ComFig classifier for maximal effectiveness in the chain.


Bibtex citation:

  Title                    = {Automatic separation of compound figures in scientific articles},
  Author                   = {Taschwer, Mario and Marques, Oge},
  Journal                  = {Multimedia Tools and Applications},
  Year                     = {2018},
  Month                    = {Jan},
  Number                   = {1},
  Pages                    = {519--548},
  Volume                   = {77},
  Doi                      = {10.1007/s11042-016-4237-x},
  ISSN                     = {1573-7721}

Compound Figure Separation at MMM 2016


Our extended work on compound figure separation has been accepted as a regular paper at the MMM 2016 conference. A preprint is available here.

Update: Slides of my presentation on Jan 5, 2016 at the MMM conference. Official link to the published paper. BibTeX citation:

Title                    = {Compound Figure Separation Combining Edge and Band Separator Detection},
Author                   = {Taschwer, Mario and Marques, Oge},
Booktitle                = {MultiMedia Modeling},
Publisher                = {Springer International Publishing},
Year                     = {2016},
Editor                   = {Tian, Qi and Sebe, Nicu and Qi, Guo-Jun and Huet, Benoit and Hong, Richang and Liu, Xueliang},
Pages                    = {162--173},
Series                   = {Lecture Notes in Computer Science},
Volume                   = {9516},

Doi                      = {10.1007/978-3-319-27671-7_14},
ISBN                     = {978-3-319-27670-0}

We propose an image processing algorithm to automatically separate compound figures appearing in scientific articles. We classify compound images into two classes and apply different algorithms for detecting vertical and horizontal separators to each class: the edge-based algorithm aims at detecting visible edges between subfigures, whereas the band-based algorithm tries to detect whitespace separating subfigures (separator bands). The proposed algorithm has been evaluated on two recent datasets for compound figure separation (CFS) in the biomedical domain and achieves a slightly better detection accuracy than state-of-the-art approaches. Conducted experiments investigate CFS effectiveness and classification accuracy of various classifier implementations.
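
The idea behind band-based separator detection can be sketched as follows. This is a simplified illustration rather than the published algorithm: it treats the image as a grid of grayscale values, and the brightness threshold, the minimum band width, and the function names are hypothetical choices.

```python
from typing import List, Sequence, Tuple


def find_vertical_bands(image: Sequence[Sequence[int]],
                        white_threshold: int = 250,
                        min_width: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that are entirely near-white.

    Bands touching the left or right border are treated as margins and skipped.
    """
    if not image or not image[0]:
        return []
    width = len(image[0])
    is_white = [all(row[x] >= white_threshold for row in image)
                for x in range(width)]
    bands: List[Tuple[int, int]] = []
    start = None
    # Iterate one step past the last column so a trailing run gets closed.
    for x in range(width + 1):
        white = x < width and is_white[x]
        if white and start is None:
            start = x
        elif not white and start is not None:
            if start > 0 and x < width and x - start >= min_width:
                bands.append((start, x))
            start = None
    return bands


def find_horizontal_bands(image: Sequence[Sequence[int]],
                          white_threshold: int = 250,
                          min_width: int = 2) -> List[Tuple[int, int]]:
    """Same detection applied to rows, by transposing the image."""
    transposed = [list(column) for column in zip(*image)]
    return find_vertical_bands(transposed, white_threshold, min_width)


# Tiny synthetic image: two dark blocks separated by a 3-pixel white band,
# followed by a white right margin that should be ignored.
row = [0, 0, 255, 255, 255, 0, 0, 255]
vertical = find_vertical_bands([row, row])
```

An edge-based detector would instead look for long straight intensity edges between subfigures, which is useful when subfigures touch without any whitespace between them.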

Compound Figure Separation at ImageCLEF 2015


We participated in the compound figure separation subtask of the ImageCLEF 2015 medical classification task (group AAUITEC). The paper describing our approach has been accepted for the CLEF 2015 Working Notes. A preprint version is available here.

Update: I presented a poster at CLEF 2015 in Toulouse on September 9, 2015. The paper appeared online in CEUR Workshop Proceedings. Here is the BibTeX citation:

Title                    = {{AAUITEC} at {ImageCLEF} 2015: Compound Figure Separation},
Author                   = {Taschwer, Mario and Marques, Oge},
Booktitle                = {{CLEF} 2015 Working Notes},
Year                     = {2015},
Month                    = {September},
Series                   = {CEUR Workshop Proceedings, ISSN 1613-0073},
Volume                   = {1391},
Location                 = {Toulouse, France},
Url                      = {}

Our approach to automatically separating compound figures appearing in biomedical articles is split into two image processing algorithms: one is based on detecting separator edges, and the other tries to identify background bands separating subfigures. Only one algorithm is applied to a given image, according to the prediction of a binary classifier trained to distinguish graphical illustrations from other images in biomedical articles. Our submission to the ImageCLEF 2015 compound figure separation task achieved an accuracy of 49% on the provided test set of about 3400 compound images. This is clearly behind the best submission of other participants (85% accuracy), but is an order of magnitude faster than other approaches reported in the literature.

ACM Multimedia 2014 Doctoral Symposium


Update: I gave my presentation at ACM Multimedia 2014 on November 5, and received the Best Doctoral Symposium Paper Award. The slides are available here.

My PhD proposal has been accepted by the ACM Multimedia 2014 Doctoral Symposium. The paper is available here.

The proposed PhD project addresses the problem of finding descriptions of diseases or patients’ health records that are relevant for a given description of a patient’s symptoms, also known as medical case retrieval (MCR). Designing an automatic multimodal MCR system applicable to general medical data sets still presents an open research problem, as indicated by the ImageCLEF 2013 MCR challenge, where the best submitted runs achieved only moderate retrieval performance and used purely textual techniques. This project therefore aims at designing a multimodal MCR model that is capable of achieving a substantially better retrieval performance on the ImageCLEF data set than state-of-the-art techniques. Moreover, the potential of further improvement by leveraging relevance feedback of medical expert users for long-term learning will be investigated.