Compound Figure Separation at ImageCLEF 2015

Posted on .

We participated in the compound figure separation subtask of the ImageCLEF 2015 medical classification task (group AAUITEC). The paper describing our approach has been accepted for the CLEF 2015 Working Notes. A preprint version is available here.

Update: I presented a poster at CLEF 2015 in Toulouse on September 9, 2015. The paper appeared online in CEUR Workshop Proceedings. Here is the BibTeX citation:

@InProceedings{Taschwer2015,
Title                    = {{AAUITEC} at {ImageCLEF} 2015: Compound Figure Separation},
Author                   = {Taschwer, Mario and Marques, Oge},
Booktitle                = {{CLEF} 2015 Working Notes},
Year                     = {2015},
Month                    = {September},
Series                   = {CEUR Workshop Proceedings, ISSN 1613-0073},
Volume                   = {1391},
Location                 = {Toulouse, France},
Url                      = {http://ceur-ws.org/Vol-1391/25-CR.pdf}
}

Abstract:
Our approach to automatically separating compound figures appearing in biomedical articles is split into two image processing algorithms: one is based on detecting separator edges, and the other tries to identify background bands separating subfigures. Only one algorithm is applied to a given image, according to the prediction of a binary classifier trained to distinguish graphical illustrations from other images in biomedical articles. Our submission to the ImageCLEF 2015 compound figure separation task achieved an accuracy of 49% on the provided test set of about 3400 compound images. This is clearly behind the best submission of other participants (85% accuracy), but our approach is an order of magnitude faster than other approaches reported in the literature.
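The following is a minimal sketch of the two ideas named in the abstract: choosing one of two separators per image, and splitting at background bands. It is not the implementation from the paper. The function names, the pure-Python grayscale image type, the white background value, the thresholds, and the choice of which separator handles illustrations are all assumptions made for illustration. Only vertical bands are handled, and the edge-based separator is left as a caller-supplied function.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale rows, pixel values 0..255 (assumed type)
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def background_columns(image: Image, background: int = 255, tolerance: int = 5) -> List[bool]:
    """Mark each column whose pixels are all close to the assumed background value."""
    width = len(image[0])
    return [
        all(abs(row[x] - background) <= tolerance for row in image)
        for x in range(width)
    ]


def split_at_vertical_bands(image: Image, min_band_width: int = 2) -> List[Box]:
    """Split an image into boxes separated by sufficiently wide vertical background bands."""
    is_bg = background_columns(image)
    height, width = len(image), len(image[0])

    # Collect runs of background columns that are wide enough to act as separators.
    bands: List[Tuple[int, int]] = []
    x = 0
    while x < width:
        if is_bg[x]:
            end = x
            while end < width and is_bg[end]:
                end += 1
            if end - x >= min_band_width:
                bands.append((x, end))
            x = end
        else:
            x += 1

    # Subfigure candidates are the regions between consecutive separator bands.
    boxes: List[Box] = []
    left = 0
    for band_start, band_end in bands:
        if band_start > left:
            boxes.append((left, 0, band_start, height))
        left = band_end
    if left < width:
        boxes.append((left, 0, width, height))
    return boxes


def separate(
    image: Image,
    is_illustration: Callable[[Image], bool],
    band_separator: Callable[[Image], List[Box]],
    edge_separator: Callable[[Image], List[Box]],
) -> List[Box]:
    """Apply exactly one separator, chosen by a binary classifier.

    Assumption: illustrations go to the band-based separator and all other
    images to the edge-based one.
    """
    chosen = band_separator if is_illustration(image) else edge_separator
    return chosen(image)


# Two dark panels (value 0) separated by a two-pixel white band.
toy_image = [[0, 0, 0, 255, 255, 0, 0, 0] for _ in range(2)]
boxes = separate(toy_image, lambda img: True, split_at_vertical_bands, lambda img: [])
```

On the toy image, the two panels come back as separate boxes. A practical version would also search for horizontal bands and recurse into each resulting box.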

ACM Multimedia 2014 Doctoral Symposium

Posted on .

Update: I gave my presentation at ACM Multimedia 2014 on November 5, and received the Best Doctoral Symposium Paper Award. The slides are available here.

My PhD proposal has been accepted by the ACM Multimedia 2014 Doctoral Symposium. The paper is available here.

Abstract:
The proposed PhD project addresses the problem of finding descriptions of diseases or patients’ health records that are relevant for a given description of a patient’s symptoms, also known as medical case retrieval (MCR). Designing an automatic multimodal MCR system applicable to general medical data sets remains an open research problem, as indicated by the ImageCLEF 2013 MCR challenge, where the best submitted runs achieved only moderate retrieval performance and used purely textual techniques. This project therefore aims at designing a multimodal MCR model that is capable of achieving a substantially better retrieval performance on the ImageCLEF data set than state-of-the-art techniques. Moreover, the potential for further improvement by leveraging relevance feedback of medical expert users for long-term learning will be investigated.

Call for Master Thesis on Biomedical Concept Detection

Posted on .

Update: The master thesis has been assigned to Florian Winkler as of June 26, 2014.

This is a call for a master thesis on biomedical concept detection.

Abstract:
Detecting biomedical concepts in text documents or queries is a useful tool for improving biomedical information retrieval or navigating biomedical document collections. Known successful techniques for biomedical concept detection rely on natural language processing and/or machine learning and incur a substantial processing overhead at the document indexing stage compared to keyword-only indexing.
The goal of this master thesis is to evaluate a novel efficient concept detection algorithm based on position-dependent keyword matching on a recent public biomedical data set. Concepts are taken from a biomedical thesaurus called Medical Subject Headings (MeSH). Results should be compared to the accuracy achieved by the well-known MetaMap concept detector.

Textual MCR Methods

Posted on .

I recently completed a technical report on textual methods for medical case retrieval. A corresponding poster prepared for an internal ITEC meeting is available here.

Citation:
M. Taschwer. Textual methods for medical case retrieval. Technical Report TR/ITEC/14/2.01, Institute of Information Technology (ITEC), Alpen-Adria-Universität Klagenfurt, Austria, May 2014.

Abstract:
Medical case retrieval (MCR) is information retrieval in a collection of medical case descriptions, where descriptions of patients’ symptoms are used as queries. We apply known text retrieval techniques based on query and document expansion to this problem, and combine them with new algorithms to match queries and documents with Medical Subject Headings (MeSH). We ran comprehensive experiments to evaluate 546 method combinations on the ImageCLEF 2013 MCR dataset. Methods combining MeSH query expansion with pseudo-relevance feedback performed best, delivering retrieval performance comparable to or slightly better than the best MCR run submitted to ImageCLEF 2013.
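As a rough illustration of pseudo-relevance feedback, one of the query expansion techniques mentioned in the abstract, here is a minimal sketch. It is not the method evaluated in the report: the term-count scoring, the toy documents, and the function names and parameters are all assumptions chosen to keep the example short. The idea is to rank documents with the original query, assume the top-ranked ones are relevant, and add their most frequent new terms to the query.

```python
from collections import Counter
from typing import Dict, List


def score(query_terms: List[str], doc_terms: List[str]) -> int:
    """Toy relevance score: how often the query terms occur in the document."""
    counts = Counter(doc_terms)
    return sum(counts[term] for term in set(query_terms))


def rank(query_terms: List[str], docs: Dict[str, List[str]]) -> List[str]:
    """Return document ids by descending score, ties broken by id."""
    return sorted(docs, key=lambda doc_id: (-score(query_terms, docs[doc_id]), doc_id))


def expand_with_prf(
    query_terms: List[str],
    docs: Dict[str, List[str]],
    top_docs: int = 2,
    new_terms: int = 2,
) -> List[str]:
    """Append the most frequent unseen terms from the top-ranked documents."""
    feedback_ids = rank(query_terms, docs)[:top_docs]
    counts = Counter(
        term
        for doc_id in feedback_ids
        for term in docs[doc_id]
        if term not in query_terms
    )
    # Sort by descending frequency, then alphabetically, to keep the result deterministic.
    best = sorted(counts, key=lambda term: (-counts[term], term))[:new_terms]
    return list(query_terms) + best


# Hypothetical, already tokenized case descriptions.
toy_docs = {
    "a": ["fever", "cough", "pneumonia", "pneumonia"],
    "b": ["fever", "rash", "measles"],
    "c": ["fracture", "xray"],
}
expanded = expand_with_prf(["fever", "cough"], toy_docs)
```

In a real system, the scoring function would be a proper retrieval model, and the expansion terms would typically be weighted lower than the original query terms.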

ImageCLEF 2013 participation

Posted on .

I participated in one of the ImageCLEF 2013 medical tasks by experimenting with text-based retrieval and query expansion using the MeSH ontology. The resulting CLEF 2013 working notes paper is available here. The achieved performance for medical case retrieval was moderate, but there are additional options for text retrieval to explore before turning to visual retrieval.

I will present a poster (not reviewed) at the CLEF 2013 conference in two weeks. Here is the abstract:

Our approach to the ImageCLEF medical case retrieval task consists of text-only retrieval combined with utilizing the Medical Subject Headings (MeSH) ontology. MeSH terms extracted from the query are used for query expansion or query term weighting. MeSH annotations of documents available from PubMed Central are added to the corpus. Retrieval results improve slightly upon full-text retrieval.
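To make the two uses of MeSH terms in the abstract more concrete, here is a minimal sketch that applies both to one query. It is not the matching algorithm from the paper. The tiny thesaurus is a hand-written stand-in for MeSH, and the phrase matching, the weights, and all function names are assumptions. Query terms that belong to a matched heading get a higher weight (term weighting), and terms of the heading that are absent from the query are added with a lower weight (query expansion).

```python
from typing import Dict, List

# Toy stand-in for MeSH: heading -> entry terms (synonyms). Not the real thesaurus.
TOY_MESH: Dict[str, List[str]] = {
    "myocardial infarction": ["heart attack"],
    "hypertension": ["high blood pressure"],
}


def match_headings(query: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return headings whose name or entry term occurs as a whole phrase in the query."""
    text = " " + " ".join(query.lower().split()) + " "
    matches = []
    for heading, entry_terms in thesaurus.items():
        if any(f" {phrase} " in text for phrase in [heading, *entry_terms]):
            matches.append(heading)
    return matches


def weighted_query(
    query: str,
    thesaurus: Dict[str, List[str]],
    mesh_weight: float = 2.0,
    expansion_weight: float = 0.5,
) -> Dict[str, float]:
    """Build term weights: boost matched terms and add missing heading terms."""
    weights = {term: 1.0 for term in query.lower().split()}
    for heading in match_headings(query, thesaurus):
        for phrase in [heading, *thesaurus[heading]]:
            for term in phrase.split():
                if term in weights:
                    # Term weighting: the query term belongs to a detected concept.
                    weights[term] = max(weights[term], mesh_weight)
                else:
                    # Query expansion: add a concept term the query did not contain.
                    weights[term] = expansion_weight
    return weights


weights = weighted_query("patient with heart attack", TOY_MESH)
```

Here "heart" and "attack" are boosted because they match an entry term, while "myocardial" and "infarction" are added at a reduced weight. The two mechanisms could equally be applied separately.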

The poster is available here.

PhD proposal

Posted on .

I just finished my PhD proposal regarding medical case retrieval (available as PDF). Here is the abstract:

The proposed PhD project addresses the problem of medical case retrieval (MCR), where a medical case is represented by a multimedia document describing a certain disease or a patient’s history. The ImageCLEF evaluation campaign poses a yearly MCR task using a heterogeneous dataset of more than 75,000 medical publications consisting of text and images. The best results achieved by participants of the ImageCLEF MCR task in 2012 are moderate and call for improvement. Interestingly, approaches based on visual retrieval perform significantly worse than text-only retrieval, even if combined with text retrieval. This project therefore aims at designing an MCR model that is able to deliver a substantially better retrieval performance on the ImageCLEF dataset. Moreover, the potential of further improvement by leveraging the feedback of medical expert users for long-term learning will be investigated.