Authors: Natalia Sokolova, Klaus Schoeffmann, Mario Taschwer (AAU Klagenfurt); Doris Putzgruber-Adamitsch, Yosuf El-Shabrawi (Klinikum Klagenfurt)
In the field of ophthalmic surgery, many clinicians nowadays record their microscopic procedures with a video camera and use the recorded footage for later purpose, such as forensics, teaching, or training. However, in order to efficiently use the video material after surgery, the video content needs to be analyzed automatically. Important semantic content to be analyzed and indexed in these short videos are operation instruments, since they provide an indication of the corresponding operation phase and surgical action. Related work has already shown that it is possible to accurately detect instruments in cataract surgery videos. However, their underlying dataset (from the CATARACTS challenge) has very good visual quality, which is not reflecting the typical quality of videos acquired in general hospitals. In this paper, we therefore analyze the generalization performance of deep learning models for instrument recognition in terms of dataset change. More precisely, we trained such models as ResNet-50, Inception v3 and NASNet Mobile using a dataset of high visual quality (CATARACT) and test it on another dataset with low visual quality (Cataract-101), and vice versa. Our results show that the generalizability is rather low in general, but clearly worse for the model trained on the high-quality dataset. Another important observation is the fact that the trained models are able to detect similar instruments in the other dataset even if their appearance is different.