Orchestrating machine learning development in complex medical imaging environments

Even though machine learning algorithms, especially deep convolutional neural networks, have shown promising results in many medical image analysis tasks (such as segmentation, detection and localization), many challenges remain to be solved before these advances can be used in clinical practice.

One hurdle is robustness of accuracy. An algorithm trained on data from hospital A is likely to perform worse on data from hospital B. In many cases, this drop in accuracy is so large that the algorithm cannot be used.

To achieve more generalizable results, more diverse training data are typically needed. As in many fields, obtaining enough good-quality data is a challenge. Our research therefore aims to give a more in-depth understanding of the training data space, in order to control it better and improve generalizability, even with little data. This will be investigated through data augmentation and data simulation/generation through generative adversarial networks (GANs), among other approaches.
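As a rough illustration of what data augmentation means for histology patches, here is a minimal sketch, assuming a patch is stored as nested lists of RGB floats in [0, 1]. The transforms used (random 90-degree rotations, horizontal flips and per-channel intensity jitter as a crude stand-in for stain variation) are common choices, not the project's actual method, and the function names are hypothetical.

```python
import random
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Patch = List[List[Pixel]]  # rows of RGB pixels, values in [0, 1]


def rotate90(patch: Patch) -> Patch:
    """Rotate a patch 90 degrees clockwise (hypothetical helper)."""
    return [list(row) for row in zip(*patch[::-1])]


def hflip(patch: Patch) -> Patch:
    """Mirror a patch left to right."""
    return [row[::-1] for row in patch]


def color_jitter(patch: Patch, rng: random.Random,
                 max_scale: float = 0.1, max_shift: float = 0.05) -> Patch:
    """Scale and shift each RGB channel independently, clipping to [0, 1].

    Assumption: a simple proxy for color/stain differences between centers.
    """
    scales = [1.0 + rng.uniform(-max_scale, max_scale) for _ in range(3)]
    shifts = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    return [
        [tuple(min(1.0, max(0.0, v * s + b)) for v, s, b in zip(px, scales, shifts))
         for px in row]
        for row in patch
    ]


def augment(patch: Patch, rng: random.Random) -> Patch:
    """Apply a random rotation, an optional flip, then color jitter."""
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        patch = rotate90(patch)
    if rng.random() < 0.5:
        patch = hflip(patch)
    return color_jitter(patch, rng)
```

Geometric transforms are label-preserving for most patch-level tasks, while the color jitter widens the range of appearances seen in training; whether this closes the gap between hospitals is exactly the kind of question the research aims to study.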

Another challenge is bridging the gap between developing a single algorithm in a sandbox environment and developing and maintaining a wide range of machine learning solutions. This requires an efficient way to reuse components of the machine learning pipeline. Modularization and standardization of components will be investigated, with particular focus on training data and trained models, where the methods developed for the training data space can be put to further use.

Visualization showing that H&E-stained whole-slide images differ depending on the medical center they originate from. The visualization shows clusters of image patches from five different centers, which ideally should be inseparable.
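One hedged way to put a number on "ideally inseparable" is to check how well the originating center can be predicted from each patch's feature vector. The sketch below assumes every patch is already represented by such a vector and uses leave-one-out nearest-centroid classification as a deliberately simple proxy; it is not the method behind the visualization, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def center_separability(features: Sequence[Sequence[float]],
                        centers: Sequence[str]) -> float:
    """Leave-one-out nearest-centroid accuracy for predicting the center label.

    High accuracy means patches carry center-specific signal (separable);
    accuracy near chance means the centers are hard to tell apart.
    """
    by_center: Dict[str, List[int]] = defaultdict(list)
    for i, c in enumerate(centers):
        by_center[c].append(i)

    correct = 0
    for i, (x, c) in enumerate(zip(features, centers)):
        best_label, best_dist = None, math.inf
        for label, idx in by_center.items():
            # Exclude the patch itself so it cannot vote for its own center.
            others = [features[j] for j in idx if j != i]
            if not others:
                continue
            d = math.dist(x, centroid(others))
            if d < best_dist:
                best_label, best_dist = label, d
        correct += best_label == c
    return correct / len(features)
```

With five centers contributing equally many patches, chance level is 0.2, so values near 0.2 would match the "inseparable" ideal, while values near 1.0 would indicate clusters like those described in the caption.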

