'Problematic' DS AI Seminar on Predicting the development of liver cancer using MRI

Join us on Thursday 22 February for another "Problematic Seminar". This time, the problem is related to synthetic data: predicting the development of liver cancer using MRI. It also touches on self-supervised pretraining techniques as a means to improve weight initialization. PhD students and postdocs working on ML are strongly encouraged to join; we need you! Register here to join.

"Problematic" DS AI seminars

A seminar with a twist, a seminar with a challenge, a seminar with a quest for help leading to a solution... Are you the colleague who will help another colleague move forward with this ML challenge?

The challenge

Jakob Nolte, Beyond synthetic data: Predicting the development of liver cancer using MRI

Using MRI screenings, this research aims to predict the development of liver cancer in cirrhosis patients. The data comprises roughly 800 3D MRI scans from approx. 250 patients who underwent repeated MRI screening (1-15 observations per patient). Of the 250 patients, approx. 50 developed liver cancer. In a proof-of-concept study, we extracted a set of handcrafted features, known as radiomic features, from a manually delineated region of interest within the images. When we fitted a simple machine learning model to the extracted features, the results were promising, highlighting especially the extreme intensity values (e.g., the upper/lower 10th percentile) as indicative of cancer development. However, when we instead fitted a neural network to the raw images, results were subpar, regardless of model architecture, optimizer, or supervised/semi-supervised learning approach. We suspect that this is primarily due to MRI's lack of a standard intensity scale. For example, even two MRIs from the same person acquired with the same scanner do not share the same range of intensity values. As such, the network faces the challenge of navigating a heterogeneous feature space with limited data.
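To make the idea of extreme intensity values concrete, here is a minimal sketch of first-order intensity features computed inside a region of interest. It is an illustration only, not the pipeline used in the study: the function name `roi_intensity_features`, the choice of the 10th and 90th percentiles as the "lower" and "upper" extremes, and the synthetic volume are all assumptions.

```python
import numpy as np


def roi_intensity_features(volume, mask):
    """Return simple intensity statistics for the voxels inside a mask.

    Hypothetical helper for illustration; real radiomics toolkits
    compute many more features than these.
    """
    volume = np.asarray(volume, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must have the same shape")

    voxels = volume[mask]  # 1D array of intensities inside the ROI
    if voxels.size == 0:
        raise ValueError("mask selects no voxels")

    p10, p50, p90 = np.percentile(voxels, [10, 50, 90])
    return {
        "p10": float(p10),
        "median": float(p50),
        "p90": float(p90),
        "mean": float(voxels.mean()),
    }


# Synthetic stand-in for one 3D scan and its manually delineated ROI.
rng = np.random.default_rng(0)
volume = rng.normal(loc=100.0, scale=20.0, size=(8, 16, 16))
mask = np.zeros(volume.shape, dtype=bool)
mask[2:6, 4:12, 4:12] = True

features = roi_intensity_features(volume, mask)
print(features)
```

Note that these raw percentiles are expressed in scanner units, so they inherit whatever arbitrary intensity scale a given scan happens to have.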

Our primary objective for the meeting thus concerns data quality in real-world data sets. That is, how can we limit the heterogeneity between images while still preserving their original distribution? Our secondary objective concerns the initialization of neural networks. It is well established that small-scale datasets benefit from pretraining on large datasets. However, pretrained 3D models, let alone 3D models pretrained on medical images, are scarce. Therefore, I would like to discuss self-supervised pretraining techniques as a means to improve weight initialization.
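As a starting point for the discussion on heterogeneity, the sketch below shows per-scan z-score normalization, one common baseline for images without a standard intensity scale. It is not the approach used in this research, and it assumes that the difference between two scans is a simple positive scale and offset, which real MRI data may well violate. The function name `zscore_normalize` and the synthetic scans are hypothetical.

```python
import numpy as np


def zscore_normalize(volume, mask=None):
    """Shift and scale a scan to zero mean and unit variance.

    Statistics are taken over the masked voxels if a mask is given,
    otherwise over the whole volume. Hypothetical helper for illustration.
    """
    volume = np.asarray(volume, dtype=float)
    voxels = volume if mask is None else volume[np.asarray(mask, dtype=bool)]
    mean = voxels.mean()
    std = voxels.std()
    if std == 0:
        raise ValueError("cannot normalize a constant region")
    return (volume - mean) / std


# Two synthetic "scans" of the same anatomy on different arbitrary scales.
rng = np.random.default_rng(0)
scan_a = rng.gamma(shape=2.0, scale=50.0, size=(4, 8, 8))
scan_b = 3.0 * scan_a + 120.0  # assumed: positive scale plus offset

norm_a = zscore_normalize(scan_a)
norm_b = zscore_normalize(scan_b)
print(np.allclose(norm_a, norm_b))
```

Because the transform is affine, it preserves the ordering of voxels and the shape of each scan's intensity distribution, and under the stated assumption the two normalized scans agree up to floating-point tolerance. Whether such a simple correction is enough for real scans is exactly the open question.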