Optimising the messiness of real-world data at scale

Illustration of a large mass of people in perspective. Credit: Shutterstock

Professor Chris Holmes is a senior member of the Computational Statistics and Machine Learning group at the Department of Statistics, University of Oxford. He is the first person to hold a joint Chair between the Mathematical, Physical and Life Sciences Division and the Medical Sciences Division. He is also Scientific Director for Health at the Alan Turing Institute in London, a role that mirrors his Oxford role on a national scale. Talking to Ruth Abrahams in his office, designed with clean lines and large windows, he reveals how machine learning, AI and health are hitting a sweet spot.

Why is it important to link the Department of Statistics and the Medical Sciences Division?

I think it’s really key, especially for solving the right problems. If researchers want to have impact, it’s absolutely essential that they work at the coalface. They have to be engaged with researchers who understand the complexities and the messiness of real-world data.

Increasingly we’re seeing the use of wearables and sensor information, and in certain disease areas it offers a unique information source. In areas such as dementia, sensors and wearables give us much finer-resolution information than we could capture when patients visit a clinic or perform tests.

Are AI and machine learning key for the future of personalised medicine?

Everyone’s aware that the world is changing very rapidly. We’re increasingly seeing multimodal data, which is where we’re actually trying to combine information from very different, heterogeneous sources. We’re now really starting to integrate genetics with medical imaging and with electronic health records, and trying to bring together all the information sources to improve clinical decision-making or scientific understanding. The tools that we develop here in the group within the Department of Statistics as well as the group within the Big Data Institute (BDI) are more and more focused on how we can combine heterogeneous information sources at scale.

What are you most excited about for your field of work in terms of health sciences?

Population data sets at scale. There’s increasing recognition of the opportunities around multimodal measurement on a population: things like UK Biobank, which has a cohort of half a million people. We’re working with NHS Digital, who are really starting to pull together patient banks of information. To maximise the opportunities of machine learning and AI you really need data at scale, and we’re starting to collect that. There are going to be huge benefits for patients and for medical understanding that arise from that.

Is the NHS an ideal partner? 

The fact that in the UK we have a cradle to grave single health provider and a genetically diverse population is a huge potential learning base for us and an opportunity to really improve medical science. This is now increasingly being recognised. 

The NHS is looking at how it can operationalise its data for patient good. Clearly that in itself raises questions of data governance and the value of that data, but it’s a real opportunity and there are some really promising signs.

There’s a spectrum of areas where AI is going to make an impact in health, from very early-stage molecular medicine through to analysis of clinical trials and drug discovery, right through to the operational side and the delivery of patient care.

How does your group choose which health problems to work on?

There are lots of interesting questions out there. But the first ones we ask ourselves are: ‘Is the data research-ready? Is the data available to support an evidence-based analysis and the type of machine learning methods that we would wish to apply to it?’

It’s problem-focused. The problems dictate the methods.

So you adapt your methods for each problem?

Absolutely. We view machine learning and AI as a toolbox and we take that to the problems that we try to address. We look to tailor and apply the most relevant methods. 

Do you think that incorporating machine learning into diagnosis, prognosis and treatment plans will become the norm in the future? What sort of timescales are we looking at?

It’s an incredible time. It’s this combining of worlds: capturing data at scale, and having the new machine learning and AI algorithms and the computational resources to analyse those types of data. When you put those together, it’s an incredibly powerful combination.

What do you think is the best way to ensure that data is protected?

Ethics and data governance are critical; they have to be front and centre. The way we ensure the highest levels of data governance and ethics around algorithms is very context-dependent.

What is the overarching vision or goal that you are working towards in terms of statistical machine learning and healthcare?

Keeping the focus on the problem. Making sure at all times that we are working towards a scientific question, helping to understand a scientific problem, or delivering patient benefit. We can then use those problems to shine a spotlight on where we need methodological innovation, as well as new theory to support new methods. For me that’s a really exciting part of being in Oxford, in the Statistics Department and the Medical School. We use our collaborations with clinicians and medical scientists to develop novel machine learning algorithms and, from that, a new theoretical understanding of where things will work and where they won’t.

What’s special about Oxford for those collaborations?

One obvious answer is the outstanding researchers. We have real strength in probabilistic modelling, what we would call statistical machine learning, and in Bayesian statistics. So it’s having that strength as well as the strength in the Medical School.

In the Department of Statistics and Department of Medicine we have two of the leading departments worldwide and so it’s a fantastic marriage to have those two partners. 

We’re also blessed with fantastic graduate students. We have a really great CDT – Centre for Doctoral Training – so we get some really, really strong graduate students. 

What are your plans for next year?

We’re starting big collaborations. One of them is with Novartis, one of the world’s leading pharmaceutical companies. That one is led by myself and Gil McVean, Director of the BDI.

We’re starting on multiple sclerosis and rheumatoid arthritis of the spine. Roughly speaking, Novartis is putting all the data they’ve ever collected into Oxford, and we’re going to be applying AI and machine learning methods to better understand disease aetiology and progression, as well as clinical interventions. We’ve got all the medical imaging they have and all the clinical trials data they have. So that’s really exciting.