ONS data shows that in the middle of COVID only one disease caused higher mortality in the elderly, and that disease was Alzheimer’s. With health data transforming how researchers tackle pressing public health challenges, platforms for data sharing are democratising how researchers access data and speeding up disease prevention and treatment.

By Professor John Gallacher

Like any so-called ‘black swan’ event, COVID-19 has exposed the points of frailty in healthcare systems. At little notice, it has created demand for a many-fold expansion of some of their parts (for example, intensive care units) while practically shutting down others; it has required vaccine development to be accelerated to timelines far shorter than those usually associated with the approval of medicines; and it has created the need to offer reliable testing at huge scale. Perhaps the most controversial aspect of COVID has been ‘disease modelling’ (the prediction of what will happen to the disease under different scenarios), with governments around the world basing policy on diverse models. This highlights the need for agile and flexible global data access.

COVID apart, as emerging research questions become more complex, the need for infrastructure to enable agile and flexible data access becomes acute. A good example of such a challenge is the recent focus on ‘multi-morbidity’, which explores the way that a number of different diseases affect each other; this is an increasingly recognised problem in ageing populations. As datasets become larger, repeated data transfer becomes less feasible, while the risks of a security breach increase. As studies increasingly look to replicate analyses and compare evidence across studies to answer complex questions, the ability to do this rigorously and rapidly becomes a priority.

To meet this challenge, the Medical Research Council funded the Dementias Platform UK (DPUK) Data Portal to provide a one-stop shop for accessing population-based research data. The Data Portal is a collaboration with cohort research teams who wish to make their data more widely available, and is a unique resource. Over 40 studies, involving 3.4 million volunteers, are available for discovery and translational science.

In exploiting this opportunity, protecting the identity of the millions of volunteers whose generosity created the resource is paramount. To achieve this, the Data Portal uses a secure data repository from which data cannot be removed. This ensures all analyses are conducted to the highest security and governance standards. The data repository approach is increasingly popular as it promotes rigorous science through improved access, accessibility, replication, and efficiency.

Emerging research questions frequently require access to data from multiple studies, each contributing part of the jigsaw. For example, some studies are strong on genetics, some on imaging, whilst others are closely linked to NHS data. Bringing these studies together under one roof enables evidence from multiple independent datasets to be combined more easily.

Bringing studies together allows data to be made more accessible. Data collected over many years by different teams comes in all shapes and sizes, but when brought together it can be re-structured so that it all fits together. This makes it easier for scientists to understand.

By holding data and results centrally, scientists can more easily and more precisely replicate findings. This increases confidence in the findings that tie in with previous research and raises questions over those that don’t.

The repository approach increases efficiency. By streamlining procedures, it reduces the time required for data access. By bringing scientists to the data, it removes the need for repeated data transfer. It also democratises scientific opportunity by giving researchers access, through the Data Portal, to state-of-the-art data analysis software at no cost. Whether you live in Manchester or Malawi, if you have a great idea, you can test it on some of the world’s best data (even in lockdown).

In 18 months of operation, users from 16 countries representing 72 academic, commercial and governmental institutions have made around 600 data access applications, with a median response time of 28 days. Although developed to accelerate dementia research, owing to its diversity of population studies the Data Portal is a generic solution that can be used to study many common diseases. To learn more, download our reference paper (EJE paper e-ref).

