We live in an era of big data. The generation of terabytes' worth of data is now ubiquitous in the modern world. In many scientific disciplines, the ability to record data cheaply, efficiently and rapidly allows experiments themselves to become sophisticated data-acquisition exercises.
Science, namely the construction of deep understanding from observations of the world around us, can then be performed in the data. For many years this has meant that teams of scientists, augmented by computers, have been able to extract meaning from data – forging an intimate bridge between science and data science. More recently, the sheer size, dimensionality and rate of scientific data have grown so vast that reliance on automation and intelligent systems has become the norm. Algorithms can scour data at scales beyond human capability, finding interesting new phenomena and aiding the discovery process.
The physical sciences have many examples of vast-scale algorithmic science projects. When it comes fully online, the Square Kilometre Array, a radio telescope network currently under construction in Australia and South Africa, will generate more data than the entire global internet traffic – and is already streaming data at almost a terabyte per second. The Large Hadron Collider at CERN discovered the elusive Higgs boson in data streams that were produced at a rate of gigabytes per second. Meteorologists and seismologists routinely work with complex global sensor networks that generate vast data sets, all differing in the type, quantity and quality of data produced.
Nor are the problems confined to the volumes of data now produced. The signal-to-noise ratio is often very low, and data may provide only biased estimates of the quantities of interest. Data is often incomplete, which complicates the automated extraction of meaning. Finally, we must always ask whether a given combination of data and algorithm can answer the research question posed, and which combination is the most valuable given the scientific objectives.
Addressing the question of what data and which algorithm takes us to the intelligent selection of experiments, models and methods, both to acquire new data and to shed new light on old data. All these processes can be, and are, automated. The concept of optimal experimental design may be old, but its modern equivalents, particularly work on AutoML (automated machine learning), bring intelligence into the way data and algorithms are chosen so as to maximise the information gained. This can also take into account the costs associated with data recording and computation (which may include, for example, economic costs, hardware and memory limitations, and time), enabling efficient, optimal experimentation within a given budget.
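The idea of weighing the information an experiment yields against its cost can be made concrete with a minimal Bayesian sketch. The scenario below is entirely hypothetical (the two instruments, their accuracies and their costs are invented for illustration): each candidate experiment is scored by its expected information gain – the expected drop in entropy of the posterior over an unknown binary parameter – per unit cost.

```python
import math

def entropy(dist):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def expected_info_gain(prior, likelihood):
    """Expected reduction in entropy about theta from one observation.

    prior: P(theta) as a list; likelihood[y][theta] = P(y | theta).
    """
    h_prior = entropy(prior)
    expected_h_post = 0.0
    for lik_y in likelihood:                      # loop over outcomes y
        joint = [l * p for l, p in zip(lik_y, prior)]
        p_y = sum(joint)                          # marginal P(y)
        if p_y == 0:
            continue
        posterior = [j / p_y for j in joint]      # Bayes' rule
        expected_h_post += p_y * entropy(posterior)
    return h_prior - expected_h_post

# Two hypothetical instruments measuring a binary parameter theta:
prior = [0.5, 0.5]
precise = [[0.9, 0.1], [0.1, 0.9]]   # P(y|theta): right 90% of the time, cost 3
cheap   = [[0.7, 0.3], [0.3, 0.7]]   # right 70% of the time, cost 1

for name, lik, cost in [("precise", precise, 3.0), ("cheap", cheap, 1.0)]:
    gain = expected_info_gain(prior, lik)
    print(f"{name}: {gain:.3f} bits, {gain / cost:.3f} bits per unit cost")
```

With these made-up numbers the precise instrument wins even after dividing by its higher cost, but the point of the sketch is the scoring rule, not the outcome: swapping in different priors, likelihoods or cost models changes which experiment an automated designer would run next.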
The laws of science are compressed, elegant representations that offer insight into the functioning of the universe around us. They are developed, ultimately, through logical (mathematical) formulation and through empirical observation. Both of these avenues have seen revolutions through the application of machine learning and AI in recent years. AI systems can formulate axiomatic extensions to existing laws, and the wealth of data available from experiments allows science to take place in the data.
We are already at the point at which AI systems can infer properties such as the conservation of energy and momentum, and propose underlying ‘laws’, given only data. Furthermore, they can propose experiments to gather maximal knowledge from new data. Couple this with logical reasoning capability and the ability to operate at scales well beyond the human, and one has a recipe for a genuine automated scientist.
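A toy illustration of inferring a conservation property from data alone (this is not any particular research system; the simulated free-fall trajectory and the list of candidate quantities are invented for the example) is to scan candidate expressions over recorded states and flag those that stay constant:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

# Simulated free-fall "measurements": height h and speed v over time.
data = []
for i in range(50):
    t = i * 0.01
    h = 100.0 - 0.5 * G * t * t
    v = G * t
    data.append((h, v))

# Candidate scalar quantities an algorithm might test for conservation.
candidates = {
    "v":            lambda h, v: v,
    "h * v":        lambda h, v: h * v,
    "v**2/2 + G*h": lambda h, v: 0.5 * v * v + G * h,  # specific mechanical energy
}

def spread(values):
    """Max minus min: near zero only if the quantity is conserved."""
    return max(values) - min(values)

for name, f in candidates.items():
    vals = [f(h, v) for h, v in data]
    print(f"{name:14s} spread = {spread(vals):.6f}")
```

Only the energy-like combination has (numerically) zero spread along the trajectory; real discovery systems search far richer expression spaces and handle noisy data, but the underlying test – constancy along observed trajectories – is the same.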
In the coming decade we are likely to see a growth in quantum computation for machine learning. This promises the ability to solve the hardest problems in machine learning and beyond using some of the most bizarre physics we know – which will be transformational.
This is truly the age of the algorithm – and these algorithms are machine learning.
Professor Stephen Roberts, RAEng/Man Group Professor of Machine Learning and Director of the Oxford-Man Institute of Quantitative Finance.