High-performance computing in Europe: Deciphering redshifted 21-cm signals

We work closely with the Partnership for Advanced Computing in Europe (PRACE), which provides leading scientists with access to the most powerful computers in Europe. Our new blog series will be showcasing this amazing work through stories we have written for their latest publication, the PRACE Digest 2021.

The genesis of our universe has long been a source of fascination for astrophysicists, but the technological challenges of gathering data about the early stages after the Big Bang have meant that much of it remains a mystery. This is all set to change, however, with the construction of the Square Kilometre Array (SKA) telescope. Professor Andrei Mesinger of Scuola Normale Superiore, Italy, has been preparing for this by developing methods for deciphering the incoming avalanche of data.

“My aim as a scientist is to try and understand the early universe, the first billion years after the Big Bang,” says Mesinger. “In this time the universe expanded and cooled, and the first structures that would eventually become stars and galaxies took shape. After these first galaxies formed, the light from them spread out and eventually percolated all of space. I am investigating these cosmic milestones, known as the cosmic dawn and reionisation, by developing theoretical models for how the first galaxies and the intergalactic medium – the web structures in between galaxies – evolve. But to confirm whether these models are accurate, we need to compare them to real data.”

Professor Andrei Mesinger of Scuola Normale Superiore, Italy

Fortunately for Mesinger, a wealth of such data is set to become available in the form of radio maps that will be produced by the Square Kilometre Array (SKA) telescope. The signal being detected is known as the redshifted 21-cm signal. Corresponding to the spin-flip transition of neutral hydrogen, which makes up most of the ordinary matter in the early universe, the 21-cm line can provide information about the temperature and ionisation state of cosmic gas. The data being collected by SKA is therefore set to transform astrophysical cosmology, bringing a historically data-starved field into the era of Big Data.
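To make "redshifted" concrete: the spin-flip line is emitted at a rest-frame frequency of about 1420.4 MHz, and cosmic expansion stretches it so that it arrives at that frequency divided by (1 + z), where z is the redshift of the emitting gas. The short Python sketch below applies this relation. The function name and the example redshifts are illustrative choices, not values taken from the project.

```python
# Rest-frame frequency of the neutral-hydrogen spin-flip (21-cm) line, in MHz.
REST_FREQ_MHZ = 1420.40575


def observed_frequency_mhz(redshift: float) -> float:
    """Frequency at which 21-cm radiation emitted at `redshift` is observed today."""
    if redshift < 0:
        raise ValueError("redshift must be non-negative")
    return REST_FREQ_MHZ / (1.0 + redshift)


# Illustrative redshifts only (assumed for this example, not from the article).
for z in (6, 10, 20):
    print(f"z = {z:2d} -> observed at {observed_frequency_mhz(z):.1f} MHz")
```

The higher the redshift, the lower the observed frequency, which is why signals from the first billion years fall in the low-frequency radio band rather than at 1420 MHz.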

Crucially, with the development of new interferometers like SKA, these 21-cm signal radio maps will allow researchers to map out the first billion years of the universe, enabling them to learn about the properties of the unseen first generations of galaxies. Data from SKA is expected to start arriving around ten years from now, but preparatory work has already begun so that researchers can hit the ground running when the telescope becomes functional. Mesinger has recently been leading a PRACE project called “AIfor21CM – Artificial Intelligence for 21-cm Cosmology”, which was awarded 20 000 000 core hours on Piz Daint hosted by CSCS, Switzerland, and aimed to optimise the analysis of the upcoming 21-cm images.

Deciphering these signals presents a difficult challenge. “We know that the radiation from galaxies drives the patterns in these signals, and if we assume some galaxy model, we can predict what the signal should look like,” says Mesinger. “But the question then is: how can we statistically compare these predictions to actual observations, in order to see which galaxy model is correct?”

The 21-cm signal is highly non-Gaussian, and so the common approach of compressing the images into a power spectrum (PS) summary statistic wastes potentially valuable information. Therefore, to extract as much information as possible from the signal, Mesinger and his team have made use of convolutional neural networks (CNNs). CNNs are especially useful for this purpose because they can adaptively select the optimal summary statistic that maximises their ability to recover astrophysics.
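The point about lost information can be illustrated with a toy example. The Python sketch below (using NumPy) builds a deliberately non-Gaussian map, then scrambles its Fourier phases while keeping every Fourier amplitude. The two maps have the same power spectrum, yet their pixel statistics differ sharply, so any analysis based on the power spectrum alone cannot tell them apart. The toy map, the function names and the use of skewness as a non-Gaussianity measure are all assumptions made for illustration; this is not the team's pipeline.

```python
import numpy as np


def power_spectrum_2d(field):
    """Unbinned 2D power spectrum: squared amplitude of each Fourier mode."""
    return np.abs(np.fft.fft2(field)) ** 2


def skewness(field):
    """Third standardised moment; close to zero for a Gaussian field."""
    centred = field - field.mean()
    return float(np.mean(centred ** 3) / field.std() ** 3)


def scramble_phases(field, rng):
    """Keep every Fourier amplitude but replace the phases with random ones."""
    amplitudes = np.abs(np.fft.fft2(field))
    # Phases come from the transform of real white noise, so the output stays real.
    noise_phases = np.angle(np.fft.fft2(rng.standard_normal(field.shape)))
    scrambled = np.fft.ifft2(amplitudes * np.exp(1j * noise_phases))
    return scrambled.real


rng = np.random.default_rng(0)
# Toy non-Gaussian "map": a few bright pixels on an empty background
# (an assumption for illustration, not simulated 21-cm data).
toy_map = (rng.random((64, 64)) < 0.05).astype(float)
toy_map -= toy_map.mean()
gaussianised = scramble_phases(toy_map, rng)

same_ps = np.allclose(power_spectrum_2d(toy_map), power_spectrum_2d(gaussianised))
print("identical power spectra:", same_ps)
print("skewness of toy map:", skewness(toy_map))
print("skewness after phase scrambling:", skewness(gaussianised))
```

A CNN that works on the image itself has access to the phase information that distinguishes the two maps, which is the motivation given above for going beyond the power spectrum.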

A 2D slice through a simulated radio intensity map of the first billion years of our universe. The vertical axis corresponds to the sky plane (spanning an angular size of roughly 15 Moon diameters), while the horizontal axis corresponds to lookback time (further towards the right corresponds to earlier times/further distances). Radiation from the first galaxies in our Universe imprints the large-scale patterns seen in this map. If we can “decipher” these complex patterns, we can learn the unknown properties of the unseen first galaxies.


“Our method involves using training sets where we know the right answer, which we use to teach the CNN to find the right answer,” explains Mesinger. “This is where the PRACE project came in. Our original work was severely limited by computational resources; network tuning was done ‘by hand’ using only a few configurations, since each training of the CNN took days on our local CPU cluster. However, this PRACE project has allowed us to optimise the performance of our CNNs by using the efficiency of GPU clusters. We were able to introduce recurrent layers to our CNNs, which, although very expensive to train, can efficiently follow the evolution of the maps along cosmic time.”

Moreover, since the parameter space of adjustable hyper-parameters (governing the network architecture) is enormous, running an automatic optimisation requires hundreds of thousands of CNN trainings. This can only be done on a Tier-0 GPU cluster like Piz Daint. The end goal of this research is an optimised artificial neural network trained to infer the properties of the unseen first galaxies from realistic 21-cm images of reionisation and the cosmic dawn. Having such a tool will enable researchers to understand the data gathered by SKA, thus maximising Europe’s significant investment in the experiment.
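To give a feel for why the search space grows so quickly, the Python sketch below counts the combinations in a small, made-up set of hyper-parameters and runs a basic random search over it. Everything here is hypothetical: the article does not say which hyper-parameters were tuned or which optimisation method was used, random search is simply a common stand-in, and the scoring function is a cheap placeholder for what would in reality be a full CNN training run.

```python
import math
import random

# Hypothetical search space; the actual hyper-parameters are not listed in the article.
SEARCH_SPACE = {
    "conv_layers": [2, 3, 4, 5],
    "filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "dropout": [0.0, 0.1, 0.2, 0.3, 0.5],
}


def grid_size(space):
    """Number of distinct configurations in an exhaustive grid."""
    return math.prod(len(values) for values in space.values())


def sample_config(space, rng):
    """Draw one configuration uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}


def toy_validation_loss(config):
    """Placeholder for an expensive training run; NOT a real model."""
    return (
        abs(config["conv_layers"] - 4)
        + abs(math.log10(config["learning_rate"]) + 3)
        + abs(config["dropout"] - 0.2)
    )


def random_search(space, n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_loss = None, math.inf
    for _ in range(n_trials):
        config = sample_config(space, rng)
        loss = toy_validation_loss(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


print("grid size:", grid_size(SEARCH_SPACE))
best, loss = random_search(SEARCH_SPACE, n_trials=50)
print("best configuration found:", best, "with toy loss", loss)
```

Even this five-parameter toy grid has 960 combinations. With more parameters, finer value ranges and each evaluation costing a full training, the number of runs climbs rapidly, which is consistent with the scale of compute described above.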

The project is now completed, but a number of obvious extensions to the work remain, as Mesinger explains: “Now that the networks have been trained, the next step is to do this in a fully Bayesian way. Right now, the networks give us a best guess, but we do not have a great idea of the uncertainty of this best guess. Our next move would therefore be to combine the predictions from neural networks with a Bayesian inference framework.”

Mesinger and his team ran into several technical problems during the PRACE allocation due to the demanding nature of their calculations, both in terms of processing power and RAM. “The staff at the centre were very helpful the whole way through, assisting us in restructuring how the data was loaded in and out of memory, and writing a bottleneck-free I/O pipeline. This was crucial for our huge data demands,” adds David Prelogovic, a PhD student of Mesinger and lead author on the resulting scientific paper. “Having them there helped us to make this project a success, and we hope this work will prove to be useful when SKA finally starts to deliver data,” concludes Mesinger.

