SKA signs Big Data cooperation agreement with CERN
17.07.17 - CERN Headquarters, Geneva, Friday 14 July 2017 – SKA Organisation and CERN, the European Laboratory for Particle Physics, yesterday signed an agreement formalising their growing collaboration in the area of extreme-scale computing.
The agreement establishes a framework for collaborative projects that address joint challenges in approaching Exascale* computing and data storage. It comes as the LHC is set to generate even more data in the coming decade and the SKA prepares to collect a vast amount of scientific data of its own.
Around the world, countries are engaged in efforts to cope with a leap in the demands of Information and Communication Technology. The Square Kilometre Array (SKA) project, the world’s largest radio telescope when built, and CERN’s Large Hadron Collider (LHC), the world’s largest particle accelerator, famous for discovering the Higgs boson, will contribute to driving the required technological developments.
“The signature of this collaboration agreement between two of the largest producers of science data on the planet shows that we are really entering a new era of science worldwide”, said Prof. Philip Diamond, SKA Director-General. “Both CERN and SKA are and will be pushing the limits of what is possible technologically, and by working together and with industry, we are ensuring that we are ready to make the most of this upcoming data and computing surge.”
“The LHC computing demands are tackled by the Worldwide LHC Computing Grid, which employs more than half a million computing cores around the globe interconnected by a powerful network. As our demands increase with the planned intensity upgrade of the LHC, we want to expand this concept, by using common ideas and infrastructure, into a scientific cloud. SKA will be an ideal partner in this endeavour”, said Prof. Eckhard Elsen, CERN Director of Research and Computing.
CERN and SKA have identified the acquisition, storage, management, distribution, and analysis of scientific data as particularly pressing topics in meeting these technological challenges.
In the case of the SKA, it is expected that phase 1 of the project – representing approximately 10% of the whole SKA – will generate around 300 PB (petabytes) of data products every year. This is ten times more than today’s biggest science experiments.
CERN has just surpassed the 200 PB mark for raw data collected by the experiments at the LHC over the past seven years. A layered (tiered) system provides for data storage in the remote centres. The High-Luminosity LHC is estimated to exceed this 200 PB level every year.
“This in itself will be a challenge for both CERN and SKA given the step change in the amounts of data we will have to handle in the next 5-10 years”, explains Miles Deegan, High-Performance Computing Specialist for the SKA. “Transferring an average dataset will take days on the SKA’s ultra-fast fibre optic networks, which are 300 times faster than your average broadband connection, so storing or even downloading this data at home or even at your local university is clearly impractical.”
As is already the case at CERN, SKA data will also be analysed by scientific collaborations distributed across the planet. Both institutions and their respective researchers will have common computational and storage resource needs, and they share the challenge of taking this volume of data and turning it into science that can be published, understood, explained, reproduced, preserved and presented.
“Processing such volumes of complex data to extract useful science is an exciting challenge that we face”, adds Antonio Chrysostomou, Head of Science Operations Planning for the SKA. “Our aim is to provide that processing capability through an alliance of regional centres located across the world in SKA member countries. Using cloud-based solutions, our scientific community will have access to the equivalent of today’s 35 biggest supercomputers to do the intensive processing needed to extract scientific results. In short, we need to fundamentally change how science is done.”
“CERN has proposed the concept of the Federated Open Science Cloud with other EIROforum members. This agreement is an important step in this direction”, said Ian Bird, responsible at CERN for the Worldwide LHC Computing Grid. “Essentially, we will provide a giant cloud-based, Dropbox-like, facility to science users around the world, where they will be able to not only access incredibly large files, but will also be able to do extremely intensive processing on those files to extract the science.”
As part of the agreement, CERN and SKA will hold regular meetings to monitor progress and discuss the strategic direction of their collaboration. They will organise collaborative workshops on specific technical areas of mutual interest and propose demonstrator projects or prototypes to investigate concepts for managing and analysing Exascale data sets in a globally distributed environment. The agreement also includes the exchange of experts in the field of Big Data as well as joint publications.
* An Exabyte (EB) represents 1 billion gigabytes (GB)