You may have heard of the Large Hadron Collider (LHC) at CERN, the facility that discovered the Higgs Boson. But you might not have realised just how much experimental data is generated by the Collider, or thought about how all this data is stored for future use. Here at the Scientific Computing Department, it's our job to think about such issues.
Over 10% of the data generated by the third LHC run will be stored at our Scientific Data Centre, at the Rutherford Appleton Laboratory (RAL) in Oxfordshire. We are one of twelve global 'Tier-1' large computing centres responsible for the provision of infrastructure that archives, processes and analyses all this data.
To put this into perspective, Alastair Dewhurst, leader of the Tier-1 team that oversees the management of this data, says, "At RAL, we expect to archive up to 40PB of data coming from CERN each year during LHC Run 3. This is the equivalent data rate of continuously streaming 400 4K Ultra HD videos, which would saturate a 10Gb/s network link for an entire year!"
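For readers who want to check the comparison, here is a rough back-of-the-envelope sketch. It assumes decimal units (1PB = 10^15 bytes) and a 365-day year, and the function name is ours for illustration only. It converts 40PB per year into an average network rate and then divides that rate across 400 streams.

```python
# Rough check of the "40PB a year is about a 10Gb/s link" comparison.
# Assumptions (ours, not from the Tier-1 team): decimal petabytes, 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
BYTES_PER_PB = 10**15


def average_rate_gbps(petabytes_per_year: float) -> float:
    """Average sustained rate, in gigabits per second, needed to move the given yearly volume."""
    bits_per_year = petabytes_per_year * BYTES_PER_PB * 8
    return bits_per_year / SECONDS_PER_YEAR / 1e9


rate = average_rate_gbps(40)          # yearly archive volume quoted above
per_stream_mbps = rate * 1000 / 400   # share of that rate for each of 400 streams

print(f"Average rate: {rate:.2f} Gb/s")
print(f"Per stream:   {per_stream_mbps:.1f} Mb/s")
```

Under these assumptions the average comes out at just over 10Gb/s, or roughly 25Mb/s per stream, which is in the range usually associated with 4K streaming.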
The amount of data we are expected to store is not the only concern. The real issue is that the volume of data being generated keeps increasing, so storage capacity that was sufficient last year will not be sufficient this year, and the shortfall will grow again next year.
The graph puts into context how the amount of data being generated from the LHC has increased over the last 14 years. To solve this data storage conundrum, our team invested in high-capacity automated tape libraries – also known as 'tape robots'. Such systems allow future enhancement of storage capacity whilst providing a low cost-per-gigabyte solution. We can also utilise the partitions in the library, or use multiple robotic interface modules (RIMs) active in the same partition, to respond to concurrent user requests 24/7.
This year, despite Covid-19 causing a slight delay, the Tier-1 team has installed a new 9-frame Spectra TFinity tape library and integrated it into the system already in use. This provides an additional 22 tape drives (2 IBM® TS1160 FC drives and 20 IBM® TS1160 tape drives with JE media), and the older system has been enhanced with 2 more expansion frames. These developments allow us to now store an additional 130PB of data – that's equivalent to over 20 million DVDs!
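The DVD comparison can be checked in the same back-of-the-envelope way. This sketch assumes a single-layer DVD holding 4.7GB and decimal units throughout. Both are our assumptions, and the names are illustrative.

```python
# Rough check of the "130PB is over 20 million DVDs" comparison.
# Assumptions (ours): decimal units and a 4.7GB single-layer DVD.
BYTES_PER_PB = 10**15
DVD_BYTES = 4.7e9


def dvds_needed(petabytes: float) -> float:
    """Number of single-layer DVDs needed to hold the given number of petabytes."""
    return petabytes * BYTES_PER_PB / DVD_BYTES


added_capacity_dvds = dvds_needed(130)  # additional capacity quoted above
print(f"{added_capacity_dvds / 1e6:.1f} million DVDs")
```

With these assumptions the figure is a little under 28 million discs, comfortably "over 20 million".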
By 2024, Alastair expects, "The library will hold around 190PB of data, which will require a new generation of higher density tape storage as well as additional frames."
Watch our time-lapse video of the new tape robot installation.