New ANU-based supercomputer ‘Gadi’ is now fully operational

Gadi is set to solve 'the biggest and more complex' problems

NCI Australia

Gadi, the National Computational Infrastructure’s (NCI) newest supercomputer, is now fully operational less than two months after the first tests took place.

Phase 1 of Gadi began on 18 November 2019 when 18 racks of Cascade Lake CPUs from Intel as well as 96 V100 GPUs from Nvidia were made available to its more than 4000 users for a transition phase.

NCI, which is a collaboration between the Australian National University (ANU), Geoscience Australia, CSIRO and the Bureau of Meteorology, worked with users to help bring their applications and hundreds of projects across to the new system.

Now, phase 2 of Gadi has commenced and users have transitioned off the nodes of Raijin, NCI’s previous supercomputer. This means all research projects running on Raijin have been transferred to Gadi.

"All of the technology has improved; the size of the nodes [is] much larger, both in computational throughput and in memory," NCI data storage services manager Daniel Rodwell told Computerworld.

“What that allows you to do is solve the biggest problem on the node and, collectively, larger, more complex problems over the whole machine. All in very general terms, of course.”

Raijin had reached end of life, but there were also other reasons for a new system, including ongoing growth in demand for NCI's services, Rodwell explained.

NCI also wanted to offer a system that was internationally competitive.

“What's driving that is ultimately time to result,” Rodwell said. “You've got a finite amount of time from conceptualising a research idea to being able to generate a published result. If you need vast amounts of computational time and resource, the bigger the machine you have the quicker the time to resolve.”

Gadi is born

In December 2017, the government announced a $70 million package to help NCI replace Raijin. At the time, Raijin was Australia's highest-performance research supercomputer, ranking number 70 on the LINPACK Benchmark Top500 with a performance of 1.67 petaflops, comparable to about 40,000 desktop computers working simultaneously.
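As a rough sanity check on that desktop comparison, the short Python sketch below divides the quoted 1.67 petaflops across 40,000 machines. The function name and constants are illustrative only, and the calculation assumes each desktop contributes an equal share, which the article does not state. Under those assumptions each desktop would need to sustain roughly 42 gigaflops.

```python
# Figures quoted in the article for Raijin at the time of the announcement.
RAIJIN_PETAFLOPS = 1.67
DESKTOP_COUNT = 40_000


def per_desktop_gflops(total_petaflops: float, desktops: int) -> float:
    """Return the throughput each desktop would need, in gigaflops.

    Assumes (hypothetically) that the total is split evenly across desktops.
    """
    total_flops = total_petaflops * 1e15  # 1 petaflop = 10^15 operations per second
    return total_flops / desktops / 1e9   # 1 gigaflop = 10^9 operations per second


if __name__ == "__main__":
    share = per_desktop_gflops(RAIJIN_PETAFLOPS, DESKTOP_COUNT)
    print(f"{share:.2f} GFLOPS per desktop")
```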

The most recent Top500 list placed Gadi's phase 1 at number 47, with Raijin still holding on at number 239. The list also has an Australian "cloud provider" at number 249 but no other local supercomputers.

Gadi has 72,576 cores and a measured performance of 4,407 teraflops (about 4.4 petaflops). It was built by Fujitsu in partnership with Lenovo. It runs Xeon Platinum 8274 24C 3.2GHz CPUs and uses Mellanox InfiniBand HDR as its interconnect network. NetApp provided enterprise-class storage arrays that are clustered together in a DDN Lustre parallel file system delivering terabyte-scale data transfer speeds.

The contract for Gadi, handed to Fujitsu, was made public in July 2019 with the first delivery of racks taking place on 2 August. This was followed by "hundreds of hard disk drives" arriving in August and compute racks in September.

The entire process of putting all parts together, as well as all electrical and cooling systems, took 110 days.

Rodwell said Gadi is expected to provide a "fairly good performance boost" for quite a few applications and it is anticipated it will deliver five to 10 times more computational capacity compared to the original Raijin installation in 2012.

What will happen to Raijin?

Raijin began operating in 2013 and was, at the time, Australia's most powerful supercomputer, capable of "completing 170,000 calculations for every human on the Earth every second".

In 2016, it was upgraded thanks to $7 million in funding from the government’s National Collaborative Research Infrastructure Strategy (NCRIS) Agility Fund and matching funds from NCI’s partners. The funding helped bump Raijin’s performance up to 1.67 petaflops.

"Some parts of Raijin are quite elderly and will be retired," Rodwell said. "Some of the systems and all the components of Raijin that are newer will be retained on site and integrated into Gadi or integrated into our cloud environments."

Rodwell expects Gadi to last as long as Raijin did.

“Obviously, we'd like to build a new machine all the time. These are fairly complex bits of machinery though,” he said. “Our goal is to make sure that we can always deliver up-to-date technology to our research community.”

Copyright © 2020 IDG Communications, Inc.

  