Application architects at Melbourne University have seen massive performance boosts and far better resource utilisation after using OpenStack tools to rearchitect their high-performance computing (HPC) environment to function more like a cloud computing platform.
Called Spartan, the project grew out of escalating demand for computing resources from the university’s many scientific users, who for years have run a variety of computationally-intensive applications on large HPC clusters built from commodity Linux servers.
Where cloud environments are based on large numbers of virtual machines (VMs) running on moderately powered commodity servers, HPC environments spread computational tasks across large numbers of computing cores and speed the flow of data between nodes over specialised interconnects.
“Cloud systems primarily exist for their ease of management, their flexibility, and for being built on virtualised hardware,” Lev Lafayette, HPC support and training officer with the University of Melbourne, told this week’s OpenStack Australia Day in Melbourne.
“However, clouds are not high performance – and they can have very poor performance compared to our bare-metal HPC partitions. But their flexibility is worth the small overheads.”
HPC links large numbers of conventional CPUs over low-latency interconnects such as InfiniBand and 40Gbps Ethernet, with nodes based on Nvidia Tesla K80 graphics processing units (GPUs) offering massive additional computing capacity.
This architecture had delivered average latency of around 19 microseconds (µsec) on the typical HPC node – with 276 computing cores and 21GB of RAM per core – compared with 60µsec in a cloud partition running 400 VMs on more than 3000 cores.
Those differences are crucial for high-end HPC workloads involving hundreds of gigabytes of data.
“HPC systems aren’t for the cloud,” compute integration specialist Dr David Perry said.
“They’re managed services, and each is a little different and optimised for its own environment.”
Despite the platform’s power, a recent analysis of submitted workloads showed that around 75 percent of users’ tasks were able to run on a single HPC node – meaning they weren’t taking advantage of the expensive HPC interconnects and overall architecture.
Rather than keep blindly investing in those interconnects, Lafayette said, the team began exploring ways to bridge the two usage models – leveraging OpenStack tools to overlay the HPC environment with a more flexible hybrid, cloud-like architecture.
“Your best option is to build a system that is proportional to the usage you have,” Lafayette said.
“And then you can incrementally upgrade at a later date.”
The HPC team drew on a range of commonly used tools to build an abstraction layer over the HPC environment, including the Slurm workload manager, which distributes tasks across available resources; Git version control; Gerrit code review, which supports paired systems administration; and Puppet configuration management.
Rounding out the stack was heavy use of Nova, the OpenStack compute service, which provisions and decommissions virtual machines on demand.
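To give a rough sense of the kind of on-demand provisioning Nova enables, the Python sketch below uses the openstacksdk library to boot and later delete a compute instance; the cloud, image, flavour and network names are placeholders rather than details of the Spartan deployment.

```python
import openstack

# Connect using a named entry in clouds.yaml ("spartan" is a placeholder name)
conn = openstack.connect(cloud="spartan")

# Look up an image, flavour and network -- all illustrative names
image = conn.compute.find_image("ubuntu-16.04")
flavor = conn.compute.find_flavor("m1.large")
network = conn.network.find_network("private")

# Provision a VM on demand...
server = conn.compute.create_server(
    name="compute-node-demo",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(f"{server.name} is {server.status}")

# ...and decommission it once it is no longer needed
conn.compute.delete_server(server)
```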
This platform allowed single-node computing jobs to be allocated within the virtualised cloud partition, while conventional HPC tooling shuttled more complex, multi-node jobs onto the bare-metal HPC partition.
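That split might be expressed in ordinary Slurm job submissions along the lines of the sketch below, assuming a VM-backed partition called “cloud” and a bare-metal partition called “physical”; the partition, module and program names are illustrative rather than a record of Spartan’s actual configuration.

```python
import subprocess

# A single-node job steered to the VM-backed partition
SINGLE_NODE_JOB = """#!/bin/bash
#SBATCH --job-name=one-node-demo
#SBATCH --partition=cloud
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=02:00:00
module load Python/3.5.2
srun python analyse.py
"""

# A multi-node MPI job aimed at the bare-metal partition with the fast interconnect
MULTI_NODE_JOB = """#!/bin/bash
#SBATCH --job-name=mpi-demo
#SBATCH --partition=physical
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=12
#SBATCH --time=12:00:00
module load OpenMPI/1.10.2
srun ./simulation
"""

# sbatch accepts a job script on standard input when no file is named
for script in (SINGLE_NODE_JOB, MULTI_NODE_JOB):
    result = subprocess.run(["sbatch"], input=script, text=True,
                            capture_output=True, check=True)
    print(result.stdout.strip())  # e.g. "Submitted batch job 12345"
```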
As well as offering users access to a managed hybrid HPC-cloud environment, the team has worked with users to find other ways of optimising their code to boost performance.
One beneficial approach has been to push for new applications to be provided as source code and built within the new environment, using tools like EasyBuild and Singularity.
This approach, which has already been used on more than 1000 applications, has frequently delivered performance improvements of 25 percent or more.
Some applications have sped up by a factor of 10 or more “because they are being built from source rather than the commonly available package,” Lafayette said.
“It may take you a fair bit of time to install a package this way, but the first time someone submits a 30-day job you’ve gotten that time back.”
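Builds of this kind are typically driven from an EasyBuild “easyconfig” file; a minimal sketch is below, with an illustrative easyconfig name rather than one of Spartan’s actual builds.

```python
import subprocess

# Illustrative easyconfig name; EasyBuild resolves and builds any missing
# dependencies from source when --robot is given
easyconfig = "HPL-2.2-foss-2016b.eb"

# Dry run first to see what would be built...
subprocess.run(["eb", easyconfig, "--robot", "--dry-run"], check=True)

# ...then the real build, which installs the result as a loadable module
subprocess.run(["eb", easyconfig, "--robot"], check=True)
```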
Heavy use of virtualisation has also paid dividends by allowing the team to maintain multiple versions of key applications – which can be essential in an academic environment, where haphazard version changes can compromise the reproducibility of research results.
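One simple way of holding those versions steady, sketched below with illustrative module names, is to pin exact toolchain and application versions in the job script so that a re-run months later uses the same software stack.

```python
import subprocess

# Pinning exact module versions (names are illustrative) keeps the software
# stack identical each time the analysis is re-run
REPRODUCIBLE_JOB = """#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=04:00:00
module purge
module load GCC/6.2.0 R/3.4.0
srun Rscript model.R
"""

subprocess.run(["sbatch"], input=REPRODUCIBLE_JOB, text=True, check=True)
```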
The idea may seem heretical to HPC purists, but by revisiting the overall architecture the Melbourne University team has been able to deliver a fully containerised architecture, running within cloud virtual machines on top of the HPC environment.
“As far as the user is concerned,” Perry said, “they don’t necessarily know they’re using a supercomputer.”
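A containerised job of that sort might look like the short sketch below, where the Singularity image and the command it runs are placeholders; the container carries its own libraries, so the same image can be run wherever the scheduler places the job.

```python
import subprocess

# Run an application from a pre-built container image (image name and
# command are placeholders); the container's bundled libraries insulate
# the job from the host's software environment
subprocess.run(
    ["singularity", "exec", "bioinformatics-tools.simg",
     "bwa", "mem", "reference.fa", "reads.fq"],
    check=True,
)
```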
Future plans include the ability to burst onto the Microsoft Azure cloud platform; expansion in the use of GPUs for raw computing power; use of Thespian for testing; and the addition of new architectures using additional VMs.
The architecture “allows our users to dynamically change their system environment as they need,” Lafayette said.
“They can have the consistency when they’re doing a research project, then switch between modules and recompile with particular extensions. They get the best of both worlds there.”