
Parallel Works Thought Leadership

High Performance Computing (HPC) 101

October 24, 2025


Multi-Cloud Computing Drives Digital Transformation

Author: Matt Long

As legacy systems strain under the weight of next-generation demands, a new computing paradigm is emerging: hybrid computing powered by multi-cloud platforms.

Across the commercial, academic, and defense sectors, organizations are facing increased pressure to modernize their digital infrastructure. Whether it’s mission planning, large-scale simulations, or real-time AI-driven analytics, the demand for flexible, secure, and scalable compute environments is accelerating.

But this shift isn’t just about adopting newer technologies. It’s about enabling smarter decision-making and faster innovation, and about maintaining strategic and technological advantage in an increasingly complex digital landscape.

What is a computer?

A computer is a machine that takes an input and performs an operation to produce an output.

A simpler machine related to the computer is the basic calculator. When using a calculator, your input consists of two things:

  • Numbers: Defined as an arithmetical value, expressed by a word, symbol, or figure, representing a particular quantity and used…you get the idea.
  • Operators: Instructions that tell the calculator what to do with the numbers you provide. A standard calculator specifically accepts arithmetic operators, which include addition, subtraction, multiplication, and division.

The output that follows is the result of the calculation.

Example: 1 + 2 = 3

The key difference between a computer and a calculator is a computer’s ability to handle logical operators in addition to arithmetic ones. Where arithmetic operators tell a calculator what to do with numbers, logical operators tell a computer what to do based on whether specified conditions are met. One of the simplest examples of this process is an if statement: if my coffee mug is empty, then I get more coffee. Computers use if statements the same way, often to decide what to do based on the value of one or more variables.

Consider this simple bash program:

#!/bin/bash

mug="empty"

# Check whether the mug is empty and, if so, print a reminder.
if [ "$mug" = "empty" ]; then
  echo "get more coffee."
fi

When this program is run, the computer will check the value of the $mug variable. Since the condition is met, it will print a message to “get more coffee.”

$ ./coffee.sh

get more coffee.

Of course, the usefulness of this program is limited by the fact that $mug has been hardcoded to a certain value. It would be handier if the program were actually able to determine whether my mug was full. Perhaps it could do that if the computer knew the base weight of the empty mug and periodically received data from a scale weighing it.
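
As a rough sketch of that idea, the script below compares a weight reading (passed in as an argument, standing in for data from the scale) against an assumed empty-mug weight; both numbers are made up for illustration.

#!/bin/bash
# coffee_check.sh - hypothetical extension of coffee.sh
# Usage: ./coffee_check.sh <current_weight_in_grams>

empty_weight=350          # assumed weight of the empty mug, in grams
current_weight=${1:-0}    # weight reading a scale might report, passed as an argument

# If the mug weighs no more than it does when empty, it needs a refill.
if [ "$current_weight" -le "$empty_weight" ]; then
  echo "get more coffee."
else
  echo "the mug is still full."
fi

Running it as ./coffee_check.sh 350 would print the reminder, while a heavier reading would report that the mug is still full.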

What is a supercomputer?

Math quickly gets more complicated than the 1 + 2 = 3 example provided earlier. Even high school-level math begins to require a graphing calculator capable of dealing with much more interesting arithmetic equations. In addition to arithmetic, graphing calculators are also capable of performing logical operations, effectively making them small computers.

Eventually, even your trusty TI-83+ becomes insufficient for unlocking the secrets of the universe. Numbers eventually get too big, and the logic too complicated for the CPU to make decisions fast enough. When you need the biggest computer you can find, you get a supercomputer.

We’re going to need a bigger computer. Image credit: Wikipedia

So what makes a computer super? The line is somewhat blurry. We’re all aware of just how fast computing capabilities advance. What’s bleeding edge today becomes a paperweight within months or years. Supercomputers, relatively speaking, are the same. With that perspective, a supercomputer can be described as a computer that is faster, better, and stronger than an average computer for its time.

Nintendo NES: A computer. Image credit: Wikipedia
A supercomputer…if the NES were the only other computer to compare it to. Image credit: Wikipedia
Blue Waters, an actual, albeit recently retired, supercomputer operated by NCSA at the University of Illinois. Image credit: Wikipedia

How are supercomputers built?

Now that we’ve established that a supercomputer is just a more capable computer, we can start to understand how they’re built. If you were tasked with building a supercomputer yourself, your first instinct might be to make a single giant processor that is faster than anything else available, and stuff it to the gills with memory. This was exactly how the earliest supercomputers were built.

The Cray-1, an early supercomputer released in 1975, was equipped with an 80 MHz CPU, 8.39 MB of memory, and 303 MB of storage. Image credit: Wikipedia

HPC demands quickly surpassed what a single processor could complete in a timely and cost-efficient manner. Supercomputers like the Cray-2 eventually began to be built with multiple processors that allowed tasks to be processed in parallel. However, these parallel workloads were limited by the fact that each task had to be relatively independent. Many HPC workloads today can still be described as “embarrassingly parallel,” meaning the work splits into independent tasks that each processor completes on its own, with little or no communication between them. Imagine you have a set of 1,000 math problems that each take your computer 1 minute to solve. You could wait 1,000 minutes for your computer to complete them one at a time, or you could assign a different problem to each of 1,000 equivalent computers and be done in a minute. This is one of the key advantages of using a supercomputer!
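
As a toy illustration of that pattern in bash, the sketch below fans a handful of independent “problems” out to background processes and waits for them all to finish; the solve_problem function just sleeps to stand in for real work.

#!/bin/bash
# parallel_demo.sh - toy example of an embarrassingly parallel workload

# Stand-in for one math problem that takes a while to solve.
solve_problem() {
  sleep 1                      # pretend this is a minute of real work
  echo "problem $1 solved"
}

# Launch each problem as its own background process...
for i in $(seq 1 8); do
  solve_problem "$i" &
done

# ...then wait for all of them to finish.
wait
echo "all problems solved"

Because the problems never need to talk to each other, the total runtime is roughly the time of a single problem, no matter how many processors are thrown at it (up to the number of problems).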

By the late 1990s, two key developments began to take shape that would bring HPC into the modern era: the Message Passing Interface (MPI) and the Beowulf cluster model.

Developed by scientists across many countries and institutions, MPI is a standardized interface, implemented by software libraries, that allows processes to pass messages to one another, even over a network. This essentially enables processors to work together on a shared problem. Message passing existed in limited forms before MPI, but each application’s developers had to implement it in their own specialized way. MPI provided a common standard that quickly became ubiquitous in the HPC domain.

While early supercomputers could be described as specially engineered giant machines, MPI and advances in computer networking speed have made it much more cost-efficient to build supercomputers by clustering a large number of smaller computers into a single cohesive system. This is called a Beowulf cluster, and it is by far the most popular way to build a supercomputer today. A typical supercomputer can be made up of hundreds to thousands of distinct nodes, each fulfilling a specific role in the cluster. In contrast to the embarrassingly parallel workloads described earlier, MPI and the Beowulf cluster model helped pave the way for “tightly coupled” HPC applications whose tasks communicate constantly over the cluster’s high-speed network.
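
In practice, a tightly coupled application built against MPI is started across the cluster’s nodes with a launcher such as Open MPI’s mpirun; the binary name and host list below are hypothetical.

# Launch a (hypothetical) MPI application as 64 cooperating processes,
# spread across the nodes listed in a hostfile.
$ mpirun -np 64 --hostfile nodes.txt ./simulate_weather

Each of the 64 processes runs the same program, and they coordinate by exchanging MPI messages over the network as the computation proceeds.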

A home built Beowulf cluster made using standard consumer desktops. Image credit: Wikipedia
A Beowulf cluster built at NCSA using PlayStation 2 consoles. Image credit: NCSA, University of Illinois

How do people use supercomputers?

Supercomputers are used across a wide range of fields, from the natural sciences and engineering to finance. Use cases include hurricane forecasting, automobile design, and vaccine research. See an example of computational fluid dynamics (CFD) simulation modeling below.

An example of computational fluid dynamics (CFD) simulation modeling

Unlike on general-purpose computers, the de facto standard way to interact with a supercomputer is a command-line interface (CLI) such as a bash shell. In recent years, there has been increasing demand for graphical user interfaces (GUIs) on modern HPC systems.
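
A typical session, for example, starts by connecting to the cluster’s login node over SSH and checking on the job queue; the hostname and username here are placeholders, and the squeue command assumes a Slurm-based system.

# Connect to the cluster's login node...
$ ssh jdoe@login.hpc.example.edu

# ...then check on your jobs in the scheduler's queue (Slurm).
$ squeue -u jdoe

From there, files are managed and jobs are submitted entirely from the command line.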

Supercomputer users conduct their work by running jobs on the system. Typically, a user writes a batch file and submits it to a queueing system (a job scheduler). The batch file includes parameters such as how many resources the job requires and where to store log files, along with the instructions to start the application and produce results.
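
As a rough sketch, a batch file for the widely used Slurm scheduler might look like the following; the job name, resource counts, and application are all made up for illustration.

#!/bin/bash
#SBATCH --job-name=cfd-demo          # name shown in the queue
#SBATCH --nodes=4                    # number of nodes to allocate
#SBATCH --ntasks-per-node=32         # MPI processes per node
#SBATCH --time=02:00:00              # wall-clock time limit
#SBATCH --output=cfd-demo.%j.log     # where to store the log file

# Launch the (hypothetical) application across the allocated nodes.
srun ./cfd_solver input.cfg

Handing this file to the scheduler with sbatch queues the job, and it runs once the requested resources become available.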

Example of weather modeling

What is cloud computing?

The concept of cloud computing has an extensive history, but it did not reach widespread use until the early to mid-2000s with the launch of Amazon Web Services (AWS). The premise is simple: instead of buying and maintaining a server yourself, you rent the compute you need by running a virtual machine on top of someone else’s existing server infrastructure. This was a boon for small businesses and startups with minimal computing needs that did not want to bear the cost of building out their own infrastructure and hiring IT professionals to keep it running.
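
To make that concrete, renting a basic virtual machine is a single CLI call on most providers; on Google Cloud, for example, it looks roughly like this (the instance name, machine type, and zone are arbitrary choices).

# Create a small general-purpose VM on Google Cloud.
$ gcloud compute instances create demo-vm \
    --machine-type=e2-medium \
    --zone=us-central1-a

When the machine is no longer needed, it can be deleted just as quickly, which is the core appeal for teams that don’t want to own hardware.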

Despite its convenience for small businesses, the emerging cloud computing industry was a tough sell for HPC applications, thanks in part to its high cost at such a large scale. Aside from raw cost, the network, storage, and compute capabilities offered by cloud computing companies were not up to par when compared with specially engineered on-premise HPC systems. Running a lightweight website or database simply doesn’t compare to the requirements of a typical HPC workload.

As the cloud computing industry continues to expand and mature, moving operations to the cloud at larger scales has become more cost-effective. Major cloud service providers (CSPs), such as AWS, Google Cloud Platform (GCP), and Microsoft Azure, have begun to invest in technology that targets HPC use cases more directly, including larger compute instances, high-speed premium networks, and fully managed high-performance parallel storage services. Generational advances in computer hardware and more efficient virtualization software have also helped minimize the performance overhead typically associated with running HPC-scale workloads in the cloud.

Continuing problems of running HPC workloads in the cloud

HPC in the cloud is more accessible than ever, but there are still some complications that can make it less than ideal.

Data center capacity

To most people, “the cloud” is an abstract concept representing an endless supply of computing resources available at the push of a button. Given the scale of the average cloud user’s footprint, that sentiment generally holds true.

The unique demands of HPC, however, expose the limits of that abstraction. To provide their services, the major cloud providers operate multiple large data centers around the world, densely packed with servers, storage equipment, and network infrastructure. While the average cloud user shouldn’t normally have trouble deploying a small set of general-purpose VMs, HPC clusters often require an instance count in the hundreds. This can be a challenge, especially since the larger servers suitable for HPC applications tend to be available in lower volume due to their more niche appeal. Getting the capacity you need can be difficult on its own, and the issue compounds when your application also requires optimal placement of the cluster’s resources: to maximize performance, compute nodes need to be close to one another and to their storage in order to minimize communication latency over the network.
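
As one concrete example of the placement problem, AWS lets you ask for instances to be packed physically close together by launching them into a cluster placement group; the group name below is arbitrary.

# Ask EC2 to place instances physically close together.
$ aws ec2 create-placement-group \
    --group-name hpc-demo \
    --strategy cluster

Pairing a large instance count with a placement constraint like this is exactly the combination that can be hard for a region to satisfy on demand, and a big launch into such a group can still fail if contiguous capacity isn’t available.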

Vendor lock-in & unexpected costs

It’s in the cloud provider’s best interest to keep you in their ecosystem, and once you have brought all your data into their infrastructure, it can be incredibly difficult to move somewhere else. This is most commonly seen in the form of egress fees: the customer is billed for everything transferred out of the provider’s cloud, and sometimes even for transfers to one of the same provider’s data centers in a different region. Such fees can make it hard to change providers, or even to move data back to a locally available storage system.

Tracking usage costs on the cloud can also be very difficult, as billing data is only updated by the CSP a few times a day. Considering that an HPC cluster in the cloud can cost thousands of dollars after only a few hours, it’s crucial to keep an eye on how much you are spending in order to stay within your budget.

Steep learning curve

AWS, Azure, and GCP all have their own unique interfaces, product names, and features. Familiarity with a particular cloud provider is often reason enough to not shop around for other options, and the services are complex enough that there are entire classes and certifications specific to a single cloud provider and its products.

How does Parallel Works help solve these problems?

Parallel Works aims to make HPC users’ access to supercomputers simpler, more efficient, and easier to track. This is primarily done through a control center capable of provisioning, connecting to, and running applications on HPC resources. The control center is accessible from a graphical web interface or via an application programming interface (API) that lets users drive the platform programmatically.

  • Common interface integrated with multiple cloud providers
  • Push-button cloud clusters and integration with on-prem clusters
  • Real-time cost tracking and budget cutoffs
  • Integration with reservations

Meet us at SC25!

We’re a month away from SC25, the International Conference for High Performance Computing, Networking, Storage, and Analysis!

Parallel Works will be exhibiting at SC25, Booth #3947! Stop by to meet our team and learn how we’re enabling scientists, engineers, and organizations to accelerate discovery through scalable, intelligent compute orchestration.

Fill out this form to schedule a meeting with us at our booth. At the booth, you’ll get to:

  • See live demos of how Parallel Works automates high‑performance workloads
  • Chat with our team about best practices in scaling simulations, AI/ML, and data pipelines
  • Explore how our platform can reduce time to insight and cost for your HPC/AI workflows
  • Grab swag and connect over cutting‑edge tech in compute, cloud, and hybrid environments

Whether you’re exploring physics simulations, genomics, climate modeling, digital twins, or other compute-intensive challenges, let’s talk about how Parallel Works can help you do more, faster and smarter.

Mark your calendar, swing by Booth #3947, and let’s build the future of compute together. See you in St. Louis!

In a world where speed, flexibility, and resilience are strategic imperatives, hybrid computing is redefining what’s possible from the battlefield to the laboratory and everywhere in between. ACTIVATE delivers secure, intelligent orchestration with guardrails, not roadblocks.
