Next-Generation Cloud Clusters
We're excited to announce major improvements to our cloud clusters with the release of our next-generation provider.
For the past two years, we've provisioned clusters using Infrastructure as Code (IaC) templates, which has allowed us to share the development of cloud infrastructure provisioning among software engineers, system administrators, and our support staff, who have more of an HPC focus. As we've scaled out our offering and given users the ability to provision more varied types of resources across more clouds, we've learned a lot of lessons.
We want to provide a transparent "window" into the provisioning process, as there are a lot of places where things can go wrong. Surfacing those failures and making them easier to troubleshoot was a challenge with IaC, because the tooling was ultimately meant to be used interactively. The best you can get is a set of logs, which are not very helpful for most users.
Another major concern we wanted to address was resiliency, which meant we needed more control over the entire provisioning process. In our newest provider, all of the code for provisioning clusters has been centralized in our core platform and uses the software development kits (SDKs) provided by each cloud service provider (CSP). This will allow us to build more interesting and useful features around provisioning, and to adapt rapidly to changes in the CSPs' offerings.
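As a rough illustration only (this is not the platform's actual code), the kind of control described above might look like the Python sketch below. Each provisioning step is wrapped so that it can be retried and its outcome recorded. The names `Step`, `Status`, and `provision` are hypothetical, and each `action` is a stand-in for whatever CSP SDK call a real step would make.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Step:
    """One provisioned component (hypothetical structure)."""
    name: str
    action: Callable[[], None]  # stand-in for a CSP SDK call
    status: Status = Status.PENDING
    attempts: int = 0
    error: Optional[str] = None


def provision(steps: List[Step], max_attempts: int = 3, delay: float = 0.0) -> bool:
    """Run steps in order, retrying each up to max_attempts times.

    Returns True if every step succeeded. On the first step that
    exhausts its retries, marks it FAILED and stops, leaving later
    steps PENDING so the point of failure is easy to see.
    """
    for step in steps:
        step.status = Status.RUNNING
        while step.attempts < max_attempts:
            step.attempts += 1
            try:
                step.action()
            except Exception as exc:
                # Keep the most recent error message for troubleshooting.
                step.error = str(exc)
                if step.attempts < max_attempts:
                    time.sleep(delay)
            else:
                step.status = Status.SUCCEEDED
                step.error = None
                break
        else:
            # Loop ended without a successful attempt.
            step.status = Status.FAILED
            return False
    return True
```

Because every step carries its own status, attempt count, and last error, a failure can be traced to a specific component instead of being dug out of a log file.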
Coming soon, we'll have a new module on the sessions page that shows each component being provisioned and its current status. If something fails, it will be easy to see where things went wrong. Since we were already undertaking a major overhaul of the provisioning process, we also took the chance to tackle our "wish list" of cluster enhancements.