A Quick-Start Guide to Seamless JupyterLab and Dask Integration on the PW Platform
Follow this guide to test a PW workflow that seamlessly integrates JupyterLab and Dask, allowing you to run a sample Jupyter Notebook with ease.
Purpose
This blog post introduces a PW workflow that generates an interactive session where users can run a sample Jupyter Notebook with Dask dependencies installed. After following the steps in this post, users will be more familiar with interactive sessions and Jupyter Notebooks, empowering them to run more intricate Notebooks (such as this Coiled sample Notebook).
Overview
JupyterLab is an open-source interactive development environment from Project Jupyter. You can use JupyterLab to run Jupyter Notebooks, where you can test code snippets and see their results in real time without having to execute an entire program.
Dask is an open-source Python library that has many uses for parallel computing. Adopters of Dask often use it to scale workloads with other libraries (like NumPy, Pandas, and Jupyter) across a cluster.
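As a quick illustration of that scaling model, the sketch below (illustrative, not part of the PW workflow) uses Dask's NumPy-like array API. Operations are split into chunks that Dask can process in parallel, locally or across a cluster:

```python
import dask.array as da

# Build a 10,000 x 10,000 array of uniform random numbers, split into
# 1,000 x 1,000 chunks that Dask can process in parallel.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

# Operations build a lazy task graph; .compute() executes it across
# all available workers (local threads, or a cluster if one is connected).
mean = x.mean().compute()
print(mean)  # close to 0.5 for uniform random data
```

The same `da` operations run unchanged whether the backing workers are local threads or SLURM jobs on a cluster, which is what makes Dask a natural fit for the workflow described below.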
The Parallel Works team has developed a streamlined workflow that simplifies starting a JupyterLab interactive session on cloud compute resources. The workflow makes it easy to install the Dask extension for JupyterLab and includes a sample Jupyter Notebook that guides users through Dask and the Dask extension within the PW environment.
In the provided Jupyter Notebook example, the SLURMCluster object is used to deploy Dask on the SLURM cluster. The workflow also demonstrates transferring data to and from a PW storage resource, which corresponds to an AWS S3 bucket; authentication is handled with short-term credentials.
Accessing the JupyterLab Workflow
Before running the Jupyter Notebook, ensure you have added the JupyterLab workflow from the PW Marketplace. Follow this link for detailed instructions.
Setting Up an AWS SLURM Cluster and S3 Storage
Create and start an AWS SLURM cluster and S3 storage on the platform. For detailed instructions, refer to the provided links:
- Creating, configuring, and starting a cluster (ensure you have at least one small partition)
- Creating and configuring the AWS S3 Bucket
Running the JupyterLab Workflow
Once the cluster is running, execute the JupyterLab workflow with the inputs shown in the screenshot below.
JupyterLab Settings
Toggle the Install Miniconda... parameter to Yes.
JupyterLab Server Host
For Service Host, select your running AWS resource.
Next, select the controller node of the AWS resource. The sample Jupyter Notebook utilizes the SLURMCluster object to submit jobs to the SLURM queue, requiring it to run on the controller node for proper functionality.
Advanced Options
Toggle the Install Dask Packages... parameter to Yes.
This parameter, in conjunction with the Install Miniconda... parameter above, streamlines the installation process by automatically setting up Miniconda and all the necessary Python packages. A YAML definition file for this environment is also shared with the workflow.
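The shared YAML file follows the standard conda environment format. A hypothetical example of what such a file might contain (the workflow's actual package list may differ):

```yaml
# Illustrative conda environment definition, not the workflow's actual file.
name: jupyterlab-dask
channels:
  - conda-forge
dependencies:
  - python=3.10
  - jupyterlab
  - dask
  - dask-jobqueue
  - dask-labextension
```

A file like this can be used to recreate the environment elsewhere with `conda env create -f environment.yml`.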
Configure the JupyterLab Workflow
Connecting to JupyterLab
Execute the workflow and click on the eye icon shown in the image to connect to JupyterLab. Note that the installation process may take some time, and you can monitor the progress in the job logs. When JupyterLab is ready, a notification is sent to the platform.
Connect to JupyterLab
Transferring and Accessing the Demo Notebook
Transfer the demo Jupyter Notebook from the job directory in the user workspace to the home directory of the cluster. The home directory of the cluster is mounted in the user workspace, allowing you to drag and drop the file easily. In the JupyterLab session, double-click on the Jupyter Notebook demo in the cluster's home directory and follow the provided instructions. Ignore any Dask Server Error messages if they appear.
By following these steps, you can seamlessly test the Jupyter Notebook and explore the capabilities of using JupyterLab and Dask on the Parallel Works platform.
Dask Extension for JupyterLab