Running Container Jobs
What is a Container
A container is a lightweight virtualization technology used to encapsulate and isolate the runtime environment of an application or service. Unlike traditional virtual machines, containers do not run a separate guest operating system; they package the environment and dependencies an application or service needs directly on top of the host operating system. As a result, containers are very lightweight: a new container instance can start in seconds, whereas a traditional virtual machine typically takes minutes. Containers also use resources more efficiently because they avoid the overhead of running a guest operating system.
Running computing jobs in containers has two benefits. First, it solves environment dependency problems: a container image that runs in a development environment can be safely submitted to any resource for production. Second, isolation and standardization go a long way toward making computations reproducible.
When to Use Containers
The most popular container technology in the industry today is Docker. Docker not only has a rich public image repository (Docker Hub) but also provides a complete set of interfaces. We can therefore build our own image repository to ensure the security and isolation of user images, and Docker images can easily be converted for use with other container tools such as Singularity and Podman. Containers are advantageous in many situations:
- If you need to update your code frequently, Docker images save you the trouble of repeatedly creating Bohrium virtual machine images, and because images are pulled incrementally, updates are very fast;
- If you need to manage multiple versions of your code, Docker images can help you achieve this easily, and you no longer have to worry about not having enough images;
- If you need to replicate the environment on Bohrium for testing elsewhere, we strongly recommend Docker images, which break the strong environmental dependency of traditional cloud providers and supercomputers, allowing users to build once and run anywhere;
- If you need to share an image with a friend without letting them damage the original environment, give Docker a try;
- ...
As you can see, container technology helps solve the problems of environment migration, sharing, and deployment. It is very lightweight and provides good resource isolation at the system level, which covers a wide range of our usage scenarios.
How to Run Container Jobs on Bohrium
Bohrium currently supports Docker containers and provides Docker images. You can visit Bohrium's Image Center-Container Image to view the public container images we provide.
Here we take DeePMD-kit as an example to introduce how to submit container jobs on Bohrium:
Step 1: Prepare Input Data
The input files of DeePMD-kit are all stored in the Bohrium_DeePMD-kit_example folder. After entering the data disk with the cd /personal command, execute the following commands in sequence to download and unzip the input files:
wget https://bohrium-example.oss-cn-zhangjiakou.aliyuncs.com/Bohrium_DeePMD-kit_example.zip
unzip Bohrium_DeePMD-kit_example.zip
cd Bohrium_DeePMD-kit_example
Step 2: Prepare job.json
To use a container image and submit a container job, you only need to modify two places in the original job configuration file job.json:
- job_type field: Must be set to "container";
- image_address field: Fill in the address of the Bohrium public container image you want to use, that is, the "image address" marked with the number 3 in the figure below. You can also look up the image address in the "Container Image and Virtual Machine Image Mapping Table" at the end of this article.
Note:
- This field also accepts public image addresses from Docker Hub, without the domain name, for example: "tensorflow/tensorflow-gpu:latest".
- Parallel container jobs currently support only the public container images. If you need other images, you can send an email to Bohrium.
The modified job.json is as follows:
Note: Replace the 0000 after project_id with your own project ID.
{
    "job_name": "DeePMD-kit test",
    "command": " cd se_e2_a && dp train input.json > tmp_log 2>&1 && dp freeze -o graph.pb",
    "log_file": "se_e2_a/tmp_log",
    "backward_files": ["se_e2_a/lcurve.out","se_e2_a/graph.pb"],
    "project_id": 0000,
    "platform": "ali",
    "machine_type": "c4_m15_1 * NVIDIA T4",
    "job_type": "container",
    "image_address": "registry.dp.tech/dptech/deepmd-kit:2.1.5-cuda11.6"
}
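The 0000 placeholder above has to be replaced before the file is used; a number with leading zeros is not valid JSON, so a strict parser will reject the template as written. As a minimal sketch, the same file could be generated from Python instead of edited by hand. The helper names build_container_job and write_job are hypothetical and not part of any Bohrium tool; the field names and values are copied from the example above.

```python
import json


def build_container_job(project_id, image_address):
    """Return a job description with the same fields as the job.json example.

    build_container_job is a hypothetical helper for illustration only.
    """
    if not isinstance(project_id, int) or project_id <= 0:
        raise ValueError("project_id must be your own positive project ID")
    return {
        "job_name": "DeePMD-kit test",
        "command": "cd se_e2_a && dp train input.json > tmp_log 2>&1 && dp freeze -o graph.pb",
        "log_file": "se_e2_a/tmp_log",
        "backward_files": ["se_e2_a/lcurve.out", "se_e2_a/graph.pb"],
        "project_id": project_id,
        "platform": "ali",
        "machine_type": "c4_m15_1 * NVIDIA T4",
        # The two fields that turn this into a container job:
        "job_type": "container",
        "image_address": image_address,
    }


def write_job(job, path="job.json"):
    """Serialize the job description to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(job, fh, indent=4)
```

For example, write_job(build_container_job(12345, "registry.dp.tech/dptech/deepmd-kit:2.1.5-cuda11.6")) would write a job.json equivalent to the one above, where 12345 stands in for a real project ID.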
Step 3: Submit the Container Job Using Bohrium CLI
After preparing job.json, you can use Bohrium CLI to submit the DeePMD-kit job:
bohr job submit -i job.json -p ./
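Before submitting, it can help to confirm that the two container-specific fields are set. The following is a rough sketch of such a pre-submit check, assuming the bohr command shown above is installed and on the PATH. The function names are hypothetical; the only CLI invocation used is the one from this article.

```python
import json
import subprocess
from pathlib import Path


def check_container_job(job):
    """Return a list of problems with the container-related fields (empty if none)."""
    problems = []
    if job.get("job_type") != "container":
        problems.append('job_type must be "container"')
    if not job.get("image_address"):
        problems.append("image_address is missing or empty")
    return problems


def submit_command(job_file="job.json", work_dir="./"):
    """Build the argument list for the submit command shown in this article."""
    return ["bohr", "job", "submit", "-i", job_file, "-p", work_dir]


def submit(job_file="job.json", work_dir="./"):
    """Validate job.json, then run the submit command (hypothetical wrapper)."""
    # json.loads also fails if the 0000 placeholder was left in place,
    # because a number with leading zeros is not valid JSON.
    job = json.loads(Path(job_file).read_text(encoding="utf-8"))
    problems = check_container_job(job)
    if problems:
        raise ValueError("; ".join(problems))
    return subprocess.run(submit_command(job_file, work_dir), check=True)
```

Calling submit() from the Bohrium_DeePMD-kit_example folder would then run the same command as above, but only after the checks pass.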
Container Image and Virtual Machine Image Mapping Table
If you are currently using the virtual machine images provided by Bohrium to submit jobs, you can look up the corresponding container image addresses in the table below and use them in job.json in place of the image_name field:
Pre-installed Software | Virtual Machine Image | Container Image Address |
---|---|---|
DeePMD-kit | LBG_DeePMD-kit_2.1.4_v1 and previous DeePMD-kit versions | registry.dp.tech/dptech/deepmd-kit:2.1.5-cuda11.6 |
DP-GEN | LBG_DP-GEN_0.10.6_v3 and previous DP-GEN versions | registry.dp.tech/dptech/dpgen:0.10.6 |
LAMMPS | LBG_LAMMPS_stable_23Jun2022_v1 | registry.dp.tech/dptech/lammps:29Sep2021 |
GROMACS | gromacs-dp:2020.2 | registry.dp.tech/dptech/gromacs:2022.2 |
Quantum-Espresso | LBG_Quantum-Espresso_7.1 | registry.dp.tech/dptech/quantum-espresso:7.1 |
VASP | -- | Requires VASP authorization; please send your VASP authorization certificate to the Bohrium email |
Common basic software | LBG_Common_v1, LBG_Common_v2, LBG_base_image_ubun20.04, LBG_base_image_ubun22.04 | registry.dp.tech/dptech/ubuntu:20.04-py3.10-cuda11.6 |
Intel oneAPI | LBG_oneapi_2021_v1 | registry.dp.tech/dptech/ubuntu:20.04-py3.10-intel2022-cuda11.6 |
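
The lookup described by this table can also be sketched in code. The example below assumes that an existing virtual machine job stores its image in the image_name field, and that switching to a container job means dropping that field and setting job_type and image_address as described in Step 2. The dictionary holds only the exact image names listed in the table, so the "previous versions" mentioned for DeePMD-kit and DP-GEN are not covered. The function name to_container_job is hypothetical.

```python
# Virtual machine image name -> container image address, copied from the table.
VM_TO_CONTAINER = {
    "LBG_DeePMD-kit_2.1.4_v1": "registry.dp.tech/dptech/deepmd-kit:2.1.5-cuda11.6",
    "LBG_DP-GEN_0.10.6_v3": "registry.dp.tech/dptech/dpgen:0.10.6",
    "LBG_LAMMPS_stable_23Jun2022_v1": "registry.dp.tech/dptech/lammps:29Sep2021",
    "gromacs-dp:2020.2": "registry.dp.tech/dptech/gromacs:2022.2",
    "LBG_Quantum-Espresso_7.1": "registry.dp.tech/dptech/quantum-espresso:7.1",
    "LBG_Common_v1": "registry.dp.tech/dptech/ubuntu:20.04-py3.10-cuda11.6",
    "LBG_Common_v2": "registry.dp.tech/dptech/ubuntu:20.04-py3.10-cuda11.6",
    "LBG_base_image_ubun20.04": "registry.dp.tech/dptech/ubuntu:20.04-py3.10-cuda11.6",
    "LBG_base_image_ubun22.04": "registry.dp.tech/dptech/ubuntu:20.04-py3.10-cuda11.6",
    "LBG_oneapi_2021_v1": "registry.dp.tech/dptech/ubuntu:20.04-py3.10-intel2022-cuda11.6",
}


def to_container_job(vm_job):
    """Return a copy of a virtual machine job config rewritten as a container job.

    Raises KeyError if the image is not one of the names listed in the table.
    """
    image_name = vm_job["image_name"]
    address = VM_TO_CONTAINER[image_name]
    # Copy every field except image_name, then add the container fields.
    job = {key: value for key, value in vm_job.items() if key != "image_name"}
    job["job_type"] = "container"
    job["image_address"] = address
    return job
```

The input dictionary is left unmodified, so the original virtual machine configuration can be kept alongside the container version.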