Container Node Frequently Asked Questions

1. Container Node Bootup Issues

1.1. Bootup shows "Image pulling, expected to take xx minutes"

  • When a custom image has just been built, its image cache is still being created. Using the image immediately triggers an image pull during bootup.

  • If a custom image is unused for 30 days, its image cache is cleared automatically. Using it again triggers an image pull during bootup.

Image Cache Concepts:

  • Image caches are prebuilt to speed up developer machine bootup by avoiding pulling images at bootup.

  • Public images provided by Bohrium already have persistent image caches built. Using public images will boot up without extra image pulling.

  • Custom images are generated when building images for developer machines. When booting up with a custom image, note the following:

    • Image cache building starts after the image is built and usually takes 10-30 minutes, depending on image size. If the cache is not ready (still building, or the build failed), bootup pulls the image instead, which lengthens bootup time. It is recommended to wait 30 minutes after building before booting up with the image.

    • The image cache expires after 30 days without use. The next bootup pulls the image again, and the cache is rebuilt asynchronously after that bootup succeeds. If the cache was missing, wait a while after the first successful bootup before booting up more machines with the same image.
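
The cache rules above can be condensed into a small decision function. This is only a sketch: the function and parameter names are hypothetical and not part of any Bohrium API, the 30-minute and 30-day thresholds are the figures quoted above, and measuring expiry from the build time when the image has never been used is an assumption. It cannot detect a failed cache build.

```python
from datetime import datetime, timedelta
from typing import Optional

# Thresholds quoted in the text above; all names here are hypothetical.
CACHE_BUILD_WAIT = timedelta(minutes=30)  # recommended wait after an image build
CACHE_EXPIRY = timedelta(days=30)         # cache is cleared after this long without use


def likely_pulls_image(is_public: bool, built_at: datetime,
                       last_used_at: Optional[datetime], now: datetime) -> bool:
    """Return True if bootup will probably pull the image instead of using a cache."""
    if is_public:
        return False  # public images have persistent caches
    if now - built_at < CACHE_BUILD_WAIT:
        return True   # the cache may still be building
    # Assumption: a never-used image counts as idle since it was built.
    reference = last_used_at or built_at
    return now - reference >= CACHE_EXPIRY
```

For example, a custom image built five minutes ago returns True, while one used yesterday returns False.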

1.2. Billing during image pulling

Billing starts as soon as remote machine resources are allocated, even while the image is still being pulled.

1.3. Bootup duration

  • With a cached image, bootup usually takes 20-40s:

    • CPU machines average around 20s.

    • GPU machines average around 40s because of the extra preparation of GPU drivers.

    • When resources are scarce, GPU machine bootup can take 1-5 minutes.

  • Mounting datasets adds a single extra delay of 2-4s, regardless of how many datasets are mounted.
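
The figures above combine into a rough estimate, sketched below. The function name is hypothetical, the numbers are the averages quoted above for a cached image, and the 1-5 minute GPU delay under resource scarcity is deliberately left out.

```python
from typing import Tuple


def estimate_bootup_seconds(gpu: bool, mounts_datasets: bool) -> Tuple[int, int]:
    """Return a rough (low, high) bootup estimate in seconds for a cached image."""
    base = 40 if gpu else 20  # average figures quoted in the text
    low = high = base
    if mounts_datasets:
        # One-off delay of 2-4s, independent of the number of datasets.
        low += 2
        high += 4
    return low, high
```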

2. Container Node Terminal Usage Issues

2.1. Unable to SSH into developer machine, can only access via web shell

  • Confirm that the image includes the components needed for SSH login. Images created via "Add Public Image" are usually public images from DockerHub, which typically do not have SSH login components pre-installed.

  • SSH component installation guide: see the "Building Custom Images by Pulling from Public Network" section.

2.2. ssh root@domain login failed

  • Log into the machine via the web shell and make sure sshd is installed correctly.

  • If a developer machine has been powered off for more than 7 days, its domain name binding expires. The domain is rebound after the machine is rebooted, but DNS propagation may be delayed. Retry after 10-30 minutes, and use the web shell in the meantime.
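
While waiting for the domain to rebind, a short script can poll until the name resolves again. This is a minimal sketch: the function name, retry count, and delay are illustrative assumptions. A successful lookup only shows that the name resolves from your machine, not that sshd is reachable.

```python
import socket
import time


def wait_for_dns(hostname: str, attempts: int = 5, delay_s: float = 60.0) -> bool:
    """Return True once `hostname` resolves, retrying up to `attempts` times."""
    for attempt in range(attempts):
        try:
            socket.getaddrinfo(hostname, 22)  # 22 is the standard SSH port
            return True
        except socket.gaierror:
            # Name not resolvable yet; wait before the next try.
            if attempt < attempts - 1:
                time.sleep(delay_s)
    return False
```

Call it with the domain of your own developer machine, for example `wait_for_dns(my_domain, attempts=30)`, where `my_domain` is a placeholder.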

2.3. Environment variables differ between SSH and web shell login

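  • The web shell behaves like docker exec and inherits the environment variables defined in the Dockerfile.

    Composition: system environment variables + /root/.bashrc variables

  • An SSH login starts a fresh session that loads /root/.bashrc, so the global variables are not carried over.

    Composition: /root/.bashrc variables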
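
The two compositions can be modelled with plain dictionaries to see which variables disappear over SSH. This is a simplified sketch of the description above, with hypothetical function names. A real SSH session also sets a few variables of its own, which the model ignores.

```python
from typing import Dict, List


def web_shell_env(image_env: Dict[str, str], bashrc_env: Dict[str, str]) -> Dict[str, str]:
    """Web shell: image/system variables, with /root/.bashrc values taking precedence."""
    return {**image_env, **bashrc_env}


def ssh_env(bashrc_env: Dict[str, str]) -> Dict[str, str]:
    """SSH (simplified): only the variables set in /root/.bashrc."""
    return dict(bashrc_env)


def missing_over_ssh(image_env: Dict[str, str], bashrc_env: Dict[str, str]) -> List[str]:
    """Names visible in the web shell but absent from an SSH session."""
    return sorted(set(web_shell_env(image_env, bashrc_env)) - set(ssh_env(bashrc_env)))
```

Any name this returns would need to be exported in /root/.bashrc to be visible in both kinds of session.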

2.4. Slow terminal response on container nodes

  • Check the local network conditions:

    • Check whether you are on a VPN, which can cause access issues.

    • Open baidu.com to test the connection speed.

  • A browser tab that has been open for a long time can become slow. Try reconnecting.

  • Check for CPU-intensive processes by viewing per-process resource usage with the top command.

    • Check whether operations on large files in the current folder are taking up CPU.
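
As a quick complement to top, the load average can be compared with the CPU count. This is a rough sketch with a hypothetical helper name. `os.getloadavg()` is available on Unix only, and inside a container `os.cpu_count()` may report the host's CPUs rather than the container's limit, so treat the ratio as indicative.

```python
import os


def cpu_pressure(load_1min: float, cpu_count: int) -> float:
    """Return the 1-minute load average per CPU; values near or above 1.0 suggest saturation."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_1min / cpu_count


if __name__ == "__main__":
    load_1min, _, _ = os.getloadavg()  # Unix only
    cpus = os.cpu_count() or 1
    print(f"load per CPU: {cpu_pressure(load_1min, cpus):.2f}")
```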

2.5. Frequent container node disconnections

  • Check the local network conditions:

    • Check whether you are on a VPN, which may disconnect and reconnect.

    • Check the network for abnormalities such as router reboots or network switches.

    • Use the mtr command to check for packet loss on the route to the node's domain.

  • A browser tab that has been open for a long time can disconnect because of memory buildup. Try reconnecting.
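
If mtr is not installed locally, repeated TCP connection attempts give a coarse view of connection stability. The sketch below is not a replacement for mtr, since it reports no per-hop information. The function name is hypothetical, and port 22 assumes the node accepts SSH connections.

```python
import socket
import time
from typing import Optional, Tuple


def probe_tcp(host: str, port: int = 22, count: int = 10,
              timeout_s: float = 2.0) -> Tuple[float, Optional[float]]:
    """Attempt `count` TCP connections; return (failure_ratio, mean_latency_ms or None)."""
    if count <= 0:
        raise ValueError("count must be positive")
    latencies = []
    for _ in range(count):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                latencies.append((time.monotonic() - start) * 1000.0)
        except OSError:
            pass  # refused, timed out, or unresolved: counted as a failure
    failures = count - len(latencies)
    mean_ms = sum(latencies) / len(latencies) if latencies else None
    return failures / count, mean_ms
```

A failure ratio well above zero, or a mean latency that varies a lot between runs, points to a local network problem rather than the node itself.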

3. Other Container Node Usage Issues

  • Ports 50001-50005 are open by default for public network access on developer machines.

  • The default GPU driver version on developer machines is 525. Users cannot install or upgrade driver versions themselves.

  • Docker cannot be run inside container developer machines, for security reasons.

  • Running df -h shows only one mountpoint for mounted datasets. The datasets are mounted at multiple mountpoints but belong to a single filesystem, so df -h collapses the duplicates. Use df -a | grep bohr to list every mountpoint.
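
The same listing can be produced from a script by filtering the mount table, which is roughly what df -a | grep bohr does. This is a minimal sketch: the function name is hypothetical, and the sample entries (device name, paths, filesystem type) are made up for illustration and do not reflect the actual Bohrium layout.

```python
from typing import List

# Made-up sample in /proc/mounts format: device, mountpoint, fstype, options, dump, pass.
SAMPLE_MOUNTS = """overlay / overlay rw 0 0
example-fs /bohr/dataset-a/v1 fuse ro 0 0
example-fs /bohr/dataset-b/v2 fuse ro 0 0
"""


def matching_mountpoints(mounts_text: str, keyword: str = "bohr") -> List[str]:
    """Return the mountpoint of every mount-table line that contains `keyword`."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and keyword in line:
            result.append(fields[1])  # second field is the mountpoint
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as f:  # Linux only
        for mountpoint in matching_mountpoints(f.read()):
            print(mountpoint)
```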

4. Virtual Node Usage Issues

  • The VM image counts against the allocated system disk space.

  • Mounting datasets is unsupported.

  • Mounting shared disks is unsupported.