hans

Managing users on large servers with Docker

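The background: our lab bought a lot of high-end servers. We don't know much about system administration and didn't want to set up anything too complex or specialized, so we decided to create one Docker container per user and manage them all with Portainer. Everyone shares one RAID 0 disk space, each person gets one GPU, and CPU time is split equally. The servers run CentOS 7.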

Below are all the commands used:

## set raid0
sudo yum install mdadm
# create raid volume /dev/md0 from 3 devices
sudo mdadm --create --verbose /dev/md0 --level=0 --raid-devices=3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
# format the volume as ext4 (this erases everything on it)
sudo mkfs.ext4 /dev/md0
# create the mount point
sudo mkdir -p /data
sudo mount /dev/md0 /data
# automatically mount
echo '/dev/md0 /data ext4 defaults 0 0' | sudo tee -a /etc/fstab
# Save RAID Configuration
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf
# Verify the RAID Array
cat /proc/mdstat
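# improve raid0 read performance: raise read-ahead to 65536 sectors (32 MiB); not kept across reboots
sudo blockdev --setra 65536 /dev/md0
# stripe_cache_size only exists for RAID 4/5/6 arrays, so it fails on RAID 0 and can be skipped
# echo 32768 | sudo tee /sys/block/md0/md/stripe_cache_size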
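To confirm the array came up as intended (for example after a reboot), a small check of `/proc/mdstat` can help. The sketch below is a hypothetical helper, `check_raid0`, not part of mdadm; it assumes the usual status line format `md0 : active raid0 nvme3n1[2] ...` and simply counts the member devices listed on that line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_raid0 ARRAY EXPECTED_MEMBERS [MDSTAT_FILE]
# Reads an mdstat-format file (default /proc/mdstat) and checks that ARRAY is an
# active raid0 array with the expected number of member devices.
check_raid0() {
  local array="$1" expected="$2" mdstat="${3:-/proc/mdstat}"
  local line
  local -a fields
  # Status line looks like: "md0 : active raid0 nvme3n1[2] nvme2n1[1] nvme1n1[0]"
  line=$(grep -E "^${array} : " "$mdstat") || { echo "$array not found in $mdstat"; return 1; }
  # Split into words without glob expansion (member names contain [n])
  read -r -a fields <<< "$line"
  if [ "${fields[2]}" != "active" ] || [ "${fields[3]}" != "raid0" ]; then
    echo "$array is not an active raid0 array: $line"
    return 1
  fi
  # Everything after "name : active raid0" is a member device
  local members=$(( ${#fields[@]} - 4 ))
  if [ "$members" -ne "$expected" ]; then
    echo "$array has $members members, expected $expected"
    return 1
  fi
  echo "$array OK: active raid0 with $members members"
}
```

On the server above, `check_raid0 md0 3` should report the array as OK with 3 members.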

# ==============================================================================

## install docker and portainer
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
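# the docker-ce package usually creates this group already; -f makes the command succeed if it exists
sudo groupadd -f docker
sudo gpasswd -a $USER docker
# start a new shell with the docker group active (or log out and back in)
newgrp docker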
sudo systemctl start docker
sudo systemctl enable docker
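# Portainer keeps its data on the RAID volume through the bind mount below, so no named volume is needed
sudo mkdir -p /data/portainer_data
# web UI: https://<server-ip>:9443
docker run -d -p 8000:8000 -p 9443:9443 --name portainer --restart=always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /data/portainer_data:/data \
  portainer/portainer-ce:latest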

## install portainer-agent on every server, so one Portainer instance can manage all of them
docker run -d -p 9001:9001 --name portainer_agent --restart=always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/docker/volumes:/var/lib/docker/volumes \
  portainer/agent
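After these commands, a quick way to see whether the Portainer containers are actually up is to compare exact names against `docker ps --format '{{.Names}}'`. `check_running` is a hypothetical helper; it only reads the list of running containers and returns non-zero if any requested name is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_running NAME...
# Prints "<name>: running" or "<name>: NOT running" for each container name and
# returns 1 if any of them is not running.
check_running() {
  local names name status=0
  names=$(docker ps --format '{{.Names}}')
  for name in "$@"; do
    # -F: fixed string, -x: whole-line match, so "portainer" does not match "portainer_agent"
    if grep -Fxq -- "$name" <<< "$names"; then
      echo "$name: running"
    else
      echo "$name: NOT running"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_running portainer portainer_agent` on the main server, or `check_running portainer_agent` on the others.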

# ==============================================================================

## set up nvidia docker
# install the NVIDIA Container Toolkit (nvidia-docker2 package)
sudo rpm --import https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/7fa2af80.pub
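# detect the distribution string used in the repo URL, e.g. "centos7"
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo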
sudo yum install -y nvidia-docker2
sudo systemctl daemon-reload
sudo systemctl restart docker
# test installation
docker run --rm --gpus all nvidia/cuda:12.0.0-devel-ubi8 nvidia-smi
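Since each user gets one GPU, it helps to see which GPUs are currently idle before choosing the `device=` value for a new container. The sketch below relies on `nvidia-smi`'s CSV query mode (`--query-gpu=index,memory.used --format=csv,noheader,nounits`) and treats a GPU as idle when its used memory is below a threshold; the 100 MiB default and the name `idle_gpus` are hypothetical choices. A GPU assigned to a container that is not currently running anything will also look idle, so this is only a hint.

```shell
#!/usr/bin/env bash
# Hypothetical helper: idle_gpus [THRESHOLD_MIB]
# Prints the index of every GPU whose used memory is below THRESHOLD_MIB (default 100).
idle_gpus() {
  local threshold="${1:-100}"
  # noheader drops the CSV header; nounits prints memory.used as a bare MiB number,
  # so each line looks like "0, 1234"
  nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits |
    awk -F', *' -v t="$threshold" '$2 + 0 < t + 0 { print $1 }'
}
```

`idle_gpus` prints one index per line; any of them can go into the `device=` value of the container command.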

# ==============================================================================

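## enable ip forwarding
# takes effect immediately but is lost on reboot
sudo sysctl -w net.ipv4.ip_forward=1
# open host ports 10241-10299
sudo iptables -I INPUT -p tcp --dport 10241:10299 -j ACCEPT
# iptables-save only prints the rules; write them to /etc/sysconfig/iptables to keep them
# (assumes the iptables-services package is used instead of firewalld)
sudo sh -c 'iptables-save > /etc/sysconfig/iptables'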
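Writing to `/proc/sys/net/ipv4/ip_forward` only lasts until the next reboot. One way to make it permanent, assuming the standard `/etc/sysctl.d/` drop-in directory that CentOS 7 reads at boot, is a one-line config file. The file name `99-ip-forward.conf` and the helper `persist_ip_forward` are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: persist_ip_forward [CONF_FILE]
# Writes a sysctl drop-in that turns on IPv4 forwarding at every boot.
persist_ip_forward() {
  local conf="${1:-/etc/sysctl.d/99-ip-forward.conf}"
  # Overwrite instead of appending, so running it twice still leaves one line
  printf 'net.ipv4.ip_forward = 1\n' > "$conf"
}
```

Run it as root, then `sysctl --system` reloads all drop-in files without a reboot.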

# ==============================================================================

## set up ubuntu container based on cuda version
user="hans"
device='"device=0,1"' # GPUs for this user; use "all" for every GPU
image="nvidia/cuda:12.0.0-devel-ubuntu22.04" # other tags, e.g. nvidia/cuda:12.2.2-devel-ubuntu22.04
port=10333 # ssh port for this user; must differ between containers

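sudo mkdir -p /data/$user
# --cpu-shares 1024 is the default weight; equal weights give every container an equal CPU share under contention
docker run -itd --name ubuntu-$user --ipc=host --pid=host --gpus $device --restart=always \
  -v /data/$user:/home --shm-size 64g --cpu-shares 1024 \
  -p $port:$port --net=bridge $image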
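Each container publishes its own SSH port, so ports have to be unique across users. The sketch below picks the lowest unused port from the `10241:10299` range opened in iptables; `used_host_ports` and `next_free_port` are hypothetical helpers, and parsing the `Ports` column of `docker ps` (e.g. `0.0.0.0:10333->10333/tcp`) is an assumption about its display format rather than a stable interface.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for choosing a unique ssh port per user container.
# Assumption: user ports come from the 10241-10299 range opened in iptables.
PORT_MIN=10241
PORT_MAX=10299

# used_host_ports: print host ports published by running containers, one per line.
# Assumes docker ps shows ports like "0.0.0.0:10333->10333/tcp, :::10333->10333/tcp".
used_host_ports() {
  docker ps --format '{{.Ports}}' | grep -oE ':[0-9]+->' | grep -oE '[0-9]+' | sort -un
}

# next_free_port USED...: print the lowest port in the range that is not in USED.
next_free_port() {
  local port used taken
  for (( port = PORT_MIN; port <= PORT_MAX; port++ )); do
    taken=0
    for used in "$@"; do
      if [ "$used" -eq "$port" ]; then taken=1; break; fi
    done
    if [ "$taken" -eq 0 ]; then
      echo "$port"
      return 0
    fi
  done
  echo "no free port in $PORT_MIN-$PORT_MAX" >&2
  return 1
}
```

Usage: `port=$(next_free_port $(used_host_ports))` before creating the next user's container.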

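# set up ssh for ubuntu; the commands after docker exec run inside the container
docker exec -it ubuntu-$user /bin/bash
apt update && apt install -y vim ssh sudo make gcc g++ build-essential nfs-common virtualenv git curl
# set the root password when prompted (use a different, strong password for each user)
passwd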
echo "PubkeyAuthentication yes" | tee -a /etc/ssh/sshd_config
echo "PermitRootLogin yes" | tee -a /etc/ssh/sshd_config
echo "Port 10333" | tee -a /etc/ssh/sshd_config
service ssh restart
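# systemctl enable ssh  # no effect: systemd does not run inside the container, so run "service ssh start" again after a container restart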
# single quotes keep $LD_LIBRARY_PATH and $PATH unexpanded until /etc/profile is sourced
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64' | tee -a /etc/profile
echo 'export PATH=$PATH:/usr/local/cuda/bin' | tee -a /etc/profile
echo 'export CUDA_HOME=/usr/local/cuda' | tee -a /etc/profile
source /etc/profile
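Appending with `tee -a` adds duplicate lines every time the setup is re-run, and sshd uses the first value it reads for a keyword, so an earlier uncommented line would win. A minimal idempotent alternative removes any existing uncommented line for the keyword before appending the new one. `set_sshd_option` is a hypothetical helper, and it assumes the file does not end inside an active `Match` block (lines appended there would only apply to that block).

```shell
#!/usr/bin/env bash
# Hypothetical helper: set_sshd_option FILE KEY VALUE
# Deletes every uncommented KEY line in FILE, then appends "KEY VALUE",
# so re-running the setup never leaves duplicate or conflicting entries.
set_sshd_option() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp)
  # sshd keywords are case-insensitive and may be followed by spaces or "="
  grep -v -i -E "^[[:space:]]*${key}[[:space:]=]" "$file" > "$tmp" || true
  printf '%s %s\n' "$key" "$value" >> "$tmp"
  # Write back through the existing file to keep its owner and permissions
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `set_sshd_option /etc/ssh/sshd_config Port 10333` followed by `service ssh restart`.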
# ==============================================================================

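## amend container configuration
# save the container's current state (installed packages, settings) as a new image
docker commit ubuntu-$user ubuntu-$user-image
# then stop and remove the old container and start a new one from ubuntu-$user-image with the changed docker run options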
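The re-run step can be scripted. The sketch below is one possible version, assuming the same options as the original `docker run` and the naming `ubuntu-<user>` / `ubuntu-<user>-image`; `recreate_user_container` is hypothetical. It also passes a command that starts sshd before the shell, since `systemctl enable ssh` has no effect without systemd in the container. `docker commit` does not include data in mounted volumes, so files under `/home` stay in the `/data/<user>` bind mount and are picked up again by the new container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recreate_user_container USER GPUS PORT
# Commits ubuntu-USER to ubuntu-USER-image, removes the old container, and starts a
# new one from that image with the same options as the original docker run.
recreate_user_container() {
  local user="$1" gpus="$2" port="$3"
  local name="ubuntu-$user" image="ubuntu-$user-image"
  docker commit "$name" "$image" || return 1
  docker stop "$name" || return 1
  docker rm "$name" || return 1
  # --gpus needs the literal double quotes when GPUS is a list such as 0,1.
  # The trailing command starts sshd first, because systemd does not run in the container.
  docker run -itd --name "$name" --ipc=host --pid=host --gpus "\"device=$gpus\"" \
    --restart=always -v "/data/$user:/home" --shm-size 64g --cpu-shares 1024 \
    -p "$port:$port" --net=bridge "$image" \
    /bin/bash -c "service ssh start; exec /bin/bash"
}
```

Usage: `recreate_user_container hans 0,1 10333`.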

# ==============================================================================

## install nodejs (optional)
sudo yum install https://rpm.nodesource.com/pub_16.x/nodistro/repo/nodesource-release-nodistro-1.noarch.rpm -y
sudo yum install nodejs -y --setopt=nodesource-nodejs.module_hotfixes=1
# install localtunnel globally (provides the lt command)
sudo npm install -g localtunnel