Tuesday, February 19, 2019

Install CUDA 10.0 + docker on Ubuntu 18.04

This guide has three sections.
Section I - Host (install the Nvidia driver on the HOST)

1. Check that the computer has an Nvidia card
$lspci -k | grep -EA3 'VGA|NVIDIA'
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
 Subsystem: CLEVO/KAPOK Computer 4th Gen Core Processor Integrated Graphics Controller
 Kernel driver in use: i915
 Kernel modules: i915
--
01:00.0 3D controller: NVIDIA Corporation GK106M [GeForce GTX 765M] (rev a1)
 Subsystem: CLEVO/KAPOK Computer GK106M [GeForce GTX 765M]
 Kernel driver in use: nvidia
 Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia


2. Install the Nvidia driver (Secure Boot must be disabled in the UEFI/BIOS settings)
    (For an RTX 3070, do not follow the steps below)
sudo apt remove nvidia-*
sudo apt-key adv --fetch-keys  http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo bash -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
sudo apt update
sudo apt-get install build-essential
sudo apt-get install linux-headers-$(uname -r)
sudo apt-get install nvidia-driver-410
sudo apt-get install cuda-10-0 (the CUDA version must match your cuDNN version)
cuDNN
CUDA Toolkit / Driver Version Map
+-----------------------------+------------------------------+
| CUDA Toolkit                | Linux x86_64 Driver Version  |
+-----------------------------+------------------------------+
| CUDA 10.0.130               | >= 410.48                    |
| CUDA 9.2 (9.2.148 Update 1) | >= 396.37                    |
+-----------------------------+------------------------------+
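Whether the installed driver satisfies a given CUDA release can be checked by comparing version strings with `sort -V`. A minimal sketch; the `version_ge` helper is an illustrative assumption, not part of any NVIDIA tool:

```shell
#!/bin/sh
# version_ge VER MIN - succeeds when VER >= MIN (hypothetical helper).
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

required="410.48"   # minimum driver for CUDA 10.0.130, from the table above
installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"

if version_ge "$installed" "$required"; then
    echo "Driver $installed is new enough for CUDA 10.0 (>= $required)"
else
    echo "Driver $installed is too old for CUDA 10.0 (needs >= $required)"
fi
```

`sort -V` does a true version-aware comparison, so 410.9 correctly sorts below 410.48's successor 410.79.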


3. Switch to the Nvidia card
sudo prime-select nvidia


4. Check which card is being used right now
a. System Settings -> Details -> Graphics: GeForce GTX 765M/PCIe/SSE2
b. $ prime-select query


5. Reboot
$ sudo reboot


6. Check /dev/; it must contain the following nvidia device files
$ ls /dev/nvidia*
/dev/nvidia0 /dev/nvidia-modeset /dev/nvidiactl /dev/nvidia-uvm 


7. Run nvidia-smi
ubuntu@04911f053958:~$ /usr/bin/nvidia-smi 
Tue Feb 19 05:12:02 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 765M    Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   54C    P0    N/A /  N/A |    645MiB /  2002MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+


8. Copy the *nvidia* and *cuda* libraries from the HOST's /usr/lib/x86_64-linux-gnu/ for the Docker container
$mkdir ~/cuda
$cd /usr/lib/x86_64-linux-gnu
$find ./ -type d \( -name "nvidia" -o -name "vdpau" -o -name "tls" -o -name "directfb-1.7-7" \) -prune -o -type l -name "*nvidia*" -exec cp -a {} ~/cuda \;
$find ./ -type d \( -name "nvidia" -o -name "vdpau" -o -name "tls" -o -name "directfb-1.7-7" \) -prune -o -type f -name "*nvidia*" -exec cp {} ~/cuda \;
$find ./ -type d \( -name "nvidia" -o -name "vdpau" -o -name "tls" -o -name "directfb-1.7-7" \) -prune -o -type l -name "*cuda*" -exec cp -a {} ~/cuda \;
$find ./ -type d \( -name "nvidia" -o -name "vdpau" -o -name "tls" -o -name "directfb-1.7-7" \) -prune -o -type f -name "*cuda*" -exec cp {} ~/cuda \;
$cp -r nvidia/ vdpau/ tls/ directfb-1.7-7/ ~/cuda/
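The four `find` invocations above differ only in the name pattern and file type, so they can be collapsed into one loop. A consolidated sketch, assuming it is run from /usr/lib/x86_64-linux-gnu:

```shell
#!/bin/sh
# Copy every *nvidia* / *cuda* file or symlink into ~/cuda, skipping the
# pruned directories (those are copied wholesale at the end).
cd /usr/lib/x86_64-linux-gnu || exit 1
mkdir -p ~/cuda
for pat in '*nvidia*' '*cuda*'; do
    find . -type d \( -name nvidia -o -name vdpau -o -name tls -o -name directfb-1.7-7 \) -prune \
        -o \( -type f -o -type l \) -name "$pat" -exec cp -a {} ~/cuda \;
done
cp -r nvidia/ vdpau/ tls/ directfb-1.7-7/ ~/cuda/
```

`cp -a` preserves symlinks as symlinks, which matters because most lib*.so names here are links to versioned files.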

Section II - Docker Setting

1. Dockerfile
FROM ubuntu:18.04
RUN apt-get update
RUN apt-get -y install vim
RUN apt-get -y install sudo
RUN apt-get -y install locales
RUN apt-get -y install language-pack-zh-hant
RUN apt-get -y install language-pack-zh-hant-base

# Set the locale
RUN locale-gen lzh_TW.UTF-8
ENV LANG lzh_TW.UTF-8
ENV LANGUAGE lzh_TW.UTF-8
ENV LC_ALL lzh_TW.UTF-8
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,video,utility,graphics
ENV NVIDIA_REQUIRE_CUDA "cuda>=9.0"

RUN useradd -G sudo -u 1000 --create-home ubuntu
RUN echo "ubuntu:ubuntu" | chpasswd
RUN echo "root:ubuntu" | chpasswd

ENV HOME /home/ubuntu
WORKDIR /home/ubuntu
$ docker build -t image_name:tag_name . --no-cache
Parameter:
a. NVIDIA_VISIBLE_DEVICES
       all: all GPUs will be accessible; this is the default in NVIDIA's container images
b. NVIDIA_DRIVER_CAPABILITIES
       compute: required for CUDA and OpenCL applications.
       compat32: required for running 32-bit applications.
       graphics: required for running OpenGL and Vulkan applications.
       utility: required for using nvidia-smi and NVML.
       video: required for using the Video Codec SDK.
       display: required for leveraging X11 display.
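A container that only needs CUDA compute and nvidia-smi could narrow these variables. This is a hedged Dockerfile fragment, not part of the Dockerfile above; note that these NVIDIA_* variables take effect when the image is run under nvidia-container-runtime (ref 4):

```dockerfile
# Expose only GPU 0 and only the compute/utility driver capabilities.
ENV NVIDIA_VISIBLE_DEVICES 0
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
```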


2. dock_cuda.sh
#!/bin/bash
xhost +local:
docker run -it  \
        --privileged \
        -v /tmp/.X11-unix:/tmp/.X11-unix \
        -v $LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu \
        -v $HOME:/home/ubuntu/host_home \
        -e DISPLAY=$DISPLAY \
        --user 1000 \
        image_name:tag_name

xhost -local:
$ ./dock_cuda.sh


Section III - Docker Container

1. Install CUDA with the runfile
Download
Operating System: Linux
Architecture: x86_64
Distribution: Ubuntu
Version: 18.04
Installer Type: runfile(local)


2. Copy the runfile to the Docker container and run it
$ chmod 755 cuda_10.0.130_410.48_linux.run
$ ./cuda_10.0.130_410.48_linux.run
(Do not install the Nvidia driver when prompted; the host already provides it)


3. Set up the environment
$vim ~/.bashrc
export PATH=$PATH:/usr/local/cuda/bin
export CUDADIR=/usr/local/cuda
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64

$source ~/.bashrc
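A quick sanity check that the new entries actually landed in the current shell (an illustrative sketch, not part of the install):

```shell
#!/bin/sh
# Verify that the ~/.bashrc additions are visible in this shell.
if echo "$PATH" | tr ':' '\n' | grep -qx '/usr/local/cuda/bin'; then
    echo "PATH contains /usr/local/cuda/bin"
else
    echo "PATH is missing /usr/local/cuda/bin"
fi
echo "CUDA_HOME=$CUDA_HOME"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```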


4. Check with nvcc
ubuntu@04911f053958:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130


5. Copy the libraries from the HOST
$ cp -r ~/host_home/cuda /usr/lib/x86_64-linux-gnu/
(host_home is the HOST's $HOME mounted by dock_cuda.sh; the cuda directory was prepared in Section I, step 8)


6. Check with nvidia-smi (nvidia-smi is from HOST)
ubuntu@04911f053958:~$ ./nvidia-smi 
Tue Feb 19 05:12:02 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 765M    Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   54C    P0    N/A /  N/A |    645MiB /  2002MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+


7. Run deviceQuery from the CUDA samples; the installation succeeded if you see output like the following
$cd NVIDIA_CUDA-10.0_Samples
$./bin/x86_64/linux/release/deviceQuery


./bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 765M"
  CUDA Driver Version / Runtime Version          10.0 / 10.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2002 MBytes (2099511296 bytes)
  ( 4) Multiprocessors, (192) CUDA Cores/MP:     768 CUDA Cores
  GPU Max Clock rate:                            863 MHz (0.86 GHz)
  Memory Clock rate:                             2004 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS




ref:
1. askubuntu
2. Detailed tutorial on installing CUDA 9.0 on Ubuntu 16.04 (in Chinese)
3. How To Switch Between Intel and Nvidia Graphics Card on Ubuntu
4. nvidia-container-runtime

Q&A:
1. Q: NVIDIA-SMI couldn't find libnvidia-ml.so
   A: Copy libnvidia-ml.so from the HOST's /usr/lib/x86_64-linux-gnu
2. Q: "cudaGetDeviceCount returned 35" when running deviceQuery
   A: Copy the *nvidia*.so and *cuda*.so libraries from the HOST's /usr/lib/x86_64-linux-gnu
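Both answers come down to unresolved shared libraries; `ldd` shows exactly which ones are missing before you copy anything. A diagnostic sketch (the ./nvidia-smi path assumes the host copy used in Section III, step 6):

```shell
#!/bin/sh
# Print every shared library the binary cannot resolve; empty output
# means all of its dependencies are satisfied.
ldd ./nvidia-smi | awk '/not found/ { print $1 }'
```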
