Section I - Host (Install the NVIDIA driver on the HOST)
1. Check that the computer has an NVIDIA card
$ lspci -k | grep -EA3 'VGA|NVIDIA'
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
    Subsystem: CLEVO/KAPOK Computer 4th Gen Core Processor Integrated Graphics Controller
    Kernel driver in use: i915
    Kernel modules: i915
--
01:00.0 3D controller: NVIDIA Corporation GK106M [GeForce GTX 765M] (rev a1)
    Subsystem: CLEVO/KAPOK Computer GK106M [GeForce GTX 765M]
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
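Step 1 can also be scripted, for example as a guard at the top of a setup script. A minimal sketch: `has_nvidia_gpu` is a hypothetical helper name, and it assumes lspci's default (non-numeric) output is piped in on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the lspci text on stdin lists an
# NVIDIA VGA, 3D or display controller; fail (exit 1) otherwise.
has_nvidia_gpu() {
  grep -Eq '(VGA compatible controller|3D controller|Display controller): NVIDIA Corporation'
}
```

Usage: `lspci | has_nvidia_gpu && echo "NVIDIA GPU found"`.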
2. Install the NVIDIA driver (UEFI and Secure Boot must be disabled in the BIOS)
Note for RTX 3070: do not follow the installation steps below.
sudo apt remove 'nvidia-*'
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo bash -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
sudo apt update
sudo apt-get install build-essential
sudo apt-get install linux-headers-$(uname -r)
sudo apt-get install nvidia-driver-410
sudo apt-get install cuda-10-0 (the CUDA version must match the one your cuDNN release is built for)
CUDA Toolkit / Driver Version Map Table
+-----------------------------+-----------------------------+
| CUDA Toolkit                | Linux x86_64 Driver Version |
+-----------------------------+-----------------------------+
| CUDA 10.0.130               | >= 410.48                   |
| CUDA 9.2 (9.2.148 Update 1) | >= 396.37                   |
+-----------------------------+-----------------------------+
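The table can be turned into a quick compatibility check. A sketch assuming GNU `sort -V` for version ordering; the function names are hypothetical and only the two rows of the table are encoded.

```bash
#!/usr/bin/env bash
# Minimum Linux x86_64 driver for each toolkit, copied from the table above.
min_driver_for_cuda() {
  case "$1" in
    10.0|10.0.130) echo "410.48" ;;
    9.2|9.2.148)   echo "396.37" ;;
    *) return 1 ;;   # toolkit not listed in the table
  esac
}

# Succeed if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Succeed if driver version $1 is new enough for CUDA toolkit $2.
driver_supports_cuda() {
  local driver="$1" cuda="$2" min
  min=$(min_driver_for_cuda "$cuda") || return 1
  version_ge "$driver" "$min"
}
```

Usage: `driver_supports_cuda 410.79 10.0 && echo "driver is new enough for CUDA 10.0"`.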
3. Switch to the NVIDIA GPU
sudo prime-select nvidia
4. Check which card is currently in use
a. System Settings -> Details: the Graphics field should show GeForce GTX 765M/PCIe/SSE2
b. $ prime-select query (should print nvidia)
5. Reboot
$ sudo reboot
6. Check /dev/; it must contain the following NVIDIA device files
$ ls /dev/nvidia*
/dev/nvidia0 /dev/nvidia-modeset /dev/nvidiactl /dev/nvidia-uvm
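A scripted version of this check, as a sketch: `missing_nvidia_nodes` is a hypothetical helper, and the directory is a parameter (default /dev) so it can be tried on a scratch directory. On some systems /dev/nvidia-uvm may not appear until a CUDA program has loaded the nvidia-uvm module, so a missing nvidia-uvm right after boot is not necessarily an error.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each expected NVIDIA device node that is missing.
# Prints nothing when all four nodes exist.
missing_nvidia_nodes() {
  local dir="${1:-/dev}" node
  for node in nvidia0 nvidiactl nvidia-modeset nvidia-uvm; do
    [ -e "$dir/$node" ] || echo "$node"
  done
}
```

Usage: `[ -z "$(missing_nvidia_nodes)" ] && echo "all device nodes present"`.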
7. Run nvidia-smi
ubuntu@04911f053958:~$ /usr/bin/nvidia-smi
Tue Feb 19 05:12:02 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 765M    Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   54C    P0    N/A /  N/A |    645MiB /  2002MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+
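To compare the driver version against the map table in a script, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints it directly. For a saved nvidia-smi log like the one above, a small parser works too. A sketch; `driver_version_from_smi` is a hypothetical name and it assumes the default table layout shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the "Driver Version" field from nvidia-smi's
# default table output read on stdin (410.79 in the output above).
driver_version_from_smi() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p' | head -n1
}
```

Usage: `driver_version_from_smi < nvidia-smi.log`.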
8. Copy the *nvidia* and *cuda* libraries from the HOST's /usr/lib/x86_64-linux-gnu/ into ~/cuda, to be copied into the Docker container later
$ mkdir ~/cuda
$ cd /usr/lib/x86_64-linux-gnu
$find ./ -type d \( -name "nvidia" -o -name "vdpau" -o -name "tls" -o -name "directfb-1.7-7" \) -prune -o -type l -name "*nvidia*" -exec cp -a {} ~/cuda \;
$find ./ -type d \( -name "nvidia" -o -name "vdpau" -o -name "tls" -o -name "directfb-1.7-7" \) -prune -o -type f -name "*nvidia*" -exec cp {} ~/cuda \;
$find ./ -type d \( -name "nvidia" -o -name "vdpau" -o -name "tls" -o -name "directfb-1.7-7" \) -prune -o -type l -name "*cuda*" -exec cp -a {} ~/cuda \;
$find ./ -type d \( -name "nvidia" -o -name "vdpau" -o -name "tls" -o -name "directfb-1.7-7" \) -prune -o -type f -name "*cuda*" -exec cp {} ~/cuda \;
$cp -r nvidia/ vdpau/ tls/ directfb-1.7-7/ ~/cuda/
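The four `find` passes and the final `cp -r` above can be folded into one function. A sketch, not a drop-in replacement: `collect_gpu_libs` is a hypothetical name, the source and destination default to the paths used in this post, and unlike the commands above it uses `cp -a` for regular files as well (preserving timestamps and permissions).

```bash
#!/usr/bin/env bash
# Sketch: copy every file or symlink whose name contains "nvidia" or "cuda"
# from SRC (flattened) into DEST, then copy the special subdirectories whole.
collect_gpu_libs() {
  local src="${1:-/usr/lib/x86_64-linux-gnu}" dest="${2:-$HOME/cuda}" d
  mkdir -p "$dest"
  (
    cd "$src" || exit 1
    # Skip the subdirectories that are copied whole below; cp -a keeps
    # symlinks as symlinks.
    find . -type d \( -name nvidia -o -name vdpau -o -name tls -o -name directfb-1.7-7 \) -prune \
      -o \( -type f -o -type l \) \( -name '*nvidia*' -o -name '*cuda*' \) \
      -exec cp -a {} "$dest" \;
    # Copy whichever of the special subdirectories exist on this host.
    for d in nvidia vdpau tls directfb-1.7-7; do
      if [ -d "$d" ]; then cp -a "$d" "$dest/"; fi
    done
  )
}
```

Usage: `collect_gpu_libs` (defaults) or `collect_gpu_libs /usr/lib/x86_64-linux-gnu ~/cuda`.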
Section II - Docker Setup
1. Dockerfile
FROM ubuntu:18.04
# Update the package index and install in one layer so the index is never stale
RUN apt-get update && apt-get -y install \
    vim \
    sudo \
    locales \
    language-pack-zh-hant \
    language-pack-zh-hant-base
# Set the locale
RUN locale-gen lzh_TW.UTF-8
ENV LANG lzh_TW.UTF-8
ENV LANGUAGE lzh_TW.UTF-8
ENV LC_ALL lzh_TW.UTF-8
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,video,utility,graphics
ENV NVIDIA_REQUIRE_CUDA "cuda>=9.0"
RUN useradd -G sudo -u 1000 --create-home ubuntu
RUN echo "ubuntu:ubuntu" | chpasswd
RUN echo "root:ubuntu" | chpasswd
ENV HOME /home/ubuntu
WORKDIR /home/ubuntu
$ docker build -t image_name:tag_name . --no-cache
Parameters:
a. NVIDIA_VISIBLE_DEVICES
   all: all GPUs will be accessible (the default value in NVIDIA's container images)
b. NVIDIA_DRIVER_CAPABILITIES
   compute: required for CUDA and OpenCL applications
   compat32: required for running 32-bit applications
   graphics: required for running OpenGL and Vulkan applications
   utility: required for using nvidia-smi and NVML
   video: required for using the Video Codec SDK
   display: required for leveraging X11 display
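A small typo guard for the NVIDIA_DRIVER_CAPABILITIES value, accepting the capability names listed above plus `all`, which nvidia-container-runtime also documents. `check_driver_capabilities` is a hypothetical helper. These NVIDIA_* variables are normally consumed by nvidia-container-runtime; with a plain `docker run` they likely have no effect, so treat this only as a spelling check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every entry of a comma-separated capability list
# that is not a known capability; return 1 if any unknown entry was found.
check_driver_capabilities() {
  local IFS=',' cap bad=0
  for cap in $1; do
    case "$cap" in
      compute|compat32|graphics|utility|video|display|all) ;;
      *) echo "unknown capability: $cap"; bad=1 ;;
    esac
  done
  return "$bad"
}
```

Usage: `check_driver_capabilities "compute,video,utility,graphics"`.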
2. dock_cuda.sh
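#!/bin/bash
# Allow local (non-network) clients, including the container, to use the X server
xhost +local:
# $LD_LIBRARY_PATH must be a single host directory here; docker -v cannot take a colon-separated list
docker run -it \
--privileged \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v "$LD_LIBRARY_PATH":/usr/lib/x86_64-linux-gnu \
-v "$HOME":/home/ubuntu/host_home \
-e DISPLAY="$DISPLAY" \
--user 1000 \
image_name:tag_name
# Revoke the X server access granted above
xhost -local: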
$ chmod +x dock_cuda.sh
$ ./dock_cuda.sh
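In the script above, `xhost -local:` is skipped if the script is interrupted while the container runs. A sketch that revokes X access on any exit using a `trap`; `run_cuda_container` is a hypothetical name, the image defaults to the placeholder tag used above, and the LD_LIBRARY_PATH mount is left out for brevity (add it back if your setup relies on it).

```bash
#!/usr/bin/env bash
# Sketch: run the container and always revoke local X access afterwards,
# even if docker run fails or is interrupted.
run_cuda_container() {
  local image="${1:-image_name:tag_name}"
  (
    # Runs when this subshell exits, for any reason; the subshell still
    # returns docker's exit status.
    trap 'xhost -local: >/dev/null' EXIT
    xhost +local: >/dev/null
    docker run -it \
      --privileged \
      -v /tmp/.X11-unix:/tmp/.X11-unix \
      -v "$HOME":/home/ubuntu/host_home \
      -e DISPLAY="$DISPLAY" \
      --user 1000 \
      "$image"
  )
}
```

Usage: `run_cuda_container image_name:tag_name`.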
Section III - Docker Container
1. Install CUDA with the runfile installer
Download it with these selections:
Operating System: Linux
Architecture: x86_64
Distribution: Ubuntu
Version: 18.04
Installer Type: runfile (local)
2. Copy the runfile into the Docker container and run it
$ chmod 755 cuda_10.0.130_410.48_linux.run
$ ./cuda_10.0.130_410.48_linux.run
(When prompted, do not install the bundled NVIDIA driver; the container uses the HOST's driver.)
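The runfile name encodes both the toolkit version and the bundled driver version (410.48 here, the minimum listed in the map table). A sketch that splits the name, assuming the `cuda_<toolkit>_<driver>_linux.run` pattern; `parse_cuda_runfile` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a runfile name of the form
# cuda_<toolkit>_<bundled driver>_linux.run into its two version numbers.
parse_cuda_runfile() {
  local name re='^cuda_([0-9.]+)_([0-9.]+)_linux\.run$'
  name=$(basename "$1")
  if [[ $name =~ $re ]]; then
    echo "toolkit=${BASH_REMATCH[1]} driver=${BASH_REMATCH[2]}"
  else
    return 1
  fi
}
```

The 10.0 runfile also offers non-interactive options such as `--silent` and `--toolkit`; check `./cuda_10.0.130_410.48_linux.run --help` before relying on them.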
3. Set up the environment
$ vim ~/.bashrc
(append the following lines)
export PATH=$PATH:/usr/local/cuda/bin
export CUDADIR=/usr/local/cuda
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
$ source ~/.bashrc
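Instead of editing ~/.bashrc by hand, the same lines can be appended idempotently, so re-running the setup never duplicates them. A sketch; `add_cuda_env` is a hypothetical helper and the rc-file path is a parameter (default ~/.bashrc).

```bash
#!/usr/bin/env bash
# Sketch: append each CUDA environment line to the rc file only if that exact
# line is not already present.
add_cuda_env() {
  local rc="${1:-$HOME/.bashrc}" line
  touch "$rc"
  while IFS= read -r line; do
    grep -qxF -- "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
  done <<'EOF'
export PATH=$PATH:/usr/local/cuda/bin
export CUDADIR=/usr/local/cuda
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
EOF
}
```

Usage: `add_cuda_env && source ~/.bashrc`.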
4. Check with nvcc
ubuntu@04911f053958:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
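To confirm in a script that the expected toolkit is the one on PATH, the release number can be pulled out of the `nvcc -V` output. A sketch; `nvcc_release` is a hypothetical helper that reads the output on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the "release X.Y" number from nvcc -V output on stdin.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n1
}
```

Usage: `[ "$(nvcc -V | nvcc_release)" = "10.0" ] && echo "CUDA 10.0 toolkit on PATH"`.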
5. Copy the libraries from the HOST (the HOST's ~/cuda is visible at ~/host_home/cuda inside the container)
$ sudo cp -r ~/host_home/cuda/. /usr/lib/x86_64-linux-gnu/
6. Check with nvidia-smi (the nvidia-smi binary is copied from the HOST)
ubuntu@04911f053958:~$ ./nvidia-smi
Tue Feb 19 05:12:02 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 765M    Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   54C    P0    N/A /  N/A |    645MiB /  2002MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+
7. The setup works if deviceQuery prints output like the following (if bin/ does not exist yet, build the sample first, e.g. run make in 1_Utilities/deviceQuery)
$ cd NVIDIA_CUDA-10.0_Samples
$ ./bin/x86_64/linux/release/deviceQuery
./bin/x86_64/linux/release/deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 765M"
  CUDA Driver Version / Runtime Version          10.0 / 10.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2002 MBytes (2099511296 bytes)
  ( 4) Multiprocessors, (192) CUDA Cores/MP:     768 CUDA Cores
  GPU Max Clock rate:                            863 MHz (0.86 GHz)
  Memory Clock rate:                             2004 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
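For an automated smoke test, the last two lines of the deviceQuery output are enough. A sketch; `devicequery_passed` is a hypothetical helper that reads the output on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the deviceQuery output on stdin contains
# a "Result = PASS" line and reports at least one device (NumDevs >= 1).
devicequery_passed() {
  local out
  out=$(cat)
  grep -qx 'Result = PASS' <<<"$out" &&
    grep -Eq 'NumDevs = [1-9][0-9]*' <<<"$out"
}
```

Usage: `./bin/x86_64/linux/release/deviceQuery | devicequery_passed && echo "CUDA works in the container"`.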
Q&A:
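1. Q: nvidia-smi cannot find libnvidia-ml.so
A: Copy libnvidia-ml.so from the HOST's /usr/lib/x86_64-linux-gnu.
2. Q: deviceQuery reports "cudaGetDeviceCount returned 35"
A: Copy *nvidia*.so and *cuda*.so from the HOST's /usr/lib/x86_64-linux-gnu. (Error 35 is cudaErrorInsufficientDriver: the CUDA runtime cannot find a sufficient driver, here because the HOST's driver libraries are missing.)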
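Both problems come down to driver libraries missing from the container. A sketch that checks for the two libraries named above before running nvidia-smi or deviceQuery; `missing_driver_libs` is a hypothetical helper, and the directory is a parameter (default /usr/lib/x86_64-linux-gnu).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each driver library (any version suffix) that has
# no matching file in the given directory. Prints nothing if both are present.
missing_driver_libs() {
  local dir="${1:-/usr/lib/x86_64-linux-gnu}" lib
  for lib in libnvidia-ml.so libcuda.so; do
    # compgen -G succeeds only if the glob matches at least one path.
    compgen -G "$dir/$lib*" >/dev/null || echo "$lib"
  done
}
```

Usage: `missing_driver_libs` inside the container; any name it prints still needs to be copied from the HOST.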