When using PyTorch in Docker with CUDA, I keep running into the error `no space left on device`. It means the shared memory (`shm`) is full, which we can check with `df -h | grep shm`.
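As a small sketch of that check, the snippet below wraps `df` in a helper so the usage percentage can be read from a script. The function name `shm_usage` is made up for illustration, and it assumes shared memory is mounted at `/dev/shm`, which is the usual location inside a Docker container.

```shell
# Hypothetical helper: print the "use%" column for a mount point.
# Defaults to /dev/shm (assumed location of the shared-memory mount).
shm_usage() {
    # -P forces the POSIX output format, so each filesystem stays on one line.
    df -P "${1:-/dev/shm}" | awk 'NR == 2 { print $5 }'
}

# Human-readable overview, then just the percentage.
df -h /dev/shm
echo "shm usage: $(shm_usage)"
```

A value at or near 100% here matches the `no space left on device` symptom.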
There are two ways to solve this problem:
- Reboot the Docker container. Make sure to stop the `xrdp` service in advance.
- Increase the `shm` size.
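One way to do the second option is to pass `--shm-size` when the container is created. This is only a sketch: the size `8g`, the image name, and the wrapper function `start_container` are placeholders, not values from my setup, and `--gpus all` assumes the NVIDIA container toolkit is installed.

```shell
# Placeholder values: adjust the size and image to your own setup.
SHM_SIZE="8g"
IMAGE="pytorch/pytorch:latest"

# Hypothetical wrapper around docker run.
start_container() {
    # --shm-size sets the size of /dev/shm inside the container
    # (Docker's default is 64 MB).
    docker run --gpus all --shm-size="$SHM_SIZE" -it "$IMAGE" bash
}
```

The size is fixed when the container is created, so an existing container has to be recreated for a new value to take effect.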
The weird thing is that I've already shut down all training processes, but the `shm` is still fully occupied.
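A possible way to investigate this is to look at what is still sitting in the shared-memory mount. Files on a tmpfs such as `/dev/shm` stay there until they are deleted, even after the process that created them has exited. The helper name `list_shm_files` below is made up for illustration, and it again assumes the mount is at `/dev/shm`.

```shell
# Hypothetical helper: list the largest entries in a directory.
# $1 = directory (default /dev/shm), $2 = number of output lines (default 10).
list_shm_files() {
    # -S sorts by file size, largest first; -h prints human-readable sizes.
    ls -lhS "${1:-/dev/shm}" | head -n "${2:-10}"
}

list_shm_files
```

If large leftover files show up here and no process needs them anymore, removing them should free the space without rebooting the container.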