Switching from NVIDIA to AMD (including tensorflow)
I have been using my GeForce 1060 extensively for deep learning, both with Python and R. But after the always painful dance with the closed source drivers and kernel updates, paired with the collapse of my computer’s PSU and/or GPU, I decided to finally make the switch to an AMD graphics card and the open source stack. And you know what, within half a day I had everything running, including Tensorflow. Yay for open source!
Preliminaries
So what is the starting point: I am running Debian/unstable with an AMD Radeon 5700. First of all I purged all NVIDIA related packages, and there are a lot of them, I have to say. Be sure to search for nv and nvidia and get rid of all of them. For safety I rebooted and checked again that no kernel modules related to NVIDIA were loaded.
Firmware
It seems that the current version of Debian’s firmware-amd-graphics package is sufficiently recent, so there is normally no need to manually update the firmware.
If the packaged firmware turns out to be too old for your kernel and hardware, clone git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git and copy everything from the amdgpu directory to /lib/firmware/amdgpu.
I didn’t do that at first, and booting the kernel then hung during the switch to the AMD framebuffer. If you see this behavior, your firmware files are too old; please update the above mentioned package, or use the manual method shown.
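To see which firmware files are actually present, a small script along the following lines may help. It is only a sketch: the prefix navi10_ is my assumption for the Radeon 5700 and is not taken from the text above, and the helper name list_gpu_firmware is made up for illustration.

```python
from pathlib import Path


def list_gpu_firmware(prefix, firmware_dir="/lib/firmware/amdgpu"):
    """Return the sorted names of firmware files starting with `prefix`."""
    directory = Path(firmware_dir)
    if not directory.is_dir():
        return []
    # The trailing * also matches compressed variants such as .bin.xz
    return sorted(p.name for p in directory.glob(prefix + "*") if p.is_file())


if __name__ == "__main__":
    # "navi10_" is an assumed prefix for the Radeon 5700; adjust for other cards.
    files = list_gpu_firmware("navi10_")
    print(f"{len(files)} firmware files found")
    for name in files:
        print(" ", name)
```

An empty list suggests that the firmware for the card is missing. A non-empty list does not prove that the files are recent enough.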
Kernel
If you are using the Debian provided kernels in version 5.7 or 5.8 then you should be fine. If you compile your own kernel, make sure that the options shown in the following paragraph are activated:
The advantage of having an open source driver in the kernel is that you don’t have to worry about incompatibilities (with NVIDIA, the driver needs patching every time a new kernel comes out). For recent AMD GPUs you need a rather new kernel; I have 5.6.0 and 5.7.0-rc5 running. If you compile your own kernels, make sure that all the necessary kernel config options are turned on. In my case these are:
CONFIG_DRM_AMDGPU=m
CONFIG_DRM_AMDGPU_USERPTR=y
CONFIG_DRM_AMD_ACP=y
CONFIG_DRM_AMD_DC=y
CONFIG_DRM_AMD_DC_DCN=y
CONFIG_HSA_AMD=y
When installing the kernel, be sure that the firmware is already updated so that the correct firmware is copied into the initrd.
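If you want to check a kernel config against the options listed above, something like the following sketch could be used. It assumes the config of the running kernel is available either as /boot/config-<release> (as on Debian) or as /proc/config.gz; the function names are hypothetical.

```python
import gzip
import platform
from pathlib import Path

# The options listed in this post.
REQUIRED_OPTIONS = [
    "CONFIG_DRM_AMDGPU",
    "CONFIG_DRM_AMDGPU_USERPTR",
    "CONFIG_DRM_AMD_ACP",
    "CONFIG_DRM_AMD_DC",
    "CONFIG_DRM_AMD_DC_DCN",
    "CONFIG_HSA_AMD",
]


def missing_options(config_text, required=REQUIRED_OPTIONS):
    """Return the required options that are not set to y or m."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


def read_running_config():
    """Read the config of the running kernel, or return None if not found."""
    boot_config = Path("/boot") / f"config-{platform.release()}"
    if boot_config.is_file():
        return boot_config.read_text()
    proc_config = Path("/proc/config.gz")
    if proc_config.is_file():
        with gzip.open(proc_config, "rt") as handle:
            return handle.read()
    return None


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("no kernel config found")
    else:
        print("missing:", missing_options(text) or "none")
```

For a self-compiled kernel, missing_options() can also be fed the contents of the .config file in the build tree before installing.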
Support programs and libraries
WARNING: this description is for ROCm 3.3, which is not available anymore. On the other hand, AMD now ships ROCm 3.8, but that cannot be installed directly due to a packaging “bug”. See a later blog post on how to fix it.
All the following is more or less an excerpt from the ROCm Installation Guide!
AMD provides a Debian/Ubuntu APT repository for software as well as kernel sources. Put the following into /etc/apt/sources.list.d/rocm.list:
deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main
Also put the public key of the ROCm repository into /etc/apt/trusted.gpg.d/rocm.asc. After that, apt-get update should work.
I installed rocm-dev-3.3.0, rocm-libs-3.3.0, hipcub-3.3.0, miopen-hip-3.3.0 (and of course the dependencies), but not rocm-dkms, which is the kernel module. If you have a sufficiently recent kernel (see above), the source in the kernel itself is newer.
The libraries and programs are installed under /opt/rocm-3.3.0, and to make the libraries available to Tensorflow (see below) and other programs, I added /etc/ld.so.conf.d/rocm.conf with the following content:
/opt/rocm-3.3.0/lib/
and ran ldconfig as root.
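One way to confirm that the dynamic linker now finds the ROCm libraries is to try loading them from Python. This is only a sketch: the library names are the ones that show up in the Tensorflow log messages in this post, and can_load is a made-up helper name.

```python
import ctypes

# Library names taken from the Tensorflow log messages in this post.
ROCM_LIBRARIES = [
    "libhip_hcc.so",
    "librocblas.so",
    "libMIOpen.so",
    "librocfft.so",
    "librocrand.so",
]


def can_load(library_name):
    """Return True if the dynamic linker can find and load the library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    for name in ROCM_LIBRARIES:
        print(f"{name}: {'ok' if can_load(name) else 'NOT found'}")
```

Note that this really loads the libraries into the Python process, so it is best run as a one-off check.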
Last but not least, add a udev rule that is normally installed by rocm-dkms. Put the following into /etc/udev/rules.d/70-kfd.rules:
SUBSYSTEM=="kfd", KERNEL=="kfd", TAG+="uaccess", GROUP="video"
This allows users in the video group to access the GPU.
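Whether the rule had the intended effect can be checked roughly as follows. The sketch assumes that the device node created by the kfd subsystem is /dev/kfd; the function names are hypothetical.

```python
import grp
import os


def device_access(path="/dev/kfd"):
    """Describe whether the current user can open the device node."""
    if not os.path.exists(path):
        return "missing"
    if os.access(path, os.R_OK | os.W_OK):
        return "accessible"
    return "no permission"


def current_group_names():
    """Return the names of the groups of the current process."""
    names = set()
    for gid in os.getgroups() + [os.getegid()]:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A gid without an entry in the group database; skip it.
            continue
    return names


if __name__ == "__main__":
    print("/dev/kfd:", device_access())
    print("in video group:", "video" in current_group_names())
```

Remember that a newly added group membership only takes effect after logging in again.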
Up to here you should be able to boot into the system and have X running on top of the AMD GPU, including OpenGL acceleration and direct rendering:
$ glxinfo
name of display: :0
display: :0 screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
...
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
...
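For scripted checks, the relevant lines of the glxinfo output can be picked out with a few lines of Python. This is a minimal sketch with made-up function names; it assumes that glxinfo prints its information as "key: value" lines, as in the excerpt above.

```python
import shutil
import subprocess


def parse_glxinfo(text):
    """Collect the 'key: value' lines of glxinfo output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            # Keep the first occurrence of each key.
            info.setdefault(key.strip(), value.strip())
    return info


def direct_rendering_enabled(text):
    """Return True if the output reports 'direct rendering: Yes'."""
    return parse_glxinfo(text).get("direct rendering", "").startswith("Yes")


if __name__ == "__main__":
    if shutil.which("glxinfo") is None:
        print("glxinfo not installed")
    else:
        output = subprocess.run(
            ["glxinfo"], capture_output=True, text=True, check=False
        ).stdout
        info = parse_glxinfo(output)
        print("direct rendering:", direct_rendering_enabled(output))
        print("renderer:", info.get("OpenGL renderer string", "unknown"))
```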
Tensorflow
WARNING: Although the example below (addition of integers) worked, floating point computations are still (even with ROCm 3.8) NOT supported. For this reason, I have switched back to using my NVIDIA card for deep learning, and use the AMD card for the graphics output. See this blog for details on how to set up multiple GPU cards.
Thinking about how hard it was to get the correct libraries to get Tensorflow running on GPUs (see here and here), it is a pleasure to see that with open source all this pain is relieved.
There is already work done to make Tensorflow run on ROCm: the tensorflow-rocm project. They provide up-to-date PyPI packages, so a simple
pip3 install tensorflow-rocm
is enough to get Tensorflow running with Python:
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
2020-05-14 12:07:19.590169: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
...
2020-05-14 12:07:19.711478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7444 MB memory) -> physical GPU (device: 0, name: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT], pci bus id: 0000:03:00.0)
3
>>>
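Since the integer addition above says little about floating point support, a slightly broader check might look like the following sketch. It assumes a Tensorflow version that provides tf.config.list_physical_devices (2.1 or later); the function names are made up. A crash inside the native GPU libraries cannot be caught from Python, so the float check will simply abort the interpreter if the backend fails.

```python
def visible_gpus():
    """Return the names of GPUs Tensorflow can see, or None without Tensorflow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # tf.config.list_physical_devices is assumed to exist (Tensorflow 2.1+).
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def float_check():
    """Multiply two small float32 matrices and return the result as lists."""
    import tensorflow as tf

    a = tf.constant([[1.0, 2.0], [3.0, 4.0]], dtype=tf.float32)
    b = tf.constant([[0.5, 0.0], [0.0, 0.5]], dtype=tf.float32)
    return tf.matmul(a, b).numpy().tolist()


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("tensorflow is not installed")
    elif not gpus:
        print("no GPU visible to tensorflow")
    else:
        print("GPUs:", gpus)
        print("float32 matmul:", float_check())
```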
Tensorflow for R
Installation is trivial again since there is a tensorflow package for R. Just run (as a user that is in the group staff, which normally owns /usr/local/lib/R)
$ R
...
> install.packages("tensorflow")
..
Do not call the R function install_tensorflow(), since Tensorflow is already installed and functional!
With that done, R can use the AMD GPU for computations:
$ R
...
> library(tensorflow)
> tf$constant("Hellow Tensorflow")
2020-05-14 12:14:24.185609: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
...
2020-05-14 12:14:24.277736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7444 MB memory) -> physical GPU (device: 0, name: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT], pci bus id: 0000:03:00.0)
tf.Tensor(b'Hellow Tensorflow', shape=(), dtype=string)
>
AMD Vulkan
From the Vulkan home page:
Vulkan is a new generation graphics and compute API that provides high-efficiency, cross-platform access to modern GPUs used in a wide variety of devices from PCs and consoles to mobile phones and embedded platforms.
Several games use the Vulkan API if it is available, and it is said to be more efficient.
There are Vulkan libraries for Radeon shipped with Mesa, in the Debian package mesa-vulkan-drivers, but my guess is that they are a bit outdated.
The AMDVLK project provides the latest version, and to my surprise was rather easy to install, again by following the advice in their README. The steps are basically (always follow what is written for Ubuntu):
- Install the necessary dependencies
- Install the Repo tool
- Get the source code
- Make 64-bit and 32-bit builds
- Copy driver and JSON files (see below for what I did differently!)
All as described in the linked README. Just to make sure, I removed the JSON files /usr/share/vulkan/icd.d/radeon* shipped by Debian’s mesa-vulkan-drivers package.
Finally I deviated a bit by not editing the file /usr/share/X11/xorg.conf.d/10-amdgpu.conf, but instead copying it to /etc/X11/xorg.conf.d/10-amdgpu.conf and adding there the section:
Section "Device"
    Identifier "AMDgpu"
    Option "DRI" "3"
EndSection
To be honest, I did not follow the Copy driver and JSON files step literally, since I don’t want to copy self-made files into system directories under /usr/lib. So what I did is:
- copy the driver files to /opt/amdvlk/lib, so I now have there /opt/amdvlk/lib/i386-linux-gnu/amdvlk32.so and /opt/amdvlk/lib/x86_64-linux-gnu/amdvlk64.so
- adjust the location of the driver file in the two JSON files /etc/vulkan/icd.d/amd_icd32.json and /etc/vulkan/icd.d/amd_icd64.json (which were installed above under Copy driver and JSON files)
- add a file /etc/ld.so.conf.d/amdvlk.conf containing the two lines:
/opt/amdvlk/lib/i386-linux-gnu
/opt/amdvlk/lib/x86_64-linux-gnu
With this in place, I don’t pollute the system directories, and still the new Vulkan driver is available.
But honestly, I don’t really know whether it is used and is working, because I don’t know how to check.
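A partial check is to look at which ICD manifests the Vulkan loader can find and whether the driver files they point to exist. The following is only a sketch: the two directories are an assumption based on the paths used in this post, list_icds is a made-up name, and the result shows what the loader could pick up, not which driver a given game ends up using.

```python
import json
import os
from pathlib import Path

# Directories searched here are an assumption based on the paths in this post.
ICD_DIRS = ["/etc/vulkan/icd.d", "/usr/share/vulkan/icd.d"]


def list_icds(directories=ICD_DIRS):
    """Return (manifest, library_path, library_exists) for each ICD manifest."""
    results = []
    for directory in directories:
        if not Path(directory).is_dir():
            continue
        for manifest in sorted(Path(directory).glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
                library = data["ICD"]["library_path"]
            except (OSError, ValueError, KeyError, TypeError):
                continue  # unreadable or not an ICD manifest
            # Only absolute paths are checked; other entries are reported as None.
            exists = os.path.exists(library) if os.path.isabs(library) else None
            results.append((str(manifest), library, exists))
    return results


if __name__ == "__main__":
    # VK_ICD_FILENAMES, if set, restricts the loader to the listed manifests.
    print("VK_ICD_FILENAMES:", os.environ.get("VK_ICD_FILENAMES", "(not set)"))
    for manifest, library, exists in list_icds():
        print(f"{manifest} -> {library} (exists: {exists})")
```

If only the two amd_icd*.json manifests show up and their library paths exist, the AMDVLK driver is at least the only candidate the loader is offered.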
With all that in place, I can run my usual set of Steam games (The Long Dark, Shadow of the Tomb Raider, The Talos Principle, Supraland, …) and I have not seen any visual problems so far. As a bonus, KDE/Plasma is now running much better, since NVIDIA and KDE have traditionally had some incompatibilities.
The above might sound like a lot of stuff to do, but considering that most of the parts are not really packaged within Debian, and all this is a rather new open source stack, I was surprised that in half a day I got everything working smoothly.
Thanks to all the developers who have worked hard to make this all possible.
Hello, could you please test Tensorflow with basic MNIST training? After installation I get a SIGABRT 156 error while training.
I have an RX 5700 XT, I installed everything according to your guide, and I get an error.
Here’s code:
from __future__ import print_function
import tensorflow.keras as keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import backend as K
batch_size = 128
num_classes = 10
epochs = 12
# input image dimensions
img_rows, img_cols = 28, 28
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Here’s output:
(conda-dl) ferhat@ferhat-desktop:~/py$ python main.py
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
2020-08-04 23:14:53.088581: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libhip_hcc.so
2020-08-04 23:14:53.134386: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1579] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: Device 731f ROCm AMD GPU ISA: gfx1010
coreClock: 2.1GHz coreCount: 20 deviceMemorySize: 7.98GiB deviceMemoryBandwidth: -1B/s
2020-08-04 23:14:53.173682: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-08-04 23:14:53.174508: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2020-08-04 23:14:53.179126: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2020-08-04 23:14:53.179363: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2020-08-04 23:14:53.179476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-04 23:14:53.179753: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
2020-08-04 23:14:53.184194: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3399905000 Hz
2020-08-04 23:14:53.184564: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55abb0110ec0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-04 23:14:53.184578: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-04 23:14:53.186057: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55abb0112a00 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
2020-08-04 23:14:53.186085: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Device 731f, AMDGPU ISA version: gfx1010
2020-08-04 23:14:53.186211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1579] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: Device 731f ROCm AMD GPU ISA: gfx1010
coreClock: 2.1GHz coreCount: 20 deviceMemorySize: 7.98GiB deviceMemoryBandwidth: -1B/s
2020-08-04 23:14:53.186249: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocblas.so
2020-08-04 23:14:53.186273: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libMIOpen.so
2020-08-04 23:14:53.186282: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocfft.so
2020-08-04 23:14:53.186292: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library librocrand.so
2020-08-04 23:14:53.186324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-04 23:14:53.186335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-04 23:14:53.186340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-08-04 23:14:53.186344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-08-04 23:14:53.186412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7384 MB memory) -> physical GPU (device: 0, name: Device 731f, pci bus id: 0000:03:00.0)
Segmentation fault (core dumped)
Indeed, I also get a segfault. That is not good. Are you sure the code runs correctly on NVIDIA/CUDA, i.e. there are no bugs in the code?
It could easily be that the amdgpu Tensorflow part has some bugs.
That is the gdb backtrace, not that I know what to do with it
You can safely remove the part about firmware or kernel configuration. It is unnecessary confusion for less versed users. The firmware and kernel in Debian testing and unstable have everything built in (and have had for more than a year). No need to compile anything. I use the standard kernel from testing. Just recommend using Linux kernel version 5.7 or 5.8 from Debian and it will be safe, but even 5.6 works fine.
Thanks, I will remove (or comment) the firmware part, but since I compile all my kernels I will leave the kernel part in there, but mention that Debian provided kernels are actually fine.
Thanks for the suggestions!
You probably should mention that installing rocm-libs3.8.0 is rather tricky on Debian now, due to llvm-amdgpu3.8.0 depending on libgcc-7-dev. This can be fixed by repackaging this package and removing this dependency, or forcing it by dpkg -i somehow. See https://github.com/RadeonOpenCompute/ROCm/issues/1125
Yes, I have worked around this problem in the same way, but this article was written for rocm 3.3 where it worked out of the box. Unfortunately, since then AMD has made completely unreasonable changes. I will try to mention this, too.
I worked on a similar transition for two days. First I managed VFIO passthrough with the latest Arch Linux kernel and made an RX 6500 XT work in a virtual machine on both Windows and Linux. I played Apex at 2K and 100 FPS with a FreeSync monitor, then installed ROCm 5.0.2, then the AMDVLK Vulkan driver, then the GLFW libraries and Vulkan Kompute.
I struggled for days with a dual ROCm install: CuPy on ROCm and Tensorflow 2.8.0 on ROCm. Although I am not fluent in C++, I dug into the Vulkan API and tried to make Vulkan compute work.
Again I have the feeling there is a gigantic mess. I installed AUR packages and pacman packages, and compiled from source. Every time I get lots of errors and warnings when compiling things using gcc and LLVM. I haven’t learned SPIR-V, but I am in the process of deciding whether it is worth it.
Having the ability to code whatever you want on any GPU and OS platform sounds promising. However, when we have qemu-kvm, ESXi, bandwidth and cloud servers, do we really need to use Vulkan as functional designers and inventor engineers to prototype? Only for mobile (and maybe mobile can get compute from the cloud too, considering 5G speeds).
I can’t really decide. I thought it would be fascinating to write an indirect encoding neuroevolution agent for realtime AI usage (one that learns, adapts and acts). However, I am not sure if I should learn all the optimizations for Vulkan to do this. A fast and fluent start would motivate me a lot. Seeing results from some compiled code would be outstanding. So much struggle just to set things up. I already feel that I am inadequate, even though I have written some CUDA code in C++ before …
It would be really nice to use the Vulkan API for compute easily on Linux …