I finally replaced my old ThinkPad E480 with a new HP ProBook 440 G11. The laptop comes with an integrated Intel Arc GPU, and as someone who works in machine learning and AI, I immediately wanted to know whether I could run AI workloads on it.
I daily-drive Fedora 41, and I performed all experiments on this distro.
Intel OneAPI
OneAPI is Intel’s version of CUDA. It lets you run compute-intensive tasks on various types of Intel hardware: dedicated and integrated GPUs, FPGAs, or NPUs. To make OneAPI work on Fedora, you need to install a few packages:
sudo dnf install intel-opencl intel-compute-runtime oneapi-level-zero
Officially, OneAPI only supports Ubuntu, but I had no issues installing it on Fedora.
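If you want a quick sanity check that the Level Zero runtime is actually in place, you can try to load its shared library from Python. This is just a sketch and assumes the loader is installed under its usual soname, libze_loader.so.1:

python -c "import ctypes; ctypes.CDLL('libze_loader.so.1'); print('Level Zero loader found')"

If the command prints the message instead of raising an OSError, the loader is installed and PyTorch’s XPU backend should be able to find it.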
PyTorch on Intel GPUs
Another piece of good news is that PyTorch recently added support for Intel hardware. Just as with CUDA, you only need to point pip at the proper index:
python -m venv intelgpu
source intelgpu/bin/activate
pip install torch==2.6.0+xpu --index-url https://download.pytorch.org/whl/xpu
PyTorch uses the name xpu for Intel’s GPUs. After setting up our environment, we can run a simple Python command to check that an xpu device is available:
python -c "import torch; print(torch.xpu.is_available())"
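If that prints True, you can poke at the device a bit more. Here is a small sketch; the device index 0 and the matrix multiplication are arbitrary choices of mine, not anything PyTorch requires:

import torch

# Enumerate the Intel GPUs that PyTorch can see.
for i in range(torch.xpu.device_count()):
    print(i, torch.xpu.get_device_name(i))

# Run a tiny computation on the integrated GPU to confirm it actually works.
x = torch.randn(1024, 1024, device="xpu")
y = x @ x
print(y.device)  # should print something like xpu:0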
To evaluate the performance offered by the integrated GPU, we can create a simple neural network in Keras for classifying the MNIST digits. We install Keras as a standard Python package:
pip install keras
Next, we implement a simple Python script for training the classifier, inspired by the official example:
import os

# Use PyTorch (with its XPU support) as the Keras backend.
os.environ["KERAS_BACKEND"] = "torch"

import numpy as np
import keras
from keras import layers

num_classes = 10
input_shape = (28, 28, 1)

# Load the MNIST digits and scale pixel values to the [0, 1] range.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

# Add a channel dimension so the images have shape (28, 28, 1).
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

# Convert the integer labels to one-hot vectors.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# A small convolutional network: two conv/pooling blocks followed by a softmax classifier.
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

batch_size = 128
epochs = 15

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
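The script above only trains the model. If you also want a test-set number, a standard model.evaluate call at the end does the job; this is plain Keras, nothing specific to the XPU backend:

score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])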
Results
With Intel GPU support enabled, one training step took about 12 milliseconds and one training epoch took about 5 seconds.
On the CPU, one training step took 18-24 milliseconds on average and one training epoch took 7-10 seconds. To get these results, I created a second Python environment with a CPU-only build of PyTorch.
So, the integrated GPU provides a 50-100 percent speed-up. This is a far cry from a dedicated NVIDIA GPU, but if you need to run a quick experiment on a laptop while on the road, why not take advantage of the extra boost?
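If you want to reproduce this kind of comparison yourself, the simplest approach is to time the epochs directly. Below is a small sketch using a custom Keras callback; the EpochTimer class and the use of time.perf_counter are my own choices, not part of the original script:

import time
import keras

class EpochTimer(keras.callbacks.Callback):
    """Record how long each training epoch takes."""

    def on_train_begin(self, logs=None):
        self.epoch_times = []

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_times.append(time.perf_counter() - self._start)

timer = EpochTimer()
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
          validation_split=0.1, callbacks=[timer])
print("Seconds per epoch:", [round(t, 2) for t in timer.epoch_times])

Run the same script once in the XPU environment and once in the CPU-only environment and compare the printed epoch times.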
Caveats
I set the PyTorch version to 2.6.0 because training failed on the latest version, 2.7.0: the .fit() method raised a data corruption error. I am not sure whether the issue is caused by Keras or by PyTorch.
What about the NPU? The 155H CPU has one, but as far as I can tell, PyTorch doesn’t support it yet. I tried to get it working with Intel’s OpenVINO library, but I didn’t succeed. That said, NPUs are more about power efficiency than raw computing power; even the integrated GPU offers more compute than the NPU, so not being able to use it is not a big loss.
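If you want to see whether OpenVINO at least detects the NPU on your machine, listing the available inference devices is a reasonable first step. A minimal sketch, assuming the openvino package is installed from PyPI:

from openvino import Core

# List the inference devices OpenVINO can see; the NPU should show up as "NPU".
core = Core()
print(core.available_devices)

If "NPU" appears in that list, the device plugin is at least installed; actually getting a model to run on it is a separate battle.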