The UL Procyon AI benchmarks are designed to cover a range of real-world workloads: classic computer vision tasks, image generation with Stable Diffusion, and text generation with large language models. The particular value of this suite lies in the fact that identical models are executed on different inference stacks, which makes it clear whether performance advantages come from optimized runtimes, from the hardware itself, or simply from the available VRAM. That VRAM limit is reached quickly in the memory-intensive benchmarks, especially on cards with only 8 GB, so that measurements are either severely constrained or cannot be carried out at all.
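Why 8 GB cards fail in these tests can be estimated on the back of an envelope: in FP16, every parameter occupies two bytes, and the weights alone of a 7B-class model already exceed 8 GB before activations, KV cache and runtime buffers are added. The following sketch uses approximate, publicly cited parameter counts for the benchmark models; the figures are illustrative, not part of the Procyon suite itself.

```python
# Back-of-the-envelope VRAM estimate for FP16 model weights.
# Parameter counts are approximate; weights alone understate real
# usage, since activations, KV cache and runtime buffers come on top.

def fp16_weight_gb(params_billion: float) -> float:
    """Weights only: 2 bytes per parameter in FP16."""
    return params_billion * 1e9 * 2 / 1024**3

models = {
    "Stable Diffusion 1.5 (~1.1B params total)": 1.1,
    "Phi 3.5 mini (~3.8B params)": 3.8,
    "LLaMA 2 7B (~6.7B params)": 6.7,
    "Mistral 7B (~7.2B params)": 7.2,
}

for name, billions in models.items():
    gb = fp16_weight_gb(billions)
    verdict = "fits" if gb < 8 else "does not fit"
    print(f"{name}: ~{gb:.1f} GB weights -> {verdict} in 8 GB VRAM")
```

The 7B-class language models land at roughly 12 to 13 GB of weights alone, which matches the aborted runs on the 8 GB cards, while the 16 GB Arc Pro B50 has headroom.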
Interfaces and implementations
Windows ML is a generic inference API integrated into Windows. It uses DirectML and distributes operators to the CPU and GPU. The results are stable and manufacturer-independent, but in benchmarks they usually lag behind specialized runtimes. For example, the Intel Arc Pro B50 with Windows ML achieved 527 points in the Vision test, while the RTX A1000 fell behind with 311 points and the W7500 with 238 points. Especially with memory-hungry models such as LLaMA 2, bottlenecks quickly occur with 8 GB cards, making a run impossible.
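The vendor-independence of Windows ML comes from the same mechanism ONNX Runtime exposes: the caller states a preference-ordered list of execution providers and the runtime falls back to whatever the machine actually offers, with the CPU as last resort. A minimal, purely illustrative sketch of that selection logic, with hard-coded availability (the helper `pick_providers` is hypothetical; only the provider names follow onnxruntime conventions):

```python
# Hypothetical sketch of execution-provider selection in the style of
# ONNX Runtime / Windows ML: keep the caller's preference order, drop
# anything the host does not offer, and guarantee a CPU fallback.
# The availability lists here are hard-coded for illustration only.

def pick_providers(preferred: list[str], available: list[str]) -> list[str]:
    """Filter a preference-ordered provider list against availability."""
    chosen = [p for p in preferred if p in available]
    # CPU is the guaranteed last-resort fallback.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# A machine with a DirectML-capable GPU keeps the fast path:
print(pick_providers(
    ["DmlExecutionProvider", "CPUExecutionProvider"],
    ["DmlExecutionProvider", "CPUExecutionProvider"],
))

# Without DirectML support, the same request silently falls back to CPU:
print(pick_providers(
    ["DmlExecutionProvider", "CPUExecutionProvider"],
    ["CPUExecutionProvider"],
))
```

This silent fallback is also why generic stacks produce stable but rarely leading scores: correctness is preserved on every machine, but the fastest vendor-specific kernels are not guaranteed.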
TensorRT is NVIDIA’s engine for optimized inference, which fuses operators and uses GPU memory efficiently. In many cases it achieves the best results on RTX cards, as long as enough VRAM is available. In the Stable Diffusion test, the RTX A1000 scored 564 points with TensorRT – well ahead of its 174 points with ONNX Olive, which suffered from memory bottlenecks. With large language models such as LLaMA 2, however, the memory was no longer sufficient and the test was aborted.
OpenVINO is Intel’s inference stack, which optimizes models via the Model Optimizer and distributes the work across XMX units, CPU and GPU. The Arc Pro B50 regularly achieved the best results in the benchmarks: 609 points in Computer Vision FP32, 757 points in Stable Diffusion FP16, and top scores in text generation with Phi 3.5 (2589 points), Mistral 7B (2479 points), LLaMA 3.1 (2446 points) and LLaMA 2 (2402 points). The decisive factor was not only the optimization but also the larger 16 GB of VRAM, which made the difference with complex language models.
ONNX Olive optimizes models within the ONNX Runtime and is vendor-neutral. On the Arc Pro B50, Olive achieved solid results, such as 547 points in Stable Diffusion and 1768 points with Phi 3.5. Compared to OpenVINO, however, a visible gap remained because Olive depends more heavily on generic kernels. On the 8 GB cards from AMD and NVIDIA, memory limited the results: the W7500 achieved only 467 points in Stable Diffusion and the RTX A1000 174 points, while more complex language models such as LLaMA 2 could no longer be run at all on the NVIDIA card. The AMD-optimized ONNX runtime specifically targets RDNA GPUs, but it too remains dependent on memory.
AI Computer Vision FP32
In general AI acceleration based on FP32, the cards show clear differences in the interpretation and compute pipeline, which are closely tied to the respective framework support. The two Radeon models RX 9070XT and AI R9700 use Windows ML very efficiently and therefore take the lead. NVIDIA benefits from TensorRT but falls behind AMD, while Intel delivers solid but not leading values with OpenVINO and limited VRAM. The early failure of the RTX A1000 is striking: it drops out before the actual compute pipeline even starts, due to insufficient memory.
Stable Diffusion 1.5 (FP16)
The FP16 calculations benefit greatly from the optimized ONNX implementations of the Radeon cards, which achieve a clear lead here in all variants. AMD’s efficient implementation of the diffusion pipelines via the AMD-optimized ONNX runtime scales better than TensorRT on NVIDIA and even more clearly than OpenVINO on Intel. The Intel Arc Pro B60 and B50 nevertheless remain respectably in the race, but are increasingly limited by bandwidth and internal memory latencies as model size grows.
Text generation Phi 3.5
The autoregressive large-language-model tasks paint a distinctly different picture, as Intel shows its greatest strength with OpenVINO. The Arc Pro B60 and B50 take the lead by a wide margin. NVIDIA benefits less clearly from TensorRT in this scenario and falls behind AMD, since pure matrix multiplications and token throughput matter more here. AMD performs stably in this test, but does not reach the throughput characteristics of the Intel implementation.
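Why token throughput rather than raw compute decides these rankings can be sketched with a simple roofline-style bound: every generated token must stream the full weight set from VRAM once, so tokens per second are capped by memory bandwidth divided by model size. The bandwidth figures below are illustrative placeholders, not measured specifications of the tested cards:

```python
# Rough roofline-style upper bound for autoregressive decoding:
# each token reads all FP16 weights once, so throughput is capped by
# memory bandwidth / weight bytes. Bandwidth values are illustrative.

def max_tokens_per_s(params_billion: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode throughput for an FP16 model."""
    weight_gb = params_billion * 2  # 2 bytes per parameter, 1 GB = 1e9 B
    return bandwidth_gb_s / weight_gb

for bw in (200.0, 400.0):
    print(f"7B FP16 model at {bw:.0f} GB/s: "
          f"<= {max_tokens_per_s(7.0, bw):.0f} tokens/s")
```

Doubling the bandwidth doubles the ceiling, while adding compute units beyond that point changes nothing – which is consistent with runtimes that keep weights resident and stream them efficiently pulling ahead in this discipline.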
Text generation Mistral 7B
The Mistral 7B model places higher demands on memory organization and parallel matrix processing, which again favors the Arc cards. Intel clearly takes the lead and holds that position by a wide margin. AMD follows with decent performance, while NVIDIA slips to the back of the pack in TensorRT configurations and is at times slowed down by VRAM limits or memory accesses. The very early failure of the RTX A1000, which cannot complete model runs due to its low memory, is particularly noticeable.
Text generation LLaMA 3.1
LLaMA 3.1 once again confirms the leading efficiency of the Intel implementation. The Arc Pro B60 takes a clear top position as the fastest card, followed by the Arc Pro B50. AMD’s cards follow with consistent but noticeably lower scores. NVIDIA is positioned behind both manufacturers, as TensorRT cannot scale as well with this model as it does with CNN-heavy or diffusion workloads. The RTX A1000 and other memory-limited models fall clearly behind in this test.
Text generation LLaMA 2
The order remains largely identical with LLaMA 2, as the requirements and pipeline structure are similarly distributed. Intel once again leads by a wide margin, AMD sits in the solid midfield, and NVIDIA remains in the second half of the field despite TensorRT optimizations. The memory pressure of this model once again pushes the small AMD variants to their limits, so they cannot deliver a full result.
Interim conclusion
The results so far show very clearly how strongly the performance characteristics of the tested GPUs differ in AI workloads, and how decisive the respective software and framework optimization is. While AMD takes a leading position in FP16-based diffusion models and classic image-generation tasks, the balance of power in autoregressive language models shifts considerably in Intel’s favor. The Arc Pro models benefit in particular from OpenVINO, which provides a very efficient execution chain in these scenarios and produces unexpectedly clear dominance in Phi 3.5 as well as in Mistral 7B and the LLaMA variants. NVIDIA cannot exploit its strength in the AI ecosystem to the same extent here, as TensorRT is particularly convincing in inference-optimized CNN architectures, but is more constrained by memory bandwidth, VRAM capacity and internal scheduling paths in larger autoregressive models.
The performance profile therefore cannot be evaluated monolithically. Each manufacturer shows clear strengths in different areas, but these do not transfer equally to other model classes. AMD impresses with a robust and broad-based FP16 pipeline, Intel demonstrates exceptionally high efficiency in token-based text models, and NVIDIA benefits from TensorRT in selected configurations but remains visibly limited by the VRAM configuration of the smaller professional models. Overall, a very differentiated yet technically consistent picture emerges, which underlines the importance of software stacks and model optimizations in professional AI applications.
- 1 - Introduction and technical data
- 2 - Test system and equipment
- 3 - Autodesk AutoCAD
- 4 - Autodesk Inventor Pro
- 5 - PTC Creo
- 6 - Dassault Systèmes Solidworks
- 7 - Autodesk Maya
- 8 - SPECviewperf 15 (2025)
- 9 - Adobe Photoshop 26.10
- 10 - Adobe After Effects 2025
- 11 - Adobe Premiere Pro 25.41
- 12 - AI Benchmarks (AI Vision, Image, Text)
- 13 - Rendering
- 14 - Temperatures, clock rate, fans, noise and power draw
- 15 - Summary and conclusion