GTR9 Pro 128G RAM Throttled: 72GB/s in 1:4 Mode, Need to Unlock

eddy111

Hi everyone, I recently started using the Beelink GTR9 Pro (Ryzen AI Max+ 395 / Strix Halo) with the top-tier 128GB LPDDR5-8000 configuration for local LLM inference (Qwen 3.5 27B). However, the performance is significantly lower than expected.

[Current Hardware Specs]

CPU/APU: AMD Ryzen AI Max+ 395

RAM: 128GB Micron LPDDR5 @ 8000MT/s

BIOS Version: GTRPR05 / P110

[Core Problem: Bandwidth Bottleneck]
Based on my benchmarks, the BIOS is forcing an extremely conservative mode for the 128GB RAM:

CPU-Z Reading: Uncore Frequency is locked at 998.0 MHz.

Bandwidth Benchmark: winsat mem results show only 72,416 MB/s.

Analysis: The theoretical peak bandwidth for Strix Halo should be around 256 GB/s. My result of 72GB/s indicates the memory controller (UCLK) is stuck in 1:4 Divider Mode, operating at less than 30% efficiency.

AI Inference Impact: Running Qwen 27B Q8, I’m only getting 5-10 t/s. With the full 8000MT/s bandwidth, this should be closer to 20 t/s.

[Attempted Solutions]

Enabled Performance Mode (140W) in BIOS.

Manually allocated 64GB VRAM (UMA).

Used UXTU to push TDP to 140W, but the bandwidth remains hard-locked at 72GB/s.

[Request]
Is there a newer version that addresses the 1:4 divider issue for 128GB RAM? I am looking for a way to manually set UCLK DIV1 MODE to 1:2. Hoping the Beelink team or community experts can provide an optimized firmware to unlock the full potential of this machine!

Beelink CS-Ian Chan

eddy111
Hi ,there

I will need to confirm with my colleague , and get back to you asap.
Thanks for your patience and your understanding.

Korea_ydh

I think there may be a misunderstanding about the reported memory bandwidth on the Beelink GTR9 Pro (Ryzen AI Max+ 395 / Strix Halo) with 128GB LPDDR5-8000.

Based on the screenshots and benchmarks mentioned, the system actually appears to be operating normally rather than being stuck in a “1:4 divider” mode.

CPU-Z DRAM Frequency (₉₉₈ MHz)
CPU-Z reporting around 997–1000 MHz is expected for LPDDR5-8000.

LPDDR5 uses an internal prefetch architecture, so the DRAM frequency displayed in CPU-Z is roughly ⅛ of the effective data rate.

Example calculation:

DRAM Frequency ≈ 998 MHz
Effective data rate ≈ 998 × 8 ≈ 8000 MT/s

So the memory appears to be running exactly at the rated LPDDR5-8000 speed.

Theoretical vs Measured Bandwidth

Strix Halo has a 256-bit LPDDR5 memory interface, which gives a theoretical maximum bandwidth of:

8000 MT/s × 256-bit ÷ 8 = 256 GB/s

However, this figure represents the total SoC memory bandwidth available to the GPU, NPU, and CPU combined.

CPU-based benchmarks such as:

WinSAT

AIDA64 CPU memory tests

CPU-Z

typically measure CPU core → memory bandwidth only, which is limited by the Infinity Fabric and CPU memory ports.

On modern LPDDR systems, measured CPU bandwidth is usually much lower than the theoretical peak.

Typical real-world examples:

Meteor Lake (LPDDR5X-7500)
Theoretical: ₁₂₀ GB/s
CPU benchmark: _60–70 GB/s

Phoenix (LPDDR5-6400)
Theoretical: ₁₀₂ GB/s
CPU benchmark: _55–65 GB/s

Given this, a WinSAT result of ₇₂ GB/s on LPDDR5-8000 is actually within a reasonable range for CPU-side memory bandwidth.

“UCLK 1:4 divider” assumption

The assumption that the system is stuck in a UCLK 1:4 divider mode based on CPU-Z’s “uncore frequency” is likely incorrect.

LPDDR systems do not behave the same way as desktop DDR5 platforms. Much of the memory PHY and clocking logic is integrated inside the SoC, and tools like CPU-Z do not always report those clocks accurately.

Therefore, interpreting the ₉₉₈ MHz value as a divider state is probably misleading.

LLM inference performance

For reference, running Qwen 27B Q8 on CPU only typically produces something like:

_5–10 tokens/s on high-end CPUs

So the reported inference speed is not unusual if the model is running primarily on CPU.

To fully utilize the available memory bandwidth on Strix Halo, inference should ideally use GPU offload (RDNA iGPU) rather than CPU-only execution.

Conclusion

Based on the data shown:

LPDDR5-8000 appears to be running at the correct speed

The ₇₂ GB/s bandwidth result is plausible for CPU-side benchmarks

The “1:4 divider” interpretation is likely a misunderstanding of how LPDDR clocking is reported

It would still be useful to run additional benchmarks (e.g., AIDA64 memory test or y-cruncher) to confirm the full memory performance of the system.

Beelink CS-Ian Chan

Korea_ydh
Hi , there

We appreciate your sharing !