Hello good people,
Hardware:
Setup: SER10 Max + Mate SE
OS: Ubuntu 24.04.4.
The Mate SE is equipped with 2× 4TB Samsung 990 PRO drives in RAID1. The Mate SE is connected to a smartphone charger to rule out power-related problems; no peripherals are connected to it. The SER10 is connected only to power and Ethernet.
BIOS settings:
Fast Boot: Enabled
USB Support: Full Initial
NVMe Support: Enabled
Above 4G Decoding: Enabled
Re-Size BAR Support: Enabled
SR-IOV Support: Disabled
BME DMA Mitigation: Disabled
Hot-Plug Support: Enabled
The issue:
The remaining issue is not the RAID configuration itself. The RAID1 array currently assembles cleanly and both Samsung 990 PRO drives show no SMART media errors or critical warnings.
The problem appears to be an intermittent USB4/PCIe link initialization issue between the Beelink SER10 Max and the Mate SE enclosure. During boot, the NVMe devices behind the USB4/Thunderbolt path sometimes briefly disappear or fail to respond, causing Link Down, Card not present, NVMe I/O errors, and temporary RAID degradation.
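To see how often the link drops, the kernel log of each boot can be filtered for the symptoms listed above. The following is a minimal sketch, assuming Python 3.9+ and a systemd journal. The search patterns are derived from the wording in this post, not from a definitive list of kernel messages, and the function names are hypothetical.

```python
import re
import subprocess

# Patterns based on the symptoms described in this post (assumption: the
# exact kernel wording can differ between kernel versions and drivers).
PATTERNS = [r"link down", r"card not present", r"nvme.*i/o error", r"i/o error.*nvme"]
_MATCHER = re.compile("|".join(PATTERNS), re.IGNORECASE)


def find_link_events(log_text: str) -> list[str]:
    """Return the log lines that mention one of the symptom patterns."""
    return [line for line in log_text.splitlines() if _MATCHER.search(line)]


def kernel_log_current_boot() -> str:
    """Read the kernel messages of the current boot via journalctl."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Running `find_link_events(kernel_log_current_boot())` after each boot and keeping the output would show whether the drops correlate with cold boots, warm reboots, or a particular port.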
After the USB4/PCIe path is re-enumerated, the drives usually return and mdadm can assemble /dev/md0 as healthy [UU]. This points more toward USB4 retimer/link training, PCIe resource allocation, firmware/BIOS behavior, cable/port stability, or enclosure compatibility than toward a defective SSD.
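The temporary degradation can be detected automatically by reading `/proc/mdstat` shortly after boot. This is a rough sketch, assuming Python 3.9+ and array names of the form `md<number>`. The function name `degraded_arrays` is made up for illustration.

```python
import re


def degraded_arrays(mdstat_text: str) -> dict[str, str]:
    """Return {array name: status string} for arrays with a missing member.

    The status string is the bracketed member map from /proc/mdstat,
    e.g. 'UU' for a healthy two-disk mirror or 'U_' when one disk is missing.
    """
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(1):
                result[current] = status.group(1)
            current = None  # only the first status line belongs to the array
    return result
```

For example, `degraded_arrays(open("/proc/mdstat").read())` returns an empty dict while the array is healthy, so a non-empty result right after boot would confirm that a drive was missing at assembly time.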
A full sequential read of the complete mdadm RAID1 array completed successfully:
4.0 TB read in 21 minutes at 3.1 GB/s. No new dmesg errors appeared during or after the test, and the array remained healthy as [2/2] [UU].
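As a quick sanity check of those figures, the average throughput can be recomputed from the size and duration. This sketch assumes decimal units (1 TB = 10^12 bytes) and that the duration was exactly 21 minutes, which is a rounded value.

```python
def throughput_gb_per_s(bytes_read: float, seconds: float) -> float:
    """Average throughput in decimal gigabytes per second."""
    return bytes_read / seconds / 1e9


# 4.0 TB read in 21 minutes, as reported above.
rate = throughput_gb_per_s(4.0e12, 21 * 60)
print(f"{rate:.2f} GB/s")  # prints "3.17 GB/s"
```

The result is close to the reported 3.1 GB/s; the small gap is within what rounding the duration to whole minutes would explain.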
What can I do? :(
Diagnostic info:
===== SYSTEM =====
Static hostname: homelab
Icon name: computer-desktop
Machine ID: b777be9993b042fc90a382e1bc0da262
Boot ID: 49addb8a239846d589b0bb423cd06b17
Operating System: Ubuntu 24.04.4 LTS
Kernel: Linux 6.17.0-22-generic
Architecture: x86-64
Hardware Vendor: AZW
Hardware Model: SER
Firmware Version: GPT.4xx.SERM1.V102.P8C0M0C15.09.BL
Firmware Date: Tue 2026-01-20
Firmware Age: 3month 6d
===== KERNEL CMDLINE =====
BOOT_IMAGE=/vmlinuz-6.17.0-22-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro ipv6.disable=1 quiet splash amdgpu.dc=1 nvme_core.default_ps_max_latency_us=0 pci=realloc iommu=pt vt.handoff=7
===== RAID STATUS =====
Personalities : [raid0] [raid1] [raid4] [raid5] [raid6] [raid10] [linear]
md0 : active raid1 nvme1n1p1[2] nvme0n1p1[0]
3906884608 blocks super 1.2 [2/2] [UU]
bitmap: 0/30 pages [0KB], 65536KB chunk
unused devices: <none>
/dev/md0:
Version : 1.2
Creation Time : Sat Apr 25 19:15:41 2026
Raid Level : raid1
Array Size : 3906884608 (3.64 TiB 4.00 TB)
Used Dev Size : 3906884608 (3.64 TiB 4.00 TB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Apr 27 15:19:57 2026
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
Name : homelab:0 (local to host homelab)
UUID : 12a4fcae:e64e947c:733d0481:01b9c535
Events : 1271
Number Major Minor RaidDevice State
0 259 12 0 active sync /dev/nvme0n1p1
2 259 11 1 active sync /dev/nvme1n1p1
===== BLOCK DEVICES =====
NAME SIZE FSTYPE TYPE MOUNTPOINTS MODEL SERIAL
sda 0B disk MassStorageClass 000000002961
nvme2n1 931,5G disk CT1000P310SSD8 2527528B2DB7
├─nvme2n1p1 200M vfat part
├─nvme2n1p2 16M part
├─nvme2n1p3 930,5G ntfs part
└─nvme2n1p4 810M ntfs part
nvme1n1 3,6T disk Samsung SSD 990 PRO 4TB S7DPNU0YA28520P
└─nvme1n1p1 3,6T linux_raid_member part
└─md0 3,6T ext4 raid1
nvme3n1 1,8T disk WD_BLACK SN850X 2000GB 2448B24A4H13
├─nvme3n1p1 512M vfat part /boot/efi
├─nvme3n1p2 5,5G ext4 part /boot
└─nvme3n1p3 1,8T LVM2_member part
└─ubuntu--vg-ubuntu--lv 1,8T ext4 lvm /var/snap/firefox/common/host-hunspell
/
nvme0n1 3,6T disk Samsung SSD 990 PRO 4TB S7DPNU0YA28500M
└─nvme0n1p1 3,6T linux_raid_member part
└─md0 3,6T ext4 raid1
===== USB4 / THUNDERBOLT =====
● AZW Mate SE
├─ type: peripheral
├─ name: Mate SE
├─ vendor: AZW
├─ uuid: c4148780-0071-18b1-ffff-ffffffffffff
├─ generation: USB4
├─ status: authorized
│ ├─ domain: 76923804-406c-a390-ffff-ffffffffffff
│ ├─ rx speed: 40 Gb/s = 2 lanes * 20 Gb/s
│ ├─ tx speed: 40 Gb/s = 2 lanes * 20 Gb/s
│ └─ authflags: none
├─ authorized: Mo 27 Apr 2026 13:33:44 UTC
├─ connected: Mo 27 Apr 2026 13:33:41 UTC
└─ stored: Sa 25 Apr 2026 16:21:30 UTC
├─ policy: iommu
└─ key: no
===== NVME LIST =====
Node Generic SN Model Namespace Usage Format FW Rev
/dev/nvme0n1 /dev/ng0n1 S7DPNU0YA28500M Samsung SSD 990 PRO 4TB 0x1 810.84 GB / 4.00 TB 512 B + 0 B 4B2QJXD7
/dev/nvme1n1 /dev/ng1n1 S7DPNU0YA28520P Samsung SSD 990 PRO 4TB 0x1 4.00 TB / 4.00 TB 512 B + 0 B 4B2QJXD7
/dev/nvme2n1 /dev/ng2n1 2527528B2DB7 CT1000P310SSD8 0x1 1.00 TB / 1.00 TB 512 B + 0 B V9CR001
/dev/nvme3n1 /dev/ng3n1 2448B24A4H13 WD_BLACK SN850X 2000GB 0x1 2.00 TB / 2.00 TB 512 B + 0 B 620361WD
===== NVME SMART SUMMARY =====
--- /dev/nvme0 ---
critical_warning : 0
temperature : 51 °C (324 K)
available_spare : 100%
available_spare_threshold : 10%
percentage_used : 0%
host_read_commands : 113791701
host_write_commands : 6878162
controller_busy_time : 97
power_cycles : 9
power_on_hours : 6
unsafe_shutdowns : 7
media_errors : 0
num_err_log_entries : 0
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1 : 51 °C (324 K)
Temperature Sensor 2 : 62 °C (335 K)
--- /dev/nvme1 ---
critical_warning : 0
temperature : 49 °C (322 K)
available_spare : 100%
available_spare_threshold : 10%
percentage_used : 0%
host_read_commands : 1666658
host_write_commands : 76041468
controller_busy_time : 96
power_cycles : 13
power_on_hours : 6
unsafe_shutdowns : 11
media_errors : 0
num_err_log_entries : 0
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1 : 49 °C (322 K)
Temperature Sensor 2 : 55 °C (328 K)
--- /dev/nvme2 ---
critical_warning : 0
temperature : 33 °C (306 K)
available_spare : 100%
available_spare_threshold : 5%
percentage_used : 0%
host_read_commands : 2610374
host_write_commands : 3452271
controller_busy_time : 13
power_cycles : 33
power_on_hours : 9
unsafe_shutdowns : 20
media_errors : 0
num_err_log_entries : 0
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1 : 33 °C (306 K)
--- /dev/nvme3 ---
critical_warning : 0
temperature : 34 °C (307 K)
available_spare : 100%
available_spare_threshold : 10%
percentage_used : 0%
host_read_commands : 830612473
host_write_commands : 201659786
controller_busy_time : 1379
power_cycles : 50
power_on_hours : 9840
unsafe_shutdowns : 39
media_errors : 0
num_err_log_entries : 0
Warning Temperature Time : 1
Critical Composite Temperature Time : 0
===== RELEVANT DMESG =====