- Power Supply/Electrical Issues
Suspect: A failing PSU or unstable power delivery can cause abrupt lockups, especially under idle/low-load conditions (common at night).
Test: Swap the PSU with a known-good unit if possible.
Check: Ensure all power cables (motherboard, CPU, GPU) are securely connected. Look for capacitor bulges/leaks on the motherboard.
- Kernel/Driver Issues
Kernel Updates: Ubuntu 24.04’s stock kernel (likely 6.8.x) might have compatibility gaps.
Try upgrading to a mainline kernel (e.g., 6.9 or newer) via Ubuntu Mainline Kernel Installer.
Check for firmware updates (sudo fwupdmgr refresh, then sudo fwupdmgr update).
Proprietary Drivers: Install vendor-specific drivers (e.g., GPU, chipset) instead of generic ones. Use ubuntu-drivers list to see what is available and sudo ubuntu-drivers install to install the recommended drivers.
- Power Management & BIOS
Disable Aggressive Power Saving:
OS: Disable CPU idle states:
bash
sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="/&processor.max_cstate=1 intel_idle.max_cstate=0 /' /etc/default/grub
sudo update-grub
Prevent USB autosuspend (this rule covers USB devices only; SATA link power management is configured separately):
bash
echo 'ACTION=="add", SUBSYSTEM=="usb", TEST=="power/control", ATTR{power/control}="on"' | sudo tee /etc/udev/rules.d/50-usb-power.rules
Update BIOS: Press the Delete key as soon as you power on the PC to enter the BIOS setup, then send us a picture of the Main page.
We will check whether you need a BIOS update.
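After the reboot, it is worth confirming that the C-state limits added to GRUB actually reached the running kernel. The following is a minimal sketch that checks the kernel command line for both parameters. The function name check_cstate_params and the CMDLINE_FILE variable are illustrative, not part of any tool.

```bash
#!/usr/bin/env bash
# Report whether the C-state limits from the GRUB edit reached the running kernel.
# CMDLINE_FILE is a hypothetical override so the check can be tried on a sample file.
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"

check_cstate_params() {
    local param status=0
    for param in processor.max_cstate=1 intel_idle.max_cstate=0; do
        # -F matches the dots literally; -w rejects longer values such as max_cstate=10
        if grep -qwF -- "$param" "$CMDLINE_FILE"; then
            echo "active:  $param"
        else
            echo "missing: $param"
            status=1
        fi
    done
    return "$status"
}
```

Run check_cstate_params after rebooting. If a parameter shows as missing, sudo update-grub probably did not run or the machine has not been restarted since.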
- Overheating/Idle-State Bugs
Monitor Thermals: Use sensors or psensor to log temperatures. Even if stress tests pass, idle-state cooling bugs (e.g., fans stopping prematurely) might cause lockups.
Adjust Fan Curves: Use BIOS or fancontrol (part of lm-sensors) to ensure fans don’t shut off at low temps.
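To see what temperatures looked like just before an overnight lockup, readings can be appended to a file at a fixed interval. This is a minimal sketch that reads the kernel's thermal zones from sysfs (values are in millidegrees Celsius) rather than parsing sensors output. The names log_temps, THERMAL_DIR and TEMP_LOG are illustrative, and some machines expose no thermal zones there at all.

```bash
#!/usr/bin/env bash
# Append one timestamped line per thermal zone to a log file.
# THERMAL_DIR and TEMP_LOG are hypothetical overrides; the defaults suit a normal system.
THERMAL_DIR="${THERMAL_DIR:-/sys/class/thermal}"
TEMP_LOG="${TEMP_LOG:-$HOME/temps.log}"

log_temps() {
    local zone name raw
    for zone in "$THERMAL_DIR"/thermal_zone*; do
        # Skip when the glob matched nothing or the zone is unreadable
        [ -r "$zone/temp" ] || continue
        name=$(cat "$zone/type" 2>/dev/null || echo unknown)
        raw=$(cat "$zone/temp")   # millidegrees Celsius
        printf '%s %s %d C\n' "$(date '+%F %T')" "$name" "$((raw / 1000))" >> "$TEMP_LOG"
    done
}
```

Leave it running overnight with: while true; do log_temps; sleep 60; done. After the next lockup, the last lines of the log show the final readings that made it to disk, though the most recent write may be lost in a hard freeze.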
- Storage Firmware/Configuration
Update Drive Firmware: Even with clean SMART data, firmware bugs can cause hangs. Check manufacturer tools (e.g., Samsung Magician, Intel MAS).
Filesystem Check: After a crash, run fsck on the root filesystem to rule out corruption. Do this from a live USB or recovery mode, since fsck should not be run on a mounted filesystem.
- Scheduled Tasks/Background Processes
Audit Cron/Systemd Timers: Check for tasks running overnight:
bash
systemctl list-timers
crontab -l        # current user's crontab
sudo crontab -l   # root's crontab
Disable Non-Essential Services: Temporarily stop services like snapd, apt-daily, or backups to test.
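One way to pause the apt-daily jobs for a night and bring them back afterwards is sketched below. It assumes the stock Ubuntu timer units apt-daily.timer and apt-daily-upgrade.timer. The function name set_timers and the SYSTEMCTL override are illustrative, and the unit list should be adjusted to whatever systemctl list-timers shows on your machine.

```bash
#!/usr/bin/env bash
# Timer units to pause during the overnight test; adjust to your system.
TIMERS=(apt-daily.timer apt-daily-upgrade.timer)
# Hypothetical override: set SYSTEMCTL="echo systemctl" for a dry run.
SYSTEMCTL="${SYSTEMCTL:-sudo systemctl}"

set_timers() {
    local action="$1"
    case "$action" in
        stop|start) ;;
        *) echo "usage: set_timers stop|start" >&2; return 2 ;;
    esac
    # $SYSTEMCTL is deliberately unquoted so "sudo systemctl" splits into two words
    $SYSTEMCTL "$action" "${TIMERS[@]}"
}
```

Run set_timers stop before leaving the machine overnight and set_timers start afterwards. A stopped timer stays stopped only until it is started again or the system reboots, so nothing is permanently disabled.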
- Advanced Logging
Enable Persistent Journal:
bash
sudo mkdir -p /var/log/journal
sudo systemctl restart systemd-journald
Check Kernel Ring Buffer (post-crash):
bash
sudo dmesg -T | grep -iE 'error|warn|fail|thermal|temperature'
Configure syslog: Use rsyslog or syslog-ng to mirror logs to a remote server for crash analysis.
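Once the machine has been rebooted after a lockup, dmesg only covers the current boot. With the persistent journal enabled, the previous boot can be read back with journalctl instead. A minimal sketch follows. The function name prev_boot_report and the JOURNALCTL override are illustrative, and reading the system journal may require sudo or membership in the adm group.

```bash
#!/usr/bin/env bash
# Hypothetical override: set JOURNALCTL="sudo journalctl" if your user cannot read the system journal.
JOURNALCTL="${JOURNALCTL:-journalctl}"

prev_boot_report() {
    local lines="${1:-100}"
    echo "== last $lines journal lines of the previous boot =="
    # -b -1 selects the boot before the current one; -n limits to the final lines
    $JOURNALCTL --no-pager -b -1 -n "$lines"
    echo "== kernel messages of priority warning or worse =="
    # -k restricts to kernel messages; -p warning keeps warning and more severe levels
    $JOURNALCTL --no-pager -b -1 -k -p warning
}
```

Call prev_boot_report 200 to widen the window. journalctl --list-boots shows which boots the journal still holds, which helps if the lockup was more than one reboot ago.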
- Hardware Isolation
Test Minimal Configuration: Remove non-essential hardware (discrete GPU, extra drives, RAM sticks) to identify conflicts.
Replace Cables/Ports: Faulty SATA/USB cables/ports can cause instability.
- Check for Known Issues
Ubuntu 24.04 Bugs: Search Launchpad or forums for similar lockups (e.g., Ask Ubuntu).
- Advanced Diagnostics
Kernel Crash Dump: Enable kdump to capture crash context:
bash
sudo apt install linux-crashdump
sudo kdump-config show
Metrics Monitoring: Use Netdata or Grafana to record system metrics (CPU, memory, temperatures, disk I/O) leading up to crashes.
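For the kdump step above, a dump can only be captured if a crash kernel is actually loaded, which typically requires a reboot after installing linux-crashdump. The sketch below checks the two usual indicators: a crashkernel= reservation on the kernel command line and the kexec_crash_loaded flag in sysfs. The function name kdump_ready and the two path variables are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical overrides so the check can be tried against sample files.
KEXEC_LOADED_FILE="${KEXEC_LOADED_FILE:-/sys/kernel/kexec_crash_loaded}"
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"

kdump_ready() {
    local status=0
    # Memory for the crash kernel is reserved at boot via crashkernel=
    if grep -q 'crashkernel=' "$CMDLINE_FILE" 2>/dev/null; then
        echo "ok:      crashkernel= reservation is on the kernel command line"
    else
        echo "missing: crashkernel= parameter (a reboot may still be pending)"
        status=1
    fi
    # The kernel reports 1 here once a crash kernel has been loaded
    if [ "$(cat "$KEXEC_LOADED_FILE" 2>/dev/null)" = "1" ]; then
        echo "ok:      crash kernel is loaded"
    else
        echo "missing: crash kernel is not loaded"
        status=1
    fi
    return "$status"
}
```

If both lines report ok, a kernel panic should leave a dump under /var/crash. A hard hardware freeze that never panics the kernel will not produce one, so an empty /var/crash does not rule out a hardware cause.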