-
:hacker_f::hacker_s::hacker_e:
It's up again. Didn't reboot the box over the weekend to try out the new kernel (the one with memory hotplug, to disable the bad stick in software) because dog. Will do it this weekend, or just drive down there, shut it down, and pop the bad stick.
-
Okay, are we alive?
If we stay alive, it is because the shit I guessed was correct. Got I/O errors attempting to use memory hotplug. The memory failure subsystem (/usr/src/linux-*/Documentation/ABI/testing/sysfs-memory-page-offline) worked, but I'm not entirely certain I got the addresses right; often, mcelog and dmesg and /sys have different opinions on what an address is. But no MCEs, no problems, everything seems fine, and the VM hasn't had an aneurysm yet, so I guess it's cool.
I look forward to forgetting what I typed that fixed it by the next time something retarded happens and the machine reboots.
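A note for next time, as a rough sketch of the address bookkeeping rather than a record of what was actually typed. It assumes 4 KiB pages, and the PFN and the 128 MiB block size in the demo are made-up illustration values (the real block size comes from /sys/devices/system/memory/block_size_bytes). As far as I can tell from the ABI doc, soft_offline_page and hard_offline_page take a physical address, while the memoryN directories used by hotplug are indexed by physical address divided by block size, so one bad page can show up as three different numbers.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages (the x86-64 default)


def pfn_to_phys(pfn: int) -> int:
    """Page frame number -> physical address of the start of that page."""
    return pfn << PAGE_SHIFT


def phys_to_pfn(phys: int) -> int:
    """Physical address -> page frame number containing it."""
    return phys >> PAGE_SHIFT


def memory_block_index(phys: int, block_size_bytes: int) -> int:
    """Index N of the /sys/devices/system/memory/memoryN block holding phys."""
    return phys // block_size_bytes


def soft_offline_value(phys: int) -> str:
    """Page-aligned hex string, in the form one would write to soft_offline_page."""
    return hex(phys & ~((1 << PAGE_SHIFT) - 1))


if __name__ == "__main__":
    pfn = 0x12345              # hypothetical PFN, for illustration only
    block_size = 0x8000000     # hypothetical 128 MiB memory block size
    phys = pfn_to_phys(pfn)
    print("physical address:", hex(phys))
    print("memory block:", memory_block_index(phys, block_size))
    print("value for soft_offline_page:", soft_offline_value(phys))
```

Nothing here touches /sys; it only does the arithmetic, so it is safe to run anywhere before deciding what to echo into which file as root.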
-
Gonna just pop the stick in bank 18 ("B7") over the weekend. Gonna be a weird week for FSE. :bofh:
-
Then it bounces twice. Working right now, will probably boot the machine to try the new kernel anyway if it's gonna be this much of a pain in the ass.
-
Basically when it crashed, it said "Memory failure: 0xfca3a4: already hardware poisoned". I don't know what would make the kernel hand that memory to a process if it's already poisoned.
Anyway, the reasonable idea would be to just pop the bad stick instead of trying to hack around it in software.
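For cross-checking the crash message quoted above, a small sketch that pulls "Memory failure" lines out of dmesg output and groups them by address. It assumes the number in that message is a page frame number rather than a byte address (that is my reading of the kernel's memory-failure code, not something confirmed in this thread) and that pages are 4 KiB. The function name and regex are my own, not part of any tool.

```python
import re
from collections import defaultdict

PAGE_SHIFT = 12  # assumption: 4 KiB pages

# Matches e.g. "Memory failure: 0xfca3a4: already hardware poisoned"
_MEMORY_FAILURE_RE = re.compile(r"Memory failure: (0x[0-9a-fA-F]+): (.*)")


def parse_memory_failures(lines):
    """Return {pfn: [message, ...]} for every memory-failure line found."""
    events = defaultdict(list)
    for line in lines:
        match = _MEMORY_FAILURE_RE.search(line)
        if match:
            pfn = int(match.group(1), 16)
            events[pfn].append(match.group(2).strip())
    return dict(events)


if __name__ == "__main__":
    # Message text is the one from the crash; real input would be dmesg output.
    log = ["Memory failure: 0xfca3a4: already hardware poisoned"]
    for pfn, messages in parse_memory_failures(log).items():
        # Under the PFN assumption, the byte address is the PFN shifted left.
        print(hex(pfn), "->", hex(pfn << PAGE_SHIFT), messages)
```

If the PFN assumption holds, 0xfca3a4 corresponds to physical address 0xfca3a4000, which is the number to compare against whatever mcelog reported and whatever was written to the offline files.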