Dealing with bad memory ranges on Linux.
My desktop machine has been randomly locking up for the last couple of weeks. I started troubleshooting by refreshing my coolant.
This did not seem to help at all. I then installed memtest86 and tested my memory. I found a failure at a specific address. The failure occurred whether the CPU was overclocked or not.
I started to look for a way to skip this memory range, and I stumbled upon BadRAM. Unfortunately, I could not get this patch to compile on my system (AMD64). I tried various patches and various kernel versions, and always got the following error:
mm/page_alloc.c: In function 'badram_markpages':
mm/page_alloc.c:3982: error: 'mem_map' undeclared (first use in this function)
mm/page_alloc.c:3982: error: (Each undeclared identifier is reported only once
mm/page_alloc.c:3982: error: for each function it appears in.)
mm/page_alloc.c: In function 'badram_setup':
mm/page_alloc.c:4008: error: 'mem_map' undeclared (first use in this function)
make: *** [mm/page_alloc.o] Error 1
make: *** [mm] Error 2
Finally I found out that the kernel already has a memmap boot parameter that can mark ranges of memory as reserved, which masks out holes (broken ranges) so the kernel never uses them. I really don't want to trash a 4 GB DIMM over a single bad byte of memory.
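For reference, the form of the parameter that reserves a range is memmap=nn[KMG]$ss[KMG], where nn is the size and ss is the start address. Below is a minimal Python sketch of turning a failing address from memtest86 into such an argument. The address 0x7a3f1c48 is made up for illustration (the real one is not given above), the function name is hypothetical, and the 4 KiB page size is an assumption that holds for typical x86-64 systems.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual size on x86-64


def memmap_reserve_arg(bad_start, bad_end=None, page_size=PAGE_SIZE):
    """Build a memmap=<size>$<start> argument covering bad_start..bad_end.

    The range is widened to whole pages, since the kernel manages
    memory in page-sized units.
    """
    if bad_end is None:
        bad_end = bad_start
    if bad_end < bad_start:
        raise ValueError("bad_end must not be below bad_start")
    # Round the start down and the end up to page boundaries.
    start = (bad_start // page_size) * page_size
    end = ((bad_end // page_size) + 1) * page_size  # exclusive
    size_kib = (end - start) // 1024
    return f"memmap={size_kib}K${start:#x}"


if __name__ == "__main__":
    # Hypothetical failing address, not the one from my machine.
    print(memmap_reserve_arg(0x7A3F1C48))
```

Reserving a little more than the single failing page is probably wise, since neighbouring cells may be marginal too. Also note that some bootloader config files treat `$` specially, so the character may need escaping there; check what actually reaches the kernel.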
At this point I am not sure how to tell whether the parameter is actually in effect. I guess I will wait and see if my system continues to lock up.
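One partial check that might help is to look at /proc/cmdline, which shows the command line the running kernel was booted with. The sketch below parses any memmap=size$start entries from it and tests whether a given address falls inside one. The helper names and the sample address are hypothetical. This only confirms that the parameter reached the kernel intact (for example, that the `$` was not eaten by the bootloader); the kernel boot log and /proc/iomem are the places to look for whether the range was really marked reserved.

```python
_SUFFIX = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def _parse_size(text):
    """Parse a number like 4K, 64M or 0x7a3f1000 into bytes."""
    text = text.strip()
    mult = 1
    if text and text[-1].upper() in _SUFFIX:
        mult = _SUFFIX[text[-1].upper()]
        text = text[:-1]
    return int(text, 0) * mult  # base 0 accepts decimal and 0x... hex


def reserved_ranges(cmdline):
    """Return (start, end) pairs, end exclusive, for memmap=size$start."""
    ranges = []
    for token in cmdline.split():
        if not token.startswith("memmap="):
            continue
        # Assumption: several regions may be given comma-separated.
        for part in token[len("memmap="):].split(","):
            if "$" not in part:
                continue  # other memmap forms (@, #, !) are ignored here
            size, start = part.split("$", 1)
            begin = _parse_size(start)
            ranges.append((begin, begin + _parse_size(size)))
    return ranges


def is_reserved(addr, cmdline):
    """True if addr lies inside any memmap=...$... range in cmdline."""
    return any(lo <= addr < hi for lo, hi in reserved_ranges(cmdline))


def current_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line, or '' if unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    for lo, hi in reserved_ranges(current_cmdline()):
        print(f"reserved by memmap: {lo:#x}-{hi - 1:#x}")
```

If this prints nothing even though the parameter was added to the bootloader config, the argument was most likely dropped or mangled before it reached the kernel.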