Glen Pitt-Pladdy :: Blog

Linux RAID (mdadm) Rewriting Bad Blocks
On my Home Lab storage server I have 15TB of storage in RAID6. With large drives there appears to be an increased risk (more space = more opportunity) of developing a bad spot as they wear, and that's exactly what has happened on one drive. RAID6 gives N+2 redundancy so this is not a major problem, but smartmontools warns me about the pending sectors whenever the machine boots and it would be good to force them to reallocate and stop this.

Drives are built with spare capacity so that wearing areas on the drive can be reallocated to the spare area. This typically happens on a write, when the data on the bad spot can't be read. The simple way of triggering this would be to completely rebuild the array, but that takes a very long time so I worked out a trick.

Bitmaps

A useful feature of Linux RAID is the ability to have a bitmap which tracks writes, so that when the array needs to be rebuilt it knows specifically which blocks to resync. This saves an enormous amount of time. You can see if there are bitmaps with:

# cat /proc/mdstat

This gives the bitmap status, like 1/23, which is essentially the number of pages which have bits set out of the total. It also gives the bitmap chunk size, which we will use later.

If the array doesn't have a bitmap then it's easily added with:

# mdadm --grow /dev/mdX --bitmap=internal

After this the bitmap allocation should hopefully fall to 0 (or nearly, given ongoing activity). This enables devices to be removed and re-added to the array, with rebuilds only having to hit the blocks where there are outstanding writes. There are other things you can do, like optimising the bitmap chunk size, but that is beyond what I'm looking at here.

Bad Blocks

I have identified some bad blocks on the wearing device. There are various ways of doing this, including looking in logs, scanning with badblocks, running a long SMART test and checking the error position, and even just looking at the SMART logs. Next we need to do some maths.
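The maths needs the bitmap chunk size reported in /proc/mdstat. As a rough sketch, assuming the bitmap line is shaped like `bitmap: 1/23 pages [4KB], 65536KB chunk` (check your own output, as this format is an assumption here), a small helper can pull the number out. The function name `bitmap_chunk_kib` is made up for illustration.

```shell
# Hypothetical helper (not from the article): print the bitmap chunk size
# in KiB for one array, reading /proc/mdstat-style text on stdin.
# Assumes a bitmap line shaped like:
#   bitmap: 1/23 pages [4KB], 65536KB chunk
# Chunk sizes reported in other units are not handled.
bitmap_chunk_kib() {
    awk -v array="$1" '
        $1 == array && $2 == ":" { in_array = 1; next }  # our array stanza starts
        in_array && NF == 0      { exit }                # blank line ends it
        in_array && $1 == "bitmap:" {
            # look for a field like 65536KB immediately followed by "chunk"
            for (i = 2; i < NF; i++)
                if ($i ~ /^[0-9]+KB$/ && $(i + 1) == "chunk") {
                    sub(/KB$/, "", $i)
                    print $i
                    exit
                }
        }
    '
}
```

It would be used as `bitmap_chunk_kib mdX < /proc/mdstat`, and prints nothing if the array has no bitmap line in the expected shape.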
We need to work out which bitmap chunk (hence bit number) the bad blocks fall in. This is basically dividing the position on the device by the chunk size and rounding down. In my case I've identified that sector (512 byte) 4127647976 is a bad spot, and I have the default 65536k (64M) chunks. This turns into:

4127647976 / 2 = 2063823988k
2063823988 / 65536 = 31491.45...

Rounded down this is chunk 31491. This is the chunk number that we will need to set in the bitmap to force the array to re-write this area on our device.

It's important to keep in mind that this number has to be calculated based on the actual device added to the array. If this is a partition then you need to ensure it's the position of the bad block within that partition that you are using.

Rewriting

Essentially the process I'm following is:

1. Fail and remove the device from the array
2. Set the bitmap bits for the chunks to be re-written
3. Re-add the device to the array
4. Wait for the rebuild to complete
To fail and remove the device from the array I use:

# mdadm --fail /dev/mdX /dev/sdYZ
# mdadm --remove /dev/mdX /dev/sdYZ

Now we need to manually set the bit in the bitmap with:

# echo 31491 >/sys/devices/virtual/block/mdX/md/bitmap_set_bits

If you want to do a range then it's easy enough to do a for loop:

# for (( i=31400; i<31600; i++ )); do echo "$i" >/sys/devices/virtual/block/mdX/md/bitmap_set_bits; done

Then just add the device back:

# mdadm --re-add /dev/mdX /dev/sdYZ

Then wait for the rebuild to complete:

# cat /proc/mdstat

If you check the SMART data there will hopefully now be fewer Pending Sectors and/or Offline Uncorrectable sectors, and possibly more Reallocated Sectors. In my case I have a few groups of bad sectors so need to hit a few areas with this method. Bad sectors have gone from 1168 to 24 after doing this. The remaining sectors are probably in another location that I haven't discovered yet.
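The chunk arithmetic and the loop can be combined into a small dry-run sketch. This is only an illustration under stated assumptions: the function names `sector_to_chunk` and `print_set_bits` are made up, the sector numbers must already be relative to the array member device (as noted earlier), and the shell is assumed to do 64-bit integer arithmetic. It only prints the commands so they can be reviewed before anything is written to sysfs.

```shell
# Hypothetical helpers (not from the article); names are illustrative.

# Convert a 512-byte sector number (relative to the array member device)
# into a bitmap chunk number, given the bitmap chunk size in KiB.
# Two 512-byte sectors make 1 KiB; integer division rounds down.
sector_to_chunk() (
    sector="$1"
    chunk_kib="$2"
    echo $(( sector / 2 / chunk_kib ))
)

# Print (do not run) one echo command per bitmap chunk covering the
# sector range $1..$2, for chunk size $3 (KiB) on array $4 (e.g. mdX).
print_set_bits() (
    first=$(sector_to_chunk "$1" "$3")
    last=$(sector_to_chunk "$2" "$3")
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "echo $i > /sys/devices/virtual/block/$4/md/bitmap_set_bits"
        i=$(( i + 1 ))
    done
)
```

For the sector in this post, `sector_to_chunk 4127647976 65536` gives 31491, matching the hand calculation. `print_set_bits FIRST LAST 65536 mdX` then lists the writes needed for a whole run of bad sectors, which can be pasted into a root shell once the device has been failed and removed.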
This is a bunch of random thoughts, ideas and other nonsense, and is not intended to be taken seriously. I'm experimenting and mostly have no idea what I am doing with most of this, so it should be taken with caution and at your own risk.
Copyright Glen Pitt-Pladdy 2008-2023