Sunday, March 14, 2010

Migrating Linux software raid from 4 device raid5 to 6 device raid6

In a previous post, I discussed migrating from a 2xIDE device mirror to a 2xSATA device mirror.  Since the old arrays were using 160GB drives and I bought 500GB drives, I figured I'd use the leftover space to add a couple more devices to my storage array.

Here's what I'm starting with:
# cat /proc/mdstat
md0 : active raid5 sdd1[1] sdb1[3] sdc1[2] sda1[0]
      937705728 blocks level 5, 128k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/150 pages [0KB], 1024KB chunk

It's a 4x320GB raid5 array.  I'm going to expand the array to include 2 more devices (sde4 and sdf4) and reshape it to a raid6 array at the same time.  I will end up gaining 1 device's worth of space (320GB) and 1 more drive of redundancy: a raid6 array can survive 2 device failures and still function, instead of the 1 failure a raid5 array can survive.
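To double-check the arithmetic, the usable capacities work out like this (a quick shell sketch; the 320GB per-device figure is from the array above):

```shell
# raid5 gives n-1 devices of usable space, raid6 gives n-2
DEV_GB=320
RAID5_DEVS=4
RAID6_DEVS=6
RAID5_CAP=$(( DEV_GB * (RAID5_DEVS - 1) ))   # 4-device raid5
RAID6_CAP=$(( DEV_GB * (RAID6_DEVS - 2) ))   # 6-device raid6
echo "raid5: ${RAID5_CAP} GB usable, raid6: ${RAID6_CAP} GB usable"
```

So the reshape nets one extra device of capacity on top of the extra parity.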

Unfortunately, the current stable hardened kernel is 2.6.28-r9, and reshaping a raid5 to a raid6 requires at least a 2.6.31 kernel.  Additionally, mdadm >=3.1.0 is required and 3.0 is what's currently stable.  The mdadm requirement is reasonably easy to fix:
# echo "=sys-fs/mdadm-3.1.1-r1" >> /etc/portage/package.keywords 
# emerge -av mdadm
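After the emerge, it's worth confirming the installed version actually meets the minimum.  A sketch of a version-at-least check (sort -V does the version-aware comparison; the hard-coded value here stands in for real mdadm --version output):

```shell
# Sketch: check that an mdadm version string meets the 3.1.0 minimum.
# "have" is hard-coded for illustration; on a live box it would be
# parsed out of: mdadm --version 2>&1
need="3.1.0"
have="3.1.1"
if [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -1)" = "$need" ]; then
  echo "mdadm $have satisfies the >=$need requirement"
fi
```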

For the kernel, I installed layman and added the hardened-development overlay (not covered here) and unmasked the minimum required kernel:
# echo "=sys-kernel/hardened-sources-2.6.31-r11" >> /etc/portage/package.keywords
# emerge -av hardened-sources

I'm also not going to cover configuring/building/installing/booting the new kernel.  If you're using Gentoo, you should already know what you're doing in that respect.

After all the prerequisites are taken care of (I created the partitions I'm using here during the array shuffling described in the previous post), we can move forward.

Add the 2 new devices to the array:
# mdadm /dev/md0 --add /dev/sde4 /dev/sdf4

At this point the new devices act as "spares," as shown below (note the (S) next to each device):
# cat /proc/mdstat
md0 : active raid5 sdf4[4](S) sde4[5](S) sdd1[1] sdb1[3] sdc1[2] sda1[0]
      937705728 blocks level 5, 128k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 1/150 pages [4KB], 1024KB chunk
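If you'd rather check for spares in a script than eyeball mdstat, the (S) suffix is easy to grep for.  A sketch against the sample line above (pasted in rather than read live from /proc/mdstat):

```shell
# Sketch: count the spare devices in a sample mdstat line; "(S)" marks a spare
line='md0 : active raid5 sdf4[4](S) sde4[5](S) sdd1[1] sdb1[3] sdc1[2] sda1[0]'
spares=$(echo "$line" | grep -o '(S)' | wc -l)
echo "spare devices: $spares"
```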

Temporarily turn off the write-intent bitmap on the array; this is necessary for the reshape to occur.  I was originally getting an error, and Neil Brown (the mdadm author, http://neil.brown.name) told me I needed to remove the bitmap while reshaping:
# mdadm --grow /dev/md0 --bitmap none

To speed up the sync process we're about to cause, issue the following:
# echo 200000 > /proc/sys/dev/raid/speed_limit_max
# echo 200000 > /proc/sys/dev/raid/speed_limit_min
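The same knobs are also reachable through sysctl.  A dry-run sketch that only prints the commands (drop the echo to actually apply them, as root):

```shell
# Dry run: print the sysctl equivalents of the /proc writes above.
# Remove the surrounding echo/quotes to run them for real as root.
for knob in dev.raid.speed_limit_min dev.raid.speed_limit_max; do
  echo "sysctl -w $knob=200000"
done
```

Remember to restore the old values (the defaults are much lower) once the reshape is done, or leave them until reboot.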

Start the reshape:
# mdadm --grow /dev/md0 --level=6 --raid-devices=6 --backup-file=/root/raid-backup 
mdadm: level of /dev/md0 changed to raid6 
mdadm: Need to backup 1536K of critical section..

Watch the *extremely* slow reshape (you can literally watch it with watch -n 1 cat /proc/mdstat):
# cat /proc/mdstat
md0 : active raid6 sda1[4] sdf4[0] sde4[5] sdb1[3] sdd1[1] sdc1[2]
      937705728 blocks super 0.91 level 6, 128k chunk, algorithm 18 [6/7] [UUUUUU]
      [====>................]  reshape = 22.6% (70662528/312568576) finish=286.1min speed=14088K/sec 
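If you'd rather script the monitoring than stare at watch, the progress figures are easy to scrape out of the status line.  A sketch against the sample output above (the line is pasted in; live, it would come from /proc/mdstat):

```shell
# Sketch: pull the percentage and ETA out of a sample reshape status line
line='[====>................]  reshape = 22.6% (70662528/312568576) finish=286.1min speed=14088K/sec'
pct=$(echo "$line" | sed -n 's/.*reshape = \([0-9.]*\)%.*/\1/p')
eta=$(echo "$line" | sed -n 's/.*finish=\([0-9.]*\)min.*/\1/p')
echo "reshape ${pct}% done, about ${eta} minutes to go"
```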


At this point, mdadm --detail still shows my array at the old size:
Array Size : 937705728 (894.27 GiB 960.21 GB)
Used Dev Size : 312568576 (298.09 GiB 320.07 GB) 


I was curious about this, since I should have gained 320GB: my devices are 320GB, and raid5 capacity is n-1 devices (320x3 = 960GB), while after the reshape raid6 capacity will be n-2 devices (320x4 = 1280GB).  So I ran a test with some loopback devices and confirmed that the array size is correct once the reshape completes.
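For reference, the loopback rehearsal looked roughly like this.  This is a dry-run sketch that just prints the commands (the file names, sizes, and /dev/md9 are illustrative; swap run() to execute "$@" if you want to actually try it, as root, on a scratch box):

```shell
# Dry-run rehearsal of the raid5 -> raid6 reshape on loopback devices.
# run() only echoes; change it to execute "$@" to run for real (as root).
run() { echo "$@"; }

for i in 0 1 2 3 4 5; do
  run dd if=/dev/zero of=/tmp/loop$i.img bs=1M count=64
  run losetup /dev/loop$i /tmp/loop$i.img
done
run mdadm --create /dev/md9 --level=5 --raid-devices=4 \
    /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
run mdadm /dev/md9 --add /dev/loop4 /dev/loop5
run mdadm --grow /dev/md9 --level=6 --raid-devices=6 --backup-file=/tmp/md9-backup
```

On the loopback array, mdadm --detail reported the full n-2 capacity as soon as the (much quicker) reshape finished.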

After the reshape completes, turn the write-intent bitmap back on:
# mdadm --grow /dev/md0 --bitmap internal

As you can see, the array now has the proper 320x4 size (and the superblock version went back to 0.90):
# mdadm -D /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Thu Sep  7 18:41:05 2006
     Raid Level : raid6
     Array Size : 1250274304 (1192.35 GiB 1280.28 GB)
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Mar 14 11:40:59 2010
          State : active
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : fdc29307:ba90c91c:d9adde8d:723321bc
         Events : 0.692377

    Number   Major   Minor   RaidDevice State
       0       8       84        0      active sync   /dev/sdf4
       1       8       49        1      active sync   /dev/sdd1
       2       8       33        2      active sync   /dev/sdc1
       3       8       17        3      active sync   /dev/sdb1
       4       8        1        4      active sync   /dev/sda1
       5       8       68        5      active sync   /dev/sde4 


Since I use LVM to chop up this array, I just need to grow my PV so LVM is aware of the new, larger size of the underlying raid array:
# pvresize /dev/md0

pvdisplay now shows the full size:
  PV Size               1.16 TiB / not usable 2.81 MiB

Similarly, vgdisplay shows the extra space available for allocation:
  Free  PE / Size       86234 / 336.85 GiB
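From here, the free extents can be handed to whichever logical volume needs them.  A dry-run sketch (the VG/LV names are hypothetical, as is the ext filesystem; swap run() to execute "$@" to run for real, as root):

```shell
# Dry run: grow a logical volume into the newly freed extents,
# then grow the filesystem on it. run() only echoes the commands.
run() { echo "$@"; }
run lvextend -l +100%FREE /dev/storage/media
run resize2fs /dev/storage/media
```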

And that's about it.  Big thanks to Neil for the tip on the write-intent bitmap.  The combination of Linux kernel raid and mdadm lets you do some pretty amazing things.  I was able to do both the raid1 migrations and this raid5 -> raid6 extend/reshape while the system was up and running with live filesystems.  That's pretty impressive.
