Saturday, April 23, 2016

Simplifying hard drive layout

Whew, it's been a while since I've posted here... Gotta love Gentoo, I'm still running my original install from sometime back in 2006 (my oldest raid superblock says the array was created "Thu Sep  7 18:41:05 2006").  The Gentoo install is probably older than that because I know I didn't start out using kernel raid at all.  So I've gone from a single drive, to a 2 drive mirror, to adding a 4 disk raid5, to swapping the original mirror drives for bigger ones and building a couple of Frankenstein arrays out of those 2 disks: partly mirrored, and partly added to the original 4 disk raid5, which was converted to raid6, giving me double redundancy and one more 320GB slice of capacity.  It is this mess that is the topic of today's post.  I realized I don't need that much storage anymore, some of the drives in my system are nearly 10 years old and way out of warranty, and I wanted a simpler setup that uses less electricity.  So I did some research, bought 2 new 1TB drives, and now we're going to migrate everything onto a simple mirror of those 2 drives.

Sidebar: I was originally looking into getting NAS drives, but after a bit of research I decided on WD Black drives instead.  For a raid5/6 array you want NAS drives, but for raid 0/1 you don't want NAS drives that support TLER.

Important Note: Before we do anything, we make sure we have backups of our important data, right?  And we all know that raid isn't a backup, right?  Moving on...

OK, so to recap, this is the current situation:

erma ~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Thu Sep  7 18:41:05 2006
     Raid Level : raid6
     Array Size : 1250274304 (1192.35 GiB 1280.28 GB)
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
   Raid Devices : 6
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri Apr 22 16:05:07 2016
          State : clean, degraded
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : fdc29307:ba90c91c:d9adde8d:723321bc
         Events : 0.2437894

    Number   Major   Minor   RaidDevice State
       0       8       68        0      active sync   /dev/sde4
       1       0        0        1      removed
       2       8       17        2      active sync   /dev/sdb1
       3       8       33        3      active sync   /dev/sdc1
       4       8        1        4      active sync   /dev/sda1
       5       8       84        5      active sync   /dev/sdf4
erma ~ # mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Sat Nov 23 10:12:58 2013
     Raid Level : raid1
     Array Size : 39040 (38.13 MiB 39.98 MB)
  Used Dev Size : 39040 (38.13 MiB 39.98 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri Apr 22 15:37:36 2016
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 3979d926:51bc94d9:cb201669:f728008a
         Events : 0.361

    Number   Major   Minor   RaidDevice State
       0       8       65        0      active sync   /dev/sde1
       1       8       81        1      active sync   /dev/sdf1
erma ~ # mdadm --detail /dev/md3
/dev/md3:
        Version : 0.90
  Creation Time : Sun Mar 18 17:18:46 2007
     Raid Level : raid1
     Array Size : 155244032 (148.05 GiB 158.97 GB)
  Used Dev Size : 155244032 (148.05 GiB 158.97 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 3
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Fri Apr 22 16:14:53 2016
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 950ef5a7:b7b41171:b9e86deb:7164bf0e
         Events : 0.2168580

    Number   Major   Minor   RaidDevice State
       0       8       67        0      active sync   /dev/sde3
       1       8       83        1      active sync   /dev/sdf3


So we have my bulk data array (raid6) and my boot and root arrays (raid1), respectively.  I am going to migrate all of this onto a new raid1 array.  You'll notice md0 is already running degraded: I had previously shut the system down, removed one of the 320GB drives, and replaced it with the first new 1TB drive.  (I don't have any free SATA ports, so I had to swap an old drive for a new one; I removed a drive that is only in the raid6 array, since that array has double redundancy, meaning I could still lose any other drive in the system and be ok.)
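
For the record, the clean way to retire an array member before physically pulling it is something like this, where sdX1 stands in for whichever old member is being removed:
mdadm /dev/md0 --fail /dev/sdX1      # mark the member failed so the array stops using it
mdadm /dev/md0 --remove /dev/sdX1    # then detach it from the array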

First, I need to find the new 1TB drive's device name.  fdisk -l is our friend here:

Disk /dev/sdd: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes



This looks promising: it's about the right size and there's no partition table.  Let's check some more info just to be sure:
erma ~ # hdparm -i /dev/sdd

/dev/sdd:

 Model=WDC WD1003FZEX-00MK2A0, FwRev=01.01A01, SerialNo=WD-WCC3F7KJUCTH
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: Reserved:  ATA/ATAPI-1,2,3,4,5,6,7

 * signifies the current active mode
 

Looks good.  Every other drive in the system is a Seagate, so this WD is definitely the new one.

After some math and some trial and error I arrived at the following partition layout:
Device     Boot      Start        End    Sectors   Size Id Type
/dev/sdd1             2048     206847     204800   100M fd Linux raid autodetect
/dev/sdd2           206848 1947412848 1947206001 928.5G fd Linux raid autodetect
/dev/sdd3       1947414528 1953525167    6110640   2.9G 82 Linux swap / Solaris
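
If you'd rather script that than poke at the disk interactively, a reasonably recent sfdisk will accept the same dump format it produces.  This is just a sketch of the layout above, so double-check the numbers before pointing it at a real disk:
sfdisk /dev/sdd <<'EOF'
label: dos
unit: sectors

/dev/sdd1 : start=2048,       size=204800,     type=fd
/dev/sdd2 : start=206848,     size=1947206001, type=fd
/dev/sdd3 : start=1947414528, size=6110640,    type=82
EOF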


Let's create the 2 new arrays.
erma ~ # mdadm --create /dev/md10 --level=1 --metadata=0.90 --raid-devices=2 missing /dev/sdd1
mdadm: array /dev/md10 started.
erma ~ # mdadm --create /dev/md11 --level=1 --metadata=0.90 --raid-devices=2 missing /dev/sdd2
mdadm: array /dev/md11 started.



Format /dev/md10, which will become /boot, as ext2:

erma ~ # mkfs.ext2 /dev/md10

Create a single LVM PV out of the main array on the new drive, which is /dev/md11:
erma ~ # pvcreate /dev/md11

My old root filesystem wasn't LVM but the new one will be.  I also had several other mounts carved out of the "bulk" raid6 array as LVs.  I'll create a new LV for root shortly, and for the existing LVs I can use the LVM tools to migrate the data from the old drives to the new one.

Add the new PV to the existing VG:
erma ~ # vgextend vg /dev/md11
  Volume group "vg" successfully extended
erma ~ # pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/md0   vg   lvm2 a--    1.16t 245.85g
  /dev/md11  vg   lvm2 a--  928.50g 928.50g


Tell LVM to migrate the LVs from the old PV to the new one (this is going to take a while...):
erma ~ # pvmove --atomic /dev/md0 /dev/md11
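
If you want to see how far along it is, you can add -i 60 to the pvmove command to have it print progress every 60 seconds, or check from another terminal; the lvs invocation below is just the one I'd reach for:
lvs -a -o name,copy_percent vg   # the hidden pvmove volume shows the copy percentage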

Now that all the LVM data has been moved off of the /dev/md0 array, I can remove it from the VG and delete the array.
erma ~ # vgreduce vg /dev/md0
  Removed "/dev/md0" from volume group "vg"

erma ~ # pvremove /dev/md0
  Labels on physical volume "/dev/md0" successfully wiped
erma ~ # mdadm --stop /dev/md0
mdadm: stopped /dev/md0
erma ~ # mdadm --remove /dev/md0
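
One extra bit of housekeeping worth doing here, assuming you're completely done with the old array: wipe the md superblocks on its former member partitions (the ones listed in the mdadm --detail output at the top) so nothing tries to auto-assemble md0 on a later boot:
mdadm --zero-superblock /dev/sde4   # and likewise for the other old members (sdb1, sdc1, sda1, sdf4)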

Now I just need to copy the old boot and root arrays, which weren't LVM, over to the new disk.  The new boot partition array is still not LVM, but the root partition (which I'm about to create) will now be an LV, so we'll just rsync the data over.
lvcreate -L 150G -n root vg /dev/md11
mkfs.xfs /dev/vg/root

mkdir /mnt/root
mount /dev/vg/root /mnt/root
rsync -avxHAXW --info=progress2 / /mnt/root/
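
The old /boot, which also isn't LVM, gets the same treatment onto the new md10 (the mount point name is just what I'd pick):
mkdir /mnt/newboot
mount /dev/md10 /mnt/newboot
rsync -avxHAXW /boot/ /mnt/newboot/
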
Now, the system is still running, so rsync will be copying some files that change while or after this is going on.  I'll need to shut down the system, boot off a livecd, and re-run the same rsync command after mounting the filesystems so it can pick up everything that changed or was missed since the first run, but doing it this way minimizes downtime.
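
From the livecd that second pass would look roughly like the following.  The device names are assumptions on my part; the livecd may assemble the old root mirror under a different md name than md3, so check /proc/mdstat first:
mdadm --assemble --scan              # assemble the arrays if the livecd didn't already
vgchange -ay vg                      # activate the LVM volume group
mkdir -p /mnt/oldroot /mnt/newroot
mount /dev/md3 /mnt/oldroot          # old (non-LVM) root mirror
mount /dev/vg/root /mnt/newroot      # new root LV
rsync -avxHAXW --delete /mnt/oldroot/ /mnt/newroot/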

At this point I am going to shut down the server, remove the remaining 3 320GB drives, and add the second new 1TB drive.  After booting, some drive letters will change (but everything will mount without issues because you use UUIDs in your fstab instead of device names, right?), so the first new drive (what was /dev/sdd before) is now /dev/sdb and the second, just-added new drive is /dev/sda.
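
A quick way to double-check which physical disk ended up with which letter, and to grab the UUIDs for the fstab:
ls -l /dev/disk/by-id/   # maps drive model/serial numbers to the sdX names
blkid                    # lists filesystem and swap UUIDs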

Let's copy the partition layout from sdb to the newly added sda:
erma ~ # sfdisk -d /dev/sdb | sfdisk /dev/sda

Add the 2 missing partitions to the boot and LVM arrays; they'll immediately start syncing to bring the arrays back to 100%.  I'll also issue a couple of commands to increase the sync speed since, by default, it doesn't go as fast as it can so as not to put a large drag on the system.  I'm not really doing anything else important and want the sync to happen ASAP.
erma ~ # mdadm --add /dev/md10 /dev/sda1
mdadm: added /dev/sda1
erma ~ # mdadm --add /dev/md11 /dev/sda2
mdadm: added /dev/sda2

erma ~ # echo 200000 > /proc/sys/dev/raid/speed_limit_max
erma ~ # echo 200000 > /proc/sys/dev/raid/speed_limit_min

erma ~ # cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md10 : active raid1 sda1[0] sdb1[1]
      102336 blocks [2/2] [UU]

md11 : active raid1 sda2[2] sdb2[1]
      973602880 blocks [2/1] [_U]
      [>....................]  recovery =  1.5% (15322176/973602880) finish=97.3min speed=164128K/sec
      bitmap: 6/8 pages [24KB], 65536KB chunk


Now go watch a movie until it's done syncing...  According to iostat it's getting 150-170MB/sec transfer speed.
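
Or, if you'd rather keep an eye on it from a terminal instead:
watch -n 30 cat /proc/mdstat   # refresh the resync status every 30 seconds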

In the meantime, I'm going to do some more housekeeping.  I'll activate the new swap partition, turn off the existing 2 swap partitions, format the second new swap partition and update the fstab (using UUIDs) so everything will be automatic when the system boots up.
erma ~ # swapon /dev/sdb3
erma ~ # swapoff /dev/sdd2
erma ~ # swapoff /dev/sdc2
erma ~ # mkswap /dev/sda3
Setting up swapspace version 1, size = 2.9 GiB (3128643584 bytes)
no label, UUID=570c612c-3209-4a34-89da-0b4e72357258
erma ~ # swapon /dev/sda3
erma ~ # cat /proc/swaps
Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       3055316 0       -1
/dev/sdb3                               partition       3055316 0       1
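
The matching fstab entries end up looking something like this; the first UUID is from the mkswap output above, and the second is whatever blkid /dev/sdb3 reports (shown here as a placeholder):
UUID=570c612c-3209-4a34-89da-0b4e72357258  none  swap  sw  0 0
UUID=<uuid-of-sdb3>                        none  swap  sw  0 0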


Install GRUB on the new drives:
grub> device (hd0) /dev/sda
grub> device (hd1) /dev/sdb
grub> root (hd0,0)
grub> setup (hd0)
grub> root (hd1,0)
grub> setup (hd1)
grub> quit


Up next: booting the system to a livecd, rsyncing the root filesystem one last time, changing the fstab and grub.conf to point to the new arrays, and hoping she boots up.

Well, she didn't, at least not without a little more work.  I had overlooked that my current initramfs wasn't set up to handle kernel raid, so it wasn't assembling the arrays properly, which meant no logical volumes were found and no root filesystem.  A few searches later I found I needed to run genkernel with the --lvm and --mdadm flags.  I chrooted into the system using the normal Gentoo install process, ran genkernel with the proper flags, added domdadm to my kernel line in grub.conf, and after that everything worked fine.  I spent some more time cleaning up the old arrays, removing the last of the 6 old hard drives, and renaming the raid arrays to md0 and md1, which is the nice and simple naming I wanted.  The nice thing about Gentoo is that using it over a long period of time makes you learn things that help you fix problems a lot quicker than if you used an "easy" distribution.
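
For reference, the fix from inside the chroot looked roughly like this.  The kernel version in the file names is a placeholder, real_root points at the root LV created earlier, and dolvm is the companion parameter the genkernel initramfs uses to activate LVM at boot, so it belongs on the kernel line too when root is an LV:
genkernel --lvm --mdadm initramfs

# matching grub.conf (grub legacy) entry, roughly:
title Gentoo Linux
root (hd0,0)
kernel /kernel-genkernel-x86_64-<version> real_root=/dev/vg/root dolvm domdadm
initrd /initramfs-genkernel-x86_64-<version>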

 So I'm finally left with:
erma ~ # cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[0] sdb1[1]
      102336 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      973602880 blocks [2/2] [UU]
      bitmap: 1/8 pages [4KB], 65536KB chunk

erma ~ # pvs
  PV         VG   Fmt  Attr PSize   PFree
  /dev/md1   vg   lvm2 a--  928.50g 258.00g
 

I'm actively using just under 500GB of the array.  Some guys at work questioned why, in the age of 8TB drives, I purchased 1TB drives.  I had about 1.5TB of storage before, but I had grown certain LVs over time without really needing the space, and I also deleted a bunch of stuff before migrating the LVs over to the new PV, so 1TB of total space (or 928.5G) is more than enough.  I still have the 2 500GB Seagates and can create another mirrored array if the need arises, but I don't think I'll need it.

That's all for now.