[Repost] Setting up software RAID in Ubuntu Server - Experience Sharing and Notes

Updated Mar 13 2009 to reflect improvements in Ubuntu 8.04 and later.

Linux has excellent software-based RAID built into the kernel. Unfortunately, information on configuring and maintaining it is sparse. Back in 2003, O’Reilly published Managing RAID on Linux. That book is still mostly up to date, but finding clear instructions on the web for setting up RAID has become a chore.

Here is how to install Ubuntu Server with software RAID 1 (disk mirroring). This guide has been tested on Ubuntu Server 8.04 LTS (Hardy Heron). I strongly recommend using Ubuntu Hardy or later if you want to boot from RAID1.

Software RAID vs. hardware RAID

Some system administrators still sneer at the idea of software RAID. Years ago CPUs didn’t have the speed to manage both a busy server and RAID activities. That’s not true any more, especially when all you want to do is mirror a drive with RAID1. Linux software RAID is ideal for mirroring, and due to kernel disk caching and buffering it can actually be faster than RAID1 on lower end RAID controllers. However, for larger requirements like RAID 5, the CPU can still get bogged down with software RAID.

Software RAID is inexpensive to implement: no need for expensive controllers or identical drives. Software RAID works with ordinary EIDE, Serial ATA and SCSI drives and controllers. You can mix together whatever drive types and sizes you have on hand. When all you need are mirrored drives, software RAID is an especially good choice.

However, there are reasons you might prefer hardware RAID over software RAID:

Hardware RAID is simpler to set up and manage.
Depending on the server BIOS, a system using Linux software RAID probably won’t be able to boot automatically after the first disk of a mirror has been replaced with a new blank drive (it can still be booted manually, though).
Linux can only boot from software RAID when the “/boot” and “/” partitions are on RAID1. It cannot boot when those partitions are on RAID5. Other partitions, however, can be RAID5.
With software RAID, after replacing a failed drive the administrator must log in and enter commands to add the new drive to the array and re-sync the contents. Good hardware RAID controllers re-sync automatically as soon as they see a new drive, without operator intervention.

Notice I said “good hardware RAID controllers”. Low-end controllers like those integrated with consumer-grade motherboards that require software drivers are not suitable for server use. Cheap motherboard RAID (often called “fake RAID”) is designed for gamers who want RAID 0 to boost disk read times, not for reliability. Server-grade hardware RAID requires controllers from Adaptec, 3ware or another reputable manufacturer.

A simple RAID1 example

For this example we’ll construct a simple RAID1 mirror using a server that has two 4 GB serial ATA drives. Such a configuration will keep running if either drive fails, but (obviously) not if both fail.

EIDE or SCSI drives can be used with Linux RAID, but right now serial ATA provides the best combination of low cost, performance and flexibility.

For this example, partitioning will be done as simply as possible:

Drive	Partition	Type	Mounted on	Size
Drive0	`/dev/sda1`	Primary	/	4.1 GB
Drive0	`/dev/sda2`	Primary	(swap area)	(remainder of disk)
Drive1	`/dev/sdb1`	Primary	/	4.1 GB
Drive1	`/dev/sdb2`	Primary	(swap area)	(remainder of disk)

In Linux software RAID each mount point is usually configured as a separate RAID device. It’s possible for entire drives to be RAID members rather than each partition (e.g. combine /dev/sda and /dev/sdb) but the resulting device will not be bootable.

In this example partitions sda1 and sdb1 will be made members of a RAID1 device named /dev/md0. Partitions sda2 and sdb2 will be members of a RAID1 device named /dev/md1.

RAID device	Type	Mounted on	Size	Members
`/dev/md0`	RAID1 mirror	/	4.1 GB	`/dev/sda1`
`/dev/md0`	RAID1 mirror	/	4.1 GB	`/dev/sdb1`
`/dev/md1`	RAID1 mirror	(swap)	(remainder of disk)	`/dev/sda2`
`/dev/md1`	RAID1 mirror	(swap)	(remainder of disk)	`/dev/sdb2`

On a real world server it’s a good idea to have at least /var and /home on their own partitions, but the above scheme is good enough for this example. We are also purposely avoiding complications like logical volume management (LVM), just to keep things simple.

In Linux RAID, corresponding partitions on each drive in a RAID device should be the same size. If they aren’t, software RAID will still work but each RAID device will only be as large as the smallest member partition (e.g. if you add a 10GB partition and a 20GB partition into a RAID1 array, the resulting array will only have 10GB of usable space).
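
The size rule above can be checked with a few lines of shell. This is only an illustrative sketch: the function name raid1_usable_size is made up for this example, and sizes are passed as whole numbers in a single unit (GB here).

```shell
#!/bin/sh
# Usable size of a RAID1 array: the size of its smallest member.
# The function name is illustrative; sizes are whole numbers in one unit.
raid1_usable_size() {
    smallest=$1
    for size in "$@"; do
        if [ "$size" -lt "$smallest" ]; then
            smallest=$size
        fi
    done
    echo "$smallest"
}

# The 10 GB + 20 GB mirror from the text yields 10 GB of usable space.
raid1_usable_size 10 20   # prints 10
```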

Installing Ubuntu server with RAID1

To install a fresh Ubuntu system with RAID, boot from the CD-ROM as usual. Follow the prompts until you reach the “Partition Disks” dialog.

From the “Partition Disks” dialog box, select “Manually edit the partition table”.
Select the first disk (“sda”)
Say yes to “Create a new empty partition table on this device?”
Use the dialog boxes to create a new primary partition large enough to hold the root filesystem but leave space for a swap partition (4.1 GB in this example).
For “How to use this partition” select “physical volume for RAID”, not the default “Ext3 journaling file system”
Make the partition bootable (Bootable flag “on”)
Use the dialogs to create one other primary partition taking up the remaining disk space (197.4 MB in this example). Later this will be used for swap.
For “How to use this partition” select “physical volume for RAID”, not the default “Ext3 journaling file system” and not “swap area”
Repeat the above steps to create identical partitions on the second drive. Remember to mark partition one on both drives as “bootable”. When finished, each drive should show a bootable 4.1 GB partition and a 197.4 MB partition, both set to “physical volume for RAID”.

Once the partitions are configured, at the top of the “Partition Disks” main dialog select “Configure Software RAID”
When asked “Write the changes to the storage devices and configure RAID” select “Yes”.
For “Multidisk configuration actions” select “Create MD device”
For “Multidisk device type” select “RAID1”
For “Number of active devices for the RAID1 array” enter “2”
For “Number of spare devices for the RAID1 array” enter “0” (zero)
When asked to select “Active devices for the RAID1 multidisk device” select both /dev/sda1 and /dev/sdb1
From the next dialog select “create MD device”
Repeat the above steps to create an MD device that contains /dev/sda2 and /dev/sdb2
Finally, from the dialog “Multidisk configuration actions” select “Finish”
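
For readers who want to see what the dialogs above roughly correspond to on the command line, here is a hedged sketch. It only prints mdadm commands rather than running them, because “mdadm --create” overwrites RAID metadata on the member partitions. The helper name print_create_cmd is invented for this example; the installer itself may use different options.

```shell
#!/bin/sh
# Print (not run) mdadm commands roughly equivalent to the installer dialogs.
# Usage: print_create_cmd MD_DEVICE MEMBER...
print_create_cmd() {
    md_device=$1
    shift
    # After the shift, $# is the number of member partitions.
    echo "mdadm --create $md_device --level=1 --raid-devices=$# $*"
}

print_create_cmd /dev/md0 /dev/sda1 /dev/sdb1
print_create_cmd /dev/md1 /dev/sda2 /dev/sdb2
```

Printing the commands first makes it easy to review the device names before anything is written to disk.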

Next configure RAID device #0 (md0) to be mounted as the “/” filesystem and RAID device #1 (md1) to be mounted as swap:

From the “Partition Disks” dialog, move the cursor bar under “RAID device #0” and select “#1 4.1 GB”
Configure the device as an Ext3 filesystem mounted on /.

From the “Partition Disks” dialog under “RAID device #1” select “#1 197.3 MB”
Configure the device as “swap area”.

The final partitioning screen should now list RAID device #0 as an Ext3 filesystem mounted on / and RAID device #1 as swap.

Select “Finish partitioning and write changes to disk”. The RAID1 mirrors are created and made active, the filesystem is formatted and installation of Ubuntu proceeds as usual.
Allow the installation to complete then reboot when requested.

Booting with a failed drive

The GRUB bootloader has always made it tricky to boot from a RAID array when one of the drives has failed. Fortunately, the Ubuntu team improved the situation in Ubuntu 8.10 Intrepid Ibex and backported the changes to Ubuntu Server 8.04 Hardy Heron (See Bug 290885 for the whole saga).

Now administrators can choose how the server will behave when a drive in a RAID array has failed:

Wait at a boot prompt for manual intervention (the default), or
Automatically boot from the other drive

Most administrators will want an automatic boot. After all, the purpose of RAID is to increase server availability. However, with some hardware failures automatically booting could wipe out data on the remaining drive. If you have good backup and recovery procedures, that risk is probably acceptable, but it is your decision as administrator.

To make Ubuntu Server automatically boot when one drive in a RAID array has failed do the following:

From a running server, do a package update to make sure you have the latest kernel and boot loader (e.g. sudo apt-get update; sudo apt-get upgrade).
Reboot the server to ensure any new kernel and bootloader packages are in place.
From the command line run “sudo grub-install /dev/md0” to ensure GRUB is installed on all members of the boot RAID device.
From the command line run “sudo dpkg-reconfigure mdadm”
When asked “Should mdadm run monthly redundancy checks of the RAID arrays?”, select either Yes or No (read the warning about possible performance impact and decide. “Yes” is the safer choice)
When asked “Do you want to start the md monitoring daemon?” select Yes.
Enter a valid email address to send warning messages to.
When asked “Do you want to boot your system if your RAID becomes degraded?” select Yes.

Now when the system boots and either of the drives has failed, the system will seem to hang at the “Loading, please wait…” stage for approximately five minutes, then proceed to boot normally.

Some friendly advice

RAID systems that boot and continue to function with failed members are great for continuity, but we often see administrators either not notice that drives have failed, or wait too long to replace them.

Suddenly the last drive also dies and they face a long downtime while the system is rebuilt or restored. It may seem obvious, but when dealing with RAID:

Make sure your system properly alerts you when a drive fails. Don’t just rely on the MD monitoring daemon to send email alerts: also run smartmontools to monitor physical health and set up a check in your server monitor (e.g. monit or Nagios), run a script from cron that parses /proc/mdstat, or use whatever method works best in your environment.
Don’t wait to repair or replace the failed drive. When a drive fails, act immediately. Drives have an eerie tendency to fail at nearly the same time, especially when they are identical models made during the same production run (which is likely if all were purchased at the same time from the same vendor).
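
As one possible shape for the cron script mentioned above, the sketch below reads /proc/mdstat-style text and prints every array whose status field (such as [_U]) shows a missing member. The function name degraded_arrays and the sample text are assumptions made for illustration; check the real contents of /proc/mdstat on your own server before relying on it.

```shell
#!/bin/sh
# Print the name of every md array whose status field (e.g. [U_]) shows a
# missing member. Reads /proc/mdstat-style text on standard input.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
        }
    '
}

# Demonstration on sample text shaped like /proc/mdstat (values are made up).
degraded_arrays <<'EOF'
Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
      192640 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0](F)
      4000064 blocks [2/1] [_U]
unused devices: <none>
EOF
# prints md0
```

On a live system the function would be fed with “degraded_arrays < /proc/mdstat”, and any non-empty output could be mailed to the administrator or reported to the monitoring system.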

Why RAID swap?

You might be wondering why we put swap on a RAID device, causing system swap activity to suffer the additional overhead of RAID.

Though Linux is capable of handling multiple independent swap partitions on multiple drives, if a drive containing an active swap partition dies it may take the system down with it. That defeats the point of having RAID in the first place, so to avoid that possibility we put the swap in RAID.

This creates more overhead, but swap is only meant as temporary substitute memory during rare moments of memory exhaustion. If the system is regularly using swap, performance is already being severely impacted and it’s time to add more physical memory.
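
To confirm that no swap has been left outside the mirror, something like the following sketch can help. It reads text in the format of /proc/swaps and lists swap areas that are not md devices. The function name swap_outside_raid and the sample device /dev/sdc2 are hypothetical; note that swap files would also be listed, since they are not md devices either.

```shell
#!/bin/sh
# List active swap areas that are NOT md (RAID) devices.
# Reads /proc/swaps-style text on standard input; the first line is a header.
swap_outside_raid() {
    awk 'NR > 1 && $1 !~ /^\/dev\/md/ { print $1 }'
}

# Demonstration on sample text shaped like /proc/swaps (values are made up).
swap_outside_raid <<'EOF'
Filename                                Type            Size    Used    Priority
/dev/md1                                partition       192632  0       -1
/dev/sdc2                               partition       524280  0       -2
EOF
# prints /dev/sdc2
```

On a running server the input would come from “swap_outside_raid < /proc/swaps”; empty output means all active swap is on RAID devices.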

Care and feeding

Having two drives configured in a RAID1 mirror allows the server to continue to function when either drive fails. When a drive fails completely, the kernel RAID driver automatically removes it from the array.

However, a drive may start having seek errors without failing completely. In that situation the RAID driver may not remove it from service and performance will degrade. Luckily you can manually remove a failing drive using the “mdadm” command. For example, to manually mark both RAID member partitions on drive sda as failed:

mdadm /dev/md0 --fail /dev/sda1
mdadm /dev/md1 --fail /dev/sda2

The above takes both partitions on drive sda out of service, leaving only the partitions on drive sdb active.

Removing a failed drive

When Ubuntu sees that RAID has been configured, it automatically runs the mdadm command in “monitor mode” to watch each device and send email to root when a problem is noticed. You can also manually inspect RAID status using commands like the following:

cat /proc/mdstat
mdadm --query --detail /dev/md0
mdadm --query --detail /dev/md1

As mentioned above, it’s wise to use the smartmontools package to monitor each drive’s internal failure stats. However, as noted in a 2007 analysis by Google, drives are perfectly capable of dying without any warning showing in their SMART indicators.

To replace a drive that has been marked as failed (either automatically or by using “mdadm --fail”), first remove all partitions on that drive from the array. For example to remove all partitions from drive sda:

mdadm /dev/md0 --remove /dev/sda1
mdadm /dev/md1 --remove /dev/sda2

Once removed it is safe to power down the server and replace the failed drive.
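
The fail and remove steps for a whole drive can be generated together. The sketch below only prints the commands so they can be reviewed before being run as root. The helper name print_retire_cmds is invented here, and the mapping of md devices to partition numbers is hard-coded to this article's layout (md0 uses partition 1, md1 uses partition 2), so it would need adjusting for any other layout.

```shell
#!/bin/sh
# Print (not run) the --fail and --remove commands for each partition of one
# drive. The md-device-to-partition mapping matches this article's example.
print_retire_cmds() {
    disk=$1
    for pair in md0:1 md1:2; do
        md=${pair%%:*}      # text before the colon, e.g. md0
        part=${pair##*:}    # text after the colon, e.g. 1
        echo "mdadm /dev/$md --fail /dev/$disk$part"
        echo "mdadm /dev/$md --remove /dev/$disk$part"
    done
}

print_retire_cmds sda
```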

Boot problems

If it was the first drive that failed, after replacing it with a new unformatted drive the system may no longer boot: some BIOSes only attempt to boot from the lowest numbered hard drive (e.g. sda or hda) and if it is blank the system will hang. In that case you’ll need a rescue CD capable of running a GRUB boot prompt so you can manually boot from the second physical drive.

There are many free Linux-based rescue CDs available (e.g. SystemRescueCD) but for quick access to GRUB try the Super Grub Disk. This small download can be written to a bootable floppy or CD-ROM and gives quick access to system boot tools, especially the GRUB command line.

Whatever rescue tool you use, use it to boot to a GRUB command prompt and force the system to boot from the second installed hard drive using commands similar to the following:

root (hd1,0)
kernel /boot/vmlinuz-whatever root=/dev/md0 ro
initrd /boot/initrd.img-whatever
boot

To find the correct file names for the “kernel” and “initrd” parameters, GRUB has bash-style command-line completion… type just enough of the path then press TAB to auto-complete or see a list of available choices.

Preparing the new drive

Once the system has been rebooted with the new unformatted replacement drive in place, some manual intervention is required to partition the drive and add it to the RAID array.

The new drive must have an identical (or nearly identical) partition table to the other. You can use fdisk to manually create a partition table on the new drive identical to the table of the other, or if both drives are identical you can use the sfdisk command to duplicate the partition table. For example, to copy the partition table from the second drive “sdb” onto the first drive “sda”, the sfdisk command is as follows:

sfdisk -d /dev/sdb | sfdisk /dev/sda

Warning: be careful to specify the right source and destination drives when using sfdisk, or you could blank out the partition table on your good drive.
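
One way to reduce the risk described in the warning is to wrap the sfdisk pipeline in a small guard. The sketch below is an assumption-laden example, not a standard tool: the function name copy_partition_table and the APPLY variable are invented for illustration. It refuses identical source and target drives and, unless APPLY is set to “yes”, only prints the command for review. It cannot detect a swapped source and target, so the printed command still needs to be read carefully.

```shell
#!/bin/sh
# Guarded wrapper around "sfdisk -d SOURCE | sfdisk TARGET".
# Usage: copy_partition_table GOOD_DRIVE NEW_DRIVE
# APPLY is a hypothetical switch: the command only runs when APPLY=yes.
copy_partition_table() {
    src=$1
    dst=$2
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: copy_partition_table GOOD_DRIVE NEW_DRIVE (must differ)" >&2
        return 1
    fi
    if [ "${APPLY:-no}" = "yes" ]; then
        # Dump the good drive's table and write it to the new drive.
        sfdisk -d "$src" | sfdisk "$dst"
    else
        # Dry run: show what would be executed.
        echo "sfdisk -d $src | sfdisk $dst"
    fi
}

copy_partition_table /dev/sdb /dev/sda   # prints: sfdisk -d /dev/sdb | sfdisk /dev/sda
```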

Once the partitions have been created, you can add them to the corresponding RAID devices using “mdadm --add” commands. For example:

mdadm --add /dev/md0 /dev/sda1
mdadm --add /dev/md1 /dev/sda2

Once added, the Linux kernel immediately starts re-syncing contents of the arrays onto the new drive. You can monitor progress via “cat /proc/mdstat”. Syncing uses idle CPU cycles to avoid overloading a production system, so performance should not be affected too badly. The busier the server (and larger the partitions), the longer the re-sync will take.

You don’t have to wait until all partitions are re-synchronized… servers can be on-line and in production while syncing is in progress: no data will be lost and eventually all drives will become synchronized.
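
Rebuild progress can also be pulled out of /proc/mdstat by a script. The following sketch assumes the usual progress line format (“recovery = 12.6% ...” or “resync = ...”); the function name resync_progress and the sample numbers are made up for illustration.

```shell
#!/bin/sh
# Print "<array> <percent>" for every array that is currently rebuilding.
# Reads /proc/mdstat-style text on standard input.
resync_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) print name, $i
            }
        }
    '
}

# Demonstration on sample text shaped like /proc/mdstat (values are made up).
resync_progress <<'EOF'
Personalities : [raid1]
md0 : active raid1 sda1[2] sdb1[1]
      4000064 blocks [2/1] [_U]
      [==>..................]  recovery = 12.6% (505024/4000064) finish=2.0min speed=28056K/sec
unused devices: <none>
EOF
# prints md0 12.6%
```

On a live system the input would be “resync_progress < /proc/mdstat”. For interactive use, “watch cat /proc/mdstat” gives a similar continuously refreshed view.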

Summary

Linux software RAID is far more cost effective and flexible than hardware RAID, though it is more complex and requires manual intervention when replacing drives. In most situations, software RAID performance is as good as (and often better than) an equivalent hardware RAID solution, all at a lower cost and with greater flexibility. When all you need are mirrored drives, software RAID is often the best choice.

More information on Linux RAID:

Managing RAID on Linux (O’Reilly Media, 2003)
Software RAID HOWTO (Linux Documentation project)
Linux: Why software RAID? (Jeff Garzik)

Posted in Linux, Safeguarding data |
Tags: Linux, RAID, software RAID, system administration, ubuntu

Reposted by Frank on PIXNET.