= Setting up an LSI (MegaRAID) RAID controller card =

This documents how to set up a RAID using an LSI RAID controller card.

The first step is to install the software. There are two parts: the GUI and the StorCLI command line tool. At the time of writing, the GUI software is available at :

{{{
https://docs.broadcom.com/docs/17.05.00.02_Linux-64_MSM.gz
}}}

This can be installed (as root) like so :

{{{
# tar xfz 17.05.00.02_Linux-64_MSM.gz
# cd disk
# ./install.csh -s
}}}

Note that /bin/csh must be installed for the above steps to work, so you may have to install tcsh to get it.

The StorCLI utility is available at :

{{{
https://docs.broadcom.com/docs/1.21.16_StorCLI.zip
}}}

It is installed on CentOS as follows :

{{{
# unzip 1.21.16_StorCLI.zip
# unzip versionChangeSet/univ_viva_cli_rel/storcli_All_OS.zip
# rpm -i storcli_All_OS/Linux/storcli-1.21.06-1.noarch.rpm
}}}

After the software is installed, you should be able to see the disks attached to controller zero (/c0) with this command (which can also be used as a general status check once the RAID is created as detailed here) :

{{{
# /opt/MegaRAID/storcli/storcli64 /c0 show
Generating detailed summary of the adapter, it may take a while to complete.
. . .
-----------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT     Size PDC  PI SED DS3  FSpace TR
-----------------------------------------------------------------------------
 0 0   0   41:0     46  DRIVE Onln  N  7.276 TB dflt N  N   dflt -      N
 0 0   1   41:1     52  DRIVE Onln  N  7.276 TB dflt N  N   dflt -      N
 0 0   2   41:2     53  DRIVE Onln  N  7.276 TB dflt N  N   dflt -      N
 0 0   3   41:3     47  DRIVE Onln  N  7.276 TB dflt N  N   dflt -      N
 0 0   4   41:4     49  DRIVE Onln  N  7.276 TB dflt N  N   dflt -      N
}}}

The number 41 is the "enclosure number".
This is used when assembling the RAID, or "virtual drive", which is done like so :

{{{
[root@netdrms02 ~]# /opt/MegaRAID/storcli/storcli64 /c0 add vd r6 \
    name=SUMS drives=41:0-17 strip=256 Spares=41:18-19
}}}

NOTE that there is an important "gotcha" : the arguments above are ORDER DEPENDENT. To hammer this home, the same command with the "strip" and "drives" entries reversed :

{{{
[root@netdrms02 ~]# /opt/MegaRAID/storcli/storcli64 /c0 add vd r6 \
    name=SUMS strip=256 drives=41:0-17 Spares=41:18-19
}}}

will NOT work; storcli complains about not recognizing tokens. This is a hole that is hard to get out of, since the error message is pretty nonsensical.

In the above command, "r6" means RAID 6. The online help is available through :

{{{
/opt/MegaRAID/storcli/storcli64 /c0 add vd help
}}}

There are 20 physical disks in the JBOD; the command above uses disks 0 to 17 in the RAID and sets disks 18 and 19 as spares.

We also found there was no need to initialize the disk, i.e. to tell the OS about it with a separate command like :

{{{
# /opt/MegaRAID/storcli/storcli64 /c0/v0 start init
}}}

Again, that command fails with an error that is not intuitive, so time was lost on that. But dmesg showed that the OS already knew about the disk.
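
As a sanity check on the layout described above, the usable size of the array can be estimated before creating it. This is only a sketch: the function name `raid6_usable` is made up for illustration, and it assumes nothing more than the usual RAID 6 rule that two drives' worth of space goes to parity. The per-drive size is the 7.276 TB figure that storcli reports.

```shell
#!/bin/sh
# Hypothetical helper (not part of storcli): estimate usable RAID 6 capacity.
# Usage: raid6_usable NUMBER_OF_DRIVES SIZE_PER_DRIVE
raid6_usable() {
    drives=$1
    size=$2
    if [ "$drives" -le 2 ]; then
        echo "RAID 6 needs more than two drives" >&2
        return 1
    fi
    # Two drives' worth of capacity is consumed by parity.
    awk -v n="$drives" -v s="$size" 'BEGIN { printf "%.3f\n", (n - 2) * s }'
}

# 18 drives (slots 0 to 17), each reported by storcli as 7.276 TB.
# The two spares (slots 18 and 19) do not add capacity.
raid6_usable 18 7.276
```

For the 18-drive array this gives 16 x 7.276 = 116.416, which is close to the 116 TiB that the kernel reports for the new device.
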
dmesg showed the new device added as /dev/sdb (the key word below is "Attached") :

{{{
[110076.339285] scsi 12:2:0:0: Direct-Access LSI MR9286CV-8e 3.27 PQ: 0 ANSI: 5
[110076.359214] sd 12:2:0:0: [sdb] 250031898624 512-byte logical blocks: (128 TB/116 TiB)
[110076.359220] sd 12:2:0:0: [sdb] 4096-byte physical blocks
[110076.359311] sd 12:2:0:0: [sdb] Write Protect is off
[110076.359315] sd 12:2:0:0: [sdb] Mode Sense: 1f 00 00 08
[110076.359320] sd 12:2:0:0: Attached scsi generic sg5 type 0
[110076.359364] sd 12:2:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[110076.377336] sd 12:2:0:0: [sdb] Attached SCSI disk
}}}

OK, spiffy, but we need to use 'parted' to write a GPT partition table for a disk of this size. 'fdisk' will not work since it, being a kinda sorta DOS-based thing, has size limits (a DOS/MBR partition table tops out at 2 TiB with 512-byte sectors). This is another hole to avoid. So we need to do something like this (for one big ol' partition) :

{{{
# parted /dev/sdb
(parted) mklabel gpt
(parted) mkpart primary xfs 0% 100%
(parted) quit
}}}

OK, now we have /dev/sdb1, so we create an xfs filesystem on it :

{{{
# mkfs -t xfs /dev/sdb1
}}}

Mount it to test :

{{{
# mkdir /SUM01; mount -t xfs /dev/sdb1 /SUM01
}}}

And it shows up in 'df', which is great :

{{{
# df -lh
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   50G  8.3G   42G  17% /
devtmpfs                  63G     0   63G   0% /dev
tmpfs                     63G     0   63G   0% /dev/shm
tmpfs                     63G   18M   63G   1% /run
tmpfs                     63G     0   63G   0% /sys/fs/cgroup
/dev/sda1               1014M  236M  779M  24% /boot
/dev/mapper/centos-home  169G  126M  169G   1% /home
tmpfs                     13G     0   13G   0% /run/user/1000
tmpfs                     13G   12K   13G   1% /run/user/42
tmpfs                     13G  8.0K   13G   1% /run/user/1001
/dev/sdb1                117T   38M  117T   1% /SUM01
}}}

Un-mount it :

{{{
# umount /SUM01
}}}

To get this mount to happen automatically, you have to put something like this in /etc/fstab :

{{{
UUID=2b598aba-0b60-4966-a443-90c9ca730974 /SUM01 xfs defaults 1 1
}}}

To get the UUID :

{{{
# blkid /dev/sdb1
/dev/sdb1: UUID="2b598aba-0b60-4966-a443-90c9ca730974" TYPE="xfs" PARTLABEL="primary" PARTUUID="5dd86180-2420-4020-ac17-6623a2f6db56"
}}}

And then it will mount if you ask it to mount everything in /etc/fstab :

{{{
# mount -a
}}}

== Other handy things ==

{{{
# Show controller 0
/opt/MegaRAID/storcli/storcli64 /c0 show

# Show the enclosure (enclosure ID is 37)
/opt/MegaRAID/storcli/storcli64 /c0/e37 show

# Show the details on the drive in slot 15
/opt/MegaRAID/storcli/storcli64 /c0/e37/s15 show all

# Force a drive, slot 11, to be good (would only have to do this if a drive has
# been marked as "F" - Foreign - due to use in a previous RAID).
/opt/MegaRAID/storcli/storcli64 /c0/e32/s11 set good force

# Set the drive in slot 11 as a hot spare.
/opt/MegaRAID/storcli/storcli64 /c0/e32/s11 add hotsparedrive

# Context dependent help is available
/opt/MegaRAID/storcli/storcli64 /c0/e37/s15 help

# The context dependent help is how I figured out how to show rebuild progress.
# Drive status must be "Rbld" for this to do anything meaningful :
/opt/MegaRAID/storcli/storcli64 /c0/e37/s15 show rebuild
Controller = 0
Status = Success
Description = Show Drive Rebuild Status Succeeded.

----------------------------------------------------------
Drive-ID    Progress% Status          Estimated Time Left
----------------------------------------------------------
/c0/e37/s15 -         Not in progress -
----------------------------------------------------------
}}}

It looks like these LSI controllers will remove failed drives from the RAID automatically, unlike the 3ware controllers, where you have to do it by hand.

When new drives are put in to replace failed drives, the LSI controllers will, by default, do a "copyback". This involves copying data around so that the disks in the slots that were previously the Designated Hot Spares (DHS) are again the DHS drives for the RAID.
During the "copyback" process, the RAID status may look, in part, like :

{{{
37:5  41 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
37:6  44 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
37:7  39 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
37:8  50 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
37:9  49 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
37:10 55 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
37:11 54 Cpybck - 7.276 TB SATA HDD N N 512B ST8000VN004-2M2101  U -
37:12 45 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
37:13 58 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
37:14 46 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
37:15 57 Cpybck - 7.276 TB SATA HDD N N 512B ST8000VN004-2M2101  U -
37:16 51 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
37:17 52 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
37:18 53 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
}}}

The "copyback" feature is enabled by default, but it can be turned off.
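
For a quick status check, the drive rows can be filtered so that only drives in a state other than "Onln" are printed. The following is a minimal sketch, not a tested monitoring tool: the function name `non_online_drives` is made up, and it assumes the row layout shown in the listing above, where the first field is EID:Slot and the third field is the state. Hot spares and drives that are rebuilding would be listed too, since they are not "Onln" either.

```shell
#!/bin/sh
# Hypothetical helper: read storcli drive rows on stdin and print the
# EID:Slot and state of every drive whose state is not "Onln".
# Assumes field 1 is EID:Slot (e.g. 37:11) and field 3 is the state.
non_online_drives() {
    awk '$1 ~ /^[0-9]+:[0-9]+$/ && $3 != "Onln" { print $1, $3 }'
}

# Demonstration using two rows copied from the listing above.
non_online_drives <<'EOF'
37:10 55 Onln   0 7.276 TB SATA HDD N N 512B ST8000VN0022-2EL112 U -
37:11 54 Cpybck - 7.276 TB SATA HDD N N 512B ST8000VN004-2M2101  U -
EOF
```

The demonstration prints `37:11 Cpybck`. On a live system the same function could be fed from the controller instead, e.g. `/opt/MegaRAID/storcli/storcli64 /c0 show | non_online_drives`. This has only been reasoned through against the sample rows, not run against a real controller.
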