Version 8 (modified by niles, 17 months ago)

--

Setting up an LSI (MegaRAID) RAID controller card

This documents how to set up a RAID using an LSI RAID controller card.

The first step is to install the software. There are two parts : the GUI and the StorCLI command-line tool. At the time of writing, the GUI software is available at :

https://docs.broadcom.com/docs/17.05.00.02_Linux-64_MSM.gz

This can be installed (as root) like so :

# tar xfz 17.05.00.02_Linux-64_MSM.gz
# cd disk
# ./install.csh -s

Note that /bin/csh must be installed for the steps above to work, so you may have to install tcsh first.

The StorCLI utility is available at :

https://docs.broadcom.com/docs/1.21.16_StorCLI.zip

It is installed on CentOS as follows :

# unzip 1.21.16_StorCLI.zip
# unzip versionChangeSet/univ_viva_cli_rel/storcli_All_OS.zip
# rpm -i storcli_All_OS/Linux/storcli-1.21.06-1.noarch.rpm

After the software is installed, you should be able to see the disks attached to controller zero (/c0) with this command (which can also be used as a general status check once the RAID is created as detailed here) :

# /opt/MegaRAID/storcli/storcli64 /c0 show
Generating detailed summary of the adapter, it may take a while to complete.
.
.
.
-----------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT       Size PDC  PI SED DS3  FSpace TR 
-----------------------------------------------------------------------------
 0 0   0   41:0     46  DRIVE Onln  N    7.276 TB dflt N  N   dflt -      N  
 0 0   1   41:1     52  DRIVE Onln  N    7.276 TB dflt N  N   dflt -      N  
 0 0   2   41:2     53  DRIVE Onln  N    7.276 TB dflt N  N   dflt -      N  
 0 0   3   41:3     47  DRIVE Onln  N    7.276 TB dflt N  N   dflt -      N  
 0 0   4   41:4     49  DRIVE Onln  N    7.276 TB dflt N  N   dflt -      N  
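
As a rough sketch of how this output could feed a status check, the filter below reads the `/c0 show` output on standard input and prints any drive row whose State is not `Onln`. The function name `unhealthy_drives` is made up for illustration, and the column positions are assumed from the sample above; other StorCLI or firmware versions may lay the table out differently.

```shell
# Print "EID:Slot State" for every DRIVE row that is not online.
# Assumes the column layout shown above: Type is field 6, State is field 7.
unhealthy_drives() {
    awk '$6 == "DRIVE" && $7 != "Onln" { print $4, $7 }'
}

# Typical use (as root):
#   /opt/MegaRAID/storcli/storcli64 /c0 show | unhealthy_drives
```

No output means every drive in the topology table reported `Onln`.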

The number 41 is the "enclosure number". This is used when assembling the RAID, or "virtual drive", which is done like so :

[root@netdrms02 ~]# /opt/MegaRAID/storcli/storcli64 /c0 add vd r6 \
name=SUMS drives=41:0-17 strip=256 Spares=41:18-19 

NOTE that there is an important "gotcha" : the arguments above are ORDER DEPENDENT. To hammer this home, the same command with the "strip" and "drives" entries reversed :

[root@netdrms02 ~]# /opt/MegaRAID/storcli/storcli64 /c0 add vd r6 \
name=SUMS strip=256 drives=41:0-17 Spares=41:18-19

will NOT work, and will blather about not recognizing tokens. This is a hole that is hard to get out of, since the error message is pretty nonsensical.
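
One way to stay out of that hole is to never type the arguments by hand. The helper below is only a sketch (the name `add_vd_command` is hypothetical) that prints the command with the arguments in the order that worked above, so it can be reviewed before being run. It uses only the options already shown here and hardcodes controller /c0 and RAID 6.

```shell
# Print an "add vd" command with the arguments in the order that worked above:
# name, then drives, then strip, then Spares.
# Arguments: name, drive range, strip size, spare range.
add_vd_command() {
    printf '%s /c0 add vd r6 name=%s drives=%s strip=%s Spares=%s\n' \
        /opt/MegaRAID/storcli/storcli64 "$1" "$2" "$3" "$4"
}

# Review the printed command before running it by hand (as root):
add_vd_command SUMS 41:0-17 256 41:18-19
```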

In the above command, "r6" means RAID 6. The online help is available through

/opt/MegaRAID/storcli/storcli64 /c0 add vd help

In the above command, there are 20 physical disks in the JBOD; we're using disks 0 to 17 in the RAID and setting disks 18 and 19 as spares.
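
As a sanity check on what to expect from this layout: RAID 6 gives up two drives' worth of space to parity, so 18 drives yield the capacity of 16. The sketch below works that out, with the hypothetical function name `raid6_usable` and the 7.276 per-drive size taken from the storcli listing above. The result lines up with the 116 TiB that dmesg reports for the finished volume.

```shell
# Usable capacity of a RAID 6 set: two drives' worth of space goes to parity.
# Arguments: number of drives in the array, size of one drive.
raid6_usable() {
    awk -v n="$1" -v size="$2" 'BEGIN { printf "%.3f\n", (n - 2) * size }'
}

raid6_usable 18 7.276    # 18 drives of 7.276 each, in storcli's units
```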

We also found there was no need to initialize the disk, i.e. to tell the OS about it with a separate command, like :

# /opt/MegaRAID/storcli/storcli64 /c0/v0 start init

Again, that command fails with an error that is not intuitive, so time was lost on that. But dmesg showed that the OS already knew about the disk.

dmesg showed the new device added as /dev/sdb (the key word below is "Attached") :

[110076.339285] scsi 12:2:0:0: Direct-Access     LSI      MR9286CV-8e      3.27 PQ: 0 ANSI: 5
[110076.359214] sd 12:2:0:0: [sdb] 250031898624 512-byte logical blocks: (128 TB/116 TiB)
[110076.359220] sd 12:2:0:0: [sdb] 4096-byte physical blocks
[110076.359311] sd 12:2:0:0: [sdb] Write Protect is off
[110076.359315] sd 12:2:0:0: [sdb] Mode Sense: 1f 00 00 08
[110076.359320] sd 12:2:0:0: Attached scsi generic sg5 type 0
[110076.359364] sd 12:2:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[110076.377336] sd 12:2:0:0: [sdb] Attached SCSI disk
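
The sizes in the second dmesg line can be checked by hand from the block count. This is a small sketch (the function name `blocks_to_tb` is made up) that multiplies blocks by block size and divides by 10^12 for TB and by 2^40 for TiB, using integer arithmetic only.

```shell
# Convert a block count and block size into whole decimal TB and binary TiB.
# Integer division truncates, so both figures are rounded down.
blocks_to_tb() {
    bytes=$(( $1 * $2 ))
    echo "$(( bytes / 1000000000000 )) TB / $(( bytes / 1099511627776 )) TiB"
}

blocks_to_tb 250031898624 512    # values from the [sdb] line above
```

For the values above this gives 128 TB / 116 TiB, matching what the kernel printed.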

OK, spiffy, but we need to use 'parted' to write a GPT partition table for a disk of this size. 'fdisk' will not work since the DOS-style (MBR) partition table it traditionally writes tops out at 2 TiB (another hole to avoid). So we need to do something like this (for one big ol' partition) :

# parted /dev/sdb
(parted) mklabel gpt
(parted) mkpart primary xfs 0% 100%
(parted) quit
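
For scripting, the same steps can probably be done without the interactive prompt. The sketch below is an assumption-laden illustration rather than what was actually run here: `needs_gpt` and `parted_gpt_command` are hypothetical helper names, the 2 TiB limit assumes 512-byte sectors, and the command is only printed, not executed, since it would wipe the partition table of whatever device it is pointed at.

```shell
# Succeeds (exit status 0) when a disk of the given size in bytes is too big
# for a DOS (MBR) partition table with 512-byte sectors: 2^32 sectors * 512 bytes.
needs_gpt() {
    [ "$1" -gt 2199023255552 ]
}

# Print, but do not run, a scripted equivalent of the interactive session above.
# "parted -s" is parted's non-interactive (script) mode.
parted_gpt_command() {
    echo "parted -s $1 mklabel gpt mkpart primary xfs 0% 100%"
}

# 128016332095488 bytes = 250031898624 blocks of 512 bytes, from the dmesg output.
if needs_gpt 128016332095488; then
    parted_gpt_command /dev/sdb
fi
```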

OK, now we have /dev/sdb1, so we create an xfs filesystem on it :

# mkfs -t xfs /dev/sdb1

Mount it to test :

# mkdir /SUM01; mount -t xfs /dev/sdb1 /SUM01

And it shows up in 'df', which is great :

# df -lh
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   50G  8.3G   42G  17% /
devtmpfs                  63G     0   63G   0% /dev
tmpfs                     63G     0   63G   0% /dev/shm
tmpfs                     63G   18M   63G   1% /run
tmpfs                     63G     0   63G   0% /sys/fs/cgroup
/dev/sda1               1014M  236M  779M  24% /boot
/dev/mapper/centos-home  169G  126M  169G   1% /home
tmpfs                     13G     0   13G   0% /run/user/1000
tmpfs                     13G   12K   13G   1% /run/user/42
tmpfs                     13G  8.0K   13G   1% /run/user/1001
/dev/sdb1                117T   38M  117T   1% /SUM01

Un-mount it :

# umount /SUM01

To get this mount to happen automatically, you have to put something like this in /etc/fstab :

UUID=2b598aba-0b60-4966-a443-90c9ca730974 /SUM01           xfs     defaults        1 1

To get the UUID :

# blkid /dev/sdb1
/dev/sdb1: UUID="2b598aba-0b60-4966-a443-90c9ca730974" TYPE="xfs" PARTLABEL="primary" PARTUUID="5dd86180-2420-4020-ac17-6623a2f6db56" 
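
To avoid copy-and-paste mistakes with the UUID, the fstab line can be generated from the blkid output. The following is a minimal sketch; `fstab_line` is a hypothetical name, and the `defaults 1 1` fields are simply copied from the entry above.

```shell
# Read one line of blkid output on stdin and print a matching /etc/fstab entry.
# Arguments: mount point, filesystem type.
# The leading space in " UUID=" keeps the pattern from matching PARTUUID.
fstab_line() {
    uuid=$(sed -n 's/.* UUID="\([^"]*\)".*/\1/p')
    printf 'UUID=%s %s %s defaults 1 1\n' "$uuid" "$1" "$2"
}

# Typical use (as root); check the line before appending it to /etc/fstab:
#   blkid /dev/sdb1 | fstab_line /SUM01 xfs
```

If only the bare UUID is wanted, `blkid -s UUID -o value /dev/sdb1` should print it without any parsing.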

And then it will mount if you ask it to mount everything in /etc/fstab :

# mount -a

Other handy things

# Show controller 0
/opt/MegaRAID/storcli/storcli64 /c0 show

# Show the enclosure (enclosure ID is 37)
/opt/MegaRAID/storcli/storcli64 /c0/e37 show

# Show the details on the drive in slot 15
/opt/MegaRAID/storcli/storcli64 /c0/e37/s15 show all

# Force a drive, slot 11, to be good (would only have to do this if a drive has
# been marked as "F" - Foreign - due to use in a previous RAID).
/opt/MegaRAID/storcli/storcli64 /c0/e32/s11 set good force

# Set the drive in slot 11 as a hot spare.
/opt/MegaRAID/storcli/storcli64 /c0/e32/s11 add hotsparedrive DGs=0

# Context dependent help is available
/opt/MegaRAID/storcli/storcli64 /c0/e37/s15 help

# The context dependent help is how I figured out how to show rebuild progress.
# Drive status must be "Rbld" for this to do anything meaningful :

/opt/MegaRAID/storcli/storcli64 /c0/e37/s15 show rebuild

Controller = 0
Status = Success
Description = Show Drive Rebuild Status Succeeded.


----------------------------------------------------------
Drive-ID    Progress% Status          Estimated Time Left 
----------------------------------------------------------
/c0/e37/s15 -         Not in progress -                   
----------------------------------------------------------



# Similarly one can get the time remaining on a copyback

/opt/MegaRAID/storcli/storcli64 /c0/e37/s3 show copyback
Controller = 0
Status = Success
Description = Show Drive Copyback Status Succeeded.


-----------------------------------------------------------
Drive-ID   Progress% Status      Estimated Time Left       
-----------------------------------------------------------
/c0/e37/s3         0 In progress 1 Days 3 Hours 38 Minutes 
-----------------------------------------------------------
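
For monitoring, it can be handy to boil the "Estimated Time Left" column down to a single number. The sketch below (the function name `minutes_left` is made up) reads the copyback output on standard input and prints the drive and the minutes remaining. It assumes the "N Days N Hours N Minutes" wording shown above; whether other StorCLI versions word it the same way has not been checked.

```shell
# Read "show copyback" output on stdin and print the drive and the estimated
# minutes left for each row that is in progress.
# Assumes the "N Days N Hours N Minutes" wording shown above.
minutes_left() {
    awk '/^\/c[0-9]+\// && /In progress/ {
        mins = 0
        for (i = 2; i <= NF; i++) {
            if ($i == "Days") mins += $(i - 1) * 1440
            else if ($i == "Hours") mins += $(i - 1) * 60
            else if ($i == "Minutes") mins += $(i - 1)
        }
        print $1, mins
    }'
}

# Typical use (as root):
#   /opt/MegaRAID/storcli/storcli64 /c0/e37/s3 show copyback | minutes_left
```

Rows that are not in progress produce no output.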




# Turn on/off LED light to find a given disk
/opt/MegaRAID/storcli/storcli64 /c0/e37/s4 start locate
/opt/MegaRAID/storcli/storcli64 /c0/e37/s4 stop locate


# Delete foreign drive status, then add drive as a spare.

/opt/MegaRAID/storcli/storcli64 /c0/fall del
/opt/MegaRAID/storcli/storcli64 /c0/e252/s6 add hotsparedrive DGs=1

It looks like these LSI controllers will remove failed drives from the RAID automatically, as compared to the 3ware controllers, where you have to do it by hand.

When new drives are put in to replace failed drives, the LSI controllers will, by default, do a "copyback". This involves copying data around so that the disks in the slots that were previously the Designated Hot Spares (DHS) are again the DHS drives for the RAID. During the "copyback" process, the RAID status may look, in part, like :

37:5     41 Onln    0 7.276 TB SATA HDD N   N  512B ST8000VN0022-2EL112 U  -    
37:6     44 Onln    0 7.276 TB SATA HDD N   N  512B ST8000VN0022-2EL112 U  -    
37:7     39 Onln    0 7.276 TB SATA HDD N   N  512B ST8000VN0022-2EL112 U  -    
37:8     50 Onln    0 7.276 TB SATA HDD N   N  512B ST8000VN0022-2EL112 U  -    
37:9     49 Onln    0 7.276 TB SATA HDD N   N  512B ST8000VN0022-2EL112 U  -    
37:10    55 Onln    0 7.276 TB SATA HDD N   N  512B ST8000VN0022-2EL112 U  -    
37:11    54 Cpybck  - 7.276 TB SATA HDD N   N  512B ST8000VN004-2M2101  U  -    
37:12    45 Onln    0 7.276 TB SATA HDD N   N  512B ST8000VN0022-2EL112 U  -    
37:13    58 Onln    0 7.276 TB SATA HDD N   N  512B ST8000VN0022-2EL112 U  -    
37:14    46 Onln    0 7.276 TB SATA HDD N   N  512B ST8000VN0022-2EL112 U  -    
37:15    57 Cpybck  - 7.276 TB SATA HDD N   N  512B ST8000VN004-2M2101  U  -    
37:16    51 Onln    0 7.276 TB SATA HDD N   N  512B ST8000VN0022-2EL112 U  -    
37:17    52 Onln    0 7.276 TB SATA HDD N   N  512B ST8000VN0022-2EL112 U  -    
37:18    53 Onln    0 7.276 TB SATA HDD N   N  512B ST8000VN0022-2EL112 U  -   

The "Copyback" feature can be turned off, but it is the default.