cciss-vol-status/cciss_vol_status.8

.\" Copyright (C) 2006,2007,2012,2013 Hewlett-Packard Development Company, L.P.
.\"
.\"
.\"	Copyright 2006,2007,2012,2013 Hewlett-Packard Development Company, L.P.
.\"
.\"	Author: Stephen M. Cameron
.\"
.\"	This file is part of cciss_vol_status.
.\"
.\"	cciss_vol_status is free software; you can redistribute it and/or modify
.\"	it under the terms of the GNU General Public License as published by
.\"	the Free Software Foundation; either version 2 of the License, or
.\"	(at your option) any later version.
.\"
.\"	cciss_vol_status is distributed in the hope that it will be useful,
.\"	but WITHOUT ANY WARRANTY; without even the implied warranty of
.\"	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
.\"	GNU General Public License for more details.
.\"
.\"	You should have received a copy of the GNU General Public License
.\"	along with cciss_vol_status; if not, write to the Free Software
.\"	Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
.\"
.TH CCISS_VOL_STATUS "8" "May 2013" "cciss_vol_status (ccissutils) " ""
.SH NAME
cciss_vol_status \- show status of logical drives attached to HP Smartarray controllers
.SH SYNOPSIS
.B cciss_vol_status
[\fIOPTION\fR] [\fIDEVICE\fR]...
.SH DESCRIPTION
.\" Add any additional description here
.PP
Shows the status of logical drives configured on HP Smartarray
controllers.
.SH OPTIONS
.TP
\fB\-p, --persnickety\fR
Without this option, device nodes which can't be opened, or which
are not found to be of the correct device type are silently ignored.
This lets you use wildcards, e.g.: cciss_vol_status /dev/sg* /dev/cciss/c*d0,
and the program will not complain as long as all devices which are found
to be of the correct type are found to be ok.  However, you may wish
to explicitly list the devices you expect to be there, and be notified
if they are not there (e.g. perhaps a PCI slot has died, and the system has
rebooted, so that what was once /dev/cciss/c1d0 is no longer there at
all).  This option will cause the program to complain about any device
node listed which does not appear to be the right device type, or
is not openable.
.TP
\fB\-C, --copyright\fR
If stderr is a terminal, Print out a copyright message,
and exit.
.TP
\fB\-q, --quiet\fR
This option doesn't do anything.
Previously, without this option and if stderr is a
terminal, a copyright message precedes the normal program output.
Now, the copyright message is only printed via the -C option.
.TP
\fB\-s\fR
Query each physical drive for S.M.A.R.T data and report any drives
in "predictive failure" state.
.TP
\fB\-u, --try-unknown-devices\fR
If a device has an unrecognized board ID, normally the program will
not attempt to communicate with it.  In case you have some Smart Array
controller which is newer than this program, the program may not
recognize it.  This option permits the program to attempt to interrogate
the board even if it is unrecognized on the assumption that it is
in fact a Smart Array of some kind.
.TP
\fB\-v, --version\fR
Print the version number and exit.
.TP
\fB\-V, --verbose\fR
Print out more information about the controllers and physical drives.
For each controller, the board ID, number of logical drives, currently
running firmware revision and ROM firmware revision are printed.  For
each physical drive, the location, vendor, model, serial number,
and firmware revision are printed.
.TP
\fB\-x, --exhaustive\fR
Deprecated.  Previously, it "exhaustively" searched for logical
drives, as, under some circumstances some logical drives might
otherwise be missed.  This option no longer does anything, as the
algorithm for finding logical drives was changed to obviate the
need for it.
.SH DEVICE
.PP
The DEVICE argument indicates which RAID controller is to be queried.
Note, that it indicates which RAID controller, not which logical drive.
.PP
For the cciss driver, the "d0" nodes matching "/dev/cciss/c*d0" are the
nodes which correspond to the RAID controllers.  (See note 1, below.)
It is not necessary to invoke cciss_vol_status on each logical drive
individually, though if you do this, each time it will report the
status of ALL logical drives on the controller.
.PP
For the hpsa driver, or for fibre attached MSA1000 family devices, or
for the hpahcisr sotware RAID driver which emulates Smart Arrays,
the RAID controller
is accessed via the scsi generic driver, and the device nodes will
match "/dev/sg*"   Some variants of the "lsscsi" tool will easily
identify which device node corresponds to the RAID controller.  Some
variants may only report the SCSI nexus (controller/bus/target/lun
tuple.)  Some distros may not have the lsscsi tool.
.PP
.br
Executing the following query to the /sys filesystem and correlating
this with the contents of /proc/scsi/scsi or output of lsscsi
can help in finding the right
/dev/sg node to use with cciss_vol_status:
.PP
.nf
.LD
wumpus:/home/scameron # ls -l /sys/class/scsi_generic/*
lrwxrwxrwx 1 root root 0 2009-11-18 12:31 /sys/class/scsi_generic/sg0 -> ../../devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:03.0/host0/target0:0:0/0:0:0:0/scsi_generic/sg0
lrwxrwxrwx 1 root root 0 2009-11-18 12:31 /sys/class/scsi_generic/sg1 -> ../../devices/pci0000:00/0000:00:1f.1/host2/target2:0:0/2:0:0:0/scsi_generic/sg1
lrwxrwxrwx 1 root root 0 2009-11-19 07:47 /sys/class/scsi_generic/sg2 -> ../../devices/pci0000:00/0000:00:05.0/0000:0e:00.0/host4/target4:3:0/4:3:0:0/scsi_generic/sg2
wumpus:/home/scameron # cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: COMPAQ   Model: BD03685A24       Rev: HPB6
  Type:   Direct-Access                    ANSI  SCSI revision: 03
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: SAMSUNG  Model: CD-ROM SC-148A   Rev: B408
  Type:   CD-ROM                           ANSI  SCSI revision: 05
Host: scsi4 Channel: 03 Id: 00 Lun: 00
  Vendor: HP       Model: P800             Rev: 6.82
  Type:   RAID                             ANSI  SCSI revision: 00
wumpus:/home/scameron # lsscsi
[0:0:0:0]    disk    COMPAQ   BD03685A24       HPB6  /dev/sda
[2:0:0:0]    cd/dvd  SAMSUNG  CD-ROM SC-148A   B408  /dev/sr0
[4:3:0:0]    storage HP       P800             6.82  -
.DE
.fi
.PP
From the above you can see that /dev/sg2 corresponds to SCSI nexus 4:3:0:0,
which corresponds to the HP P800 RAID controller listed in /proc/scsi/scsi.
.SH EXAMPLE
.nf
.LD
	[root@somehost]# cciss_vol_status -q /dev/cciss/c*d0
	/dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.
	/dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 1 status: OK.
	/dev/cciss/c0d0: (Smart Array P800) RAID 1 Volume 2 status: OK.
	/dev/cciss/c0d0: (Smart Array P800) RAID 5 Volume 4 status: OK.
	/dev/cciss/c0d0: (Smart Array P800) RAID 5 Volume 5 status: OK.
	/dev/cciss/c0d0: (Smart Array P800) Enclosure MSA60 (S/N: USP6340B3F) on Bus 2, Physical Port 1E status: Power Supply Unit failed
	/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 0 status: OK.
	/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 1 status: OK.
	/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 2 status: OK.
	/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 3 status: OK.
	/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 4 status: OK.
	/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 5 status: OK.
	/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 6 status: OK.
	/dev/cciss/c1d0: (Smart Array P800) RAID 5 Volume 7 status: OK.

	[root@someotherhost]# cciss_vol_status -q /dev/sg0 /dev/cciss/c*d0
	/dev/sg0: (MSA1000) RAID 1 Volume 0 status: OK.   At least one spare drive.
	/dev/sg0: (MSA1000) RAID 5 Volume 1 status: OK.
	/dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.

	[root@localhost]# ./cciss_vol_status -s /dev/sg1
	/dev/sda: (Smart Array P410i) RAID 0 Volume 0 status: OK.
		 connector 1I box 1 bay 1                 HP      DG072A9BB7                               B365P6803PCP0633     HPD0 S.M.A.R.T. predictive failure.
	[root@localhost]# echo $?
	1

	[root@localhost]# ./cciss_vol_status -s /dev/cciss/c0d0
	/dev/cciss/c0d0: (Smart Array P800) RAID 0 Volume 0 status: OK.
		 connector 2E box 1 bay 8                 HP      DF300BB6C3                           3LM08AP700009713RXUT     HPD3 S.M.A.R.T. predictive failure.
	/dev/cciss/c0d0: (Smart Array P800) Enclosure MSA60 (S/N: USP6340B3F) on Bus 2, Physical Port 2E status: OK.


	[root@localhost cciss_vol_status]# ./cciss_vol_status --verbose /dev/sg0
	Controller: Smart Array P420i
	  Board ID: 0x3354103c
	  Logical drives: 1
	  Running firmware: 3.42
	  ROM firmware: 3.42
	/dev/sda: (Smart Array P420i) RAID 1 Volume 0 status: OK.
	  Physical drives: 2
		 connector 1I box 2 bay 1                 HP      EG1200FCVBQ                                      KZG21NVD     HPD1 OK
		 connector 2I box 2 bay 5                 HP      EG1200FCVBQ                                      KZG20X7D     HPD1 OK
	/dev/sg0(Smart Array P420i:0): Non-Volatile Cache status:
			   Cache configured: Yes
			  Read cache memory: 81 MiB
			 Write cache memory: 735 MiB
			Write cache enabled: Yes
	   Flash backed cache present


.DE
.fi
.SH DIAGNOSTICS
.PP
Normally, a logical drive in good working order should
report a status of "OK."  Possible status values are:
.TP
"OK." (0) - The logical drive is in good working order.
.TP
"FAILED." (1) - The logical drive has failed, and no i/o to it is poosible.
Additionally, failed drives will be identified by connector, box and bay,
as well as vendor, model, serial number, and firmware revision.
.TP
"Using interim recovery mode." (3) - One or more drives has failed,
but not so many that the logical drive can no longer operate.  The
failed drives should be replaced as soon as possible.
.TP
"Ready for recovery operation." (4) -  Failed drive(s) have been
replaced, and the controller is about to begin rebuilding
redundant parity data.
.TP
"Currently recovering." (5) - Failed drive(s) have been replaced,
and the controller is currently rebuilding redundant parity
information.
.TP
"Wrong physical drive was replaced." (6) - A drive has failed, and
another (working) drive was replaced.
.TP
"A physical drive is not properly connected." (7) - There is some
cabling or backplane problem in the drive enclosure.
.TP
(From fwspecwww.doc, see cpqarray project on sourceforge.net):
Note: If the unit_status value is 6 (Wrong physical drive was replaced)
or 7 (A physical drive is not properly connected), the unit_status
of all other configured logical drives will be marked as
1 (Logical drive failed). This is to force the user to
correct the problem and to insure that once the problem
is corrected, the data will not have been corrupted by
any user action.
.TP
"Hardware is overheating." (8) - Hardware is too hot.
.TP
"Hardware was overheated." (9) - At some point in the past,
the hardware got too hot.
.TP
"Currently expannding." (10) - The controller is currently in the
process of expanding a logical drive.
.TP
"Not yet available." (11) - The logical drive is not yet finished
being configured.
.TP
"Queued for expansion." (12) - The logical drive will be expended
when the controller is able to begin working on it.
.PP
Additionally, the following messages may appear regarding spare
drive status:
.PP
.nf
.LD
	"At least one spare drive designated"
	"At least one spare drive activated and currently rebuilding"
	"At least one activated on-line spare drive is completely rebuilt on this logical drive"
	"At least one spare drive has failed"
	"At least one spare drive activated"
	"At least one spare drive remains available"
.DE
Active spares will be identified by connector, box and bay, as well
as by vendor, model, serial number, and firmware revision.
.fi
.PP
For each logical drive, the total number of failed
physical drives, if more than zero, will be reported as:
.TP
.nf
.LD
	"Total of n failed physical drives detected on this logical drive."
.DE
.fi
.PP
with "n" replaced by the actual number, of course.
.PP
"Replacement" drives -- newly inserted drives that replace
a previously failed drive but are not yet finished rebuilding --
are also identified by connector, box and bay, as well as
by vendor, model, serial number, and firmware revision.
.PP
If the -s option is specified, each physical drive will be
queried for S.M.A.R.T data, any any drives in predictive failure
state will be reported, identified by connector, box and bay,
as well as vendor, model, serial number, and firmware revision.
.PP
Additionally failure conditions of disk enclosure fans,
power supplies, and temperature are reported as follows:
.PP
.nf
.LD
	"Fan failed"
	"Temperature problem"
	"Door alert"
	"Power Supply Unit failed"
.DE
.fi
.SH FILES
/dev/cciss/c*d0 (Smart Array PCI controllers using the cciss driver)
.br
/dev/sg* (Fibre attached MSA1000 controllers and
Smart Array controllers using the hpsa driver or
hpahcisr software RAID driver.)
.SH EXIT CODES
.TP
0 - All configured logical drives queried have status of "OK."
.TP
1 - One or more configured logical drives queried have status other than "OK."
.SH BUGS
.P
MSA500 G1 logical drive numbers may not be reported correctly.
.P
I've seen enclosure serial numbers contain garbage.
.P
Some Smart Arrays support more than 128 physical drives on a single RAID
controller.  cciss_vol_status does not.
.SH AUTHOR
Written by Stephen M. Cameron
.SH "REPORTING BUGS"
.P
Report bugs to <scameron@beardog.cce.hp.com>
.SH COPYRIGHT
Copyright \(co 2007 Hewlett-Packard Development Company, L.P.
.br
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
.SH "SEE ALSO"
http://cciss.sourceforge.net
.SH note 1
The /dev/cciss/c*d0 device nodes of the cciss driver do double duty.
They serve as an access point to both the RAID controllers, and to the
first logical drive of each RAID controller.  Notice that a /dev/cciss/c*d0
node will be present for each controller even if no logical drives are
configured on that controller.  It might be cleaner if the driver had
a special device node just for the controller, instead of making these
device nodes do double duty.  It has been like that since the 2.2
linux kernel timeframe.  At that time, device major and minor nodes
were statically allocated at compile time, and were in short supply.
Changing this behavior at this point would break lots of userland
programs.
.FE