Introduction
The main purpose of disk drives is to store and retrieve data. The important factors are capacity (store lots of data), data access speed (access the data quickly), and data integrity (the data you write is the data you read).
We will use the analogy of health as we look at Log Page and Sense Data in this article.
As the drive does its job storing and retrieving data, it keeps track of its health, or how well it is doing its job.
A healthy drive may show few or no occurrences of error correction or retries, while a sick drive may indicate that it frequently has to retry writes or reads.
A healthy drive will have a “normal” temperature, while a sick drive may have a fever.
A dying drive may show a trend of more and more errors.
All of these things will be reported by the drive via Log Page data and/or Sense Data.
Overall Health – Log Pages
As the drive is running it keeps track of things such as:
- are writes and reads happening “easily”?
- has it been necessary to work harder to write/read (error correction or retries)?
- what is my temperature?
- have self-tests been run? How did they do?
- are background scans running? How are they doing?
- is my cache being used efficiently?
- and more.
These things are all recorded in the drive's Log Pages.
Structure of Log Pages
Log Pages are structured like this –
Log Page (General subject such as Read Errors)
- Log Parameter – under the general subject, more specific.
- Log Parameter n…
For example, you will probably have Log Page 3, the Read Error Counter Page:
Log Page 3 - Read Error Counter Page
- Parameter 0 = Errors corrected without substantial delays (ECC)
- Parameter 1 = Errors corrected with possible delays (retries)
- Parameter 2 = Total Re-reads
- Parameter 3 = Total Errors corrected
- Parameter 4 = Total times correction algorithm processed
- Parameter 5 = Total bytes processed
- Parameter 6 = Total errors uncorrected (hard errors)
The Log Page (3) is the general top level – Read Errors.
The Parameters under the Page are the details.
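The Page/Parameter hierarchy is also how the data is laid out in the bytes a drive returns. The sketch below assumes the layout described in the SPC document: a 4-byte page header, followed by parameters that each have a 2-byte parameter code, a control byte, a length byte, and then the value. The function name and the sample bytes are made up for illustration and do not come from a real drive or from the STB Suite.

```python
import struct

def parse_log_page(data: bytes):
    """Split a log page into (page_code, {parameter_code: value})."""
    page_code = data[0] & 0x3F                       # low 6 bits of byte 0
    page_length = struct.unpack(">H", data[2:4])[0]  # bytes following the 4-byte header
    end = min(4 + page_length, len(data))            # tolerate a truncated buffer
    params = {}
    offset = 4
    while offset + 4 <= end:
        code = struct.unpack(">H", data[offset:offset + 2])[0]
        length = data[offset + 3]                    # byte 2 (control byte) is skipped here
        value = data[offset + 4:offset + 4 + length]
        params[code] = int.from_bytes(value, "big")  # counters are big-endian integers
        offset += 4 + length
    return page_code, params

# Made-up sample: Log Page 3 with Parameter 1 = 7 and Parameter 6 = 2.
sample = bytes.fromhex(
    "03 00 00 10 "              # page code 3, subpage 0, page length 16
    "00 01 00 04 00 00 00 07 "  # parameter 1, control byte, length 4, value 7
    "00 06 00 04 00 00 00 02 "  # parameter 6, control byte, length 4, value 2
)
page, counters = parse_log_page(sample)
print(f"Log Page {page:02X}h")
for code, value in sorted(counters.items()):
    print(f"  Parameter {code:04X}h = {value}")
```

Treating every parameter value as an integer suits counter pages such as the Read Error Counter Page. Other pages may hold structured or text values that need their own decoding.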
List of Log Pages
Log Pages are officially documented by the INCITS T10 Technical Committee (T10.org), and specific information can be found in the SBC (SCSI Block Commands) and SPC (SCSI Primary Commands) documents. See the Appendix to download these documents.
From the SBC-4 document, here is a list of Log Pages:
As you can see, there can be a wealth of information available to check up on the drive's health.
How to read Log Pages using STB Suite
There are a number of ways to view a drive's Log Pages using the STB Suite.
The first way is to use STB Original mode.
Select the drive you are interested in from the Device view, right-click on the drive and choose View Log Pages. You will see something similar to this:
STB asks the drive to report what Log Pages it has and displays them on the left side of the menu. You may see more or fewer Log Pages, since each drive reports only the Log Pages it has.
Double-click on any Log Page and all of the Parameters under that Page will be displayed on the right side.
Here we double-clicked on Page 03, the Read Error Counter Page –
And you see all of the different Parameters and their values.
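At the SCSI level, Log Pages are requested with the LOG SENSE command (operation code 4Dh), and page 00h is the list of supported Log Pages. The sketch below only builds the 10-byte command block described in the SPC document. It does not send anything to a drive, and it is not a description of how the STB Suite is implemented. The function name and the default allocation length are assumptions made for illustration.

```python
def build_log_sense_cdb(page_code: int, alloc_len: int = 0x400,
                        page_control: int = 0b01) -> bytes:
    """Build a LOG SENSE CDB. page_control 01b asks for current cumulative values."""
    if not 0 <= page_code <= 0x3F:
        raise ValueError("page code must fit in 6 bits")
    if not 0 <= alloc_len <= 0xFFFF:
        raise ValueError("allocation length must fit in 16 bits")
    cdb = bytearray(10)
    cdb[0] = 0x4D                                      # LOG SENSE operation code
    cdb[2] = ((page_control & 0x03) << 6) | page_code  # PC in bits 7-6, page code in bits 5-0
    cdb[7] = (alloc_len >> 8) & 0xFF                   # allocation length, high byte
    cdb[8] = alloc_len & 0xFF                          # allocation length, low byte
    return bytes(cdb)

# Page 00h lists the Log Pages the drive supports.
print(build_log_sense_cdb(0x00).hex(" ").upper())  # 4D 00 40 00 00 00 00 04 00 00
# Page 03h is the Read Error Counter Page.
print(build_log_sense_cdb(0x03).hex(" ").upper())  # 4D 00 43 00 00 00 00 04 00 00
```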
For the most part the Log Page/Parameter values will increment or accumulate, usually until they are reset.
In the disk testing environment this allows you to:
- Clear the Log Pages, setting all values back to zero
- Run some tests
- View the Log Pages and see what happened during your test
In this way you start your test with all Log Page data zeroed out. You run your test, then look at the Log Pages again. The Log Page/Parameter values will show what happened during your test run.
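The clear, test, view cycle above amounts to comparing a "before" reading of the counters with an "after" reading. The sketch below shows that comparison for the Read Error Counter Page parameters listed earlier. The function name and the counter values are made up for illustration.

```python
# Parameter names for Log Page 3, as listed earlier in this article.
READ_ERROR_PARAMS = {
    0: "Errors corrected without substantial delays (ECC)",
    1: "Errors corrected with possible delays (retries)",
    2: "Total Re-reads",
    3: "Total Errors corrected",
    4: "Total times correction algorithm processed",
    5: "Total bytes processed",
    6: "Total errors uncorrected (hard errors)",
}

def counter_deltas(before: dict, after: dict) -> dict:
    """Return {parameter: change} for every parameter whose value changed."""
    deltas = {}
    for code in sorted(set(before) | set(after)):
        change = after.get(code, 0) - before.get(code, 0)
        if change != 0:
            deltas[code] = change
    return deltas

# Made-up readings taken before and after a test run.
before = {0: 120, 1: 4, 3: 124, 6: 0}
after = {0: 180, 1: 9, 3: 189, 6: 2}

for code, change in counter_deltas(before, after).items():
    print(f"Parameter {code} ({READ_ERROR_PARAMS[code]}): {change:+d}")
```

If the Log Pages were cleared before the test, the "before" reading is all zeros and the "after" reading is already the answer. The comparison is mainly useful when you want to keep the drive's existing counters rather than clear them.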
If you are testing drives that come in from customers and are reported to have problems, you can view and record all Log Pages so that you have a record of the state the drive was in when it arrived from the customer.
This can be very useful in troubleshooting the drive.
For example, let’s say you get a problem drive back from your customer. They just say “this drive fails in our RAID”.
You look at and save all Log Pages and note that there are a large number of uncorrected errors.
You clear the Log Pages – setting all Page/Parameter data back to zeros, then run your test using DMM.
You note that in your test environment the drive does not show uncorrected errors.
This might lead you to the conclusion that the drive itself in your test environment (steady power, known good cables/backplanes/etc, good cooling) doesn’t have the problems that your customer reported.
Perhaps the customer environment has cooling or vibration issues? Or perhaps the customer has a drive enclosure with a flaky drive slot?
Or perhaps your testing with DMM does show the same trend of lots of uncorrected errors.
Either way, Log Pages can help you determine if a reported problem is actually with the drive or instead is with the customer's environment.
The second way to see Log Page data is to double-click on the drive in the Device menu. This will bring up the Drive Information window. Choosing the Error Data tab will show the Write and Read error counter contents.
The third way to work with Log Page data is in DMM, using the Save Log Pgs test step. When this test step is run in a DMM test sequence, it records all Log Page/Parameter data into the DMM .log file(s), in the same format as the first method above.
In addition, you can clear all Log Pages before you start testing in DMM. This is done in one of three ways: by checking the Clear LOG PAGES box in the Pre-Test Actions, by using the Clear Log Pgs test step within a Test Sequence, or by checking the Clear LOG PAGES box in the Post-Test Actions.
Log Page Summary
In summary:
- The drive records information about its health as it runs.
- This health information is stored in Log Pages.
- You can gauge a drive's health by examining Log Page data, in particular by looking at trends, such as a drive needing more and more retries in order to read data correctly.
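One simple way to look for such a trend is to read the same counter at regular intervals and see whether each interval adds more than the one before it. The sketch below does this for a cumulative retry counter. The function name, the rule it applies, and the sample readings are assumptions made for illustration, not a pass/fail criterion from any standard or from the STB Suite.

```python
def retry_trend(readings):
    """readings: cumulative counter values taken at regular intervals.

    Returns (increments, worsening), where worsening is True when every
    interval added more retries than the interval before it.
    """
    increments = [later - earlier for earlier, later in zip(readings, readings[1:])]
    worsening = len(increments) >= 2 and all(
        b > a for a, b in zip(increments, increments[1:])
    )
    return increments, worsening

# Made-up readings of "Errors corrected with possible delays (retries)".
print(retry_trend([0, 2, 5, 11, 24]))  # ([2, 3, 6, 13], True)
print(retry_trend([0, 3, 6, 9]))       # ([3, 3, 3], False)
```

A steady rate of retries is reported as not worsening here. Whether a steady rate is acceptable is a separate judgment that depends on the drive and the workload.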
Sick Drive – Sense Data
If the drive cannot complete a command correctly it will report its problem using Sense Data.
In particular, if a command results in a CHECK CONDITION, the computer will automatically issue a REQUEST SENSE command and the drive will return Sense Data describing the error.
Structure of Sense Data
Like Log Page data, Sense Data is reported as a hierarchy, in this case of three values plus optional additional data:
Sense Key
Additional Sense Code (ASC)
Additional Sense Code Qualifier (ASCQ)
Optional: “Information” data
The STB Suite will generally report Sense Data like this: 0x02/0x00/0x00, in other words
Sense Key/ASC/ASCQ.
As an example, Sense Data 02/04/01 is interpreted as “Not Ready, Logical Unit Not Ready, becoming ready”. In other words the drive is spun down and is in the process of spinning up.
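Interpreting a Sense Data triple is a table lookup: the first value selects the Sense Key, and the second and third together select the detailed description (the T10 documents name these two values ASC and ASCQ). The sketch below holds the standard Sense Key names and only a handful of ASC/ASCQ entries. The full list is in the T10 documents, and vendor-unique codes will not be in any generic table. The function name is made up for illustration.

```python
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND",
}

# A few entries only; see the T10 numeric-order list for the rest.
ASC_ASCQ = {
    (0x00, 0x00): "NO ADDITIONAL SENSE INFORMATION",
    (0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE",
    (0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
}

def describe_sense(text: str) -> str:
    """Turn a 'key/asc/ascq' string such as '02/04/01' into a description."""
    key, asc, ascq = (int(part, 16) for part in text.strip().split("/"))
    key_name = SENSE_KEYS.get(key, f"sense key {key:X}h")
    detail = ASC_ASCQ.get((asc, ascq), f"ASC/ASCQ {asc:02X}/{ascq:02X} not in this table")
    return f"{key_name}: {detail}"

print(describe_sense("02/04/01"))
# NOT READY: LOGICAL UNIT IS IN PROCESS OF BECOMING READY
print(describe_sense("0x02/0x00/0x00"))
# NOT READY: NO ADDITIONAL SENSE INFORMATION
```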
As you can see, Sense Data is very valuable when you are trying to determine the specifics of a drive error.
List of Sense Data
The list of Sense Data definitions is a large one. See the Appendix for a link to the documents, which include a numerical-order list of common Sense Data.
Note: The Sense Data format allows drive manufacturers to define their own “vendor unique” sense data, so it is always best practice to have the manufacturer's SCSI Command Reference documentation on hand to be certain that you are correctly interpreting your drive's Sense Data.
How Sense data is read
As mentioned above, in nearly all cases when a drive has a command error, the test system Host Bus Adapter will automatically issue a REQUEST SENSE command to the drive, and the drive will return Sense Data detailing the error.
Where will Sense Data be displayed?
1. In STB Suite original mode command errors will generally cause a pop-up window to appear with the Sense Data shown –
2. Also in Original mode you can always go to the top menu Disk->Commands->Information Functions->View Request Sense choice and see the last Sense Data collected –
Note: There are other optional ways the Sense Data may be formatted. Read the T10 documentation to learn the low-level details of the way Sense Data can be formatted.
3. DMM
DMM will record Sense Data for all command errors in the DMM .log files –
——————————————————————————
07/30/2014 09:04:19 TEST 2 of 3:
Grown Defect List Count Test
Stop-on-Error Type: Stop Current Test
07/30/2014 09:04:19 Worker ID: 1
07/30/2014 09:04:19 CDB = B7 0D 00 00 00 00 00 00 00 08 00 00
Check Condition: Key/ASC/ASCQ = 05/20/00
Complete Sense Data:
70 00 05 00 00 00 00 0A 00 00 00 00 20 00 00 00
00 00
>>> Current Test Aborted <<<
07/30/2014 09:04:19 *** FAIL ***
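The Complete Sense Data bytes in the log excerpt above can be decoded by hand. The sketch below assumes the fixed Sense Data format from the SPC document, in which byte 0 holds the response code (70h or 71h), the low four bits of byte 2 hold the Sense Key, byte 12 holds the ASC and byte 13 holds the ASCQ. The function name is made up for illustration, and the descriptor format (response codes 72h and 73h) is deliberately not handled.

```python
def parse_fixed_sense(hex_dump: str):
    """Return (sense_key, asc, ascq) from a fixed-format Sense Data hex dump."""
    data = bytes.fromhex(hex_dump)
    response_code = data[0] & 0x7F   # bit 7 is the VALID bit, not part of the code
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: {response_code:02X}h")
    sense_key = data[2] & 0x0F       # low nibble of byte 2
    asc = data[12] if len(data) > 12 else 0
    ascq = data[13] if len(data) > 13 else 0
    return sense_key, asc, ascq

# The two lines of Complete Sense Data from the log excerpt, joined together.
dump = "70 00 05 00 00 00 00 0A 00 00 00 00 20 00 00 00 00 00"
key, asc, ascq = parse_fixed_sense(dump)
print(f"Key/ASC/ASCQ = {key:02X}/{asc:02X}/{ascq:02X}")  # Key/ASC/ASCQ = 05/20/00
```

The result matches the Key/ASC/ASCQ line that DMM printed. In the T10 list, 05/20/00 is ILLEGAL REQUEST, INVALID COMMAND OPERATION CODE, meaning the drive did not accept the operation code (B7h) in the CDB.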
Summary
Log Page data gives you a view of a drive's health.
Sense Data gives details when a drive has a command error.
Appendix
Here is a link where you can download the T10 Block and Primary Command documents –
www.stbsuite.com/downloads/T10B-PDocx.zip
For the numeric-order Sense Data list see document SPC4r37.pdf Annex F.
For Log Page information see document SBC4r02.pdf Section 6.4.