STB Suite

March 2011

Ask Dr. SCSI – Write your first app using the Developer Toolbox

Q. “Can you help me get started writing my own tests using the Developer Toolbox API? I don’t know where to start!”

A. I sure can. I have added a step-by-step guide on how to get started writing your very own C project using the Developer Toolbox API. The guide below will walk you through the basics of getting the libraries linked properly and performing a basic task such as gathering your device information.

Get started with the guide here!

For more information or if you would like something more complex, please let me know how I can help get you started!

 

 

Interpreting Errors in the Test Log
When an error occurs in your test, how do you know what the error really is when the log says something like “(Error Code 0x0000045d)”?

In the new STB Suite release, hundreds of error codes are now interpreted in the log files so you can quickly see what an error translates into.

In this example we can see that “Error Code 0x0000045d” translates to: “The request could not be performed because of an I/O device error. ”

Write Test; Random Access; for 3 Minutes

Fixed-Length Transfers of 4 (0x0004) Blocks

Start Block: 0

Data Pattern: Decrementing

Queue Depth = 2

FUA = OFF

02/25/2011 11:12:55 CDB = 2A 00 04 4B 11 5B 00 00 04 00

IO Error: No Error (0x00000000)

Status: SRB= 0x01 HBA= 0x00 Tgt= 0x00

The request could not be performed because of an I/O device error. (Error Code 0x0000045d)

>>> Current Test Aborted <<<

02/25/2011 11:12:55 *** FAIL ***

——————————————————————————

You can view the full list of NT Status Errors that are incorporated into the DMM log files here.
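The CDB line in the log excerpt above can also be read by hand. As a minimal sketch, assuming the standard WRITE(10) layout (opcode 0x2A, FUA in bit 3 of byte 1, a 4-byte big-endian LBA in bytes 2 to 5, and a 2-byte transfer length in bytes 7 to 8), the following Python decodes it. The function name `decode_write10` is made up for this illustration and is not part of the STB Suite or the Developer Toolbox API.

```python
def decode_write10(cdb_text):
    """Decode a WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_text)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    return {
        "fua": bool(cdb[1] & 0x08),                 # Force Unit Access bit
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting logical block
        "blocks": int.from_bytes(cdb[7:9], "big"),  # transfer length in blocks
    }


# The CDB copied from the log excerpt above.
info = decode_write10("2A 00 04 4B 11 5B 00 00 04 00")
print(f"LBA 0x{info['lba']:08X}, {info['blocks']} blocks, "
      f"FUA {'ON' if info['fua'] else 'OFF'}")
```

The decoded transfer length of 4 blocks and the cleared FUA bit agree with the test settings printed at the top of the log.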

 

 

 

 

Disk Drive Troubleshooting 101
Disk Testing 101

Introduction

Disk drives are complex devices, marvels of mechanical engineering and real-time computing magic.

For example, the heads of a rotating magnetic disk physically fly over the moving platters, flying at a height of as little as 3 nanometers at a speed over 128 mph! No wonder bumps and drops can do so much damage – just like flying an extremely fragile airplane into metal-hard ground at 128 mph! Not only can the airplane (drive head) be damaged, but the ground (platter surface) can be dug up, furrowed, and damaged. The magnetic coating on the platter is like a thin layer of topsoil – scraping it away scrapes away data.

Bottom line – be gentle with your disk drives. You may be under pressure to get a large number of drives tested by the end of the day – but take your time. Move them gently and slowly. Never move a drive while it is spinning. Never drop a drive or bump it into anything. And keep the drive cooled while testing – never power up a disk drive without some kind of fan to move air around the drive to draw away the heat it generates.

All disk drives have a built-in computer to control all the physical operations of the drive as well as dealing with data encoding/decoding, queuing, transferring data in and out – a marvel of real-time computing!

Disk drives can work perfectly, fail to work at all, or “sort of” work, performing marginally or poorly. The goal of basic disk drive testing is to:

  1. Determine if the drive is working at all or not
  2. Determine if the drive itself can tell us if it has had a problem in the past
  3. Determine if the drive can reliably store and retrieve data
  4. Determine if drive settings are appropriate for the intended use of the drive, and
  5. Determine the performance characteristics of the drive.

 

Is the drive working or not?

Determining if the drive is alive or not is simply a matter of connecting it to a test system and checking that the drive spins up, is “online”, and can report its capacity. Using STB Suite Original mode, look at the device window or click the Scan System button to scan all of the storage controllers in your test system. Do you see the drive? Does it report a valid capacity? Does the drive information (manufacturer name, drive part number, firmware version) look reasonable?

If the drive does not show up at all on the test machine you must check all cabling and power for the drive test fixture. See if there are indicator LEDs on the drive that show any activity. Listen to see if you can hear that the drive is spinning, or gently feel if you can sense vibration from the drive.

If the drive does not report a reasonable capacity, for example if it reports zero blocks or a negative number of blocks, then the drive may need a low-level format before continuing testing.
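As a rough illustration of this capacity check, the sketch below parses the 8 bytes returned by a SCSI READ CAPACITY(10) command (a 4-byte big-endian last LBA followed by a 4-byte block length) and applies a simple plausibility test. The function names and the thresholds are assumptions made for this example, and the sample bytes are invented rather than read from a real drive. The last lines show one possible way a negative block count can appear: the same bytes read as a signed 32-bit value.

```python
import struct


def parse_read_capacity10(data):
    """Return (num_blocks, block_size) from READ CAPACITY(10) data."""
    if len(data) < 8:
        raise ValueError("READ CAPACITY(10) returns 8 bytes")
    last_lba, block_size = struct.unpack(">II", data[:8])  # unsigned fields
    return last_lba + 1, block_size


def capacity_is_plausible(num_blocks, block_size):
    """Hypothetical sanity check; adjust the limits to your own drives."""
    if num_blocks > 0xFFFFFFFF:
        # Last LBA was 0xFFFFFFFF: the drive is too large for this command
        # and READ CAPACITY(16) is needed to get the real size.
        return False
    return num_blocks > 1 and block_size > 0


# Invented sample data: last LBA 0x1D1C5970, 512-byte blocks.
sample = bytes.fromhex("1D1C5970 00000200")
num_blocks, block_size = parse_read_capacity10(sample)
print(num_blocks, block_size, capacity_is_plausible(num_blocks, block_size))

# The same field read as a signed value can show up as a negative count.
signed_view = struct.unpack(">i", b"\xff\xff\xff\xff")[0]
```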

If the drive information is jumbled or wrong, for instance if the drive is a “SEAGATE” drive but the STB Suite is reporting it as “SEAGGGG”, you may have a dead drive or you may have a cabling/termination problem. Try moving the drive to a different slot, connector, or bus to see if the problem goes away.
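The same kind of check can be scripted. The sketch below assumes the standard SCSI INQUIRY data layout (vendor ID in bytes 8 to 15, product ID in bytes 16 to 31, revision in bytes 32 to 35) and compares the vendor string against a list of expected names. The list, the helper names, and the sample data are all hypothetical and only serve the illustration.

```python
def parse_inquiry(data):
    """Extract the identification strings from standard INQUIRY data."""
    if len(data) < 36:
        raise ValueError("standard INQUIRY data is at least 36 bytes")

    def text(raw):
        return raw.decode("ascii", errors="replace").strip()

    return {
        "vendor": text(data[8:16]),
        "product": text(data[16:32]),
        "revision": text(data[32:36]),
    }


# Hypothetical allow-list; fill in the vendors you actually test.
EXPECTED_VENDORS = {"SEAGATE", "HITACHI", "FUJITSU"}


def vendor_looks_sane(vendor):
    return vendor in EXPECTED_VENDORS


def make_inquiry(vendor, product, revision):
    """Build fake INQUIRY data for this demonstration (not device output)."""
    return (bytes(8)
            + vendor.ljust(8).encode("ascii")
            + product.ljust(16).encode("ascii")
            + revision.ljust(4).encode("ascii"))


good = parse_inquiry(make_inquiry("SEAGATE", "EXAMPLE DISK", "0001"))
bad = parse_inquiry(make_inquiry("SEAGGGG", "EXAMPLE DISK", "0001"))
print(good["vendor"], vendor_looks_sane(good["vendor"]))
print(bad["vendor"], vendor_looks_sane(bad["vendor"]))
```

A failed check here says nothing about the cause; as noted above, the drive, the cabling, or the termination could each be at fault.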

Here is what you should see:

Capacity Ok

 

Can the drive tell us it has had problems?

Disk drives will store historical data which can be retrieved and analyzed. SCSI/FC/SAS drives store this type of information in LOG PAGES. ATA/SATA drives store this information in SATA SMART data.

 

SCSI/SAS/FC Log Pages

A quick way to see an overview of some of the more important Log Pages is to double-click on the drive in the device selection window to bring up the Device Information display. Select the Error Data tab and you will see historical data describing how much data has been read and written and the number of and type of errors that have happened during reads and writes –

Device Information

As a general rule – uncorrected errors are always a bad thing to have. Uncorrected errors usually will cause the LBA in question to be marked as bad.

For SCSI/SAS/FC drives this will mean an increase in the drive’s grown (G) defect list. On the Device Information display click the Statistics tab and note the number of G List defects –

Grown Defects

As another general rule – good disk drives don’t have any grown defects.

For a detailed “raw” data view of every log page a drive has, right-click on the drive in the device selection window, then choose View Log Pages from the Quick Command list.

View Log Pages

 

Be sure to use the Browse button to select a log page definition file – the file “default.dat” is usually fine for any disk drive. The available Log Pages are shown on the left of the display; double-clicking on a Log Page will display that page’s parameters on the right.

Note that you can save all of this information to a file. A good thing to do as you test drives is to build a database of the drives you’ve tested.
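For readers curious about what the raw log page data looks like, here is a minimal sketch of a parser. It assumes the standard SCSI log page layout: a 4-byte page header (page code, subpage, 2-byte page length) followed by parameters, each with a 2-byte parameter code, a control byte, a length byte, and the value. It also assumes every value is a binary counter, which is true for the error counter pages but not for every page. The sample page is built by hand for the demonstration and does not come from a real drive.

```python
def parse_log_page(data):
    """Return (page_code, {parameter_code: value}) for one log page."""
    if len(data) < 4:
        raise ValueError("log page header is 4 bytes")
    page_code = data[0] & 0x3F
    page_length = int.from_bytes(data[2:4], "big")
    end = min(len(data), 4 + page_length)

    params = {}
    pos = 4
    while pos + 4 <= end:
        code = int.from_bytes(data[pos:pos + 2], "big")
        length = data[pos + 3]
        value = data[pos + 4:pos + 4 + length]
        params[code] = int.from_bytes(value, "big")  # assumes a counter
        pos += 4 + length
    return page_code, params


def make_param(code, value, length=4):
    """Build one fake log parameter for the demonstration."""
    return code.to_bytes(2, "big") + bytes([0x00, length]) + value.to_bytes(length, "big")


# Fake Read Error Counter page (0x03): parameter 0x0003 is total errors
# corrected, parameter 0x0006 is total uncorrected errors.
body = make_param(0x0003, 5) + make_param(0x0006, 0)
page = bytes([0x03, 0x00]) + len(body).to_bytes(2, "big") + body

page_code, params = parse_log_page(page)
print(hex(page_code), params)
```

Saving the returned dictionary per drive serial number is one simple way to start the drive database suggested above.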

ATA/SATA SMART Information

This same type of historical error data is found in the SMART data for ATA and SATA drives. To view and save this information, choose ATA/SATA->Commands->View SMART Data from the STB Suite main menu. Select the drive of interest from the lists to the right and the SMART data will be displayed on the left.

Note: to learn about how to interpret SATA SMART data look at http://en.wikipedia.org/wiki/S.M.A.R.T.

The top of the display will show all SMART attributes and will indicate pending problems.

ATA SMART Information

And at the bottom of the display you will see attributes which may indicate the actual number of errors or counts such as Power-On Hours, etc –

SMART Power On Errors

 

Can the drive reliably write and read data?

Obviously a disk drive must be able to reliably store and retrieve data. The best way to test these two functions is to run a test which first writes a known data pattern to every block on the drive. Then every block on the drive is read back and, using data compare, each block is checked to ensure that it is exactly the same as what was written.
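The write-then-compare idea can be sketched in a few lines of Python. This is only an illustration of the technique, not how the STB Suite implements it: an in-memory buffer stands in for the drive, and the data pattern (each block stamped with its own LBA) and all function names are assumptions made for the example.

```python
import io

BLOCK_SIZE = 512


def pattern_block(lba, block_size=BLOCK_SIZE):
    """Known pattern: 8-byte LBA stamp followed by incrementing bytes."""
    stamp = lba.to_bytes(8, "big")
    filler = bytes(i & 0xFF for i in range(block_size - 8))
    return stamp + filler


def write_pass(dev, num_blocks):
    """Write the known pattern to every block."""
    dev.seek(0)
    for lba in range(num_blocks):
        dev.write(pattern_block(lba))


def compare_pass(dev, num_blocks):
    """Read every block back and return the LBAs that do not match."""
    dev.seek(0)
    mismatches = []
    for lba in range(num_blocks):
        if dev.read(BLOCK_SIZE) != pattern_block(lba):
            mismatches.append(lba)
    return mismatches


dev = io.BytesIO()            # stand-in for a drive
write_pass(dev, 64)
clean = compare_pass(dev, 64)

dev.seek(10 * BLOCK_SIZE + 100)
dev.write(b"\xff")            # simulate one corrupted byte in block 10
damaged = compare_pass(dev, 64)
print(clean, damaged)
```

Stamping each block with its own LBA means a block that lands at the wrong address is caught as well as one whose contents changed.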

Writing and reading the entire drive will of course take some time. Can you reliably determine if the drive’s write/read functionality is OK by checking less than the entire drive? Technically, probably not – what if there is a problem with a block which you didn’t test? Statistically – maybe yes. The choice is up to you – balancing the accuracy of your results with the time it takes to complete a test.

The good news in this regard is that the STB Suite Disk Manufacturing Module (DMM) is extremely efficient and fast – it can test many drives at once. DMM will tell you exactly how long a given test is going to take to complete, so as you test more drives you will soon learn how many drives per hour or day you will be able to test to your company’s specification.

Another choice to be made concerning write/read testing is the access method. The most basic access method is sequential – the test starts at LBA 0 and progresses sequentially through to the last block. Another access method is Random. As the name implies, random access moves through the drive’s blocks in a random manner. An advantage of random access is that it will generate more vibration in the drives under test, which will stress the drive harder. A new STB Suite access method is CPAM. CPAM is a method which creates random access and also guarantees that every block on the drive will be accessed once and only once.
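To show that “scattered access” and “every block exactly once” can be combined, here is one simple way to generate such an order. This is not CPAM’s actual algorithm, which is not described here; it is a sketch that steps through the address range with a stride that shares no common factor with the block count, which is guaranteed to visit each block once before repeating. The function name is hypothetical.

```python
import math
import random


def once_only_order(num_blocks, seed=0):
    """Yield every LBA in 0..num_blocks-1 exactly once, in scattered order."""
    rng = random.Random(seed)
    # Pick a stride that is coprime with num_blocks so the walk below
    # cannot return to a block before all blocks have been visited.
    while True:
        stride = rng.randrange(1, num_blocks) if num_blocks > 1 else 1
        if math.gcd(stride, num_blocks) == 1:
            break
    start = rng.randrange(num_blocks)
    for i in range(num_blocks):
        yield (start + i * stride) % num_blocks


order = list(once_only_order(1000, seed=3))
print(order[:8])
```

A fixed stride is far less random than a true shuffle, but it needs no memory for a visited-block table, which matters when the drive has billions of blocks.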

Getting started with DMM is covered in earlier articles and in videos on our web site at http://www.scsitoolbox.com/Training/

STB Original mode also has a number of canned tests that are more appropriate for testing a single drive at a time. Select a drive, then open the Disk-Tests pulldown in the top menu to see a list of available tests – for example, the Quick QC test checks write/read functionality at the beginning, middle, and end of the drive. The Quick Drive Profile Test will show a good overview of the drive –

Quick Drive Profile

 

Is the drive set up appropriately?

SCSI/SAS/FC drive behavior can be specified by Mode Page settings. To see the most common or important Mode Page settings go back to STB Original mode, double-click on your drive, and choose the Mode Page tab.

Mode Pages

Settings such as read ahead and write caching are shown. To change any setting, click the Change/Edit Mode Pages button.

Mode Select Page

Note: in general you will want Write Caching (WCE) to be ON. This will greatly increase the write speed of the drive.
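For reference, the WCE setting lives in the SCSI Caching mode page (page code 0x08), in bit 2 of byte 2; bit 0 of the same byte is RCD (read cache disable). The sketch below reads both bits from a mode page. It expects the page itself, without the mode parameter header and block descriptors that precede it in MODE SENSE data, and the sample bytes are invented for the example.

```python
def caching_flags(page):
    """Return the WCE and RCD bits from a Caching mode page (0x08)."""
    if len(page) < 3 or (page[0] & 0x3F) != 0x08:
        raise ValueError("not a Caching mode page")
    return {
        "WCE": bool(page[2] & 0x04),  # write cache enabled
        "RCD": bool(page[2] & 0x01),  # read cache disabled
    }


# Invented sample: page code 0x08, page length 0x12, WCE set, RCD clear.
sample_page = bytes([0x08, 0x12, 0x04]) + bytes(17)
flags = caching_flags(sample_page)
print(flags)
```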

There are many settings available via Mode Pages. DMM has a feature whereby you can set up a golden drive with all mode pages set the way you or your customer defines – and then during DMM testing each drive under test will automatically have all of its Mode Pages set to match your golden drive.

What is the performance profile of your drive?

For a quick look at the performance of an individual drive you can select it in STB Original mode, then run a test and watch the real-time performance with Drive Watch – here is a view of the write performance of a drive:

Transfer Rates and Errors

DMM will reveal real-time performance metrics as well as logging them to log files. Here is a similar example to the above, a sequential write test:

Test Status

Note that DMM tells you how long the current test step will take to complete – experiment with this and you will quickly get a feel for estimating how long any given test is going to take to complete.

And the DMM log file for this drive shows:

DMM Log
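The completion-time estimates mentioned above come down to simple arithmetic for a full sequential pass: total bytes divided by sustained transfer rate. The sketch below is a back-of-the-envelope version, not DMM’s own calculation. It assumes a constant transfer rate and takes 1 MB as 1,000,000 bytes; real drives slow down toward the inner tracks, so treat the result as a lower bound.

```python
def estimate_seconds(num_blocks, block_size, mb_per_sec):
    """Rough time for one full sequential pass at a constant rate."""
    total_bytes = num_blocks * block_size
    return total_bytes / (mb_per_sec * 1_000_000)


# Invented example: 2,000,000 blocks of 512 bytes at 100 MB/s.
seconds = estimate_seconds(2_000_000, 512, 100)
print(f"{seconds:.2f} seconds")
```

A write pass followed by a read-compare pass takes roughly twice this figure.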

Summary

Basic testing to confirm overall drive health is easily done with the STB Suite. Use Original mode to examine single drives at a time in detail/depth, and use DMM to test many drives at a time.

STB Original Mode Advantages:

  • Extreme depth of detail available to examine each and every drive setting
  • Versatile set of tests to get a quick snapshot view of drive health

STB Suite DMM Advantages:

  • Multi-threaded, high-speed, multi-drive testing
  • Can test multiple HBAs/controllers simultaneously
  • Extremely detailed logs generated for each drive under test
  • Easy to define any type of test sequence – test sequences can be saved and reloaded

 

 

 

Schedule your GoToMeeting with STB Today!

Do you have questions about how to best use the STB Suite in your business? STB is happy to work with you in an interactive “live” environment to help you get the most out of your Toolbox. The cost? If you are a current Performa customer it is free! The commitment? Training sessions run between 30 and 60 minutes.

Here is a list of some recent customer training sessions that STB has conducted – live, interactive web sessions presented by STB programmers:

  • SSD Manufacturing Testing
  • How to troubleshoot tape drive problems
  • DOD disk purging
  • Multi-drive SATA firmware downloading with the STB Suite
  • Compliance testing

Contact Jeremy Wolfe at (720) 249-2641 today to schedule your own custom training session!