In this Issue:

iSCSI – How to connect to an iSCSI target

Introduction

24 years ago our company and our product were called “SCSI Toolbox” because SCSI was the up-and-coming storage protocol.

In the years since then we’ve changed the name from SCSI Toolbox to Storage Toolbox, as SCSI became one of many supported storage protocols, alongside PATA, SATA, Fibre Channel, SAS, and, as this article discusses, iSCSI. iSCSI stands for Internet SCSI: SCSI commands carried over a TCP/IP network, typically Ethernet.

If you are working with iSCSI storage you will be happy to know that all of the tests and features of the STB Suite will work with iSCSI devices!

You can connect your STB Suite test system to iSCSI devices via a dedicated hardware iSCSI adapter, or you can use the software iSCSI initiator built into Windows.

This article describes how to use the Microsoft iSCSI Initiator to connect to an iSCSI target.

STEP 1: Launch Microsoft iSCSI Initiator

There should be an icon for the Microsoft iSCSI Initiator on your desktop, as in the picture below:

STEP 2: Click on the “Available Targets” tab, as in the picture below:

STEP 3: Log On to the Target

In the picture above, you can see we have several iSCSI targets: one is already connected, while the others are inactive. Click on one of the “Inactive” targets, and then click the “Log On…” button. If all goes well, you should now be connected to that target (see the picture below).
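If the log on does not go well, one quick thing to rule out is basic network reachability of the target. The sketch below is only an illustration and is not part of the STB Suite or the Microsoft initiator: it tries to open a TCP connection to the target's portal. Port 3260 is the standard registered iSCSI port; the function name `portal_reachable` and the address used in the note afterwards are hypothetical.

```python
import socket

ISCSI_PORT = 3260  # standard registered iSCSI port; your target may be configured differently


def portal_reachable(host, port=ISCSI_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This only shows that something is listening on the portal address.
    It does not perform an iSCSI login.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `portal_reachable("192.0.2.10")` (a placeholder address) returns False if nothing answers on port 3260 within three seconds, which points at cabling, addressing, or firewall settings rather than at the initiator.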

STEP 4: Verify Device Manager shows your connected iSCSI devices

Bring up Device Manager and open up the “Disk drives” folder.

Click here to see the iSCSI devices once they’re connected.

Ask Dr. SCSI – DoD-5220/NIST disk purge

Q. Dear Dr. SCSI – My customer requires that as part of our test procedure we “execute a DoD-5220/NIST disk purge”.

My understanding is that that type of purge requires 3 full write passes of the entire drive. I just tried a 4TB drive and can see that it is going to take over 6 hours per pass – 18 hours total!

Is there anything I can do to meet the customer requirement but in a faster way?

A. Yes, you can cut your purge time to one-third of what it was, because of a change made to the DoD 5220/NIST specification in 2012!

Previously the DoD 5220.22-M specification defined an acceptable purge method as this (3-pass method):

2006 – Overwrite all addressable locations with 1) a character, 2) its complement, 3) then a random character and verify.

Clearly three passes were specified at that time. Actually, an argument could be made for four passes, since there is the vague requirement to “verify”. Does that mean verify every block on the drive? One block? The official specification doesn’t make this very clear.
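To make the three-pass pattern concrete, here is a minimal sketch that applies it to an in-memory buffer standing in for a drive. It is an illustration only, not how the STB Suite implements a purge. The fill character 0x55 and the function names are assumptions, and since the specification leaves “verify” vague, the sketch simply reads back the final pass in full.

```python
import io
import os


def three_pass_values(fill_byte=0x55):
    """Yield the byte written on each pass: a character, its complement, a random character."""
    yield fill_byte
    yield fill_byte ^ 0xFF      # bitwise complement of the first character
    yield os.urandom(1)[0]      # one random byte value, repeated for the whole pass


def overwrite_and_verify(device, size, block_size=4096):
    """Overwrite `size` bytes of a seekable binary object three times, then verify.

    Returns the list of byte values used, one per pass.
    """
    used = []
    for value in three_pass_values():
        device.seek(0)
        remaining = size
        while remaining > 0:
            chunk = min(block_size, remaining)
            device.write(bytes([value]) * chunk)
            remaining -= chunk
        used.append(value)

    # Assumed meaning of "verify": read back everything and compare with the last pass.
    device.seek(0)
    if device.read(size) != bytes([used[-1]]) * size:
        raise IOError("verification of the final pass failed")
    return used


fake_drive = io.BytesIO(bytes(16384))   # 16 KiB stand-in for a real device
print([hex(v) for v in overwrite_and_verify(fake_drive, 16384)])
```

Each pass touches every addressable byte, which is why the total time scales directly with the number of passes.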

But that lack of clarity doesn’t matter now because in 2012 the specification was changed to this –

2012 – If neither of the first two options is supported, use the native read and write interface to write at least a single pass with a fixed data value, such as all zeros. Multiple passes or more complex values may alternatively be used.

You can download the newest specification here – nist-sp-800-88-rev1.pdf

Appendix A of this document has all the details – here is a repeat from that appendix –

SCSI Hard Drives This includes SCSI, SAS, Fibre Channel, etc. Clear: Overwrite media by using organizationally approved and validated overwriting technologies/methods/tools. The Clear pattern should be at least a single pass with a fixed data value, such as all zeros. Multiple passes or more complex values may alternatively be used.

The new specification document describes the newer purge methods such as the SANITIZE and CRYPTO ERASE commands, but this article is mainly about the question of “how many passes does the DoD 5220 spec require” – and the answer to that is “one pass”. You can do more passes if you like, but one pass will meet the DoD requirement.
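The time saved is simple arithmetic. The sketch below estimates the duration of one full write pass from drive capacity and sustained write throughput. The 185 MB/s figure is an assumption chosen to roughly match the “over 6 hours per pass” reported in the question; measure your own drive's throughput before relying on the numbers.

```python
def pass_hours(capacity_bytes, throughput_bytes_per_s):
    """Hours needed to write every byte of the drive once at a steady rate."""
    return capacity_bytes / throughput_bytes_per_s / 3600


capacity = 4e12        # 4 TB drive (decimal terabytes)
throughput = 185e6     # assumed sustained write rate, 185 MB/s

one_pass = pass_hours(capacity, throughput)
print(f"1 pass:   {one_pass:.1f} hours")
print(f"3 passes: {3 * one_pass:.1f} hours")
```

With these inputs one pass comes to about 6 hours and three passes to about 18 hours, so dropping from three passes to one saves roughly 12 hours per drive.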

Note: all current purge commands, such as the SANITIZE and CRYPTO ERASE commands, are supported by the STB Suite. In the next STB Suite release, an option to specify either a “1-pass DoD” or a “3-pass DoD” purge will be supported.

Linux Disk Manufacturing Engine (DME)

The Linux DME (Disk Manufacturing Engine) was originally created in 2003 to serve the needs of high-volume mass storage manufacturing and integration companies.

The typical feature set required by this type of customer includes:

The ability to run industry standard tests
High volume (typically hundreds of disks at a time)
The ability to use inexpensive test stations
High throughput – typically 90% of bus max bandwidth

What is the Linux DME?

The Linux DME is a GUI-less Engine which executes STB Suite DMM test sequences on multiple disk drives. Being a command-line program means that the Linux DME may be run on any computer running the Linux operating system. There is no requirement for any type of graphics console – Linux DME can be run across a network on “headless” computer systems.

[Figure: Linux DME workflow]

How is Linux DME invoked?

Linux DME is started from a command-line, passing command-line parameters to describe:

Which test sequence file to run
Which drives to test
Where log files will be written
Whether to loop the test sequence by time or repeat count

Command-line Example:

[Figure: Linux DME command-line parameters]

Output Log File For a 10 Test Sequence Example:

--------------------------------------------------------------------------------
libLinuxPSSL.so Version: 8.8.1.140124
dmm_silent Version: 8.8.1.140124

Test Date: 04/27/2016 14:39:02

Script Filename: ReadWrite_QD1_thru_QD16-v872.seq

sysname: Linux
nodename: localhost.localdomain
machine: x86_64

Device Info: 2:1:0 ATA ST4000NM0053-1C1 SS03 Serial: Z1Z08LYE

View the Full Linux DME Test log available online here.
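Because the log is plain text, its header is easy to pull into a script. The following is a minimal sketch that parses the header lines shown above into a dictionary. It is not part of the STB Suite. The field names given to the pieces of the “Device Info” line (address, vendor, model, firmware) are my reading of that one sample line and should be treated as assumptions.

```python
import re

SAMPLE_HEADER = """\
libLinuxPSSL.so Version: 8.8.1.140124
dmm_silent Version: 8.8.1.140124

Test Date: 04/27/2016 14:39:02

Script Filename: ReadWrite_QD1_thru_QD16-v872.seq

sysname: Linux
nodename: localhost.localdomain
machine: x86_64

Device Info: 2:1:0 ATA ST4000NM0053-1C1 SS03 Serial: Z1Z08LYE
"""

# Assumed layout of the device line: address, vendor, model, firmware, serial.
DEVICE_RE = re.compile(
    r"^Device Info:\s+(?P<address>\S+)\s+(?P<vendor>\S+)\s+"
    r"(?P<model>\S+)\s+(?P<firmware>\S+)\s+Serial:\s+(?P<serial>\S+)$"
)


def parse_header(text):
    """Collect 'key: value' header lines; the device line becomes a nested dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        match = DEVICE_RE.match(line)
        if match:
            fields["device"] = match.groupdict()
        elif ": " in line:
            key, value = line.split(": ", 1)   # split on the first ': ' only
            fields[key] = value
    return fields


header = parse_header(SAMPLE_HEADER)
print(header["Script Filename"])
print(header["device"]["model"], header["device"]["serial"])
```

Splitting on the first “: ” keeps values that themselves contain colons, such as the test time, in one piece.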

Linux DME Features:

Extremely high I/O throughput
Executes industry standard STB Suite test sequences
Individual test threads for each drive under test
Detailed text log files
CSV log files
GUI-less command-line program runs on any Linux computer
C++ source code included!
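The “individual test threads for each drive” idea can be sketched in a few lines. This is a conceptual illustration in Python, not the DME's C++ source: `run_on_all_drives` and `fake_sequential_read` are hypothetical names, and the drive addresses are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_all_drives(drives, test_one_drive):
    """Run test_one_drive(drive) for every drive in parallel.

    The pool is sized to allow up to one worker thread per drive, so a slow
    drive does not hold up the others. Returns {drive: result}.
    """
    if not drives:
        return {}
    with ThreadPoolExecutor(max_workers=len(drives)) as pool:
        results = list(pool.map(test_one_drive, drives))
    return dict(zip(drives, results))


def fake_sequential_read(drive):
    # Hypothetical stand-in for a real test step; always reports success.
    return "PASS"


print(run_on_all_drives(["2:1:0", "2:2:0"], fake_sequential_read))
```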

Other Linux Tools:

The STB Suite includes the following tools for the Linux platform:

The LinuxDME
The Linux DTB (Developers Toolbox) Library
- A .so library with hundreds of functions, from single commands through full multi-threaded tests

The Linux SCSITool
- A command-line utility for single-drive test and firmware download

Summary:

The Linux DME piece of the STB Suite provides a flexible solution for high volume disk testing.

Flexible, detailed text and CSV logs make it easy to import results into any database or reporting package.

Linux DME is written using the STB Suite DTB (Developers Toolbox) for easy modification or extension.

Full source code for the DME is included!

SATA Drive Errors

A Few Common Questions About Drive Errors

Here are a few common questions which come into our support department about disk drive errors. In particular this article will discuss SATA disk drives.

1. What causes errors?

2. How does the drive deal with errors?

3. How do you check for errors?

4. Can errors be repaired or removed?

5. How many errors is too many?

6. Can I reset the error counters?

Drive Error Types – What Causes Errors?

Physical Damage

There is a common and fairly accurate analogy that a magnetic disk drive (rotating platters with heads “flying” over them) is like a super-fast airplane flying hundreds of miles per hour at a very low height, say 10-20 feet.

If in this analogy there is a 50-foot-tall boulder or piece of debris in the path of the airplane, there will be a “crash”. There can be damage to the airplane (disk heads), and there can be damage to the boulder (disk platter).

This is a very bad thing: the crash physically scrapes your data off of the drive platter, and the resulting debris can lead to more crashes. Debris can appear in the drive as the drive ages, or if the drive has been subjected to physical abuse or impact.

Another way that physical damage can happen is if the drive is running and is “jarred” or impacted. Think of bumping or even dropping your laptop computer as it is running. The impact can travel to the disk drive and cause the disk heads to “slap” into the media, causing physical damage, pieces of the media to flake off, etc. Remember, a head-slap is similar to an “airplane flying into a boulder” slap – probably a very bad thing!

“Bit Rot”

“Bit rot” is the term for small changes that develop in areas of the drive media, heads, or circuitry as the drive ages. These changes may be very slight, but they may make areas of the drive “weaker”, or less able to hold the magnetic pattern (your data) written on those areas. Or perhaps there was a very brief, slight power glitch just as the data was being written to or read from the drive. In either case the media may not have any physical damage, but the data on it isn’t correct.

How does the drive deal with errors?

When data is written to a disk it includes error-checking data (ECC) which the drive can use to:

check that the data being read is still valid, and
possibly correct the data if it is no longer valid.

ECC methods can detect a single bad bit of data out of the entire block (normally 512 bytes or 4096 bits). That’s pretty impressive error detection and correction!
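To show the principle of detecting and then correcting a single bad bit, here is a toy example using the classic Hamming(7,4) code, which protects 4 data bits with 3 check bits. This is a deliberately small stand-in: real drives use far stronger codes over whole sectors, and nothing here reflects any particular drive's firmware.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); position 0 means no error was detected."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # 1-based position of the flipped bit
    if position:
        c[position - 1] ^= 1          # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], position


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1                        # simulate one bit going bad on the media
data, position = hamming74_decode(stored)
print(data, position)                 # [1, 0, 1, 1] 5
```

The check bits are written along with the data, so on a later read the drive can tell not only that a bit changed but which one, and flip it back.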

If the data cannot be recovered using ECC methods then the computer or drive will probably retry the read or write operation a few times to see if the error was just a fluke.

If the data still cannot be recovered the drive will try to reallocate the bad block – moving the data from the bad block to one of the drive’s spare blocks.

Differences between SAS/SCSI/FC and SATA

Since this article is specifically about SATA drives, we won’t take too much time to discuss SAS/SCSI/FC drives, other than to point out that SAS/SCSI/FC drives allow the user to adjust the retry and ECC process. SATA drives do not allow these adjustments to be made.

Differences Between “Consumer” and “Enterprise” Drives

Enterprise class drives are typically configured in the server into RAID arrays. RAID arrays can deal with errors without help from the drive, and so they don’t need or want the drive to take the time to try to do its own error correction. Drive error correction methods take finite amounts of time.

In fact a SAS/SCSI/FC drive will show you various types of error counts: error corrections which took a lot of time (probably retries) and those that took less time (probably ECC).

Even non-RAID enterprise applications may need the drive to treat errors differently from one case to another. For example, a system collecting video data, which comes in very fast and can only be captured once, may want to skip all types of error correction and just try to capture all of the data, errors or not.

Contrast that with a disk drive holding your bank balance, where speed is nowhere near as important as data perfection.

As mentioned above, SAS/SCSI/FC drives allow you to adjust these error correction methods while SATA drives do not.

This is why there are “desktop” SATA drives and “enterprise” SATA drives: the drives have different firmware to deal appropriately with errors in the intended application or use of the drive.

How to Monitor or Check for Drive Errors

To check for drive errors on a SATA drive you need to look at the drive’s SATA SMART DATA ATTRIBUTES, in particular ATTRIBUTES 5 (Reallocated Sector Count: sectors or blocks which have already been reallocated) and 197 (Current Pending Sector Count: sectors waiting to be reallocated). In the STB Suite you can see these ATTRIBUTES by using the top menu ATA/SATA->Commands->View SMART Data function. In multi-drive mode (DMM) you use the SMART test step to record the ATTRIBUTES to the log files and also to screen or fail drives which exceed your chosen thresholds for these counts.
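The screening logic itself is straightforward. The sketch below shows the idea of failing a drive when ATTRIBUTE 5 or 197 exceeds a chosen threshold. It is an illustration under assumptions, not the STB Suite's SMART test step: the function name, the dictionary layout, and the sample values are all hypothetical, and obtaining the raw values from a drive is outside its scope.

```python
WATCHED = {
    5: "Reallocated Sector Count",
    197: "Current Pending Sector Count",
}


def screen_drive(raw_values, thresholds):
    """Return a list of failure messages; an empty list means the drive passes.

    raw_values: {attribute_id: raw count} as read from the drive
    thresholds: {attribute_id: highest count you are willing to accept}
    """
    failures = []
    for attr_id, name in WATCHED.items():
        value = raw_values.get(attr_id)
        limit = thresholds.get(attr_id, 0)   # default: accept no errors at all
        if value is None:
            failures.append(f"attribute {attr_id} ({name}) not reported")
        elif value > limit:
            failures.append(f"attribute {attr_id} ({name}) = {value}, limit {limit}")
    return failures


# Hypothetical readings: three reallocated sectors, none pending.
print(screen_drive({5: 3, 197: 0}, {5: 0, 197: 0}))
```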

Can Errors be repaired or removed?

SATA drives will automatically check for errors every time a READ or a WRITE operation is executed. If the READ/WRITE fails then the drive will try to recover the data and possibly mark the sector as bad and reallocate it to a spare sector. If that happens you should see the SMART ATTRIBUTES 5 and/or 197 increment.

Note: this repair or reallocation is automatic in SATA drives, so a good way to scan a drive to try to get any bad sectors reallocated is to use the STB Suite to do a simple Sequential Read test to the entire drive – all blocks.

How Many Errors is Too Many?

In my personal opinion and use I would never use a drive to hold valuable data if it had any reallocated sectors.

That means I personally would reject any drive with > 0 reallocated sectors. That’s just my opinion – here is my reasoning:

If the drive reallocated a sector it will always be because of a “problem”. That problem could be as benign as a power fluctuation during the read or write, where there really isn’t anything in the drive causing the problem. On the other hand, that error could be because the drive has been impacted while running, causing head slap and media damage. You can’t know why a sector was reallocated, only that it was reallocated. I would not take the chance. But of course the threshold settings used to reject a drive for this cause are 100% user decided and defined in the STB Suite.

Can I reset the Error Counters?

In a word – “no”.

Summary

In one sense SATA drives are simple when it comes to discovering and trying to correct or repair errors. There are really no user-accessible adjustments to change. A simple sequential read test will discover and hopefully correct any errors found.

A DMM test sequence to do this would look something like this:

Note: This is a non-destructive test – it will not damage, change, or overwrite any user data.

Use the SMART test step to record all of the pre-test SMART ATTRIBUTE values to the .log files.
Do a Sequential Read test of the entire drive, exactly like this –
Then do another SMART test step – so you can compare ATTRIBUTES 5 & 197 to see if any new bad sectors were discovered and remapped. Your test sequence should look like this –
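The before-and-after comparison in the last step amounts to subtracting two sets of readings. Here is a minimal sketch of that comparison; the function name and the sample readings are hypothetical, and in practice the values would come from the two SMART test steps recorded in the .log files.

```python
WATCHED = (5, 197)   # Reallocated Sector Count, Current Pending Sector Count


def smart_deltas(before, after, watched=WATCHED):
    """Return {attribute_id: change} between the pre-test and post-test readings."""
    return {attr: after[attr] - before[attr] for attr in watched}


# Hypothetical readings taken before and after the full sequential read.
pre_test = {5: 0, 197: 2}
post_test = {5: 2, 197: 0}

print(smart_deltas(pre_test, post_test))   # {5: 2, 197: -2}
```

In this made-up case ATTRIBUTE 5 went up by two while ATTRIBUTE 197 went down by two, which is consistent with two pending sectors having been reallocated during the test. Any non-zero change is worth a look.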

What is Performa?

Performa is the STB Suite annual support and maintenance plan.

In most cases each purchase of the STB Suite includes 12 months of Performa coverage.

What does that coverage include?

Updates to the STB Suite
- There are typically two major updates to the STB Suite per year. In between these major updates there are typically a number of maintenance updates which fix bugs and occasionally introduce new features. With Performa coverage you are entitled to all of these.
Product Support
- Performa coverage provides you with contact with our development team, to answer questions, discuss changes or improvements, etc. With decades of storage experience our support team is willing and able to help you. Our world-class support typically responds to email support issues within one hour!
New License discounts
- SCSI Toolbox now offers attractive discounts on new licenses when you keep your licenses covered by the Performa program.

June 2016