What is the SCSI Toolbox JEDEC Application?
SCSI Toolbox’s new JEDEC application implements the Endurance/Stress Testing of solid state drives outlined in JEDEC documents JESD218A and JESD219. These documents specify an extremely complex I/O generation pattern: a range of transfer sizes, each with a specific probability; particular sections of the drive targeted with varying probabilities; and transfers of 4K or larger aligned on 4K boundaries. In addition to these requirements, when testing more than 1 drive, each drive must have its targeted sections shifted by 5%. The details of all these requirements are discussed in the sections below.
What does the SCSI Toolbox JEDEC Application do?
The SCSI Toolbox JEDEC Application issues 72 different types of I/Os! Each type of I/O is defined by the following information:
- Write or Read
- Transfer Size (1 block through 128 blocks)
- Section of drive to target (first 5% of drive, next 15% of drive, final 80% of drive)
- Probability of the I/O, that is, how frequently the I/O must be issued
Below are some examples of what each I/O “looks” like:
Example 1: Write, 1 Block, target first 5% of drive, probability = 1%
Example 2: Read, 4 Blocks, target section 5-to-20% of drive, probability = 0.15%
Example 3: Write, 8 Blocks, target final 80% of drive, probability = 6.7%
How are we getting these probabilities? The JEDEC documents specify that the first 5% of the drive must be targeted 50% of the time, the next 15% of the drive must be targeted 30% of the time, and the final 80% of the drive must be targeted 20% of the time. In addition, the probabilities assigned to each transfer size are as follows:
Transfers of 1 Block must have 4% probability
Transfers of 2 Blocks must have 1% probability
Transfers of 3 Blocks must have 1% probability
Transfers of 4 Blocks must have 1% probability
Transfers of 5 Blocks must have 1% probability
Transfers of 6 Blocks must have 1% probability
Transfers of 7 Blocks must have 1% probability
Transfers of 8 Blocks must have 67% probability
Transfers of 16 Blocks must have 10% probability
Transfers of 32 Blocks must have 7% probability
Transfers of 64 Blocks must have 3% probability
Transfers of 128 Blocks must have 3% probability
And finally, Writes must have 50% probability while Reads must also have 50% probability.
Putting all of these together, let’s see how we got the probability in our three examples above.
Example 1: The Write must occur with 50% probability, 1 block transfers must occur with 4% probability, and the first 5% of the drive must occur with 50% probability. Multiplying these out we get
(0.5) * (0.04) * (0.5) = 0.01 (which is 1% probability)
Example 2: The Read must occur with 50% probability, 4 block transfers must occur with 1% probability, and the section of the drive from the 5% mark to the 20% mark of the drive must occur with 30% probability. Multiplying these out we get
(0.5) * (0.01) * (0.3) = 0.0015 (which is 0.15% probability)
Example 3: The Write must occur with 50% probability, 8 block transfers must occur with 67% probability, and the final 80% of the drive must occur with 20% probability. Multiplying these out we get
(0.5) * (0.67) * (0.2) = 0.067 (which is 6.7% probability)
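The same multiplication can be carried out for every combination at once. The following is a minimal sketch in Python, not the application's own code: the names DIRECTION, TRANSFER_BLOCKS, REGION, and build_profiles are made up for illustration, and the probabilities are the ones listed above. It confirms that 2 directions, 12 transfer sizes, and 3 drive sections give 72 I/O types whose probabilities add up to 100%.

```python
from itertools import product

# Probabilities as listed in the text above.
DIRECTION = {"Write": 0.50, "Read": 0.50}
TRANSFER_BLOCKS = {1: 0.04, 2: 0.01, 3: 0.01, 4: 0.01, 5: 0.01, 6: 0.01,
                   7: 0.01, 8: 0.67, 16: 0.10, 32: 0.07, 64: 0.03, 128: 0.03}
REGION = {"first 5%": 0.50, "next 15%": 0.30, "final 80%": 0.20}


def build_profiles():
    """Return {(direction, blocks, region): probability} for every combination."""
    return {
        (d, b, r): p_dir * p_size * p_region
        for (d, p_dir), (b, p_size), (r, p_region) in product(
            DIRECTION.items(), TRANSFER_BLOCKS.items(), REGION.items())
    }


profiles = build_profiles()
print(len(profiles))  # 72

# The three examples from the text, shown as percentages.
for key in [("Write", 1, "first 5%"),
            ("Read", 4, "next 15%"),
            ("Write", 8, "final 80%")]:
    print(key, f"{profiles[key] * 100:.2f}%")
```

Because the three choices are treated as independent, the probability of each I/O type is simply the product of its three component probabilities.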
How does the SCSI Toolbox JEDEC Application guarantee all the assigned probabilities?
The SCSI Toolbox JEDEC Application issues 72 different types of I/Os, which means the application must guarantee 72 probabilities, one for each I/O! The engineers at SCSI Toolbox have developed a function that uniquely maps random numbers to each of the 72 I/Os. It is beyond the scope of this document to describe how this function works, but it suffices to say that each of the 72 I/Os is chosen “randomly” with its assigned probability. One cannot guess what sequence of I/Os will be generated. As an example, after running the application for say 1,000,000,000 I/Os (1 billion I/Os), approximately 6.7% of these 1 billion I/Os (or 67,000,000 I/Os) will fit the profile described in Example 3.
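The application's actual mapping function is not described here. Purely as an illustration of the general idea, the sketch below uses a standard technique (a cumulative probability table searched with a uniform random number) to pick one of the 72 I/O types with its assigned probability. It is an assumption, not the SCSI Toolbox implementation, and the names PROFILES, CUMULATIVE, pick_profile, and observed_fraction are hypothetical.

```python
import bisect
import itertools
import random

# Probabilities as listed earlier in this document.
DIRECTION = {"Write": 0.50, "Read": 0.50}
TRANSFER_BLOCKS = {1: 0.04, 2: 0.01, 3: 0.01, 4: 0.01, 5: 0.01, 6: 0.01,
                   7: 0.01, 8: 0.67, 16: 0.10, 32: 0.07, 64: 0.03, 128: 0.03}
REGION = {"first 5%": 0.50, "next 15%": 0.30, "final 80%": 0.20}

# All 72 (direction, blocks, region) profiles with their joint probabilities.
PROFILES = [
    ((d, b, r), p_dir * p_size * p_region)
    for (d, p_dir), (b, p_size), (r, p_region) in itertools.product(
        DIRECTION.items(), TRANSFER_BLOCKS.items(), REGION.items())
]

# Running totals: profile i owns the interval [CUMULATIVE[i-1], CUMULATIVE[i]).
CUMULATIVE = list(itertools.accumulate(p for _, p in PROFILES))


def pick_profile(rng):
    """Map one uniform random number to one of the 72 profiles."""
    u = rng.random() * CUMULATIVE[-1]
    index = bisect.bisect_right(CUMULATIVE, u)
    # Guard against floating-point rounding at the very top of the range.
    return PROFILES[min(index, len(PROFILES) - 1)][0]


def observed_fraction(profile, samples, seed=0):
    """Fraction of `samples` random picks that match `profile`."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples) if pick_profile(rng) == profile)
    return hits / samples


if __name__ == "__main__":
    # Example 3 from the text; the result should land close to 0.067.
    print(observed_fraction(("Write", 8, "final 80%"), 100_000))
```

With enough samples, the observed fraction for each profile settles near its assigned probability, which is the behavior the paragraph above describes for Example 3.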
On the screen I see the column “Data Errors” – what is it?
Data Errors are errors that occur from either a data miscompare or the inability of the drive to retrieve the data from a block on the drive (that is, the drive reports a check condition with a sense key of 0x03). The “Data Errors” column on the main display shows the number of these data errors. This number will almost always be 0. As an example, if qualifying 31 Enterprise Class drives with a TBW Rating of 100, you can only afford 1 data error over ALL 31 drives! If qualifying 31 Client Class drives with a TBW Rating of 100, you can afford 22 data errors.
How do I use “Update TBW Rating Information”?
Clicking “Update TBW Rating Information” displays guidelines for how many drives you must test, and how many “data errors” and “errors” you can afford, to achieve your desired TBW Rating. The TBW Rating is the number of terabytes you must write to each and every drive under test while satisfying certain mathematical equations specified in JEDEC document JESD218A, namely equations (2) and (3) from section 7.1.1. The information display applies equations (2) and (3) to the information you have supplied (the type of drives you are qualifying, either “Client Class” or “Enterprise Class”, and the targeted TBW Rating) and then displays the number of drives you must qualify if you “allow” for a certain number of “data errors” and “errors”.
It is helpful to see a concrete example of using Equations (2) and (3). In our example we will assume you are qualifying “Client Class” drives and want to achieve a TBW rating of 100. The two equations become
(2) UCL(errors) <= 0.03 * SS
(3) UCL(data-errors) <= 100 * 8 * 10^12 * 10^(-15) * SS
Here SS = Sample size (i.e. the number of drives to test)
If you get 0 errors, the UCL value is 0.92, so Equation (2) requires
(2) 0.92 <= 0.03 * SS (or 30.67 <= SS)
This means you MUST test at least 31 drives!
Now we must also satisfy Equation (3). For 0 data-errors, the UCL value is 0.92, so the equation becomes
(3) 0.92 <= 100 * 8 * 0.001 * SS (or 1.15 <= SS).
So Equation (2) is more “restrictive” in this case: it indicates you must test at least 31 drives, while Equation (3) indicates you must test at least 2 drives. Summarizing, to satisfy both Equations (2) and (3) with 0 data errors and 0 errors you must test at least 31 drives!
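The arithmetic above can be reproduced with a few lines of Python. This is a sketch under the assumptions of the worked example only: the 0.03 factor in Equation (2), the 10^(-15) factor in Equation (3) for Client Class drives, and UCL(0) = 0.92 are taken from the text, and the function names are hypothetical rather than part of the application.

```python
import math

UCL_ZERO = 0.92  # UCL value for 0 observed events, as quoted in the text


def min_drives_for_errors(ucl_errors, factor=0.03):
    """Smallest SS satisfying Equation (2): UCL(errors) <= factor * SS."""
    return math.ceil(ucl_errors / factor)


def min_drives_for_data_errors(ucl_data_errors, tbw_terabytes, rate=1e-15):
    """Smallest SS satisfying Equation (3):
    UCL(data-errors) <= TBW * 8 * 10^12 * rate * SS."""
    per_drive = tbw_terabytes * 8 * 1e12 * rate
    return math.ceil(ucl_data_errors / per_drive)


def min_sample_size(ucl_errors, ucl_data_errors, tbw_terabytes, rate=1e-15):
    """Both equations must hold, so the larger requirement wins."""
    return max(min_drives_for_errors(ucl_errors),
               min_drives_for_data_errors(ucl_data_errors, tbw_terabytes, rate))


print(min_drives_for_errors(UCL_ZERO))                # 31
print(min_drives_for_data_errors(UCL_ZERO, 100))      # 2
print(min_sample_size(UCL_ZERO, UCL_ZERO, 100))       # 31
```

Taking the maximum of the two results reflects the point made above: whichever equation is more restrictive determines the number of drives.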
Another question that arises is “Can I have any errors at all?” The answer is yes. In our example above, with 31 drives you can get 22 data errors and still satisfy both Equations (2) and (3). The reason is that UCL(22) = 23.89 (retrieved from Table 2, section 7.1.1) and so Equation (3) becomes
(3) 23.89 <= 100 * 8 * 0.001 * 31 (or 23.89 <= 24.8)
If you followed this example, note you cannot afford to get 23 data-errors since UCL(23) = 24.92.
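The 22-versus-23 comparison can be checked the same way. The sketch below is illustrative only: UCL_TABLE holds just the three UCL values quoted in this document (the full table is in JEDEC Table 2), the 10^(-15) factor is the Client Class value used in the example, and the function names are hypothetical.

```python
# Only the UCL values quoted in this document; see JEDEC Table 2 for the rest.
UCL_TABLE = {0: 0.92, 22: 23.89, 23: 24.92}


def data_error_limit(tbw_terabytes, sample_size, rate=1e-15):
    """Right-hand side of Equation (3): TBW * 8 * 10^12 * rate * SS."""
    return tbw_terabytes * 8 * 1e12 * rate * sample_size


def data_errors_allowed(count, tbw_terabytes, sample_size, rate=1e-15):
    """True if `count` data errors still satisfies Equation (3)."""
    return UCL_TABLE[count] <= data_error_limit(tbw_terabytes, sample_size, rate)


print(data_errors_allowed(22, 100, 31))  # True
print(data_errors_allowed(23, 100, 31))  # False
```

For a TBW Rating of 100 and 31 drives the limit is 24.8, so 22 data errors (UCL 23.89) passes and 23 data errors (UCL 24.92) does not.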
There are hundreds of combinations of number of drives to test, data-errors, and errors. When you click “Update TBW Rating Information” it simply displays three of these combinations. The three combinations it displays are
- Number of Data-Errors = 0, Number of Errors = 0, Number of drives to test
- Number of Data-Errors = 1, Number of Errors = 0, Number of drives to test
- Number of Data-Errors = 0, Number of Errors = 1, Number of drives to test
What is the typical usage of the SCSI Toolbox JEDEC Application?
The typical usage is to exercise a set of drives by stressing them with the I/O pattern generation described in the sections above (that is, generating 72 I/O profiles, each with its own direction, transfer length, target section of the drive, and assigned probability). This type of Endurance/Stress Testing is described in JEDEC documents JESD218A and JESD219.
First and foremost, the SCSI Toolbox JEDEC Application is designed to implement the extremely complex Endurance/Stress Testing described in JESD218A and JESD219. A second usage of the application is to gather statistical information on the drives under test so that the manufacturer can get a TBW (Terabytes Written) rating for their particular class of drives. The statistical information gathered is the number of Data-Errors for each drive under test and the total number of Data-Errors for all drives. It also gathers error information for each individual drive and the total number of errors for all drives. This statistical information is continually fed into equations (2) and (3) from JEDEC document JESD218A to determine whether or not the desired TBW Rating is achievable. When the TBW Rating is NOT achievable, a RED light will be displayed.
Can I use the SCSI Toolbox JEDEC Application for anything other than getting a TBW Rating?
The SCSI Toolbox JEDEC Application can also be used to stress test a drive. The application does NOT simply issue a bunch of Writes to a drive and then a bunch of Reads to a drive. The application issues 72 different types of I/Os! These 72 different I/Os are randomly distributed across the drive, with each I/O profile issued at its assigned probability. The application truly stresses the drive!