Can you tell me how to test if vibration in my enclosure could be causing problems?


I saw a video on the internet illustrating how vibration can affect drive performance and even cause errors.

You and the video are absolutely correct – vibration from or within a drive enclosure can cause problems. Here’s how to use the STB Suite to test for this.

To do this test we will run two simultaneous instances of STB Suite in Multi-Drive Mode (DMM). One instance selects all but one of the drives and runs a random access test on them. This sets up the conditions for maximum vibration within the enclosure as each drive seeks randomly.

While those drives are shaking themselves and the enclosure, we will run a sequential Write and then Read test on the remaining drive. This drive should be a SAS, SCSI, or FC drive so we can monitor the drive's Log Pages for various types of uncorrected and corrected errors.

 

Here are the steps to accomplish this vibration testing:

  1. Load up your enclosure with your normal load of drives. We recommend that at least one of the drives be a SAS, SCSI, or FC drive because these types of drives have much better error reporting than SATA drives. It’s OK for only one drive to be SAS/SCSI/FC.
    In this example we have a total of 8 drives.
  2. Start an instance of the STB Suite in Multi-Drive Mode (DMM)
  3. Now run a quick test to get an error baseline picture of your main test drive. Select the SAS/SCSI/FC drive that will be running the sequential test and run this short test sequence –
  4. Run this test on one drive. This will record what type of errors this drive will report during a sequential test with no induced vibrations. We will run this baseline test on Target 264.
  5. Now start another instance of STB Suite – Multi-Drive Mode. Leave the first instance up and running.
    We will use DMM Instance #1 to induce the vibration by running Random seek tests on all but one drive.

    Define the Random test in DMM #1 like this:
  6. Select all but one drive in DMM #1. We will leave the drive at Target 264 unselected.
  7. Now in DMM #2 define the sequential Write and Read test, like this:

    Note that the Write and Read test uses the default transfer size of 128 blocks and does NOT have the Advanced Option FUA Set turned on.
  8. Select the drive that was not selected in step 6 – Target 264
  9. Start the test on DMM #1, then start the test on DMM #2
  10. When the Sequential test on DMM #2 finishes, open the DMM Log File for Target 264 and examine the Write and Read Error Log Page entries. This .Log file will contain the baseline Log Page results first, measured while the enclosure was not vibrating, and then at the end of the file the Log Page results from when the other drives were vibrating the enclosure.

Here are the Write and Read Error Count Log page baseline values –

And here are the Write and Read Error Count Log Pages while the other drives were vibrating –

In our case there is a very small increase in the number of corrected errors while the enclosure is vibrating. With a difference this small it would be prudent to run the tests above for a longer period. We recommend changing the test definitions to run for the same length of time as your standard disk tests, and then checking how much the error counts increase over that period.
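When the baseline and the vibration run last for different lengths of time, raw error counts are hard to compare directly. One way to put them on equal footing is to divide the corrected-error count by the amount of data transferred. The sketch below is only an illustration in Python: the function name and all of the numbers are hypothetical, and it assumes you have copied a total corrected-error count and a total-bytes-processed figure for each run out of the Log Page entries by hand.

```python
def errors_per_terabyte(total_corrected: int, bytes_processed: int) -> float:
    """Return corrected errors per terabyte (10**12 bytes) transferred."""
    if bytes_processed <= 0:
        raise ValueError("bytes_processed must be positive")
    return total_corrected / (bytes_processed / 1e12)


# Hypothetical values typed in for illustration, not taken from a real drive log.
baseline_rate = errors_per_terabyte(total_corrected=12, bytes_processed=2 * 10**12)
vibration_rate = errors_per_terabyte(total_corrected=30, bytes_processed=4 * 10**12)

print(f"baseline:  {baseline_rate:.2f} corrected errors/TB")
print(f"vibration: {vibration_rate:.2f} corrected errors/TB")
```

With these made-up numbers the vibration run has more errors in absolute terms (30 versus 12), but the rate only rises from 6.0 to 7.5 corrected errors per TB, which is a much smaller difference than the raw counts suggest.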

If the second set of Write and Read Error Log Page entries shows significantly more errors, particularly corrected errors, then you likely do have a vibration issue in your enclosure.

If you see an increase in uncorrected errors during vibration, run the test again for a longer time. If you still get uncorrected errors during vibration, that indicates a serious vibration issue with your enclosure.
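The decision rules above can be written down as a small check. This is a minimal sketch in Python, not part of the STB Suite: the `ErrorCounts` class, the `assess_vibration` function, and the `significant_ratio` threshold of 2.0 are all assumptions made for illustration, since what counts as "significantly more" depends on your drives and test length. The counts are assumed to be read by hand from the two sets of Log Page entries.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    corrected: int    # total corrected errors read from the Log Page entries
    uncorrected: int  # total uncorrected errors read from the Log Page entries


def assess_vibration(baseline: ErrorCounts, vibration: ErrorCounts,
                     significant_ratio: float = 2.0) -> str:
    """Classify the change between the baseline run and the vibration run.

    significant_ratio is a hypothetical threshold: the vibration run counts
    as "significant" when corrected errors reach that multiple of baseline.
    """
    # Any rise in uncorrected errors takes priority: rerun for longer, and
    # treat it as a serious vibration issue if it persists.
    if vibration.uncorrected > baseline.uncorrected:
        return "uncorrected"
    if vibration.corrected <= baseline.corrected:
        return "none"
    # max(..., 1) avoids treating any single error as significant
    # when the baseline count is zero.
    if vibration.corrected >= max(baseline.corrected, 1) * significant_ratio:
        return "significant"
    return "small"


# Hypothetical counts for illustration only.
print(assess_vibration(ErrorCounts(100, 0), ErrorCounts(104, 0)))
print(assess_vibration(ErrorCounts(100, 0), ErrorCounts(450, 0)))
print(assess_vibration(ErrorCounts(100, 0), ErrorCounts(120, 3)))
```

A result of "small" corresponds to the case where a longer run is worthwhile, "significant" to a likely vibration issue, and "uncorrected" to the case that calls for a longer rerun before concluding the enclosure has a serious problem.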

If your normal testing screens drives on Log Page error pages or parameters, be certain to take this vibration issue into account so you won’t be fooled by errors caused by enclosure vibration rather than by actual drive problems.