Wednesday, July 6, 2011

NetApp Experience: Think on your feet (Part 3)

One of the little-known intricacies of FAS systems is the 3-5 second rule. It's kind of like the five-second rule for food that hits the floor, except instead of germs on your food, you get a panicked filer.

The 3-5 second rule is not science. It's not best practice. It's not in a white paper. It's just experience. What the rule says is this: you can unplug a shelf cable and plug it back in, as long as you do it within 3-5 seconds. As a computer engineer, I find that 40% ambiguity frightening (a 2-second spread on a 5-second limit). But the principle is sound: if a shelf briefly loses contact with the controller or with the rest of the shelves, the system will tolerate the momentary interruption.

I saw this in action recently. A shelf-to-shelf cable connecting a new DS14 shelf to the existing loop had been run from an A module to a B module (it should run A module to A module). The system was live, with disk autoassign turned off. We quickly unplugged the shelf-to-shelf cable from the B module and plugged it into the A module. We saw a momentary error stating that the ESH module was blocking traffic, which quickly cleared on its own. The swap did leave an amber light on for half an hour, so that light apparently lags the actual state of the machine significantly.

Nuance.
