Monday, May 23, 2011

NetApp Experience: Think on your feet (1)

Volatile /etc/rc file (CL)
This was an interesting one. When you make a change to the configuration on a NetApp system, it takes effect immediately. The important thing to realize is that the change won't be permanent unless you also save it by making the identical change in the /etc/rc file. Unsaved changes revert to the pre-change state any time memory is cleared, e.g. on a reboot or power-off. For this customer, we were hot adding*1 expansion FC PCI cards to add a couple of stacks of shelves, and we walked straight onto a landmine.

The customer had made significant changes to the network settings on the system but had not saved them. When we brought the first CPU module back up, the customer found that it was unresponsive, although we could find no problem with it. The customer and the tech lead decided to move forward with the change on the second CPU module, at which point the entire system became unresponsive. The cause: the customer's network changes had been completely reverted on reboot, which led to a 15-minute outage while we tracked down the problem.

This problem was particularly tough to decipher because there was nothing wrong with the actual system – the issue was invisible to anyone but the admin who had made the changes, who was not on site.  

Takeaways:
- Definitely take a look at the /etc/rc file and make sure it lines up with the current settings.
- Possibly start off by saving the current configuration and backing it up.  I’ll have to look into the pros and cons on this – anyone with thoughts feel free to add in the comments.
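One rough way to do the check from the first takeaway is to compare the saved /etc/rc lines against the commands you know were run live. The Python sketch below assumes you have already copied both into plain text yourself (the /etc/rc contents, and the admin's list of interactive commands); the function names and sample values are hypothetical, not NetApp tooling, and it only flags lines that have no exact match in /etc/rc.

```python
"""Hypothetical sketch: flag settings that are live but not saved in /etc/rc.

Assumes the admin has copied the /etc/rc contents and the commands they ran
interactively into plain strings. Names and sample values are illustrative
only and do not come from NetApp tooling.
"""


def normalize(lines: str) -> set[str]:
    """Return non-empty, non-comment lines with internal whitespace collapsed."""
    result = set()
    for line in lines.splitlines():
        stripped = " ".join(line.split())
        if stripped and not stripped.startswith("#"):
            result.add(stripped)
    return result


def unsaved_changes(rc_text: str, live_commands: str) -> list[str]:
    """Commands applied live that have no identical line in /etc/rc.

    These are the settings that would be lost on a reboot or power-off.
    """
    return sorted(normalize(live_commands) - normalize(rc_text))


if __name__ == "__main__":
    rc = """
    # saved boot configuration (example values only)
    hostname filer1
    ifconfig e0a 192.168.1.10 netmask 255.255.255.0
    route add default 192.168.1.1 1
    """
    live = """
    ifconfig e0a 192.168.1.10 netmask 255.255.255.0
    ifconfig e0b 10.0.0.5 netmask 255.255.0.0
    route add default 192.168.1.1 1
    """
    for cmd in unsaved_changes(rc, live):
        print("NOT SAVED:", cmd)
```

Running it prints `NOT SAVED: ifconfig e0b 10.0.0.5 netmask 255.255.0.0`, the one setting that would disappear on the next reboot. An exact-match comparison is deliberately simple; it will also flag a setting that was changed twice, which is usually worth a second look anyway.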

*1 A hot add of PCI cards isn't really a hot add, since it requires you to fail over and shut down the CPU modules one at a time. It does cause a short outage (30 to 120 seconds) during the failback; the failover itself is just a blip.
