IT engineering and a little bit of hacking: May 2011

Tuesday, May 24, 2011

NetApp Training Brain Dump: Useful Links

Netboot basics:
http://www.wafl.co.uk/netboot-a-netapp/

Max drives for the 16TB aggregate limit:
http://now.netapp.com/NOW/knowledge/docs/ontap/rel73/html/ontap/rnote/rel_notes/reference/r_oc_rn_feat73_aggr-size-max-drives.html

VIFs Explained:
http://netappsky.com/netapp/netapp-vif-survival/

Fibre Cable Labels explained (translate to english at top right):
http://www.fcoe.ru/index.php?option=com_content&task=view&id=312&Itemid=46#addcomments

NetApp Training Brain Dump: SnapMirror/SyncMirror (Data Replication)

In project planning, it's well known that there are three competing constraints: time, money, and scope. You constantly negotiate with stakeholders to squeeze as much as you can out of each, but in the end you're dealing with a reality that most of your job is navigating those constraints. Engineers are extremely familiar with this reality, even if they've never heard of the Triple Constraint Triangle:

Credit: Wikipedia

SnapMirror is NetApp's software that handles data replication from one system to another, giving you another copy of your information. This can be done for DR, for backup, to provide quick access, to spread out CPU utilization, to minimize traffic across distances, or for any number of other reasons. Businesses find this service invaluable, and the data replication industry is expected to grow from $2.7B in 2007 to $4.4B in 2011. SnapMirror comes in synchronous, semi-synchronous (CP Forwarding), and asynchronous modes.

SyncMirror vs Sync SnapMirror: there are a couple of important distinctions here. For one, SyncMirror is only used as the replication between two FAS systems in a MetroCluster. Also, SyncMirror works at the aggregate level, whereas Sync SnapMirror operates on volumes and qtrees.

In data replication, a big advantage of SnapMirror is NetApp's implementation of network compression (only available in async mode). It compresses the data on the source side before transmission and decompresses it on the destination before the write, which speeds up the transfer while reducing bandwidth utilization. In this, you find another constraint triangle: bandwidth utilization, transfer speed of the compressed data, and CPU utilization. To compress data that is being transferred at high rates, the CPU has to do correspondingly more compression work per second. If you keep the transfer rate low and steady, CPU utilization will stay correspondingly low.

Credit: Me!
(Please note that this graph demonstrates a
relationship, and is not accurate to actual system
statistics)

Obviously, you want to keep the transfer rate as high as possible and the other two as low as possible. In practice, every increase in transfer rate comes with a corresponding increase in CPU utilization, bandwidth utilization, or both.
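As a rough back-of-the-envelope illustration of the bandwidth side of that triangle, the sketch below estimates how much data crosses the wire and how long a transfer takes at a given compression ratio. The function names, the 100 GB payload, and the 100 Mbit/s link are made up for illustration; plug in whatever ratio you actually measure. It ignores CPU cost and protocol overhead entirely.

```python
def wire_gb(data_gb, ratio):
    """GB actually sent over the link for data_gb of source data at ratio:1."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return data_gb / ratio


def transfer_hours(data_gb, ratio, link_mbps):
    """Hours to move data_gb over a link of link_mbps (megabits/s), ignoring overhead."""
    megabits = wire_gb(data_gb, ratio) * 1024 * 8  # GB -> MB -> megabits
    return megabits / link_mbps / 3600


if __name__ == "__main__":
    # Hypothetical 100 GB transfer over a hypothetical 100 Mbit/s link
    for ratio in (1.0, 1.5, 2.7, 3.5):
        print(f"{ratio}:1 -> {wire_gb(100, ratio):.1f} GB on the wire, "
              f"{transfer_hours(100, ratio, 100):.2f} h at 100 Mbit/s")
```

A ratio of 1.0 is the uncompressed baseline, so the other rows show what compression buys you on the same link.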

Quick hits:

SnapMirror can replicate at the volume and qtree level.
Consider using a WAN compression device (e.g. Riverbed Steelhead) instead of SnapMirror compression to compress ALL SAN traffic (don't use both). SnapMirror compression obviously just compresses SnapMirror traffic. WAN compression devices also handle latency and packet loss more efficiently.
NetApp advertises compression rates of 3.5:1 for Oracle, 2.7:1 for home directory, and 1.5:1 for Exchange. YMMV.
Checkpoints are taken every 5 minutes. If a transfer is aborted or interrupted, replication resumes from the last checkpoint.
In sync mode, writes to the source NVRAM are immediately transferred to the dest NVRAM. This is called NVLOG forwarding. After a 25s NVLOG forwarding timeout, the process is relegated to semi-sync status.
Consistency Points (CPs) are when the contents of NVRAM are flushed to the local disk, which occurs in certain situations, e.g. when the NVRAM of the source is half full. CPs are also generated every 10s. These cache dumps are forwarded to the dest: a 1 min timeout in this process will relegate the replication to async status.
You can obviously transition back into sync from async.
Initial SnapMirror replication is very disk and CPU intensive, partially due to the amount of data, partially due to background processes like deswizzling. Subsequent mirroring of the same data has a drastically lower impact.
Considerations for sizing of volumes are important. Flexclones/snapshots introduce complications for this process.
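The fallback behaviour described in the quick hits above can be sketched as a small decision function. This is only a model of the two timeouts as stated in these notes (25 s for NVLOG forwarding, 1 min for CP forwarding); the function and parameter names are hypothetical and do not correspond to ONTAP internals.

```python
NVLOG_TIMEOUT_S = 25  # NVLOG forwarding timeout from the notes above
CP_TIMEOUT_S = 60     # CP forwarding timeout from the notes above


def effective_mode(nvlog_delay_s, cp_delay_s):
    """Return the mode a sync relationship would be running in (model only).

    nvlog_delay_s: seconds the destination has taken to accept NVLOG data
    cp_delay_s:    seconds the destination has taken to accept a forwarded CP
    """
    if cp_delay_s > CP_TIMEOUT_S:
        return "async"
    if nvlog_delay_s > NVLOG_TIMEOUT_S:
        return "semi-sync"
    return "sync"
```

For example, effective_mode(30, 5) models a link that is too slow for NVLOG forwarding but still keeps up with CP forwarding.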
Things you need to be careful of:

Changing source/dest volume names
Changing source/dest volume sizes
Changing hostnames
Changing ONTAP versions
Deleting/creating luns/snapshots/etc on either side

Sources:

Async: http://www.netapp.com/us/library/technical-reports/tr-3446.html

Sync and Semi: http://media.netapp.com/documents/tr-3326.pdf

Monday, May 23, 2011

NetApp Experience: Think on your feet (2)

Shelf Add Issue (AS) Setup:

Amber light on one controller when we got there. Autosupports indicate that there is traffic intended for the partner's FC ports bouncing off one of the controllers, indicating zoning may be misconfigured.
Found a DS14 shelf powered on but connected to nothing (!?). We added this to an existing loop after consulting with a very happily surprised customer.
Added 4-port HBAs.

OS upgrade issue: The OS upgrade would not take. We were finally able to effect the update using the software update -r, which stopped the system from automatically rebooting. After a manual reboot, the system worked just fine. Our running theory at this point is that the backup kernel re-asserted the previous OS upon automatic reboots, and by rebooting manually we disrupted this process.

Disk issue: After hot adding a 6 shelf stack of DS14’s, Loop A could see only 2 disks in shelf 5, and Loop B could see only 12 disks in shelf 5. This behavior was exhibited by both controllers. Error observed:

"[FAS3XXX: fci.device.invalidate.soft.address:error]: Fibre Channel adapter 4a is invalidating disk drive 4a.1 (0x0d000001) which appears to have taken a soft address. Expected hard address 93 (0x45), assigned soft address 1 (0xe8).

[FAS3XXX: config.NotMultiPath:warning]: Disk 3a.93 and other disks on this loop/domain are not multipathed and should be for improved availability"

Resolution:
We attempted to re-seat the ESH modules, to no effect. NGS recommended removing/re-inserting the disks one by one. This allowed the system to reset the soft ID’s and determine hard ID’s. Per NGS:

"Usually, soft address assignments occur when there is a shelf ID conflict. A mechanism is designed to read the shelf ID from the corresponding select switch by performing a shelf power ON and then recording the shelf ID in memory.

This will record the status of the select switch, in case it is changed during the shelf running time. It is possible that the data recorded in the memory was corrupted for some reason, which lead to a situation where the newly inserted ESH4 is provided the same two shelf IDs. There is a possibility that a disk did not accept the hard address provided by ESH4.

If there is one disk in the shelf, ESH4 will stop using the hard addresses and then allows the HBA card to assign soft addresses. Such a possibility is higher when there are many disks."

Conclusion: If that didn’t make sense to you, you’re not alone. I feel that description is pretty unlikely – memory corruption in a shelf module? One clue we took note of is there was a shelf-to-shelf cable that wasn't quite happy with how it was seated when we hot added the stack, and we needed to push it in further. It’s conceivable that this connection was intermittently able to communicate, and the system saw two shelf 5’s, one of which was bouncing on and off-line. Either way, good learning experience!
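For reading errors like the ones above, it helps to decode a hard loop ID back into a shelf and bay. The sketch below assumes the usual DS14 convention that loop ID = shelf ID x 16 + bay, with bays numbered 0-13. That convention is an assumption here rather than something stated in the error text, although it does match the "hard address 93" reported for shelf 5 in this case. Function names are illustrative.

```python
BAYS_PER_DS14 = 14  # DS14 shelves hold 14 disks, bays 0-13


def loop_id(shelf_id, bay):
    """Hard loop ID for a disk, assuming loop ID = shelf ID * 16 + bay."""
    if not 0 <= bay < BAYS_PER_DS14:
        raise ValueError("DS14 bays are numbered 0-13")
    return shelf_id * 16 + bay


def shelf_and_bay(hard_id):
    """Inverse of loop_id(): split a hard loop ID into (shelf ID, bay)."""
    shelf_id, bay = divmod(hard_id, 16)
    if bay >= BAYS_PER_DS14:
        raise ValueError(f"{hard_id} does not map to a DS14 bay")
    return shelf_id, bay


if __name__ == "__main__":
    print(shelf_and_bay(93))  # disk 3a.93 from the error above
```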

NetApp Experience: Think on your feet (1)

Volatile /etc/rc file (CL)

This was an interesting one. When you make a change to the configuration on a NetApp system, it takes effect immediately. The important thing to realize is that it won't be permanent unless you also save it by making an identical change to the /etc/rc file. Unsaved changes are reverted to the pre-change state any time the memory is cleared, e.g. reboot, power off, etc. For this customer, we were hot adding*1 expansion FC PCI cards to add a couple stacks of shelves, and literally walked onto a landmine.

What had happened is the customer had made significant changes to the network settings on the system but not saved them. When we brought the first CPU module back up, the customer found that it was unresponsive although we could find no problem with it. The customer and tech lead made the decision to move forward with the change to the second CPU module, at which point the entire system became unresponsive. This is because the changes the customer had made to the network settings were completely reverted upon reboot, causing a 15 minute outage while we tracked down the problem.

This problem was particularly tough to decipher because there was nothing wrong with the actual system – the issue was invisible to anyone but the admin who had made the changes, who was not on site.

Take aways:

- Definitely take a look at the /etc/rc file and make sure it lines up with the current settings.

- Possibly start off by saving the current configuration and backing it up. I’ll have to look into the pros and cons on this – anyone with thoughts feel free to add in the comments.
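One way to do that first check is to pull the ifconfig lines out of /etc/rc and compare them with what the system reports as running. The sketch below works on plain text you have already captured, for example a copy of /etc/rc and addresses noted down from the console. The parsing is deliberately naive, the helper names are hypothetical, and it assumes the simple "ifconfig <interface> <address> ..." line shape. It compares the literal token after the interface name, so rc files that use hostnames instead of IP addresses would need resolving first. Treat it as a starting point rather than a complete audit.

```python
def rc_addresses(rc_text):
    """Map interface name -> address for each 'ifconfig <if> <addr> ...' line."""
    addrs = {}
    for line in rc_text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(parts) >= 3 and parts[0] == "ifconfig":
            addrs[parts[1]] = parts[2]
    return addrs


def unsaved_changes(rc_text, running):
    """Return {interface: (rc_address, running_address)} where the two differ.

    running: dict of interface -> address as currently configured,
             gathered by hand or by your own tooling.
    """
    saved = rc_addresses(rc_text)
    diffs = {}
    for iface in sorted(set(saved) | set(running)):
        if saved.get(iface) != running.get(iface):
            diffs[iface] = (saved.get(iface), running.get(iface))
    return diffs
```

An empty result means the saved and running addresses agree; anything else is a setting that would change on the next reboot.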

*1 A hot add of PCI cards isn’t really a hot add, since it requires you to fail over and shut down one of the CPU modules at a time. This does require a small outage (30-120s) for the fail back – the failover is just a blip.

NetApp Training Brain Dump: NCDA Notes

Notes on this test:

- Understanding what can replicate from x86 to x64 is important
- Heavy emphasis on commands, especially on replication.
- Heavy emphasis on SnapMirror and SyncMirror
- Heavy emphasis on SnapVault
- You need to understand how qtrees relate to SnapMirror/SnapVault/permissions.

Friday, May 13, 2011

NetApp Training Brain Dump: Aggregate/RAID Group/Shelf Planning Calculator

Capacity Planning Calculator
Part of understanding how to implement a NetApp system is figuring out the layout of your disks. Doing the math a couple of times definitely helps, but at some point it's nice to have a Capacity Planning Calculator. So here it is, a quick and easy calculator in Excel format to help you figure out the relationship between your RAID Group size and quantity, shelves, drive sizes, and spare requirements.

Please note that although this table is true to NetApp's documentation, there's no way for it to be 100% accurate without using NetApp's actual ONTAP code, which I don't have access to :-) This calculator is meant for planning in a simple, understandable format. Technical notes are below, and I highly recommend reading them before working with this tool.

Capacity Planning Calculator Link (not a virus I promise): http://www.box.net/shared/l8v66b8jzx
I password protected portions of the calculations to make it clear what you can edit and keep life simple. If you wish to improve upon or edit this sheet, the password is netapp

Enjoy!

Courtesy: me!

Notes (You're gonna want to read these):
- Disk manufacturers reserve 7% of space to account for failed sectors.
- WAFL reserves 10%
- Fields you may alter are marked white. Do not change fields marked grey.
- Two drives per RG are reserved for RAID DP. You may edit the number of spares you wish to keep.
- All numbers are in TB. Convert your drive size to TB (e.g. 300GB / 1024 = 0.293TB).
- The largest 15k SAS disk available as of 5.13.2011 is 600GB
- The largest FC disk available as of 5.13.2011 is 750GB
- Aggregate size limits do not count space lost to parity, spares, or disk reserve. Click here for NetApp documentation (NOW login required)
- Assumes full shelves.
- If you want a super deep dive into space reservation with ONTAP, try here: http://rogerluethy.wordpress.com/2011/01/14/play-with-netapp-numbers/
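For anyone who prefers code to a spreadsheet, the arithmetic in the notes above can be approximated as follows. This is a sketch under stated assumptions, not a copy of the spreadsheet's formulas: it applies the 7% and 10% reserves one after the other to data disks only, treats every RAID group as RAID-DP with two parity drives, uses the 1024-based GB-to-TB conversion from the notes, and does not check that a partial last RAID group is sensible. All names are illustrative.

```python
def usable_tb(total_disks, disk_gb, rg_size, spares):
    """Rough usable capacity in TB for an aggregate built from identical disks."""
    disks_in_rgs = total_disks - spares
    if disks_in_rgs <= 0 or rg_size < 3:
        raise ValueError("need at least one RAID-DP group of 3+ disks")
    # Number of RAID groups, counting a final partial group (ceiling division)
    raid_groups = -(-disks_in_rgs // rg_size)
    data_disks = disks_in_rgs - 2 * raid_groups  # two parity drives per RG (RAID-DP)
    if data_disks <= 0:
        raise ValueError("no data disks left after parity")
    disk_tb = disk_gb / 1024                     # e.g. 300 GB -> 0.293 TB
    after_sector_reserve = disk_tb * (1 - 0.07)  # 7% reserve from the notes
    after_wafl_reserve = after_sector_reserve * (1 - 0.10)  # 10% WAFL reserve
    return data_disks * after_wafl_reserve
```

For instance, usable_tb(28, 300, 13, 2) models two full DS14 shelves of 300GB drives with two spares and two RAID groups of 13.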

Friday, May 6, 2011

NetApp Experience: Controller Panic

Was shadowing a shelf add recently and got to observe a pretty hairy situation. Here's the rundown:

11:00pm

A DS14mk4 shelf was added to a production HA FAS6080 running ONTAP 7.3.3. The shelf was intended to be shelf 2 in the loop, but the shelf ID was still set to 1 when it was added.
Panic and failover occurred on the controller that owned all the disks on that shelf.
New shelf ID is set to 2, the correct ID.
The partner node assigned soft ID's to the new disks.
The partner did not recognize all of the real shelf 1's disks, and began rebuilding.
As many as 8 disks began rebuilding in bay 27, 28, or 29 of several loops. Seems like this client keeps their spares in the last couple bays of the second disk shelf per loop, or the first bay in the third shelf.

Errors generated by adding the shelf with a wrong shelf ID, in chronological order:

fci.device.invalidate.soft.address adapterName="0a" deviceName="0a.0 (0x04000000)" hardLoopId="17"
scsi.cmd.selectionTimeout deviceType="Disk" deviceName="0a.17"
disk.ioFailed deviceName="0a.17"
scsi.cmd.noMorePaths deviceType="Disk" deviceName="0a.22"
scsi.cmd.noMorePaths deviceType="Disk" deviceName="0a.23"
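When there are many of these events to sift through, pulling the key="value" pairs out of each line makes it easy to list the affected devices. The sketch below assumes the event lines look like the ones shown above (an event name followed by key="value" pairs); the function name is illustrative and this is not a NetApp-provided tool.

```python
import re

# Matches key="value" pairs; values may contain spaces but not double quotes
PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_event(line):
    """Split 'event.name key="value" ...' into (event name, dict of pairs)."""
    name, _, rest = line.strip().partition(" ")
    return name, dict(PAIR.findall(rest))


if __name__ == "__main__":
    sample = 'scsi.cmd.noMorePaths deviceType="Disk" deviceName="0a.22"'
    name, fields = parse_event(sample)
    print(name, fields["deviceName"])
```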

03:00am: Disks completed rebuilding. 8 disks on real shelf 1 still not being recognized.

04:00am: FSE arrives onsite.

04:30am: 20 minute outage action plan developed:

Shut down all systems accessing the data.
Disable protocols.
Halt both controllers (take them offline).
Reboot disk shelf 1 and 2.
Boot up controllers.

06:00am: No action taken. Customer and NetApp decided to let the system stay stable into production hours and address it the next night.

10:00pm: Action plan started (shut off systems accessing the data, etc).
10:19pm: Both controllers shut down.
10:28pm: Both controllers up and functioning normally.

Notes:

No outage occurred until the controlled failback.
No data loss occurred.
We have no insight into the effect on performance.

Take aways:

Having lots of spares can pay off.
Make sure there are no more than two disks per RG on any one shelf.
Human error is much more likely than mechanical failure or software bug to cause a major disruption.
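The second takeaway is easy to check mechanically if you have a list of which RAID group and shelf each disk belongs to. The sketch below assumes you have already collected that mapping yourself (the tuple layout and function name are hypothetical). It simply counts disks per (RAID group, shelf) pair and reports any pair above the limit of two, which is the number of disk failures RAID-DP can absorb in one group.

```python
from collections import Counter


def overloaded_shelves(disks, limit=2):
    """Return {(raid_group, shelf): count} for pairs holding more than `limit` disks.

    disks: iterable of (disk_name, raid_group, shelf) tuples.
    """
    counts = Counter((rg, shelf) for _name, rg, shelf in disks)
    return {pair: n for pair, n in counts.items() if n > limit}
```

An empty result means losing any single shelf would cost each RAID group at most two disks.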

Tuesday, May 3, 2011

DATA ONTAP 8.0 Simulator

Installing DOT 8.0 Simulator on two laptops:

Laptop #1: Windows 7 Professional x64 SP1 on Dell D630 w/ Core 2 Duo T7300 CPU
Laptop #2: Windows 7 Ultimate x86 on HP 8510w w/ Core 2 Duo T9300 CPU

Funny thing is, I got the exact same results on both. Here was the timeline.

- Installed the Simulator: http://now.netapp.com/NOW/download/tools/simulator/ontap/8.0/ (login required)
- Installed VMware player: http://downloads.vmware.com/d/info/desktop_downloads/vmware_player/3_0 (free login required)
- When I went to run the machine, I got a prompt telling me to enable "Virtualization Technology" and "Trusted Execution" in the BIOS. Rebooted, enabled VT (Trusted Execution was not an option on this hardware), saved and exited, booted up.

Then I got a curious prompt: "Continue without 64-bit support?"

Courtesy: Me

Options:

If you click on the link, it'll take you here: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003945, informing you that x64 guest OS's are only supported by specific x64 hosting proc's. My knowledge in current CPU architectures isn't enough to know whether laptop#1's T7300 is covered under "Intel EM64T VT-enabled processors (experimental support)."
If you press yes, the machine begins to boot and then shows you "BTX Halted."
If you press no...well, you don't have much of a simulator.

But the manual (RTFM, linked below)*1 says you have to actually shut down and not just restart after your BIOS VT change. Shut down, came back up, works great on both laptops! On to the next challenge! Perhaps I'll read the instructions this time.

Note: Make sure your BIOS is up to date. Dell BIOS v16 for laptop#1 gave me trouble, v17 worked fine.

Config:

Open VMware Player, highlight the machine. In the Virtual Machine Settings (lower right), remove Network adapter 1. Set Network Adapter 3 to "Bridged."
Boot up, Control-C when you can. Choose option 4, and agree to the prompts in the middle of the process by typing "y".
Let it boot up again, it'll zero out the disks.
Let it boot up once more, it'll take you through setup.
I wasn't able to get VMware tools installed properly.

Done! Continue reading here for how to get it on your network.

Sources
*1 http://now.netapp.com/NOW/knowledge/docs/simulate_ontap/Simulate_ONTAP_8.0.pdf (login required)
Thanks to: getgreenshot.org for the picture above.

Monday, May 2, 2011

NetApp Training Brain Dump: ONTAP 7.3 Boot Menu Option Flowchart

One of the cool things about ONTAP is that it has a very intelligently designed boot process that gives engineers different contexts, each optimized for what you're trying to do: CFE, Diag mode, Maint mode, special boot menu, etc. You can think of each of these as a room inside a garage of sorts - lots of tools, no interference, no distractions. You can click here for more information on these menus.

The only thing is, navigating these contexts and knowing which door to walk through is difficult for people who learn visually. Since I couldn't find an official version of this online (correct me if I'm wrong!), I generated one myself. I tried to leave out details in the menu that don't relate to changing contexts - you can find information on the menus themselves elsewhere in my blog. Please note that this is ONTAP 7.3 specific, ONTAP 8 has a dramatically different landscape.

Courtesy: Me!
(click the picture for zoom)

Suggestions/corrections always welcome!

Props:

openoffice.org for flowchart software

pdfcreator.com for converting my .odg to .pdf (my screenshots were coming out poor quality)

http://convert.neevia.com/pdfconvert for converting my .odg to .jpg.