IT engineering and a little bit of hacking: 2011

Saturday, December 31, 2011

BJJ

A few criticisms and notes:

Jon: Roll like there's two feet of water on the floor. Roll less jerky, flow, let the jiu jitsu do what it wants. Don't lay a web.

Me: Don't wait when someone passes your guard: get the underhook. Don't turtle - it's a transition, not a position. Watch your ankles. Work on spider. Work on de la riva. Look for seminars at gracie barra.

Thursday, December 15, 2011

Tournament Recap

Notes:

Work on stand up.
Work on attacks from guard (sweep, choke) and how to attack someone who's stalling.
Work on balance from side control.
Work on getting out of turtle.
Elevator sweep technique.

Wednesday, December 7, 2011

BJJ Tournament 3

I'm a few days away from my third tournament. Tournament 1 I tried blue belt NAGA @199, lost twice in a row to the same guy - I wasn't tight enough or patient enough. Tournament 2 @210, I went 7-3 in white belt and absolute (all belts) - I did great against most people, but got gassed at the end because my competitors hadn't done as many matches as me.

Tournament 3 is going to be @199 again, and I'm cutting weight from 208. The over-hydrate process hasn't been too bad, but here's a few notes:

Nutrition Plan:
32 ounces = 1 quart
4 quarts = 1 gallon
256 ounces = 2 gallons
Aiming for 1500 calories/day, 1 banana per day.

My plan:
Day T-5: no carbs, 1.5 gallons of water. Sodium.
Day T-4: no carbs, 2 gallons of water. Sodium.
Day T-3: no carbs, 2 gallons of water. (only got 210 ounces) Sodium.
Day T-2: no carbs, 2 gallons of water. Sodium.
Day T-1: no carbs, 1 gallon of water before 6pm, no eating/drinking after. Basketball game at 7pm to get a sweat in. no sodium.
Weigh-in Day: no meals/drinking til weigh ins at 6pm. 2 energy bars allowed during the day to keep functioning at work. Sauna option if my weight isn't down far enough. No sodium.

Pro Tips:
- Gum helps fight dry mouth.
- Drinking a ton of water sucks. Pee as often as you can (teaches your body to process quickly), and I use crystal light to make the water palatable.
- Your body can process about 32 ounces/hour AT MAX, and sustaining that rate is dangerous. Be disciplined, spread that liquid out over the day.
- Don't drink water in the last couple hours before bed - it'll keep you up all night!
- Processing this much cold water during the winter can lower your body temperature. Just keep this in mind.
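To sanity-check the daily targets against that 32 ounces/hour ceiling, here is a quick Python sketch of the arithmetic. The 14 drinking hours per day and the function name are my own assumptions for illustration; they are not part of the plan above.

```python
OUNCES_PER_GALLON = 128   # 4 quarts * 32 ounces, as in the conversions above
MAX_OUNCES_PER_HOUR = 32  # ceiling quoted in the tips; not a target


def average_rate(gallons: float, drinking_hours: float) -> float:
    """Average ounces per hour needed to finish `gallons` in `drinking_hours`."""
    return gallons * OUNCES_PER_GALLON / drinking_hours


# Assumption: roughly 14 hours per day are available for drinking.
for day, gallons in [("T-5", 1.5), ("T-4", 2.0), ("T-1", 1.0)]:
    rate = average_rate(gallons, 14)
    print(f"Day {day}: {gallons * OUNCES_PER_GALLON:.0f} oz, about {rate:.1f} oz/hour")
```

Spread over a full day, even the 2 gallon days average well under the hourly ceiling, which is the point of pacing it out.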

Fight game plan:
- Aggression
- Go for leg triangles as the end game every time, but don't telegraph it. Do the opposite.
- Fake leg submissions to get them out of position.
- Fight like hell to get out of side control when I get into it.
- Roll from turtle
- AVOID LEG LOCKS by keeping tabs on my ankles.

I'm feeling a lot more confident in my ability to pull off arm triangles and armbars right now, and my omoplata is coming along. If I get an omoplata, I'll look to set up a triangle rather than sweep/submit.

From guard, I'll pull him in and look for rubber guard, kimuras (fake the guillotine), and bump sweeps. From his guard, I'll look at the sky and use the ankle break. From his back, I'll focus on RNC but consider armbars and especially bow and arrow. In guard pass, I'll focus on sucking his legs in, keeping my head down, and controlling his hips.

Thursday, December 1, 2011

ONTAP 8.1: What you care about.

DATA ONTAP 8.1: What's new that you care about.

Hugely improved parallel processing software. You should see big improvements in what your CPUs are capable of.
In-place, online 32-bit to 64-bit aggregate conversion. Woot!
160TB aggregate limits on 3200 and 6200 series.
Support for SnapMirror across clusters.
Support for SAS shelves in 7-mode metrocluster.
Cluster mode support for NFSv4.
Cluster-mode support for dedupe, 7-mode and Cluster-mode support for compression.
Cluster-mode support for SAN (FC and iSCSI).
Cluster-mode support for non-disruptive OS upgrades.
Cluster-mode support for FlashCache.
Cluster-mode support for SnapMirror?
Cluster-mode support for up to 24 nodes (only 4 if SAN).

Interesting: compression runs before dedupe.

Wednesday, November 30, 2011

AT&T and iPhone

Since my AT&T contract is up, I'm considering upgrading my iPhone 3GS. The rep told me I could save money on my plan by going to a 2GB data limit ($25 instead of $45). I need to keep text and voice as unlimited - my experience is that you'll end up going over and spending that money anyway, so you might as well use it.

Last time I checked, I never used anywhere near 2GB/month...so I checked again, and here's what I found:

Credit: Me and AT&T

My data usage on the same 3GS I've always had has shot through the roof. That may be my job (switched in April), that may be a change in my behavior, that may be software eating up data passively. I don't know - I need to break this down further.

I went to their website | wireless | usage & recent activity | "more details" under "Data" | Selected "Filter Data Usage by: Internet/MEdia Net" and then sorted by transmission size. I found 0.833GB in November from 32 entries in 18 days of usage. I have a great wireless connection at home, so I'll try to utilize that better for the next 12 days and see what the results are.
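As a rough check on whether the 2GB plan would hold, here is a small Python sketch that extrapolates the 18-day figure to a full cycle. It assumes a 30-day billing cycle (18 days used plus the 12 remaining) and steady usage, both of which are simplifications; the function name is mine.

```python
def project_monthly_gb(used_gb: float, days_elapsed: int, days_in_cycle: int = 30) -> float:
    """Linear projection of data usage over a full billing cycle."""
    return used_gb / days_elapsed * days_in_cycle


projected = project_monthly_gb(0.833, 18)
print(f"Projected usage: {projected:.2f} GB of a 2 GB plan")
```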

Wednesday, November 16, 2011

NetApp Insights: Sequential Reads

In the spirit of customizing NetApp technology for your workload and needs, one thing to talk about is sequential reads. The vast majority of customers don't have to worry about this because WAFL is designed to meet 99% of normal use cases, but in some unique situations it pays to optimize.

For example, if your workload is 20 minutes of random writes followed by 20 minutes of sequential reads, and you don’t use any reallocate technology, you will likely see declining performance on your NetApp system over time. Who has this workload? Basically nobody. But for less extreme cases, you may find performance boosts through customizing your volume for sequential reads like this guy, who got a 6% throughput increase.

Here are a couple steps you can take:

set vol option read_realloc.
option wafl.optimize-write-once=false.
schedule a regular reallocate scan.

About read reallocation (Quote)

For workloads that perform a mixture of random writes and large and multiple sequential reads, read reallocation improves the file's layout and sequential read performance. When you enable read reallocation, Data ONTAP analyzes the parts of the file that are read sequentially. If the associated blocks are not already largely contiguous, Data ONTAP updates the file's layout by rewriting those blocks to another location on disk. The rewrite improves the file's layout, thus improving the sequential read performance the next time that section of the file is read. However, read reallocation might result in a higher load on the storage system.

Also, unless you set vol options vol-name read_realloc to space_optimized, read reallocation might result in more storage use if Snapshot copies are used. If you want to enable read reallocation but storage space is a concern, you can enable read reallocation on FlexVol volumes by setting vol options vol-name read_realloc to space_optimized (instead of on). Setting the option to space_optimized conserves space but results in degraded read performance through the Snapshot copies. Therefore, if fast read performance through Snapshot copies is a high priority to you, do not use space_optimized.

Read reallocation might conflict with deduplication by adding new blocks that were previously consolidated during the deduplication process. A deduplication scan might also consolidate blocks that were previously rearranged by the read reallocation process, thus separating chains of blocks that were sequentially laid out on disk. Therefore, since read reallocation does not predictably improve the file layout and the sequential read performance when used on deduplicated volumes, performing read reallocation on deduplicated volumes is not recommended. Instead, for files to benefit from read reallocation, they should be stored on volumes that are not enabled for deduplication. The read reallocation function is not supported on FlexCache volumes.

If file fragmentation is a concern, enable the read reallocation function on the original server volume. (/Quote)
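To make the idea in the quote concrete, here is a toy Python model of a file's block layout. It is not how WAFL works internally; the block numbers and function names are made up, and it only illustrates why rewriting scattered blocks into one contiguous run helps sequential reads.

```python
def contiguous_runs(blocks: list[int]) -> int:
    """Count runs of physically adjacent blocks in a file's logical order."""
    if not blocks:
        return 0
    runs = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


def reallocate(blocks: list[int], free_start: int) -> list[int]:
    """Rewrite the file into one contiguous region starting at free_start."""
    return list(range(free_start, free_start + len(blocks)))


# Hypothetical file whose blocks were scattered by random overwrites.
fragmented = [10, 11, 57, 12, 90, 91, 13, 200]
print(contiguous_runs(fragmented))                   # 6 separate runs to seek across
print(contiguous_runs(reallocate(fragmented, 500)))  # 1 run after the rewrite
```

The rewrite also shows where the extra space goes when Snapshot copies exist: the old blocks stay pinned by the snapshot while the new contiguous copy is written elsewhere.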

More reading here. Interesting quote from this additional reading: "For write-intensive, high-performance workloads we recommend leaving available approximately 10% of the usable space for this optimization process." Luckily that 10% includes any white space, including unused space carved out for thick provisioned LUNs.

Tuesday, November 15, 2011

NetApp Insights: RAID Group and Aggregate Setup

Since I'm always looking out for the underdog, I'm aware that many customers can't afford to buy 20 shelves of disks at a time. And even though they are smaller customers, getting in on the ground floor with great technology and earning a customer's loyalty is a big priority for any smart company.

If you've just invested in a small NetApp deployment, here are the questions you should be asking yourself:
1. How can I get the most usable space out of my investment?
2. How can I ensure full redundancy and data protection?
3. What configuration will squeeze the most performance out of this system?
4. Where are my performance bottlenecks today in #3's configuration?
5. How long will it take to saturate that bottleneck, and what will be my plans to expand?

I'm going to discuss the configuration options that will both maximize your initial investment and set you up for success in the long term. Be aware that this is a textbook study of tradeoffs between stability, scalability, space, and performance.

A few basics:
1. Each controller needs to put its root volume somewhere. Where you put it makes a big difference when working with <100 disks.
a. For an enterprise user, the recommended configuration is to create a 3 disk aggregate whose only responsibility is to hold this root volume, which requires no more than a few GB of space. If you only purchased 24 or 48 disks, you could understandably consider this to be pretty wasteful.
The rationale behind this setup is that you isolate these three OS disks from other IO, making your base OS more secure. More importantly, if you ever have to recover the OS due to corruption, the checkdisk process will only have to run against 3 disks rather than several dozen. Lastly, if your root vol ever runs out of space, expect a system panic. By creating a 3 disk aggregate, you protect it from being crowded out by expanding snapshots.
b. For a smaller deployment, another option would be to create an aggregate spanning those 24-48 disks and have the root volume reside there. This is a valid option that is taken by many customers.

2. Each RAID Group (RG) has 2 disks dedicated to parity. Consider this when looking at space utilization.

3. You typically want to avoid having a mixed ownership loop/stack. What this means is within a stack of shelves, do your best to have the disks only owned by a single controller. This is not always achievable right away, but could be after an expansion.
4. Before creating a RG plan for any SAN, one should read TR-3437 and understand it thoroughly. It covers everything you need to know.

Scenario: You purchase a FAS3100 series cluster with 2 shelves of SAS 450GB, no FlashCache. Here are a few of the options available. Note: these IOP numbers are not meant to be accurate, just illustrate the relative merits of each configuration.
1. Create two stacks of 1 shelf. Create two RG's per shelf of 11 and 12 disks each, combined in one aggregate. Leave one spare disk per shelf. Results:
Usable space: 12.89TB
IOPS the disks can support: 175 IOPS/disk * 19 data disks = 3325 IOPS each controller.
Advantages: Easily expandable (existing RG's will be expanded onto new shelves, improving stability), full controller CPU utilization available, volumes on each controller are shielded in terms of performance from volumes on the other controller.
Disadvantages: Lower usable space, lower IOPS available for any one volume, no RG dedicated to root volume.

2. Create two stacks of 1 shelf. Create 1 RG per shelf of 23 disks each, combined in one aggregate. Leave one spare disk per shelf. Results:
Usable space: 14.26TB
IOPS the disks can support: 175 IOPS/disk * 21 data disks = 3675 IOPS each controller.
Advantages: Higher usable space, full controller CPU utilization available, volumes on each controller are shielded in terms of performance from volumes on the other controller.
Disadvantages: Lower IOPS available for any one volume, no RG dedicated to root volume, lower data protection because of the large RG size, lower stability when expanded because the entire RG is located in one shelf.

3. Create 1 stack of 2 shelves for an active/passive config. Create 4 RG's (14, 14, 15, 3), with the large RG's combined in one aggregate and the 3 disk RG in another. Leave one spare disk per shelf. Results:
Usable space: 12.39TB
IOPS the disks can support: 175 IOPS/disk * 37 data disks = 6475 IOPS for only the active controller.
Advantages: High IOPS available for any one volume, volumes on each controller are shielded in terms of performance from volumes on the other controller, expandable (existing RG's will be expanded onto new shelves, improving stability).
Disadvantages: Lower usable space,  no RG dedicated to active controller root volume, only half the CPU power of the cluster used.

4. Create 1 stack of 2 shelves for an active/passive config. Create 3 RG's (22, 21, 3), with the two largest in one aggregate and the 3 disk RG in another. Leave one spare disk per shelf.  Results:
Usable space: 13.08TB
IOPS the disks can support: 175 IOPS/disk * 39 data disks = 6825 IOPS for only the active controller.
Advantages: Highest IOPS available for any one volume.
Disadvantages:  Lower usable space, no RG dedicated to root volume on active controller, lower data protection because of the large RG size, lower stability when expanded because the entire RG's are located in two shelves, only half the CPU power of the cluster used.

5. Create 1 stack of 2 shelves for an active/passive config. Create 4 RG's (20, 20, 3, 3), with the two largest in one aggregate and two root aggregates of 3 disks. Leave one spare disk per shelf. Results:
Usable space: 11.89TB
IOPS the disks can support: 175 IOPS/disk * 36 data disks = 6300 IOPS for only the active controller.
Advantages: RG dedicated to root volumes, high IOPS for active controller.
Disadvantages:  Lower usable space, lower data protection because of the large RG size, lower stability when expanded because the entire RG's are located in two shelves, only half the CPU power of the cluster used.
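The IOPS figures in the five options all come from the same arithmetic: take 2 parity disks out of each RAID group in the data aggregate and multiply what is left by the per-disk figure. Here is a short Python sketch of that calculation. The 175 IOPS/disk value is the illustrative number used above, not a measured one, and the dictionary labels are mine.

```python
IOPS_PER_DISK = 175   # illustrative per-disk figure used in the scenarios above
PARITY_PER_RG = 2     # each RAID group gives up 2 disks to parity


def data_disks(rg_sizes):
    """Data disks left in an aggregate after parity is taken from each RAID group."""
    return sum(size - PARITY_PER_RG for size in rg_sizes)


# RAID groups in the main data aggregate for each option (root-only RGs excluded).
layouts = {
    "1 (per controller)": [11, 12],
    "2 (per controller)": [23],
    "3 (active only)": [14, 14, 15],
    "4 (active only)": [22, 21],
    "5 (active only)": [20, 20],
}

for name, rgs in layouts.items():
    disks = data_disks(rgs)
    print(f"Option {name}: {disks} data disks, {disks * IOPS_PER_DISK} IOPS")
```

Swapping in your own RAID group sizes makes it easy to see how much each extra RAID group costs in parity.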

Here's the breakdown:

Credit: Me!

This post is long enough already so I'll keep the conclusion short: understand the requirements of your application, and use the examples above to help customize NetApp systems to meet those specs at a low price.

NetApp Insights: FlashCache Doesn't Cache Writes

Below: brilliant article on why NetApp designed its cache product to only affect reads.

http://communities.netapp.com/community/netapp-blogs/efficiency/blog/2011/02/08/flash-cache-doesnt-cache-writes--why

Update: Since writes in ONTAP require several reads to accomplish (updating bitmaps, etc), flashcache can speed up writes. But since that data is often in system memory already, this is complicated and depends on the situation.

Friday, November 11, 2011

BJJ Tournament Progress

One of the tough parts of competing isn't cutting weight: it's how cutting weight affects your training.

I'm down to 204.2lbs this morning, and when you take in fewer calories than you burn, your body feels tired and achy and sore. I wonder if it would be beneficial to add 300 calories, and also add 300 calories of exercise? Is it the calorie deficit that matters in the way your body feels, or the overall nutrition?

Either way, at 11% body fat I certainly have more pounds to cut comfortably without worrying about impact on health too much. It seems I can lose a half a pound a day without pushing myself too hard.

Since I can weigh in the day before (Dec 9th at 6pm), I can lose 3 lbs of water weight, and then gain it back overnight. Which means I really should be 203lbs in the morning!

In the meanwhile, I've made a lot of progress escaping side control. I need to spend time working on my guillotines and darce's next. Booyah!

Wednesday, November 9, 2011

NetApp Experience: MetroCluster Disk Fail

I'm working on a case for a MetroCluster right now. The situation started when the customer was doing power maintenance and shut off one of the two PDUs in the rack. In a MetroCluster there are redundant Fibre Channel switches between the system and the shelves, but in this case each switch had only one power supply, and the two switches were plugged into different PDUs. This means that one of the two switches went offline during the work.
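The power problem is easy to model. Below is a minimal Python sketch, with hypothetical switch and PDU names, that reports which switches go dark when a PDU is shut off. It only illustrates the single-supply weakness described here; it says nothing about the disk failures that followed.

```python
def switches_down(switch_feeds: dict, pdus_off: set) -> list:
    """Return switches that lose every power feed when the given PDUs are off."""
    return sorted(
        name for name, feeds in switch_feeds.items()
        if feeds and feeds <= pdus_off  # all of this switch's PDUs are dark
    )


# Hypothetical names: each switch has a single power supply on a different PDU.
as_found = {"switch1": {"pduA"}, "switch2": {"pduB"}}
# Same switches with a second supply each, cabled to the opposite PDU.
dual_fed = {"switch1": {"pduA", "pduB"}, "switch2": {"pduA", "pduB"}}

print(switches_down(as_found, {"pduA"}))  # ['switch1']
print(switches_down(dual_fed, {"pduA"}))  # []
```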

The system failed 9 disks in this case, and all 9 of them were being addressed over the switch that went down. NGS has found SCSI errors over the switch that stayed up during that time. The only firmware that is backrev'd is ESH firmware, so we're gonna have to dig deeper for a solution.

Those 9 disks failing caused a RAID Group to fail, which caused an aggregate to fail. We got the system back up and running by re-seating each disk slowly, with 90 seconds between each action.

Update: Here's what NGS had to say. (Quote)

The ports on the brocade switches are not locked:

>>> [-1][01] CRITICAL [01][01][00]:switchname: Port[4] has loop port NOT LOCKED

The ports not being locked can cause a number of instability issues and is the most likely cause of the issue seen. The information on how to lock these can be found in the following document: http://media.netapp.com/documents/tr-3548.pdf

"Not enough power supplies are present...to satisfy disk drive and shelf power requirements."

The logs are erroneous, there's been burts opened to correct this warning, but one PSU should not cause the disk shelf any issues other than it takes longer to get the disks spun up since it will do this in increments.

"Cluster monitor: takeover of eg-nascsi-a02 disabled (unsynchronized log)"
"Cluster Interconnect link 0 is DOWN"

It wouldn't surprise me if the syncmirror lost sync during this period based on the switch issues they experienced. The ports not being locked can cause a large number of unusual errors.

(End Quote)

Making sense of it: Brocade's ports are categorized as E, F, L, G, etc. An L port is a loop port, which means the switch will only initiate loop traffic. Locking a port as an F port means that the switch won't begin treating the ports in a point-to-point relationship. Directly from Brocade's documentation:

Credit: Brocade Fabric OS Reference 2.6

Here's a case of a NetApp customer working through this issue.

Predictions

Very soon, Facebook will become automatic. Your phone will keep track of the people you have encountered through the day in chronological order, and prompt you to friend them if you hang out with a person for a time or get very close to them.

Monday, November 7, 2011

NetApp Insights: Ifgrp

Few cool points on ifgrps:

- Can you add a new port to an existing ifgrp? Yes! If the new port is not connected to something though, then it must be configured down at first. It will come up in the ifgrp when you've plugged it in. ifgrp add .
- Can you remove a port from an existing ifgrp? Yes! Not live though. You must first configure the ifgrp down, and then you can ifgrp delete .
- How many switches can an ifgrp span? I don't know.
- Can you add a second IP to an existing vFiler? Yes! But each vFiler can live in only one IPspace.

Monday, October 31, 2011

Brazilian Jiu Jitsu

I've decided to begin including one of my other passions into this blog: Brazilian Jiu Jitsu. I also train boxing on the side, but my BJJ is where my heart is.

Recognizing how impactful a tool writing can be to learning and progression, I'm going to begin adding my insights to the sport's evolution. I probably won't talk much about technique and moves (youtube videos will always be better than print for that) but more about the culture, mindset, progression, and strategies.

My second BJJ tournament was a couple weekends ago: I went 7-3, placed 2nd in my weight class twice, and took 3rd for absolute no-gi.

Notes from that tournament: eat more in the morning. Bring more Gatorade. Bring a coach! Understand the rules better. Work on guillotines, work on side control escapes. It's better to have energy against bigger guys than be tired against smaller guys. Take body fat from 11% to 7%. Cardio is everything! Work on armbars from guard. Work on stand up (esp darce and guillotine).

Next tournament is Dec 10th, and I need to drop to 200 lbs. I'm walking around at about 208 right now...let's get to work!

Finding Misaligned VMDK's

Received a useful email from a colleague today:

"Starting with ONTAP 7.3.5, ONTAP has a nice feature to help identify misaligned VMDKs...this feature will be helpful as we roll out 7.3.6P1. At the end of the "nfsstat -d" command, you will see a section named "Files Causing Misaligned IO's". This will have a list of files that are doing misaligned I/O, along with a counter that indicates the frequency at which this IO is happening. If you want to start the counters over again, you can use "nfsstat -z" to zero the counters.

Below is a snippet of this output from a filer (the VMDKs with high counter values), which has been having some performance problems lately. We have 18 VMs here doing a significant amount of misaligned IO since the upgrade was done on Saturday night (there are 48 VMs in total doing misaligned IO). We need to get these VMDKs aligned in order to help improve the write performance on this system."

Files Causing Misaligned IO's [Counter=48113], Filename=infra_pv_vms_v03_snap14/ds1/c111asz/c111asz_1-flat.vmdk [Counter=18865]
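If you have a lot of VMDKs to sort through, a few lines of Python can rank them by counter. This is a sketch only: the pattern assumes each counter is followed by its filename, as the snippet above suggests, and the sample text is made up. Check it against your own "nfsstat -d" output before trusting it.

```python
import re

# Pattern inferred from the snippet above: "[Counter=N], Filename=path".
ENTRY = re.compile(r"\[Counter=(\d+)\],\s*Filename=(\S+)")


def misaligned_files(nfsstat_text: str, threshold: int = 0):
    """Return (counter, filename) pairs at or above threshold, busiest first."""
    pairs = [(int(count), name) for count, name in ENTRY.findall(nfsstat_text)]
    return sorted((p for p in pairs if p[0] >= threshold), reverse=True)


# Made-up sample in the same shape as the output above.
sample = (
    "Files Causing Misaligned IO's\n"
    "[Counter=120], Filename=ds1/vm_a/vm_a-flat.vmdk\n"
    "[Counter=48113], Filename=ds1/vm_b/vm_b-flat.vmdk\n"
)
for count, name in misaligned_files(sample, threshold=1000):
    print(count, name)
```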

Thursday, October 27, 2011

NetApp Experience: Shelf Add => Disk Bypass

During a shelf add last night, we ran into another hairy situation. Turns out we didn't connect the new shelf smoothly enough when sliding the SFP into the port, which caused us to see this:

[Filer: esh.bypass.err.disk:error]: Disk 7b.50 on channels 7b/0d disk shelf ID 3 ESH A bay 2 Bypassed due to excessive port oscillations

[Filer: ses.drive.missingFromLoopMap:CRITICAL]: On adapter 0d, the following device(s) have not taken expected addresses: 0d.57 (shelf 3 bay 9), 0d.58 (shelf 3 bay 10), 0d.59 (shelf 3 bay 11), 0d.61 (shelf 3 bay 13), 0d.67 (shelf 4 bay 3), 0d.70 (shelf 4 bay 6), 0d.74 (shelf 4 bay 10), 0d.75 (shelf 4 bay 11)

[Filer: shm.bypassed.disk.fail.disabled:error]: shm: Disk bypass check has been disabled due to multiple bypassed disks on host bus adapter 0d, shelf 3.

[Filer: shm.bypassed.disk.fail.disabled:error]: shm: Disk bypass check has been disabled due to multiple bypassed disks on host bus adapter 0d, shelf 4.

[Filer: ses.exceptionShelfLog:info]: Retrieving Exception SES Shelf Log information on channel 0d ESH module A disk shelf ID 3.

In fcadmin device_map, this looked like this:

Loop 0d

Loop 7b

Note that each loop saw a different number of bypassed disks. Sysconfig -r, disk show -n, and vol status -f all came back normal. A little backstory here: ONTAP bypasses disks because in certain scenarios, a single disk can lock up an entire FC loop (read here for more info on this). This is not the same thing as failing a disk: there are various situations where the disk will just be ignored by the system.
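A side note on reading those messages: the device numbers in the missingFromLoopMap error line up with shelf and bay as ID = shelf * 16 + bay. That relationship is inferred only from the log lines in this post, so treat it as an assumption rather than a documented rule. A quick Python check against the logged devices:

```python
import re

# Matches entries such as "0d.57 (shelf 3 bay 9)" from the log line above.
DEVICE = re.compile(r"(\w+)\.(\d+) \(shelf (\d+) bay (\d+)\)")

log = ("0d.57 (shelf 3 bay 9), 0d.58 (shelf 3 bay 10), 0d.59 (shelf 3 bay 11), "
       "0d.61 (shelf 3 bay 13), 0d.67 (shelf 4 bay 3), 0d.70 (shelf 4 bay 6), "
       "0d.74 (shelf 4 bay 10), 0d.75 (shelf 4 bay 11)")


def locate(device_id: int):
    """Inferred mapping: device ID = shelf * 16 + bay."""
    return divmod(device_id, 16)


for adapter, dev, shelf, bay in DEVICE.findall(log):
    # Every logged device should agree with the inferred mapping.
    assert locate(int(dev)) == (int(shelf), int(bay))
    print(f"{adapter}.{dev} -> shelf {shelf}, bay {bay}")
```

This is handy when someone is standing at the rack and only has the device ID from the console.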

This thread indicated to us that the fix would likely be slowly reseating the disks. You have to respect the filer: pulling and pushing a ton of disks consecutively may cause unexpected consequences, so wait at least 90s in between each action. Pull, 90s, reseat, 90s, pull another, etc.

An example of unexpected consequences is below: one disk reacted poorly to being re-seated, and failed. When the system re-scanned the disk after it was pushed in, we saw this:
[Filer: disk.init.failureBytes:error]: Disk 0d.70 failed due to failure byte setting
We attempted to reseat the disk again, to the same effect. The disk didn't show up in vol status -f. We also tried to unfail the disk, to no effect. Here's how we fixed it, with 90s in between each step:
1) pull failed disk
2) pull another disk
3) swap failed disk into other slot
4) swap other disk into failed disk's slot
5) disk show
6) priv set advanced
7) disk unfail 7b.71 (this is the slot the failed disk was in).
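Since the 90 second gaps are the part people skip under pressure, it can help to print the steps with their time offsets before starting. A minimal Python sketch follows; the function name is mine, and the step text is copied from the list above.

```python
WAIT_SECONDS = 90  # pause between actions, per the procedure above


def runbook(steps, wait=WAIT_SECONDS):
    """Return (offset_seconds, step) pairs with a fixed wait between steps."""
    return [(i * wait, step) for i, step in enumerate(steps)]


steps = [
    "pull failed disk",
    "pull another disk",
    "swap failed disk into other slot",
    "swap other disk into failed disk's slot",
    "disk show",
    "priv set advanced",
    "disk unfail 7b.71",
]
for offset, step in runbook(steps):
    print(f"t+{offset // 60}:{offset % 60:02d}  {step}")
```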

Thursday, October 20, 2011

NetApp Experience: ONTAP 8.0.2 Upgrade

Love this. Had a customer who upgraded ONTAP from 7.3 to 8.0.2 before my scheduled work, and things got interesting. Their SQL server couldn't see its LUNs, which is where the databases obviously reside. Unfortunately for them, they had placed their configuration files on a LUN as well, and part of that configuration was "what data lives on which LUN?" They just pointed SQL down the correct path to find the configuration file, and SQL did the rest.

Didn't take too long to figure out and didn't cause any production outage, so it was pretty enjoyable to watch. Talk about a *doh*!

Tuesday, October 18, 2011

Business Travel

Things I've learned so far:

Never pass up an opportunity to charge your laptop/phone.
Bring noise-canceling earbuds.
Always travel with at least $40 in cash.
Bring a second pair of pants.
"He who travels happily, must travel light"
Don't be a lemming: why rush to be first on a plane?
Never check a bag if you can avoid it.

How about you, any travel tips?

Wednesday, October 12, 2011

NetApp Experience: CIFS Error

Here's a new case: we have a filer that we're working on getting data off of to retire it, and ran into an ONTAP bug: this filer is unable to execute CIFS commands that will allow us to rename or offline volumes. This is a big problem for us because renaming and offlining volumes are part of our retirement process.

Part of the solution for this is called an "NMI reboot." NMI stands for non-maskable interrupt, which basically means the software in the computer is incapable of ignoring this reboot. You may be familiar with the small pinhole button on a lot of consumer hardware that would "hard reset" the system: that's it.

This system is a clustered FAS980 running ONTAP 7.2.7. The plan is to use that pinhole button to reset the filers one at a time: when a reset occurs, the system should failover, and no noticeable downtime should result. After the reset, we'll do a giveback, let everything settle, and repeat the process on the other system.

Friday, October 7, 2011

NetApp Experience: Bad Slot

Some really interesting things have happened lately. I had a shelf add that kicked out a ridiculous number of errors for one disk on the new shelf:

disk.senseError:error]: Disk 2d.53: op 0x28:0000a3e8:0018 sector 0 SCSI:hardware error - (4 44 0 3)

diskown.RescanMessageFailed:warning]: Could not send rescan message to eg-naslowpc-h01. Please type disk show on the console for it to scan the newly inserted disks.

diskown.errorReadingOwnership:warning]: error 46 (disk condition triggered maintenance testing) while reading ownership on disk 2d.53

Disk 2d.53: op 0x28:0000a3f0:0008 sector 0 SCSI:hardware error - (4 44 0 3)
diskown.AutoAssignProblem:warning]: Auto-assign failed for disk 2d.53

The weird thing was that the messages just continued to loop rather than just fail the disk. We swapped a new disk into that slot, and the old disk into a different slot to see if the disk was bad: turns out, the slot is bad.

We also tried reseating shelf Module B on that shelf. NetApp Support informed me that "Module A handles communication to the even numbered disks by default, and Module B the odd disks." I don't think this is true.

We're working with the customer to find a good resolution for this. Since downtime is difficult to accomplish, we may try to swap out the shelf chassis while the system is running. We'll see :-)

Thursday, September 29, 2011

NetApp Insights: Shelf Shutdown

Now that we know we can perform a shelf reboot live, we got a bit adventurous.

The question we were trying to answer is "Could we replace/remove a shelf on a live system without causing downtime?" I used a 3160 cluster in the lab with 4 DS14s in a loop, slowly failed all the disks in shelf 3, and removed ownership on those disks. At that point, I could shut down/unplug that shelf at will, and neither system complained except noting that they were transitioning to single-path.

I doubt NGS will ever give the plan their full blessing, but it's good to know that it's ok from a technical standpoint.

Update 1: I also successfully swapped out a shelf chassis in this manner in the lab. The controllers were totally ok with a new serial number! No issues that I could find.

Update 2: NGS did in fact OK this action plan twice, but later completely backed out. There's concern that the system will keep the shelf registered in the OS somewhere. A possible solution for this is to perform a failover/giveback for each node after the shelf removal, since failover/giveback includes a reboot.

Wednesday, September 28, 2011

An open letter to all University Presidents

An open letter to all University Presidents:

A budget breakdown my alma mater mailed to me showed that 62% of our budget is staff and faculty salary and benefits. While I applaud the transparency, that makes it pretty hard to look favorably on a donations request: tuition has increased there 29.36% since 2005, in the midst of the greatest economic crisis since the Great Depression.

You University Presidents no doubt have many reasons for this: the marketing perspective on price being perceived as value in competition with other universities, competition for good faculty, and I know that few if any students pay the full amount of tuition. But in the midst of staggeringly high unemployment, American philanthropists have wiser and more deserving places to put their means when your university asks for support.

During my four-year internship, my CEO asked our entire company, himself included, to take a pay freeze. And we did it willingly because we understood the investment we were making in an institution we believed in.

Has your university asked for similar sacrifices from its employees? Or will the future bear the brunt of this generation's economic mistakes? More debt on the back of our youth is not the answer to your university's future.

Thank you

Thursday, September 22, 2011

NetApp Experience: Shelf Add => Disk Fail

One of my practices when performing a shelf add is to wait in between each step, specifically between unplugging and re-connecting any cables. My thought process on this has been that the system should be allowed time to settle to its new circumstance, specifically that the controller will need to recognize what paths it is now able to communicate to the disks on.

Digging deeper, one thing I recently learned is that the disk has two ports of communication (referred to as A and B) to the shelf modules, and they negotiate their paths from the disk to the shelf module to the fiber ports on the controller. For example, disk 23 port A could be connecting through shelf module B to port 2c, and disk 23 port B could be connecting through shelf module A to port 1b.

All of that is important to understanding the serious issue that failed two disks in a production cluster recently. A single shelf (DS14mk2 750GB SATA) was connected MPHA to a clustered pair with this configuration:

disk 1d.18: disk port A to shelf module B to port 1d
disk 2b.22: disk port B to shelf module A to port 2b

After unplugging the cable from 1d to shelf module B, there was a 17 second delay and then this:

Cluster Notification mail sent: Cluster Notification from CONTROLLER (DISK CONFIGURATION ERROR) WARNING
Controller> scsi.path.excessiveErrors:error]: Excessive errors encountered by adapter 2b on disk device 2b.18.
Controller> scsi.cmd.transportError:error]: Disk device 2b.22: Transport error during execution of command: HA status 0x9: cdb 0x28:354a07b8:0048.
Controller> raid.config.filesystem.disk.not.responding:error]: File system Disk /aggr2/plex0/rg0/2b.22 Shelf 1 Bay 6 [NETAPP X268] is not responding.
Controller> scsi.cmd.transportError:error]: Disk device 2b.18: Transport error during execution of command: HA status 0x9: cdb 0x28:4f5e9748:0048.
Controller> disk.failmsg:error]: Disk 2b.22: command timed out.
Controller> raid.rg.recons.missing:notice]: RAID group /aggr2/plex0/rg0 is missing 1 disk(s).
Controller> raid.rg.recons.info:notice]: Spare disk 2b.27 will be used to reconstruct one missing disk in RAID group /aggr2/plex0/rg0.

Controller> diskown.errorReadingOwnership:warning]: error 23 (adapter error prevents command from being sent to device) while reading ownership on disk 2b.18

Analysis:
These two disks failed as a result of an HBA issue last night. When a path is disconnected, any disks that are owned over that path are engineered to use the redundant path. When we disconnected port 1d, the HBA in slot 2 produced errors that halted this normal renegotiation for two of the disks. Because the disks were not able to operate on the redundant path, the system considered the disks to be in a failed state and began reconstructing that data to spare disks. When this happened, we halted work to investigate and remediate. We'll probably just RMA the disks to reduce variables.
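A toy model of the failover behavior described above may make it easier to picture. This is a minimal Python sketch, not ONTAP's actual logic: the `Disk` class, the per-path health flag, and the `unplug` helper are all hypothetical, and the port names simply mirror the 1d/2b paths from this incident.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    name: str
    # Maps each controller port to whether the HBA behind that port can
    # currently talk to this disk (a hypothetical health flag).
    paths: dict = field(default_factory=dict)

    def usable_paths(self):
        return [port for port, healthy in self.paths.items() if healthy]


def unplug(disks, port):
    """Remove a port from every disk and return the disks left with no usable path."""
    failed = []
    for disk in disks:
        disk.paths.pop(port, None)
        if not disk.usable_paths():
            failed.append(disk.name)
    return failed


# Disk 18 looks dual-pathed, but its path through 2b is silently broken.
disks = [
    Disk("18", {"1d": True, "2b": False}),
    Disk("19", {"1d": True, "2b": True}),
]
failed = unplug(disks, "1d")
print(failed)
```

The point of the sketch is that a disk with a quietly unusable redundant path behaves normally right up until the active path is pulled.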

NetApp Support recommendation: Re-seat this HBA, which would require a failover/giveback to perform. Another option would be to replace the HBA (which is what we'll probably do).

Update: NGS (NetApp Support) has changed its mind and now thinks this is a disk firmware issue. The disk firmware was a couple of years out of date, and their explanation is that iSCSI errors caused by the firmware pile up over time and eventually cross a threshold and cause an HBA to consider that disk incommunicable. There's no warning on the system that this HBA can't talk to that disk, and all the traffic is routed through the redundant path.

In this case, we had two disks that were in this situation, and when I unplugged the return path (the path they were active on) they tried to fail over to the other path and could not. NGS believes this was just a pure-chance, struck-by-lightning situation.

I'll post the bug report on this soon: the gist of it is that between 40C and 50C, a latching mechanism can get "stuck" and error out, but will quickly recover. I'm skeptical of this because the highest temperature observed in this shelf was 36C.

2nd Update: As best as I can tell, the disk firmware update did the trick. We went through with shelf adds last night without seeing the same behavior. We did, however, see what we believe to be a separate issue.

Monday, September 19, 2011

NetApp Training Brain Dump: Experimenting

Quick notes on things I tested today:

If you run a disk fail command, you will have to wait a few hours for the data on that disk to be copied to a spare.

There is a -i trigger for the disk fail command that will immediately fail the disk, without copying the data.

If you have no spares and you have a disk that is not assigned and is not in use, you have to assign that disk to the controller before it will be used as a spare. If you have options disk.auto_assign on, it will have already been assigned to a controller. In either case, you won't need to add the disk to an aggregate: the system detects it as a spare and grabs it in the place of the failed disk.
To see how many failed disks you have, use vol status -f
To see how many spares you have, use vol status -s
If you want to see the status of your disks, disk show won't do it. You'll need to use disk show -v to see failed disks, and neither will show spare disks as being spare.
You can't resize an aggregate's RAID groups. You can, however, use aggr options raidsize to set the size for new RAID groups that are created for this aggregate.

Thursday, September 15, 2011

NetApp Insights: Usable Capacity

I saw some documentation given to a customer that estimated that for 144 2TB SATA disks (294.9TB), the customer could expect 170TB usable. It also said for 288 450GB SAS disks (126.5TB) they should expect 91TB usable. That's a big loss from a client's perspective.

I've previously developed a calculator to make it easy to plan your RAID Groups and aggregates, but now I want to use that to take a closer look at where all that space actually goes. A NetApp PSC expert told me the general rule is for FC disks 70% of raw is usable, and for SAS/SATA you take off another 10-12%. But let's see if we can dig into that.

Computers measure base 2, but drive manufacturers measure base 10. This means if your drive is labeled 1GB, it's actually 1000MB, not 1024MB.
The fuzziest part: drive manufacturers reserve between 7% and 15% on each disk. Some of this is for parity, and a lot of it is to account for failed sectors. I've observed 2TB SATA drives reporting 1.69TB or less for a loss of 13.5%, so I'll use that for these calculations.
You lose some space due to WAFL/disk asymmetry. The basic idea is that a 4KB block doesn't fit neatly into the disk sectors, so there's some waste. Some of this is taken into account by the manufacturer's reserve, so I can't quantify this in our calculations.
You lose some space to right-sizing. Since each drive manufacturer's 2TB disk is a slightly different size, ONTAP right-sizes all disks to the lowest common denominator to avoid incompatibly sized disks later. I can't find any data on how much space you lose to this process.
For every RAID Group, you lose 2 disks to parity/double parity.
You need to account for spares obviously.
WAFL requires 10% of the usable space to run the file system.
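The factors in the list above can be chained into a rough estimate. The following is a minimal Python sketch, not the calculator mentioned in this post: the function names are made up, the 13.5% manufacturer reserve and 10% WAFL reserve are taken from the list, the drive, spare, and parity counts are those of the 2TB SATA scenario in this post, and right-sizing and WAFL/disk asymmetry are ignored because they couldn't be quantified. It should not be expected to reproduce the figures quoted to the customer.

```python
def usable_tb(total_disks, spares, parity, label_tb,
              vendor_reserve=0.135, wafl_reserve=0.10):
    """Estimate usable capacity in decimal TB.

    Ignores right-sizing and WAFL/sector asymmetry, which are not
    quantified in the text.
    """
    data_disks = total_disks - spares - parity
    per_disk_tb = label_tb * (1 - vendor_reserve)
    return data_disks * per_disk_tb * (1 - wafl_reserve)


def tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


# 144 x 2TB SATA, 6 spares, 20 parity drives.
estimate = usable_tb(total_disks=144, spares=6, parity=20, label_tb=2.0)
print(f"{estimate:.1f} TB decimal, {tb_to_tib(estimate):.1f} TiB binary")
```

Keeping the base-10 to base-2 conversion as a separate step makes it easier to see how much of the "loss" is just a change of units.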

So for our two scenarios above, here's what we find:
Scenario 1
288 450GB SAS Drives
Spare drives: 8
Parity drives: 28

Credit: me!

Scenario 2
144 2TB SATA Drives
Spare drives: 6
Parity drives: 20

Credit: me!

Analysis:

You lose a consistent 15% because of the drive manufacturer whether you use EMC or NetApp or any other vendor.
To accomplish NetApp's goal of data protection (spares, parity, WAFL striping), you lose another 18-23%.
When you factor in backups and snapshots, you'll lose even more space.
One bright side is that using NetApp's dedup and efficient snapshot technologies, you can end up regaining this lost space.

Notice I'm still a considerable way away from the estimates given to the customer: 7.8% low for the 450GB system and 5% high for the 2TB system. There are still some gaps in my numbers here, so I would definitely appreciate any tips!

Tuesday, September 13, 2011

FlashCache Pros and Cons

Ran across a brilliant article over at The Missing Shade of Blue that brought up something I'd never considered: latency is the real way to measure speed. Throughput is a vital statistic to be sure, but if your throughput comes at the cost of latency you really have to consider that trade off.

Bit of background for noobs: obviously, fast data storage is more expensive than slow data storage. Data tiering is pretty much the same idea as storing things in your closet: the stuff you use more often is in the front (fast, expensive storage like Flash memory), and the stuff you never take out can be hidden deep in the back (slow, cheap storage like SATA).

Some companies, like Compellent, run algorithms to see what data is being read/written to infrequently and move that data to SATA, while the frequently-used data is moved to your faster SAS drives. NetApp (and EMC after they saw the success NetApp achieved) short-circuit this a bit by just adding a single, super-fast tier of cache.

NetApp FlashCache is 512GB of super-fast flash memory. EMC FAST Cache is actually built from SSDs. Frequently accessed data is kept here so that the system doesn't have to go all the way to the drives, which increases the number of I/O operations per second (IOPS) you are able to read or write.

The point that really struck me is that some Cache products, which create a super high tier for your data, can kill you on write latency. It turns out that EMC FAST Cache either increases write latency because of how it's implemented or opens the spigot for max IOPS so wide that the rest of the system can't keep up, exposing other bottlenecks. I'm sure that at some point if you throttle down the IOPS you'll see the write latency settle down. You'll still get a marked increase in IOPS, without the write latency.
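One way to reason about the throughput/latency trade-off is Little's Law, which ties the two together through the number of I/Os in flight: outstanding I/Os = IOPS x latency. This is not from the linked article; it's a minimal Python sketch, and the queue depths and IOPS figure are made-up numbers for illustration only.

```python
def iops(outstanding_ios, latency_in_ms):
    """Little's Law rearranged: throughput = concurrency / latency."""
    return outstanding_ios / (latency_in_ms / 1000.0)


def latency_for(outstanding_ios, target_iops):
    """Average latency in ms implied by a queue depth and a throughput."""
    return outstanding_ios / target_iops * 1000.0


# The same 40,000 IOPS reached with two different queue depths.
shallow = latency_for(outstanding_ios=32, target_iops=40_000)
deep = latency_for(outstanding_ios=256, target_iops=40_000)
print(f"queue depth 32:  {shallow:.1f} ms per I/O")
print(f"queue depth 256: {deep:.1f} ms per I/O")
```

Two systems can post the same IOPS number while one makes every request wait several times longer, which is why an IOPS figure alone doesn't tell you how fast storage feels.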

This doesn't by any means settle that Cache products have no place in your SAN (it's still a fantastic performance boost for the money), but it does mean you have to factor in this effect when making the decision.

Monday, September 12, 2011

Virtual Technology Industry Analysis

Here's a pretty awesome analysis of virtual tech customers compiled by Forrester. Here are the highlights:

Figure 1: VMware is dominating. 93% of virtual tech customers have VMware.
Figure 2: In the rough economy, SAN customers are focused on:

Space utilization (efficiency): 53%
Cost: 39%
Performance: 30%

Figure 3: SAN vendor.

44% EMC
38% NetApp
24% HP/Lefthand
22% IBM.

67% of customers have only one storage vendor in their datacenter. I think that this is because only the bigger players can afford to create price competition in their environment, or perhaps only they really benefit enough in pricing to make it worth their while maintaining two or more products.
Figure 5: Protocol:

76% FC
37% NFS (up from 18% two years ago)
23% iSCSI

Notice on the second bullet (customer focus), both 1 and 2 are about cost. Customers understand that a higher dollar amount can save money in the long run using dedup, great snapshot management, and overall space efficiency. This also reflects the tough economy, and predicts skinnier margins for the storage industry.

http://media.netapp.com/documents/ar-storage-choices-for-virtual-server-environments.pdf

Friday, September 9, 2011

NetApp Insights: NDR Shelf Reboot

Got to witness a NetApp expert at work yesterday as he did some tests on a pretty cool capability that I hadn't heard of before. In ONTAP 7.3.2, DS14 shelves, in certain hardware configurations (read the KB below), allow you to suspend IO to the shelf for a certain period of time so it can be rebooted without the system panicking.

The basic idea is this: normally, if a shelf disappears off the loop, the system would catch the error and panic. In this case, the system goes into a mode where it tolerates this for a certain period of time through a combo of queuing or suspending traffic to the affected disks. In practice, you will see affected volumes suspend traffic for a short period of time. After the shelf reboot is complete, entering the power_cycle shelf completed command takes the system out of that mode and returns it to normal error catching.

For certain configurations the shelf will actually reboot automatically, and for older hardware/software combos the system will give you 60 minutes to manually shut off the power to the shelf. The specs say to expect up to a 60s suspension in traffic: in our tests, the automatic reboot took 11s and the manual one took up to 45s.

Here's an example of the command that reboots shelf 3 on loop 6a: storage power_cycle shelf start -f -c 6a -s 3
And here's the syntax:

power_cycle shelf -h
power_cycle shelf start [-f] -c [-s ]
power_cycle shelf completed

Attempts to power-cycle a selected shelf in a channel or all the shelves in a channel.

'power_cycle shelf completed' command must be used, as and when instructed by the 'power_cycle shelf start' command.
-f do not ask for shelf power-cycle confirmation
-c if option -s is not specified power-cycle all shelves connected to the specified channel. if option -s is specified, power-cycle shelf id on specified channel.
-h display this help and exit.

One idea we tested in the lab was using this to change the shelf ID while the system is still online. The shelf we rebooted had the mailbox disks on it, which caused a panic and a failover. This may still be possible in some conditions, I'll update as we figure this out.

Some ideas I want to test out:

What happens if a shelf other than the one you specified in the command goes offline? Is the system targeted in its tolerance of shelf loss, or does its tolerance extend across all shelves?
For setting the shelf ID:

What difference does it make if none of the disks are owned/are spares?
What if none of the disks in the shelf are mailbox disks?

https://sa.netapp.com/support/,DanaInfo=kb.netapp.com,SSL+index?page=content&id=3012745

Note: This capability is not available for DS4243's.

Wednesday, September 7, 2011

NetApp Insights: MetroCluster

My main criticism of NetApp's MetroCluster implementation is the same as this guy's; it has single points of failure.

Let's rewind a bit. NetApp has a product called a Fabric MetroCluster, in which you pretty much pick up one controller out of your HA pair and move it to another datacenter (I'm simplifying things). It's a good implementation in that it spreads the reliability of a single system out across two datacenters and replicates in real time. It's a bad implementation in that it's still a single system.

Everything can fail, so in SAN, the name of the game is redundancy. This is why customers buy TWO controllers when they purchase an HA system, even though both controllers are going into the same datacenter: each controller has ~6 single points of failure, and if it goes down, you still need your data to be served. By providing a redundant controller, you can lose a controller and your customers won't even notice. That's why we refer to an HA clustered pair as a single system: the cluster is a unit, a team.

You don't have the same luxury (without massive expense) when you spread your cluster across two datacenters. The reason you geographically locate your SAN system in the same datacenter as your servers is that there's a large amount of traffic going back and forth from the SAN system to the servers. Trying to pump all that traffic through an inter-site link (ISL aka inter-switch link) requires a serious pipe, which is very expensive.

If your SAN system goes down, the DR plan is typically to failover the clustered servers at the same time as the SAN system, a complex and often risky procedure. By failing both over, you ensure the traffic does not need to travel over the ISL, which would likely create latencies beyond the tolerances of your applications. But a better solution is to make sure your SAN system is redundant in the first place so you don't need to fail over.

This is why NetApp's current MetroCluster implementation falls short: it has 6 single points of failure that would require you to either push all traffic through your ISL or fail EVERYTHING over. That's not, in my opinion, enterprise-class.

Good news though - looks like NetApp might be planning on fixing this to allow a clustered pair at each datacenter.

Tuesday, September 6, 2011

NetApp Experience: Shelf ID

Encountered something cool recently that totally stumped NetApp experts: a DS4243 shelf whose shelf ID had gone crazy. The ID was set to 19 when it should have been set to 11, and the 1 was blinking. The system recognized the ID as 19 and functioned normally, but the shelf would not respond to the shelf-ID selector button that should have allowed me to change it. There was a disk drive missing in slot 4: this turned out to be unrelated as far as I can tell. At the software level, ACP and everything else just saw the ID as 19! Steps I tried:

- Power cycle the shelf (no effect).
- Change shelf ID (Wouldn't respond).
- Reseat the IOM modules (no effect).
- Update firmware (no effect).
- Replacing missing drive (no effect).

Got on the phone with NGS, and at the end of the day there was nothing else we could try. They shipped out a new chassis and we swapped it out, placing the old disks, power supplies, and IOM modules into the new chassis. Set the new chassis's shelf ID and everything worked great!

Details for future reference:
1TB DS4243 with IOM3's hooked up to a 6080 cluster, MPHA. 2 stacks of 2 shelves.

Friday, September 2, 2011

NetApp Training Brain Dump: Snapshots

The concept here is that a snapshot can become as large as the original dataset in the volume (100%). Remember that the space occupied by data in the volume is the sum of the existing LUNS/Qtrees and any snapshots that exist in that volume. Empty space in the volume is ignored by snapshots.

Here's the important background idea: WAFL does not update-in-place when existing data is changed. This means that for a normal LUN that has no snapshots, when data changes, it is written to a new location (total space occupied increases) and then the old data is deleted (total space occupied goes back to pre-change levels).

Illustration: If in a LUN with 6GB of data a 4KB block is changed, the sum total of space occupied by data rises to 6GB + 4KB, then back to 6GB as the out of date 4KB block is deleted and reclaimed. WAFL handles this so quickly that your LUN effectively does not increase in size. This is a great advantage for WAFL because update in place can cause data corruption.

This concept is essential to understanding how snapshots work in ONTAP. Let's go back to our 6GB LUN with a 4KB change: WAFL writes the new 4KB data to new, unoccupied space and the snapshot is left occupying the space that would be otherwise deleted. So as data changes, it is not actually the snapshot that is allocated more space, but its existence means that the space that could be reclaimed is now solely assigned to the snapshot. So any data that is only assigned to the snapshot is considered occupied by the snapshot. In this example, the snapshot would be considered to be 4KB in size.

If you're a visual learner like me, checking out this diagram will help you picture the concept.

The size taken up by the snapshot increases in concert with the changes to the original LUN: 500MB of changes to the original LUN means that the snapshot will grow from 0 to 500MB in size. For a 20GB volume that has 6GB of data (including LUNs and other snapshots), the next snapshot can grow as large as 6GB, making the sum total of the original data and the new snapshot 12GB.
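The pointer idea above can be sketched in a few lines. This is a toy Python model under simplifying assumptions, not how WAFL is implemented: the `Volume` class and its methods are hypothetical, each block stands in for a 4KB block, and the size calculation only handles the single-snapshot case.

```python
class Volume:
    """Toy redirect-on-write volume: logical block -> physical block id."""

    def __init__(self):
        self.active = {}       # logical block number -> physical block id
        self.snapshots = {}    # snapshot name -> frozen copy of the block map
        self._next_id = 0

    def write(self, logical_block):
        # Never overwrite in place: every write lands on a fresh physical block.
        self.active[logical_block] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        # A snapshot is only a copy of the pointers, so it starts at size 0.
        self.snapshots[name] = dict(self.active)

    def snapshot_size(self, name):
        # Blocks the snapshot still points to that the active data no longer uses.
        # (Single-snapshot simplification: other snapshots are not considered.)
        held = set(self.snapshots[name].values())
        return len(held - set(self.active.values()))

    def blocks_in_use(self):
        used = set(self.active.values())
        for block_map in self.snapshots.values():
            used |= set(block_map.values())
        return len(used)


vol = Volume()
for block in range(6):      # six blocks of data in the "LUN"
    vol.write(block)
vol.snapshot("snap1")
size_at_creation = vol.snapshot_size("snap1")
vol.write(0)                # change one existing block
size_after_change = vol.snapshot_size("snap1")
print(size_at_creation, size_after_change, vol.blocks_in_use())
```

The snapshot never copies data. It "grows" only because the old block can't be reclaimed while the snapshot still points at it.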

You can find commands to control snapshots here.

Wednesday, August 31, 2011

Volume Fractional Reserve vs Snap Reserve

(Note: this won't make sense to you unless you already understand snapshots. Read here for a common sense explanation of how WAFL handles snapshots).

Fractional Reserve: the amount of space that is reserved for your snapshots to grow. In NetApp's words,
"A volume option that enables you to determine how much space Data ONTAP reserves for Snapshot copy overwrites for LUNs, as well as for space-reserved files when all other space in the volume is used. Fractional reserve is generally used for volumes that hold LUNs with a small percentage of data overwrite."

Snapshot Reserve: In NetApp's words, "a set percentage of disk space for Snapshot copies."

The concept here is that a snapshot can become as large as the original dataset in the volume (100%), and space needs to be reserved for that. Remember that the space occupied by data in the volume is the sum of the existing LUNS/Qtrees and any snapshots that exist in that volume. Empty space in the volume is ignored by snapshots.

Consequently, the Fractional Reserve is between 0% and 100% the size of the original space taken up by data in the volume for each snapshot, plus any space reserved for thick-provisioned LUNs.

Say volume volx is 20GB and has a single thin provisioned LUN with 6GB of data. If there is one snapshot of volx, the volume fractional reserve need not be larger than 6GB. This is not to say that 6GB is required, it is just the ceiling.

The best explanation on fractional reserve I've seen yet is by Chris Kranz: http://communities.netapp.com/groups/chris-kranz-hardware-pro/blog/2009/03/05/fractional-reservation--lun-overwrite

Quick comparison between Fractional Reserve and Snap Reserve:
With fractional reserve, changes to existing data (aka data overwrite) will be written first to blank space, and then to the fractional reserve space. When a LUN is space-reserved (aka fully allocated or thick), fractional reserve is where there is space specifically reserved for that LUN.

If fractional reserve is set to 100%, the fractional reserve space will be the sum of:
1) The size of all space-reserved LUN's
2) The max size of all snapshots

With snap reserve, new data cannot be written to the space reserved. It is solely for changes to existing data. This means if all the open space is occupied, the LUN cannot grow even if the snap reserve is not full. At that point the data in the LUN can change, but the space taken up by the LUN cannot change.

If you are using Qtrees, it makes sense to use snap reserve. If you are using LUNs, go with fractional reserve.

NCDA Notes: Perf Tools

Quick description of tools used for performance and monitoring in conjunction with a FAS system.

netstat: lists open connections and statistics on them
ifstat: lists NICs and statistics on them
pktt: gathers data on a single interface and the network traffic on that interface for analysis
statit: used to produce/analyze statistics on component performance
stats: low level perf data - network
sysstat: high level perf summary: cp's, cpu, protocols, etc.
perfmon: downloadable all-purpose performance tool. big gun.
netmon: essentially a lightweight version of wireshark.
ethereal: network analysis tool. Basic idea is it tries to grab all available packets and figure out what's going on.

Tuesday, August 30, 2011

NetApp Experience: Joining the FAS System to Active Directory

While enabling CIFS, you are asked if you want to join a domain. This can get a bit complicated depending on whether you have privileges to add computers to the domain, and specifically which OU you have rights to add a computer to. Unfortunately, ONTAP lists OUs in a not very useful way, so if you aren't able to get the OU=Contoso,OU=com,etc syntax right you'll find yourself frustrated. Luckily, there's a backdoor way to accomplish this.

"By creating a computer account in Active Directory before the computer joins the domain, you can determine exactly where in the directory the computer account will be placed."*1

So go ahead and create the account in ADUC, and then join the domain using the CIFS setup technique like you normally would. Problem solved!

*1 http://ptgmedia.pearsoncmg.com/images/9780789736178/samplechapter/0789736179_CH02.pdf

Saturday, August 27, 2011

NetApp Training Brain Dump: Cabling Standards

I encourage anyone who owns a FAS system to read over either of these two documents and make sure their system is correctly cabled. It could save you a big headache in the future!

DS4243 Installation and Service Guide

http://now.netapp.com/NOW/knowledge/docs/hardware/filer/210-04708_B0.pdf

SAS Cabling Guide

http://now.netapp.com/NOW/knowledge/docs/hardware/filer/215-05500_A0.pdf

Feel free to email me if you need a copy but don't have a NOW account. My email address can be found in my "about me."

Friday, August 26, 2011

NetApp!

I'm happy to announce that I've received an assignment to work for NetApp as a PSE for the next 6 months. I'm looking forward to learning a ton surrounded by such smart people!

Friday, August 12, 2011

NetApp Training Brain Dump: Volume Snapshots

Here's something I bet you've never thought of before: how can a volume snapshot, which is a picture of state of the volume, be stored in the same volume? It's kind of like trying to take a picture of your entire self, hand included, while holding the camera.

Think about it. A volume snapshot creates pointers to all the blocks in the volume, and retains them no matter what happens to the active data. If you write a couple new blocks to identify within the volume that there is a new snapshot, along with the timestamp, etc, that has changed the volume. That data must be written somewhere new since the old space is all protected. The entire volume's space is protected. Every new snapshot would make the volume read-only until the volume is expanded.

Even crazier: since snapshots reside inside the volume, this means when you take a second volume snapshot, you're taking a snapshot of a snapshot. Whoa.

Solution:
Volume snapshots don't take a snapshot of the whole volume: white space is not included. This means that volume snapshots only protect the blocks that are currently occupied with data: the rest of the space carved out for the volume is not part of the deal.

Let's say you have a 20GB volume with a 1GB LUN inside. Take a volume snapshot: the largest that snapshot can get is 1GB, and if your LUN changes but does not grow, only 2GB of your 20GB will be occupied by data.

Let's say you have a 20GB volume with 4 1GB LUNs inside. Take a volume snapshot: the largest that snapshot can get is 4GB, since that is the sum total of space occupied by data. Let's say that happens, and our original 4 LUNs are all 1GB but all their data has changed. The total space occupied by data is 8GB. If you take a snapshot of the volume now, it will save the state of both the LUNs and snapshots in the volume, since snapshots do indeed count as data. The largest your new snapshot can grow is 8GB.

A note of interest is that the snapshot info (name of snapshot, date taken, block pointers (aka inode tables), file allocation, and metadata like ACL's) all resides inside the volume.

Wednesday, August 10, 2011

NetApp Training Brain Dump: Snap Command

This post will be pretty straightforward: describing the snap command, which is used to manage snapshots in Data ONTAP. Click here if you need more background on snapshots. I'll put the basics on top. There are more options than I go through below, but I've left out the less useful ones.

Basics:
snap create <volume name> <snapshot name>
snap delete <volume name> <snapshot name>. Use snap delete -a <volume name> to delete all snapshots belonging to the volume.

Advanced: Snap Restore
Note: read this if you need help understanding NetApp's command syntax.
snap restore [ -f ] [ -t vol | file ] [ -s snapshot_name ] [ -r restore_as_path ] vol_name | restore_from_path

Informational:

snap list <volume name>. Add -l to show date created and retention policy. Use -A with an aggregate name to list aggregate snapshots instead.

snap delta <volume name> <snapshot 1 name> <snapshot 2 name>. Compares two snapshots and tells you the rate of change between them. Gives some really cool tables. Leave off the snapshot names to compare all snapshots for this volume.
snap reclaimable <volume name> <snapshot 1 name> <snapshot 2 name>. This command can take a while. It calculates the amount of space you can get back by deleting the snapshots you list.
snap rename <volume name> <old snapshot name> <new snapshot name>

Systematic:
snap sched <volume name> <#weekly> <#daily> <#hourly>@list. For each #, replace with an integer. ONTAP will keep that many snapshots online for that time period. For the @list, use 24-hour time to designate when to take hourly snapshots. For example, a 2 in the #weekly spot keeps the two most recent weekly snapshots, which are taken every Sunday at 24:00. Daily snapshots are taken at midnight.
snap autodelete. allows the volume to automatically delete snapshots. The volume will delete them based upon triggers you set. This gets complicated quickly, involving what kinds of locked snapshots you'll allow to be autodeleted and in what case.
snap reserve <volume name> <%> reserves space for snapshots in the given volume.
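To make the snap sched argument layout concrete, here is a minimal Python sketch that splits a schedule string into its parts. The `parse_snap_sched` helper is hypothetical (it is not part of ONTAP), the string format is assumed from the syntax line above, and the example values are illustrative.

```python
def parse_snap_sched(spec):
    """Parse a schedule like '0 2 6@8,12,16,20' into its parts."""
    weekly, daily, hourly_part = spec.split()
    # The hourly field optionally carries an @list of hours in 24-hour time.
    hourly, _, hour_list = hourly_part.partition("@")
    hours = [int(h) for h in hour_list.split(",")] if hour_list else []
    return {
        "weekly": int(weekly),   # weekly snapshots to keep
        "daily": int(daily),     # daily snapshots to keep
        "hourly": int(hourly),   # hourly snapshots to keep
        "hours": hours,          # hours of the day when hourly snapshots run
    }


sched = parse_snap_sched("0 2 6@8,12,16,20")
print(sched)
```

Read this way, the example keeps no weekly snapshots, two daily snapshots, and six hourly snapshots taken at 8:00, 12:00, 16:00, and 20:00.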

Monday, August 8, 2011

NetApp Training Brain Dump: snapmirror.conf

ONTAP saves the information for what snapmirror relationships have been set up in a file called /etc/snapmirror.conf. What /etc/rc is to ifconfig, /etc/snapmirror.conf is to snapmirror initialize. The basic idea is that even if you set up a snapmirror relationship, that data replication will cease when the system restarts...unless it is in snapmirror.conf.

There are several amazing walkthroughs for this file, so I won't be redundant. Please check out the links below from the very talented Chris Kranz!

http://www.wafl.co.uk/tag/snapmirrorconf/

http://www.wafl.co.uk/snapmirrorconf-2/
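For a rough picture of what an entry looks like before reading those walkthroughs, here is a minimal Python sketch that splits one entry into fields. It assumes the common layout of source, destination, arguments, and a four-field schedule (minute, hour, day of month, day of week). The filer and volume names are hypothetical, and `parse_snapmirror_entry` is an illustration, not an ONTAP tool, so check the field layout against the walkthroughs.

```python
def parse_snapmirror_entry(line):
    """Split one snapmirror.conf-style entry into named fields."""
    source, destination, arguments, *schedule = line.split()
    if len(schedule) != 4:
        raise ValueError("expected four schedule fields")
    minute, hour, day_of_month, day_of_week = schedule
    return {
        "source": source,
        "destination": destination,
        "arguments": arguments,      # "-" means no extra arguments
        "schedule": {
            "minute": minute,
            "hour": hour,
            "day_of_month": day_of_month,
            "day_of_week": day_of_week,
        },
    }


# Hypothetical relationship: update the mirror every night at 23:00.
entry = parse_snapmirror_entry("filerA:vol1 filerB:vol1_mirror - 0 23 * *")
print(entry["source"], "->", entry["destination"], entry["schedule"])
```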

Saturday, August 6, 2011

NetApp Training Brain Dump: /etc/rc

Interesting history:

"The letters stand for Run-Com, the name of a command
file on early DEC operating systems. The Unix system's
original "rc" file was /etc/rc, which executes commands
when the system "boots" (or starts up). The name spread
to the C shell startup file .cshrc, the Mail startup
file .mailrc, the Berknet initialization file .netrc,
and the Netnews startup file .newsrc. Programmers could
have chosen a better suffix (such as init) but they
wanted to retain a realm of mystery in the system."

http://www.anvari.org/fortune/Miscellaneous_Collections/305494_what-does-rc-stand-for-and-why-are-there-so-many-rc-files.html

Friday, August 5, 2011

NetApp Training Brain Dump: Clusters

Doing my best to translate tech-speak into common sense, one day at a time.

In NetApp, a cluster is two controllers that are both capable of accessing any disk in the system. When data is sent to a particular controller to be written to disk, that data is sent to the local cache, and then mirrored to the partner controller's cache. The purpose of this is for failover: if one controller goes down, the other controller can 100% emulate the failed one and not miss a beat.

This is called a "takeover". If one partner "panics" (fails), the other controller will take over its disks, its IP addresses, and its traffic. Pretty cool. It knows what IP addresses to spoof because when you set it up, you put the partner addresses in the /etc/rc file. You typically want no more than 50% utilization on either controller, so that in the case of a failover, the surviving controller can handle the total sum of traffic.
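The 50% rule of thumb is just a headroom check. Here is a minimal Python sketch of it, assuming utilization adds up linearly after a takeover (a simplification); the function name and the sample utilization figures are made up.

```python
def can_absorb_takeover(local_util, partner_util, ceiling=1.0):
    """True if one controller could carry both workloads after a takeover."""
    return local_util + partner_util <= ceiling


# Utilization expressed as a fraction of one controller's capacity.
pairs = {
    "both at 45%": (0.45, 0.45),
    "60% and 55%": (0.60, 0.55),
}
results = {label: can_absorb_takeover(a, b) for label, (a, b) in pairs.items()}
for label, ok in results.items():
    print(f"{label}: {'fits' if ok else 'overloaded after takeover'}")
```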

When you are confident the failed controller is back up and operational, you can initiate a "giveback," in which the controller coming back online will re-sync with its partner's cache, and then resume owning disks, handling traffic, and getting its IPs back. Givebacks take 1-2 minutes or so, during which the taken-over system is unavailable, and there are complications for people accessing files via CIFS. The giveback command is issued from the partner that took over the down controller.

There are a number of options you can configure to handle this behavior. You can:

Alter how file sessions are terminated in CIFS before a giveback, including warning the user.
Delay/disable automatic givebacks.
Have ONTAP disable long-running operations before giveback.
Not allow the up controller to check the down controller before initiating giveback (bad idea).
Allow controllers to take each other over in case of hardware failure, and specify what IP/port to notify the partner on.
In a metrocluster, change FSID's on the partner's volumes and aggregates during a takeover.
Change how quickly an automatic takeover occurs if the partner is not responding.
Disable automatic takeovers.
Allow automatic takeovers when a discrepancy in disk shelf numbers is discovered.
Allow automatic takeovers when a NIC or all NIC's fail.
Allow automatic takeovers on panic.
Allow automatic takeovers on partner reboot.
Allow automatic takeovers if the partner fails within 60s of being online.

The command used to initiate and control takeover/giveback is cf. Here are your main options:

cf disable: disables the clustering behavior.
cf enable: enable the clustering behavior.
cf takeover takes down the partner and spoofs it. cf giveback allows the down controller to take back its functionality. ONTAP won't allow these to be initiated if the system fails checks for whether the action would be disruptive or cause data loss.
cf giveback -f: bypasses ONTAP's first level of checks as long as data corruption and filer error aren't a possibility.
cf takeover -f: allows the takeover even if it will abort a coredump on the partner. Adding -n to the command ignores whether the partner has a compatible version of ONTAP.
cf forcegiveback: ignores whether it is safe to do a giveback. Can result in data loss.
cf forcetakeover: ignores whether it is safe to do a takeover. Can result in data loss. -d bypasses all of ONTAP's checks and initiates the takeover regardless of data loss or disruption. -f also bypasses the prompt for confirmation.
cf status will inform you of the current state of the clustered relationship.