Monday, September 22, 2014

SnapMirror vs SnapVault in CDOT

In CDOT we’ve simplified the commands and the protocol: to create a SnapVault relationship, you run “snapmirror create -type XDP.”  To create a SnapMirror relationship, you run “snapmirror create -type DP.”  In fact, in CDOT we no longer call it SnapVault; it’s an XDP SnapMirror relationship.

e.g., vs2::> snapmirror create -destination-path vs2:dept_eng_dp_mirror2 -source-path vs1:dept_eng -type DP

The primary distinction between DP SnapMirror and XDP SnapMirror is that the vault (XDP) relationship lets you keep more snapshots on the destination than exist on the source.  Essentially, XDP SnapMirror is for long-term backups.  Other differences:
  • DP SnapMirror relationships can be reversed (swap destination and source).
  • DP SnapMirror can replicate on a much tighter schedule (minutes); XDP SnapMirror updates at most once per hour.
  • DP SnapMirror destination volumes can be made read/write.

Do you have datasets that would benefit from a quicker time to recovery or that have stricter SLAs?  If so, DP SnapMirror is the best choice.
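
For comparison, here is roughly what the vault equivalent of the example above looks like. This is just a sketch: the dept_eng_vault destination volume name and the daily schedule are illustrative, and XDPDefault is the built-in vault policy whose label-matching rules are what let the destination keep more snapshots than the source.

e.g., vs2::> snapmirror create -destination-path vs2:dept_eng_vault -source-path vs1:dept_eng -type XDP -schedule daily -policy XDPDefault
vs2::> snapmirror initialize -destination-path vs2:dept_eng_vault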

Monday, June 30, 2014

NetApp All Flash FAS!

Some great notes on designing an AFF:

  1. Don't mix any other disk type with an AFF (hence, all flash) in the HA Pair.  Feel free to mix within the cluster though!
  2. Don't use Flash Cache with AFF
  3. It's perfectly OK to put the root aggregate on SAS drives.
  4. Flash disks have the same MTBF as SAS disks, and an expected life of 5 years. 
And of course, there's performance data in the FAQ referenced below.

Documents: TR-4289, "All Flash FAS Technical FAQ"

Friday, May 16, 2014

Datacenter Migration Project: Lessons Learned

Some notes I wanted to save for posterity from a datacenter migration project I led several years ago:

What we did well:
  • Everything was set into the racks correctly on the first try (pretty incredible if you think about it).
  • Everyone arrived at the pre-meeting and customer sites on time (started at midnight).
  • Communication with the customer was consistent.
  • We got the backup filer up before 7am!
  • We got the equipment all up and handed over by 4:30pm (our target was 4pm and the deadline was 7pm).
  • We handled several mini-crises in stride:
    • Motherboard death
    • Loop “login delay” issue
    • FC cable shortage
    • Loop combination


What we need to remember next time:
  • Starting at midnight and going all day is very, very different from starting at 8am and going all day.
  • It’s important for a team to be familiar with one another and how the work tasks will flow.
  • It’s important to have a good blend of experienced and inexperienced team members.
  • Fly the team into town the day before – You can’t expect someone to travel all day and then work all night.
  • Make sure the team understands how to read the documentation before you get onsite.
  • Check to see if the team is experienced with the specific technologies involved.
  • Have a pre-job meeting (preferably with pizza) to explain expectations and game plan.
  • Make sure everyone is in the same hotel, close to the jobsite.
  • Make sure there is food/drink arriving regularly for the guys who are working.
  • Add an extra guy to do physical work if you have someone supervising/project managing.
  • Drills are important.
  • Get all of the rails into the racks beforehand if possible.
  • Plan to combine loops/stacks if the system is complex and spread out.  Don’t underestimate how long it takes to cable per loop!
  • Cabling loops between racks is significantly more time consuming.
  • Don’t expect to be able to salvage any/all cables that run under/over racks.  Ours were a total rat’s nest.
  • Have a rested, standby team member to handle hardware failures/support issues 
  • Make sure you’re aware of the plan for switches/patch panels that are in-rack, and for disconnecting the PDU’s.

Thursday, April 24, 2014

Performance Case

Just a performance case I worked recently.  These kinds of things can be instructive for people with similar problems, or for those who learn by playing along at home.  Here's my email; sorry for the copy-paste without context.

 Please take a look at the email chain below as the starting point for this conversation.  At this point, we’ve reviewed four sets of performance data gathered over the last two months and have closely correlated a spike in large-IOP-size traffic to our latency spikes.  This spike is both in number and size of IOPS, exceeding 32,000 IOPS for 15+ minutes at a time.  There is no single volume driving the traffic, as it appears to be increasing dramatically across the board.  

Here is a summary of the performance data from Thursday 4/3; note the IOP ramp-up and the associated latency:
Start Time | CPU Busy | NFS Op/s | Read Op/s | Read Lat (ms) | Write Op/s | Write Lat (ms) | Net Sent (MB/s) | Net Recv (MB/s)
9:53p      | 52       | 7,127    | 1,256     | 2.32          | 5,731      | 0.63           | 27              | 50
9:57p      | 67       | 13,637   | 5,396     | 19.96         | 8,076      | 110.09         | 145             | 81
10:05p     | 99       | 31,119   | 17,272    | 22.91         | 13,697     | 341.18         | 371             | 205
10:13p     | 99       | 32,311   | 22,519    | 8.4           | 9,739      | 229.45         | 621             | 200
10:24p     | 99       | 23,183   | 12,819    | 7.21          | 10,261     | 143.16         | 348             | 260

And here is the data from Thursday 2/13.
Period | CPU Busy | NFS Op/s | Read Op/s | Read Lat (ms) | Write Op/s | Write Lat (ms) | Net Sent (MB/s) | Net Recv (MB/s)
9:09p  | 78       | 12,574   | 5,562     | 3.19          | 6,926      | 1.05           | 200             | 141
9:18p  | 98       | 27,771   | 17,291    | 4.7           | 10,347     | 24.29          | 571             | 312
9:29p  | 98       | 33,050   | 21,460    | 9.11          | 11,507     | 125.85         | 650             | 352
9:38p  | 98       | 34,149   | 22,813    | 11.28         | 11,216     | 530.9          | 647             | 345

One thing that stands out in the data is a large, sudden increase in 64k+ IOPS.  I’ve adjusted the table to include a row for 64k IOPS and have highlighted the relevant statistic.
FAS6280 Maximum IOPS (by Read/Write Mix)

Avg IO Size | 100/0  | 75/25  | 50/50  | 25/75  | 0/100
64k         | 61,000 | 43,000 | 32,000 | 26,000 | 22,000
32k         | 68,000 | 48,000 | 36,500 | 30,000 | 25,000
24k         | 74,000 | 51,000 | 39,500 | 31,500 | 27,000
16k         | 80,000 | 56,500 | 43,000 | 36,500 | 30,500
8k          | 85,000 | 63,000 | 50,000 | 41,500 | 36,000
4k          | 90,000 | 66,000 | 54,000 | 45,000 | 40,000


  The workload mix appears fine for most of the day but experiences large-IOP-size peaks that are outside our guidelines and cause some pain (38,000 IOPS at 1pm on 4/5, 45,000 IOPS at 11pm on 4/4, 37,000 IOPS at 11am on 4/3).  I’d also mention that ~10% of IO to this system is misaligned, which keeps us from getting the maximum performance return from the hardware.  Lastly, this system is achieving 65-85% dedupe ratios, which is fantastic space conservation but adds to the overall workload.

  As discussed yesterday, here are our options:
  • Short-term steps (example commands below):
    • Stagger workloads (Symantec, et al.)
    • Disable aggregate snapshots (done)
    • Stagger dedupe schedules
    • Case opened on the daytime snapshot-correlated latency (done)
    • Update Data ONTAP
  • Long-term solutions:
    • Add new disk to the passive controller and balance the workload, or
    • Shift the workload to a different or new HA pair
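
To make the first short-term steps concrete, here is a rough sketch of the commands involved, assuming a 7-Mode system (the aggregate and volume names are made up):

netapp1> snap sched -A aggr1 0 0 0              (stop scheduled aggregate snapshots)
netapp1> aggr options aggr1 nosnap on           (keep new ones from being created)
netapp1> sis config -s sun-sat@23 /vol/vol_db   (stagger dedupe start times per volume)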

How to convert from 7-Mode to CDOT

Here is a publicly accessible document on converting your old 7-Mode systems to NetApp's new operating system, CDOT.

https://kb.netapp.com/support/index?page=content&id=1013517

An excerpt:
Perform the following steps to convert from Data ONTAP 7-Mode to Data ONTAP 8.0X Cluster-Mode
  1. Disable 'Cluster Failover' and reboot the node to the LOADER prompt. Do not perform a takeover.
  2. Boot each node to the LOADER prompt and ensure that the following variables are set:

    To convert from 7-Mode to Cluster-Mode:
    LOADER> set-defaults
    LOADER> setenv bootarg.init.boot_clustered true
    LOADER> setenv bootarg.bsdportname
  3. Boot the node with this command:
    boot_ontap
  4. When the nodes are booting, press CTRL+C to enter the Boot menu.
  5. At the Boot menu, select wipeconfig on each node.
    *******************************
    * Press Ctrl-C for Boot Menu. *
    *******************************
    How would you like to continue booting?
    (normal) Normally
    (install) Install new software first
    (password [user]) Change root/user password
    (setup) Run setup first
    (init) Initialize disks and create flexvol
    (maint) Boot into maintenance mode
    (syncflash) Update flash from backup config
    (reboot) Reboot node
    Please make a selection: wipeconfig
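
One small addition from me: before booting in step 3, it's worth confirming the variables actually took.  The LOADER printenv command lists the environment variables; check that bootarg.init.boot_clustered shows up as true before you continue.

    LOADER> printenv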

ONTAP 8.2 Commands

Measuring Read Sequential vs Random

If you want to measure an existing workload's sequential vs. random read profile in IOPS, there's a pretty simple way.  Run this command (diag mode): stats show readahead.  The part you care about will look like this:

Table 1
readahead:readahead:seq_read_reqs.4K:71%
readahead:readahead:seq_read_reqs.8K:80%
readahead:readahead:seq_read_reqs.12K:78%
readahead:readahead:seq_read_reqs.16K:70%
readahead:readahead:seq_read_reqs.20K:75%
readahead:readahead:seq_read_reqs.24K:75%
readahead:readahead:seq_read_reqs.28K:78%
readahead:readahead:seq_read_reqs.32K:76%
readahead:readahead:seq_read_reqs.40K:77%
readahead:readahead:seq_read_reqs.48K:81%
readahead:readahead:seq_read_reqs.56K:79%
readahead:readahead:seq_read_reqs.64K:98%
readahead:readahead:seq_read_reqs.80K:86%
readahead:readahead:seq_read_reqs.96K:0%
readahead:readahead:seq_read_reqs.112K:0%
readahead:readahead:seq_read_reqs.128K:0%
readahead:readahead:seq_read_reqs.256K:0%
readahead:readahead:seq_read_reqs.384K:0%
readahead:readahead:seq_read_reqs.512K:0%
readahead:readahead:seq_read_reqs.1024K:0%

The first two parts of each line are noise: what we care about starts after the period.  The "56K" piece is the approximate size of the IOP, and the percentage is how many reads of that size were sequential during your measurement.  Just a glance shows we're averaging at least 75% sequential read IOPS across the IOP sizes that actually see traffic.  That's good enough for most performance profiling questions.

I don't think this is particularly useful, but I was also asked recently for the read random vs. sequential throughput measurement.  We can calculate that (roughly) here as well.  Notice these IOP size counts:

Table 2
readahead:readahead:total_read_reqs.4K:94346
readahead:readahead:total_read_reqs.8K:34333
readahead:readahead:total_read_reqs.12K:9669
readahead:readahead:total_read_reqs.16K:8533
readahead:readahead:total_read_reqs.20K:19922
readahead:readahead:total_read_reqs.24K:4563
readahead:readahead:total_read_reqs.28K:4069
readahead:readahead:total_read_reqs.32K:12009
readahead:readahead:total_read_reqs.40K:10299
readahead:readahead:total_read_reqs.48K:5949
readahead:readahead:total_read_reqs.56K:4908
readahead:readahead:total_read_reqs.64K:2675102
readahead:readahead:total_read_reqs.80K:234257

You guessed it.  Multiply the number of IOPS by the size beside it, then multiply that by the percentage that accompanies the same IOP size in the first table.  For example:

readahead:readahead:total_read_reqs.32K:12009
readahead:readahead:seq_read_reqs.32K:76%

Calculation 1: 32KB * 12,009 = 384,288KB

This is your total read volume at that IOP size.

Calculation 2: 384,288KB * 76% = 292,058KB

This is the sequential read volume at that IOP size.  Do both calculations for each size, then add up all the Calc 1s and all the Calc 2s.  Finally, divide the sum of the Calc 2s by the sum of the Calc 1s, and you have the percentage of the read throughput which is sequential!
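
If you'd rather not grind through that arithmetic by hand, here's a rough host-side sketch that does the same weighted math.  It assumes you can run the diag-mode command over ssh (rsh works the same way), and the filer name is made up:

ssh netapp1 "priv set -q diag; stats show readahead" | awk -F: '
  $3 ~ /^seq_read_reqs\./   { split($3, a, "."); pct[a[2]] = $4 + 0 }
  $3 ~ /^total_read_reqs\./ { split($3, a, "."); cnt[a[2]] = $4 + 0 }
  END {
    for (sz in cnt) {
      kb     = sz + 0                          # "32K" becomes 32
      total += kb * cnt[sz]                    # Calculation 1: total KB at this size
      seq   += kb * cnt[sz] * pct[sz] / 100    # Calculation 2: sequential KB at this size
    }
    if (total > 0)
      printf "%.1f%% of read throughput is sequential\n", 100 * seq / total
  }'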

CDOT and Show Mount

FYI, NetApp has a quick workaround to get showmount to work.  From the ReadMe:

This tool provides 7-Mode functionality in Clustered Data ONTAP for the "showmount -e" command as executed by an application, as a workaround until the official ONTAP fix in 8.3.

It is a set of scripts which need to be copied to the client machines on which the 'showmount -e' command will be executed, replacing the original showmount command binary.

The steps to use this showmount wrapper:

1. Move the existing /usr/sbin/showmount to /usr/sbin/showmount_old. This is a very important step.

2. Copy the files [NaServer.py, showmount.py, NaErrno.py, NaElement.py, showmount] from the showmount_export folder to /usr/sbin.

3. Update the showmount file with the proper username and password for storage virtual machine access.

4. Execute showmount -e.

Note: Make sure the LIF "Role" is data and the "Firewall Policy" is mgmt when you create the LIF from the CLI. If you use System Manager to create the LIF (storage virtual machine -> select the SVM -> Configuration -> Network Interface -> Create), make sure to select "Both" on the "Role" screen of the LIF.

5. Check that results are returned, then try with OVMManager.
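
Condensed into shell form, the client-side setup from the steps above looks roughly like this (the address at the end is whatever data LIF you'd normally mount from):

mv /usr/sbin/showmount /usr/sbin/showmount_old
cp showmount_export/NaServer.py showmount_export/showmount.py \
   showmount_export/NaErrno.py showmount_export/NaElement.py \
   showmount_export/showmount /usr/sbin/
vi /usr/sbin/showmount          # fill in the SVM username and password
showmount -e <data_lif_address>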


http://support.netapp.com/NOW/download/tools/showmount_plugin_cdot/

CDOT 8.2.1 Summary

Take a moment to familiarize yourself with this CDOT 8.2.1 documentation.  There is a TON that is new and improved in 8.2.1 over 8.2, as well as some cautions.  I’ve highlighted a few here.

https://library.netapp.com/ecmdocs/ECMP1368924/html/GUID-45F85A02-114C-4192-8F1B-A4F50996D307.html

Features:
  • Support for FAS8000 series
  • V-Series feature is now called “FlexArray,” a feature that can be licensed on the fly, non-disruptively.
  • Support for qtree exports (see the sketch just after this list)
  • Storage Encryption support
  • Support for direct attach E-Series configurations
  • Non-Disruptive shelf removal support
  • Log and core dumps available via http:///spi/
  • SQL over SMB3 non-disruptive operations support
  • VMware over IPv6 support
  • Offbox antivirus support
  • Health monitoring of Cluster Switches
  • Increased maximum aggregate sizes
  • Enhancements to 32-bit to 64-bit aggregate conversion
  • Automatic Workload Analyzer, which assesses how the system would benefit from SSDs (Flash Pool)
  • Support for “Microsoft Previous Versions” tab on files (8.2 and later)
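
A quick sketch of what the new qtree exports look like in practice, with made-up names and a made-up client subnet (double-check the 8.2.1 docs for the exact syntax):

cluster1::> vserver export-policy create -vserver vs1 -policyname eng_qtree
cluster1::> vserver export-policy rule create -vserver vs1 -policyname eng_qtree -ruleindex 1 -clientmatch 192.168.1.0/24 -rorule sys -rwrule sys
cluster1::> volume qtree modify -vserver vs1 -volume vol1 -qtree qt1 -export-policy eng_qtree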

Cautions:
  • Some Hitachi or HP XP array LUNs might not be visible.  “In the case of Data ONTAP systems that support array LUNs, if the FC initiator ports are zoned with Hitachi or HP XP array target ports before the storage array parameters are set and the LUNs are mapped to the host groups, you might not be able to see any LUNs presented to the Data ONTAP interface“
  • NFSv2 not supported. Windows over NFSv3 not supported.
  • Verify management software versions are compatible.
  • First VLAN configuration may temporarily disconnect the port.
  • LUN revision numbers change during upgrades.  Windows 2008, 2012 interpret these as new LUNs.  
  • Dedupe space considerations and clearing stale metadata for upgrades.
  • Cautions for proper cluster and vserver peering methods
  • Cautions for proper vol move methods

Tuesday, April 15, 2014

Aggregate Snapshot autodelete_base

Interesting aggregate snapshot name: autodelete_base

The autodelete_base snapshot was created as a fix for BUG 263339: Aggregate snapshot autodelete is leaving me with no snapshots. The filer will automatically create a snapshot called autodelete_base at the aggregate level when a snapshot is removed as a result of autodeletion. This ensures that there is always a new snapshot in the aggregate after a snapshot is deleted.


https://kb.netapp.com/support/index?page=content&id=2011516&locale=en_US
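
If you're curious whether autodelete_base is sitting on your own aggregates, it's easy to check from a 7-Mode console (aggregate name is illustrative):

netapp1> snap list -A aggr0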

Tuesday, April 1, 2014

SANtricity Walkthrough


Quick walkthrough for anyone unfamiliar with SANtricity.  Names have been changed to protect the innocent, and I apologize for the MS Paint job, but hey, this is a free blog!  You get what you pay for!


1. SANtricity home page

2. Right click an array.  Click manage.
3. Click “View Storage Array Profile.”  You’ll see the chassis SN.  These are sometimes missing, sometimes strange combos of letters and numbers, and sometimes 70xxxx like FAS systems.  When we fix the environment, they’ll all be 70xxxx.

4. SANtricity tells you if the system is properly cabled:
"Tray path redundancy"


5. Click hardware tab.  You’ll see the controller firmware version.

6. Exit the storage array profile and click the Hardware tab. 
Right-click a disk and you’ll see these options.  This is where you manually fail a drive.

Also notice that the menu bar at the top has changed: there are a ton of options up there, including performance monitoring.

7. Scroll down and you’ll see the controllers.  Right-click one and you’ll see your options, like “Locate,” which blinks the lights, or “Place,” which gives you options to fail over the controller, reboot it, etc. if it fails.


8. Configure lets you set the management port IP.


9. The port config window


10.   The tray’s components and their status

E-Series Notes


We've been digging into E-Series as fast as humanly possible; here are some of the basic things that have been tough to flesh out:


  • Impending drive failures: the best practice is to fail the drive manually immediately before replacing it
  • Upgrades:
    • Best practice is to update NVSRAM and Controller Firmware (CFW) at the same time
    • Watch out!  SANtricity will start running checks and try to upgrade all of the arrays if you do this wrong.  Be careful.  Luckily, it should ask for the array password before upgrading, which gives you a chance to cancel.  
    • If you open the array, go to Storage, and right-click the system, you'll find an "Upgrade" option that updates just that system.  Use this!
    • A precheck is run automatically to detect possible upgrade issues.
    • CFW upgrades are non-disruptive and should take 5-15 minutes per array
    • NetApp recommends not updating disk firmware unless you have a specific reason to
  • Changing array password if you forget it: https://kb.netapp.com/support/index?page=content&id=1013311&actp=search&viewlocale=en_US&searchid=1395157054885
  • Cabling document: https://library.netapp.com/ecm/ecm_get_file/ECMP1394868
    • Looks like each array model has a different cabling schema.
  • E-Series systems have default IP's on 
  •  Default Username: shellUsr, Default Password: wy300&w4

Monday, March 17, 2014

WWPN HBA Swap

Had a great question I wanted to share with you all (names omitted to protect the innocent).  The question was: “When doing a motherboard swap or FC HBA swap, why do target WWPNs not change?”

Answer: basically, WWPNs are hard-coded into initiator ports, but the WWPNs for target ports are calculated from the system’s WWNN and retained statically by Data ONTAP.

To directly answer the question: “As long as the existing root volume is used in the head swap or upgrade, the same port-to-WWPN mapping applies. For example, port 0a on the replacement head has the same WWPN as the original head. If the new head has different adapter ports, the new ports are assigned new WWPNs.” 
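
If you want to see the WWNN and the per-port WWPN assignments for yourself on a 7-Mode system, something like the following should do it (going from memory, so verify against your version's man pages):

netapp1> fcp nodename
netapp1> fcp portname show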

Tuesday, March 11, 2014

NetApp E-Series Introduction

This is going to be a bit rough and free-form, but here's a crash course in NetApp E-Series and "what you actually care about."  Enjoy!
  • SANtricity is the E-Series “FilerView” or “System Manager.”  Install it on your laptop and plug directly into the system.  Alternatively, set it up like OnCommand on a management server and it will monitor your systems and allow you to manage them from there.
  • There is no “ONTAP.”  It’s SANtricity ES Storage Manager, versioned in the format 10.86.xx, with controller firmware in the format 07.86.xx.
  • Systems in SANtricity are referred to as “arrays.”  Each array is two controllers in one chassis.  The chassis SN is the identifying, unique feature.  Controllers also have SNs.
  • ESM (Environmental Service Module) is the IOM equivalent for shelves.
  • Disks lie flat in trays that you have to slide out in order to access them.

In ONTAP there are two kinds of RAID (RAID 4 and RAID-DP), but only one kind of aggregate.  In E-Series there are two.  These are the two aggregate equivalents in E-Series:

Volume Group: disks put together and assigned a RAID level.  8+2 is recommended; in RAID 6 you can lose two disks per group.  You can set this to RAID 5 instead, though.

Dynamic Disk Pool (DDP): disks put together with the RAID stripes spread across all of the disks.  You can still lose two disks per RAID group.  Details below:

I highly recommend reading the key terms portion of pages 1, 65, 66.  Cleared a lot of things up for me!


Monday, February 17, 2014

Reading LUN Stats -o

netapp1> lun stats -o
    /vol/UCvol/UClun  (98 days, 0 hours, 46 minutes, 51 seconds)
 Read(kbytes)  Write(kbytes)  ReadOps  WriteOps  OtherOps  QFulls  PartnerOps PartnerKBytes 
     15940415      640772      30487379  8734    9767    0     30442488       15496708 

A few things about this are not immediately intuitive.  First, statistics labeled "Partner" are indirect IO, passed through the cluster interconnect.  This means both heads have to process the request, which adds up to considerable overhead in terms of work and latency.  

Second, the statistics not labeled "Partner" are totals: they include both the direct and the indirect Ops.  At first glance you'd see that ReadOps is very close to PartnerOps (about 0.1% off), which looks like the traffic is simply being load balanced across indirect paths.  

But that would be counting the same Ops twice.  If you add up ReadOps, WriteOps, and OtherOps, that's the total number of direct and indirect Ops.  If you then subtract PartnerOps, that resulting number (63392 Ops) is the number of direct Ops.  That's .2% direct IOPS, which indicates a serious pathing configuration issue.