Thursday, April 24, 2014

Performance Case

Just a performance case I worked recently.  These kinds of things can be instructive for people with similar problems, or those who learn by playing along at home.  Here's my email; sorry for the copy-paste without context.

 Please take a look at the email chain below as the starting point for this conversation.  At this point, we’ve reviewed four sets of performance data gathered over the last two months and have closely correlated a spike in large-IOP-size traffic to our latency spikes.  The spike is in both the number and size of IOPS, exceeding 32,000 IOPS for 15+ minutes at a time.  There is no single volume driving the traffic; it appears to be increasing dramatically across the board.

Here is a summary of the performance data from Thursday 4/3; please note the IOP ramp-up and associated latency:
Start Time   CPU Busy   NFS Op/s   Read Op/s   Read Lat (ms)   Write Op/s   Write Lat (ms)   Net Sent (MB/s)   Net Recv (MB/s)
9:53p        52         7,127      1,256       2.32            5,731        0.63             27                50
9:57p        67         13,637     5,396       19.96           8,076        110.09           145               81
10:05p       99         31,119     17,272      22.91           13,697       341.18           371               205
10:13p       99         32,311     22,519      8.4             9,739        229.45           621               200
10:24p       99         23,183     12,819      7.21            10,261       143.16           348               260

And here is the data from Thursday 2/13.
Period   CPU Busy   NFS Op/s   Read Op/s   Read Lat (ms)   Write Op/s   Write Lat (ms)   Net Sent (MB/s)   Net Recv (MB/s)
9:09p    78         12,574     5,562       3.19            6,926        1.05             200               141
9:18p    98         27,771     17,291      4.7             10,347       24.29            571               312
9:29p    98         33,050     21,460      9.11            11,507       125.85           650               352
9:38p    98         34,149     22,813      11.28           11,216       530.9            647               345

One thing that stands out in the data is a large, sudden increase in 64k+ IOPS.  I’ve adjusted the table to include a row for 64k IOPS and have highlighted the relevant statistic.
FAS6280 Maximum IOPS

                Read/Write Mix
Avg IO Size   100/0    75/25    50/50    25/75    0/100
64k           61,000   43,000   32,000   26,000   22,000
32k           68,000   48,000   36,500   30,000   25,000
24k           74,000   51,000   39,500   31,500   27,000
16k           80,000   56,500   43,000   36,500   30,500
8k            85,000   63,000   50,000   41,500   36,000
4k            90,000   66,000   54,000   45,000   40,000


  The workload mix appears fine for most of the day but experiences large-IOP-size peaks that are outside our guidelines and cause some pain (38,000 IOPS at 1pm 4/5, 45,000 IOPS at 11pm 4/4, 37,000 IOPS at 11am 4/3); see the quick guideline check sketched at the end of this post.  I’d also mention that ~10% of IO to this system is misaligned, which keeps us from achieving maximum performance ROI.  Lastly, this system is achieving 65-85% dedupe ratios, which is fantastic space conservation but adds to the overall workload.

  As discussed yesterday, here are our options:
  • Short term steps:
    • Stagger workloads (Symantec, et al)
    • Disable aggr snapshots (done)
    • Stagger dedupe
    • Case open on daytime snapshot correlated latency (done)
    • Update Data ONTAP
  • Long term solutions:
    • Add new disk to passive controller and balance workload, or
    • Shift workload to a different or new HA pair
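
For anyone playing along at home, here is a rough sketch of how you could check an observed peak against the FAS6280 guideline table above.  This is my own quick hack, not a NetApp sizing tool: it simply picks the table row closest to the average IO size and the column closest to the read/write mix, and the 64k / roughly 50/50 assumption in the example is mine.

# fas6280_check.py - compare an observed peak to the guideline table above
# {avg_io_size_kb: {read_pct_of_mix: max_iops}}, copied from the table
FAS6280_MAX_IOPS = {
    64: {100: 61000, 75: 43000, 50: 32000, 25: 26000, 0: 22000},
    32: {100: 68000, 75: 48000, 50: 36500, 25: 30000, 0: 25000},
    24: {100: 74000, 75: 51000, 50: 39500, 25: 31500, 0: 27000},
    16: {100: 80000, 75: 56500, 50: 43000, 25: 36500, 0: 30500},
    8:  {100: 85000, 75: 63000, 50: 50000, 25: 41500, 0: 36000},
    4:  {100: 90000, 75: 66000, 50: 54000, 25: 45000, 0: 40000},
}

def guideline_iops(avg_io_kb, read_pct):
    # pick the nearest IO-size row and the nearest read/write-mix column
    row = FAS6280_MAX_IOPS[min(FAS6280_MAX_IOPS, key=lambda s: abs(s - avg_io_kb))]
    col = min(row, key=lambda p: abs(p - read_pct))
    return row[col]

# 10:13p on 4/3: 32,311 NFS op/s observed.  Assume 64k IO at roughly a 50/50 mix.
observed = 32311
ceiling = guideline_iops(64, 50)
print("observed %d op/s vs a %d IOPS guideline" % (observed, ceiling))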

How to convert from 7-Mode to CDOT

Here is a publicly accessible document on converting your old 7-Mode systems to NetApp's new operating system, CDOT.

https://kb.netapp.com/support/index?page=content&id=1013517

An excerpt:
Perform the following steps to convert from Data ONTAP 7-Mode to Data ONTAP 8.0X Cluster-Mode
  1. Disable 'Cluster Failover' and reboot the node to the LOADER prompt. Do not perform a takeover.
  2. Boot each node to the LOADER prompt and ensure that the following variables are set:

    To convert from 7-Mode to Cluster-Mode:
    LOADER> set-defaults
    LOADER> setenv bootarg.init.boot_clustered true
    LOADER> setenv bootarg.bsdportname
  3. Boot the node with this command:
    boot_ontap
  4. When the nodes are booting, press CTRL+C to enter the Boot menu.
  5. At the Boot menu, select wipeconfig on each node.
    *******************************
    * Press Ctrl-C for Boot Menu. *
    *******************************
    How would you like to continue booting?
    (normal) Normally
    (install) Install new software first
    (password [user]) Change root/user password
    (setup) Run setup first
    (init) Initialize disks and create flexvol
    (maint) Boot into maintenance mode
    (syncflash) Update flash from backup config
    (reboot) Reboot node
    Please make a selection: wipeconfig

ONTAP 8.2 Commands

Measuring Read Sequential vs Random

If you want to measure an existing workload's seq vs random profile in IOPS, there's a pretty simple way.  Run this command (diag mode): stats show readahead.  The part you care about will look like this:

Table 1
readahead:readahead:seq_read_reqs.4K:71%
readahead:readahead:seq_read_reqs.8K:80%
readahead:readahead:seq_read_reqs.12K:78%
readahead:readahead:seq_read_reqs.16K:70%
readahead:readahead:seq_read_reqs.20K:75%
readahead:readahead:seq_read_reqs.24K:75%
readahead:readahead:seq_read_reqs.28K:78%
readahead:readahead:seq_read_reqs.32K:76%
readahead:readahead:seq_read_reqs.40K:77%
readahead:readahead:seq_read_reqs.48K:81%
readahead:readahead:seq_read_reqs.56K:79%
readahead:readahead:seq_read_reqs.64K:98%
readahead:readahead:seq_read_reqs.80K:86%
readahead:readahead:seq_read_reqs.96K:0%
readahead:readahead:seq_read_reqs.112K:0%
readahead:readahead:seq_read_reqs.128K:0%
readahead:readahead:seq_read_reqs.256K:0%
readahead:readahead:seq_read_reqs.384K:0%
readahead:readahead:seq_read_reqs.512K:0%
readahead:readahead:seq_read_reqs.1024K:0%

The first part is garbage: what we care about starts after the period.  The number ("56K", for example) is the approximate size of the IOP, and the percentage is how many reads of that size were sequential during your measurement.  Ignore the first whole part, and focus on the percentage.  Just a glance shows you we have at least 75% seq read IOPS averaged across the IOP sizes.  That's good enough for most performance profiling questions.

I don't think this is particularly useful, but I was also asked recently about measuring read random vs sequential throughput.  We can calculate that (roughly) here as well.  Notice these IOP size counts:

Table 2
readahead:readahead:total_read_reqs.4K:94346
readahead:readahead:total_read_reqs.8K:34333
readahead:readahead:total_read_reqs.12K:9669
readahead:readahead:total_read_reqs.16K:8533
readahead:readahead:total_read_reqs.20K:19922
readahead:readahead:total_read_reqs.24K:4563
readahead:readahead:total_read_reqs.28K:4069
readahead:readahead:total_read_reqs.32K:12009
readahead:readahead:total_read_reqs.40K:10299
readahead:readahead:total_read_reqs.48K:5949
readahead:readahead:total_read_reqs.56K:4908
readahead:readahead:total_read_reqs.64K:2675102
readahead:readahead:total_read_reqs.80K:234257

You guessed it.  Multiply each count by the IOP size beside it, then multiply that result by the percentage for the same IOP size in the first table.  For example:

readahead:readahead:total_read_reqs.32K:12009
readahead:readahead:seq_read_reqs.32K:76%

Calculation 1: 32KB * 12,009 = 384,288KB

This is your total KB of read throughput at that IOP size.

Calculation 2: 384,288KB * 76% = 292,058KB

This is the sequential throughput at that IOP size.  Do both calculations for each size, then add up all the Calc 1's and all the Calc 2's.  Finally, divide the sum of the Calc 2's by the sum of the Calc 1's, and you have the percentage of the read throughput which is sequential!
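
If you don't want to do that math by hand, here's a quick sketch (my own hack, nothing official) that chews through a saved copy of the "stats show readahead" output and does the Calc 1 / Calc 2 sums for you.  It assumes the counter names look exactly like Tables 1 and 2 above and that you've pasted the full output into a text file.

# readahead_seq.py - percentage of read throughput that is sequential
import re
import sys

SIZE_RE = re.compile(r'\.(\d+)K:(\d+)%?$')

def parse(path):
    totals, seq_pct = {}, {}
    for line in open(path):
        m = SIZE_RE.search(line.strip())
        if not m:
            continue
        size_kb, value = int(m.group(1)), int(m.group(2))
        if 'total_read_reqs' in line:
            totals[size_kb] = value     # request count at this IOP size (Table 2)
        elif 'seq_read_reqs' in line:
            seq_pct[size_kb] = value    # % sequential at this IOP size (Table 1)
    return totals, seq_pct

def seq_read_throughput_pct(totals, seq_pct):
    total_kb = seq_kb = 0.0
    for size_kb, count in totals.items():
        calc1 = size_kb * count                          # Calc 1: total KB at this size
        calc2 = calc1 * seq_pct.get(size_kb, 0) / 100.0  # Calc 2: sequential KB at this size
        total_kb += calc1
        seq_kb += calc2
    return 100.0 * seq_kb / total_kb if total_kb else 0.0

if __name__ == '__main__':
    totals, seq_pct = parse(sys.argv[1])
    print('sequential share of read throughput: %.1f%%' % seq_read_throughput_pct(totals, seq_pct))

Run it against the saved output (python readahead_seq.py readahead.txt) and it prints the same number the Calc 1 / Calc 2 exercise gives you by hand.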

CDOT and Show Mount

FYI, NetApp has a quick workaround to get showmount to work.  From the ReadMe:

This tool provides the 7-Mode "showmount -e" functionality in Clustered Data ONTAP, for applications that execute the command, as a workaround until the official ONTAP fix in 8.3.

It is a set of scripts that needs to be copied to the client machines on which the 'showmount -e' command will be executed, replacing the original showmount command binary.

The steps to use this showmount wrapper:

1. Move the existing /usr/sbin/showmount to /usr/sbin/showmount_old.  This is a very important step.

2. Copy the files [NaServer.py, showmount.py, NaErrno.py, NaElement.py, showmount] from the showmount_export folder to /usr/sbin.

3. Update the showmount file with the proper username and password for storage virtual machine access.

4. Execute showmount -e

Note: Please make sure the LIF "Role" is data and the "Firewall Policy" is mgmt when you create the LIF from the CLI.  If you use System Manager to create the LIF (storage virtual machine -> select the SVM -> Configuration -> Network Interface -> Create), make sure to select "Both" on the "Role" screen of the LIF.

5. Check that the results come back, then try with OVMManager.


http://support.netapp.com/NOW/download/tools/showmount_plugin_cdot/
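
If you're curious what the wrapper is doing under the hood, here's a rough sketch of the general idea using the NaServer/NaElement modules that ship with it: ask the SVM for its volume junction paths, which is roughly the list "showmount -e" would give you.  To be clear, this is my own illustration, not the NetApp-provided script; the LIF address, credentials, and even the choice of the volume-get-iter call are assumptions, and the real tool may go about it differently.

# showmount_sketch.py - list an SVM's junction paths (illustrative only, not the NetApp tool)
from NaServer import NaServer
from NaElement import NaElement

svm_lif = '10.0.0.50'            # hypothetical data LIF with the mgmt firewall policy
user, pwd = 'vsadmin', 'secret'  # placeholder SVM credentials

s = NaServer(svm_lif, 1, 15)
s.set_style('LOGIN')
s.set_admin_user(user, pwd)
s.set_transport_type('HTTPS')

req = NaElement('volume-get-iter')
req.child_add_string('max-records', '500')
out = s.invoke_elem(req)
if out.results_status() != 'passed':
    raise SystemExit(out.results_reason())

print('Export list for %s:' % svm_lif)
attrs = out.child_get('attributes-list')
for vol in (attrs.children_get() if attrs else []):
    id_attrs = vol.child_get('volume-id-attributes')
    junction = id_attrs.child_get_string('junction-path') if id_attrs else None
    if junction:
        print(junction)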

CDOT 8.2.1 Summary

Take a moment to familiarize yourself with this CDOT 8.2.1 documentation.  There is a TON of new and improved stuff in 8.2.1 over 8.2, as well as some cautions.  I’ve highlighted a few here.

https://library.netapp.com/ecmdocs/ECMP1368924/html/GUID-45F85A02-114C-4192-8F1B-A4F50996D307.html

Features:
  • Support for FAS8000 series
  • V-Series feature now called “FlexArray,” a non-disruptive, on-the-fly licensable feature.
  • Support for qtree exports
  • Storage Encryption support
  • Support for direct attach E-Series configurations
  • Non-Disruptive shelf removal support
  • Log and core dumps available via http:///spi/
  • SQL over SMB3 non-disruptive operations support
  • VMware over IPv6 support
  • Offbox antivirus support
  • Health monitoring of Cluster Switches
  • Increased Max aggr sizes
  • 32-64-bit aggr conversion enhancements
  • Automatic Workload Analyzer, which assesses how the system would benefit from SSDs (Flash Pool)
  • Support for “Microsoft Previous Versions” tab on files (8.2 and later)

Cautions:
  • Some Hitachi or HP XP array LUNs might not be visible.  “In the case of Data ONTAP systems that support array LUNs, if the FC initiator ports are zoned with Hitachi or HP XP array target ports before the storage array parameters are set and the LUNs are mapped to the host groups, you might not be able to see any LUNs presented to the Data ONTAP interface“
  • NFSv2 not supported. Windows over NFSv3 not supported.
  • Verify management software versions are compatible.
  • First VLAN configuration may temporarily disconnect the port.
  • LUN revision numbers change during upgrades.  Windows 2008, 2012 interpret these as new LUNs.  
  • Dedupe space considerations and clearing stale metadata for upgrades.
  • Cautions for proper cluster and vserver peering methods
  • Cautions for proper vol move methods

Tuesday, April 15, 2014

Aggregate Snapshot autodelete_base

Interesting aggregate snapshot name: autodelete_base

The autodelete_base snapshot was created as a fix for BUG 263339: Aggregate snapshot autodelete is leaving me with no snapshots. The filer will automatically create a snapshot called autodelete_base at the aggregate level when a snapshot is removed as a result of autodeletion. This ensures that there is always a new snapshot in the aggregate after a snapshot is deleted.


https://kb.netapp.com/support/index?page=content&id=2011516&locale=en_US

Tuesday, April 1, 2014

SANtricity Walkthrough


Quick walkthrough for anyone unfamiliar with SANtricity.  Names have been changed to protect the innocent.  I apologize for the mspaint job, but hey, this is a free blog!  You get what you pay for!


1. SANtricity home page

2. Right-click an array.  Click Manage.
3. Click “view storage array profile.”  You’ll see the chassis SN.  These are sometimes missing, sometimes strange combos of letters and numbers, sometimes 70xxxx like FAS systems.  When we fix the environment, they’ll all be 70xxx.

4. SANtricity tells you if the system is properly cabled:
"Tray path redundancy"


5. Click hardware tab.  You’ll see the controller firmware version.

6. Exit the view storage array profile.  Click the Hardware tab. 
Right-click a disk and you’ll see these options.  This is where you manually fail a drive.

Also notice the menu bar at top has changed: there are a ton of options up there, including performance monitoring.

7. Scroll down and you’ll see the controllers.  Right-click one and here are your options, like Locate, which blinks the lights, or Place, which gives you options to “fail over” the controller, reboot it, etc. if it fails.


8. Configure lets you set the management port IP.


9. The port config window


10.   The tray’s components and their status

E-Series Notes


We've been digging into E-Series as fast as humanly possible; here are some of the basic things that have been tough to flush out:


  • Impending drive failures: the best practice is to fail the drive manually immediately before replacing it
  • Upgrades:
    • Best practice is to update NVSRAM and Controller Firmware (CFW) at the same time
    • Watch out!  SANtricity will start running checks and try to upgrade all of the arrays if you do this wrong.  Be careful.  Luckily, it should ask for the array password before upgrading, which gives you a chance to cancel.
    • If you open the array, go to Storage, and right-click the system, you'll find an "Upgrade" option where you can update just that system.  Use this!
    • A precheck is run automatically to detect possible upgrade issues.
    • CFW upgrades are non-disruptive and should take 5-15 minutes per array
    • NetApp recommends not updating disk firmware unless you have a specific reason to
  • Changing array password if you forget it: https://kb.netapp.com/support/index?page=content&id=1013311&actp=search&viewlocale=en_US&searchid=1395157054885
  • Cabling document: https://library.netapp.com/ecm/ecm_get_file/ECMP1394868
    • Looks like each array model has a different cabling schema.
  • E-Series systems have default IP's on 
  •  Default Username: shellUsr, Default Password: wy300&w4