Tuesday, November 17, 2015

CDOT Tips: InterCluster LIFs

When you see ICL or “InterCluster LIF,” think SnapMirror/SnapVault.  ICLs are interfaces dedicated entirely to replication between clusters; that’s all they can do.  Here are a couple of important things to keep in mind:

1. There are two kinds of replication in CDOT:
   a. IntraCluster: a volume move from one node to another within the cluster.  This occurs using SnapMirror via the private 10Gb cluster network.
   b. InterCluster: SnapMirror or SnapVault to another cluster, which occurs over TR’s shared network.
2. All of a cluster’s SnapMirror and SnapVault traffic to other clusters flows through the ICLs.
3. ICLs work on two kinds of ports: Role=Data or Role=InterCluster.  Remember, you can assign a role to a port so that only LIFs with compatible roles will ever land there.
4. Create at least one ICL per node.  ICLs can’t be failed over between nodes; they are dedicated to that node.
5. In 8.2, a node’s ICLs can only be in a single network.  In 8.3, IPspaces are re-introduced and allow multiple ICLs in multiple IPspaces per node.
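As a rough sketch, creating an ICL on each node and then peering two clusters looks something like this (node names, ports, and addresses are hypothetical; verify the syntax against your release):

```
cluster1::> network interface create -vserver cluster1-01 -lif icl1 -role intercluster -home-node cluster1-01 -home-port e0b -address 10.1.1.11 -netmask 255.255.255.0
cluster1::> network interface create -vserver cluster1-02 -lif icl1 -role intercluster -home-node cluster1-02 -home-port e0b -address 10.1.1.12 -netmask 255.255.255.0
cluster1::> cluster peer create -peer-addrs 10.2.1.11,10.2.1.12
```

Once the peer relationship is up, SnapMirror/SnapVault traffic to that cluster rides these interfaces.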

Tuesday, October 6, 2015

CDOT Tips: Network Troubleshooting

Here are some essential commands for digging into your CDOT network configuration:

How to show the status of your physical and virtual ports:
>net port show -node node1
                                          Auto-Negot  Duplex     Speed (Mbps)
Node   Port   Role         Link   MTU Admin/Oper  Admin/Oper Admin/Oper
------ ------ ------------ ---- ----- ----------- ---------- ------------
       a0a    data         up    1500  true/-     auto/full   auto/10000
       a0b    data         down  9000  true/-     auto/-      auto/-
       e0M    node-mgmt    up    1500  true/true  full/full   auto/1000
       e0a    cluster      up    9000  true/true  full/full   auto/10000
       e0b    data         up    1500  true/true  full/full   auto/10000
       e0c    cluster      up    9000  true/true  full/full   auto/10000
       e0d    data         up    1500  true/true  full/full   auto/10000

How to show detailed info of a specific port:
>net port show -node node1 -port a0b
                         Port: a0b
                           Role: data
                           Link: down
                            MTU: 9000
Auto-Negotiation Administrative: true
   Auto-Negotiation Operational: -
     Duplex Mode Administrative: auto
        Duplex Mode Operational: -
           Speed Administrative: auto
              Speed Operational: -
    Flow Control Administrative: full
       Flow Control Operational: -
                    MAC Address: 02:a0:98:5b:40:17
              Up Administrative: true
                      Port Type: if-group
    Interface Group Parent Node: -
    Interface Group Parent Port: -
          Distribution Function: ip
                  Create Policy: multimode_lacp
               Parent VLAN Node: -
               Parent VLAN Port: -
                       VLAN Tag: -
               Remote Device ID: -

How to show the MAC addresses of all your ports:
>net port show -node node1 -fields mac
node               port mac
------------------ ---- -----------------
eg-si-clsn-e01-h02 a0a  02:a0:98:5b:40:16
eg-si-clsn-e01-h02 a0a-2003 02:a0:98:5b:40:16
eg-si-clsn-e01-h02 a0b  02:a0:98:5b:40:17
eg-si-clsn-e01-h02 e0M  00:a0:98:5b:40:2a
Note that when you create an ifgrp in 8.2.x, the member ports inherit the MAC address of the ifgrp; e.g., a0a and its member ports (such as e0a and e0c) will all share the same MAC. 

How to ping from a specific LIF:
  If you use the standard ping command, there’s no obvious way to see which port or LIF it’s emitting from.  You can use this command to specify which node or LIF you want to test from:
>network ping -node node1 -destination <address>
>network ping -lif-owner vserver1 -lif nfs_data_lif -destination <address>
There are also some cool options like number of packets, allow fragmentation, etc.

How to turn up/down a port:
>set advanced
> network port modify -node node1 -port e0a -up-admin true

How to list who your port is connected to:
>node run -node node1 -command cdpd show-neighbors

Tuesday, September 15, 2015

CDOT Tips: Replication

Following the “Data has mass” train of thought, here’s how we move and protect data in CDOT.  We have three types of replication:
  • SyncMirror: synchronous replication.  For redundancy.
  • SnapMirror (DP): replication down to 1-minute granularity.  For disaster recovery.
  • SnapVault (XDP): replication down to 1-hour granularity.  Think backups.

In CDOT we’ve simplified the commands and protocol: to create a SnapVault relationship, you run “snapmirror create -type XDP”; to create a SnapMirror relationship, you run “snapmirror create -type DP.”  Here’s an example of setting up SnapMirror via the CLI.

vs2::> snapmirror create -destination-path vs2:dept_eng_dp_mirror2 -source-path vs1:dept_eng -type DP

The primary distinction between DP SnapMirror and XDP SnapMirror is that SnapVault allows you to keep more snapshots on the destination.  Essentially, XDP SnapMirror is for long-term backups.  Other differences:
  • DP SnapMirror relationships can be reversed (swap destination and source)
  • DP SnapMirror can replicate as often as every minute; XDP SnapMirror once per hour.
  • DP SnapMirror destination volumes can be made read/write.
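For the SnapVault flavor, the same command takes -type XDP plus a vault policy; here’s a hedged sketch (the paths, schedule, and policy name are hypothetical):

```
vs2::> snapmirror create -destination-path vs2:dept_eng_vault -source-path vs1:dept_eng -type XDP -schedule hourly -policy vault_policy
vs2::> snapmirror initialize -destination-path vs2:dept_eng_vault
```

The policy is what controls how many snapshots are retained on the destination, which is the whole point of the XDP flavor.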

Do you have some datasets that would benefit from a quicker time to recovery, or that have stricter SLAs?  If so, DP SnapMirror is the best choice.  Now, with many systems and many volumes being replicated, how do you keep track of it all?  Our no-cost tool OCUM 6.0 has a “Protection” tab that lets you set up, change, remove, restore, and monitor all your replication relationships. 

Because ONTAP is our single-platform operating system, this also means you can replicate to the cloud (Cloud ONTAP in AWS or Azure), to an all-SATA NetApp “backup” system, or any other NetApp you have.  This “ONTAP everywhere” ubiquity is part of why ONTAP manages more exabytes than any other storage operating system in the world.

Thursday, August 6, 2015

CDOT Tip #11: Performance on the Fly (QOS)

QOS is the performance management system within ONTAP.  It doesn’t just stop a volume from going above a certain level of performance, it also monitors and manages performance.  So it’s your one-stop shop for most perf questions!

You can use ‘qos statistics latency show’ to see volume-level latency numbers.
-  ‘Latency’ (the leftmost column) is the total latency from request received until response to the client
-  ‘Network’ is the time it takes to traverse the network stack inside ONTAP
-  ‘Cluster’ shows latency due to indirect IO
-  ‘Data’ is ONTAP itself
-  ‘Disk’ is the time to leave the controller and hit the disk
cluster1::> qos statistics latency show -iterations 100 -rows 3
Policy Group           Latency    Network    Cluster    Data       Disk       QoS
-------------------- ---------- ---------- ---------- ---------- ---------- ----------
-total-                110.35ms   110.02ms        0ms   327.00us        0ms        0ms
vs1vol0                167.82ms   167.22ms        0ms   603.00us        0ms        0ms
vol1                   117.76ms   117.56ms        0ms   191.00us        0ms        0ms
vol2                    44.24ms    44.05ms        0ms   190.00us        0ms        0ms
-total-                 38.89ms    38.63ms        0ms   256.00us        0ms        0ms
vol2                    64.47ms    64.20ms        0ms   266.00us        0ms        0ms
vol1                    27.28ms    27.03ms        0ms   253.00us        0ms        0ms
vs1vol0                 23.72ms    23.47ms        0ms   249.00us        0ms        0ms
-total-                409.81ms   409.65ms        0ms   169.00us        0ms        0ms

A simpler view with IOPS and throughput can be found using ‘qos statistics performance show.’
    cluster1::> qos statistics performance show -iterations 100 -rows 4
Policy Group           IOPS      Throughput   Latency
-------------------- -------- --------------- ----------
-total-                    79     1296.00KB/s   337.41ms
_System-Best-Effort        25           0KB/s        0ms
vol1                       24       96.00KB/s   193.72ms
vol2                       18     1152.00KB/s   750.98ms
vs1vol0                    12       48.00KB/s   707.38ms
-total-                   109        1.99MB/s   133.27ms

Please note that the -total- line indicates the system-wide performance.  Very useful!

Monday, July 27, 2015


I spent some time putting together some information on SAN in CDOT.  
-          Basic Architecture
o   CDOT uses NPIV to virtualize WWPNs.  This means any node can take traffic for any vserver, and ALUA optimizes the paths.
o   When zoning, utilize these vWWPNs and not the physical WWPNs.
o   Each SVM gets its own IQN (iSCSI) or WWNN (FC).
o   8-node cluster limit today
o   LUNs can be moved to new volumes non-disruptively inside a cluster
o   Limits
§  CDOT 8.2: 8,192 LUNs per node | 49,152 LUNs per cluster | 2,048 iSCSI sessions per node
§  CDOT 8.3: 12,288 LUNs per node | 98,304 LUNs per cluster | 8,192 iSCSI sessions per node
§  Linux host (8.2): 2,048 devices (# LUNs * # paths) | 16TB LUNs
§  Windows (8.2): 255 LUNs per host, 2TB LUNs (MBR) or 16TB (GPT)
o   FAS80x0 systems support 4-port 8Gb cards or 2-port 16Gb cards.
-          Useful commands:
o   network interface show (will show FCP LIFs)
o   fcp show
o   igroup show
o   lun show
o   system node run -node cluster1-01 fcp topology show
o   Useful SAN Setup how-to (8.1 but very applicable): https://kb.netapp.com/support/index?page=content&id=1013341
-          Foreign LUN Import: an 8.3 feature, no license required.  8.3.1 enables online import using redirection.  Cutover happens first: pause IO to the foreign SAN and present the LUN to NetApp.  The foreign LUN shows up as a disk in ONTAP (think V-Series); you then map this disk to the original host to serve the data.  Bring everything online, and the data is copied to NetApp while data is being served.  Once it’s all moved, shut down the foreign SAN. 

You can find the 8.2 SAN config guide here:
And 8.2.1 SAN Config Guide updates:
8.3 SAN config Guide:

Friday, July 24, 2015

NetApp IT: Why Clustered Data ONTAP is the Key to Unlocking the Hybrid Cloud

"Business has changed for NetApp IT with the adoption of the hybrid cloud model. The key to our strategy is NetApp® clustered Data ONTAP®, the foundation for our on-premises storage environment and an integral part of our hybrid cloud model. Clustered Data ONTAP enables our hybrid cloud model because of its ability to enhance the scale-out architecture and non-disruptive operations of clustered Data ONTAP."


Monday, July 20, 2015

Blog Views

My blog just crossed 60,000 views.  How cool is that?  Sure, that's small potatoes today with billions of people online, and I'm sure a blog with cats or bikinis would do that many views in a day...but in the absence of clickbait on my site, you can be assured that everyone reaching my blog is an engineer of some kind, looking for technical answers.

It's a very small world of data storage engineers and they make huge purchasing decisions.  That so many in this small market are landing on my blog and learning NetApp's technology here is pretty awesome.

CDOT Tip: PerfStat GUI

Running perfstat via the command line or a Linux server can be inconvenient.  The good news is you can run the perfstat GUI for 7-Mode or CDOT on a Windows system to quickly set up, gather data, and upload it to NetApp!

Once you’ve gathered the data, upload it to the case via upload.netapp.com.  This uploads the data into the right system, crunches the numbers, and the support engineer is notified.  Attaching the perfstat to the case will transfer the file but not upload it into our performance system, and unfortunately some TSEs won’t notice the file attached to the case.  There are instructions here: https://kb.netapp.com/support/index?page=content&id=1010090&actp=search&viewlocale=en_US&searchid=1430246746983

Wednesday, July 15, 2015

Always On Dedupe

We're finding more and more clients with transient data who want to dedupe all the time.  With more powerful systems and especially the advent of flash, you can set dedupe to run all the time with little to no performance impact.  Here's a primer:

You can set it to always on like this:

Create a schedule object:
cluster::> job schedule cron create -name per_minute -dayofweek * -hour * -minute 1

Create a dedupe policy linked to that schedule:
cluster::> volume efficiency policy create -vserver -policy “Always_On_Dedupe” -type schedule -schedule per_minute -qos-policy background -enabled true

Assign that dedupe policy to your volume:

cluster::> volume efficiency on -volume -vserver -qos-policy Always_On_Dedupe

Those commands might be a bit off; I’m writing free-hand here.  It would be easier to use System Manager and do it via the GUI.
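For reference, here’s a cleaned-up sketch of the same three steps (the vserver/volume names are hypothetical, and the flags are worth verifying against your release’s man pages):

```
cluster::> job schedule cron create -name per_minute -minute 1
cluster::> volume efficiency policy create -vserver svm1 -policy Always_On_Dedupe -schedule per_minute -enabled true
cluster::> volume efficiency modify -vserver svm1 -volume vol1 -policy Always_On_Dedupe
```

Note that “volume efficiency on” only enables efficiency on the volume; it’s “volume efficiency modify -policy” that binds the always-on policy to it.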

And here are some of the published results:

Design, testing, and results document: 

Command Reference Link:

Monday, July 13, 2015

CDOT Tip #10: CDOT 8.3.1 Features

8.3.1 has a big payload for a minor release and will be out in early fall.  Here are the top 10 features you care about!

  1. AFF performance improvements: reads are 600-900us faster in 8.3.1 thanks to a re-engineered IO path for SSDs.  
  2. All Flash FAS Configuration
    1. Inline compression always-on by default, with same performance as 8.3 without compression
    2. Zero block detection and dedupe by default
  3. SVM DR (Storage Virtual Machine Disaster Recovery)
    1. This replicates an entire vserver from one cluster to another, including exports, IPs, etc., allowing for easy failover to a DR site.  Think SnapMirror for vservers!
    2. Automated change management, automated setup and provisioning. 
    3. Retains CIFS shares, NFS exports, permissions, names, data, network config, certificates, QOS policies, and much more.
  4. Additional CDOT MetroCluster configurations, including AFF, FlashPool, and 200km stretch
  5. Cluster Peering Enhancements: This enables CDOT to snapmirror or SVM DR to multiple other clusters in multiple IPspaces.
  6. Encryption for Cloud ONTAP, a CDOT virtual instance available in AWS or Azure.
  7. Foreign LUN Import enhancement: reduces downtime required to migrate LUNs onto NetApp by mirroring dataset and redirecting traffic.
  8. Upgrades to built-in system manager, including AFF-specific changes, enabling ONTAP upgrades from the GUI, and easier network config.
  9. Usability enhancements:
    1. Enhancements for audit log management
    2. Support of LDAP and NIS user authentication for cluster access 
    3. Support for the banner and Message of the Day (MOTD)
    4. New FlashPool caching policies
    5. Automated Workload Analyzer (AWA) volume-level reporting to predict the impact of larger FlashPool/FlashCache.
  10. Protocol enhancements:
    1. Support for dynamic DNS
    2. Support for Windows NFSv3 clients
    3. Support for SMB encryption for data transfers over SMB
    4. Support for configuring a guest UNIX user
    5. Support for mapping the administrators group to root
    6. Enhancements to SQL Server and Hyper-V over SMB solutions

Read more at 8.3.1 release notes: https://library.netapp.com/ecm/ecm_get_file/ECMP12456155

CDOT Tip #9: Commands

You can find the command document for CDOT 8.3 here:  https://library.netapp.com/ecm/ecm_get_file/ECMP12452955

Also, CDOT 8.3.1RC1 is out, with 8.3.1 expected early this fall.  More to come on this soon!

Monday, June 1, 2015

CDOT Tip #8

In order to clarify CDOT’s networking architecture, here’s a basic explanation of each object involved.

1. Node SVM: aggregates, disks, and ports belong to the node Storage Virtual Machine.
2. Data SVM: volumes, qtrees, and data LIFs belong to the data SVM.
3. Ports: you issue commands to the three types of ports the same way.  Those three types are:
   a. Physical port: at the physical port level you can set the MTU or flow control.
   b. Ifgrp (interface group): these are now named in the convention “a0a.”  Ifgrps are made up of ports and exist for redundancy and load balancing.  You can set all the port properties here (changes will override the member ports).  Ifgrps have these properties: role, MTU, flow control, duplex, and load-balancing policy.
   c. VLAN: you can assign a VLAN to a port or ifgrp, which creates a virtual port.  VLANs have mostly the same properties as ifgrps.
4. LIF (logical interface): a LIF has a name, an IP address, a netmask, a role, and a home port.  A LIF belongs to a failover group and a routing group.
5. Failover group: the list of ports a LIF is allowed to be on.  You usually want one for node management, one for cluster management, and one for 10GbE data.
6. Routing group: these allow an SVM to have different gateways for different VLANs or networks.  A routing group has these properties: a name, an address/mask combo (in CIDR notation), a role, and a metric.  Name data routing groups starting with a d, intercluster routing groups with an i, and cluster network routing groups with a c.
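Tying the objects together, a typical bottom-up build might look like this (node, port, VLAN, and address values are hypothetical):

```
::> network port ifgrp create -node node1 -ifgrp a0a -distr-func ip -mode multimode_lacp
::> network port ifgrp add-port -node node1 -ifgrp a0a -port e0b
::> network port ifgrp add-port -node node1 -ifgrp a0a -port e0d
::> network port vlan create -node node1 -vlan-name a0a-100
::> network interface create -vserver svm1 -lif data1 -role data -home-node node1 -home-port a0a-100 -address 192.168.100.10 -netmask 255.255.255.0
```

Physical ports roll up into the ifgrp, the VLAN creates a virtual port on top of it, and the LIF lands on that virtual port.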

One thing you’ll notice is that Role is now important.  There are several Roles you can assign: Management, Data, Cluster, and Intercluster.  You’ll need to make sure the ports, LIFs, Failover Groups, and Routing Groups are in harmony as to their Role setting.

Also, one last point: SnapVault and SnapMirror are performed using the node’s Intercluster LIFs.  Your data SVMs will not have any intercluster LIFs. 

CDOT Tip #7

Security is always a top priority, so here are some things to consider as CDOT gets rolled out at TR.
1. Full disk encryption (aka NetApp Storage Encryption)
2. Non-returnable disk (NRD) entitlement
3. SafeNet: the replacement for the DataFort data encryption devices is SafeNet StorageSecure.  SafeNet can do file encryption, key management, logging and auditing, and DB/app encryption.
4. RBAC: CDOT implements command-specific control, meaning you can give a group or user access to a single command or command tree.  For example, you can give someone access to just “network interface” or, even more restricted, “network interface show.”
5. Firewall!  You can set system-level, vserver-level, and per-interface firewall policies.
6. Use SSH (disable telnet and rsh)
7. Alter SSH encryption algorithms per SVM
   a. aes256-ctr, aes192-ctr, aes128-ctr
   b. Diffie-Hellman group exchange with SHA-256
   c. Command: security ssh show / security ssh modify -vserver -key-exchange-algorithms -ciphers
8. Reduce the default CLI session timeout
   a. Command: system timeout modify 10
9. SSL/TLS
   a. FIPS mode (Federal Information Processing Standards)
   b. TLS only!  Command: system services web modify -sslv3-enabled false
10. Lock down export/share policies
   a. According to subnet
   b. NFS/CIFS ACLs
11. Implement off-box antivirus
12. FPolicy: file-based event notification.
   a. Based on file type, share/export, or volume.
   b. Allows you to monitor blocked access attempts.

13. Log events to an external syslog server (event command set)
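As an example of the command-tree RBAC in item 4, a hedged sketch (the role and user names are hypothetical; check the flags against your release):

```
::> security login role create -role net_viewer -cmddirname "network interface" -access readonly
::> security login create -username ops1 -application ssh -authmethod password -role net_viewer
```

The user ops1 can then SSH in but only run read-only commands under the “network interface” tree.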

CDOT Tip #6

Here are some recent CDOT questions we’ve fielded:

Q: ”I reviewed the volume move portion of the ‘CDOT replication Guide’ (attached). I don’t see anything in document about migrating an SVMs root volume to another node/aggregate along with volumes it host. Just wanted to confirm that there are no issues using ‘volume move’ to migrate an SVM root vol?” 
A: Root volumes for data vservers are no problem; go ahead and move them.  Node root volumes are not vol-movable, however.  They belong to the physical node.

Q: “The standard vol language will be UTF8 in CDOT, but we have many 7mode volumes which are set to ‘en_US’ or ‘C’. The destination volumes will inherit the same language as the source. Is there any way to convert the volume languages ‘en_US’ or ‘C’ to UTF8 once they have been migrated to CDOT?” 
A:  No - You cannot change the volume language after it is set; thus, if you want it to be something specific, it needs to be changed on the 7-Mode volume prior to migrating with the 7-Mode Transition Tool.

Q:  “Can I have a backup (snapvault) configured on a volume which is currently the destination of a 7mode to cdot migration?” 
A:   No - During a TDP snapmirror (snapmirror from 7 to C), you cannot cascade the destination volume.

Q: “The Oracle Database Team wrote a script to take snapshots of their oracle volumes in cdot. They have a similar script for 7mode. In 7mode, they take a snapshot via the script and we set retentions on the filer so that aged snapshots can be deleted. But, I can’t figure out how to create a policy which does NOT create a snapshot, but will delete aged snapshots. Any ideas how we can achieve this?”
A: Your best bet would be to estimate the space per snapshot, turn on snap autodelete, then size the volume/snap reserve to hold that many snapshots.  You could also use an outside script or SnapCreator.

Q: “Is there a command to see snapmirror lag in CDOT?”
A: ‘snapmirror show’ with the -fields option is what you’re looking for!  There are a lot of fields you can include, from lag-time to last-transfer-end-timestamp. 
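For example, a quick lag report across all relationships might look like this (trim the field list to taste):

```
cluster::> snapmirror show -fields lag-time,last-transfer-end-timestamp,state
```

This gives you one row per relationship, which is much easier to scan than the full detail output.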

CDOT Tip #5

In 7-mode, configuration is largely stored in files: there are /etc/rc and /etc/exports, etc.  In CDOT, configuration is stored in optimized databases.  Here’s a quick primer on the main ones and what they do:

Replicated Database (RDB):
¡  Consists of four independent replication units:
     Mgwd: management gateway
     VLDB: volume location database
     Vifmgr: VIF (LIF) manager
     BCOM: Blocks Configuration Object Manager
¡  Uses data replication service for cluster configuration
     Platform for single system image management
     Synchronizes configuration data (e.g. volumes, LIFs)
¡  Stored in each node’s root volume
¡  RDB processes run in user space on each node
¡  Do not manipulate directly  - only via CLI/System Manager/ZAPI

And some relevant admin tips:
  • Logs live under /mroot/etc/log/mlog on each node’s root volume: SnapMirror and EMS logs, the RDB logs (bcomd, mgwd, vifmgr, vldb), and core dump files.
  • The Event Viewer, the cluster CLI event command set, and remote web access are the preferred access methods over the systemshell.
  • You can reach any node’s mroot and logs from any node via the systemshell (and ftp the logs off).
  • Preferred: read-only web access to the log and core dump directories, available whether the node is online or taken over by its HA partner.
  • Setup: KB 1013814, “How to enable remote access to a node’s root volume in a cluster.”

CDOT Tip #4

Cool trick – You can watch traffic to an individual file in CDOT 8.3 with QOS policy groups:

cdot::qos policy-group> create -policy-group file_iops -vserver tdnas1                

cdot::qos policy-group> show
Name             Vserver     Class        Wklds Throughput 
---------------- ----------- ------------ ----- ------------
file_iops        tdnas1      user-defined 0     0-INF

Now you set this policy group per file:

cdot::> file modify -vserver tdnas1 -volume datastore_volume -file windows.vmdk -qos-policy-group file_iops

Then, you can do a perf analysis on the qos policy group

cdot::qos statistics performance> show -policy-group file_iops
Policy Group             IOPS      Throughput    Latency
-------------------- -------- --------------- ----------
file_iops                   1           0KB/s        0ms
file_iops                   1        0.00KB/s     2.00ms

Note: you only see the file_iops policy group in the statistics output while there is actual traffic on the file with that policy group.  Otherwise you’ll only see -total-, which reflects the whole cluster. 

CDOT Tip #3

Let’s tackle some networking!  Networking is admittedly not my strong suit, so please pepper me with questions if you see something amiss.  Some of these are general recommendations that may not be applicable to you, but they’re good to at least have reference to.
  • Remember, each LIF type needs a routing group (mgmt, data, etc.)
  • If you create a temporary IP address, it may create a temporary routing group.  Make sure you go back and clean it up.
  • Remember to create the ifgrp before your LIFs.  It’s a pain to go back!
  • If the switch port is type access, our ifgrps can’t have VLANs.  We recommend switch port type trunk even if there’s only one VLAN, to allow for future flexibility
    • switchport trunk encapsulation dot1q
  • Portfast on
  • Disable IP fastpath
    • ::> node run -node * -command "options nodescope.reenabledoptions ip.fastpath"
    • ::> node run -node * -command options ip.fastpath.enable off
  • Disable flow control on all non-Unified Target Adapter (UTA) network interfaces and their associated switch ports
    • ::> net port modify -node -port   -flowcontrol-admin none
  • Create per-network/VLAN failover groups and modify network interface failover-group setting accordingly
    • ::> failover-groups create -failover-group -node -port [-vlan_id]
    • ::> network interface modify -vserver -lif -failover-group
  • You can do a net int show and use -fields to pick field names (like routing-group) that aren’t shown by default.  Very useful
  • You can choose which LIF to use when pinging.  This is a fantastic testing tool!  net ping -lif-owner svm -lif smlif -destination gwaddress
  • You can’t create a 2-node cluster unless the cluster network is up.  So if you’re setting up a switchless cluster, make sure you connect each node’s 10Gb cluster ports directly to the other node’s cluster ports.
  • If you’re setting up a switchless cluster, you need to follow these instructions on both nodes.
    • ::> set advanced      (y) 
    • ::*> network options switchless-cluster modify -enabled true

Bonus tip!
If you need to halt one controller without impacting the HA partner, there is no longer a ‘cf disable’ option.  Use halt with the inhibit-takeover switch, e.g. ‘system node halt -node <node> -inhibit-takeover true’ (reboot takes the same switch).

CDOT Tip #2

Here’s a nugget that is relevant to anyone who does CDOT implementations, and it surprised me when I found it out last year.  Essentially, ‘cluster ha modify’ should only be run on 2-node clusters, because cluster HA (high availability) is different than storage failover.

storage failover modify -mode non-ha: this is for single-node clusters.
cluster ha modify -configured true: this is for two-node clusters only.  This disables epsilon (the tie-breaking mechanism used on 4+ node clusters). 
storage failover modify -mode ha: this controls what we normally understand as takeover/giveback (cf enable).  It is applicable to any 2+ node cluster, and it must be configured correctly for “cluster ha modify” to work. 

Lastly, if you have a 2-node cluster and have set cluster ha -configured to true, you will need to manually set it back to false when you add nodes to grow the cluster.
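So for a 2-node cluster, the sequence is roughly this sketch (run the reverse of the second command before growing past two nodes):

```
::> storage failover modify -node * -mode ha
::> cluster ha modify -configured true
::> cluster ha show
```

The final show command confirms that cluster HA is configured before you rely on it.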

Bonus link: here is a description of which ONTAP image files (q, e, i, m) are compatible with each platform.  In general, the ‘q’ file is what you will use, in the form “814P1_q_image.tgz.”  The .zip is the older format, for use with upgrades from 7.x systems, while the .tgz is used for upgrades from 8.x systems.  Lastly, netboot is for booting systems from an image on your laptop/server. 

CDOT Tip #1

CDOT Tip #1: When you’re looking up a CDOT system in ASUPs, remember that “hostname” correlates to node name, while “cluster name” is the name of the cluster admin vserver. 

When you look up a node (hostname), it displays the fitness dashboard.  Look below: the “Cluster Name” field is a hyperlink that takes you to the cluster admin vserver.

When you click the hyperlink, it takes you to the cluster dashboard!  This lists all the vservers and nodes for that cluster.

Bonus: you can find 7-mode to CDOT command mapping here: https://library.netapp.com/ecm/ecm_download_file/ECMP1196780

Friday, May 29, 2015

SAP Project

A lot of lessons learned in an SAP project I've led over the past few months.  Here's the environment:

  • 7-Mode 8.1.3 FAS6220's with 2TB Flash Cache and SAS disks
  • DB2 on AIX 7.1 on IBM P-Series
  • SAP on SUSE Linux
  • Migrating from HDS FCP to NetApp NFS

A really cool part of this: In their QA environment, they had 6 full copies of every database, taking a 30TB production environment to 200TB used in QA.  This also means that refreshes required a full re-copy of the database, which had performance impact.  Using FlexClones (thin, writable snapshots), we dropped QA capacity to 100TB used and negated the performance impact entirely.

Now that we have the basics laid out, here are a few important things we figured out.  About AIX:
  1. Turn on Selective ACKS in AIX.  This was a huge performance improvement for us.
  2. We saw a 40% throughput improvement upgrading from AIX 6.1 to AIX 7.1.  Highly recommend it.
  3. Be careful with AIX LACP etherchannels.  We saw some very strange throughput drops, almost like port flapping, on a LPAR using a LACP etherchannel.  The client saw some errors related to it, recreated it, and we saw significant performance improvements.
  4. We didn't see any improvement using mount options like CIO or RBR.
  5. We settled on these mount options:  bg,hard,intr,rsize=65536,wsize=65536,timeo=600,vers=3,proto=tcp,rw
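As a host-side illustration, those options on an AIX NFS mount would look roughly like this (the hostname and paths are hypothetical):

```
# AIX NFS mount of a NetApp volume using the options above
mount -o bg,hard,intr,rsize=65536,wsize=65536,timeo=600,vers=3,proto=tcp,rw netapp1:/vol/sapdata1 /db2/sapdata1
```

In production you'd put the equivalent entry in /etc/filesystems so it survives a reboot.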

About DB2:
  1. Our migration plan (from HDS to NetApp) was using DB2 backup/restore.  This avoids slow log rolling and is the safest route for data integrity. 
  2. Make sure you distribute your DB2 datafiles onto multiple controllers and multiple volumes.  Think of each volume as a thread: the more threads, the better the CPU parallelization, the better your performance.
  3. When performing a DB2 backup and restore, make sure you backup and restore to multiple controllers and multiple volumes: same reason as above.
  4. In order to reduce the number of SnapMirror relationships and improve the RPO, we combined all transaction logs and archive logs into one volume each per controller.
About FlexClones:
  1. When you FlexClone from a SnapVault destination snapshot, it inherits the SnapVault relationship.  You resolve this by breaking that relationship and restoring the qtree.
  2. In 7-mode, you can't FlexClone from a SnapMirror destination snapshot.
About SnapMirror:
  1. Single-threaded SnapMirror relationships can have a window size of 7MB; multi-threaded, 14MB.  Very good for long-distance SnapMirror.
  2. You can multi-thread a SnapMirror relationship.
  3. Keep an eye on the SnapMirror Maximums.
  4. With a logs change rate of ~10MB/s per controller, we were able to accomplish a 1-minute SnapMirror interval over a 38ms RTT WAN.
More to come!