Monday, June 1, 2015

CDOT Tip #8

In order to clarify CDOT’s networking architecture, here’s a basic explanation of each object involved.

1.       Node SVM: aggregates, disks, and ports belong to the node Storage Virtual Machine.
2.       Data SVM: volumes, qtrees, and data LIFs belong to the data SVM.
3.       Ports: You issue commands to the three types of ports the same way.  Those three types are:
a.       Physical Port: at the Physical Port level you can set the MTU or Flow Control. 
b.      Ifgrp (interface group): these are now named in the convention “a0a.”  Ifgrps are made up of ports and exist for redundancy and load balancing.  You can set all the port properties here (changes will override the member ports). Ifgrps have these properties: Role, MTU, Flow Control, Duplex, and load balancing policy.
c.       VLAN: You can assign a VLAN to a port or ifgrp, which creates a virtual port.  VLANs have mostly the same properties as ifgrps.
4.       LIF (logical interface): a LIF has a Name, an IP address, netmask, Role and a Home Port.  A LIF belongs to a Failover Group and a Routing Group.
5.       Failover Group: list of ports a LIF is allowed to be on.  You usually want one for each node’s management, one for cluster management, and one for 10GbE data.
6.       Routing Group:  these allow an SVM to have different gateways for different VLANs or networks.  A Routing Group has these properties:  a name, address/mask combo (in CIDR notation), role, and metric.   Name data routing groups starting with a d, intercluster routing groups with an i, and cluster network routing groups with a c.
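
The routing group naming convention in item 6 can be sketched in a few lines of Python.  This is only an illustration: the helper name routing_group_name and the exact "prefix letter plus CIDR network" format are assumptions made for the example, not an ONTAP API.

```python
import ipaddress

# Prefix letters follow the naming convention described in item 6.
ROLE_PREFIX = {"data": "d", "intercluster": "i", "cluster": "c"}


def routing_group_name(role: str, address: str, netmask: str) -> str:
    """Build a name such as 'd192.168.1.0/24' from a role and an address/netmask pair.

    Hypothetical helper: the output format is an assumption for illustration.
    """
    prefix = ROLE_PREFIX[role.lower()]
    # strict=False accepts a host address and returns the network it belongs to.
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    return f"{prefix}{network.with_prefixlen}"


print(routing_group_name("data", "192.168.1.25", "255.255.255.0"))
```

Deriving the name from the address keeps the CIDR part of the name and the subnet it describes from drifting apart.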

One thing you’ll notice is that Role is now important.  There are several Roles you can assign: Management, Data, Cluster, and Intercluster.  You’ll need to make sure the ports, LIFs, Failover Groups, and Routing Groups are in harmony as to their Role setting.
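
To make "in harmony" concrete, here is a minimal Python sketch of the check you would do by eye: every object a LIF touches should carry the same Role as the LIF.  The class and field names are invented for the example and do not correspond to any ONTAP API.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    role: str


@dataclass
class Lif:
    name: str
    role: str
    home_port: Port
    failover_ports: list  # ports in the LIF's failover group
    routing_group_role: str


def role_mismatches(lif: Lif) -> list:
    """Describe every object whose role differs from the LIF's role."""
    problems = []
    if lif.home_port.role != lif.role:
        problems.append(f"home port {lif.home_port.name} is {lif.home_port.role}")
    for port in lif.failover_ports:
        if port.role != lif.role:
            problems.append(f"failover port {port.name} is {port.role}")
    if lif.routing_group_role != lif.role:
        problems.append(f"routing group is {lif.routing_group_role}")
    return problems


# Hypothetical example: a data LIF whose failover group contains a cluster port.
e0c = Port("e0c", "data")
e0d = Port("e0d", "cluster")
nfs_lif = Lif("nfs_lif1", "data", e0c, [e0c, e0d], "data")
print(role_mismatches(nfs_lif))
```

An empty list means the LIF, its ports, and its routing group all agree on the Role.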


Also, one last point: SnapVault and SnapMirror are performed using the node’s Intercluster LIFs.  Your data SVMs will not have any intercluster LIFs. 

CDOT Tip #7

Security is always a top priority, so here are some things to consider as CDOT gets rolled out at TR.
1.       Full disk encryption (aka NetApp Storage Encryption)
2.       Non-returnable disk (NRD) entitlement
3.       SafeNet: the replacement for the DataFort data encryption devices is SafeNet StorageSecure.  SafeNet can do file encryption, key management, logging and auditing, and DB/APP encryption.
4.       RBAC – CDOT implements a command specific control, meaning you can give a group or user access to a single command or command tree.  For example, you can give someone access to just “network interface” or even more restricted “network interface show.”
5.       Firewall!  You can set system level, vserver level, and per interface firewall policies. 
6.       Use SSH (disable telnet and rsh)
7.       Alter ssh encryption algorithms per SVM
a.       Aes256-ctr, Aes192-ctr, Aes128-ctr
b.      Diffie-Hellman group exchange sha256
c.       Command: security ssh show / security ssh modify -vserver -key-exchange-algorithms -ciphers
8.       Reduce the default CLI session time-out
a.       Command: system timeout modify -timeout 10
9.       SSL/TLS
a.       FIPS (Federal Information Processing Standards) mode
b.      TLS only!  Command:  system services web modify -sslv3-enabled false
10.   Lock down export/share policies
a.       According to subnet
b.      NFS/CIFS ACLs
11.   Implement off-box antivirus
12.   FPolicy: file-based event notification.
a.       Based on file type, share/export, volume.
b.      Allows you to monitor blocked access attempts.
13.   Log events to external syslog server (event command set)
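
The command-tree scoping described in the RBAC item can be pictured as a prefix match on the words of a command.  The Python below is a sketch of that idea only; the function name is_allowed is hypothetical and this is not how ONTAP implements its access checks.

```python
def is_allowed(command: str, granted: list) -> bool:
    """True if the command falls under one of the granted commands or command trees."""
    words = command.split()
    for grant in granted:
        grant_words = grant.split()
        # A grant covers itself and everything beneath it in the command tree.
        if words[:len(grant_words)] == grant_words:
            return True
    return False


# A role limited to the "network interface" tree.
print(is_allowed("network interface show", ["network interface"]))
# A role limited to the single "network interface show" command.
print(is_allowed("network interface modify", ["network interface show"]))
```

The first call is permitted because the command sits inside the granted tree; the second is refused because only the show command was granted.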

CDOT Tip #6

Here are some recent CDOT questions we’ve fielded:

Q: “I reviewed the volume move portion of the ‘CDOT replication Guide’ (attached). I don’t see anything in the document about migrating an SVM’s root volume to another node/aggregate along with the volumes it hosts. Just wanted to confirm that there are no issues using ‘volume move’ to migrate an SVM root vol?” 
A: Root volumes for data vservers are no problem; go ahead and move them.  Node root volumes, however, cannot be moved with volume move.  They belong to the physical node.

Q: “The standard vol language will be UTF8 in CDOT, but we have many 7mode volumes which are set to ‘en_US’ or ‘C’. The destination volumes will inherit the same language as the source. Is there any way to convert the volume languages ‘en_US’ or ‘C’ to UTF8 once they have been migrated to CDOT?” 
A:  No - You cannot change the volume language after it is set; thus, if you want it to be something specific, it needs to be changed on the 7-Mode volume prior to migrating with the 7-Mode Transition Tool.

Q:  “Can I have a backup (snapvault) configured on a volume which is currently the destination of a 7mode to cdot migration?” 
A:   No - During a TDP snapmirror (snapmirror from 7 to C), you cannot cascade the destination volume.

Q: “The Oracle Database Team wrote a script to take snapshots of their oracle volumes in cdot. They have a similar script for 7mode. In 7mode, they take a snapshot via the script and we set retentions on the filer so that aged snapshots can be deleted. But, I can’t figure out how to create a policy which does NOT create a snapshot, but will delete aged snapshots. Any ideas how we can achieve this?”
A: Your best bet would be to estimate the space per snapshot, turn on snap autodelete, then size the volume/snapreserve to hold that many snapshots.  You could also use an outside script or snapcreator.
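
If you go the outside-script route, the retention logic itself is small.  Here is a minimal Python sketch, assuming the script has already fetched each snapshot's name and creation time; the function name and the sample snapshot names are made up, and the actual listing and deletion calls are left out because they depend on how your script talks to the cluster.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, keep_days, now):
    """Return the names of snapshots older than keep_days, oldest first.

    snapshots: iterable of (name, created) tuples, where created is a datetime.
    """
    cutoff = now - timedelta(days=keep_days)
    expired = [(created, name) for name, created in snapshots if created < cutoff]
    return [name for created, name in sorted(expired)]


# Hypothetical snapshots taken by the database team's script.
now = datetime(2015, 6, 1, 12, 0)
snaps = [
    ("ora_hot_1", datetime(2015, 5, 20, 2, 0)),
    ("ora_hot_2", datetime(2015, 5, 28, 2, 0)),
    ("ora_hot_3", datetime(2015, 5, 31, 2, 0)),
]
print(snapshots_to_delete(snaps, keep_days=7, now=now))
```

Passing the current time in as an argument keeps the function easy to test against a fixed date.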

Q: “Is there a command to see snapmirror lag in CDOT?”
A: snapmirror show with the -fields option is what you’re looking for!  There are a lot of fields you can include, from lag-time to last-transfer-end-timestamp. 
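
Once you have the lag values, flagging the relationships that have fallen behind is straightforward.  The Python sketch below assumes the lag is available as an hours:minutes:seconds string; that format, the function names, and the sample paths are assumptions for illustration, so check them against what your cluster actually prints.

```python
def lag_seconds(lag: str) -> int:
    """Convert an 'H:MM:SS' string to seconds (the format is an assumption)."""
    hours, minutes, seconds = (int(part) for part in lag.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def lagging(relationships: dict, max_lag_seconds: int) -> list:
    """Return the destination paths whose lag exceeds the threshold, sorted by name."""
    return sorted(
        path for path, lag in relationships.items()
        if lag_seconds(lag) > max_lag_seconds
    )


# Hypothetical destination paths and lag values.
sample = {"svm1:vol_a": "0:15:32", "svm1:vol_b": "26:02:10"}
print(lagging(sample, max_lag_seconds=24 * 3600))
```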

CDOT Tip #5

In 7-mode, configuration is largely stored in files: there are /etc/rc and /etc/exports, etc.  In CDOT, configuration is stored in optimized databases.  Here’s a quick primer on the main ones and what they do:

Replicated Database (RDB):
•  Consists of four independent replication units:
     Mgwd: management gateway
     VLDB: volume location database
     Vifmgr: VIF (LIF) manager
     BCOM: Blocks Configuration Object Manager
•  Uses data replication service for cluster configuration
     Platform for single system image management
     Synchronizes configuration data (e.g. volumes, LIFs)
•  Stored in each node’s root volume
     /mroot/etc/cluster_config/rdb/
•  RDB processes run in user space on each node
•  Do not manipulate directly; use only the CLI, System Manager, or ZAPI

And some relevant admin tips:
•  Logs on each node’s root volume include SnapMirror, EMS, and others
•  /mroot/etc/log/mlog holds:
     RDB logs (bcomd, mgwd, vifmgr, vldb)
     command-history.log
•  Core dump files
•  Event Viewer, the cluster CLI event command, or remote Web access are preferred over systemshell
•  You can access any node’s mroot and logs from any node:
     Via systemshell (can ftp logs)
     Preferred: read-only Web access to log and core dump directories, available if the node is online or taken over by its HA partner
     Setup: see KB 1013814, “How to enable remote access to a node’s root volume in a cluster”


CDOT Tip #4

Cool trick – You can watch traffic to an individual file in CDOT 8.3 with QOS policy groups:

cdot::qos policy-group> create -policy-group file_iops -vserver tdnas1                

cdot::qos policy-group> show
Name             Vserver     Class        Wklds Throughput 
---------------- ----------- ------------ ----- ------------
file_iops        tdnas1      user-defined 0     0-INF

Now you set this policy group per file:

cdot::> volume file modify -vserver tdnas1 -volume datastore_volume -file windows.vmdk -qos-policy-group file_iops

Then, you can do a perf analysis on the QoS policy group:

cdot::qos statistics performance> show -policy-group file_iops
Policy Group             IOPS      Throughput    Latency
-------------------- -------- --------------- ----------
file_iops                   1           0KB/s        0ms
file_iops                   1        0.00KB/s     2.00ms


Note: you only see the file_iops policy group in the statistics output while there is actual traffic on the file with that policy group.  Otherwise you’ll only see -total-, which reflects the whole cluster. 
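
If you capture that output to a file, pulling out the samples for one policy group is easy to script.  Here is a minimal Python sketch that parses the sample shown above; the function name parse_rows is hypothetical, and it assumes four whitespace-separated columns with latency reported in ms, which may not hold for every release or workload.

```python
SAMPLE = """\
Policy Group             IOPS      Throughput    Latency
-------------------- -------- --------------- ----------
file_iops                   1           0KB/s        0ms
file_iops                   1        0.00KB/s     2.00ms
"""


def parse_rows(text: str, policy_group: str) -> list:
    """Return (iops, latency_ms) tuples for the rows of one policy group."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four columns and start with the policy group name.
        if len(fields) != 4 or fields[0] != policy_group:
            continue
        # Skip rows whose latency is not reported in milliseconds.
        if not fields[3].endswith("ms"):
            continue
        rows.append((int(fields[1]), float(fields[3][:-2])))
    return rows


print(parse_rows(SAMPLE, "file_iops"))
```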

CDOT Tip #3

Let’s tackle some networking!  Networking is admittedly not my strong suit, so please pepper me with questions if you see something amiss.  Some of these are general recommendations that may not be applicable to you, but they’re good to at least have reference to.
  • Remember, each lif type needs a routing group (mgmt, data, etc)
  • If you create a temporary IP address it may create a temporary routing group.  Make sure you go back and clean it up.
  • Remember to create the ifgrp before your lifs.  It’s a pain to go back!
  • If the switch port is type access, our ifgrps can’t have vlans. We recommend using switch port type trunk even if there’s only 1 vlan to allow for future flexibility
    • switchport trunk encapsulation dot1q
  • Portfast on
  • Disable IP fastpath
    • ::> node run -node * -command "options nodescope.reenabledoptions ip.fastpath"
    • ::> node run -node * -command options ip.fastpath.enable off
  • Disable flow control on all non-Unified Target Adapter (UTA) network interfaces and their associated switch ports
    • ::> net port modify -node -port   -flowcontrol-admin none
  • Create per-network/VLAN failover groups and modify network interface failover-group setting accordingly
    • ::> failover-groups create -failover-group -node -port [-vlan_id]
    • ::> network interface modify -vserver -lif -failover-group
  • You can run net int show with -fields to pick field names (like routing-group) that aren’t shown by default.  Very useful.
  • You can choose which lif to use when pinging.  This is a fantastic testing tool!  net ping -lif-owner svm -lif smlif -destination gwaddress
  • You can’t create a 2-node cluster unless the cluster network is up.  So if you’re setting up a switchless cluster, make sure you connect from each 10Gb port directly to the other node’s IC port.
  • If you’re setting up a switchless cluster, you need to follow these instructions on both nodes.
    • ::> set advanced      (y) 
    • ::*> network options switchless-cluster modify -enabled true


Bonus tip!
If you need to halt one controller without impacting the HA partner, there is no longer a cf disable option.  Use halt with the inhibit-takeover switch (reboot has an inhibit-takeover switch as well).

CDOT Tip #2

Here’s a nugget that is relevant to anyone who does CDOT implementations and surprised me when I found it out last year.  Essentially, cluster ha modify should only be run on 2 node clusters, because cluster ha (high availability) is different than storage failover.

storage failover modify -mode non-ha: This is for single-node clusters.
cluster ha modify -configured true: This is for two-node clusters only.  This disables Epsilon (the system of tie-breaking for 4+ node clusters). 
storage failover modify -mode ha: This controls what we normally understand as takeover/giveback (cf enable).  It is applicable for any 2+ node cluster.  This must be configured correctly for “cluster ha modify” to work. 





Lastly, if you have a 2-node cluster with cluster ha modify -configured true, you will need to manually set it to false when you add nodes to grow the cluster.
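
The rules above boil down to a small lookup on node count.  The Python below is just a restatement of this tip as code; the function name and dictionary keys are invented for the example, and it does not check that larger clusters are built from HA pairs.

```python
def ha_settings(node_count: int) -> dict:
    """Return the storage failover mode and cluster ha setting for a cluster size."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return {
        # Single-node clusters run non-ha; everything larger uses ha.
        "storage failover mode": "non-ha" if node_count == 1 else "ha",
        # cluster ha is only configured on two-node clusters.
        "cluster ha configured": node_count == 2,
    }


for nodes in (1, 2, 4):
    print(nodes, ha_settings(nodes))
```

Growing from two nodes to four flips the cluster ha value from True to False, which is the manual step called out above.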


Bonus link: Here is a description of which ONTAP files (q, e, I, m) are compatible with each platform.  In general, the ‘q’ file is what you will use, in the form “814P1_q_image.tgz.”  The .zip is the older format, for use with upgrades from 7.x systems, while the .tgz is used for upgrades from 8.x systems.  Lastly, netboot is for booting systems from an image on your laptop/server. 

CDOT Tip #1

When you’re looking up a CDOT system in ASUPs, remember that “hostname” correlates to node name, while “cluster name” is the name of the cluster admin vserver. 


When you look up a node (hostname), it displays the fitness dashboard.  The “Cluster Name” field is a hyperlink that takes you to the cluster admin vserver.

When you click the hyperlink, it takes you to the cluster dashboard!  This lists all the vservers and nodes for that cluster.



Bonus: you can find 7-mode to CDOT command mapping here: https://library.netapp.com/ecm/ecm_download_file/ECMP1196780