Tuesday, May 24, 2011

NetApp Training Brain Dump: SnapMirror/SyncMirror (Data Replication)

In project planning, it's well known that there are three competing constraints: time, money, and scope. You constantly negotiate with stakeholders to squeeze as much as you can out of each, but in the end most of your job is navigating those constraints. Engineers are extremely familiar with this reality, even if they've never heard of the Triple Constraint Triangle:
[Figure: the Triple Constraint Triangle (time, money, scope). Credit: Wikipedia]
SnapMirror is NetApp's software for replicating data from one system to another, giving you another copy of your information. You might do this for DR, for backup, to provide quick access, to spread out CPU utilization, to minimize traffic across distances, or for any number of other reasons. Businesses find this service invaluable, and the data replication industry is expected to grow from $2.7B in 2007 to $4.4B in 2011. SnapMirror comes in synchronous, semi-synchronous (CP Forwarding), and asynchronous modes.

SyncMirror vs. Sync SnapMirror: there are a couple of important distinctions here. First, SyncMirror is the replication used between the two FAS systems in a MetroCluster. Second, SyncMirror works at the aggregate level, whereas Sync SnapMirror operates on volumes and qtrees.

In data replication, a big advantage of SnapMirror is NetApp's implementation of network compression (available only in async mode). The source compresses the data before transmission and the destination decompresses it before writing, which speeds up the transfer while reducing bandwidth utilization. Here you find another constraint triangle: bandwidth utilization, transfer speed of the compressed data, and CPU utilization. To compress data that is being transferred at high rates, the CPU has to perform more compression operations per second, so CPU utilization climbs with the transfer rate. If you keep the transfer rate low and steady, CPU utilization will stay correspondingly low.

[Figure: relationship between transfer rate, CPU utilization, and bandwidth utilization. Credit: Me! Please note that this graph demonstrates a relationship and is not accurate to actual system statistics.]
Obviously, you want to keep the transfer rate as high as possible and the other two as low as possible. The catch is that every increase in transfer rate brings a corresponding increase in CPU utilization, bandwidth utilization, or both.
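The trade-off above can be sketched as a toy model. This is a minimal sketch under my own assumptions, not NetApp's sizing math: CPU load is assumed to scale linearly with the rate of source data being compressed, and `cpu_cost_per_mb` is a hypothetical parameter standing in for whatever your controller actually spends per MB. The compression ratios are the advertised NetApp figures quoted in the quick hits (YMMV).

```python
from dataclasses import dataclass


@dataclass
class CompressionModel:
    """Toy model of the SnapMirror compression trade-off (illustrative only)."""

    compression_ratio: float  # e.g. 3.5 for a 3.5:1 ratio
    cpu_cost_per_mb: float    # hypothetical CPU-seconds spent compressing 1 MB

    def wire_bandwidth(self, transfer_rate_mb_s: float) -> float:
        """MB/s actually sent over the network for a given logical transfer rate."""
        return transfer_rate_mb_s / self.compression_ratio

    def cpu_load(self, transfer_rate_mb_s: float) -> float:
        """CPU-seconds consumed per second of transfer (1.0 = one fully busy core)."""
        return transfer_rate_mb_s * self.cpu_cost_per_mb


if __name__ == "__main__":
    # Advertised ratios from NetApp; cpu_cost_per_mb=0.01 is a made-up value.
    workloads = {"Oracle": 3.5, "home directories": 2.7, "Exchange": 1.5}
    for name, ratio in workloads.items():
        model = CompressionModel(ratio, cpu_cost_per_mb=0.01)
        for rate in (70, 140):  # double the logical transfer rate
            print(f"{name} @ {rate} MB/s: "
                  f"{model.wire_bandwidth(rate):.1f} MB/s on the wire, "
                  f"{model.cpu_load(rate):.2f} cores")
```

In this model, doubling the transfer rate doubles both the bandwidth on the wire and the CPU load; changing the ratio or the per-MB cost shifts the balance between the two but doesn't make the trade-off go away.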

Quick hits:
  • SnapMirror can replicate at the volume and qtree level.
  • Consider using a WAN compression device (e.g. Riverbed SteelHead) instead of SnapMirror compression to compress ALL WAN traffic (don't use both). SnapMirror compression, of course, compresses only SnapMirror traffic. WAN compression devices also handle latency and packet loss more efficiently.
  • NetApp advertises compression ratios of 3.5:1 for Oracle, 2.7:1 for home directories, and 1.5:1 for Exchange. YMMV.
  • Checkpoints are taken every 5 minutes. If a transfer is aborted or interrupted, replication resumes from the last checkpoint.
  • In sync mode, writes to the source NVRAM are immediately transferred to the destination NVRAM; this is called NVLOG forwarding. After a 25s NVLOG forwarding timeout, replication falls back to semi-sync.
  • A consistency point (CP) is when the contents of NVRAM are flushed to local disk. CPs occur in certain situations, e.g. when the source's NVRAM is half full, and are also generated every 10s. These flushes are forwarded to the destination; a 1 min timeout in this process drops replication to async.
  • Replication can also transition back into sync from async.
  • Initial SnapMirror replication is very disk- and CPU-intensive, partly because of the amount of data and partly because of background processes like deswizzling. Subsequent mirroring of the same data has a drastically lower impact.
  • Volume sizing needs careful consideration, and FlexClones/snapshots complicate the process.
  • Things you need to be careful of:
    • Changing source/dest volume names
    • Changing source/dest volume sizes
    • Changing hostnames
    • Changing ONTAP versions
    • Deleting/creating LUNs/snapshots/etc. on either side
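The checkpoint behavior from the quick hits can be illustrated with a little arithmetic. This sketch assumes, for simplicity, that a checkpoint is recorded exactly every 5 minutes of elapsed transfer time starting from zero; `resume_point` and `work_lost` are hypothetical helpers, not SnapMirror commands.

```python
CHECKPOINT_INTERVAL_S = 5 * 60  # one checkpoint every 5 minutes (from the notes)


def resume_point(elapsed_s: float, interval_s: int = CHECKPOINT_INTERVAL_S) -> int:
    """Elapsed time (s) of the last checkpoint at or before an interruption at elapsed_s."""
    if elapsed_s < 0:
        raise ValueError("elapsed_s must be non-negative")
    return int(elapsed_s // interval_s) * interval_s


def work_lost(elapsed_s: float, interval_s: int = CHECKPOINT_INTERVAL_S) -> float:
    """Seconds of transfer that must be redone after restarting from the last checkpoint."""
    return elapsed_s - resume_point(elapsed_s, interval_s)


if __name__ == "__main__":
    # A transfer interrupted 23 minutes in restarts from the 20-minute checkpoint.
    interrupted_at = 23 * 60
    print(resume_point(interrupted_at) // 60)  # → 20
    print(work_lost(interrupted_at) // 60)     # → 3
```

Under this model an interruption never costs more than one checkpoint interval of repeated work.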
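The fallback from sync to semi-sync to async described in the quick hits can be sketched as a small state machine. This is a simplified model, not ONTAP's implementation: the 25s, 1 min, half-full, and 10s thresholds come from the notes, while the function names, the idea of tracking how long each forwarding path has "stalled", and the rule that a session returns to sync once both paths are healthy are my own assumptions (the notes only say that returning to sync is possible).

```python
from enum import Enum


class Mode(Enum):
    SYNC = "sync"
    SEMI_SYNC = "semi-sync"
    ASYNC = "async"


NVLOG_FORWARD_TIMEOUT_S = 25  # NVLOG forwarding timeout (from the notes)
CP_FORWARD_TIMEOUT_S = 60     # CP forwarding timeout (from the notes)
CP_INTERVAL_S = 10            # a CP is generated every 10 s (from the notes)


def should_take_cp(nvram_used: int, nvram_size: int, secs_since_last_cp: float) -> bool:
    """Simplified CP triggers: NVRAM half full, or the 10 s timer has expired."""
    return nvram_used * 2 >= nvram_size or secs_since_last_cp >= CP_INTERVAL_S


def next_mode(mode: Mode, nvlog_stall_s: float, cp_stall_s: float) -> Mode:
    """Hypothetical helper: choose the replication mode from how long each path has stalled."""
    if cp_stall_s >= CP_FORWARD_TIMEOUT_S:
        # CP forwarding timed out: fall back to async
        return Mode.ASYNC
    if nvlog_stall_s >= NVLOG_FORWARD_TIMEOUT_S:
        # NVLOG forwarding timed out: sync degrades to semi-sync; other modes stay put
        return Mode.SEMI_SYNC if mode is Mode.SYNC else mode
    if nvlog_stall_s == 0 and cp_stall_s == 0:
        # Assumption: both forwarding paths are healthy again, so return to sync
        return Mode.SYNC
    return mode


if __name__ == "__main__":
    mode = Mode.SYNC
    # Expected modes, in order: sync, semi-sync, async, sync
    for nvlog_stall, cp_stall in [(5, 0), (30, 0), (30, 65), (0, 0)]:
        mode = next_mode(mode, nvlog_stall, cp_stall)
        print(nvlog_stall, cp_stall, mode.value)
```

Checking the CP timeout first means a badly stalled session goes straight to async, even from sync, which matches the notes' ordering of the two timeouts.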
Sources:
Async: http://www.netapp.com/us/library/technical-reports/tr-3446.html
Sync and Semi-sync: http://media.netapp.com/documents/tr-3326.pdf
