Wednesday, June 15, 2011

NetApp Training Brain Dump: SnapVault

Data replication (DR) is a pretty easy concept: you copy stuff from this system to another.  Ta-da!  
Backups are a pretty easy concept: you copy stuff from this system to another spot (either on the same system or another).  Ta-da!

Where you'll need to concentrate your brainpower is on the nuances of the implementation: what a DR/Backup product actually does, and how you need to use it to keep your data protected.  Here's a quick rundown of SnapVault, NetApp's flagship DR/Backup product*1:

SnapVault (SV) can copy volumes or qtrees.  You use this product to copy data from one FAS system (the primary) to another (the secondary, usually a cheaper storage system) by setting up a schedule with retention periods.  SV will delete ("roll off") old backups once they age past the retention you configured ("only keep 3 latest copies...").
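To make the "only keep 3 latest copies" idea concrete, here's a tiny Python sketch of a roll-off rule.  This is not how SV implements it internally; the function name `roll_off` and the snapshot names are made up for illustration.

```python
def roll_off(snapshots, keep):
    """Split snapshots (ordered oldest to newest) into (kept, deleted).

    Hypothetical helper: keeps only the `keep` newest entries.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    cut = max(len(snapshots) - keep, 0)
    return snapshots[cut:], snapshots[:cut]


# Five made-up daily backups with a retention of 3
kept, deleted = roll_off(["mon", "tue", "wed", "thu", "fri"], keep=3)
print(kept)     # ['wed', 'thu', 'fri']
print(deleted)  # ['mon', 'tue']
```

Each time a new backup lands, the oldest one beyond the retention count is the one that rolls off.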

SV does a basic full copy at first, which obviously takes a bit.  After that, it copies snapshots based upon your schedule, and those are essentially incremental backups*2.  This is the most space-efficient method of preserving data, since only the data that has changed is replicated after the full backup.  

The built-in options for SV are hourly, daily, and weekly backups.  Monthly and other options are available but require specialized scripts.  SV also gives you the flexibility to keep a different number of snapshots on the primary than on the secondary.  All backups are read-only, but LUNs/qtrees can be presented normally and used as read-only.  

This setup is pretty darn space efficient, extremely fast, protects your data from a lot of dangers, gives you flexibility to set up backup architectures (e.g., present a read-only LUN from the backup to a backup server to send the data to tape), and saves you money by allowing the low-use backup data to be on cheaper storage.  

Digging a little bit deeper, here's how SV works: 
  1. You set up a schedule for SV to take snapshots on your primary.
  2. You set up a schedule for SV to replicate those snapshots at a certain rate to the secondary.
  3. SV does a full copy to the secondary
  4. SV takes a snapshot on the primary.
  5. From the secondary, SV pulls the primary's first snapshot.
  6. SV continues taking snapshots on the primary.
  7. On the secondary, SV compares the snapshot it currently has to the newest snapshot on the primary and replicates any changes.
  8. Repeat steps 6 and 7 indefinitely.
Remember, we're talking about snapshots vs regular incrementals: this means that the initial snapshot being replicated (step 5) will hold no data of its own at all, since none of the data has changed yet.  But by step 7, the initial snapshot has grown significantly, because it is holding on to the old versions of everything that has changed since.  It is not this first snapshot that is replicated, though: it is the latest one.  And the latest snapshot on the primary will itself hold less data than has changed since the last time the secondary received a snapshot replication, so the secondary must find all the changes since the last replicated snapshot and copy those.  Draw this out for yourself if you have trouble picturing it.  
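If drawing it out doesn't do it for you, here's a toy Python model of steps 3 through 7.  It's a sketch under big assumptions: a "volume" is just a dict of block name to contents, a snapshot is a frozen copy of that dict, and the helper names (`take_snapshot`, `changes_between`, `apply_changes`) are invented for this example.  Real SV works at the block level inside the filer and looks nothing like this.

```python
def take_snapshot(volume):
    """Freeze the current state of a volume (dict of block -> contents)."""
    return dict(volume)


def changes_between(base, newest):
    """Return (changed_or_added, removed) going from base to newest."""
    changed = {blk: data for blk, data in newest.items()
               if blk not in base or base[blk] != data}
    removed = [blk for blk in base if blk not in newest]
    return changed, removed


def apply_changes(secondary, changed, removed):
    """Bring the secondary's copy up to date with the computed changes."""
    secondary.update(changed)
    for blk in removed:
        secondary.pop(blk, None)


primary = {"a": 1, "b": 1, "c": 1}
base = take_snapshot(primary)     # steps 3-5: the baseline the secondary holds
secondary = dict(base)

primary["a"] = 2
snap1 = take_snapshot(primary)    # step 6: taken, but not replicated yet
primary["b"] = 2
del primary["c"]
snap2 = take_snapshot(primary)    # step 6 again: this is the newest snapshot

# Step 7: diff the newest snapshot against what the secondary last received,
# NOT against the previous snapshot on the primary.
changed, removed = changes_between(base, snap2)
apply_changes(secondary, changed, removed)
base = snap2

print(changed, removed)               # {'a': 2, 'b': 2} ['c']
print(changes_between(snap1, snap2))  # ({'b': 2}, ['c'])
print(secondary == primary)           # True
```

Diffing only `snap1` against `snap2` would miss the change to block `a`, which is why the comparison has to start from the last snapshot the secondary actually received.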

Credit: me!

*1: Don't confuse SnapVault with SnapMirror.  SnapMirror does a one-to-one copy for failover, while SnapVault just copies data for retention. 

*2: Refresher: full backups are complete copies, incremental backups are "what's changed since the last full or incremental," and differential backups are "what's changed since the last full."  Differential backups obviously become larger and waste more space as more time passes since the latest full.
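A quick bit of arithmetic shows the difference in footnote 2.  The numbers are made up, and the sketch assumes each day changes different data (if the same blocks get rewritten every day, the differentials would be smaller than this worst case).

```python
from itertools import accumulate


def incremental_sizes(daily_change_gb):
    """Each incremental holds only that day's changes."""
    return list(daily_change_gb)


def differential_sizes(daily_change_gb):
    """Each differential holds everything changed since the last full
    (worst case: no overlap between days)."""
    return list(accumulate(daily_change_gb))


daily = [5, 3, 4, 6]  # hypothetical GB changed per day after the full backup

print(incremental_sizes(daily), sum(incremental_sizes(daily)))
# [5, 3, 4, 6] 18
print(differential_sizes(daily), sum(differential_sizes(daily)))
# [5, 8, 12, 18] 43
```

Same four days of changes, but the differentials store more than twice as much in total, and the gap keeps widening until the next full.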

Great, honest convo back-and-forth about SV's strengths and weaknesses:

Official NetApp SnapVault Doc: tr-3487.pdf 
