Friday, May 29, 2015

SAP Project

Lots of lessons learned on an SAP project I've led over the past few months.  Here's the environment:

  • 7-Mode 8.1.3 FAS6220s with 2TB Flash Cache and SAS disks
  • DB2 on AIX 7.1 on IBM P-Series
  • SAP on SUSE Linux
  • Migrating from HDS FCP to NetApp NFS

A really cool part of this: their QA environment held 6 full copies of every database, which turned a 30TB production environment into 200TB used in QA.  It also meant that every refresh required a full re-copy of the database, which had a performance impact.  Using FlexClones (thin, writable snapshots), we dropped QA capacity to 100TB used and eliminated the refresh performance impact entirely.
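
To put rough numbers on that, here is a small Python sketch of the capacity arithmetic. It only uses the figures quoted above; the function names are made up for illustration, and the gap between 6 x 30TB and the 200TB actually used is not broken down in this post.

```python
# Figures quoted in the post (TB).
PROD_TB = 30
QA_COPIES = 6
QA_USED_BEFORE_TB = 200   # QA with full physical copies
QA_USED_AFTER_TB = 100    # QA after moving to FlexClones


def full_copy_footprint_tb(prod_tb, copies):
    """Capacity needed if every QA copy is a full physical copy."""
    return prod_tb * copies


def savings(before_tb, after_tb):
    """Return (TB saved, percent saved) going from before_tb to after_tb."""
    saved = before_tb - after_tb
    return saved, 100.0 * saved / before_tb


if __name__ == "__main__":
    print("Full copies alone:", full_copy_footprint_tb(PROD_TB, QA_COPIES), "TB")
    saved_tb, saved_pct = savings(QA_USED_BEFORE_TB, QA_USED_AFTER_TB)
    print("Saved with FlexClones:", saved_tb, "TB,", saved_pct, "percent")
```

The full copies alone account for 180TB, and the move to FlexClones cut used QA capacity by 100TB, or half.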

Now that we have the basics laid out, here are a few important things we figured out.

About AIX:
  1. Turn on Selective ACKs (SACK) in AIX.  This was a huge performance improvement for us.
  2. We saw a 40% throughput improvement upgrading from AIX 6.1 to AIX 7.1.  Highly recommend it.
  3. Be careful with AIX LACP EtherChannels.  We saw some very strange throughput drops, almost like port flapping, on an LPAR using an LACP EtherChannel.  The client found some errors related to it and recreated it, after which we saw significant performance improvements.
  4. We didn't see any improvement using mount options like cio (concurrent I/O) or rbr (release-behind on read).
  5. We settled on these mount options:  bg,hard,intr,rsize=65536,wsize=65536,timeo=600,vers=3,proto=tcp,rw
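
For anyone scripting their mounts, here is a minimal Python sketch that assembles a mount command from the option list in item 5. The filer hostname, export path, and mount point in the example are placeholders I made up, and the `mount -o options server:/export /mountpoint` form is the generic NFS syntax; adjust for your own environment.

```python
# Mount options from item 5, kept as a list so each one is easy to review.
MOUNT_OPTIONS = [
    "bg", "hard", "intr",
    "rsize=65536", "wsize=65536",
    "timeo=600", "vers=3", "proto=tcp", "rw",
]


def build_mount_command(server, export, mount_point, options=MOUNT_OPTIONS):
    """Return an NFS mount command line as a single string."""
    option_string = ",".join(options)
    return "mount -o {} {}:{} {}".format(option_string, server, export, mount_point)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_mount_command("filer1", "/vol/sapdata1", "/db2/sapdata1"))
```

Keeping the options in one list means every mount on every LPAR gets the same string, which makes it easier to compare hosts when one of them misbehaves.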

About DB2:
  1. Our migration plan (from HDS to NetApp) used DB2 backup/restore.  This avoids slow log rolling and is the safest route for data integrity.
  2. Make sure you distribute your DB2 datafiles onto multiple controllers and multiple volumes.  Think of each volume as a thread: the more threads, the better the CPU parallelization and the better your performance.
  3. When performing a DB2 backup and restore, make sure you back up and restore to multiple controllers and multiple volumes: same reason as above.
  4. In order to reduce the number of SnapMirror relationships and improve the RPO, we combined all Transaction Logs and Archive Logs into 1 volume each per controller.  We put 

About FlexClones:
  1. When you FlexClone from a SnapVault destination snapshot, it inherits the SnapVault relationship.  You resolve this by breaking that relationship and restoring the qtree.
  2. In 7-Mode, you can't FlexClone from a SnapMirror destination snapshot.

About SnapMirror:
  1. Single-threaded SnapMirror relationships can have a window size of 7MB, multi-threaded ones 14MB.  Very good for long-distance SnapMirror.
  2. You can multi-thread a SnapMirror relationship.
  3. Keep an eye on the SnapMirror Maximums.
  4. With a log change rate of ~10MB/s per controller, we were able to accomplish a 1-minute SnapMirror interval over a 38ms RTT WAN.
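
To show why the window size matters over a long link, here is a back-of-the-envelope Python sketch using the numbers from items 1 and 4. It assumes the classic rule that a single stream can have at most one window of data in flight per round trip, treats the 14MB figure as one combined window, and ignores link bandwidth, protocol overhead, and snapshot processing time. The function names are mine; treat the results as upper bounds, not measurements.

```python
def max_throughput_mb_s(window_mb, rtt_ms):
    """Upper bound on throughput: one full window delivered per round trip."""
    return window_mb / (rtt_ms / 1000.0)


def data_per_interval_mb(change_rate_mb_s, interval_s):
    """Amount of changed data that accumulates between two updates."""
    return change_rate_mb_s * interval_s


RTT_MS = 38             # WAN round-trip time from item 4
CHANGE_RATE_MB_S = 10   # log change rate per controller from item 4
INTERVAL_S = 60         # 1-minute SnapMirror interval

if __name__ == "__main__":
    single = max_throughput_mb_s(7, RTT_MS)
    multi = max_throughput_mb_s(14, RTT_MS)
    backlog = data_per_interval_mb(CHANGE_RATE_MB_S, INTERVAL_S)
    print("7MB window ceiling:  %.1f MB/s" % single)
    print("14MB window ceiling: %.1f MB/s" % multi)
    print("Data per update:     %d MB" % backlog)
    print("Best-case transfer:  %.1f s" % (backlog / single))
```

Under those assumptions the ceiling is roughly 184 MB/s with a 7MB window and 368 MB/s with 14MB, while each 1-minute update only has about 600MB to move, so the window leaves plenty of headroom at this RTT.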

More to come!