Friday, May 16, 2014

Datacenter Migration Project: Lessons Learned

Some notes I wanted to save for posterity on a datacenter migration project I led several years ago:

What we did well:
  • Everything was set into the racks correctly on the first try (pretty incredible if you think about it).
  • Everyone arrived at the pre-meeting and the customer sites on time (we started at midnight).
  • Communication with the customer was consistent.
  • We got the backup filer up before 7am!
  • We had all of the equipment up and handed over by 4:30pm (our target was 4pm and the deadline was 7pm).
  • We handled several mini-crises in stride:
    • Motherboard death
    • Loop “login delay” issue
    • FC cable shortage
    • Loop combination

What we need to remember next time:
  • Starting at midnight and going all day is very, very different from starting at 8am and going all day.
  • It’s important for team members to be familiar with one another and with how the work will flow.
  • It’s important to have a good blend of experienced and inexperienced team members.
  • Fly the team into town the day before; you can’t expect someone to travel all day and then work all night.
  • Make sure the team understands how to read the documentation before you get onsite.
  • Check whether the team has experience with the specific technologies involved.
  • Have a pre-job meeting (preferably with pizza) to explain expectations and game plan.
  • Make sure everyone is in the same hotel, close to the jobsite.
  • Make sure there is food/drink arriving regularly for the guys who are working.
  • Add an extra guy to do physical work if you have someone supervising/project managing.
  • Drills are important.
  • Get all of the rails into the racks beforehand if possible.
  • Plan to combine loops/stacks if the system is complex and spread out. Don’t underestimate how long it takes to cable each loop!
  • Cabling loops between racks is significantly more time-consuming than cabling within a rack.
  • Don’t expect to salvage the cables that run under/over the racks. Ours were a total rat’s nest.
  • Have a rested, standby team member to handle hardware failures/support issues.
  • Make sure you’re aware of the plan for in-rack switches/patch panels and for disconnecting the PDUs.