Now that we knew we could reboot a shelf live, we got a bit adventurous.
The question we were trying to answer was: "Can we replace or remove a shelf on a live system without causing downtime?" I used a 3160 cluster in the lab with four DS14s in a loop, slowly failed all the disks in shelf 3, and removed ownership from those disks. At that point, I could shut down or unplug that shelf at will, and neither system complained beyond noting that it was transitioning to single-path.
I doubt NGS will ever give the plan their full blessing, but it's good to know that it's ok from a technical standpoint.
Update 1: I also successfully swapped out a shelf chassis in this manner in the lab. The controllers were totally ok with a new serial number! No issues that I could find.
Update 2: NGS did in fact OK this action plan twice, but later backed out completely. The concern is that the system will keep the shelf registered somewhere in the OS. A possible solution is to perform a failover/giveback for each node after the shelf removal, since failover/giveback includes a reboot.
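For anyone who wants to write the sequence down before trying it, here is a rough Python sketch that turns the steps above into an ordered checklist. It runs nothing against a filer and contains no real CLI commands. The function name, the disk and node names, and the "wait" wording (my reading of "slowly") are all made up for illustration, and the failover/giveback step is only the possible fix from Update 2, not something NGS has blessed.

```python
def shelf_removal_plan(shelf_disks, nodes):
    """Return an ordered list of plain-text steps for pulling one shelf.

    shelf_disks: names of the disks in the shelf being removed (hypothetical).
    nodes: names of the two cluster nodes (hypothetical).
    """
    if not shelf_disks:
        raise ValueError("expected at least one disk in the shelf")

    steps = []

    # Fail the disks one by one; waiting between them is an assumption
    # about what "slowly" meant in the lab test.
    for disk in shelf_disks:
        steps.append(f"fail disk {disk} and wait before continuing")

    # Only after every disk is failed, remove ownership on each of them.
    for disk in shelf_disks:
        steps.append(f"remove ownership on disk {disk}")

    # The shelf can now be shut down or unplugged.
    steps.append("shut down and unplug the shelf")

    # In the lab, the only message seen was the single-path transition.
    steps.append("check that both nodes only report the move to single-path")

    # Possible fix from Update 2: failover/giveback includes a reboot,
    # which may clear any leftover registration of the shelf.
    for node in nodes:
        steps.append(f"perform failover/giveback for {node}")

    return steps


if __name__ == "__main__":
    for number, step in enumerate(
        shelf_removal_plan(["disk_a", "disk_b"], ["node1", "node2"]), start=1
    ):
        print(f"{number}. {step}")
```

The only point of the ordering is that ownership is removed after every disk in the shelf has been failed, and that the failover/giveback pass comes last, once the shelf is physically gone.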