Can you non-disruptively transition a clustered single-chassis system into two chassis? We
included a 7.3.6 => 8.1.1 upgrade to try to take advantage of cf takeover -n,
which is used when there is a version mismatch to force a takeover when the
other controller halts.
Here’s a timeline of what we tried (on a 3240 in the lab), along with the results:
1. Upgrade B
   a. Update B to 8.1.1, fail over to A
   b. Move B to new chassis and connect interconnect cable
   c. Set B's boolean to false
   d. cf giveback -f
2. Upgrade A
   a. Update A to 8.1.1
   b. cf takeover -n failed because the interconnect was determined to be down, so B couldn't see A halting (*1)
   c. A is halted at this point
   d. cf takeover -f failed because of the version mismatch (*2)
   e. cf forcetakeover succeeded
   f. Set A's boolean to false
   g. cf giveback failed because the interconnect was determined to be down (*3)
   h. cf giveback -f succeeded
3. All appears stable; interconnect is up.
Notes:
*1 “Partner is not UP, NDU Takeover Terminated”
*2 “cf: takeover cannot be performed because of reason (interconnect error)”
*3 “cf monitor all” attached
What we found out:
There is a Boolean env variable that tells each controller whether it’s sharing the chassis with
another controller, which is called a “CC” configuration (true = yes, CC config).
The cool thing about this variable is that ONTAP will automatically set it to
the correct value in two cases:
- 1. Any time the system is in CC configuration, ONTAP sets the value itself (true).
- 2. Any time the system is in CI configuration (i.e. an IOXM is present), ONTAP sets the value itself (false).
For all other configurations, ONTAP will not change the value, so it has to be set by hand, as we did for both controllers above.
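To make the rule concrete, here is a minimal Python sketch of the behaviour as we understand it from the lab. It is only a model of the description above, not ONTAP code: the names ChassisConfig and resolved_flag are made up for illustration, and the real env variable name is deliberately not shown.

```python
from enum import Enum
from typing import Optional


class ChassisConfig(Enum):
    """Hypothetical labels for the three situations described above."""
    CC = "CC"        # controller shares the chassis with another controller
    CI = "CI"        # an IOXM is present in the chassis
    OTHER = "other"  # anything else, e.g. a controller moved alone into a new chassis


def resolved_flag(config: ChassisConfig, current: Optional[bool]) -> Optional[bool]:
    """Return the value the chassis-sharing boolean ends up with.

    Assumption: this mirrors the observed rule only. In CC and CI
    configurations the value is corrected automatically; in any other
    configuration the existing value is left untouched.
    """
    if config is ChassisConfig.CC:
        return True
    if config is ChassisConfig.CI:
        return False
    return current  # not changed automatically; an admin must set it


# A controller that used to be in a CC chassis keeps its stale "true"
# after the move unless someone sets it to false by hand.
stale = resolved_flag(ChassisConfig.OTHER, True)
```

Under this model, the stale value in the last line is exactly why the timeline includes a manual "set boolean to false" step for each controller.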
Conclusion: The upgrade and cf takeover -n didn't help. There is still a viable path for a non-disruptive plan, but
it requires a precisely timed halt and cf forcetakeover, which
isn’t without risk. Action plan below:
Part 1:
- Fail over to A
- Move B to new chassis and connect interconnect
- Set B's boolean to false
- cf giveback -f
Part 2:
- Halt A, cf forcetakeover as soon as A drops to LOADER prompt
- Set A's boolean to false
- Boot A. Interconnect should be up when node reaches 'Waiting for giveback'
- cf giveback -f
- cf should be enabled at this point