Had an interesting situation after an option 4 (the boot-menu "clean configuration and initialize all disks") on a new cDOT system (FAS3250s). This is a bit long, but I wanted to get it all onto paper and into our tribal knowledge. While one controller was still zeroing, I started configuring the other (set HA to true, set the cluster to switchless, updated from 8.2P3 to 8.2P5). Then I updated the first controller. When it tried to join the cluster, I saw this:
Error: Node "cluster-04" on ring "Management" is offline. Check the health of the cluster using the "cluster show" command. For further assistance, contact support personnel.
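For context, the configuration I had done on the first node amounted to roughly this (a sketch; the switchless setting is an advanced-privilege option, and the exact flag may vary by release), plus the update to 8.2P5:

cluster::> cluster ha modify -configured true
cluster::> set -privilege advanced
cluster::*> network options switchless-cluster modify -enabled true
cluster::*> set -privilege admin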
Well, ok. A little checking:
cluster::> cluster show
Node                 Health  Eligibility
-------------------- ------- ------------
cluster-01           true    true
cluster-04           true    true

Warning: Cluster HA has not been configured. Cluster HA must be configured on a two-node cluster to ensure data access availability in the event of storage failover. Use the "cluster ha modify -configured true" command to configure cluster HA.
2 entries were displayed.
cluster::> node show
Node       Health Eligibility Uptime        Model   Owner Location
---------- ------ ----------- ------------- ------- ----- -------------------
cluster-01 false  true        00:26:23.001  FAS3250       Minneapolis DR Site
cluster-04 false  true        00:09:39.043  FAS3250

Warning: Cluster HA has not been configured. Cluster HA must be configured on a two-node cluster to ensure data access availability in the event of storage failover. Use the "cluster ha modify -configured true" command to configure cluster HA.
2 entries were displayed.
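Side note on the mismatch: "cluster show" reports both nodes healthy while "node show" reports health false for both. The replication-ring view is the more direct way to chase the "ring Management is offline" error; at advanced privilege, "cluster ring show" lists each node's status for the mgmt, vldb, vifmgr, and bcomd rings, including the ring master and online state. A sketch (output omitted):

cluster::> set -privilege advanced
cluster::*> cluster ring show -unitname mgmt
cluster::*> set -privilege admin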
Well, then let’s modify cluster HA.
cluster::> cluster ha modify -configured true

Warning: High Availability (HA) configuration for cluster services requires that both SFO storage failover and SFO auto-giveback be enabled. These actions will be performed if necessary.
Do you want to continue? {y|n}: y

Error: command failed: Not enough online nodes in the cluster: SL_REMOVE_EPSILON_OOQ_ERROR (code 129). There are too few healthy nodes in the cluster to allow join of additional nodes. Ensure that the nodes are operational and re-issue the command. Use the "cluster show" command on a node in the target cluster to view the state of the cluster.
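The epsilon complaint makes sense in hindsight: in a two-node cluster without cluster HA configured, one node holds epsilon, and the cluster can't survive losing the other node's vote, so operations like this get blocked. At advanced privilege you can see (and, carefully, reassign) epsilon; syntax from memory, so verify it on your release:

cluster::> set -privilege advanced
cluster::*> cluster show -fields epsilon
cluster::*> cluster modify -node cluster-01 -epsilon true
cluster::*> set -privilege admin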
Well sheesh. So I reboot and what happens? A takeover.
Jan 22 14:53:28 [msp-cluster-04:callhome.sfo.takeover:CRITICAL]: Call home for CONTROLLER TAKEOVER COMPLETE AUTOMATIC
Jan 22 14:53:28 [msp-cluster-04:callhome.reboot.takeover:error]: Call home for PARTNER REBOOT (CONTROLLER TAKEOVER)
But the system doesn’t think it was taken over, or that it’s in HA mode.
cluster::> storage failover show-giveback
               Partner
Node           Aggregate         Giveback Status
-------------- ----------------- ---------------------------------------------

Warning: Unable to list entries on node cluster-01. RPC: Port mapper failure - RPC: Timed out
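Each node's own view of failover state is worth checking at this point; these are standard commands (field names from memory), shown as a sketch:

cluster::> storage failover show
cluster::> storage failover show -fields enabled,possible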
cluster::cluster ha> modify -configured true

Warning: High Availability (HA) configuration for cluster services requires that both SFO storage failover and SFO auto-giveback be enabled. These actions will be performed if necessary.
Do you want to continue? {y|n}: y

Error: command failed: Could not enable auto-sendhome on partner node: Failed to set option cf.giveback.auto.enable. Reason: 169.254.97.26 is not healthy.
After another reboot I turned cluster HA off and back on ("cluster ha modify -configured false", then true again), and everything cleared up; takeovers and givebacks (TO/GBs) were working perfectly.
cluster::> cluster ha modify -configured true

Warning: High Availability (HA) configuration for cluster services requires that both SFO storage failover and SFO auto-giveback be enabled. These actions will be performed if necessary.
Do you want to continue? {y|n}: y

Notice: HA is configured in management.
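To back up the "working perfectly" claim, takeover and giveback can be exercised deliberately; the standard commands, as a sketch:

cluster::> storage failover show
cluster::> storage failover takeover -ofnode cluster-01
cluster::> storage failover giveback -ofnode cluster-01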
So on to the next problem: one of the vol0’s isn’t being recognized.
After a lot of searching, I found the magical solution. Curiously, the excerpt refers to vol0 as a “7-Mode volume.” But both vol0’s are, and there’s no way to change it; the word from other engineers is that this is correct.
cluster::> vol show -is-cluster-volume false
  (volume show)
Vserver    Volume   Aggregate            State   Type Size    Available Used%
---------- -------- -------------------- ------- ---- ------- --------- -----
cluster-01 vol0     aggr0_msp_cluster_01 online  RW   330GB   310.8GB     5%
cluster-02 vol0     aggr0_msp_cluster_02 online  RW   330GB   310.8GB     5%
2 entries were displayed.
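The same field can also be read for a single volume instead of queried cluster-wide; a sketch (for node-scoped root volumes, the node name doubles as the vserver):

cluster::> vol show -vserver cluster-01 -volume vol0 -fields is-cluster-volume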
Lastly, I needed to move one of the vol0’s. I used this KB article to move the vol0 over to a new aggregate:
https://kb.netapp.com/support/index?page=content&id=1013762&actp=search&viewlocale=en_US&searchid=1390428133178
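From memory, the procedure boils down to creating a new node-scoped volume on the target aggregate, copying the old root’s contents, marking the new volume as root from the nodeshell, and rebooting. A rough outline only; the volume and aggregate names below are made up, the copy step is deliberately left to the KB, and the KB’s exact steps and sizing take precedence over this sketch:

cluster::> vol create -vserver cluster-01 -volume vol0_new -aggregate aggr_new -size 350g
(copy the old vol0’s contents to vol0_new per the KB)
cluster::> run -node cluster-01
cluster-01> vol options vol0_new root
cluster-01> exit
cluster::> system node reboot -node cluster-01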
Thanks for making it all the way to the end with me!