Some interesting things have happened lately. During a shelf add, one disk on the new shelf kicked out a ridiculous number of errors:
disk.senseError:error]: Disk 2d.53: op 0x28:0000a3e8:0018 sector 0 SCSI:hardware error - (4 44 0 3)
diskown.RescanMessageFailed:warning]: Could not send rescan message to eg-naslowpc-h01. Please type disk show on the console for it to scan the newly inserted disks.
diskown.errorReadingOwnership:warning]: error 46 (disk condition triggered maintenance testing) while reading ownership on disk 2d.53
Disk 2d.53: op 0x28:0000a3f0:0008 sector 0 SCSI:hardware error - (4 44 0 3)
diskown.AutoAssignProblem:warning]: Auto-assign failed for disk 2d.53
The weird thing was that the messages kept looping instead of the system simply failing the disk. To see whether the disk was bad, we swapped a new disk into that slot and moved the old disk to a different slot. It turns out the slot is bad.
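A quick way to confirm this kind of looping is to tally how often each message repeats per disk in a saved copy of the console output. The following is only a rough sketch: the regular expressions are my own guesses based on the handful of lines above, not an official log format, and `tally` and `SAMPLE` are made-up names for illustration.

```python
import re
from collections import Counter

# Assumption: disk IDs look like "2d.53" and follow the word "disk" (any case).
DISK_ID = re.compile(r"\bdisk\s+(\w+\.\d+)", re.IGNORECASE)
# Assumption: event names look like "disk.senseError" followed by ":severity]:".
EVENT = re.compile(r"([\w.]+):\w+\]:")


def tally(lines):
    """Count messages per (disk ID, event name); lines without a disk ID are skipped."""
    counts = Counter()
    for line in lines:
        disk = DISK_ID.search(line)
        if not disk:
            continue
        event = EVENT.search(line)
        name = event.group(1) if event else "unknown"
        counts[(disk.group(1), name)] += 1
    return counts


# Three of the messages from this post, used as sample input.
SAMPLE = [
    "disk.senseError:error]: Disk 2d.53: op 0x28:0000a3e8:0018 sector 0 SCSI:hardware error - (4 44 0 3)",
    "Disk 2d.53: op 0x28:0000a3f0:0008 sector 0 SCSI:hardware error - (4 44 0 3)",
    "diskown.AutoAssignProblem:warning]: Auto-assign failed for disk 2d.53",
]

if __name__ == "__main__":
    for (disk, event), n in sorted(tally(SAMPLE).items()):
        print(disk, event, n)
```

If the same disk and event pair keeps climbing between two captures while the disk is never failed out, that is the looping behavior described above. Running the same tally after a swap shows whether the errors stay with the slot or move with the disk.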
We also tried reseating Module B on that shelf. NetApp Support informed me that "Module A handles communication to the even numbered disks by default, and Module B the odd disks." I don't think this is true.
We're working with the customer to find a good resolution for this. Since downtime is difficult to arrange, we may try to swap out the shelf chassis while the system is running. We'll see :-)