Good afternoon, we saw a strange issue on our 23-node cluster this morning.
The cluster was restarted last night as part of ongoing maintenance. When all the nodes came back up, everything looked like it was running normally. However, on Node 1, running rcndsd status returned the following error: rcndsd status dead
When we ran rcndsd start, it looked as if ndsd had started, but in reality the service was still dead. Following TID 7010298 (DDCGetServerName failed: -6038 in eDirectory), we found that the error code matched what we were seeing on the menu screen.
Running ndsrepair -T from the Master replica server did not reveal any errors; it was actually able to contact a2c1s01 as if ndsd were running, which we know it was not. Furthermore, df -h showed the "/" volume at only 21% full, with over 200 GB free.
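For anyone who wants to repeat these checks on a node, here is a rough sketch of them as a script. It is only an illustration under a few assumptions: the function names usage_pct and ndsd_process_state are made up for this post, and the pgrep check is an extra cross-check on the process itself, not one of the steps described above. rcndsd is only called if it exists on the host.

```shell
#!/bin/sh
# Sketch only: gathers the checks described above in one place.
# usage_pct and ndsd_process_state are hypothetical helper names.

# Print the use percentage (digits only) of a mount point,
# using the portable single-line df output format.
usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Report whether a process named exactly "ndsd" exists, independent of
# what the init script says. If pgrep is not installed this will also
# print "not running", so treat that result with care.
ndsd_process_state() {
    if pgrep -x ndsd >/dev/null 2>&1; then
        echo "running"
    else
        echo "not running"
    fi
}

echo "root filesystem use: $(usage_pct /)%"
echo "ndsd process: $(ndsd_process_state)"

# Only call the init script where it is actually installed.
if command -v rcndsd >/dev/null 2>&1; then
    rcndsd status || echo "rcndsd status reported a non-running state"
fi
```

Comparing the init script result with the process check would at least show whether the two disagree, which is the situation we appeared to be in.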
The two volumes hosted on that node could not be accessed: the Novell login screen would pop up asking you to log in, and the login would fail. In the end we restarted the node, but we are still confused about why 1.) ndsrepair -T revealed no errors when run from the Master replica server, and 2.) df -h showed plenty of space.