Hi Everyone,

I wanted to share a problem we ran into last night and how we fixed it
(with the help of an SR opened with Novell). We have a mixed environment
of 881 and 886 servers. (We are migrating off the 881 servers and should
be done in a month.) Last night we were adding replicas to one of the
new 886 servers. Everything went as expected. After the last replica
turned On, I went to each of the 20 servers in the tree to check sync
status and make sure there were no problems (not only because that's a
good idea, but because 881 is, well, 881).

All of a sudden, on one server, while running ndsrepair -E, I started
seeing a -699 Remote error. Having never seen this before, I went to
support.novell.com to find out what it was, whether it was cosmetic or a
real problem, and what to do about it. I looked around for maybe three
minutes, then went back to the server and ran ndsrepair -E again. Now
there were about 40 reports of -699 Remote. To make a long story short,
I ultimately ended up with a few hundred -699 Remote errors in report
sync status. This was happening on every server in the tree.
Additionally, on the remote server that first showed the -699 error, I
was seeing -755 errors in ndstrace ("Verification Failed"). This was
another new one on me. Lastly, other servers were reporting -625 errors
in communication between the affected servers.
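
If you save the report text to a file, a small script can tally the
codes for you instead of counting by eye. This is only a rough sketch:
it searches free-form text for the three codes mentioned in this post
and assumes nothing about the real layout of the ndsrepair or ndstrace
output. The sample text in the usage note is made up for illustration.

```python
import re
from collections import Counter

# Error codes mentioned in this post. This set is just for this example.
CODES_OF_INTEREST = {"-699", "-755", "-625"}


def count_error_codes(report_text: str) -> Counter:
    """Count occurrences of the codes of interest in free-form text."""
    # A minus sign followed by exactly three digits, not glued to a
    # preceding word character or to a longer number.
    found = re.findall(r"(?<![\w-])-\d{3}(?!\d)", report_text)
    return Counter(code for code in found if code in CODES_OF_INTEREST)
```

For example, count_error_codes("FS2 -699 Remote\nFS3 -699 Remote") gives
a count of 2 for "-699". Running it against saved reports taken a few
minutes apart would show whether the count is climbing.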

To keep the story short, I called Novell for help. What we found was
that one of the servers (the remote server that first showed the -699
error) had, for whatever reason, added its own IP address for TCP and
UDP as values in the Network Address attribute of every other server in
the tree. This was only in its own DIB, as reported in iMonitor. The
rest of the servers in the tree saw correct Network Address values for
the other servers, although they were all accumulating more and more
-699 Remote errors in report sync status. And of course, nothing would
sync because of the bad values in the Network Address attribute on that
one server.
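
To illustrate what "the same address under several servers" looks like,
here is a minimal sketch that flags any address listed for more than one
server object. It assumes you have copied the Network Address values out
of iMonitor by hand into a dictionary; the function name and the data
layout are hypothetical, and no iMonitor export API is implied.

```python
from collections import defaultdict


def find_shared_addresses(addresses_by_server):
    """Return {address: sorted server names} for any address that is
    listed under more than one server."""
    owners = defaultdict(set)
    for server, addresses in addresses_by_server.items():
        for address in addresses:
            owners[address].add(server)
    # Keep only addresses claimed by two or more server objects.
    return {addr: sorted(names) for addr, names in owners.items()
            if len(names) > 1}
```

With a hand-built view where FS2 and FS3 each carry FS1's address in
addition to their own, the function returns that one address mapped to
all three server names. A healthy view returns an empty dictionary.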

This tree was set up without SLP. The tree is used only for
authorization via LDAP queries. Every server holds a copy of any
partition that users or consumers would query against, so while I
personally would like to see SLP running, it's worked fine this way for
six or seven years.

To fix this, we put a hosts.nds file in the same directory as nds.conf.
The file looks very similar to an /etc/hosts file, with one server per
line. For instance:

FS1 192.168.22.23:8524
FS2 192.168.22.24:9524
FS3 192.168.22.25:10524

where the first column is the server name and the second column is the
IP address of the server and the port where ndsd is listening. After
populating this file, we ran ndsrepair -N and repaired all network
addresses. We only had to do this on the one server that was first
reported with the -699 Remote error (the server that had the extra value
on the Network Address attribute for the rest of the servers in its own
DIB), and the problem started clearing itself up. However, we went ahead
and performed the same steps on the rest of the servers in the tree.
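
Since a typo in this file would defeat the purpose, it may be worth
sanity-checking it before running the repair. The sketch below parses
only the two-column "NAME IP:PORT" layout shown in the example above.
The function name is my own, and the real hosts.nds format may allow
things (comments, other name forms) that this check does not handle.

```python
import ipaddress


def parse_hosts_nds(text):
    """Parse 'NAME IP:PORT' lines into (name, ip, port) tuples.

    Raises ValueError on any line that does not fit that layout.
    """
    entries = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'NAME IP:PORT'")
        name, endpoint = parts
        ip_text, sep, port_text = endpoint.rpartition(":")
        if not sep:
            raise ValueError(f"line {line_no}: missing ':PORT'")
        ip = ipaddress.ip_address(ip_text)  # ValueError if not a valid IP
        port = int(port_text)               # ValueError if not a number
        if not 1 <= port <= 65535:
            raise ValueError(f"line {line_no}: port {port} out of range")
        entries.append((name, str(ip), port))
    return entries
```

Feeding it the three example lines above returns three tuples, such as
("FS1", "192.168.22.23", 8524). A line with a missing port or a
malformed address raises ValueError with the line number.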

I wanted to share this because I didn't find anything very helpful when
searching novell.com or Google. There were some TIDs and suggestions,
but they were rather complicated, and most carried warnings that
performing the steps could do more harm and potentially make things
worse. The steps I've outlined above are much safer.

I hope this may help someone else out in the future.

Take care,
Sam


--
samthendsgod