IDM 3.5.1 engine ir3 on SLES 10. eDir 8.8.2 FT2 with LOTS of drivers
(eDir, AD, Linux/Unix, JDBC, Work Order, Null, loopback, NXSettings,
Composer shims (3 different screen scrapers), and more!)

I have a somewhat complex driver set up (what, me? Complex? Never you
say!). Anyway, when we developed this, we ran it all on one server.

Now in production with greater load we want to split it out across three
boxes.

Thing is, I am losing events in several different cases:

1) First box creates a WO object, and modifies a user. The other two
replicas often see the User mod and process the event on it before the
WO object syncs over. This causes me pain as I write to the WO to report
success/failure after the User event is processed. What hurts is that I
get a success on the write to the WO in trace. (I.e. no errors, a
success in fact.) But the WO object does not get those changes. If it
was not yet there, I should get a 601 (no such entry) error. If it is
there, the changes should be recorded.

But instead I get lost events.

2) Mods to the users come across from the first box. The other box sees
them, decides this is a move user case, and writes some notes to the
User (two tracking attributes), which succeeds. Then it moves the user.
But those writes do not appear on the user object.

Again, no error messages, everybody seems to be writing happily and
correctly, but the events are lost! That is, two drivers, on different
boxes, each added a value to a history attribute, and neither value is
to be found.

3) Same case as in 2, but this time an attribute that the first driver
added (a back ref to the WO object) WAS there on the second server,
which decided to move the user, but then when I look later, it is gone!
(I noticed because the first server reacts to the Move event and looks
for the back ref attr to confirm it has something to do, but does not
find it.)
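
For case 1, one possible workaround is to hold off on the WO write until
the object is actually visible on the local replica. A minimal sketch in
Python, not IDM policy: wait_for_object, the exists callback and the DN
below are all made-up names for illustration, and the real check would be
whatever lookup the driver can do against its own server.

```python
import time
from typing import Callable


def wait_for_object(
    exists: Callable[[str], bool],
    dn: str,
    attempts: int = 5,
    delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `dn` is visible on the local replica or attempts run out.

    `exists` is a hypothetical callback that asks the local replica
    whether the object is there yet.
    """
    for attempt in range(attempts):
        if exists(dn):
            return True
        if attempt < attempts - 1:  # no point sleeping after the last try
            sleep(delay_s)
    return False


# Fake replica for demonstration: the WO "arrives" on the third lookup.
lookups = {"count": 0}


def fake_exists(dn: str) -> bool:
    lookups["count"] += 1
    return lookups["count"] >= 3


found = wait_for_object(fake_exists, "cn=WO-1,ou=WorkOrders,o=example",
                        delay_s=0.0)
```

If this returns False, the driver can retry the event or raise a real
error rather than report a success that never lands.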

This kind of looks like DS sync issues, like writes are overwriting
other writes. All boxes are on a 100 Meg if not GigE network, local
VLAN to each other, replication is fine, no errors being reported in
trace or in Audit.
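
A toy model of what "writes overwriting other writes" would look like,
in Python. This is an assumption for illustration only, not a claim about
how eDir resolves conflicts or what these drivers actually send: it
supposes each box wrote the complete value set of a multi-valued
attribute and the later timestamp won. The names and values are invented.

```python
from typing import FrozenSet, Tuple

# (timestamp, complete value set as that replica wrote it)
Write = Tuple[int, FrozenSet[str]]


def replace_all_merge(w1: Write, w2: Write) -> FrozenSet[str]:
    """Both writers replaced the whole attribute; the later timestamp wins."""
    return max(w1, w2, key=lambda w: w[0])[1]


def add_value_merge(base: FrozenSet[str], *added: str) -> FrozenSet[str]:
    """Each writer only added its own value; nothing is discarded."""
    return base | frozenset(added)


base = frozenset({"created"})
box1: Write = (100, base | {"box1: flagged for move"})
box2: Write = (101, base | {"box2: moved"})

# Whole-attribute replacement: box1's entry silently disappears.
lost = replace_all_merge(box1, box2)

# Per-value adds: both history entries survive.
kept = add_value_merge(base, "box1: flagged for move", "box2: moved")
```

Neither write fails in this model, which matches the clean trace. It only
accounts for one value vanishing, though, not both as in case 2, so at
best it is part of the picture.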

Time is in sync. Very very weird. In principle if I put all the drivers
(30+) on one box, I should be able to get it all to work hunky dory, but
I really want the ability to scale out a smidgen with at least 2 or 3
servers involved.

I have never seen eDir be this unreliable before, to the point I have
trouble believing my eyes when I see it in trace.

(Posting trace is somewhat pointless as I would need about 4 hours to
explain what is going on, and show you trace from 4-5 drivers for each
case, and walk through it with you. Needless to say, no errors at all
are showing up for these events. IDM is thinking all is well).