Saturday, May 12, 2012

Oracle GoldenGate script fail...or how fast can you re-sync that database?

So a planned hot swap from version 10 to version 11 of GoldenGate was a colossal failure. In fact, the pieces are still being collected, catalogued and rebuilt.

It stems from a migration script that relied on nowaited IO to perform a "hot swap" of the logger object, not unlike the techniques used by Base24 to perform a "refresh" operation. However, a TACL script is often vulnerable to failures and to not reporting them.

The result: half of the objects for the logger processes were still version 10 and the other half were version 11, with clear incompatibilities between them. These were discovered only after updating the manager process, and the perfect storm for replication disaster ensued.
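A minimal sketch of the kind of post-swap check that could flag a mixed-version state like this before anything else gets updated. It is plain Python, not TACL and not a GoldenGate API. The function names, the process names and the idea of collecting each process's reported version into a dictionary are all assumptions made for illustration; how those versions would actually be gathered is left out.

```python
def find_version_mismatches(reported_versions, expected_version):
    """Return the sorted names of processes not reporting expected_version.

    reported_versions is a hypothetical mapping of process name -> version
    string, gathered by whatever means the site has available.
    """
    return sorted(
        name
        for name, version in reported_versions.items()
        if version != expected_version
    )


def swap_is_complete(reported_versions, expected_version):
    """True only if at least one process reported and all of them match."""
    if not reported_versions:
        # No evidence at all is treated as a failure, not as success.
        return False
    return not find_version_mismatches(reported_versions, expected_version)


if __name__ == "__main__":
    # Hypothetical logger names and versions, for illustration only.
    loggers = {"LOGGER1": "11", "LOGGER2": "10", "LOGGER3": "11", "LOGGER4": "10"}
    if not swap_is_complete(loggers, "11"):
        stale = find_version_mismatches(loggers, "11")
        print("Do not proceed, still on the old version:", ", ".join(stale))
```

The point of the design is that the script's own "command processed normally" message is never trusted as proof. The swap only counts as done when every process independently reports the new version.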

The recovery required stopping all objects, including loggers, and returning the configuration to version 10. At this point undocumented changes to logtrails were discovered, and a variety of fatal logging errors were being reported to EMS at an unprecedented rate.

Several calls and cries for help from Oracle resulted in argumentative and pointed questions regarding the version installed, the number of loggers configured and why we as the customer had so many files being replicated. Truly not a stellar moment for Oracle support. I was pretty much on my own to clean up this mess. So far, a database that had been in sync for about 3 years is still in the process of being re-synced and validated, all because one script indicated that the command had been received and processed normally, just as it had reported on three previous attempts at the same version-to-version update using the same script.

Several thousand files later we are only at about 90% of files replicated and not completely validated using their Veridata product.

More to report later.


