Tuesday, December 18, 2012

Disaster Guidelines for my department.


Here are the current Disaster Guidelines for my company. I think they're a bit 'thin'....


EMERGENCY MANAGEMENT DISASTER BOX
(This is only a basic guideline)
1. Copy of Emergency Management Plan
2. Copy of department Business Continuity Plan
3. Telephone numbers of personnel
4. Map of the work area, with exits, fire extinguishers
5. Flashlights and batteries
6. Portable, battery operated radio
7. A small first aid kit
8. Felt tip markers, pens, and pencils
9. Large gallon-sized zip-lock bags
10. A bag of hard candy
11. Any other items you may need for your area
12. Industrial gloves
13. Duct tape
14. Basic hand tools (pliers, screwdrivers, hammer, etc.)
15. Crank cell phone charger
Note: Copies of plans, or anything on paper, should be encased in plastic page protectors. Every department should have a one-day supply of water for its staff.


I'm guessing the duct tape is used to put over people's mouths.


Sunday, May 13, 2012

Do you Klout?

I'm still trying to figure out how this works, but Klout seems to have raised my score since rejoining yesterday. Of course I linked to several social media accounts as an experiment to see what has influence.
Watching and waiting.
- Posted using BlogPress from my iPad

Saturday, May 12, 2012

Oracle GoldenGate script fail...or how fast can you re-sync that database?

So a planned hot swap of version 10 to version 11 of GoldenGate was a colossal failure. In fact the pieces are still being collected, catalogued and rebuilt.

It stems from a migration script that relied on nowaited I/O to perform a "hot swap" of the logger object, not unlike the techniques used by Base24 to perform a "refresh" operation. However, a TACL script is often vulnerable to failures that it does not report.
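One way to guard against silent failures of this kind is to have the script check the outcome of every step and stop at the first one that fails. The sketch below is in Python rather than TACL, and the step names and the `run_steps` helper are hypothetical. It only illustrates the fail-fast idea and is not the actual migration script.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_steps(steps: List[Step]) -> Tuple[List[str], str]:
    """Run steps in order and stop at the first failure.

    Returns (names of completed steps, name of the failed step or "").
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            ok = action()
        except Exception:
            ok = False  # an exception counts as a failed step
        if not ok:
            return completed, name
        completed.append(name)
    return completed, ""


# Hypothetical steps: the second one fails, so the third never runs.
steps = [
    ("stop logger 1", lambda: True),
    ("swap logger 1 object", lambda: False),
    ("start logger 1", lambda: True),
]
completed, failed = run_steps(steps)
print("completed:", completed)
print("failed at:", failed or "none")
```

The point of returning both lists is that the operator knows exactly how far the script got before anything else is touched.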

The result: half of the objects for the logger processes were still version 10 and the other half were version 11, with clear incompatibilities. These were discovered after updating the manager process, and the perfect storm for a replication disaster ensued.
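A mixed-version state like this can be caught by comparing the version each process actually reports before moving on to the next stage. The following is a minimal Python sketch. The process names and version strings are made up for illustration, and how the versions would be collected on a real system is left out.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_version(versions: Dict[str, str]) -> Dict[str, List[str]]:
    """Group process names by the version each one reports."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for process, version in sorted(versions.items()):
        groups[version].append(process)
    return dict(groups)


def is_consistent(versions: Dict[str, str]) -> bool:
    """True when every process reports the same version."""
    return len(set(versions.values())) <= 1


# Hypothetical observation after a partial upgrade.
observed = {"LOGGER1": "10", "LOGGER2": "11", "LOGGER3": "10", "LOGGER4": "11"}
groups = group_by_version(observed)
if not is_consistent(observed):
    print("mixed versions found:", groups)
```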

The recovery required stopping all objects, including loggers, and returning the configuration to version 10. At this point undocumented changes to logtrails were discovered and a variety of fatal logging errors were being reported to EMS at an unprecedented rate.

Several calls and cries for help to Oracle resulted in argumentative and pointed questions regarding the version installed, the number of loggers configured and why we as the customer had so many files being replicated. Truly not a stellar moment for Oracle support. I was pretty much on my own to clean up this mess. So far, a database that has been in sync for about 3 years is still in the process of being re-synced and validated, all because one script indicated that the command had been received and processed normally, as was reported on three previous attempts at the same version-to-version update using the same script.

Several thousand files later, we are only at about 90% of files replicated, and those are not completely validated using their Veridata product.

More to report later.



Thursday, March 22, 2012

HP Discover

Another year rolls around and HP Discover 2012 is approaching. While it's uncertain if I'll be attending, for reasons I won't wallow in, I will reflect on some of the highlights and concerns from last year's event.

Certainly stealing the show was Paul McCartney, with a performance that left me a new fan. Yes, I'm sort of doing this backwards chronologically.

Stepping back again, the venue, The Sands Conference center, was altogether very good, although chasing sessions around at the last minute and trying to get into rooms that I had signed up for was a bit difficult. It seemed information about session changes was tough to come by, even at a technology-based event. I was disappointed more than once over having sessions moved, canceled or outright different from what was on the schedule. All of these were HP mainstream labs or presentations; none were HP NonStop sessions, which was even more disappointing because of the limited nature of the offerings.

The HP Certified Professional folks, now HP ExpertOne, were excellent as always, providing a great reception suite, news and information about the program going forward. Rich Gossman and his team were on target and provided a great experience for HP ExpertOne professionals. The only unfortunate blemish involved on-site testing for credentials. Despite my signing up for testing prior to the event, the testing team seemed overwhelmed, the suite was too small, the registration process was cumbersome and the whole testing process seemed very difficult compared to previous years. To be fair, changes have been made to this process going forward, so improvements should be realized.

General sessions were glitzy and splashy as expected, but well attended and organized. The usual offerings from the now departed former CEO were of the kind one would expect for a show this size. Little to nothing was mentioned about NonStop. Typical, though, from a company concerned with volume.

The exhibits and tradeshow, while impressive from an HP viewpoint, included things like portable datacenter pods, the usual enormous displays from Microsoft, loads of color and vast swaths of products targeting the majority of the PC and printer lines, as expected. The NonStop area seemed to be relegated to the ugly stepchild kiosks with little visibility, and it took some time to locate in the sea of servers. True, it's a smaller volume of HP's business, and granted, many of these vendors are not interested in toting their tradeshow exhibits to Las Vegas at their own cost, only to answer questions and hand out giveaways to a Linux or Windows professional about NonStop when they're really looking to speak with NonStop people. We've come a long way from the usual ITUG events and Regional User Groups, and the future still seems uncertain. With CEOs shuffling chairs and news of divisions consolidating within HP, we're left wondering: what is HP doing with NonStop?

I'm not sure there is still value in large events like HP Discover, considering the cost. I do appreciate the offer of $300 off the registration fee for being a Connect member, but I already get that for being an HP ExpertOne too, and no doubt I'll get a "special" registration code from our HP Account team to register. This leaves me wondering what the real value is, or if there is any value left.

The most important part of these events, to me, is seeing friends and colleagues and networking, sharing and comparing. This is true for NonStop product vendors and VARs. I always enjoy talking to folks about new products and ideas. Perhaps a smaller event may fill that need. The NonStop Symposium in San Jose was apparently well received and attended, but there has been no mention of any further events like that one.

Yes, we all secretly hope HP will spin-off NonStop, perhaps even call it Tandem again. But we all know that's not going to happen with a company like HP. We can hope that HP will someday realize the value of NonStop and it can come out from the shadows and smudges of printer ink.

-K


Tuesday, March 20, 2012

Availability is about BEING available

Another excellent resource focusing on the practices of Availability and what this means is The Availability Digest by Dr. Bill Highleyman.

His site focuses on the processes and practices, not a platform. 

Although many Tandem/NonStop users know Bill as the guy who wrote the book, literally, on Performance Analysis and Tuning for OLTP systems, his digest gives food for thought (no pun here) about the process of establishing and maintaining availability as well as some pretty interesting "never again" horror stories.

I invite you to take a look at http://availabilitydigest.com/ and sign up for his digest if you see something that sparks your interest there.

I've provided a link from this page, on the right side control bar, to this and other sites of interest. 

-K

Good read for NonStop Users and Mission Critical Fans


A quick post to invite you to take a look at a well-established blog by a good friend, Richard Buckle, longtime Tandem Evangelist and Founder of Pyalla Technologies.

I had the opportunity to serve on the Board of Directors for the International Tandem Users Group (or ITUG, as many knew it in its former life; it is now called Connect) with Richard and his wife Margo. I learned a great deal from Richard and Margo and appreciate the opportunities we've shared.

Richard has some excellent writing about the past, present and future of NonStop and Mission Critical Computing.

His blog is at http://itug-connection.blogspot.com/ , and provides valuable insight on mission-critical computing.

-K


Switch and Stay on the new Database - Loading files using GoldenGate

The advantage of having scripts to execute the switch to new files online cannot be overstated. Manually typing the commands and queries to stop, rename and restart processes accessing the files really opens the process up to errors.

These files are generally accessed via database server processes, either TS/MP Pathway managed or the occasional individual generating reports with data from the files. This gave us the ability to 'interrupt' file activity with a relatively small impact.

The high-level steps of the script involved stopping the database servers, stopping any ad hoc processes, renaming the current primary file to the old primary file, and renaming the new primary file to the online primary file. The decision was made to leave the new alternate key file with its created name so that fewer steps would be involved; since the alternate keys map from the file system, it would not be a problem going forward. If needed, the ALTER of the alternate key file could be performed later at a planned downtime.
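The two renames at the heart of the switch can be sketched as follows. This is a Python illustration against a simulated in-memory catalog, not the actual switch script. The file names (`ACCTS`, `ACCTSNEW`, `ACCTSOLD`) and the helper functions are hypothetical, and the stopping and restarting of servers is left out. What it shows is that if the second rename fails, the first is undone so the online name is never left missing.

```python
from typing import Dict


def rename(catalog: Dict[str, str], old: str, new: str) -> None:
    """Rename an entry in the simulated catalog, refusing to overwrite."""
    if old not in catalog:
        raise KeyError(f"{old} does not exist")
    if new in catalog:
        raise ValueError(f"{new} already exists")
    catalog[new] = catalog.pop(old)


def switch_primary(catalog: Dict[str, str], online: str, new: str, old: str) -> None:
    """Move the online file aside, then promote the new file to the online name."""
    rename(catalog, online, old)
    try:
        rename(catalog, new, online)
    except Exception:
        rename(catalog, old, online)  # roll back so the online name still exists
        raise


# Hypothetical file names for illustration only.
catalog = {"ACCTS": "current data", "ACCTSNEW": "reloaded data with new keys"}
switch_primary(catalog, online="ACCTS", new="ACCTSNEW", old="ACCTSOLD")
print(catalog)
```

Keeping the old primary under a different name, rather than purging it, leaves a fallback until the new file has been validated.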

Keep in mind the intent here was to minimize impact to database access, add new alternate key functionality for application use and avoid a planned lengthy system downtime to load and switch to new primary and alternate key files.

The switch process for each set of database files took approximately 5-10 seconds using the script. At this point the new primary file with associated alternate keys was swapped in and the database server recovered. This was similar to, but not exactly like, a Base24 log rollover.

The current GoldenGate configuration also facilitated the swap on the Target site as it executed file renames with the GETFILEOPS parameter enabled on the Target. Essentially keeping us in sync across source system and target.

After all files were switched and the database functions were recovered, it was necessary to stop and delete the temporary Extractor and Replicator processes that had been configured to load the new files, as they were no longer needed.

Then, a Veridata compare configured to evaluate the new files indicated the New Source and New Target were in sync, except for one file that was missing 107 records. To recover from this, the Extractor was repositioned on the Extract Trail to a point just a few minutes prior to the swap and restarted with HANDLECOLLISIONS ON. The records were applied and the file was checked again and found to be in sync across Source System and Target System.
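The reason a replay from an earlier trail position is safe is that collision handling makes the apply tolerant of records that are already present or already gone. The Python sketch below is a simplified model of that idea under my own assumptions. It is not GoldenGate's implementation, and the keys, values and `replay` function are invented for illustration: a duplicate insert overwrites the existing record, and a delete of a missing record is skipped.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str, str]  # (operation, key, value)


def replay(target: Dict[str, str], ops: List[Op]) -> Dict[str, int]:
    """Apply operations to the target, tolerating collisions."""
    counts = {"applied": 0, "overwritten": 0, "skipped": 0}
    for op, key, value in ops:
        if op == "insert":
            if key in target:
                counts["overwritten"] += 1  # duplicate insert: overwrite
            else:
                counts["applied"] += 1
            target[key] = value
        elif op == "delete":
            if key in target:
                del target[key]
                counts["applied"] += 1
            else:
                counts["skipped"] += 1  # missing record: nothing to delete
        else:
            raise ValueError(f"unknown operation: {op}")
    return counts


# Hypothetical data: the target is missing two records the source has.
source = {"k1": "a", "k2": "b", "k3": "c"}
target = {"k1": "a"}
ops = [("insert", "k1", "a"), ("insert", "k2", "b"),
       ("insert", "k3", "c"), ("delete", "k9", "")]
counts = replay(target, ops)
missing = sorted(set(source) - set(target))
print(counts, "still missing:", missing)
```

A compare after the replay, like the second Veridata run described above, is still what confirms the files are actually in sync.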

Considering time required, the Initial file loads were performed 2 weeks prior to the switch. This enabled appropriate notification, change management and design of switch scripting. The loads were all performed online, during normal working hours and validated without impact to the user community. The file switches were performed without perceived impact from the user community, during normal business hours and using automated scripts in a few seconds.

"Your actual mileage may vary...."

Typically, past operations like these required a system downtime and the use of FUP to CREATE, LOAD and LOADALTFILE during the outage. Considering the size and number of alternate keys to be added to these database files, the entire process would have required either an extremely lengthy downtime or several downtimes.

By using GoldenGate to Load and synchronize the files, the majority of the effort could be focused on distilling the switch scripts to the smallest possible window and time dedicated to validating file integrity before and after the switch.

Again, your conditions and environment may vary, but reducing impact, eliminating a downtime and providing 100% record integrity before and after made GoldenGate an excellent choice for performing an online alteration to these database files.

If you'd like to know more, have questions or comments, or think I haven't covered something here, drop me a note.

Now I'm off to find another topic of NonStop interest.

-K