Scott's Column
Server problems plague digital printing, and other details

July 1, 2002
By Scott Lewis

June was without a doubt one of the busiest months of my life. I finished the swimming pool and put in more hours of overtime than I have in a long time. We missed our July 1 deadline for putting digital printing in production. And wait till you hear the story of why.

Server Crashes

We had a failover on our production Oracle server. It is a clustered server, and one node failed over to the other. Oracle ran perfectly. It did exactly what it was supposed to do. A few manual processes had to be performed to get everything running. The FTP service could not be started, so we moved that to another server temporarily.

Then the server crashed while we were trying to determine what went wrong. Fortunately we had shut down the database and we did not lose any data. But the server was in a pretty bad state. We were getting a report of a corrupt system file on both nodes.

Development to the rescue

I put my development server into production so we could take the production cluster offline to perform diagnostics and repairs. Without a development server I was not able to work on the digital printing project, which was so far along that I was soft and hard proofing from Oracle data. This was the last straw in my ability to outwork a series of mishaps that had been delaying me for a while. I finally told the project leads that I would not be able to make the July 1 deadline. My boss told me during the migration to the development server that he had already told the business side that I would not make July 1. He told them two weeks before I did. Whoa!

He was leveling their expectations. I was pretty confident that I could overcome all the obstacles I was having, such as the controller being three weeks late and taking three more days to get printing, as well as discovering a bug in the software we are using to generate our PostScript output. The fix for that came about the same time as the controller came online.

In case you need to know what a controller is... every printer has a controller. If you have been dealing mostly with popular laser and inkjet printers, the controller is in the printer itself. Some printers can only work under Windows. These use your PC for some or all of the controller functions. The professional printers we use require dedicated workstations that act as the controllers. Our controller is a beefy Sun workstation. It basically queues print jobs and interprets the PostScript into the dots the printer is capable of printing.

Back to the point. My development server was moved into production on a Friday night. Microsoft and Hewlett-Packard (it is an HP server running MS Windows NT 4.0) both looked over the server that weekend. Microsoft fixed the corrupt file. But HP couldn't find anything wrong. They gave their blessing to move the server into production. We didn't like that they had no answer as to why the machine failed, so that Sunday night we rigged it up to be a standby server. We set up the development box to generate a log file (which contains all the changes to Oracle) every 10 minutes, copied those files across the LAN to the HP server, and applied the changes there. If the development box went down, we could recover the database to within 10 minutes with a minimal loss of data.
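
For the curious, here is a minimal sketch of the copy half of that standby arrangement. It is not the script we actually ran. The function name, the directory arguments, and the ".arc" suffix are all made up for illustration, and it assumes the log file names sort in the order they were generated. Applying the logs on the standby side is left to Oracle's own recovery tooling and is not shown.

```python
import shutil
from pathlib import Path


def ship_new_logs(primary_dir, standby_dir, suffix=".arc"):
    """Copy log files the standby does not have yet, in name order.

    Hypothetical helper: returns the names of the files copied in this pass.
    """
    primary = Path(primary_dir)
    standby = Path(standby_dir)
    standby.mkdir(parents=True, exist_ok=True)

    shipped = []
    # Assumption: sorting by file name matches the order the logs were written.
    for log in sorted(primary.glob("*" + suffix)):
        target = standby / log.name
        if target.exists():
            continue  # already copied on an earlier pass
        # Copy under a temporary name, then rename, so the standby never
        # sees a half-written file under its final name.
        partial = standby / (log.name + ".part")
        shutil.copy2(log, partial)
        partial.replace(target)
        shipped.append(log.name)
    return shipped
```

Run from a scheduler every 10 minutes, a pass like this keeps the standby at most about one interval behind, which is where the "within 10 minutes" figure comes from.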

Sure enough, on Monday the HP failed again. Good thing we didn't put it back into production. Meanwhile my development work was suffering. I scrounged up a Compaq iPaq PC with a 20 GB drive and 384 MB RAM. Good enough. I don't even know how fast its CPU is. I don't care. I can make it run with Oracle and store my data.

HP loaned us a server on Tuesday, while they took down our production server to check it out. I scrambled to build two Oracle boxes: a temporary box that we may or may not move into production, and a little PC (I mean LITTLE, this iPaq is a legacy-free workstation with no floppy, a CD-ROM that looks like a laptop drive, and next to no ports). It is time consuming to get Oracle running from scratch. The biggest hurdle is jockeying 15+ GB worth of files across a network.

On Wednesday and Thursday they rebuilt the cluster with Windows 2000 Server (or Advanced Server, I don't know and don't care which). I worked with this HP guy. This guy really knew his stuff. He found a number of things wrong with our configuration. He could not tell us for sure what caused our problems, but he was surprised we did not have more. I worked with him to get Oracle running on the cluster.

To test the system, and provide my boss with reassurance that it would not fail, the HP guy randomly pulled drives out of the mirrored array. They rebuilt once they were plugged back in. Oracle never even noticed. The HP guy also ran a utility that forced the nodes to fail. Again all stayed up. Finally he unplugged two of the four controllers to the drive arrays. This essentially failed half of the entire array and it still ran smoothly. All the drives rebuilt when the controllers were turned back on.

All seems well. We haven't put it back into production as of this writing, but it will be any day now.

The Pool

It only took me 3 weeks to build the pool. I had some help from one of my best friends, my brother-in-law and my wife. I did about 80 percent of the work myself. It was just too time consuming to round up help. I broke my toe when I dropped a fully assembled post on it. Ouch! And that was the first part of the pool assembly. I still had a week and a half of work. So my poor little toe was never given a chance to heal.

The results were well worth the time and effort. My wife and the kids have enjoyed the pool so much that my wife has almost completely forgotten how bad the yard looks. Next year I hope to build a deck between the house and the pool. The back porch is significantly above the top of the pool, so I want a three-level deck. One level will be even with the porch and that's where we will eventually put a hot tub. The next level will drop down some and have a bar-b-que grill and table and chairs for outdoor dining. The third level will drop down to the top of the pool for lounge chairs and such.

My father asked me to send pictures showing the elevation of the house relative to the pool. He wants to take a stab at designing something. Plus he might be able to come down from New York to help build it. I am hoping to do this next summer... if we can afford it.


Also this month I bought my project car, a 1967 Camaro RS Convertible. I picked it up on June 1st. The day after I drove it to work for the first time, the air conditioning compressor went out in my '93 Camaro Z-28. I think he was jealous of his long lost brother and the attention he was getting. I replaced the compressor and the evaporator on the last day of the month. I still have to meet up with my mechanic to charge it up, so expect more details next month.


That about covers all I have time to put into words. I helped a friend scrape linoleum in preparation for having tile put down in his new house. I will probably be helping him move. After all, he helped me move three times and then some. Payback can be hell. I am so busy at work I just don't have time for much else.

Until next time...