September 1, 2014
By Scott Lewis
This month we are going to take a break from the photography
business. I ran out of capital to put into the business and will be
waiting for an insurance reimbursement check to resume the business.
So that you are not left empty-handed, I will describe the server upgrades I have been involved in.
The last week of July we performed a server upgrade, replacing a server built in 2002. This server's purpose was to compose our PostScript output for printing. Company-wide we are moving more and more toward virtual servers, so the new hardware is a massive machine that can host many servers. The hardware itself is a little beyond my understanding, but the principle is the same as running virtual machines on a desktop computer: you can run multiple operating systems, each with its own "chunk" of disk space and memory. Obviously the actual hardware must be very robust to have multiple servers running on it concurrently.
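To make that "chunk" idea concrete, here is a toy sketch in Python. This is purely for illustration; the host sizes and guest names are made up, and a real hypervisor is far more sophisticated than handing out fixed chunks:

```python
# Toy sketch of a host carving its resources into fixed "chunks" for
# guest VMs. All numbers and names are illustrative, not our hardware.

class Host:
    def __init__(self, ram_gb, disk_gb):
        self.free_ram = ram_gb
        self.free_disk = disk_gb
        self.guests = {}

    def add_guest(self, name, ram_gb, disk_gb):
        # Each guest gets its own dedicated slice of memory and disk.
        if ram_gb > self.free_ram or disk_gb > self.free_disk:
            raise RuntimeError("host out of capacity")
        self.free_ram -= ram_gb
        self.free_disk -= disk_gb
        self.guests[name] = {"ram_gb": ram_gb, "disk_gb": disk_gb}

host = Host(ram_gb=256, disk_gb=4000)
host.add_guest("composition1", ram_gb=16, disk_gb=500)
host.add_guest("composition2", ram_gb=16, disk_gb=500)
print(host.free_ram, host.free_disk)  # 224 3000
```

The point is simply that the guests share one robust physical box while each believes it has its own machine.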
For me, although I was only replacing 1 physical server with 1 virtual server, the project was much more involved than that. The new server was going to be running 64-bit Windows Server 2008 R2. All the other servers I am responsible for are 32-bit (and a mix of 2003 & 2008). I know... if the server we replaced was built in 2002, how did it have Windows Server 2003 on it? Because some time ago it was upgraded (as an in-place upgrade) from its original install.
You cannot do an in-place upgrade from 32-bit to 64-bit. And we are trying to go the virtual server route where we can.
The biggest problem was with our queuing software, which directs jobs to run on one of 4 composition servers. The version of the queuing software that can run on a 64-bit machine is not compatible with the version we were running. This meant we had to upgrade the queuing software on 3 other composition servers, and on 1 control server (which runs the core scheduling of the queue software). So we were going to be touching 5 servers in one weekend.
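Our queuing software is a commercial product, so this is not its actual logic, but as a rough illustration of what "directing jobs to one of 4 composition servers" means, a bare-bones dispatcher could look like this (the server names are made up):

```python
# Minimal sketch of a queue dispatcher: send each incoming job to the
# least-busy of four composition servers. Purely illustrative.

servers = {"comp1": 0, "comp2": 0, "comp3": 0, "comp4": 0}  # active job counts

def dispatch(job):
    # Pick the server currently running the fewest jobs.
    target = min(servers, key=servers.get)
    servers[target] += 1
    return target

assigned = [dispatch(f"job{i}") for i in range(8)]
print(servers)  # eight jobs spread evenly, two per server
```

The real software does far more (priorities, retries, job tracking), but the core idea is just routing work across a pool of servers.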
Oh, but it gets better. The queuing software has a .NET component that 8 server applications and 4 client applications were compiled against. So we also had to upgrade all of these apps the same weekend.
It actually went quite well. We only missed a few configuration settings, and missed copying some important files. These were easily corrected on the spot. We started the upgrade on Saturday morning. We stopped all the services that generated output, then initiated a 5-hour copy of the data files needed from the old server to the new server. We then upgraded the queuing software on all 5 servers and 4 PCs. Next up was upgrading the 8 server apps, and the 3 client apps on those 4 PCs.
Then we went home.
On Sunday we came in again, along with people in the plant to test printed output from all this new software and hardware. It took exactly the time we had allotted. I estimated 3 hours on Saturday and 3 hours on Sunday, and we did indeed work 9-12 both days.
I planned this upgrade for so long, and went over it again and again. I really did not want to take a chance on having to roll back. All that preparation paid off.
So 4 weeks later we replaced another 12-year-old server with another virtual server, "cloned" from the server we put into production in July. This went even more smoothly, since all we had to do was stop the software and initiate a 5-hour file copy (we did this remotely, without coming into the building). Then on Sunday we came in to make some quick config changes, fired up the software, and tested printing. All went super smooth: all of 2.5 hours over both days.
Well, we have 2 more composition servers to replace. These are newer servers (about 3 years old), but we are going to match them to the virtual ones we put into production. However, they are a low priority.
The current priority is our main "control" server. It runs so much software that I have no idea how I am going to build its replacement. Too many areas have added software to this server that I don't know about. That said, I know more about this server than anyone else.
The control server is the scheduler that all composition work goes through to be routed to the appropriate composition server. It runs those 8 apps we upgraded above. This server also acts as a repository for all incoming files from our web sites and other places, and it is a file share for many areas. It will be quite an effort to rebuild, and the fun part is we get to do it twice. First we will build the test version of this server, documenting every step it takes to build it and correcting the documentation as we get everything working. Once the test server is running successfully, we will repeat those steps on a clean virtual server to build the production server.
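The "document every step, then replay the steps on a clean server" approach can be sketched like this (the step names are hypothetical, just to show the shape of the idea):

```python
# Sketch of a documented, repeatable server build: each step is a named
# function run in a fixed order, and the run is logged so the exact same
# sequence can be replayed on the clean production server.
# Step names are made up for illustration.

def install_queuing_client(log):
    log.append("install queuing client")

def create_file_shares(log):
    log.append("create file shares")

def install_scheduler_apps(log):
    log.append("install the 8 scheduler apps")

BUILD_STEPS = [install_queuing_client, create_file_shares, install_scheduler_apps]

def build_server(name):
    log = []
    for step in BUILD_STEPS:
        step(log)  # on a real server, each step would do actual work
    return log

test_log = build_server("test")
prod_log = build_server("production")
print(test_log == prod_log)  # True: both builds ran the same documented steps
```

In real life the "steps" are a written document rather than code, which is exactly why I worry the replay won't be perfect.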
For this particular server I have an issue with that philosophy. I understand the principle: build the test server, make your mistakes, and correct them in the instructions. We don't want to clone a test server that may be less stable because of those mistakes, from all the uninstalling and reinstalling of software to get it to work. I get it. But we are talking about so many moving parts (in software) that I can't imagine the instructions will be perfect. And since the test server has different rights within our network (for instance, it is exempt from firewall rules that apply to a production server), there is no fully accurate way to test all the external access the production server will see. Even if we test the test server properly, the results won't directly translate to production. Add to that, many of the processes that talk to outside servers (over the internet and our own internal network) don't have self-contained test environments. So even where we can test, it is manual testing that our test server can reach someone else's production server. Hardly an accurate test.
Additionally, things may not work on the production server until we actually start enforcing firewall rules and such. So we could end up uninstalling and reinstalling software to make the production server work anyway. And in doing so we would be no better off than if we had taken a test server that was running stably, cloned it, and dealt only with the configuration changes between the 2 servers.
I will start working on the test server as soon as they provide me with the virtual servers. I will install everything I know about and plug it into our test environment. That will be fun (that was sarcasm). At least there is a big push from the business to get this done, so I will be given ample time to work on it, and that should put pressure on other areas to help where they need to.
I'll let you know how it goes.