[Talk] Re: [Linux] Virtuality

Nick Simicich talk@flux.org
Tue, 26 Sep 2006 18:33:30 -0400


On Fri, 2006-09-22 at 22:01, Kwan Lowe wrote:
> At this moment I'm installing a Fedora Core 6 Test 3 under a VMWare environment. The
> boot CDROM is actually an iso loaded on the same machine. Installation media is
> actually the FC6-DVD hosted on a virtual machine on another server and reached via a
> virtual ethernet device bound to the wireless card on the laptop. The installation
> GUI is done via VNC using the VNCViewer client on another machine -- which is in
> turn an Ubuntu image running under VMWare on a WinXP laptop. Nothing's "real".
> 
> The most interesting thing I've found since working with virtual machines is how it
> enforces my backup policy. Because of the sheer number of images (at last count,
> over thirty distinct installations) I've had to centralize authentication and
> storage via LDAP and automounted home directories. At this point I can build out an
> image in a matter of moments (well, moments to setup and twenty minutes to wait for
> completion) and have all my data available whenever. If a web server goes down, it
> can be restored on another machine in a matter of moments. Faultless failover is not
> setup yet, but this is mainly for lack of physical machines to host the images.

Of course, IBM invented this sort of virtual machine back in 1967 with
the CP-67 operating system. While the 67 was not widely installed, all
of the mainframes of the 370 era (the 1970s forward) had the ability to
run virtual machines using the VM/370 monitor. The monitor was
relatively lightweight: you could actually get more timesharing users
on the same hardware by giving each of them a virtual machine than you
could by running TSO under MVS.  CMS was an "operating system" that a
user could run. It depended on being run under VM/370 and could not be
booted on real, bare hardware.

VM had an advantage over MVS - it was open source. It was distributed
in source, and you could build from source. Patches were distributed as
source "diffs" that allowed for user patches, as long as there were no
user changes in the patched area (if there were, you had to refit your
changes).  This meant that you could make modifications. For a lightly
loaded server like ours (100+ users active in a 5 minute monitoring
interval, in 8 meg of real storage) we had to schedule reboots, because
the operators were not comfortable when a mainframe was not booted once
a month or so - we finally started booting once a week.

The people who were big VM believers talked about the advantages of the
technology, and I honestly thought that there was a chance that the VM
tech was dead, given the cheapness of current hardware technology (it
is one thing to run a hundred users on a system where the mainframe
costs hundreds of thousands of dollars, and another where the seat
price for equivalent tech would be under $500).

Then a post like Kwan's talks about the tech in a way that remakes a
lot of the arguments that the old VM believers made. It is probably a
great advantage to have installed LDAP, automounted home dirs and the
like, for many environments. Having to centralize that stuff because
you have a lot of images is both a consequence and an advantage of
having a lot of images.

I was able to debug MVS in ways that the mainframe types could not.
While I did not have the resources for them to run extra images during
prime time, they were able to swing a couple of strings of DASD to my
side and run virtual machines on their own DASD during the graveyard
shift, without compromising the security of the rest of my users. They
could run tests with trapping and tracing in their OS code using the VM
monitor, and find bugs in the virtual environment that would have been
almost impossible to find in the real world.

I've moved my answer to talk because the only Linux point here is that
the more things change the more they stay the same, and that this is a
well proven, old technology - it is just "new to you". And that
source-maintained operating systems are better than OCO (object code
only).

One interesting thing was that while I was working on VM, we moved from
a 370/168 to a 4341. We had to keep paying for our lease on the
370/168, and we actually packed it for shipment and left it on the
computer room floor. Then we installed the 4341 and paid for it. It was
essentially the same speed machine (although the 4341 had much better
floating point), so why did we do this?  The difference in power
consumption and air conditioning paid for the new machine and the
salary of another sysadmin, who got assigned to the project I was
managing.
And the 4341 hardware was so much more reliable than the 370/168
hardware that we went from about 1 hardware related boot (mostly memory
failures). I remember when it seemed like you only got a couple of bits
per chip, and the processes were so bad that they would build the chips
so that they would "flip": if they did not work one way they might work
the other, and the tester would mark the good end with a drop of paint
rather than a cutaway in the ceramic or plastic of the DIP. We had 8
meg of memory in a box that was eight feet tall and four feet on a
side, and this was not even core memory, it was semiconductor. The
memory failed all the time. This was not ECC, as I remember; these were
one way, non-repairable parity errors. You'd run a simple program from
the front panel (which you'd type in, in hex) and it would machine
check when it read the bad memory, then you'd configure the box to not
run with the bad half meg until the CE could be scheduled on the
weekend. This was third party memory.

The 4341 put 8 megs of memory on a single board. It had a bus that you
might recognize, although the cards were huge by today's standards. We
ran for months with a bad spot in the board until one of the CEs noted
that we were always faulting when we accessed a particular word, but
this was ECC, so the fault would be repaired and it would not actually
cause anything to fail; the program running would keep running. We
finally got them to replace it after some discussion. They wanted to
get the board replacement into the next fiscal year of our service
contract, and I wanted to proceed, since another failure in the same
area would overcome the ECC, and we had the choice of going third
party, yadda yadda. We agreed on a month's delay, which allowed them to
put it in a different class of non-emergency part and made it cheaper
for them.  Everything comes down to dollars.
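
For anyone who has not run into the parity versus ECC distinction, here
is a rough sketch of the difference in Python. It is only an
illustration: I am using a textbook Hamming(7,4) code as a stand-in, and
it is not meant to be the scheme either machine actually used. All the
function names are made up for the example. Parity can only say "this
word is bad" (hence the machine check), while the Hamming code can
locate and repair one flipped bit, and a second flipped bit in the same
word defeats it.

```python
def parity(bits):
    """Even parity: XOR of all bits (0 means an even number of ones)."""
    p = 0
    for b in bits:
        p ^= b
    return p

def parity_encode(data):
    """Append one parity bit so the whole word has even parity."""
    return list(data) + [parity(data)]

def parity_check(word):
    """True if the word looks good. A failure cannot say which bit is bad."""
    return parity(word) == 0

def hamming_encode(data):
    """Hamming(7,4): check bits sit at positions 1, 2 and 4 (1-based)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming_decode(word):
    """Return (data, syndrome). A nonzero syndrome is the 1-based
    position of the bit that was assumed bad and flipped back."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        w[syndrome - 1] ^= 1   # repair the single bit the syndrome names
    return [w[2], w[4], w[5], w[6]], syndrome

if __name__ == "__main__":
    data = [1, 0, 1, 1]

    # Parity memory: one flipped bit is detected but cannot be repaired.
    word = parity_encode(data)
    word[2] ^= 1
    print("parity ok?", parity_check(word))

    # ECC memory: one flipped bit is located and repaired.
    code = hamming_encode(data)
    code[4] ^= 1
    print("one bad bit :", hamming_decode(code))

    # A second bad bit in the same word gets "repaired" into wrong data.
    code[5] ^= 1
    print("two bad bits:", hamming_decode(code))
```

The last case is the one I was worried about with the board: with one
bit already bad, the next failure in that word is no longer something
this kind of code can fix.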

-- 
Blog: http://majordomo.squawk.com/njs/blog/blogger.html
Atom: http://majordomo.squawk.com/njs/blog/atom.xml
RSS: http://majordomo.squawk.com/njs/blog/atom.rdf