Some long-term perspectives and multi-vendor gripes, and some general soap-boxing

My career has had some interesting side-effects on my life. In the beginning, computers could only understand upper case and as a result I got in the habit of printing, in upper case. I nearly always print, and seldom print in lower case. My handwriting is difficult to read and to do! In high school I took a typing course, one of the few classes I still use every day. But the keypunches of the era forced me to abandon using the right shift key, because a keypunch only had a left shift key, and a right un-shift key. The Cyber consoles had re-arranged the digits, destroying my already-minimal ability to touch type digits.

Over the years, various attitudes have existed between the technical staff and management. In the early 1970s, management was itself fairly technical, and a good rapport existed between the employers and employees. The assistant directors would do programming and hardware development! As time went on, and management became more detached from the work being done by the staff, the relationship became somewhat antagonistic at times, because management was no longer as aware of what the "workers" actually did. An example from February 1995 of this degradation came from the division director in a note announcing a video-taped panel making its visions known on future computing:

  • "I have a video of a CAUSE Current Issues Forum. The title is "Seeing Higher Education in the Year 2020." I have viewed it. It lasts 78 minutes. A bit tedious in the middle, it does address many issues relative to information technology and its influence on higher education. Furthermore, the participants are "real people" like Presidents, Provosts and Faculty, not computer techies."

Aside from the ridiculous assertion that "real people," or anybody, can predict the state of computing 25 years into the future, please note that the technical staff are not real people. (Was this an attempt at humor? If so, it didn't come across that way, as it seems to be too close to the perceived attitudes).

For another example of the rift between management and workers, refer to the story earlier on write-protecting a 9-track tape by slapping red stickers all over the write-ring slot. In this case it is a knowledge rift, demonstrating the lack of technical expertise of some managers. If they can't understand technical work, how can they manage it? Surprisingly, it can be done (I've seen it), but doesn't seem to happen very often.

Maybe we are lucky, but to my knowledge we've had only a few examples of "bad" employees. The best (?) example was an operator from either the late Philco or early Sigma era (I think it was early 1970). At the time, instructors would mark a student's grade for a class on a computer card. The cards were then punched by our keypunch operators (and verified), then fed into the system for sorting and final grading. An operator apparently was not satisfied with his grade, and managed to find his card and alter it by pasting a piece of tape over the hole and re-punching it with an "A". The instructor didn't notice this when he posted the grades on his door (he got a sheet from Registration & Records, presumably cut off the SSNs, and taped it to his door) but another student saw the grade and complained. The instructor insisted our operator had not been given an "A" but there it was on the roster. Eventually the card was pulled, the tape was spotted and the operator dismissed. In another case an operator was using the Cyber console "H" display to examine the user validation file. I thought this was rather clever, and harmless because the passwords were not stored in clear text, but management decided to dump him.

A long-standing problem in computing is the repeated failure of vendors to "get it right". My example here is tape drives and drivers. In spite of the complete dominance of 1/2-inch tape for over 20 years, each time we changed vendors I found problems and shortcomings. With the Sigma, I've already described the fiasco of the poor design of their 7-track drive that triggered operating system crashes. They also made the common blunder of disallowing arbitrarily large block sizes. When we got the Cybers, I ran a test with Sigma-written tapes containing every block size from 1 byte to 32K bytes and discovered that NOS would mis-report block sizes that were a certain multiple of 512 CM words. On our VAX 11/785, the TU80 tape drive would mis-report some small block sizes up to about 14 bytes, a problem for which DEC had a known fix that they never turned into an ECO. They felt it was only worth installing on drives if the customer complained, even though the drive was on a service contract. VMS cannot read arbitrarily large blocks. It also claims to ignore all blocks smaller than 14 bytes, which is a lie (no complaint except that the documentation is wrong). However, it will refuse to write blocks smaller than 14 bytes, with the result that we can read old NOS tapes but cannot copy them! VMS' BACKUP utility loves to write its own labels instead of letting the operating system do it (very bad news), and on OpenVMS Alpha 6.1 it botched the labels in some way on continuation reels so that VMS won't understand the result as an ANSI tape set. (This was fixed in VMS 6.2.)

I've never seen a vendor that included a useful tape copying utility in the operating system. All utilities were special-purpose, such as being useful for labeled tapes only, or unlabeled tapes with fixed records. The big problem in tape copying is knowing when to stop. If the tape is ANSI labeled and the utility knows this, it's well defined. But if the tape is unlabeled, of unknown format and just a bunch of blocks and tape marks, the only convention worth mentioning as a de-facto standard is to stop when two tape marks in a row are found. Most utilities are not that smart.
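The "stop at two tape marks in a row" convention is simple enough to sketch. What follows is only an illustration, not any vendor's utility: the tape is modeled as a Python list, a data block is a bytes object, and a tape mark is represented by None. The function name and that representation are my own assumptions.

```python
from typing import List, Optional

# Assumption for this sketch: a tape is a list of items, where a data block
# is a bytes object and a tape mark is represented by None.
TAPE_MARK = None


def copy_until_double_mark(tape: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Copy blocks and tape marks, stopping after two consecutive tape marks."""
    copied: List[Optional[bytes]] = []
    previous_was_mark = False
    for item in tape:
        copied.append(item)
        if item is TAPE_MARK:
            if previous_was_mark:
                break  # second mark in a row: conventional end of information
            previous_was_mark = True
        else:
            previous_was_mark = False
    return copied


# Two files, a double tape mark, then leftover junk from an earlier use.
tape = [b"file1", TAPE_MARK, b"file2", TAPE_MARK, TAPE_MARK, b"old junk"]
copy = copy_until_double_mark(tape)
```

Note that a tape which begins with a double tape mark produces a copy containing nothing but those two marks, which is exactly why such a tape defeats the convention.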

One of the very worst tapes I ever saw, in terms of format, was the source tape for Versatec software for the Sigma. It was unlabeled, and actually began with a double tape mark (almost a universal signal that no more valid information follows). Each file was separated by a double tape mark, except the last file was simply followed by blank tape instead of even a single tape mark.

There is also the issue of multi-reel tape sets. If ANSI labeled, things are pretty clean except for determining what the "next" tape is if you don't already know in advance (NOS had a fix for this, that worked most of the time). But there are no standards for unlabeled tape sets, and whenever I hear of such, I get upset. For example, NOS could be told to switch reels on unlabeled tape sets in one of three ways:

  1. When the reflective marker is found during a write, abandon the write, switch reels, then write the block again.
  2. When the reflective marker is found during a write, complete the write, switch reels.
  3. When the reflective marker is found during a write, complete the write, notify the program, and keep writing until a write-eof is requested by the program. Then switch reels.

Of course, if you didn't know how the tape was written, you probably could not read it correctly. (This applied to NOS "Stranger" tapes, not ANSI or "Internal" tapes).

Another big blunder the industry made, aside from starting out with octal computers instead of hexadecimal, was to adopt a 7-bit communication code for storage of 8-bit bytes. This led to all sorts of problems, such as CDC claiming they supported "8 bit ASCII." Lots of vendors played tricks with the 8th bit, such as leaving it zero, leaving it 1, using 0 or 1 interchangeably, or using it to define an incompatible second set of 128 characters (e.g. MS/DOS, DEC Multinational). Then again, IBM can't even standardize its own 8-bit EBCDIC code. IBM will define a code of X'nn' as, say, a vertical bar (shift-Y on a keypunch), except that on some particular model of printer with some specific print chain it will print instead as some other character. Look at an IBM code chart and you'll find things like three different underscore characters, and several other codes that have different graphics depending on the equipment. Finally, there's the issue of non-translatable characters between ASCII and EBCDIC, such as EBCDIC's cent sign.

Speaking of ASCII, a related snafu is the "tab" character. Its use was originally to save space in files, but it critically yet silently assumes particular tab settings. On the Sigma, terminals (i.e. TTY 33) had no real tabs, so it was all done in software. The default mode was that when a user hit a tab key, the tty driver would figure out how many spaces were required to get to the next tab stop, and would send that many spaces to both the terminal, and the calling application program. Thus, tabs were never actually stored within the file (I think they called this "space insertion mode"). Now, in 1995, the problem still persists. Many PC applications will store tabs in the file, but if you carry this file to VMS (or Unix?) it will not display correctly because the tab settings are different (and there's no way for the file to declare its assumptions). Even within VMS itself, lines will display differently in some editors (e.g. EDT in line mode vs EDT in screen mode) because the tabs are not handled correctly. Moral: For Pete's sake, don't use tabs!
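To show why a stored tab depends on settings the file cannot declare, here is a small sketch of the space-insertion arithmetic. It is not the Sigma driver's code; the function names and the assumption of evenly spaced tab stops are mine.

```python
def spaces_to_next_stop(column: int, tab_width: int = 8) -> int:
    """Spaces needed to move from a 0-based column to the next tab stop.

    Assumes evenly spaced tab stops every tab_width columns.
    """
    return tab_width - (column % tab_width)


def expand_tabs(line: str, tab_width: int = 8) -> str:
    """Replace each tab with the spaces a given tab setting would imply."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            n = spaces_to_next_stop(column, tab_width)
            out.append(" " * n)
            column += n
        else:
            out.append(ch)
            column += 1
    return "".join(out)


# The same stored line, displayed under two different tab settings.
stored = "ab\tc"
wide = expand_tabs(stored, 8)    # "ab" + 6 spaces + "c"
narrow = expand_tabs(stored, 4)  # "ab" + 2 spaces + "c"
```

The stored bytes are identical in both cases; only the reader's assumption about the tab stops changes, and the columns move with it.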

Vendors keep releasing editors that can't edit an arbitrarily-large file. In many cases, the limits were seen as "reasonable" at the time the software was written, but customers and/or storage advances quickly overcame it. On the Sigma, the editor would handle 16K lines max if I remember. This was an interesting editor because the files were keyed (ISAM) with the key being the line number (CP-V keyed files stored the key completely separately from the data, unlike most other implementations) but the key was too short to handle large files. But it did mean that no "work" file was needed, nor huge gobs of memory. On the Cybers, XEDIT "almost" did it right but in a few places used 17-bit "B" registers for the line number and thus was limited to 128K lines. A popular patch among NOS customers was to fix XEDIT to use 60-bit "A" registers instead, removing that limit. Later, CDC's Full Screen Editor (FSE) could handle huge files but took a very long time to read them in and build a work file first. In VMS, EDT will break if the work file is over 64K blocks, and TPU will break if the file is larger than the user's working set limit. Most DOS and PC editors will also fail if the file being edited exceeds 640K or some other memory limit. I often hit this using the Solaris vi editor with either "Tmp file too large" or "Out of register space (ugh)" or "Line too long".

Many vendors/applications can't seem to cope with empty files. On one of the early Sigma operating systems (BTM?) the SORT utility considered a file of zero length to be an error. While this *may* be the case, it depends on what's going on, but the utility certainly can't make that decision. Several sites had to beat up on XDS to convince them that sorting an empty file made perfectly good sense. Just give me an output file with all zero records sorted in order; in other words, create an empty output file if the input is empty. Let the programmer decide whether it's a problem! This came to mind just today (1995), 25 years later, noting that the MXRN news reader aborts if the input news.rc file has zero records!
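The behavior I wanted from SORT fits in a few lines. This is a sketch only, with a made-up function name and one-record-per-line text files assumed for simplicity: an empty input yields an empty output file and a record count of zero, and the caller decides whether zero is a problem.

```python
import tempfile
from pathlib import Path


def sort_file(src: Path, dst: Path) -> int:
    """Sort the lines of src into dst and return the number of records.

    An empty input is not an error: it produces an empty output file.
    """
    lines = src.read_text().splitlines()
    lines.sort()
    dst.write_text("".join(line + "\n" for line in lines))
    return len(lines)


# Sorting an empty file: the output exists, is empty, and the count is 0.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    dst = Path(tmp) / "output.txt"
    src.write_text("")
    count = sort_file(src, dst)
    result = (count, dst.exists(), dst.read_text())
```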

One of the amusing fiascos to watch in the long term, is the constant flip-flopping on the issue of aborting jobs/processes. In the early days if a job ran away, you rebooted the system. The OS folks soon gave us a way to abort jobs, either from the operator console or, eventually, from a user terminal. Almost as quickly the users demanded a way to kill-proof their jobs, either completely or to at least allow the program to "catch" such exceptions and do whatever was necessary (which in some cases was to continue on with the task in progress). Next, the OS folks gave us a way to *really* kill a process as well as a way to merely ask for death. A good example is VMS, with the often-misunderstood distinction between control-C (please die) and control-Y (I demand that you die). Of course, the users then demanded a way to catch control-Y and do with it what they wished. Unix allows ten different levels of kill signals (the last level being uncatchable), and NOS would unconditionally kill a process that had already been killed once. In VMS, a privileged process can set a bit that says "don't kill me ever, no matter what" and I recently had such a process (Distributed Transaction Manager) hang and prevent the system from shutting down properly. On a VMSNET newsgroup, somebody was asking how to kill a kill-proof process. It's been a constant battle between the "I must not be killed" and the "I must be able to kill" factions. Of course neither is inherently right or wrong. But it has been interesting to watch things swing back and forth, with more and more layers of kill and no-kill. And of course the PC makers have re-discovered the whole issue from scratch rather than pay attention to experience.

Mainframes are dead. Small systems will take over. I've heard this since the PDP-11 was introduced, and I still don't believe it. Certainly the small desk-top system has made a large impact, especially if dropped from a height. But despite 15+ years of being told the dinosaurs of computing are outmoded, they continue to live. Just today (March 7 1995) I read that in 1994, IBM's sales of MVS systems grew by 40%. Pretty active for a corpse. As I said, small systems have their place. That place is growing, and of course today's small system has more storage and compute power than the mainframes of a few years ago. But there are tasks that simply cannot be done on a household appliance, such as payroll for a University or registration/records. At least not yet. My latest chuckle is comparing tentative plans for a PC file server for students. A multi-processor system with gobs of memory, very fast tape drive for backups, and many gigabytes of disk. Much better than the klunky old mainframe, with its dual processors, gobs of memory, IBM 3480-style tape drives, and 27 gigabytes of disk, right? Even when management tries to make their own predictions come true, "mainframe" use continues to climb.

Yet another common mistake, or maybe mistaken attitude, is the "normal error." An application or system yields an error message, but the vendor or provider says this is normal. Baloney. Aside from being a nuisance, the Normal Error tends to produce the "wolf" effect, desensitizing the viewer from real errors. It's unprofessional and tacky. My favorite example was our Digital 7620 Alpha system. If you enter TEST at the console prompt it would run a bunch of self-diagnostics, then exit. But upon exit the console software will either function correctly, hang, or explode with a "catastrophic failure" (their phrase, not mine). Because the diagnostics themselves completed normally, this error was of little concern to Digital. In other words, a "normal catastrophic failure." Sheesh.

Hey, I'm starting to sound like Andy Rooney.

A lack of standard terminal behavior also plagued the industry, though with Windows taking over, this is less important. Example: some terminals wrap long lines, some truncate long lines, many can be set one way or another. Some programs expect a truncating terminal, some expect a wrapping terminal. Some even set the terminal the way they expect, then leave it set that way when they exit. Why is this important? Example: In VMS mail, users here frequently typed messages without ever hitting the Return key because the terminal/emulator they were using happened to wrap long lines, so the user thinks WYSIWYG and keeps typing. But in fact VMS considers it a single long line because the wrapping is done by the terminal, not VMS or Mail, and when the user hits character 512 the line gets truncated with an error message. This confusion could be prevented by not wrapping.

Vendors often give you ways of setting something like a process attribute, but no way of examining it. I remember in NOS it was trivial to set a User Job Name (UJN) but darn near impossible to determine its current setting. There are several things in VMS that you can SET but not SHOW.

Before we started getting Digital-compatible terminals, my experience in this particular focus was with terminals that either wrapped or didn't. Then when we got real DEC-compatible ones, I was further annoyed to find that VT-terminals would position the cursor at column 80 after you typed the 79th character, and would then *leave it there* on top of the 80th after you typed *that* (so how do you distinguish a 79-character line from an 80-character line that ends with a blank?). This is an interesting feature but complicated the issue of supporting multiple terminal types on NOS.

For many years, we simply *had* to standardize on terminals that were compatible with the original Lear-Siegler ADM-1, because the AMS systems had been programmed to operate with those terminals. This excluded the Digital VT series, much to the chagrin of almost everybody else. When WHECN was formed, they also adopted the AMS systems as a standard, and WHECN repeatedly forced the issue of a compatible terminal type for both them and UW, even after UW shut down the last AMS. An excellent example of the tail wagging the dog. Only when we acquired the VAXcluster, and PCs became plentiful with Kermit's VT-100 emulator, and UW/WHECN divorced each other, could we standardize on a standard terminal.

In the area of serious computing, UW has seldom been a contender (depending on whether you consider a Cyber 760 a serious number cruncher, which it was just barely in 1979 when we got it). Various hand-waving has been done by directors, generally claiming that time on supercomputers was freely available at other sites, which is/was not the case. One well-placed source claims that Cray Research offered UW one of their systems, but it was rejected because of "too many strings attached." Another sign of this is UW's Institute for Scientific Computing. I contend the mere existence of this group shows that our Division isn't serving a broad enough segment of UW's population, but others have disagreed. As desktop computing power has increased, this has almost become a non-issue. We seem infatuated with networking, e-mail, and word processing, but there are still those who use computers for computing.

A second example of not meeting campus needs, at least in a specialized area, is the attempted startup of a separate organization called GRIP. This was the brainchild of the VP for research, who wanted a service organization dedicated to computer graphics. But this group was supposed to be "self sufficient" though I never got a precise definition of what this meant. The implication, as above, is that the central computing division wasn't doing its job in terms of serving the campus. Unfortunately the VP in question "quit" (i.e. got canned) early in the project, which then quickly turned into a graphics lab for a few researchers rather than a campus-wide service center.

I sometimes wonder how management views me, especially my zeal for detail (for which I was once criticized on an annual evaluation). Recently I discovered a retiree's account, where the retiree had died. This brought up the question of how to dispose of the user's files, if for example a department or relative requested access, so it was passed on to management for a proper pronouncement. I could almost hear them saying "Only Kirkpatrick would worry about dead users."

Management, while meaning well, has occasionally gotten technical jargon fouled up. This at least provides us with some humor. One person is always worried about "increasing response time" instead of decreasing (improving) it. Recently they got excited about a computerized University information kiosk, with its "touch tone screen." One manager tried to figure out how big a program could be, by dividing the page file size by the number of authorized users. Of course we often hear about "Local Area LANs."

We've repeatedly terminated specialized services because they were only used by a small number of users. While I can understand the strength of this argument, especially if dictated by economics, it worries me. I once told a director that we could save a *lot* of money by not offering any services at all, if money is the issue. But a University needs to cater to more than just the demands of the majority. For example, back in 1970 there were a few hundred people using our computers to do computing; in 1995 there are over ten thousand using them for e-mail and composing useless memos, and a few hundred people still using computers to do computing; should we therefore tell the latter to get lost because they are in the minority? The last thing we need to do is strive for mediocrity.

Time: Nobody seems to understand time, and the distinctions between local time, Coordinated Universal Time (UTC), and International Atomic Time (TAI). People get dumbfounded when you mention that UTC can have leap seconds added or subtracted up to twice a year; I understand the POSIX official stance is that this simply doesn't happen (no doubt because they could not agree on exactly how a POSIX-compliant system should handle leap seconds). Leap seconds are such a mysterious pain that some have argued they shouldn't happen at all, or they should only happen once each decade (if we can't handle it twice a year and get it right, what makes them think it will be handled correctly once a decade?). At issue are applications and operating systems where the human coders have not considered time correctly, who write things like "make xyz happen at 23:59:59" without realizing that 23:59:59 might not actually happen because a negative leap second could cause UTC to jump from 23:59:58 to 00:00:00, and other similar mistakes. If you want something to happen one second before date change, you should be able to code it that way without assuming that 23:59:59 is that time. And don't forget that with positive leap seconds, the time could progress from 23:59:59 to 23:59:60 to 00:00:00. Then there are the darn legislators who in 2005 changed when daylight saving time starts and ends, effective in 2007. How many automated systems will malfunction because the old timezone rules were hard-coded?
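Here is one way to say "one second before the date change" without hard-coding a clock time. It is a sketch under my own assumptions: the function names are invented, the caller is assumed to know whether a leap second applies at the end of the UTC day in question (+1, 0, or -1), and the result is returned as a string because Python's datetime type cannot represent a second numbered 60.

```python
def format_second_of_day(index: int, day_length: int) -> str:
    """Format a 0-based second-of-day index as HH:MM:SS (UTC)."""
    if not 0 <= index < day_length:
        raise ValueError("second index is outside this day")
    if index == 86400:
        return "23:59:60"  # the inserted (positive) leap second
    hours, rem = divmod(index, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def one_second_before_date_change(leap: int = 0) -> str:
    """Label of the final second of a UTC day.

    leap is the leap second applied at the end of the day: +1, 0, or -1.
    """
    if leap not in (-1, 0, 1):
        raise ValueError("leap must be -1, 0, or +1")
    day_length = 86400 + leap
    return format_second_of_day(day_length - 1, day_length)


normal = one_second_before_date_change(0)     # ordinary day
positive = one_second_before_date_change(1)   # day with an inserted second
negative = one_second_before_date_change(-1)  # day with a deleted second
```

The point is that the request is expressed relative to the length of the day, so the answer comes out right whether the day has 86399, 86400, or 86401 seconds.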

Spam: Everybody has sure-fire ideas on how to stop spam, but most of them simply don't make a lot of sense. For example, the greet-pause feature of sendmail will reject connections from senders/servers if they attempt to send email before the receiving system sends an intentionally-delayed greeting. The assumption is that spammers use impatient software, and that non-spammers use polite RFC-compliant software. Of course this is not true, but out of the thousands of connections we reject, there is no way to know how many would have been spam and how many would not. The same is true for IP-address blacklisting; UW is not a major spammer yet we seem to land on black lists from time to time, and in spite of being falsely targeted, we gleefully turned on our own feature to reject email from black-listed sites. There was some sort of idea that involved having everybody put in a special DNS entry so that others would know you're OK, but this never seems to have gotten very far because, of course, not everybody bought into the idea and spammers can run their own DNS servers. And so on.

Disappearing error messages: Idiots who write error messages on the screen then clear the screen before you can read them. For this reason I'm generally opposed to clearing a screen for any reason. Of course here I'm talking about terminal emulators for the most part but the same thing has been observed for example in Solaris installation windows, where I once had the install complete and clear the screen in spite of unrecoverable CD-ROM read errors, removing the evidence that anything went wrong let alone what it was that went wrong. The same thing can happen on Sun systems when OBP detects a problem then clears the screen when the windowing software starts; if this is before syslogd starts, you may not even know that anything is amiss.

Metric: When the metric system was invented, all sorts of new screw sizes were invented. Not only did these differ from the old "English" sizes, they even decided to invert pitch, going from threads-per-unit-length to length-per-thread. Duh. But then they come up with standard sizes that are *almost* identical to the older sizes, so that you can't even decide at times whether you're holding a metric screw or a standard screw. So what does this have to do with computing? A lot if you just mangled a metric threaded hole in a rack because you put a 1/4-inch screw into it. Or if you have one of those old stupid Sun SBUS cover plates with screws that might be 2-56 or they might be 2.5mm. Of course the bloody French expected the planet would go metric just because they are the French, and most of the planet did except the dumb old US. Sigh. Time for me to take a stress pill.

"The fact that [the sample random number generators] were given in Pascal should not be taken as an indication that they are intended for toy applications. A Fortran adaptation is given here." -- F. James, Computer Physics Communications, Volume 60, page 337.

– End