About 12-15 years ago, some friends and I were discussing Moore’s Law, but from a slightly different angle. While we’ve enjoyed exponential increases in computer memory and storage space over the past couple of decades, we were nevertheless impressed by the programmers of the early personal computers. They were able to write useful programs and very enjoyable games that were less than 64K in size. I don’t think that any of today’s programmers would have the talent or resourcefulness to do something like that now – packing that much functionality into such a small space requires not only proficiency in a low-level programming language, but also an intimate knowledge of the computer hardware itself (along with its limitations and idiosyncrasies).
One of us then took the comparison a step further and said, “What about the Apollo engineers during the 1960s? They had even less memory, and their code had to send men to the moon and back!” Another friend added, “Did you know that 90% of the computer code used during the Apollo missions was redundant? Only 10% of the code was needed to run the computer – the rest was used for error checking and to ensure that the computers never crashed.”
I can usually identify an urban legend or a hoax fairly quickly, but this one – despite the lack of references or source material – actually sounded plausible. The thought of a computer miscalculation, a crash, or the Apollo equivalent of the dreaded Microsoft Windows BSoD (Blue Screen of Death) would have been simply terrifying! It seemed reasonable to me that the Apollo engineers would add as much extra error-trapping code as necessary to ensure that the onboard computers never crashed.
So I filed that story away in the back of my mind as something that would likely remain one of life’s great mysteries.
Fast forward to July 2017. I was attending the American Mensa Annual Gathering, and deciding which lecture to see next. There are typically 6-7 simultaneous lecture streams, and naturally, I think they’re all interesting; it’s exceedingly difficult to settle on just one. For the 10:30 a.m. slot, I finally decided to go with the one billed as “A Behind-The-Scenes Look at the Apollo Moon Landing”. The lecturer was Martha Lemasters, who was a member of IBM’s Launch Support Team as a PR writer during the Apollo missions (IBM was a NASA contractor). After the end of the Apollo program, she worked on the Skylab and Soyuz programs.
Lemasters had also written a book about her time at IBM, called The Step: One Woman’s Journey to Finding Her Own Happiness and Success During the Apollo Space Program. Her engaging, 75-minute presentation included numerous facts and trivia about NASA and the Apollo missions, stories about her job and the working conditions, excerpts from her book, and a slide presentation filled with photos that I had never seen before. The room full of Mensa members enjoyed themselves thoroughly. Lemasters is a natural storyteller, and she effortlessly took the audience with her on a journey back in time, to a challenging, fast-paced working environment, but also one that may seem insufferably chauvinistic by today’s standards. For example, women were not allowed to wear dresses on the launch platform because it would be too much of a distraction for their male coworkers. Of course, that’s not quite how NASA phrased it – they said that dresses were a “safety hazard” because a distracted male working on an elevated platform might drop a wrench and injure someone working below.
Personally, I found this directive puzzling: IBM employs only intelligent, educated, ambitious, disciplined and professional people – the best of the best. Surely these men wouldn’t be reduced to salivating teenagers at the sight of a woman in a dress.
Lemasters finished her presentation with a Q&A session, which was a pleasant surprise and a wonderful opportunity – a chance to speak with someone who actually worked on the Apollo missions and who was embedded with their engineers. As she pointed out during her lecture, “There aren’t too many Apollo veterans left.” I raised my hand, recited my friend’s claim about the redundant computer code, and asked her if it was actually true.
Unfortunately, she didn’t know the answer. Most presenters, faced with a similar question, would simply say that they don’t know and move on. She, however, did something that really impressed me. She explained that she didn’t know the answer herself, since she hadn’t worked directly with the computer systems, but that she still kept in touch with many of the engineers on the Apollo project. If I wrote down my question and gave her my e-mail address, she would forward it to them.
Well, this was much more than I could have hoped for! I never thought that the redundant code story would ever be verified, and now my question was about to be forwarded right to the source – engineers and programmers who actually worked on Apollo 11 (the first moon landing)!
A few days later, I received e-mail messages from Martha Lemasters and two Apollo program veterans, James Handley and Kenneth Clark (both of whom Lemasters described as “geniuses”). They not only answered my question, but were kind enough to send several e-mail messages over the next few days, containing an incredible amount of detail. I was impressed with the amount of information they provided, and also astounded that they were able to recall these technical details so vividly after almost half a century.
James Handley was in charge of the design and programming effort for the SLCC (Saturn Ground Computer Launch Checkout System) in Huntsville, Alabama, and then transferred to the Kennedy Space Center in Florida to oversee the installation and maintenance of the software. Using one of the first IBM 360 mainframe computers, Handley and his team developed the SIRS (Saturn Information Management System), a workload management system. He also headed the NASA Flight Crew Training Directorate contract. Handley eventually managed a staff of 90, and was responsible for all Saturn programming efforts, the facility computer, and all new business activities. Later in his career, Handley worked on the design, development and installation of the Space Shuttle Ground Checkout System.
Kenneth Clark summarized his role in the Apollo / Saturn project as follows: “I was a programmer and launch team member for IBM’s part of the project at the KSC (Kennedy Space Center). My earliest job was writing programs to check out the Saturn IB & V launch vehicles. I later became a member of the launch team and the ‘go to’ guy for anything bad that happened to the software in the Ground Launch Computers (RCA 110As). Later I was the leader of the design / development team for the Space Shuttle Launch Processing System.”
NASA Code Redundancy – The Real Story
Here is their response, pieced together from our e-mail conversations:
The Launch Vehicle Digital Computer (LVDC), made by IBM in Owego, NY, was called a Triple Modular Redundant (TMR) computer. That meant that the guidance equations (or code) were simultaneously solved by three different circuits, and the results were then compared and voted on, so if there was a single-point failure in the computer, two answers would agree and the third would be discarded. This was done to achieve the close to 100% reliability desired. So the computer was like three computers plus circuits to compare their answers. On the issue of code redundancy, I think there was only one set of code in the computer, and the TMR logic all operated on that set of code. Therefore the code itself was not replicated. I think there were also some checks and balances in the code, but I don’t think the 10% vs. 90% claim is true.
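To make the voting idea concrete, here is a minimal sketch of a 2-out-of-3 majority voter, written in Python purely for illustration. It is not the LVDC’s actual circuitry; it simply assumes each channel produces an integer word and votes bit by bit, the way a classic TMR voter does. The function names are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    # Each output bit is 1 only if at least two of the three channels say it is 1,
    # so a single faulty channel is always outvoted by the other two.
    return (a & b) | (a & c) | (b & c)


def disagreeing_channels(a: int, b: int, c: int) -> list[int]:
    # Report which channels (0, 1, 2) differ from the voted result,
    # i.e. the answers that would be "discarded".
    voted = majority_vote(a, b, c)
    return [i for i, word in enumerate((a, b, c)) if word != voted]


# Channel 2 suffers a single-bit fault; the vote still yields the correct word.
good = 0b1011
faulty = 0b1111
print(bin(majority_vote(good, good, faulty)))   # 0b1011
print(disagreeing_channels(good, good, faulty))  # [2]
```

Note that only one program is being run here; the redundancy lives in the three channels and the voter, which matches the engineers’ point that the code itself was not replicated.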
The term “code redundant” implies that there is code that recomputes a value for which the answer is already known, in order to verify correctness. There were two Apollo Guidance Computers in the spacecraft: one in the Command Module and one in the Lunar Module. I doubt there was any of that in the flight computers, and I know for a fact there was none in the ground computers. The Launch Vehicle Digital Computer used Triple Modular Redundancy (TMR) logic, but I don’t believe the code was replicated. The Saturn Ground Launch Computers were not TMR. However, the Mobile Launcher Computer did contain a redundant set of code, which was switched to if the primary memory encountered a parity error, or if there was a no-instruction alarm during execution.
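The Mobile Launcher Computer’s fallback can be pictured with a small simulation. This is only a hedged sketch: it assumes even parity with one parity bit per word and a single switch-over from the primary copy to the backup copy, and the class and method names are hypothetical rather than anything from the real system.

```python
def parity_bit(word: int) -> int:
    # Even parity (an assumption): the stored bit makes the total count of 1s even.
    return bin(word).count("1") % 2


class DualCodeMemory:
    """Hypothetical model: a primary and a backup copy of the same program."""

    def __init__(self, program: list[int]) -> None:
        # Store each word together with the parity bit computed when it was written.
        self.primary = [(w, parity_bit(w)) for w in program]
        self.backup = list(self.primary)
        self.using_backup = False

    def fetch(self, address: int) -> int:
        bank = self.backup if self.using_backup else self.primary
        word, stored_parity = bank[address]
        if parity_bit(word) != stored_parity and not self.using_backup:
            # Parity error in primary memory: switch to the redundant copy of the code.
            self.using_backup = True
            word, stored_parity = self.backup[address]
        if parity_bit(word) != stored_parity:
            raise MemoryError(f"parity error at address {address} in both copies")
        return word


memory = DualCodeMemory([0o1234, 0o5670, 0o7777])
word, p = memory.primary[1]
memory.primary[1] = (word ^ 0b100, p)  # simulate a single flipped bit in core
print(oct(memory.fetch(1)), memory.using_backup)  # 0o5670 True
```

A single parity bit detects any one-bit error but cannot correct it, which is why a second copy of the code is needed to recover, and why that copy costs a full second set of memory.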
On the subject of error checking, not even close to 90% of the code would have been allocated to that task. The amount of memory in any of the computers made it absolutely impossible for much, if any, of the code to be used for error checking. During the Apollo era, memory was big, bulky and, most of all, heavy. They just couldn’t afford to launch much of it, and having redundant code would have required redundant memory. The error checking that did exist was to determine whether an operation requested or commanded by a program had completed successfully. There were some checks, even in the Lunar Module, to report unexpected errors. An example of this was the program alarms (error codes 1201 and 1202) that occurred minutes into the Apollo 11 lunar landing sequence.
The memory used in the computers was mostly magnetic core. Here are some examples of the memory sizes used in the computers:
- Saturn Ground Launch Computers (RCA 110A) – 32K 24-bit words + 1 parity bit
- Instrument Unit Launch Vehicle Digital Computer – 32K 28-bit words, including 2 parity bits
- Apollo Guidance Computers – 2K (2,048) words of erasable magnetic core memory and 36K 16-bit words of read-only core rope memory
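To compare the first two figures with the 64K home computers mentioned at the start, a quick calculation converts them to bytes of usable data, leaving out the parity bits. This sketch assumes that “32K” means 32 × 1,024 words; the helper name is made up for this example.

```python
WORDS_PER_K = 1024  # assumption: "32K" means 32 x 1,024 words


def data_bytes(k_words: int, data_bits_per_word: int) -> int:
    # Total data bits (parity excluded) divided into 8-bit bytes.
    return k_words * WORDS_PER_K * data_bits_per_word // 8


machines = {
    "RCA 110A (24 data bits + 1 parity bit)": (32, 24),
    "LVDC (28-bit words, 2 of them parity)": (32, 26),
}

for name, (k_words, data_bits) in machines.items():
    n = data_bytes(k_words, data_bits)
    print(f"{name}: {n} bytes = {n // 1024} KB")
# RCA 110A (24 data bits + 1 parity bit): 98304 bytes = 96 KB
# LVDC (28-bit words, 2 of them parity): 106496 bytes = 104 KB
```

In other words, each of these machines held roughly 100 KB of data, in the same league as a 64K personal computer. There was very little room to spare for the duplicated code that the legend describes.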
The Space Shuttle Program carried redundancy to the ultimate level. The computers on the Space Shuttle were AP-101s manufactured by IBM in Owego. They were called the Space Shuttle General Purpose Computers, or GPCs for short. There were five GPCs on board the Space Shuttle. During launch, four of the GPCs were executing 100% redundant code programmed by IBM Houston. Each output from this “Redundant Set” was voted on by hardware logic; if one of the computers came up with a different answer, it was voted out by the hardware. The fifth computer was running software programmed by MIT Labs. This backup flight computer could take over if the “Redundant Set” experienced multiple failures, or if some other failure took out the “Redundant Set”.
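The four-computer “Redundant Set” can be sketched the same way as the TMR voter, generalized to majority voting over N outputs. This is an illustrative model, not the Shuttle’s actual voting hardware. It assumes each GPC produces a single integer output per cycle and that a tie with no strict majority means control passes to the backup flight computer. The GPC labels and the function name are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def vote_redundant_set(outputs: dict[str, int]) -> tuple[int, list[str]]:
    # Find the most common answer among the computers in the set.
    value, count = Counter(outputs.values()).most_common(1)[0]
    if count <= len(outputs) // 2:
        # No strict majority (e.g. a 2-2 split): the set can no longer be trusted.
        raise RuntimeError("no majority in Redundant Set; hand over to backup computer")
    # Any computer that disagrees with the majority is "voted out".
    voted_out = [name for name, v in outputs.items() if v != value]
    return value, voted_out


print(vote_redundant_set({"GPC1": 42, "GPC2": 42, "GPC3": 41, "GPC4": 42}))
# (42, ['GPC3'])
```

With four voters, one faulty computer is outvoted 3-1, and the remaining three can still outvote a second failure. Only when the set splits evenly, or some other failure takes it out, does the fifth computer with its independently written software need to step in.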
There you have it, right from the source: an urban legend debunked with a mixture of curiosity, serendipity and the graciousness of some people who actually worked on NASA’s Apollo program. Thank you so much, Martha Lemasters, Kenneth Clark and James Handley!