Insights from 46 Classic Papers — ‘Ideas that Created the Future’

Nilendu Misra
24 min read · Nov 16, 2023

You read so many books in a decade, and then you read decades in a single book. Prof. Lewis’s compendium of 46 “classic” computer science papers is one such.

It took a week-long grind to get through such dense matter, but — just like after a steep climb — the feeling was exhilarating. Spanning Aristotle and Leibniz to Shannon, Bush, and Wiener to Moore, Dijkstra, Knuth, and RSA — this is like a grand tour of Computer Science's Louvre, accompanied by Lewis's insightful expert commentary. The 8 different sections of the museum are — Logic & Mathematics; Calculating — from numbers to logical expressions; the "Grand Theory" of computing — bits and information; Stored Programs & Data; Higher-level language development; Human Factors — adding process and efficiency as an engineering discipline; Algorithms and their classes; the Network as a discipline.

Each paper starts with a one-page commentary that often, delightfully, brings the individual and the times back to life. For example, retired Admiral Grace Hopper — in full Navy regalia — was mistaken for a cabin attendant on her way to a felicitation in D.C. Pitts never really went to school, contributed massively to what is now the predominant model of AI, and had a tragic ending. Dijkstra was very opinionated and unrelenting in his candor — good and economical with words — a deadly combination for supposed "average performers" working with him :). Turing wrote a letter to his mother, surprised that his paper was not making an impact — it eventually did, creating a whole industry! Leibniz was the primordial techno-utopian. And so many more.

Perhaps one of the best Computer Science books I have ever read. Definitely THE best if I consider "information density" alone. I cannot recommend this highly enough. Please take at least a week and savor the ideas that STILL keep building the future. More importantly, be ready to be awed by their clarity of thought and deep, piercing insights that often peered hundreds of years into the future.

These papers are, quite literally, the “Federalist Papers” of Computer Science. The authors are the Founding Fathers. (And, “fathers” is a mere placeholder for all genders here — like Richard Dawkins once said — something like — “as ‘table’ is used as him, her or it in different languages — that does not take away from its table-ism”)

I read each of the 46 papers primarily with a singular question — "what was the key insight?" As, I believe, Schopenhauer once said, "Talent hits a target no one else can hit; genius hits a target no one else can see." Another way to look at it is "The future is already here — it's just not evenly distributed." These geniuses could not only spot THE future with their insight, but also pushed Computer Science very hard to accelerate that dispersion. Inspiring, enlightening and timeless — each of the 46 papers.

Key insights -

Aristotle — Prior Analytics

We lose our sense of wonder with time. As a remedy, someone once said, "just think of modernity without the toilet." Almost 2,500 years ago, with no precedent, Aristotle figured out "two-valued logic" — true and false. With this binary categorization, he boldly posited, in essence, that we can therefore "reason mechanically", i.e., compute.

His clever use of the disjoint counter-example is still used as a method to refute conjectures, e.g., "Man is an animal", "Horses are animals", but "No horse is a man".

Leibniz — The True Method

If Aristotle discovered “true” and “false”, Leibniz found out how to do “math” with them. He also worked on a computer that could do BOTH addition and multiplication — therefore uncovering the potential to add efficiency to a process AND mechanize/automate it — the Holy Grail of computing.

As an aside, he was the first to use systematic notation, e.g., in calculus — the power of standardized, succinct symbols to fit larger concepts — the very essence of language — into easily understandable expressions. Good notations lead to good language. Good language leads to good thoughts. That eventually led to "logical reductionism" and paved the way for the future.

Ada Lovelace — Sketch of the Analytical Engine

Aristotle created binary logic. Leibniz progressed to expressing it as mathematical equations. Lady Lovelace essentially created a "compiler" for this algebraic expression so it could be mechanized. Think of the very first "Computer Architecture". Almost 200 years ago, without any precedent, she laid out the essential components of a programming language — nested loops, loop flow controlled by an index integer, step counting, exception handling, the dichotomy of data and code.

Perhaps most importantly, she voiced what we could call "A Programmer's Curse" — a theme that we would see time and again — she lamented the underestimation of a program's difficulty!

George Boole — An Investigation of the Laws of Thought

The time came for a "logic piano" — the calculation of binary propositions. Boole's singular insight was that "language is an instrument for human reason and not merely a medium for the expression of thought." With this first principle, logical algebra was now an indubitable part of mathematics — and one could apply the laws of mathematics (commutativity, distributivity, union, etc.) to logic. The formal bedrock of computing was now all set. It would take a few decades before these laws could be "circuitized".
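Boole's point is easy to verify mechanically today. A minimal sketch in Python (my own illustration), checking one distributive law across every truth assignment:

```python
# Verify one of Boole's algebraic laws of thought mechanically:
# x AND (y OR z) == (x AND y) OR (x AND z), for all truth values.
from itertools import product

print(all((x and (y or z)) == ((x and y) or (x and z))
          for x, y, z in product([True, False], repeat=3)))  # True
```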

Hilbert — Mathematical Problems

“We must know. We shall know.” — the very essence of everything.

Perhaps the best opening line of ANY paper ever — if you have ever read anything more inspiring, please share —

Who of us would not be glad to lift the veil behind which the future lies hidden; to cast a glance at the next advances of our science and at the secrets of its development during future centuries?

Turing — On Computable Numbers

Hilbert's "know the unknowable" inspired Turing. The rest is history. His machine was essentially built on a single, simple principle — "write a symbol in a box and move to an adjacent box" — to process and compute the 0s and 1s. This fundamental reductionism has continued to be the "First Principle" of computing innovation, e.g., "taxicab through computer == Uber".
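The principle is small enough to fit in a few lines. A toy sketch (my own encoding, not Turing's notation) of a machine that reads a symbol, rewrites it, and moves one box at a time, here inverting a binary string:

```python
# A minimal Turing machine sketch: one cell read/written per step, one move.
def run_turing_machine(tape, rules, state="start", pos=0, max_steps=100):
    """rules: (state, symbol) -> (new_state, new_symbol, move)."""
    tape = dict(enumerate(tape))          # sparse, unbounded tape
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(pos, "_")       # "_" marks a blank cell
        state, tape[pos], move = rules[(state, symbol)]
        pos += 1 if move == "R" else -1
    return "".join(tape[i] for i in sorted(tape))

# Example rules: invert a binary string, one cell at a time, then halt.
rules = {
    ("start", "0"): ("start", "1", "R"),
    ("start", "1"): ("start", "0", "R"),
    ("start", "_"): ("halt", "_", "R"),
}
print(run_turing_machine("1011", rules))  # -> "0100_"
```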

His extremely unfortunate and untimely death was poetically put — “Giant of logic lost to irrationalism”.

Aiken — A Proposed Automatic Calculating Machine

The Harvard architecture — i.e., data and program in separate kinds of storage — has been the bedrock of embedded computing, where programs are "frozen in firmware that cannot accidentally be altered as the machine is running".

Shannon — A Symbolic Analysis of Relay and Switching Circuits

Shannon engineered Boole’s Binary Algebra (“Laws of thought”) into electric circuits. He connected the two binary valued systems — electric (switch on/off) and logical (true/false) — philosophy met and fused with physics to build a new world.

McCulloch and Pitts — A Logical Calculus of the Ideas Immanent in Nervous Activity

Just as Shannon found the analogy between electric circuits and binary logic, McCulloch and Pitts modeled the brain and all its functions as a digital system — a deterministic finite state machine that can be explained by propositional logic. They enthusiastically declared "We know how we know", and — delighted by their model — proclaimed a future untethered: "for prognosis, history is never necessary."

To reverse engineer physiology from a bold new concept has always been a grand theme of science — from bloodletting to neural networks.

von Neumann — First Draft of a Report on the EDVAC

Programs could be processed and loaded into memory like any other kind of data. The EDVAC's program was stored in fast memory, rather than on cards or tape as in Aiken's Mark I, so successive instructions could be accessed at electronic speed.

Vannevar Bush — As We May Think (1945)

While Shannon focused on circuit design, Bush focused on information storage and retrieval. In this precursor to what would become "the internet", his vision for storage was global — a "planetary memory for all mankind" — and retrieval of that data was contextual as opposed to semantic — "selection by association rather than by indexing". Bush's insight was that searching such a global database would not stop after finding one record — as it does when searching a library catalog — but would work analogously to the human mind, by association. Third, he emphasized the speed of such retrieval — like our memory does when it instantly connects a favorite childhood song with the smell of Grandma-baked cookies, though we may have forgotten BOTH independently — a "speed of action…[that] is awe-inspiring beyond all else in nature".

Just as Aristotle's binary classification became the cinder block of Computer Science's back end, Bush's "store all, retrieve all in the fastest possible manner even if the trails are weak or transitory" became the core of every global interface. PageRank is one such implementation of data retrieval for universal search.

Lastly, looking back, one can only wonder at this constructive and egalitarian vision of future right after the desolation of World War II.

Shannon — A Mathematical Theory of Communication (1948)

A repeating theme of Computer Science, and a key reason behind its rapidly accelerating adoption, is its ability to compress ideas into short names. Naming has remained one of the three key challenges in CS for a reason. "Bit" — binary digit — is one such. Shannon, who earlier designed the primordial circuit, now defined the bit as the unit of memory — the ability to retain the value of one binary digit. This short and succinct name itself mirrored what Shannon focused on in the paper — how to encode long messages in the most compact way that minimizes transmission errors and — in most cases — can also rectify them.
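The paper's central quantity, the average number of bits a message truly needs, can be computed in a few lines. A minimal sketch of Shannon entropy, with my own toy strings:

```python
import math
from collections import Counter

def entropy_bits(message: str) -> float:
    """Shannon entropy: the average number of bits needed per symbol."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy_bits("aaaa"))      # 0.0  -> perfectly predictable, compresses away
print(entropy_bits("abca" * 2))  # 1.5  -> needs about 1.5 bits per symbol
```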

Shannon juggled, rode a unicycle and sometimes did both together. Tragically, the father of digital memory died of Alzheimer’s.

Hamming — Error Detecting and Error Correcting Codes (1950)

Hamming, who once said — "In science if you know what you are doing you should not be doing it. In engineering if you do not know what you are doing you should not be doing it" — employed an engineer's mindset and accepted errors in communication channels as inevitable. Moreover, he applied a secondary insight — "If a computer can indicate a problem, it can also figure out where it was and fix it."

He formulated the notion of ‘error correcting code’ by adding (fairly minimal) storage overhead to the message for ‘parity check’. This was perhaps also the very first example of software reliability engineering.

This — accepting imperfection as inevitable and using the system itself as a lever to fix the flaw — is the bedrock of real-life engineering practices.
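The classic Hamming(7,4) scheme makes this concrete: three parity bits guard four data bits, and the failed parity checks spell out the exact position of a flipped bit. A minimal sketch (my own toy implementation):

```python
def hamming74_encode(d):
    """Encode 4 data bits as 7 bits: positions p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4     # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4     # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4     # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(code):
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 * 1 + s2 * 2 + s3 * 4   # syndrome points at the flipped bit
    if error_pos:
        c[error_pos - 1] ^= 1              # the code locates AND fixes the error
    return c

word = hamming74_encode([1, 0, 1, 1])
word[2] ^= 1                               # corrupt one bit "in transit"
print(hamming74_correct(word) == hamming74_encode([1, 0, 1, 1]))  # True
```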

Turing — Computing Machinery and Intelligence (1950)

While the "Turing Test" remains the popular takeaway from this paper, it is a masterful example of reframing a fuzzy question so the answer can be subject to easy validation. Turing substituted "can machines think?" with "can a machine replace the man (or woman) in a game where an invisible interrogator is trying to find out, by questioning alone, if s/he is a man or a woman?" Notice the two levels of indirection — first, the imitation game — which can be defined and inferred independently of the original problem; second, the replacement attribute — can a machine play this game successfully?

This — applying a layer of indirection to the core problem to make it analytically tractable — has remained a Universal Principle of computing. The indirection is even inevitable, as we essentially flatten a real-life problem into the realm of bits.

In this paper Turing also deftly insinuates the orthogonality between humans and machines — “We do not wish to penalize the machine for its inability to shine in beauty competitions, nor to penalize a man for losing in a race against an aeroplane”.

Wilkes — The Best Way to Design an Automatic Calculating Machine (1951)

Wilkes' paper is a good example of a "reliability first" approach to architecture.

“Reliability of the machine will depend on the following:
(a) The amount of equipment it contains.
(b) Its complexity.
(c) The degree of repetition of units.”

Stating the principle, he goes on to concretely define complexity —

the extent to which cross-connections between the various units obscure their logical inter-relation.

Therefore, a machine is easier to repair if it consists of a number of units connected together in a simple way without cross-connections between them.

His elegant solution was to “design a more primitive micro-instruction set for such micro-operations as shifting bits within a register OR moving bits between registers, and then to implement the actual machine instructions as micro-programs of those micro-instructions.”

This set of four principles — (i) reducing complexity by optimizing the number of components, (ii) minimizing and standardizing their interactions, (iii) designing lower-level operations as atomic and composable, and (iv) composing progressively higher-level workflows out of those lower-level operations — is canonical for all successful large-scale system architecture in technology.
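A toy sketch of that layering (the register names and "instructions" are mine, purely for illustration): the visible instructions are just micro-programs over two primitive micro-operations:

```python
# Micro-programming in miniature: machine instructions are sequences of micro-ops.
regs = {"A": 0, "B": 0, "TMP": 0}

def move(src, dst):            # micro-op: copy bits between registers
    regs[dst] = regs[src]

def shift_left(reg):           # micro-op: shift bits within a register
    regs[reg] <<= 1

MICROCODE = {
    "DOUBLE_A": [("shift_left", "A")],
    "SWAP_A_B": [("move", "A", "TMP"), ("move", "B", "A"), ("move", "TMP", "B")],
}

def execute(instruction):
    for micro_op, *args in MICROCODE[instruction]:
        {"move": move, "shift_left": shift_left}[micro_op](*args)

regs["A"], regs["B"] = 3, 7
execute("SWAP_A_B"); execute("DOUBLE_A")
print(regs["A"], regs["B"])    # 14 3
```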

Hopper — The Education of a Computer (1952)

Trivia: Hopper taped a moth to a log book with a note — "First actual case of a bug being found" — which indicates she did not coin the word. It ALSO indicates software engineers' tendency to continuously play with words and metaphors. If in doubt, just check Jeff Dean's tweets!

Grace Hopper's core insight was that computer usage in business applications would far exceed that of numerical analysis in science — the de facto workstream till then. Enabling this business-computing-at-scale required compilers, code-optimization tradeoffs, global program analysis, hierarchical program structure, macro-languages, etc. One needed to invest in the bottom of the stack so that engineers who had not necessarily worked on the likes of the Manhattan Project or the Moon Landing could safely write code. To build and maintain this ease-of-use and manageability framework, the cost of software in the long run would vastly exceed the cost of hardware. But the gains in organizational efficiency would far exceed the loss in execution speed of high-level languages. One traded up to safe, easy-to-use higher-level languages, betting that the multiplier effect of democratizing computer programming would far offset the cost of slower execution. She was so far ahead of her time that practically every innovation in commercial software architecture since has primarily aimed at increasing the efficiency of producing code. Think of frameworks, integration tests in pipelines, the Apache Common Log Format — we have been making the same bets Admiral Hopper won.

Rosenblatt — A Probabilistic Model for Information Storage and Organization (1958)

Adam Smith is to capitalism as Frank Rosenblatt is to AI. Till he challenged the paradigm and shifted it, curve fitting was the go-to way to forecast from data. Like Archimedes' vow to move the earth with the right lever alone, von Neumann's "with four parameters I can fit an elephant" was the predominant dogma. Rosenblatt broke away and let the path — how the information flows — dictate the results, rather than the data compared against itself. He generalized a deep binary arithmetic system. Second, he analogized and modeled decision making after our brain chemistry. That model has remained the core abstraction as we try to build our computers in our shadow.
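Rosenblatt's learning rule survives almost unchanged in today's textbooks: nudge each weight in proportion to the error it caused. A minimal sketch (my own toy task: learning logical AND):

```python
# Perceptron learning rule: adjust weights along the prediction error.
def train_perceptron(samples, epochs=10, lr=0.1):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            predicted = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - predicted        # feedback adjusts the "path"
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([1 if w[0]*x1 + w[1]*x2 + b > 0 else 0 for (x1, x2), _ in AND])  # [0, 0, 0, 1]
```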

Wiener — Some Moral and Technical Consequences of Automation (1960)

Norbert Wiener was kind of like Young Sheldon — he enrolled at Tufts at the ripe age of 11 and earned his PhD at the age of 18. He immediately started teaching at MIT.

In a field concerned with discrete atomics, Wiener brought statistical and continuous thinking — especially around how messages are passed between different parts of a system as part of a continuous feedback loop.

His is one paper that could be read for insights alone, leaving aside practical implications. E.g., he firmly believed machines may not exceed human intelligence but would almost always transcend human performance. In other words, machines would make mistakes. And machines would make mistakes much faster than humans are capable of detecting them, let alone fixing them. Thus even if machines are subject to human criticism, such criticism may be ineffective until long after it is relevant. By the very slowness of our judgment, we give up our effective control of machines.

Any big disaster — the Flash Crash, Air France Flight 447, or even the recent failures of some large banks that did not optimize their asset holdings as interest rates rapidly rose — is essentially a flaw of feedback arriving far faster than our ability to safely react to it.

Intuition — i.e., contextual System 1 (rapid) judgment from deep situational experience — is probably the only way to mitigate such systemic disasters. A good example is how Vasily Arkhipov relied on his judgment, and not the radar data, to prevent a thermonuclear war.

Licklider — Man-Computer Symbiosis (1960)

From Leibniz to Lovelace to Turing to Hamming, computer scientists focused on the machine. Grace Hopper strategically focused on programmer productivity. But Licklider was the very first visionary to think deeply about how the computer might help the human make better decisions — what we call User Experience or Human-centered Design these days. This is a natural progression from Wiener's concern — machines make fast mistakes, offering humans very little time to react. Licklider proposed solutions so those split-second decisions are as safe as possible. The goal was to prevent accidental nuclear disaster. But we also got memes and cat videos built on the same principles.

His core insight was — "in man-machine systems, the sum of the probabilities of very-low-probability alternatives is often too large to neglect." This is essentially Murphy's Law encoded in a maxim. Things always go wrong because it is only natural for them to go wrong given a vast set of alternative outcomes. As Werner Vogels posited, "everything fails all the time".
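The arithmetic behind the insight is worth seeing once: a thousand independent one-in-ten-thousand failure modes, each individually negligible, together fail almost 10% of the time.

```python
# 1000 independent "one-in-ten-thousand" failure modes, taken together:
print(1 - (1 - 1e-4) ** 1000)   # ~0.095 -- far too large to neglect
```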

He also wrote — "men are noisy, narrow-band devices, but their nervous systems have very many parallel and simultaneously active channels. Relative to men, computing machines are very fast and very accurate, but they are constrained to perform only one or a few elementary operations at a time". Thinking of these two actors — a fast calculator or information retriever (computers, say "System 1" in behavioral psychology) and a slow, parallel wetware with judgment but many encoding and retrieval errors (humans, say "System 2") — as a symbiotic, interdependent system, and thinking about the design holistically, was the genius of Licklider.

McCarthy — Recursive Functions of Symbolic Expressions and Their Computation by Machine (1960)

He created LISP — implementing the lambda calculus. Imagine a fast-growing programming universe without a compiler — McCarthy inverted the existing programming language design principle. Rather than using programming constructs as lego blocks to build a house, he tried to see if we could just ask the language to build a house and have it sort things out. Sort of! In doing so, recursion — say, keep fetching lego blocks — was the key semantic. Efficient recursion required even more discipline to manage scarce computer memory. So McCarthy even wrote the first Garbage Collector — it kind of magically cleans up your lego staging area when you are done. These legos were lists — of various things: numbers, words, but also larger custom-built legos themselves — like house addresses.

In other words, he tried a code-writing framework that emulates thought. If you want to alphabetically sort a list of names, just tell the computer "sort(list_of_names)". Even today, irrespective of the programming language, such functions are the core engines of work. Following Licklider's principle in programmer experience, he created a language that "is more convenient for the programmer than a system in which he has to keep track of and erase unwanted lists."
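A hedged transliteration of that style into Python (car and cdr are LISP's names for the head and the rest of a list; the summing example is mine):

```python
# LISP-style list processing: everything reduces to head (car) and rest (cdr).
def car(lst): return lst[0]
def cdr(lst): return lst[1:]

def total(numbers):
    """Recursively sum a list: 'keep fetching lego blocks' until none are left."""
    if not numbers:          # base case: the empty list
        return 0
    return car(numbers) + total(cdr(numbers))

print(total([1, 2, 3, 4]))                # 10
print(sorted(["carol", "alice", "bob"]))  # sort(list_of_names), as in the prose
```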

Engelbart — Augmenting Human Intellect (1962)

Engelbart "put flesh on the bones of Bush's vision". From Jira to Zoom — Engelbart's biggest influence was conceptualizing and boldly demoing "distributed, collaborative workflows".

"No person uses a process that is completely unique every time he tackles something new" is a digital complement of Heraclitus' "you never step into the same river twice". Even revolutionary progress has the past sewn into it in some way. Think of the "recycle bin" in Windows.

Second, Engelbart was likely one of the first to boldly recommend, essentially, moving fast, right now, even if you do not have all the tools, knowledge or data. You will very likely figure things out as you iterate. This has become the primary operating principle of Silicon Valley.

Third, "In the long run, the power of the human intellect is really much the more important than the nuclear power" — Engelbart realized that tiny doped semiconductors augmenting human problem solving would be the driving force of civilization over any other mechanical force.

Corbató et al — An Experimental Time-Sharing System (1962)

The primeval debate was between a bunch of small computers, each capable of serving mass customer needs, perhaps by working in parallel, VS. one large computer, fully obtaining the benefit of speed-ups. Grace Hopper, who believed business applications would be the primary use case, was of the former camp. John Backus, a believer in computers being in service of "big science" and the inventor of Fortran, was of the latter.

This paper was the bridge to settle the debate between large and small computers. Essentially, divide the processing power of a big computer into planned chunks among end users. "Larger and more complicated programs are written to take advantage of larger and faster computers. This process invariably leads to more programming errors and a longer debugging window."

This, also, was the precursor to Unix.
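The core mechanism can be sketched in a dozen lines: slice one processor's time into small quanta and rotate among users' jobs. A toy round-robin of my own, not CTSS's actual scheduler:

```python
# Round-robin time-sharing: each job runs for a quantum, then rejoins the queue.
from collections import deque

def time_share(jobs, quantum=2):
    """jobs: {user: remaining work units}."""
    queue = deque(jobs.items())
    while queue:
        user, remaining = queue.popleft()
        ran = min(quantum, remaining)
        print(f"{user} runs {ran} unit(s)")
        if remaining - ran > 0:
            queue.append((user, remaining - ran))   # back of the line

time_share({"alice": 3, "bob": 5, "carol": 1})
```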

Sutherland — Sketchpad (1963)

If Licklider wrote the API for human-computer symbiosis, Sutherland was the very first to concretely implement it with his Sketchpad. His invention was a light pen — a photocell — to operate on the computer-displayed image.

His big insight was to make computer interactions multi-modal — from text to touch — “we have been writing letters to rather than conferring with our computers.”

Moore — Cramming More Components onto Integrated Circuits (1965)

For me, this paper is a full-blown academic course on how to set targets by analyzing patterns — i.e., pretty much 80% of leadership. His concrete and succinct articulation of a "stretch goal" became a self-fulfilling prophecy — we are now trying to build transistors essentially as big as a single atom! Less is Moore.

"The future of integrated electronics is the future of electronics itself" — is a uniquely Moore insight and a great thinking filter to apply to any new innovation.

Dijkstra — Solution of a Problem in Concurrent Program Control (1965)

“Brilliant and challenging figure” — is how Prof. Lewis starts his introduction to Dijkstra.

Dijkstra would be a Twitter star with his razor-sharp, near-trolling opinions. Consider this — "The computer user is not a real person of flesh and blood but a literary figure… He is stupid, education resistant if not education proof, and he hates any form of intellectual demand made of him; he cannot be delighted by something beautiful, because he lacks the education to appreciate beauty. Large sections of computer science are paralyzed by accepting this moron as their typical customer."

or,
“Any successful AI project by its very nature would castrate the machine.”

“(CS) has an order of magnitude too many students. From a scientific point of view you would like to weed out the lot. Keep the brightest 2% and do business.”

Eminently quotable like any other "challenging" figure, this "one pager" paper from him is a textbook example of clarity of thinking directly transposed onto typed paper. He offers a solidly logical solution — a pattern (the "mutex") — to solve a canonical concurrency problem duo — safety (no bad thing happens) & liveness (a good thing ultimately happens).
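Dijkstra's original construction handles n processes and is famously intricate; Peterson's later two-process algorithm (1981) shows the same safety-and-liveness shape in a few lines. Python here is only pseudocode for the logic (real Python threads would use threading.Lock):

```python
# Peterson's two-process mutual exclusion -- illustrative pseudocode in Python.
flag = [False, False]   # flag[i]: process i wants to enter
turn = 0                # tie-breaker: whose turn it is to wait

def enter(i):
    global turn
    other = 1 - i
    flag[i] = True              # announce intent
    turn = other                # politely yield the tie-break
    while flag[other] and turn == other:
        pass                    # busy-wait; safety: never two inside at once

def leave(i):
    flag[i] = False             # liveness: a waiting process can now proceed
```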

Weizenbaum — ELIZA: A Computer Program for the Study of Natural Language Communication between Man and Machine (1966)

William Morrison of Des Moines, Iowa built the first electric car in 1890 — a six-passenger wagon that could go up to 14 mph. The Tesla Model S Plaid (2021) can top 150 mph and is the fastest mass-market car (0–60 mph in under 2 seconds).

As they say, the future was always in the past — just not evenly distributed. Or, it lacked the necessary support from technology.

ELIZA — the (first!) rule-oriented chatbot — was the parent of ChatGPT. Weizenbaum designed it to have the "persona of a psychotherapist" (i.e., to over-index on further probing questions over factual answers). Having fled the Nazi regime, he had witnessed the dehumanization of human beings. His work, perhaps even unwittingly, animated the inanimate.

The design principle — "when the conversational partner is a machine then the idea of credibility may well be substituted for that of plausibility" — could apply as a filter to the debate of where AI should, rather than could, replace human interactions.

The critical insight of this paper can perhaps be found in “ELIZA shows how easy it is to create and maintain the illusion of understanding, hence perhaps, of judgment deserving of credibility. A certain danger lurks there.”
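ELIZA's machinery was a script of pattern-and-reflection rules. A toy sketch (these two rules are mine; Weizenbaum's DOCTOR script had many more, with ranked keywords):

```python
# ELIZA's trick in miniature: match a pattern, reflect it back as a probe.
import re

RULES = [
    (r"i feel (.*)", "Why do you feel {}?"),
    (r"my (.*)", "Tell me more about your {}."),
]

def eliza(utterance):
    for pattern, probe in RULES:
        match = re.match(pattern, utterance.lower())
        if match:
            return probe.format(match.group(1))
    return "Please go on."       # the therapist persona: always keep probing

print(eliza("I feel anxious about work"))  # Why do you feel anxious about work?
```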

Dijkstra — The Structure of the "THE"-Multiprogramming System (1968)

3 out of the 46 classic papers are from Dijkstra — a stunning 6.5%. In this one he shares the design of the "semaphore" — a ubiquitous construct for concurrent programming (a usage sketch follows at the end of this section). He also brilliantly articulates the crux of complex project management — "Select a project as advanced…in the hope that routine work can be kept to a minimum; hold out against all pressure to incorporate such system expansions that would only result into a purely competitive increase of the total amount of work to be done".

Pithy aphorisms are sprinkled throughout, like -

“Be aware of the fact that experience does by no means automatically lead to wisdom and understanding; in other words, make a conscious effort to learn as much as possible from your previous experiences.”

This was a great retrospective; note how he provides numbers behind his robust argument —

Taking care of the pathology took more energy than we expected and some of our troubles were a direct consequence of our earlier ingenuity. Coding errors (1/500 instructions), located within 10 minutes of inspection by the machine and easy to remedy. The larger the project, the more essential the structuring.
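The semaphore he introduces here survives, name intact, in nearly every modern runtime. A minimal usage sketch with Python's built-in threading.Semaphore (the connection-limit scenario is my own):

```python
# Semaphore in use: at most N workers inside the guarded section at a time.
import threading

db_connections = threading.Semaphore(2)   # hypothetical resource limit of 2

def worker(n):
    with db_connections:                  # P(): wait for a free slot
        print(f"worker {n} holds a connection")
    # V() on exit: the slot is released for a waiting thread

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```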

Dijkstra — Go To Statement Considered Harmful (1968)

This was a letter “to the editor”, not a formal paper. It contains his typical piercing empirical goal post — “Quality of programs is a decreasing function of the density of go to statements in the program”.

"Our intellectual powers are rather geared to master static relations and our powers to visualize processes evolving in time are relatively poorly developed… [we] should do our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible."

I.e., elevate the program-process conceptual parity to an easily understandable domain.

The broader takeaway here is how comfortable Dijkstra was shedding a prevalent dogma when he understood that real-life changes in the field (better compilers, more programmers, bigger projects, easier recursion, etc.) necessitated readability over classicism. "When the facts change, I change my mind. What do you do?" — is a hallmark of every great operator, succeeding in a field more than once, as Dijkstra did.

Strassen — Gaussian Elimination is Not Optimal (1969)

If I had to guess, a big percentage — perhaps >50% — of the heavy work in modern technology is matrix multiplication. The share will only increase as we map our analog world's richness into digital dimensions and try to extract meaning out of it.

Before Strassen, it was universally assumed that two n×n matrices could only be multiplied with O(n³) efficiency. Strassen demonstrated that two 2×2 matrices can be multiplied with 7, rather than 8, multiplications.

Without Strassen's clever optimization, the core web — search, for example — would likely be nearly twice as slow. Or, the "I'm Feeling Lucky" button would not exist for multiple use cases.

While the optimization looks like a "high-school math trick", no one before Strassen had even tried it.
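The trick itself fits on a napkin: seven products for a single 2x2 multiply, which, applied recursively to matrix blocks, yields the roughly O(n^2.81) bound. A sketch:

```python
# Strassen's seven products for one 2x2 matrix multiplication.
def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```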

Hoare — An Axiomatic Basis for Computer Programming (1969)

From the outside, Computer Science feels like a lot of building. From the inside, the reality is a lot of optimizing — relentlessly making a few things, namely the bottlenecks, more efficient. E.g., Strassen's algorithm (above) to solve linear equations faster.

Hoare's domain was programming languages, and his core insight was that "the time spent in program testing is often more than half the time spent on the entire programming project… two-thirds of the cost of a project is involved in removing errors during this phase."

Once he identified the bottleneck, Hoare proposed a framework so programs could be subject to more "formal verification", eliminating the large amount of time spent in post-hoc testing.
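Hoare's notation, now called the Hoare triple, asserts that if precondition P holds and statement S terminates, postcondition Q holds afterwards. For example:

```latex
% The Hoare triple: precondition P, statement S, postcondition Q.
\{P\}\; S\; \{Q\}
% Instantiated for a simple assignment:
\{x = n\}\; x := x + 1\; \{x = n + 1\}
```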

Codd — A Relational Model of Data for Large Shared Data Banks (1970)

Someone once said, "business computing is two things — either put data in a database — write safely; or get data out of a database — read fast".

Many others focused on hardware, programming languages, user interfaces, or optimizing algorithms before Codd, but he arrived with the fundamental insight that would add trillions of dollars to global GDP a year — "it is common in academic circles to think that the science of computing is all about algorithms. But in business, computing has always been about data".

Codd's paper was the Manhattan Project for real-life software applications. It defined the rules, showed how it's done, and — by focusing on the database — truly democratized computing integrated over time.

Royce — Managing the Development of Large Software Systems (1970)

I like to use a 5P framework to think deeply about software systems — Product, Platform, Process, People, Production. As hardware and software matured, software engineering needed to encode processes that work at large scale. Royce's paper was the first to do so. It gave way to the — somewhat arbitrarily derided these days despite having optionalities — "waterfall model", though he never used that term. The model was inherently sequential — a step has to complete before the next begins. But, as in iterative methods like Agile, early implementations were subject to refinement and rapid change, embracing feedback from the later stages. E.g.,

“Simplicity and elegance are the highest values. Design before coding, document it completely and build a pilot model to uncover and solve problems before entering the test phase.”

That is the fundamental block of any modern process — build, iterate, better.

Royce was the Alpha Project Manager.

Jones — A Statistical Interpretation of Term Specificity and Its Application in Retrieval (1972)

“The less common a term is in a collection of documents the more likely it is of significance for the few docs in which it appears. Counter-weighting factor is IDF.”

Read this again and marvel at the counter-intuitive insight. And its intuitive appeal — simplicity.

Computing built things out of nothing — it used the lack of a signal as 0, created binary logic on top of that 0–1 dichotomy, and created a universe of its own. As Sherlock Holmes once labelled "the dog that did not bark" the curious incident — Ms. Jones used the rarity of a term to label its significance.

Simply speaking — one, the more often a term occurs within a document, the more relevant it is to that document; two, the less common the term is within a collection of documents, the more significant it is to the document(s) where it appears. This counter-measurement is called "inverse document frequency" (IDF) and is the cornerstone of any search. The running joke about ChatGPT is — you can identify a document written by it if it has the word "fortify" in it!
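Jones' insight became the tf-idf weighting that underlies search ranking to this day. A minimal sketch over three toy documents of my own:

```python
# tf-idf: frequent in THIS document, rare ACROSS documents => significant.
import math

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the physics of quantum chromodynamics",
]

def tf_idf(term, doc, docs):
    words = doc.split()
    tf = words.count(term) / len(words)                 # term frequency
    containing = sum(term in d.split() for d in docs)
    idf = math.log(len(docs) / containing)              # inverse document frequency
    return tf * idf

print(tf_idf("the", docs[0], docs))      # 0.0  -> ubiquitous, tells us nothing
print(tf_idf("quantum", docs[2], docs))  # ~0.22 -> rare, highly significant
```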

Thompson & Ritchie — The Unix Time-Sharing System (1974)

The final section of the paper (“Perspective”) brims with insights — “Perhaps paradoxically, the success of UNIX is largely due to the fact that it was not designed to meet any predefined objectives….Our goals throughout the effort, when articulated at all, have always concerned themselves with building a comfortable relationship with the machine…We have not been faced with the need to satisfy someone else’s requirements, and for this freedom we are grateful.”

It also underscores the contribution of scarcity to successful design — what the authors called the "salvation through suffering" philosophy.

Cerf & Kahn — A Protocol for Packet Network Intercommunication (1974)

The "Internet" was largely conceived as a proactive response to a possible thermonuclear apocalypse. The design of TCP is therefore an ultimate masterclass in "defense in depth". Flow control, multiplexing and demultiplexing of segments among processes, tackling "unrecoverable mutation of the data or missing data", the end-to-end principle, and many essential "fault detection, isolation and recovery" primitives that we reflexively apply today all originated in this seminal paper.

Careful study of this paper shows how to think about a system with extreme (good!) paranoia and impute stability at large.

Liskov — Programming with Abstract Data Types (1974)

Like our country with Watergate, early software had a major mid-life crisis in the 70s. As programs got bigger, bugs became trickier and more damaging. To counter this, two paths to complexity reduction were taken — through control of program flow, and through manipulation of data. Dijkstra, for example, annihilated "go to" to manage control of program flow.

Liskov prescribed language conventions that allow expression of only the minimum necessary functional operations on data, not the internal structures of the data. The user does not need to know how the data structures are implemented — s/he is only concerned with what they do. In other words, your specific business's "LinkedList" could be a "DebitTransaction" data structure. A programmer does not need to know how it is implemented, just the operations s/he can perform on it and the contract of those operations.
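A minimal sketch of the discipline in Python, reusing the hypothetical DebitTransaction from the prose: callers get operations and a contract, never the representation:

```python
# An abstract data type: the caller sees operations and a contract,
# never the representation (which could be a list, a file, or a B-tree).
class DebitTransaction:
    def __init__(self):
        self._entries = []            # internal structure: hidden, swappable

    def post(self, amount):
        """Record a debit. Contract: amount must be positive."""
        if amount <= 0:
            raise ValueError("debits must be positive")
        self._entries.append(amount)

    def balance(self):
        return sum(self._entries)

ledger = DebitTransaction()
ledger.post(40); ledger.post(2)
print(ledger.balance())               # 42 -- no caller ever touches _entries
```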

This was the root of Object-oriented programming, where, arguably, we overstepped with overly convoluted and rigid object hierarchies in the earlier days.

Brooks — The Mythical Man-Month (1975)

“The difference between civil and software engineering is that when someone tells you that a bridge is half built, you can walk out onto it and see.”

Brooks coined the term “computer architecture”. This paper has a very high insight density — “All programmers are optimists.”

"Our…techniques fallaciously confuse effort with progress, hiding the assumption that men and months are interchangeable… thus begins a regenerative cycle which ends in disaster".

"Cost varies with man-months; progress does not."

However, the core essence of the essay is "availability bias" and how it impacts large software projects. When we estimate, we estimate what we easily can (coding). In reality, non-coding activities dramatically overwhelm coding — e.g., (per Brooks) planning takes twice as much time as coding, and tests take 3x as much. Since they are difficult to estimate, we just do not factor them in when we quote an estimate.

Metcalfe et al. — Ethernet: Distributed Packet Switching for Local Computer Networks

Another prime example of a counter-intuitive win was this paper's key insight — "the solution to the subtleties of interconnecting computers with wires was to relax control. So we chose to achieve reliability through simplicity".

Networking — sharing data over unreliable components — is failure-prone. Metcalfe solved the tricky problem within a local network by connecting computers with a common coaxial cable — and then dumping/reading data onto/from it. Rather than overwhelming the architecture with extreme control, redundancy and dynamic routing (as needed for TCP — a much broader use case), Metcalfe's key insight was to relax control and, essentially, let the computer and network figure it out, say, by retrying or retransmitting. His metaphor for Ethernet was "an unrooted tree" — where the ether (wire) could branch yet avoid multi-path interference (a single broadcast multi-access channel).
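That philosophy is visible in Ethernet's collision handling as deployed: on a collision, each station waits a random, exponentially growing number of slot times and retries. A sketch of binary exponential backoff (my own illustration):

```python
# "Relax control": no coordinator; colliding stations just back off randomly.
import random

def backoff_slots(attempt):
    """After the n-th collision, wait a random number of slot times."""
    return random.randint(0, 2 ** min(attempt, 10) - 1)

for attempt in range(1, 5):
    print(f"collision {attempt}: wait {backoff_slots(attempt)} slot(s)")
```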

Diffie & Hellman — New Directions in Cryptography (1976)

Earlier, we saw an innovation pattern of turning nothing (0) into something (computing). The beauty of this paper is turning impossibility (computational intractability) into a solution. There are some problems we cannot solve efficiently. Rather than giving up on them, why not use them to guard things that should not be solved? Like, say, the hardness of prime factorization or discrete logarithms guarding your password.
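The key exchange itself is startlingly small. A toy sketch with tiny numbers (real deployments use moduli thousands of bits wide):

```python
# Diffie-Hellman key exchange with toy numbers.
p, g = 23, 5                      # public: a prime modulus and a generator

a = 6                             # Alice's secret
b = 15                            # Bob's secret
A = pow(g, a, p)                  # Alice sends g^a mod p over the open wire
B = pow(g, b, p)                  # Bob sends g^b mod p

print(pow(B, a, p) == pow(A, b, p))   # True: both derive the same shared key
# An eavesdropper sees p, g, A, B; recovering a or b is the discrete-log
# problem, the "impossibility" that guards the secret.
```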

A secondary insight — skill in cryptanalysis has always been heavily on the side of the professionals, but innovation, particularly in the design of new types of cryptographic systems, has come primarily from amateurs. Just as adversity becomes strength, the lack of institutional knowledge became courage in cryptography!


Nilendu Misra

"We must be daring and search after Truth; even if we do not succeed in finding her, we shall at least be closer than we are at the present." - Galen, 200 AD