A short (and mostly wrong) history of computer networking

👋 This page was last updated ~6 years ago. Just so you know.

When I launched my Patreon, I vowed to explain how computers work. But in 2019, computers rarely work in isolation. So let’s take the time to write a few articles about how computers talk to each other.

The history of network protocols and standards is long and complicated. Starting with a comprehensive review would prove quite tedious, especially if such a review was done in isolation from modern use.

So, instead, as we did with files, we’ll start by accomplishing concrete tasks and work our way back to understand which protocols are involved and how they fit together.

Since our last exploration was focused on Linux, this one will be focused on Windows, for no good reason other than a love of diversity.

Why connect computers?

First off, I think we need to cover why we may want to connect computers together.

In the 1940s, computers weren’t really connected. Organizations that could justify (and afford) using a computer usually had a single, really big one that could only compute one thing at a time.

Those computers, like the ENIAC, were programmed by flipping switches and plugging wires:

Betty Jean Jennings and Fran Bilas operating ENIAC's main control panel, circa 1945-1947.

U.S. Army Photo

A punch card reader was used to input data into the computer, and a card punch let it output data to punch cards. You could argue that this was an early form of manual networking, since it allowed the one really big computer to communicate with the outside world.

In the 1970s, time-sharing became the dominant paradigm. It allowed many humans to use the same computer: at first, by queueing their requests, so that as soon as one task finished, another could start without delay.

Later, more advanced forms of time-sharing emerged, where each task could be sliced into smaller chunks. Although computers still technically computed only one thing at a time, switching between those chunks gave the illusion of doing several things at once: multitasking!
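To make the slicing idea concrete, here’s a toy round-robin scheduler. It’s a sketch under loose assumptions: Python generators stand in for tasks, and each `yield` marks the end of one slice. Real time-sharing systems preempted tasks with timer interrupts instead of waiting for them to yield, so treat this as an illustration of the interleaving, not of how any historical system was built.

```python
from collections import deque


def task(name, steps):
    """A hypothetical task doing `steps` units of work, yielding after each one."""
    for i in range(steps):
        yield f"{name}{i}"


def round_robin(tasks):
    """Run tasks one slice at a time, cycling through them until all are done."""
    queue = deque(tasks)
    timeline = []
    while queue:
        current = queue.popleft()
        try:
            timeline.append(next(current))  # run exactly one slice
        except StopIteration:
            continue  # this task is finished: drop it
        queue.append(current)  # not finished: back of the line
    return timeline


print(round_robin([task("A", 3), task("B", 2), task("C", 1)]))
# ['A0', 'B0', 'C0', 'A1', 'B1', 'A2']
```

Only one slice ever runs at a time, but from each task’s point of view, it makes steady progress alongside the others.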

Commands were entered via computer terminals. Some of the more advanced time-sharing systems like PLATO V allowed terminals to be remote.

An IST-II terminal, which could be used to access the CDC Plato network, circa 1978-1980.

Mabu2 user, Wikipedia

This is a lot closer to our modern conception of “networking”, but it’s still not quite there. Terminals weren’t computers - they were basically a display and a keyboard. They didn’t do any computations themselves. They simply allowed access to the actual, single, very big computer buried somewhere in a university.

In parallel, efforts to connect different computers together were underway. I’m talking about ARPANET, for which early plans were published in 1967.

A logical map of the ARPANET, March 1977

The Computer History Museum

Those were separate endeavors. ARPANET’s goal was not to let anyone access any computer as if they were physically present, but to perform “data interchange”, so that multiple computers could collaborate on solving a hard problem by sharing intelligence, each performing the task it was designed for.

Eventually, the line between those two use cases blurred as a standard protocol for remotely controlling computers emerged: Telnet. While Telnet initially ran over ARPANET’s NCP, it was transitioned to TCP/IP in 1983, and that marked the start of what we know as “the modern internet”.

So, to answer our own question, why connect computers?

  • Because powerful computers are expensive, and networking allows sharing them across multiple users
  • To provide access to central databases of information
  • To allow people and organizations around the globe to communicate efficiently
  • Why not?

How to connect computers?

So, let’s say we have two computers.

Connecting them is simple enough, right? We just need to link them:

Since computers are electronic machines, the simplest way is probably to use a cable. A cable is ideal, because it provides a direct connection between the two computers. If we make some very generous assumptions, like the conductivity being high enough and the cable being short enough that data loss isn’t an issue, then we have a reliable way to transmit an electrical signal from one computer to the other.

We’ll also assume that we have found a way to reliably transform bits (zeroes and ones) into an analog signal and back, because electricity isn’t binary, it’s continuous. Maybe we do something like: every 100 nanoseconds, check whether the voltage is crossing the 0V line going up or going down, and that tells us whether a 1 or a 0 was sent.

How we might do things

Ben Eater
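The “which way are we crossing 0V” idea is close to Manchester encoding, which classic 10 Mbit/s Ethernet uses (and at 10 Mbit/s, each bit does last 100 nanoseconds). Here’s a minimal sketch under idealized assumptions: the ±1 V levels are made up, each bit period is sampled exactly twice (once per half), and the clocks on both ends agree perfectly. It follows the IEEE 802.3 convention, where a rising edge in the middle of a bit means 1 and a falling edge means 0.

```python
HIGH, LOW = 1.0, -1.0  # hypothetical voltage levels, in volts


def encode(bits):
    """Manchester-encode bits: each bit becomes two half-bit voltage samples."""
    signal = []
    for bit in bits:
        # 1: low then high (rising edge), 0: high then low (falling edge)
        signal += [LOW, HIGH] if bit else [HIGH, LOW]
    return signal


def decode(signal):
    """Check how each bit crosses 0V mid-way: going up is a 1, going down is a 0."""
    bits = []
    for first, second in zip(signal[0::2], signal[1::2]):
        if first < 0 < second:
            bits.append(1)
        elif first > 0 > second:
            bits.append(0)
        else:
            raise ValueError("no transition in the middle of a bit: corrupted signal")
    return bits


data = [1, 0, 1, 1, 0]
wire = encode(data)
noisy = [v + 0.2 for v in wire]  # a small offset picked up along the way
print(decode(noisy) == data)  # True
```

A nice side effect of having a transition in every bit is that the receiver can use those transitions to keep its clock in sync with the sender’s.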

Then again, maybe this is actually a whole lot more complicated. Once our cables get long enough, and we lay them in an environment with other electrical machinery, we have to deal with noise. Then there’s the fact that the sending and receiving ends both need clocks that sort of agree. So maybe, for now, we leave that part to our electrical engineer friends.

But! The point remains: we’re assuming we have a direct way to send ones and zeroes from one computer to the next. From that, we can build pretty much anything.

If we have more than two computers, we need links between all of them.

For three computers, it may look something like this:

Suddenly things are a lot less straightforward, but it’s okay! Everything is fine! We can still make sense of who we’re talking to because each link is connected to a separate network interface controller (NIC, for short).

So, for example, the purple computer (top-left) knows that if it receives bits on its second NIC, they come from the green computer (bottom).

But, as you can imagine, things get more complicated the more computers you have:

If you think the wiring in the above diagram is hairy, wait until you have to physically lay it out in a building. It gets ugly. Oh, did I mention that you need 5 NICs per computer just for the diagram above? It gets expensive, too.
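To put numbers on “it gets expensive”: if every computer has a direct link to every other one, n computers need n(n - 1)/2 cables and n - 1 NICs each. So 5 NICs per computer means six computers and 15 cables. A quick sketch of how that grows:

```python
def full_mesh(n):
    """Cables and NICs needed so each of n computers links directly to every other."""
    cables = n * (n - 1) // 2  # one cable per pair of computers
    nics_per_computer = n - 1  # one NIC per other computer
    return cables, nics_per_computer


for n in (2, 3, 6, 100):
    cables, nics = full_mesh(n)
    print(f"{n:>3} computers: {cables:>4} cables, {nics:>2} NICs each")
```

The number of cables grows with the square of the number of computers, which is why this approach stops being practical very quickly.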

So maybe, instead of doing that, we make a bus. The bus is mostly just a long cable that all the other cables plug into:

Look how great this is! Each computer has a single NIC! There’s only one cable per computer (not counting the bus itself). But maybe at this point we have some big problems to solve, which are:

  • Everybody’s talking at the same time
  • Nobody can tell who anybody else is
  • Nobody ever shuts up

To solve these problems, maybe we come up with a scheme.

Before, we had full control of a link, so we could just keep talking and talking: there was never really a need to shut up. But now, we may want the bus to carry several conversations at the same time.

So, much like the time-sharing systems of the 70s and 80s, we divide a unique resource (here, the bus) into slices (here, datagrams), so that we may have the illusion of multiple conversations being maintained in parallel. (When really, there’s only ever a single conversation going on; we just switch very often between bits of conversations.)

Second, just so we know who’s who, we attribute every computer a unique address. No, better: we attribute every NIC a unique address. Maybe we pick a large address space. If we give it 6 bytes, we’ll have about 281 trillion unique addresses, which ought to be enough for everyone.

We’ll call those addresses MAC addresses, for “media access control”, because they help control access to a medium (the bus). And of course, now we need to include those MAC addresses in every datagram (slice of a conversation) anyone sends: both the source MAC and the destination MAC, so that computers know who the sender is, and whether the datagram is meant for them.
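Here’s a minimal sketch of those addresses at work, assuming Ethernet II framing (Ethernet calls these slices “frames”): the 14-byte header holds the destination MAC, then the source MAC, then a 2-byte EtherType saying what’s inside. The preamble, payload and trailing checksum are left out, and the MAC addresses are made up for illustration.

```python
import struct

BROADCAST_MAC = b"\xff" * 6  # all ones: "this one is for everybody"


def mac_to_bytes(mac):
    """Parse a MAC address like '02:00:00:00:00:0a' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"a MAC address is 6 bytes, got {mac!r}")
    return raw


def frame_header(dst, src, ethertype=0x0800):
    """Destination MAC, then source MAC, then the EtherType (0x0800 means IPv4)."""
    return mac_to_bytes(dst) + mac_to_bytes(src) + struct.pack("!H", ethertype)


def is_for_me(frame, my_mac):
    """A NIC keeps frames sent to its own MAC (or to broadcast) and ignores the rest."""
    return frame[:6] in (mac_to_bytes(my_mac), BROADCAST_MAC)


header = frame_header(dst="02:00:00:00:00:0a", src="02:00:00:00:00:0b")
print(header.hex(":"))  # 02:00:00:00:00:0a:02:00:00:00:00:0b:08:00
print(is_for_me(header, "02:00:00:00:00:0a"))  # True
print(is_for_me(header, "02:00:00:00:00:0c"))  # False
print(f"{2 ** 48:,}")  # 281,474,976,710,656 possible 6-byte addresses
```

Everyone on the bus still *receives* every frame; checking the destination MAC is just how each NIC decides which ones to pass along and which ones to drop.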

That leaves us one last problem: everybody keeps talking over each other, and that will definitely corrupt our datagrams.

Since all the computers are on the same bus, they can hear everything. So maybe, as long as they hear chatter, they shut up, and as soon as there’s silence, they start talking again.

Of course, this has several flaws. For starters, if multiple computers are waiting, and they all start sending as soon as there’s silence, they’ll keep talking over each other over and over and over and over again:

So maybe they wait a random amount of time. And maybe, if all else fails, we add a checksum to all the datagrams, so we know if something bad happens and we know to just discard it. Maybe we leave it to the protocols above to retransmit.
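Classic Ethernet turns “wait a random amount of time” into truncated binary exponential backoff, and uses a CRC-32 as its checksum. Here’s a minimal sketch of both, under simplifying assumptions: delays are counted in abstract slot times, there’s no actual bus, and the checksum is simply appended to the payload (the real frame check sequence has its own bit-ordering rules on the wire).

```python
import random
import zlib

MAX_ATTEMPTS = 16  # classic Ethernet gives up on a frame after 16 tries


def backoff_slots(collisions, rng=random):
    """Truncated binary exponential backoff, as in classic (CSMA/CD) Ethernet.

    After the n-th collision in a row, wait a random number of slot times
    picked uniformly between 0 and 2**min(n, 10) - 1.
    """
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions, dropping the frame")
    return rng.randint(0, 2 ** min(collisions, 10) - 1)


def add_checksum(payload):
    """Append a CRC-32 of the payload (the same CRC-32 Ethernet's checksum uses)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def check(datagram):
    """Return the payload if its CRC-32 matches, or None if it got corrupted."""
    payload, received = datagram[:-4], datagram[-4:]
    if zlib.crc32(payload).to_bytes(4, "little") != received:
        return None  # discard it, and let the protocols above retransmit
    return payload


rng = random.Random(1973)
print([backoff_slots(n, rng) for n in range(1, 6)])  # the possible range doubles each time
sent = add_checksum(b"hello, bus")
flipped = bytes([sent[0] ^ 0x01]) + sent[1:]  # one bit flipped "on the wire"
print(check(sent))     # b'hello, bus'
print(check(flipped))  # None
```

Doubling the range after each collision means that the more crowded the bus is, the more the waiting computers spread out, which makes a repeat collision less and less likely.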

And maybe, if we did all that, we’d have come up with one of the key technologies that make up the internet. Not only a physical layer (sending bits over wires), but a data link layer as well (MAC addresses, collision detection, etc.).

And if we were Robert Metcalfe, and it was 1973, we might name it Ethernet.

How to connect many computers

A bus is all fun and games, but it has limitations.

As mentioned earlier, anyone on the bus can hear what anyone else says, no matter whether it was meant for them or not. In addition, as more and more computers are added to the bus, collisions are more and more likely to happen - wasting everyone’s time.

But more importantly, it doesn’t really… physically scale. A bus is a single cable. Imagine running a single cable throughout Europe so that everyone is on the same bus:

A map of Europe

Ssolbergj

You can’t really do it. And even if you could, the bus would be so long that you’d run into a lot of problems: the signal would attenuate and distort as it travels through the metallic conductor, to the point where it would be indistinguishable from noise.

Even if you found a superconducting material with zero loss, you would still be limited by the speed of light (more or less). That would add latency for sure, but more importantly, it would mess with our collision detection strategy.

Let’s make a back-of-the-envelope calculation that I’m sure nobody will contest: let’s say the speed of light is actually 300’000 km/s, and let’s say we have a perfect conductor. The distance between Madrid (Spain) and London (ex-EU) is about 1262 km, according to Wolfram Alpha.

In our scenario, a bit sent from Madrid would take about 1262 / 300’000 = 0.0042066… seconds to reach London. That’s about 4.2 million nanoseconds. If we have a speed of 100Mbit/s (like 100BASE-TX), we send a bit every 10 nanoseconds. So, if we emit continuously from Madrid, we can send over 400’000 bits (roughly 50 kilobytes) before London even receives the first one. Meanwhile, London hears nothing but silence, decides the bus is free, and starts talking right over us.
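The same arithmetic as a script, assuming a perfect conductor and a signal moving at the rounded speed of light, over the 1262 km Madrid to London distance. It also doubles the one-way delay to get a round trip, because that’s what matters for collision detection: a sender only notices a collision if it’s still transmitting when the other party’s signal reaches it. That’s why classic Ethernet has a minimum frame size of 64 bytes (512 bits).

```python
SPEED_KM_PER_S = 300_000   # speed of light, rounded
DISTANCE_KM = 1262         # Madrid to London
BITRATE_BPS = 100_000_000  # 100 Mbit/s, like 100BASE-TX
MIN_FRAME_BITS = 512       # classic Ethernet's minimum frame size, in bits

delay_s = DISTANCE_KM / SPEED_KM_PER_S  # one-way propagation delay
bit_time_s = 1 / BITRATE_BPS            # how long it takes to send one bit
bits_in_flight = delay_s / bit_time_s   # bits sent before the first one arrives

print(f"one-way delay: {delay_s * 1e9:,.0f} ns")
print(f"bit time: {bit_time_s * 1e9:.0f} ns")
print(f"bits sent before London hears anything: {bits_in_flight:,.0f}")
print(f"round trip: {2 * bits_in_flight:,.0f} bits, vs a {MIN_FRAME_BITS}-bit minimum frame")
```

A frame would have to be hundreds of thousands of bits long for Madrid to still be talking when news of a collision made it back, which is exactly why a single continent-sized bus can’t work.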

To solve our physical connection problem, we could make machines that have many, many Ethernet ports, and whenever they received something on a port, they’d send it verbatim to all the other ports.

…but this still doesn’t solve the fact that, with a very large number of computers on the same bus, the chance of collision is incredibly high, and everyone basically spends their time waiting for everyone else to shut up.

Additionally, all the traffic sent from a Madrid computer to another Madrid computer is still sent to everyone in Europe. Now that we have “hubs” in Madrid and Paris, it would be useful to separate intra-country datagrams (Madrid-Madrid, Paris-Paris) from inter-country datagrams (Madrid-Paris, Paris-Madrid).

We’re going to need more addresses.

Whereas MAC addresses were uniquely assigned to each network interface, and were prefixed by manufacturer (for example, all MAC addresses starting with 00:13:10 belong to Linksys), the addresses we need now should be prefixed by region. Or at least, by “groups of nodes” that are related by their interconnection.

For example, we could decide that every computer connected to the “Paris” hub should have addresses like “109.208.x.x” (where x can be any number between 0 and 255), and every computer connected to the “Madrid” hub should have an address like “176.84.x.x”.

That way, if we specified the source address and destination address in every datagram we send, the central “hubs” in Madrid and Paris would know whether each datagram was meant for a computer inside the country or outside it. They would know where to route it. Maybe we’d rename those “hubs” to “routers” instead.
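In today’s CIDR notation, “109.208.x.x” would be written 109.208.0.0/16: the first 16 bits are the prefix, and the remaining 16 identify a computer within it. Here’s a sketch of the decision one of those routers might make, assuming a routing table that only knows the two city prefixes from the example. Real routers pick the longest matching prefix among many routes; this one just checks two.

```python
import ipaddress

# The two city prefixes from the example, one /16 per hub.
ROUTES = {
    ipaddress.ip_network("109.208.0.0/16"): "Paris",
    ipaddress.ip_network("176.84.0.0/16"): "Madrid",
}


def route(src, dst):
    """Decide what to do with a datagram going from address src to address dst."""
    src_ip, dst_ip = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for network, city in ROUTES.items():
        if dst_ip in network:
            if src_ip in network:
                return f"deliver locally in {city}"
            return f"forward to the {city} router"
    return "no route"


print(route("176.84.3.7", "176.84.200.1"))   # deliver locally in Madrid
print(route("176.84.3.7", "109.208.10.20"))  # forward to the Paris router
```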

Example for a Madrid-Madrid packet:

Example for a Madrid-Paris packet:

And if we were the University of Southern California and it was 1981, we might call it IPv4, for “Internet protocol, first version that Amos cares to talk about”.

But wait, there’s more

So if we had Ethernet and IP, we could connect the world, but it would still be a tiny bit annoying to remember everyone’s IP address.

So maybe we could figure out a system to attribute human-readable names to some of the nodes in the network. We would call them “domain names”, and the protocol would be named “DNS”, for Domain Name System.
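To give a taste of what such a protocol looks like on the wire, here’s a sketch that builds (but doesn’t send) a DNS query asking for the IPv4 address of a name, following the message format from RFC 1035: a 12-byte header, then the question. The name `example.com` and the query ID are arbitrary; a real client would send these bytes over UDP to port 53 of a DNS server and then parse the answer.

```python
import struct


def encode_name(name):
    """Encode a domain name as length-prefixed labels, ending with a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # a zero-length label marks the end of the name


def build_query(name, query_id=0x1234):
    """Build a DNS query asking for the IPv4 address (A record) of `name`."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question,
    # 0 answers, 0 authority records, 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # QTYPE A, QCLASS IN
    return header + question


query = build_query("example.com")
print(len(query))  # 29
print(query.hex())
```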

Maybe IP addresses wouldn’t be static, but dynamically assigned instead, so that the router makes sure every computer on its network has an IP address in the same range, and no two computers ever have the same address. We’d call that protocol “DHCP”, for Dynamic Host Configuration Protocol.

Of course, a regular consumer-grade computer wouldn’t know who to send its DHCP request to, nor what its own address is, so at first it would have to broadcast, i.e. send packets to everyone on the network. Maybe we would reserve some special IP addresses for broadcasting; one such address could be 255.255.255.255.
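Here’s a toy sketch of the router’s side of that bargain: a pool that leases each machine an unused address from one subnet and never hands the same address out twice. It’s hypothetical: real DHCP involves a four-message exchange (DISCOVER, OFFER, REQUEST, ACK) over UDP, and leases expire, none of which is modelled here. The addresses used by a fresh machine’s very first broadcast are included as constants.

```python
import ipaddress

# A fresh machine has no address yet and doesn't know where the DHCP server is,
# so its first request goes from 0.0.0.0 to the broadcast address.
UNSPECIFIED = ipaddress.IPv4Address("0.0.0.0")
BROADCAST = ipaddress.IPv4Address("255.255.255.255")  # all 32 bits set


class AddressPool:
    """Toy DHCP-style allocator: hands out unused addresses from one subnet."""

    def __init__(self, network, reserved=()):
        reserved = {ipaddress.ip_address(ip) for ip in reserved}
        self.free = [ip for ip in ipaddress.ip_network(network).hosts() if ip not in reserved]
        self.leases = {}  # MAC address -> leased IP address

    def request(self, mac):
        """Lease an address to `mac`; asking again returns the same address."""
        if mac not in self.leases:
            if not self.free:
                raise RuntimeError("no free addresses left in the pool")
            self.leases[mac] = self.free.pop(0)
        return self.leases[mac]

    def release(self, mac):
        """Give a machine's address back to the pool."""
        self.free.append(self.leases.pop(mac))


pool = AddressPool("192.168.1.0/24", reserved=["192.168.1.1"])  # .1 is the router
print(UNSPECIFIED, "->", BROADCAST)
print(pool.request("02:00:00:00:00:0a"))  # 192.168.1.2
print(pool.request("02:00:00:00:00:0b"))  # 192.168.1.3
print(pool.request("02:00:00:00:00:0a"))  # 192.168.1.2 (same lease)
```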

We’ll need a lot of other protocols. Maybe we could have protocols to reconfigure routes between countries. Maybe we could have protocols to exchange structured information of some sort, or perhaps files. Maybe some protocols would be reliable, and others would be lossy. Maybe some of them would fall out of fashion after a better one came along.

But maybe, since we wouldn’t want to get immediately overwhelmed, we would start with something simple, like, knowing if a given IP address is reachable from our current connection.

In other words: pinging it.

Pinging a computer

Luckily, pinging a computer is really simple (for us). Windows comes with a built-in tool, C:\Windows\System32\ping.exe that does exactly that.

All that we need to do is supply it with an IP address and… tada:

However, if I supply it with an IP address I know is unassigned in my local network:

Things seem to work as expected.

As it happens, 8.8.8.8 is routed over the internet to somewhere in the US, with a lot of hops along the way. Whereas 192.168.1.60 is a local address, and nothing on my local network answers for it, so my computer can say with authority that it cannot be reached at the moment.
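Under the hood, ping.exe sends ICMP “echo request” messages (RFC 792) and waits for “echo reply” messages to come back. Here’s a sketch that only builds an echo request packet, including the RFC 1071 internet checksum; the identifier, sequence number and payload are arbitrary. Actually sending it needs a raw socket, which usually requires administrator privileges, so that part is left out.

```python
import struct


def internet_checksum(data):
    """RFC 1071: one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(identifier, sequence, payload=b"ping!"):
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence, data."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)  # checksum 0 for now
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)
    return header + payload


packet = echo_request(identifier=1, sequence=1)
print(packet.hex())
print(internet_checksum(packet))  # 0: a valid packet checksums to zero
```

That last line is also how the receiving end validates the packet: run the same checksum over everything, checksum field included, and expect zero.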

But using a ready-made tool like ping.exe is no fun.

We want to make our own tool. And so, in the next series, we’ll peel away abstraction layers. One, after, the other. Until we cannot peel anymore without leaving the world of software.

You know.

Just to make sure we really understand what’s going on.
