Ed Finn is the founding director of the Center for Science and the Imagination at Arizona State University, where he is an assistant professor in the School of Arts, Media and Engineering and the Department of English. His latest book is What Algorithms Want: Imagination in the Age of Computing (2017).
Do you think algorithms enhance or erode our creativity?
When IBM’s Deep Blue chess computer defeated the world champion Garry Kasparov in 1997, humanity let out a collective sigh, recognising the loss of an essential human territory to the onslaught of thinking machines. Chess, that inscrutably challenging game, with more possible game states than there are atoms in the Universe, was no longer a canvas for individual human achievement. Newsweek called it ‘The Brain’s Last Stand’.
Why was the loss so upsetting to so many? Not because chess is complicated, per se – calculating differential equations is complicated, and we are happy to cede the work to computers – but because chess is creative. We talk about the personality, the aesthetics of chess greats such as Kasparov and Bobby Fischer, seeing a ‘style of play’ in the manipulation of pieces on a grid. Chess was a foil, a plane of endeavour, for storytellers as diverse as Vladimir Nabokov and Satyajit Ray, and we celebrate its grandmasters as remarkable synthesisers of logic and creativity. It was particularly galling, then, for Kasparov to lose to a machine based not on its creativity but its efficiency at analysing billions of possible moves. Deep Blue wasn’t really intelligent at all, but it was very good at avoiding mistakes in chess. One might argue that its victory not only knocked humanity down a peg but demonstrated that chess itself is not, or does not have to be, the aesthetic space we imagined it to be.
And yet Kasparov, after having lost to what he later called ‘a $10 million alarm clock’, continued to play against machines, and to reflect on the consequences of computation for the game of kings. And not just against them: for the past two decades, Kasparov has been exploring an idea he calls ‘Advanced Chess’, where humans collaborate with computer chess programs against other hybrid teams, sometimes called ‘Centaurs’. The humans maintain strategic control of the game while automating the memorisation and basic calculation on which great chess depends. As Kasparov described an early such match:
Having a computer partner also meant never having to worry about making a tactical blunder. The computer could project the consequences of each move we considered, pointing out possible outcomes and countermoves we might otherwise have missed. With that taken care of for us, we could concentrate on strategic planning instead of spending so much time on calculations. Human creativity was even more paramount under these conditions.
Kasparov argues that the introduction of machine intelligence to chess did not diminish but enhanced the aesthetics of the game, creating a new space for creativity at the game’s highest levels. Today, players of ‘freestyle’ chess work with high-end chess systems, databases of millions of games and moves, and often other human collaborators too. Freestyle teams can easily defeat both top grandmasters and chess programs, and some of the best centaur teams are made up of amateur players who have created better processes for combining human and machine intelligence.
These centaur games are beautiful. The quality of play is higher, the noise of simple human errors reduced, making space for the kind of pure contest that the platonic solids and geometries of chess idealise.
We are all centaurs now, our aesthetics continuously enhanced by computation. Every photograph I take on my smartphone is silently improved by algorithms the second after I take it. Every document autocorrected, every digital file optimised. Musicians complain about the death of competence in the wake of Auto-Tune, just as they did in the wake of the synthesiser in the 1970s. It is difficult to think of a medium where creative practice has not been thoroughly transformed by computation and an attendant series of optimisations. The most profound changes have occurred in fields such as photography, where the technical knowledge required to produce competent photographs has been almost entirely eclipsed by creative automation. Even the immediacy of live performance gets bracketed by code through social media and the screens we watch while recording events that transpire right before our eyes.
In fact, the shift might be much more profound for the audience than the artist. Being a critic or consumer of art now relies on a deep web of computational filters and guides, from the Google and Wikipedia searches we use to learn about the world to the recommendation systems queueing up books, songs and movies for us. We rely on computational systems for our essential aesthetic vocabulary, learning what is good and beautiful through a prism of five-star rating systems and social-media endorsements, all closely watched over by algorithmic critics of loving grace. We shape our aesthetic expectations around these feedback loops, finding channels and lists that seem to match our interests and then following them. Google has already introduced a system that proposes responses to your emails based on millions of prior conversations, and the company Narrative Science has been creating algorithmically generated journalism for years.
Today, we experience art in collaboration with these algorithms. How can we disentangle the book critic, say, from the highly personalised algorithms managing her notes, communications, browsing history and filtered feeds on Facebook and Instagram? She exemplifies what philosophers call the extended mind, meaning that her memories, thoughts and perceptions extend beyond her body to algorithmically mediated objects, databases and networks. Without this externalised thinking apparatus, she is not the same critic she would be otherwise. This is true not just in pragmatic terms, in that she might not be nearly as good or efficient at her work, but in biophysical terms as well. Our brains adapt to the tools that we use, like London taxicab drivers with their enlarged hippocampi for mapping the city’s convoluted streets (or the GPS-dependent smartphone users whose shrivelled hippocampi can’t even navigate them around their own neighbourhoods).
The extended mind is now also becoming a space of collective cognition. The critical network of literary reception made up of that critic, the author she writes about, both their friends and followers, the hashtags, links and cross-references that bind these nodes together, all form a much more inclusive tapestry of cultural discourse than was ever possible before. We depend on our friends and social networks to tell us what to think about new creative works, and that process of assessment and sharing depends on algorithmic filters designed to maximise attention, traffic and profits.
Sociologists such as Pierre Bourdieu painstakingly mapped the deeply social dimensions of cultural judgment in the 20th century, but today the deeply intersubjective nature of taste is not just obvious but almost subliminal. Algorithms are shaping the reception of works at the forefront, but also the periphery. The entire horizon of our cultural perspectives is shaped by the filtering mechanisms that populate our news feeds, prioritise our inboxes and rank our search results. And they are, of course, built out of our own collective responses to prior stimuli, modelling a collective aesthetic project that we (often unknowingly) participate in with every click and purchase.
The immediate creative consequence of this sea change is that we are building more technical competence into our tools. It is getting harder to take a really terrible digital photograph, and, correspondingly, the average quality of photographs is rising. From automated essay critiques to algorithms that advise people on fashion errors and coordinating outfits, computation is changing aesthetics. When every art has its Auto-Tune, how will we distinguish great beauty from an increasingly perfect average?
The visceral reaction is to rebel against these simulations of progress and perfection. Many artists today explore the seams and rough edges of digital platforms, creating art out of the glitches and unintended juxtapositions that they can eke out of increasingly complicated creative systems. A few years ago, the American science-fiction author Bruce Sterling helped to champion the idea of the ‘new aesthetic’, predicated on the ‘eruption of the digital into the physical’. The art form celebrates the gritty, avant-garde edge of pixelated shear, and explores the inherent tensions between digital representations of culture and the consequences those representations have when plucked out of their two-dimensional server-farm splendour and rendered back into richer cultural forms. Carefully crafted physical representations of 8-bit graphic art, giant physical pins emulating the skeuomorphic place-markers in digital map software: these efforts to see like machines are also clearly rebellions against the aesthetic constraints of computation.
Lurking behind these efforts to disrupt the normal functioning of computational culture is a deeper creative need. What we crave most in art, what we reward more than anything else, is surprise. Marcel Duchamp’s urinal, the introduction of perspective to landscape painting, stream-of-consciousness literature – these creative breakthroughs achieve much of their impact by shocking us into some new perspective on the world. Little wonder that the modernist poets were so fascinated by the metaphors of blasts and explosions, or that art has such a long and complicated history with warfare. We need art to surprise us in order to blow up the world, to create fissures out of which the new can emerge.
Computation is not good at this. Algorithms are wonderful for extrapolating from past information, but they still lag behind human creativity when it comes to radical, interesting leaps. So far, they are much better at identifying and replicating surprising content than they are at producing it themselves. Measures such as Facebook’s engagement rankings or Flickr’s ‘interestingness’ quotient ultimately quantify a kind of surprise, one that draws on information theory as well as aesthetics. We respond to viral memes on social media because they produce something unexpected, often leveraging the deep relationship between surprise and humour. It is telling that so many memes now hide their linguistic tells (more tractable to algorithmic watchdogs than images) inside GIFs and JPEGs that circulate in a kind of shadow economy of surprise.
Surprise will remain a human territory, at least for the short term, because it is so idiosyncratic in the first place. Our sense of the unpredictable is so oddly tuned that true randomness can sometimes seem too regular, too predictable, like a long string of coin tosses where the same side comes up many times. At the same time, we are quite choosy about the kinds of novelty that count, a form of distinction that could, in the end, be precisely what we mean by aesthetics. How many art critiques and book reviews boil down to the judgment ‘this is a predictable extrapolation’? Newness is necessary but not sufficient for human surprise. There is a cadence, a significance that we seek in the aesthetics of surprise that reaches deeper than mere randomness. As pattern-seeking animals, we are looking not just for comprehensible behaviours but for signs and portents – stories about the world that allow us to configure reality according to an aesthetic logic.
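The claim that true randomness can look too streaky to feel random is easy to check numerically. A minimal simulation (the function name and parameters are mine, chosen for illustration): in 100 fair coin tosses, the longest run of identical outcomes is typically six to eight – longer than most people produce when asked to fake a random sequence.

```python
import random

def longest_run(flips):
    """Length of the longest streak of identical outcomes."""
    best = cur = 1
    for a, b in zip(flips, flips[1:]):
        cur = cur + 1 if a == b else 1
        best = max(best, cur)
    return best

random.seed(0)
# Simulate 1,000 sequences of 100 fair coin tosses each.
trials = [longest_run([random.choice("HT") for _ in range(100)])
          for _ in range(1000)]
# The average longest run lands around 7 -- streakier than intuition expects.
print(sum(trials) / len(trials))
```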
And that aesthetic, the deeply ingrained logic that the poet John Keats framed in his ‘Ode on a Grecian Urn’ (1819) as ‘beauty is truth, truth beauty’, provides a space for humanity in the age of culture machines. There is a future for human aesthetics in the modulation, the casting of surprise. We will continue responding most powerfully to those creative stimuli that somehow reconfigure our brains, literally allowing us to see in a new way. Machines can occasionally do that for us, like David Cope’s beguiling, algorithmically composed classical music (which disturbed so many audiences by emulating masters such as Mozart entirely too well). But more often than not, it is a human creator who bends computational tools to achieve a breakthrough that somehow becomes more than a recombination or incremental advancement of prior work.
Inevitably, the generation of those surprising works will grow more dependent on computational tools, following in the steps of every other field and industry. And so we come to the second major opportunity for human creativity in the face of increasingly intelligent, competent and aesthetically capable machines. In order to survive, but more importantly to thrive, in the age of algorithms, we need to cultivate a deep respect for algorithmic literacy and the capacity to ‘read’ the impact of computational influences on our work – not necessarily to resist those influences, but to understand them and use them to become better humans.
To follow this argument, we have to contemplate how computation is changing our fundamental cultural grammars of action. Our ubiquitous smartphones, sensors and platforms are more than just new nouns on the stage of cultural practice. They are generating new verbs and grammatical relationships, many of them so obvious that we no longer even pause to contemplate the godlike powers encoded in the phrase ‘to google’ something. As the media theorist Lev Manovich’s work on Instagram suggests, the general dissemination of cellphone cameras goes beyond making photography more accessible; it fundamentally changes what photography means. We are starting to perceive the world through computational filters, perhaps even organising our lives around the perfect selfie or defining our aesthetic worth around the endorsements of computationally mediated ‘friends’.
In other words, digital platforms are imposing exponential shifts in creative practice and possibility. It is possible, sometimes even trivial, to make genetic code sing or translate pitch into colour. Perceived as patches, programs, little nests of scripting, databases and sensors, such things are just demonstrations, methods or practices. But these tools are, first of all, endlessly recombinant, allowing stunning concatenations of creative cause and effect. Second, and more importantly, they depend on the same universal ideology of computation that drives Silicon Valley’s continued expansion into cultural life. According to this logic, it’s all data, from DNA to footstep counters, from high-frequency stock trading to videoblogging. And while that makes more aspects of cultural life accessible to algorithms, it also makes all the same aspects tractable for art.
Artists can now, in a very real way, make art out of the stock market or the emanations of our smartphone radio antennae. Art that is surprising, and new, and computational. Art that reverses the almost gravitational force currently sucking agency, money and meaning out of 20th-century industries and redistributing them to a small technological elite. For that reason alone, the emancipatory power of art is vitally important as we come to terms with the deep consequences of cultural computation. Surprise might get us to the door but it is after our neurons are reconfigured, after we can see what we could not before, that really interesting things begin to happen.
The role of art in creating new forms of legibility, of literacy, will become crucial for humans attempting to swim in the ocean of computation, that vast deep we can only appreciate through metaphor, analogy or creative interpretation. At times, it seems like the only way we can really come to terms with the full consequences of algorithms is when they play in our cultural spaces, like Google’s DeepMind machine-learning algorithm in its path of conquest from Atari videogames to the mythically creative game of Go, or its equally mesmerising and psychedelic image-processing cousin, Deep Dream.
Human creativity has always been a response to the immense strangeness of reality, and now its subject has evolved, as reality becomes increasingly codeterminate, and intermingled, with computation. If that statement seems extreme, consider the extent to which our fundamental perceptions of reality – from research in the physical sciences to finance to the little screens we constantly interpose between ourselves and the world – have changed what it means to live, to feel, to know. As creators and appreciators of the arts, we would do well to remember all the things that Google does not know.
The creative response to computation needs to pursue both of these avenues – the human capacity and receptivity for surprise; the need for new creative computational literacies – to contend with the sea change we are living through. It’s a challenge that goes beyond coming to terms with algorithms, and one that might become a question of survival on an intellectual level, if not a physical one.
The remarkable precipice we stand beside now is one where our tools are, in a transformative way, just as plastic as we are. Our algorithmic systems are watching us, learning from us, just as we learn from them, creating the possibility for a complex dance of intention, anticipation, creativity and emergence based on individual people, algorithms, and the social and technical structures that bracket them all. This is terrifying and breathtaking all at once, and it’s artists that we need most of all to make sense of a future in which our collaborators are strange mirror machines of ourselves. Aesthetics has always been the unforgiving terrain where we assess pragmatic reality according to the impossible standards of the world as we wish it would be. Computation is a parallel project, grounded in the impossible beauty of abstract mathematics and symbolic systems. As they come together, we need to remain the creators, and not the creations, of our beautiful machines.
Just what is information? For such an intuitive idea, its precise nature proved remarkably hard to pin down. For centuries, it seemed to hover somewhere in a half-world between the visible and the unseen, the physical and the evanescent, the enduring medium and its fleeting message. It haunted the ancients as much as it did Claude Shannon and his Bell Labs colleagues in New York and New Jersey, who were trying to engirdle the world with wires and telecoms cables in the mid-20th century.
Shannon – mathematician, American, jazz fanatic, juggling enthusiast – is the founder of information theory, and the architect of our digital world. It was Shannon’s paper ‘A Mathematical Theory of Communication’ (1948) that introduced the bit, an objective measure of how much information a message contains. It was Shannon who explained that every communications system – from telegraphs to television, and ultimately DNA to the internet – has the same basic structure. And it was Shannon who showed that any message could be compressed and transmitted via a binary code of 0s and 1s, with near-perfect accuracy, a notion that was previously pegged as hopelessly utopian. As one of Shannon’s colleagues marvelled: ‘How he got that insight, how he even came to believe such a thing, I don’t know.’
These discoveries were scientific triumphs. But in another way, they brought the thinking about information full-circle. Before it was the province of natural scientists, ‘information’ was a concept explored by poets, orators and philosophers. And while Shannon was a mathematician and engineer by training, he shared with these early investigators a fascination with language.
In the Aeneid, for example, the Roman poet Virgil describes the vast cave inhabited by the god Vulcan and his worker-drones the Cyclopes, in which the lightning bolt of Jupiter is informatum – forged or given shape beneath their hammers. To in-form meant to give a shape to matter, to fit it to an ideal type; informatio was the shape given. It’s in this sense that Cicero spoke of the arts by which young people are ‘informed in their humanity’, and in which the Church Father Tertullian calls Moses populi informator, the shaper of the people.
From the Middle Ages onwards, this form-giving aspect of information slowly gave way, and it acquired a different, more earthy complexion. For the medieval scholastics, it became a quintessentially human act; information was about the manipulation of matter already on Earth, as distinct from the singular creativity of the Creator Himself. Thomas Aquinas said that the intellect and the virtues – but also the senses – needed to be informed, enriched, stimulated. The scientific revolution went on to cement these perceptible and grounded features of information, in preference to its more divine and form-giving aspects. When we read Francis Bacon on ‘the informations of the senses’, or hear John Locke claim that ‘our senses inform us’, we feel like we’re on familiar ground. As the scholar John Durham Peters wrote in 1988: ‘Under the tutelage of empiricism, information gradually moved from structure to stuff, from form to substance, from intellectual order to sensory impulses.’
It was as the study of the senses that a dedicated science of information finally began to stir. While Lord Kelvin was timing the speed of telegraph signals in the 1850s – using mechanisms rigged with magnets, mirrors, metal coils and cocoon silk – Hermann von Helmholtz was electrifying frog muscles to test the firing of animal nerves. And as information became electric, the object of study became the boundary between the hard world of physics and the elusive nature of the messages carried in wires.
In the first half of the 20th century, the torch passed to Bell Labs in the United States, the pioneering communications company that traced its origins to Alexander Graham Bell. Shannon joined in 1941, to work on fire control and cryptography during the Second World War. Outside of wartime, most of the Labs’ engineers and scientists were tasked with taking care of the US’ transcontinental telephone and telegraph network. But the lines were coming under strain as the human appetite for interaction pushed the Bell system to go further and faster, and to transmit messages of ever-higher quality. A fundamental challenge for communication-at-a-distance was ‘noise’, unintended fluctuations that could distort the quality of the signal at some point between the sender and receiver. Conventional wisdom held that transmitting information was like transmitting power, and so the best solution was essentially to shout more loudly – accepting noise as a fact of life, and expensively and precariously pumping out a more powerful signal.
But some people at the Labs thought the solution lay elsewhere. Thanks to its government-guaranteed monopoly, the Labs had the leeway to invest in basic theoretical research, even if the impact on communications technology lay many years in the future. As the engineer Henry Pollak told us in an interview: ‘When I first came, there was the philosophy: look, what you’re doing might not be important for 10 years or 20 years, but that’s fine, we’ll be there then.’ As a member of the Labs’ free-floating mathematics group, after the war Shannon found that he could follow his curiosity wherever it led: ‘I had freedom to do anything I wanted from almost the day I started. They never told me what to work on.’
In this spirit, a number of Bell Labs mathematicians and engineers turned from telegraphs and telephones to the more fundamental matter of the nature of information itself. They began to think about information as measuring a kind of freedom of choice, in which the content of a communication is tied to the range of what it excluded. In 1924, the Labs engineer Harry Nyquist used this line of reasoning to show how to increase the speed of telegraphy. Three years later, his colleague Ralph Hartley took those results to a higher level of abstraction, describing how sending any message amounts to making a selection from a pool of possible symbols. We can watch this rolling process of elimination at work even in such a simple sentence as ‘Apples are red’, Hartley said: ‘the first word eliminated other kinds of fruit and all other objects in general. The second directs attention to some property or condition of apples, and the third eliminates other possible colours.’
On this view, the information value of a message depends in part on the range of alternatives that were killed off in its choosing. Symbols chosen from a larger vocabulary of options carry more information than symbols chosen from a smaller vocabulary, because the choice eliminates a greater number of alternatives. This means that the amount of information transmitted is essentially a function of three things: the size of the set of possible symbols, the number of symbols sent per second, and the length of the message. The search for order, for structure and form in the wending catacombs of global communications had begun in earnest.
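Hartley's measure can be written compactly: a message of n symbols, each drawn from an alphabet of s possibilities, carries n × log2(s) bits. A sketch of the arithmetic (the function name is mine, for illustration):

```python
import math

def hartley_information(symbol_count: int, message_length: int) -> float:
    """Hartley's measure: information grows with the size of the symbol
    vocabulary and the length of the message. Result is in bits."""
    return message_length * math.log2(symbol_count)

# A 10-character message from a 27-symbol alphabet (26 letters plus space)
# eliminates far more alternatives than one from a binary alphabet:
print(hartley_information(27, 10))  # ~47.5 bits
print(hartley_information(2, 10))   # 10.0 bits
```

Doubling the message length doubles the information; enlarging the alphabet helps only logarithmically, because each symbol choice rules out alternatives multiplicatively.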
Enter Shannon’s 1948 paper, later dubbed ‘the Magna Carta of the Information Age’ by Scientific American. Although the theoretical merits of Shannon’s breakthrough were recognised immediately, its practical fruits would come to ripen only over the following decades. Strictly speaking, it wasn’t necessary to solve the immediate problem of placing a long-distance call, or even required for the unveiling of the first transatlantic telephone cable in 1956. But it would be necessary for solving the 1990 problem of transmitting a photograph from the edge of the solar system back to Earth across 4 billion miles of void, or for addressing the 2017 problem of streaming a video on a computer that fits in your pocket.
A clue to the origins of Shannon’s genius can be found in the sheer scope of his intellectual interests. He was a peculiar sort of engineer – one known for juggling and riding a unicycle through Bell Labs’ corridors, and whose creations included a flame-throwing trumpet, a calculator called ‘THROBAC’ (short for ‘Thrifty Roman-Numeral Backward-Looking Computer’) that operated in Roman numerals, and a mechanical mouse named Theseus that could locate a piece of metallic cheese in a maze. Genetics, artificial intelligence, computer chess, jazz clarinet and amateur poetry numbered among his other pursuits. Some of these predated his work on information theory, while he turned to others later in life. But what remained constant was Shannon’s ability to be as captivated by the capering spectacle of life and language as he was by physics and numbers. Information had begun, after all, as a philosophical term of art, and getting at its foundations entailed just the sort of questions we might expect a linguist or a philosopher to take up.
Shannon’s ‘mathematical theory’ sets out two big ideas. The first is that information is probabilistic. We should begin by grasping that information is a measure of the uncertainty we overcome, Shannon said – which we might also call surprise. What determines this uncertainty is not just the size of the symbol vocabulary, as Nyquist and Hartley thought. It’s also about the odds that any given symbol will be chosen. Take the example of a coin-toss, the simplest thing Shannon could come up with as a ‘source’ of information. A fair coin carries two choices with equal odds; we could say that such a coin, or any ‘device with two stable positions’, stores one binary digit of information. Or, using an abbreviation suggested by one of Shannon’s co-workers, we could say that it stores one bit.
But the crucial step came next. Shannon pointed out that most of our messages are not like fair coins. They are like weighted coins. A biased coin carries less than one bit of information, because the result of any flip is less surprising. Shannon illustrated the point with a graph of information against bias: the amount of information conveyed by a coin flip reaches its apex when the odds are 50-50, and as the outcome grows more predictable in either direction, depending on the size of the bias, the information carried by the coin steadily declines.
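The curve Shannon plotted is what is now called the binary entropy function, H(p) = −p·log2(p) − (1−p)·log2(1−p). A minimal sketch:

```python
import math

def coin_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# The curve peaks at one full bit for a fair coin and falls off
# symmetrically as the coin becomes more biased:
for p in (0.5, 0.7, 0.9, 0.99):
    print(f"p={p}: {coin_entropy(p):.3f} bits")
```

A coin weighted 99-1 conveys only about 0.08 bits per flip: you almost always know the answer before you ask.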
The messages humans send are more like weighted coins than unweighted coins, because the symbols we use aren’t chosen at random, but depend in probabilistic ways on what preceded them. In images that resemble something other than TV static, dark pixels are more likely to appear next to dark pixels, and light next to light. In written messages that are something other than random strings of text, each letter has a kind of ‘pull’ on the letters that follow it.
This is where language enters the picture as a key conceptual tool. Language is a perfect illustration of this rich interplay between predictability and surprise. We communicate with one another by making ourselves predictable, within certain limits. Put another way, the difference between random nonsense and a recognisable language is the presence of rules that reduce surprise.
Shannon demonstrated this point in the paper by doing an informal experiment in ‘machine-generated text’, playing with probabilities to create something resembling the English language from scratch. He opened a book of random numbers, put his finger on one of the entries, and wrote down the corresponding character from a 27-symbol ‘alphabet’ (26 letters, plus a space):

XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.

Each character was chosen randomly and independently, with no letter exerting a ‘pull’ on any other. This is the printed equivalent of static – what Shannon called ‘zero-order approximation’.
But, of course, we don’t choose our letters with equal probability. About 12 per cent of English text is comprised of the letter ‘E’, and just 1 per cent of the letter ‘Q’. Using a chart of letter frequencies that he had relied on in his cryptography days, Shannon recalibrated the odds for every character, so that 12 per cent of entries in the random-number book, for instance, would indicate an ‘E’. Beginning again with these more realistic odds, he arrived at what he called ‘first-order approximation’:
OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL.
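The first-order procedure is easy to reproduce with a weighted random draw. The frequency table below uses standard modern estimates rather than Shannon's own chart, and the function name is mine:

```python
import random

# Approximate English character frequencies, in percent (standard modern
# estimates, not Shannon's exact cryptography-era table).
FREQ = {
    'E': 12.7, 'T': 9.1, 'A': 8.2, 'O': 7.5, 'I': 7.0, 'N': 6.7,
    'S': 6.3, 'H': 6.1, 'R': 6.0, 'D': 4.3, 'L': 4.0, 'C': 2.8,
    'U': 2.8, 'M': 2.4, 'W': 2.4, 'F': 2.2, 'G': 2.0, 'Y': 2.0,
    'P': 1.9, 'B': 1.5, 'V': 1.0, 'K': 0.8, 'J': 0.15, 'X': 0.15,
    'Q': 0.10, 'Z': 0.07, ' ': 18.0,
}

def first_order(n: int, seed: int = 0) -> str:
    """Draw n characters independently, weighted by frequency --
    Shannon's 'first-order approximation'."""
    rng = random.Random(seed)
    return ''.join(rng.choices(list(FREQ), weights=list(FREQ.values()), k=n))

print(first_order(60))
```

The output is still gibberish, but gibberish with an English texture: lots of Es, Ts and spaces, almost no Qs or Zs.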
We also know that some two-letter combinations, called ‘bigrams’, are much likelier than others: ‘K’ is common after ‘C’, but almost impossible after ‘T’; a ‘Q’ demands a ‘U’. Shannon had tables of these bigram frequencies, but rather than repeat the cumbersome process, he took a cruder tack. To construct a text with reasonable bigram frequencies, ‘one opens a book at random and selects a letter at random on the page. This letter is recorded. The book is then opened to another page and one reads until this [first] letter is encountered [again]. The succeeding letter is then recorded. Turning to another page this second letter is searched for and the succeeding letter is recorded, etc.’
Shannon didn’t specify the book he used, but any non-technical book in English should offer roughly similar results. If all goes well, the text that results reflects the odds with which one character follows another in English. This is ‘second-order approximation’:
ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.
Out of nothing, a probabilistic process has blindly created five English words (‘on’, ‘are’, ‘be’, ‘at’, and, at a stretch, ‘Andy’).
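Shannon's page-flipping trick amounts to what we would now call a Markov chain over characters: record which letters follow which in a source text, then walk the chain, choosing uniformly among recorded successors. A minimal sketch, with a short repeated pangram standing in for Shannon's book (any longer English text gives richer output):

```python
import random
from collections import defaultdict

def second_order(corpus, n):
    """Second-order approximation: for each character, record every
    character that follows it in the corpus, then walk the chain.
    Picking uniformly among recorded successors reproduces the
    corpus's bigram frequencies, as opening a book at random does."""
    followers = defaultdict(list)
    for a, b in zip(corpus, corpus[1:]):
        followers[a].append(b)
    ch = random.choice(corpus)
    out = [ch]
    for _ in range(n - 1):
        ch = random.choice(followers[ch]) if followers[ch] else random.choice(corpus)
        out.append(ch)
    return "".join(out)

sample_corpus = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG " * 5
print(second_order(sample_corpus, 40))
```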
‘Third-order approximation’, using the same method to search for trigrams, brings us even closer to passable English:
IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.
You can do the same thing with the words themselves. From the perspective of information theory, words are simply strings of characters that are more likely to occur together. Here is ‘first-order word approximation’, in which Shannon chose whole words based on their frequency in printed English:
REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.
But just as letters exert ‘pull’ on nearby letters, words exert ‘pull’ on nearby words. Finally, then, Shannon turned to ‘second-order word approximation’, choosing a random word, flipping forward in his book until he found another instance, and then recording the word that appeared next:
THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.
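Word-level approximation works the same way; only the unit of the chain changes. The toy corpus below is a hypothetical stand-in for the book Shannon flipped through:

```python
import random
from collections import defaultdict

def second_order_words(text, n):
    """Second-order word approximation: map each word to the words
    that follow it, pick a word at random, then repeatedly pick a
    random recorded successor -- Shannon's page-flipping, automated."""
    words = text.split()
    nxt = defaultdict(list)
    for a, b in zip(words, words[1:]):
        nxt[a].append(b)
    w = random.choice(words)
    out = [w]
    for _ in range(n - 1):
        w = random.choice(nxt[w]) if nxt[w] else random.choice(words)
        out.append(w)
    return " ".join(out)

corpus = ("the head of the attack on an english writer is another method "
          "for the letters and the character of the problem at the time")
print(second_order_words(corpus, 12))
```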
Comparing the beginning of the experiment with its end – ‘XFOML RXKHRJFFJUJ’ versus ‘ATTACK ON AN ENGLISH WRITER’ – sheds some light on the difference between the technical and colloquial significance of ‘information’. We might be tempted to say that ‘ATTACK ON AN ENGLISH WRITER’ is the more informative of the two phrases. But it would be better to call it more meaningful. In fact, it is meaningful to English speakers precisely because each character is less surprising, that is, it carries less (Shannon) information. In ‘XFOML RXKHRJFFJUJ’, on the other hand, each character has been chosen in a way that is unconstrained by frequency rules; it has been chosen from a 27-character set with the fairest possible odds. The choice of each character resembles the roll of a fair 27-sided die. It is analogous to the point at the top of Shannon’s parabolic graph, in which information is at a peak because the uncertainty of the outcome is maximised.
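The point can be made quantitative. Shannon's measure of information is entropy, -sum(p * log2(p)) bits per symbol, which peaks when all outcomes are equally likely. A short sketch (the skewed distribution below is invented for illustration):

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair 27-way choice, as in 'XFOML RXKHRJFFJUJ': every character
# is maximally surprising, so information per character peaks.
print(round(entropy([1 / 27] * 27), 2))  # log2(27) ≈ 4.75 bits

# Skewed odds, as in real English, carry fewer bits per character:
# the slide down from the top of Shannon's parabola.
print(round(entropy([0.5, 0.25] + [0.25 / 25] * 25), 2))
```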
What does any of this have to do with the internet, or with any kind of system for transmitting information? That question brings us to Shannon’s second key insight: we can take advantage of messages’ redundancy. Because the symbols in real-world messages are more predictable, many don’t convey new information or surprise. They are exactly where we expect them to be, given our familiarity with ‘words, idioms, clichés and grammar’. Letters can be harmlessly excised from written English, for example: ‘MST PPL HV LTTL DFFCLTY N RDNG THS SNTNC’, as Shannon aptly noted. Words can also be redundant: in Shannon’s favourite boyhood story, Edgar Allan Poe’s ‘The Gold-Bug’ (1843), a treasure hunter takes advantage of repeated strings of characters to crack a pirate’s coded message and uncover a buried hoard.
Shannon expanded this point by turning to a pulpy Raymond Chandler detective story, ‘Pickup on Noon Street’, in a subsequent paper in 1951. Just as the rules of spelling and grammar add redundancies to human languages, so too can the vague but pervasive expectations of context. He flipped to a random passage in Chandler’s story, which he then read out letter by letter to his wife, Betty. Her role was to guess each subsequent letter until she got the letter right, at which point Shannon moved on to the next one. Hard as this was at the beginning of words, and especially at the beginning of sentences, Betty’s job grew progressively easier as context accumulated. For instance, by the time they arrived at ‘A S-M-A-L-L O-B-L-O-N-G R-E-A-D-I-N-G L-A-M-P O-N T-H-E D’, she could guess the next three letters with perfect accuracy: E-S-K.
Betty’s correct guess tells us three things. First, we can be fairly certain that the letters E-S-K add no new information to the sentence; in this particular context, they are simply a formality. Second, a phrase beginning ‘a small oblong reading lamp on the’ is very likely to be followed by one of two letters: D, or Betty’s first guess, T (presumably for ‘table’). In a zero-redundancy language using our alphabet, Betty would have had only a 1-in-26 chance of guessing correctly; in our language, by contrast, her odds were closer to 1-in-2. Third, the sentence’s predictability goes even further: out of the hundreds of thousands of words in a typical English dictionary, just two candidates were extremely likely to conclude the phrase: ‘desk’ and ‘table’. There was nothing special about that sentence: it was an ordinary line from a random page in a random paperback. It showed, though, that to write is almost always to write ourselves into a corner. All in all, Shannon speculated that up to 75 per cent of written English text is redundant.
The predictability of our messages is fat to be cut – and since Shannon, our signals have travelled light. He didn’t invent the idea of redundancy. But he showed that consciously manipulating it is the key to both compressing messages and sending them with perfect accuracy. Compressing messages is simply the act of removing redundancy, leaving in place the minimum number of symbols required to preserve the message’s essence. We do so informally all the time: when we write shorthand, when we assign nicknames, when we invent jargon to compress a mass of meaning (‘the left-hand side of the boat when you’re facing the front’) into a single point (‘port’).
But Shannon paved the way to do this rigorously, by encoding our messages in a series of digital bits, each one represented by a 0 or 1. He showed that the speed with which we send messages depends not just on the kind of communication channel we use, but on the skill with which we encode our messages in bits. Moreover, he pointed the way toward some of those codes: those that take advantage of the probabilistic nature of information to represent the most common characters or symbols with the smallest number of bits. If we could not compress our messages, a single audio file would take hours to download, streaming videos would be impossibly slow, and hours of television would demand a bookshelf of tapes, not a small box of discs. All of this communication – faster, cheaper, more voluminous – rests on Shannon’s realisation of our predictability.
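Huffman coding, devised by David Huffman in 1952, is the classic realisation of that idea: a prefix-free code that hands the most common symbols the shortest strings of bits. A minimal sketch:

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a prefix-free code giving frequent symbols shorter bit strings."""
    # Each heap entry: [total count, tie-breaker, {symbol: bits-so-far}].
    heap = [[n, i, {sym: ""}] for i, (sym, n) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        # Merge the two rarest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + bits for s, bits in lo[2].items()}
        merged.update({s: "1" + bits for s, bits in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], tie, merged])
        tie += 1
    return heap[0][2]

text = "ATTACK ON AN ENGLISH WRITER"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(len(encoded), "bits, versus", 5 * len(text), "for a fixed five-bit code")
```

A fixed code for the 27-symbol alphabet needs five bits per character (2^5 = 32 &gt;= 27); the variable-length code beats it by spending fewer bits on common symbols such as ‘T’ and the space than on rare ones.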
The converse of that process, in turn, is what protects our messages from errors in transmission. Shannon showed that we can overcome noise by rethinking our messages: specifically, by adding redundancy. The key to perfect accuracy – to be precise, an ‘arbitrarily small’ rate of error, as he put it – lies not in how loudly we shout down the line, but in how carefully we say what we say.
Here’s a simple example of how it works (for which we’re indebted to the science historian Erico Marui Guizzo). If we want to send messages in a four-letter alphabet, we might start by trying the laziest possible code, assigning each letter two bits:
A = 00 B = 01 C = 10 D = 11
But noise in our communication system – whether through a burst of static, interference from the atmosphere, or physical damage to the channel – can falsify bits, turning a 0 into a 1. If just one of the bits representing C ‘flipped’, C would vanish somewhere between sender and receiver: it would emerge as B or D, with the receiver none the wiser. It would take just two such flips to turn ‘DAD’ to ‘CAB’.
Ordinary languages, though, happen to be very good at solving this problem. If you read ‘endividual’, you recognise it as a kind of transmission error – a typo – and not an entirely new word. Shannon showed that we can import this error-proofing feature from language into digital codes, strategically adding bits. In the case of the four-letter language, we could use a code like this:
A = 00000 B = 00111 C = 11100 D = 11011
Now any letter could sustain damage to any one bit and still resemble itself more than any other letter. It takes fully three errors to turn one letter into another. Our new code resists noise in a way our first one did not, and we were not forced to pump any more power into our medium of communication. As long as we respect the inherent ‘speed limit’ of the communication channel (a limit in bits per second that Shannon also defined), there is no limit on our accuracy, no degree of noise through which we cannot make ourselves heard. Shannon had not discovered the precise codes for doing so in practice – or the ways to combine codes that compress with codes that correct errors – but he proved that such codes must exist.
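The error-resistance of the five-bit code can be checked directly. Decoding works by nearest neighbour: pick the letter whose codeword differs from the received word in the fewest positions. A sketch:

```python
CODE = {"A": "00000", "B": "00111", "C": "11100", "D": "11011"}

def hamming(x, y):
    """Count the bit positions in which two words differ."""
    return sum(a != b for a, b in zip(x, y))

def decode(received):
    """Nearest-neighbour decoding: the letter whose codeword needs the
    fewest bit flips to match what came down the line."""
    return min(CODE, key=lambda letter: hamming(CODE[letter], received))

# Any two codewords differ in at least three positions, so a single
# flipped bit still leaves the damaged word closest to its original:
print(decode("11110"))  # 'C' (11100) with its fourth bit flipped -> C
```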
And so they did. In the long-horizon spirit of Bell Labs, the codes would take decades to develop, and indeed, the questions posed by Shannon remain objects of study for engineers and information theorists to this day. Nevertheless, Shannon had shown that by converting any message – audio, visual, or textual – into digital bits, we could communicate anything of any complexity to anyone at any distance. His work is the reason these words are appearing on your screen.
Though it was marked by intellectual freedom and scientific celebrity, Shannon’s life did end with an acute unfairness. As the digital world he inaugurated began to flourish in the 1990s, Shannon was slipping into Alzheimer’s disease. His awareness of the practical payoff of his work was severely and tragically limited. But as to the ambition and sheer surprise of the work itself, no one put it better than his colleague John Pierce: ‘it came as a bomb’.
The source of that power lay in Shannon’s eagerness to explore information’s deep structure. In his hands, it became much more than a mechanical problem of moving messages from place to place; it began with rethinking what it means to communicate and to inform, in the classical sense. It was an old question – but now we live in an age of information, in large part, because of how Shannon gave that old question new life.