Specified Complexity Made Simple

Historical Backdrop

Specified complexity is the legitimate offspring of the mathematical theory of information. You wouldn’t know that, however, from reading what’s said about it on the internet. Take the opening sentence of the Wikipedia article on specified complexity: “Specified complexity is a creationist argument introduced by William Dembski, used by advocates to promote the pseudoscience of intelligent design.” It’s hard to imagine any other single sentence better crafted to discredit specified complexity. What further need is there to understand or engage this concept if that’s what it is?

In fact, specified complexity is a bona fide information measure. Yes, specified complexity is applicable to intelligent design, but its definition and properties make sense independently of its applications. In this article I want to define what specified complexity is, establish that it belongs squarely to standard information theory, review some intuitively clear applications of it, and show why it is an important concept even apart from its applications to intelligent design.

First off, despite my widely acknowledged association with the term specified complexity, let’s be clear that I didn’t invent it. Critics of intelligent design treat specified complexity as a con man pretending to be a real scientist. Thus they liken it to debunked scientific concepts such as phlogiston or pseudosciences such as phrenology.

But if you think of specified complexity as a job seeker with a resumé, the people put down as references (to vouch for the concept) are actually quite impressive. Indeed, some very respectable scientists were, for a time, happy to associate their names and reputations with the concept. Given its pedigree, there is no way to justify treating specified complexity as a bastard child of real science.

Biologist Leslie Orgel, a colleague of Francis Crick, introduced the term in 1973 in a book on the origin of life. Crick himself had toyed with a less developed version of the concept as far back as 1958—see his paper “On Protein Synthesis” where he writes, “By information I mean the specification of the amino acid sequence of the protein.” In the years immediately following specified complexity’s coinage by Orgel, the underlying idea was widely accepted even if incompletely understood.

With the term specified complexity, Orgel was trying to understand three distinct types of order:

Repetitive Order: BOATBOATBOATBOATBOATBOATBOATBOATBOATBOAT. Example in nature: A salt crystal.

Random Order: SBIPDYAQBUKHQFLYRTXHBIWGJNSCPVMZDLEGKYAC. Example in nature: Mixture of random polymers.

Complex Specified Order: THISSEQUENCEOFLETTERSISACARRIEROFMEANING. Example in nature: A DNA sequence coding for a protein.

In these examples, each sequence comprises 40 capital Roman letters. In the first, the word BOAT is repeated 10 times. Because BOAT is a known word, it may be regarded as specified in our language. Yet because the word BOAT is short and keeps being repeated, the entire sequence may also be regarded as simple. This, then, is an example of specified simplicity, which is therefore distinct from specified complexity.

Note that we might have used the more random looking QOZK in place of BOAT and then generated the 40-letter sequence QOZKQOZKQOZKQOZKQOZKQOZKQOZKQOZKQOZKQOZK. In that case, QOZK might be regarded as unspecified, and yet because of the repetition, the entire sequence would still be regarded as simple. Accordingly, this sequence would be an example of unspecified simplicity, which is therefore also distinct from specified complexity.

The example of random order, on the other hand, is neither specified nor simple. It is unspecified because there is no straightforward way to describe that sequence other than simply listing it in its entirety. Its generation, for instance, cannot be understood in terms of any easy formula. Compared to a four-letter word like BOAT, it is also relatively long, and therefore complex. It is an example of unspecified complexity, which is therefore distinct from specified complexity.

And finally, there’s the last example in which a sequence of 40 capital Roman letters spells a meaningful English sentence. It is specified in virtue of its use of English words arranged in grammatically correct syntax to make a meaningful statement. And yet there is no simple way to generate or account for it. It is an example of specified complexity.

I’m playing fast and loose with the terms specified and complexity in these examples, having yet to carefully define them. But these examples make clear the basic intuition underlying the concept. Moreover, these are exactly the sorts of examples Orgel had in mind when he introduced the concept. And indeed, for about thirty years after Orgel introduced the concept, specified complexity garnered wide respect in the scientific community.

As an example of the respect shown to specified complexity in the first three decades after Orgel introduced the term, consider well-known physicist and science writer Paul Davies, who in his 1999 book on the origin of life (The Fifth Miracle) wrote: “Living organisms are mysterious not for their complexity per se, but for their tightly specified complexity.” Unwilling as they might be to attribute specified complexity to intelligence, scientists during that period at least understood that the emergence of specified complexity posed a challenge that needed to be explained.

So what happened to change the fortunes of specified complexity in the mainstream scientific community? The intelligent design movement happened. Design theorists Charles Thaxton, Walter Bradley, and Roger Olsen got the ball rolling in the 1980s in their book The Mystery of Life’s Origin, noting that specified complexity raised a significant problem for the origin of life. They then went further by suggesting that intelligence should be regarded as a viable explanation in accounting for it.

Separately, in the first edition of my book The Design Inference (Cambridge, 1998), I argued that specification and small probability combined to produce a reliable criterion for inferring intelligent agency. Although the term specified complexity did not occur in that book, it was there implicitly. The “specified” of “specified complexity” was there in the form of specification. And the small probability corresponded precisely to increased complexity as an information-theoretic notion. More on this shortly.

But the game changer in revising specified complexity’s scientific fortunes occurred in 2002. That year I published the sequel to The Design Inference. Reviewed in Nature and titled No Free Lunch, it was subtitled Why Specified Complexity Cannot Be Purchased Without Intelligence. All cards were now on the table. Scientists who rejected intelligent design and yet had previously seen merit in specified complexity now withdrew their support of the concept.

Before proceeding, I want to underscore that in formalizing specified complexity as a precise information-theoretic concept, I attempted to preserve Orgel’s original intent. Critics of intelligent design, such as mathematician Jason Rosenhouse, suggest that I misappropriated Orgel, but I didn’t. Orgel appealed to information theory in formulating specified complexity, but in doing so he made some mistakes. At the end of this article, I’ll compare Orgel’s pre-theoretic version of specified complexity with the theoretic version described here. Readers can then see clearly what he was trying to do and how specified complexity, as developed here, improves on his work.

Intuitive Specified Complexity

Even though this article is titled “Specified Complexity Made Simple,” there’s a limit to how much the concept of specified complexity may be simplified before it can no longer be adequately defined or explained. Accordingly, specified complexity, even when made simple, will still require the introduction of some basic mathematics, such as exponents and logarithms, as well as an informal discussion of information theory, especially Shannon and Kolmogorov information. I’ll get to that in the subsequent sections.

At this early stage in the discussion, however, it seems wise to lay out specified complexity in a convenient non-technical way. That way readers lacking mathematical and technical facility will still be able to grasp the gist of specified complexity. Just as all English speakers are familiar with the concept of prose even if they’ve never thought about how it differs from poetry, so too we are all familiar with specified complexity even if we haven’t carefully defined it or provided a precise formal mathematical account of it.

In this section I’ll present a user-friendly account of specified complexity by means of intuitively compelling examples. Even though non-technical readers may be inclined to skip the rest of this article, I would nonetheless encourage all readers to dip into the subsequent sections, if only to persuade themselves that specified complexity has a sound rigorous basis to back up its underlying intuition.

To get the ball rolling, let’s consider an example by internet personality David Farina, known popularly as “Prof. Dave.” In arguing against the use of small probability arguments to challenge Darwinian evolutionary theory, Farina offers the following example:

Let’s say 10 people are having a get-together, and they are curious as to what everyone’s birthday is. They go down the line. One person says June 13th, another says November 21st, and so forth. Each of them have a 1 in 365 chance of having that particular birthday. So, what is the probability that those 10 people in that room would have those 10 birthdays? Well, it’s 1 in 365 to the 10th power, or 1 in 4.2 times 10 to the 25, which is 42 trillion trillion. The odds are unthinkable, and yet there they are sitting in that room. So how can this be? Well, everyone has to have a birthday.

Farina’s use of the term “unthinkable” brings to mind Vizzini in The Princess Bride. Vizzini keeps uttering the word “inconceivable” in reaction to a man in black (Westley) steadily gaining ground on him and his henchmen. Finally, his henchman Inigo Montoya remarks, “You keep using that word. I do not think it means what you think it means.”

Similarly, in contrast to Farina, an improbability of 1 in 42 trillion trillion is in fact quite thinkable. Right now you can do even better than this level of improbability. Get out a fair coin and toss it 100 times. That’ll take you a few minutes. You’ll witness an event unique in the history of coin tossing and one having a probability of 1 in 2^100, which is roughly 1 in 10 to the 30, or 1 in a million trillion trillion.
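
For readers who want to check the two improbabilities just quoted, here is a minimal Python sketch. The function name one_in is hypothetical, and the model (independent trials with equally likely outcomes, 365 possible birthdays) is the same simplification that Farina's example relies on.

```python
def one_in(outcomes_per_trial: int, trials: int) -> int:
    """Return N such that one specific sequence of independent,
    equally likely outcomes has probability 1/N."""
    return outcomes_per_trial ** trials

birthdays = one_in(365, 10)    # ten specific birthdays
coin_tosses = one_in(2, 100)   # one specific sequence of 100 fair coin tosses

print(f"10 birthdays:    1 in {birthdays:.2e}")
print(f"100 coin tosses: 1 in {coin_tosses:.2e}")
print("Coin sequence is the less probable event:", coin_tosses > birthdays)
```

The coin-tossing figure exceeds the birthday figure by more than four orders of magnitude, which is the sense in which the coin tosses "do even better."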

The reason Farina’s improbability is quite thinkable is that the event to which it is tied is unspecified. As he puts it, “One person says June 13th, another says November 21st, and so forth.” The “and so forth” here is a giveaway that the event is unspecified.

But now consider a variant of Farina’s example: Imagine that each of his ten people confirmed that their birthday was January 1. The probability would in this case again be 1 in 42 trillion trillion. But what’s different now is that the event is specified. How is it specified? It is specified in virtue of having a very short description, namely, “everyone here was born New Year’s Day.”

The complexity in specified complexity refers to probability: the greater the complexity, the smaller the probability. There is a precise information-theoretic basis for this connection between probability and complexity that we’ll examine in the next section. Accordingly, because the joint probability of any ten birthdays is quite low, their complexity will be quite high. Nothing surprising here.

For things to get interesting with birthdays, complexity needs to be combined with specification. A specification is a salient pattern that we should not expect a highly complex event to match simply by chance. Clearly, a large group of people that all share the same birthday did not come together by chance. But what exactly is it that makes a pattern salient so that, in the presence of complexity, it becomes an instance of specified complexity and thereby defeats chance?

That’s the whole point of specified complexity. Sheer complexity, as Farina’s example shows, cannot defeat chance. So too, the absence of complexity cannot defeat chance. For instance, if we learn that a single individual has a birthday on January 1, we wouldn’t regard anything as amiss or afoul. That event is simple, not complex, in the sense of probability. Leaving aside leap years and seasonal effects on birth rates, 1 out of 365 people will on average have a birthday on January 1. With a worldwide population of 8 billion people, many people will have that birthday.

But a group of exactly 10 people all in the same room all having a birthday of January 1 is a different matter. We would not ascribe such a coincidence to chance. But why? Because the event is not just complex but also specified. And what makes a complex event also specified—or conforming to a specification—is that it has a short description. In fact, we define specifications as patterns with short descriptions.

Such a definition may seem counterintuitive, but it actually makes good sense of how we eliminate chance in practice. The fact is, any event (and by extension any object or structure produced by an event) is describable if we allow ourselves a long enough description. Any event, however improbable, can therefore be described. But most improbable events can’t be described simply. Improbable events with simple descriptions draw our attention and prod us to look for explanations other than chance.

Take Mount Rushmore. It could be described in detail as follows: for each cubic micrometer in a large cube that encloses the entire monument, register whether it contains rock or is empty of rock (treating partially filled cubic micrometers, let us stipulate, as empty). Mount Rushmore can be enclosed in a cube of under 50,000 cubic meters. Moreover, each cubic meter contains a million trillion cubic micrometers. Accordingly, 50 billion trillion filled-or-empty cells could describe Mount Rushmore in detail. Thinking of each filled-or-empty cell as a bit then yields 50 billion trillion bits of information. That’s more information than contained in the entire world wide web (there are currently 2 billion websites globally).
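
The bit count in this paragraph is plain unit arithmetic, and can be checked with a short Python sketch. The constant names are hypothetical; the 50,000 cubic meter figure is taken from the text as given, not independently verified.

```python
CUBE_VOLUME_M3 = 50_000            # enclosing volume as stated in the text
MICROMETERS_PER_METER = 10 ** 6

# Cubic micrometers in one cubic meter: (10^6)^3 = 10^18, "a million trillion".
cells_per_m3 = MICROMETERS_PER_METER ** 3

# One filled-or-empty bit per cubic micrometer.
total_bits = CUBE_VOLUME_M3 * cells_per_m3

billion, trillion = 10 ** 9, 10 ** 12
print(f"{total_bits:.1e} bits")
print("Equals 50 billion trillion:", total_bits == 50 * billion * trillion)
```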

But of course, nobody attempts to describe Mount Rushmore that way. Instead, we describe it succinctly as “a giant rock formation that depicts the US presidents George Washington, Thomas Jefferson, Abraham Lincoln, and Teddy Roosevelt.” That’s a short description. At the same time, any rock formation the size of Mount Rushmore will be highly improbable or complex. Mount Rushmore is therefore both complex and specified. That’s why, even if we knew nothing about the history of Mount Rushmore’s construction, we would refuse to attribute it to the forces of chance (such as wind and erosion) and instead attribute it to design.

Consider a few more examples in this vein. Take the game of poker. There are 2,598,960 distinct possible poker hands, and so the probability of any poker hand is 1/2,598,960. Consider now two short descriptions, namely, “royal flush” and “single pair.” These descriptions have roughly the same description length. Yet there are only 4 ways of getting a royal flush and 1,098,240 ways of getting a single pair. This means the probability of getting a royal flush is 4/2,598,960 = .00000154 but the probability of getting a single pair is 1,098,240/2,598,960 = .423. A royal flush is therefore much more improbable than a single pair.
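
The poker counts above follow from standard combinatorics. The sketch below reproduces them with Python's math.comb; the variable names are hypothetical, and the counting rule for a single pair (choose the paired rank and its two suits, then three other distinct ranks in any suit) is the usual textbook one.

```python
from math import comb

total_hands = comb(52, 5)   # all five-card hands from a 52-card deck

royal_flushes = 4           # ten through ace, one per suit

# Single pair: choose the paired rank (13), two of its four suits,
# then three other distinct ranks (from the remaining 12), each in any of 4 suits.
single_pairs = 13 * comb(4, 2) * comb(12, 3) * 4 ** 3

p_royal = royal_flushes / total_hands
p_pair = single_pairs / total_hands

print(f"hands: {total_hands:,}")
print(f"royal flush: {p_royal:.8f}")
print(f"single pair: {p_pair:.3f}")
```

Both descriptions are about equally short, yet the events they pick out differ in probability by a factor of more than 270,000.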

Suppose now that you are playing a game of poker and you come across these two hands, namely, a royal flush and a single pair. Which are you more apt to attribute to chance? Which are you more apt to attribute to cheating, and therefore to design? Clearly, a single pair would, by itself, not cause you to question chance. It is specified in virtue of its short description. But because it is highly probable, and therefore not complex, it would not count as an instance of specified complexity.

Witnessing a royal flush, however, would elicit suspicion, if not an outright accusation of cheating (and therefore of design). Of course, given the sheer amount of poker played throughout the world, royal flushes will now and then appear by chance. But what raises suspicion that a given instance of a royal flush may not be the result of chance is its short description (a property it shares with “single pair”) combined with its complexity/improbability (a property it does not share with “single pair”).

Let’s consider one further example, which seems to have become a favorite among readers of the recently released second edition of The Design Inference. In the chapter on specification, my co-author Winston Ewert and I consider a famous scene in the film The Empire Strikes Back, which we then contrast with a similar scene from another film that parodies it. Quoting from the chapter:

Darth Vader tells Luke Skywalker, “No, I am your father,” revealing himself to be Luke’s father. This is a short description of their relationship, and the relationship is surprising, at least in part because the relationship can be so briefly described. In contrast, consider the following line uttered by Dark Helmet to Lone Starr in Spaceballs, the Mel Brooks parody of Star Wars: “I am your father’s brother’s nephew’s cousin’s former roommate.” The point of the joke is that the relationship is so complicated and contrived, and requires such a long description, that it evokes no suspicion and calls for no special explanation. With everybody on the planet connected by no more than “six degrees of separation,” some long description like this is bound to identify anyone.

In a universe of countless people, Darth Vader meeting Luke Skywalker is highly improbable or complex. Moreover, their relation of father to son, by being briefly described, is also specified. Their meeting therefore exhibits specified complexity and cannot be ascribed to chance. Dark Helmet meeting Lone Starr may likewise be highly improbable or complex. But given the convoluted description of their past relationship, their meeting represents an instance of unspecified complexity. If their meeting is due to design, it is for reasons other than their past relationship.

Before we move to a more formal treatment of specified complexity, we do well to ask how short is short enough for a description to count as a specification. How short should a description be so that combined with complexity it produces specified complexity? As it is, in the formal treatment of specified complexity, complexity and description length are both converted to bits, and then specified complexity can be defined as the difference of bits (the bits denoting complexity minus the bits denoting specification).

When specified complexity is applied informally, however, we may calculate a probability (or associated complexity) but we usually don’t calculate a description length. Rather, as with the Star Wars/Spaceballs example, we make an intuitive judgment that one description is short and natural, the other long and contrived. Such intuitive judgments have, as we will see, a formal underpinning, but in practice we let ourselves be guided by intuitive specified complexity, treating it as a convincing way to distinguish merely improbable events from those that require further scrutiny.

Shannon and Kolmogorov Information

The first edition of The Design Inference as well as its sequel, No Free Lunch, set the stage for defining a precise information-theoretic measure of specified complexity. There was, however, still more work to be done to clarify the concept. In both these books, specified complexity was treated as a combination of improbability or complexity on the one hand and specification on the other.

As presented back then, it was an oil-and-vinegar combination, with complexity and specification treated as two different types of things exhibiting no clear commonality. Neither book therefore formulated specified complexity as a unified information measure. Still, the key ideas for such a measure were in those earlier books. In this section, I review those key information-theoretic ideas. In the next section, I’ll join them into a unified whole.

Let’s start with complexity. As noted earlier, there’s a deep connection between probability and complexity. This connection is made clear in Shannon’s theory of information. In this theory, probabilities are converted to bits. To see how this works, consider tossing a coin 100 times, which yields an event of probability 1 in 2^100 (the caret symbol here denotes exponentiation). But that number also corresponds to 100 bits of information since it takes 100 bits to characterize any sequence of 100 coin tosses (think of 1 standing for heads and 0 for tails).

In general, any probability p corresponds to –log(p) bits of information, where the logarithm here and elsewhere in this article is to the base 2 (as needed to convert probabilities to bits). Think of a logarithm as an exponent: it’s the exponent to which you need to raise the base (here always 2) in order to get the number to which the logarithmic function is applied. Thus, for instance, a probability of p = 1/10 corresponds to an information measure of –log(1/10) ≈ 3.322 bits (or equivalently, 2^(–3.322) ≈ 1/10). Such fractional bits allow for a precise correspondence between probability and information measures.
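
The conversion from probability to bits is a one-line computation. Here is a minimal Python sketch; the function name shannon_bits is hypothetical and is introduced only for illustration.

```python
from math import log2

def shannon_bits(p: float) -> float:
    """Convert a probability p (0 < p <= 1) to bits of information, -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return -log2(p)

print(round(shannon_bits(1 / 10), 3))   # fractional bits for p = 1/10
print(shannon_bits(2 ** -100))          # 100 fair coin tosses
print(2 ** -shannon_bits(1 / 10))       # raising 2 to minus the bits recovers p
```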

The complexity in specified complexity is therefore Shannon information. Claude Shannon (1916–2001) introduced this idea of information in the 1940s to understand signal transmissions (mainly of bits, but also for other character sequences) across communication channels. The longer the sequence of bits transmitted, the greater the information and therefore its complexity.

Because of noise along any communication channel, the greater the complexity of a signal, the greater the chance of its distortion and thus the greater the need for suitable coding and error correction in transmitting the signal. So the complexity of the bit string being transmitted became an important idea within Shannon’s theory.

Shannon’s information measure is readily extended to any event E with a probability P(E). We then define the Shannon information of E as –log(P(E)) = I(E). Note that the minus sign is there to ensure that as the probability of E goes down, the information associated with E goes up. This is as it should be. Information is invariably associated with the narrowing of possibilities. The more those possibilities are narrowed, the more the probabilities associated with those possibilities decrease, but correspondingly the more the information associated with those narrowing possibilities increases.

For instance, consider a sequence of ten tosses of a fair coin and consider two events, E and F. Let E denote the event where the first five of these ten tosses all land heads but where we don’t know the remaining tosses. Let F denote the event where all ten tosses land heads. Clearly, F narrows down the range of possibilities for these ten tosses more than E does. Because E is only based on the first five tosses, its probability is P(E) = 2^(–5) = 1/(2^5) = 1/32. On the other hand, because F is based on all ten tosses, its probability is P(F) = 2^(–10) = 1/(2^10) = 1/1,024. In this case, the Shannon information associated with E and F is respectively I(E) = 5 bits and I(F) = 10 bits.
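
The values for E and F can also be confirmed by brute force, enumerating all 1,024 equally likely outcomes of ten tosses and counting those that fall in each event. This is a sketch only; the helper names are hypothetical.

```python
from itertools import product
from math import log2

# Every possible outcome of ten fair coin tosses, each equally likely.
outcomes = list(product("HT", repeat=10))

def info_bits(event) -> float:
    """Shannon information of an event, given as a predicate on outcomes."""
    favorable = sum(1 for seq in outcomes if event(seq))
    return -log2(favorable / len(outcomes))

def first_five_heads(seq) -> bool:   # event E
    return all(toss == "H" for toss in seq[:5])

def all_ten_heads(seq) -> bool:      # event F
    return all(toss == "H" for toss in seq)

I_E = info_bits(first_five_heads)
I_F = info_bits(all_ten_heads)
print(I_E, I_F)
```

F is contained in E, so it narrows the possibilities further and carries more bits.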

Shannon information, however, is not enough to understand specified complexity. For that, we also need Kolmogorov information, or what is also called Kolmogorov complexity. Andrei Kolmogorov (1903–1987) was the greatest probabilist of the 20th century. In the 1960s he tried to make sense of what it means for a sequence of numbers to be random. To keep things simple, and without loss of generality, we’ll focus on sequences of bits (since any numbers or characters can be represented by combinations of bits). Note that we made the same simplifying assumption for Shannon information.

The problem Kolmogorov faced was that any sequence of bits treated as the result of tossing a fair coin was equally probable. For instance, any sequence of 100 coin tosses would have probability 1/(2^100), or 100 bits of Shannon information. And yet there seemed to Kolmogorov a vast difference between the following two sequences of 100 coin tosses (letting 0 denote tails and 1 denote heads):

0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

[second sequence of 100 random-looking bits not preserved]




The first just repeats the same coin toss 100 times. It appears anything but random. The second, on the other hand, exhibits no salient pattern and so appears random (I got it just now from an online random bit generator). But what do we mean by random here? Is it that the one sequence is the sort we should expect to see from coin tossing but the other isn’t? But in that case, probabilities tell us nothing about how to distinguish the two sequences because they both have the same small probability of occurring.

Kolmogorov’s brilliant stroke was to understand the randomness of these sequences not probabilistically but computationally. Interestingly, the ideas animating Kolmogorov were in the air at that time in the mid 1960s. Thus, both Ray Solomonoff and Gregory Chaitin (then only a teenager) also came up with the same idea. Perhaps unfairly, Kolmogorov gets the lion’s share of the credit for characterizing randomness computationally. Most information-theory books (see, for instance, Cover and Thomas’s Elements of Information Theory), in discussing this approach to randomness, will therefore focus on Kolmogorov and put it under what is called Algorithmic Information Theory (AIT).

Briefly, Kolmogorov’s approach to randomness is to say that a sequence of bits is random to the degree that it has no short computer program that generates it. Thus, with the first sequence above, it is non-random since it has a very short program that generates it, such as a program that simply says “repeat ‘0’ 100 times.” On the other hand, there is no short program (so far as we can tell) that generates the second sequence.

It is a combinatorial fact (i.e., a fact about the mathematics of counting or enumerating possibilities) that the vast majority of bit sequences cannot be characterized by any program shorter than the sequence itself. Obviously, any sequence can be characterized by a program that simply incorporates the entire sequence and then simply regurgitates it. But such a program fails to compress the sequence. The non-random sequences, by having programs shorter than the sequences themselves, are thus those that are compressible. The first of the sequences above is compressible. The second, for all we know, isn’t.
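
The contrast between a program that regurgitates a sequence and one that compresses it can be made concrete. In the sketch below, program length is measured in characters of Python source. That is an assumption made purely for illustration: Kolmogorov complexity is defined relative to a fixed universal language, and a different choice of language shifts the lengths by a constant. The helper name run is hypothetical.

```python
import io
from contextlib import redirect_stdout

def run(program: str) -> str:
    """Execute a Python program and return what it prints (trailing newline dropped)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        exec(program, {})
    return buffer.getvalue().rstrip("\n")

sequence = "0" * 100

literal_program = f'print("{sequence}")'   # incorporates the entire sequence
short_program = 'print("0" * 100)'         # exploits the repetition

# Both programs generate the same 100-character sequence.
assert run(literal_program) == run(short_program) == sequence

print(len(literal_program), len(short_program))   # 109 16
```

The literal program is longer than the sequence it outputs, so it compresses nothing. The short program is a small fraction of the sequence's length.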

Kolmogorov’s information (also known as Kolmogorov complexity) is a computational theory because it focuses on identifying the shortest program that generates a given bit-string. Yet there is an irony here: it is rarely possible to say with certainty that a given bit string is truly random in the sense of having no program shorter than itself that generates it. From combinatorics, with its mathematical counting principles, we know that the vast majority of bit sequences must be random in Kolmogorov’s sense. That’s because the number of short programs is very limited and can only generate very few longer sequences. Most longer sequences will require longer programs.
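
The counting argument can be spelled out numerically. There are 2^n bit sequences of length n, but only 2^(n–k+1) – 1 binary programs of length at most n – k, and each program generates at most one sequence. The Python sketch below computes the resulting upper bound on the fraction of n-bit sequences that can be compressed by k or more bits. The function name is hypothetical, and programs are idealized here as arbitrary binary strings (a prefix-free language would allow even fewer).

```python
from fractions import Fraction

def max_fraction_compressible(n: int, k: int) -> Fraction:
    """Upper bound on the fraction of n-bit sequences generated by
    some binary program of length at most n - k bits."""
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    programs = 2 ** (n - k + 1) - 1   # binary strings of length 0 through n - k
    return Fraction(programs, 2 ** n)

bound = max_fraction_compressible(100, 10)
print(float(bound))                   # just under 1/512
print(bound < Fraction(1, 2 ** 9))
```

So fewer than 1 in 512 sequences of 100 bits can be compressed by even 10 bits, and the fraction halves with each additional bit of compression demanded.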

But if for an arbitrary bit sequence D we define K(D) as the length of the shortest program that generates D, it turns out that there is no computer program that calculates K(D). Simply put, the function K is non-computable. This fact from theoretical computer science matches up with our common experience that something may seem random for a time, and yet we can never be sure that it is random because we might discover a pattern clearly showing that the thing in fact isn’t random (think of an illusion that looks like a “random” inkblot only to reveal a human face on closer inspection).

Yet even though K is non-computable, in practice it is a useful measure, especially for understanding non-randomness. Because of its non-computability, K doesn’t help us to identify particular non-compressible sequences, these being the random sequences. Even with K as a well-defined mathematical function, we can’t in most cases determine precise values for it. Nevertheless, K does help us with the compressible sequences, in which case we may be able to estimate it even if we can’t exactly calculate it.

What typically happens in such cases is that we find a salient pattern in a sequence, which then enables us to show that it is compressible. To that end, we need a measure of the length of bit sequences as such. Thus, for any bit sequence D, we define |D| as its length (total number of bits). Because any sequence can be defined in terms of itself, |D| forms an upper bound on Kolmogorov complexity. Suppose now that through insight or ingenuity, we find a program that substantially compresses D. The length of that program, call it n, will then be considerably less than |D| — in other words, n < |D|.

Although this program length n will be much shorter than D, it’s typically not possible to show that this program of length n is the very shortest program that generates D. But that’s okay. Given such a program of length n, we know that K(D) cannot be greater than n because K(D) measures the very shortest such program. Thus, by finding some short program of length n, we’ll know that K(D) ≤ n < |D|. In practice, it’s enough to come up with a short program of length n that’s substantially less than |D|. The number n will then form an upper bound for K(D). In practice, we use n as an estimate for K(D). Such an estimate, as we’ll see, ends up in applications being a conservative estimate of Kolmogorov complexity.
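
One practical way to obtain such an upper bound n is to run an off-the-shelf compressor. The sketch below uses Python's zlib module as a stand-in. This is an illustrative choice, not part of the formal theory: a general-purpose compressor only ever yields an upper bound, and the bound ignores the fixed cost of the decompressor itself. The sequences here are 8,000 bits long rather than 100, an assumption made because zlib's fixed overhead would swamp very short inputs. The function name and the seeded pseudo-random generator are likewise hypothetical conveniences.

```python
import random
import zlib

def compressed_bits(data: bytes) -> int:
    """Length in bits of zlib's compressed form: an upper-bound estimate
    of description length, up to the fixed cost of the decompressor."""
    return 8 * len(zlib.compress(data, 9))

n_bytes = 1000                        # |D| = 8,000 bits

repetitive = bytes(n_bytes)           # 8,000 zero bits

rng = random.Random(0)                # seeded so the example is repeatable
random_looking = bytes(rng.getrandbits(8) for _ in range(n_bytes))

print("|D| =", 8 * n_bytes, "bits")
print("repetitive:     n =", compressed_bits(repetitive))
print("random-looking: n =", compressed_bits(random_looking))
```

For the repetitive sequence the compressor finds an n far below |D|. For the random-looking one it finds nothing shorter than the sequence itself, which is consistent with that sequence being incompressible but does not prove it.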

Specified Complexity as a Unified Information Measure

With the publication of the first edition of The Design Inference and its sequel No Free Lunch, elucidating the connection between design inferences and information theory became increasingly urgent. That there was a connection was clear. The first edition of The Design Inference sketched, in the epilogue, how the relation between specifications and small probability (complex) events mirrored the transmission of messages along a communication channel from sender to receiver. Moreover, in No Free Lunch, both Shannon and Kolmogorov information were explicitly cited in connection with specified complexity.

But even though specified complexity as characterized back then employed informational ideas, it did not constitute a clearly defined information measure. Specified complexity seemed like a kludge of ideas from logic, statistics, and information. Jay Richards, guest-editing a special issue of Philosophia Christi, asked me to clarify the connection between specified complexity and information theory. In response, I wrote an article titled “Specification: The Pattern That Signifies Intelligence,” which appeared in that journal in 2005.

In that article, I defined specified complexity as a single measure that combined under one roof all the key elements of the design inference, notably, small probability, specification, probabilistic resources, and universal probability bounds. Essentially, in the measure I articulated there, I attempted to encapsulate the entire design inferential methodology within a single mathematical expression.

In retrospect, all the key pieces for what is now the fully developed informational account of specified complexity were there in that article. But my treatment of specified complexity in that article left substantial room for improvement. I used a counting measure to enumerate all the descriptions of a given length or shorter. I then placed this measure under a negative logarithm. This gave the equivalent of Kolmogorov information, suitably generalized to minimal description length. But because my approach was so focused on encapsulating the design-inferential methodology, the roles of Shannon and Kolmogorov information in its definition of specified complexity were muddied.

My 2005 specified complexity paper fell stillborn from the press, and justly so given its lack of clarity. Eight years later, Winston Ewert, working with me and Robert Marks at the Evolutionary Informatics Lab, independently formulated specified complexity as a unified measure. It was essentially the same measure as in my 2005 article, but Ewert clearly articulated the place of both Shannon and Kolmogorov information in the definition of specified complexity. Ewert, along with Marks and me as co-authors, published this work under the title “Algorithmic Specified Complexity,” and then published subsequent applications of this work (see the Evolutionary Informatics Lab publications page).

With Ewert’s lead, specified complexity, as an information measure, became the difference between Shannon information and Kolmogorov information. In symbols, the specified complexity SC for an event E was thus defined as SC(E) = I(E) – K(E). The term I(E) in this equation is just, as we saw in the last section, Shannon information, namely, I(E) = –log(P(E)), where P(E) is the probability of E with respect to some underlying relevant chance hypothesis. The term K(E) in this equation, in line with the last section, is a slight generalization of Kolmogorov information, in which for an event E, K(E) assigns the length, in bits, of the shortest description that precisely identifies E. Underlying this generalization of Kolmogorov information is a binary, prefix-free, Turing complete language that maps descriptions from the language to the events they identify.

There’s a lot packed into this last paragraph, so explicating it all is not going to be helpful in an article titled “Specified Complexity Made Simple.” For the details, see chapter 6 of the second edition of The Design Inference. Still, it’s worth highlighting a few key points to show that SC, so defined, makes good sense as a unified information measure and is not merely a kludge of Shannon and Kolmogorov information.

What brings Shannon and Kolmogorov information together as a coherent whole in this definition of specified complexity is event-description duality. Events (and the objects and structures they produce) occur in the world. Descriptions of events occur in language. Thus, corresponding to an event E are descriptions D that identify E. For instance, the event of getting a royal flush in the suit of hearts corresponds to the description “royal flush in the suit of hearts.” Such descriptions are, of course, never unique. The same event can always be described in multiple ways. Thus, this event could also be described as “a five-card poker hand with an ace of hearts, a king of hearts, a queen of hearts, a jack of hearts, and a ten of hearts.” Yet this description is quite a bit longer than the other.

Given event-description duality, it follows that: (1) an event E with a probability P(E) has Shannon information I(E), measured in bits; moreover, (2) given a binary language (one expressed in bits, and all languages can be expressed in bits), for any description D that identifies E, the number of bits making up D, which in the last section we defined as |D|, will be no less than the Kolmogorov information of E (which measures in bits the shortest description that identifies E). Thus, because K(E) ≤ |D|, it follows that SC(E) = I(E) – K(E) ≥ I(E) – |D|.
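To make this bound concrete, here is a minimal Python sketch using the royal flush example. It rests on two assumptions that are not from the text: the chance hypothesis treats every five-card hand from a standard 52-card deck as equally likely, and a description is charged a flat 5 bits per character. That per-character cost is only a crude stand-in for the prefix-free description language discussed above, and the function names are hypothetical.

```python
from math import comb, log2

BITS_PER_CHAR = 5  # assumed flat cost per character; illustrative only


def shannon_information(probability: float) -> float:
    """I(E) = -log2(P(E)), in bits."""
    return -log2(probability)


def description_bits(description: str) -> int:
    """|D| under the assumed flat per-character encoding."""
    return BITS_PER_CHAR * len(description)


def sc_lower_bound(probability: float, description: str) -> float:
    """I(E) - |D|, which cannot exceed SC(E) because K(E) <= |D|."""
    return shannon_information(probability) - description_bits(description)


# Assumed chance hypothesis: every five-card hand is equally likely.
p_royal_flush_hearts = 1 / comb(52, 5)

short_d = "royal flush in the suit of hearts"
long_d = ("a five-card poker hand with an ace of hearts, a king of hearts, "
          "a queen of hearts, a jack of hearts, and a ten of hearts")

for d in (short_d, long_d):
    bound = sc_lower_bound(p_royal_flush_hearts, d)
    print(f"{description_bits(d):4d} description bits, lower bound {bound:8.2f}")
```

Both descriptions identify the same event, so I(E) is the same in each case and the shorter description yields the larger (tighter) lower bound. Under this crude encoding both bounds are negative for a single hand, which fits the point that specified complexity comes in degrees and can take negative values.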

The most important takeaway here is that specified complexity makes Shannon information and Kolmogorov information commensurable. In particular, specified complexity takes the bits associated with an event’s probability and subtracts from them the bits associated with the event’s minimum description length. Moreover, when K(E) must be estimated, we use I(E) – |D|, for some description D that identifies E, as a lower bound for specified complexity. It follows that specified complexity comes in degrees and could take on negative values. In practice, however, we’ll say an event exhibits specified complexity if its specified complexity is positive and large (with what it means to be large depending on the relevant probabilistic resources).

There’s a final fact that makes specified complexity a natural information measure and not just an arbitrary combination of Shannon and Kolmogorov information, and that’s the Kraft inequality. Applying the Kraft inequality to specified complexity depends on the language that maps descriptions to events being prefix-free. A prefix-free language ensures disambiguation, in that no description is the start of another description. This is not an onerous condition, and even though it does not hold for natural languages, transforming natural languages into prefix-free languages leads to negligible increases in description length (see chapter 6 of the second edition of The Design Inference).
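As a small illustration of the prefix-free condition, the Python sketch below checks whether a set of binary descriptions is prefix-free and computes the Kraft sum, that is, the sum of 2^(–|D|) over the descriptions D. For a prefix-free set this sum cannot exceed 1. The codeword sets and function names are hypothetical and chosen only for illustration.

```python
def is_prefix_free(codewords):
    """True if no codeword equals or begins another codeword."""
    words = sorted(codewords)
    # After sorting, any prefix relation appears between adjacent entries.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def kraft_sum(codewords):
    """Sum of 2^(-length) over a collection of binary codewords."""
    return sum(2.0 ** -len(w) for w in codewords)


prefix_free = ["0", "10", "110", "111"]     # hypothetical descriptions
not_prefix_free = ["0", "01", "10", "11"]   # "0" is the start of "01"

for code in (prefix_free, not_prefix_free):
    print(code, is_prefix_free(code), kraft_sum(code))
```

The first set is prefix-free and its Kraft sum is exactly 1. The second set is not prefix-free and its sum exceeds 1, so the inequality gives no guarantee there.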

What the Kraft inequality does for the specified complexity of an event E is guarantee that all events having the same or greater specified complexity, when considered jointly as one grand union, nonetheless have probability less than or equal to 2 raised to the negative power of the specified complexity. In other words, the union of all events F with specified complexity no less than that of E (i.e., SC(F) ≥ SC(E)) will have probability less than or equal to 2^(–SC(E)). This result, so stated, may not seem to belong in an article attempting to make specified complexity simple. But it is a big mathematical result, and it connects specified complexity to a probability bound that’s crucial for drawing design inferences. To illustrate how this all works, let’s turn next to an example of cars driving along a road.

A Tale of Ten Malibus

The example considered here to illustrate what a given value of specified complexity means is adapted from section 3.6 of the second edition of The Design Inference, from which I quote extensively. Suppose you witness ten brand new Chevy Malibus drive past you on a public road in immediate, uninterrupted succession. The question that crosses your mind is this: Did this succession of ten brand new Chevy Malibus happen by chance?

Your first reaction might be to think that this event is a publicity stunt by a local Chevy dealership. In that case, the succession would be due to design rather than to chance. But you don’t want to jump to that conclusion too quickly. Perhaps it is just a lucky coincidence. But if so, how would you know? Perhaps the coincidence is so improbable that no one should expect to observe it as happening by chance. In that case, it’s not just unlikely that you would observe this coincidence by chance; it’s unlikely that anyone would. How, then, do you determine whether this succession of identical cars could reasonably have resulted by chance?

Obviously, you will need to know how many opportunities exist to observe this event. It’s estimated that in 2019 there were 1.4 billion motor vehicles on the road worldwide. That would include trucks, but to keep things simple let’s assume all of them are cars. Although these cars will appear on many different types of roads, some with traffic so sparse that ten cars in immediate succession would almost never happen, to say nothing of ten cars having the same late make and model, let’s give chance every opportunity to succeed by assuming that all these cars are arranged in one giant succession of 1.4 billion cars arranged bumper to bumper.

But it’s not enough to look at one static arrangement of all these 1.4 billion cars. Cars are in motion and continually rearranging themselves. Let’s therefore assume that the cars completely reshuffle themselves every minute, and that we might have the opportunity to see the succession of ten Malibus at any time across a hundred years. In that case, there would be no more than 74 quadrillion opportunities for ten brand new Chevy Malibus to line up in immediate, uninterrupted succession.
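The 74 quadrillion figure can be checked with a few lines of Python. The only number not taken from the text is the assumed average year length of 365.25 days; the variable names are illustrative.

```python
CARS = 1_400_000_000                  # vehicles worldwide, as in the text
MINUTES_PER_YEAR = 60 * 24 * 365.25   # assumed average year length
YEARS = 100

reshuffles = MINUTES_PER_YEAR * YEARS
# Each reshuffle offers at most one run of ten cars per starting position,
# and there are fewer starting positions than cars.
opportunities = CARS * reshuffles

print(f"{opportunities:.3e} opportunities")
```

The product comes to a little under 7.4 × 10^16, in line with the stated ceiling of 74 quadrillion.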

So, how improbable is this event given these 1.4 billion cars and their repeated reshuffling? To answer this question requires knowing how many makes and models of cars are on the road and their relative proportions (let’s leave aside how different makes are distributed geographically, which is also relevant, but introduces needless complications for the purpose of this illustration). If, per impossibile, all cars in the world were brand new Chevy Malibus, there would be no coincidence to explain. In that case, all 1.4 billion cars would be identical, and getting ten of them in a row would be an event of probability 1 regardless of reshuffling.

But clearly, nothing like that is the case. Go to Cars.com, and using its car-locater widget you’ll find 30 popular makes and over 60 “other” makes of vehicles. Under the make of Chevrolet, there are over 80 models (not counting variations of models—there are five such variations under the model Malibu). Such numbers help to assess whether the event in question happened by chance. Clearly, the event is specified in that it answers to the short description “ten new Chevy Malibus in a row.” For the sake of argument, let’s assume that achieving that event by chance is going to be highly improbable given all the other cars on the road and given any reasonable assumptions about their chance distribution.

But there’s more work to do in this example to eliminate chance. No doubt, it would be remarkable to see ten new Chevy Malibus drive past you in immediate, uninterrupted succession. But what if you saw ten new red Chevy Malibus in a row drive past you? That would be even more striking now that they all also have the same color. Or what about simply ten new Chevies in a row? That would be less striking. But note how the description lengths covary with the probabilities: “ten new red Chevy Malibus in a row” has a longer description length than “ten new Chevy Malibus in a row,” but it corresponds to an event of smaller probability than the latter. Conversely, “ten new Chevies in a row” has shorter description length than “ten new Chevy Malibus in a row,” but it corresponds to an event of larger probability than the latter.

What we find in examples like this is a tradeoff between description length and probability of the event described (a tradeoff that specified complexity models). In a chance elimination argument, we want to see short description length combined with small probability (implying a larger value of specified complexity). But typically these play off against each other. “Ten new red Chevy Malibus in a row” corresponds to an event of smaller probability than “ten new Chevy Malibus in a row,” but its description length is slightly longer. Which event seems less readily ascribable to chance (or, we might say, more worthy of a design inference)? A quick intuitive assessment suggests that the probability decrease outweighs the increase in description length, and so we’d be more inclined to eliminate chance if we saw ten new red Chevy Malibus in a row as opposed to ten of any color.
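The tradeoff can be put in rough numerical terms. The Python sketch below is purely illustrative: the share of red cars (0.1), the independence of colors from car to car, and the flat cost of 5 bits per added character are all assumptions made for the sake of the example, not figures from the text, and the function name is hypothetical.

```python
from math import log2


def net_change_in_sc(extra_description_bits, probability_factor):
    """Change in I(E) - |D| when a description is made more specific.

    probability_factor: factor (< 1) by which the refinement shrinks P(E).
    extra_description_bits: bits the refinement adds to the description.
    """
    information_gain = -log2(probability_factor)
    return information_gain - extra_description_bits


# Hypothetical inputs, for illustration only.
RED_FRACTION = 0.1             # assumed share of new Malibus that are red
CARS_IN_ROW = 10
EXTRA_BITS = 5 * len("red ")   # assumed 5 bits per added character

delta = net_change_in_sc(EXTRA_BITS, RED_FRACTION ** CARS_IN_ROW)
print(f"net change: {delta:+.1f} bits")
```

Under these assumed values, adding “red” raises the lower bound on specified complexity, matching the intuitive assessment above. With a much larger share of red cars the sign would flip, which is the sense in which the two quantities play off against each other.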

The lesson here is that probability and description length are in tension, so that as one goes up the other tends to go down, and that to eliminate chance both must be suitably low. We see this tension by contrasting “ten new Chevy Malibus in a row” with “ten new Chevies in a row,” and even more clearly with simply “ten Chevies in a row.” The last of these has a shorter description length but also a much higher probability. Intuitively, it is less worthy of a design inference because the increase in probability so outweighs the decrease in description length. Indeed, ten Chevies of any make and model in a row by chance doesn’t seem farfetched given the sheer number of Chevies on the road, certainly in the United States.

But there’s more: Why focus simply on Chevy Malibus? What if the make and model varied, so that the cars in succession were Honda Accords or Porsche Carreras or whatever? And what if the number of cars in succession varied, so it wasn’t just 10 but also 9 or 20 or whatever? Such questions underscore the different ways of specifying a succession of identical cars. Any such succession would have been salient if you witnessed it. Any such succession would constitute a specification if the description length were short enough. And any such succession could figure into a chance elimination argument if both the description length and the probability were low enough. A full-fledged chance-elimination argument in such circumstances would then factor in all relevant low-probability, low-description-length events, balancing them so that where one is more, the other is less.  

All of this can, as we by now realize, be recast in information-theoretic terms. Thus, a probability decrease corresponds to a Shannon information increase, and a description length increase corresponds to a Kolmogorov information increase. Specified complexity, as their difference, now has the following property (we assume, as turns out to be reasonable, that some fine points from theoretical computer science, such as the Kraft inequality, are approximately applicable): if the specified complexity of an event is greater than or equal to n bits, then the grand event consisting of all events with at least that level of specified complexity has probability less than or equal to 2^(–n). This is a powerful result and it provides a conceptually clean way to use specified complexity to eliminate chance and infer design.
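For readers who want to see where the bound comes from, here is a sketch of the derivation in LaTeX. It assumes a prefix-free binary description language, so that distinct events have distinct shortest descriptions and the Kraft inequality gives a sum of 2^(–K(F)) over events F of at most 1. Logarithms are taken to base 2, and the unions and sums range over the events F identified by some description.

```latex
\begin{align*}
P\Bigl(\bigcup_{F:\,SC(F)\ge n} F\Bigr)
  &\le \sum_{F:\,SC(F)\ge n} P(F)
     && \text{(union bound)} \\
  &= \sum_{F:\,SC(F)\ge n} 2^{-I(F)}
     && \text{(since } I(F) = -\log_2 P(F)\text{)} \\
  &= \sum_{F:\,SC(F)\ge n} 2^{-SC(F)}\, 2^{-K(F)}
     && \text{(since } I(F) = SC(F) + K(F)\text{)} \\
  &\le 2^{-n} \sum_{F} 2^{-K(F)}
     && \text{(since } SC(F) \ge n\text{)} \\
  &\le 2^{-n}
     && \text{(Kraft inequality)}
\end{align*}
```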

Essentially, what specified complexity does is consider an archer with a number of arrows in his quiver and a number of targets of varying size on a wall, and asks what is the probability that any one of these arrows will by chance land on one of these targets. The arrows in the quiver correspond to complexity, the targets to specifications. Raising the number 2 to the negative of specified complexity as an exponent then becomes the grand probability that any of these arrows will hit any of these targets by chance.


Formally, the specified complexity of an event is the difference between its Shannon information and its Kolmogorov information. Informally, the specified complexity of an event is a combination of two properties, namely, that the event has small probability and that it has a description of short length. In the formal approach to specified complexity, we speak of algorithmic specified complexity. In the informal approach, we speak of intuitive specified complexity. But typically it will be clear from context which sense of the term “specified complexity” is intended.

In this article, we’ve defined and motivated algorithmic specified complexity. But we have not provided actual calculations of it. For calculations of algorithmic specified complexity as applied to real-world examples, I refer readers to sections 6.8 and 7.6 in the second edition of The Design Inference. Section 6.8 looks at general examples whereas section 7.6 looks at biological examples. In each of these sections, my co-author Winston Ewert and I examine examples where specified complexity is low, not leading to a design inference, and also where it is high, leading to a design inference.

For instance, in section 6.8 we take the so-called “Mars face,” a naturally occurring structure on Mars that looks like a face, and contrast it with the faces on Mount Rushmore. We argue that the specified complexity of the Mars face is too small to justify a design inference but that the specified complexity of the faces on Mount Rushmore is indeed large enough to justify a design inference.

Similarly, in section 7.6, we take the binding of proteins to ATP, as in the work of Anthony Keefe and Jack Szostak, and contrast it with the formation of protein folds in beta-lactamase, as in the work of Douglas Axe. We argue that the specified complexity of random ATP binding is close to 0. In fact, we calculate a negative value of the specified complexity, namely, –4. On the other hand, for the evolvability of a beta-lactamase fold, we calculate a specified complexity of 215, which corresponds to a probability of 2^(–215), or roughly a probability of 1 in 10^65.
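The conversion from 215 bits to a power of ten is simple arithmetic and can be checked as follows (the variable names are illustrative).

```python
from math import log10

SC_BITS = 215                    # specified complexity quoted in the text
exponent = SC_BITS * log10(2)    # 2^(-215) equals 10^(-exponent)

print(f"2^(-{SC_BITS}) is about 10^(-{exponent:.1f})")
```

The exponent rounds to 65, which is the “1 in 10^65” figure given above.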

With all these numbers, we estimate a Shannon information and a Kolmogorov information and then calculate a difference. The validity of these estimates and the degree to which they can be refined can be disputed. But the underlying formalism of specified complexity is rock solid. The details of that formalism and its applications go beyond an article titled “Specified Complexity Made Simple.” Those details can all be found in the second edition of The Design Inference.

Appendix: Orgelian Specified Complexity

Leslie Orgel, as noted at the start of this article, introduced the term “specified complexity” in his 1973 book The Origins of Life. Although specified complexity as developed by Winston Ewert, Robert Marks, and me and summarized in this article attempts to get at the same informational reality that Orgel was trying to grasp, our formulations differ in important ways.

For a fuller understanding of specified complexity, it will therefore help to review what Orgel originally had in mind and to see where our formulation of the concept improves on his. Strictly speaking, this section is of historical interest. It is therefore cast as an appendix. Because The Origins of Life is out of print and hard to get, I quote from it extensively, offering exegetical commentary. I focus on the three pages of his book where Orgel introduces and then discusses specified complexity (pages 189–191).

Orgel introduces the term “specified complexity” in a section titled “Terrestrial Biology.” Elsewhere in his book, Orgel also considers non-terrestrial biology, which is why the title of his book refers to the origins (plural) of life—radically different forms of life might arise in different parts of the universe. To set the stage for introducing specified complexity, Orgel discusses the various commonly cited defining features of life, such as reproduction or metabolism. Thinking these don’t get at the essence of life, he introduces the term that is the focus of this article:

It is possible to make a more fundamental distinction between living and nonliving things by examining their molecular structure and molecular behavior. In brief, living organisms are distinguished by their specified complexity. Crystals are usually taken as the prototypes of simple, well-specified structures because they consist of a very large number of identical molecules packed together in a uniform way. Lumps of granite or random mixtures of polymers are examples of structures which are complex but not specified. The crystals fail to qualify as living because they lack complexity; the mixtures of polymers fail to qualify because they lack specificity. (p. 189)

So far, so good. Everything Orgel writes here makes good intuitive sense. It matches up with the three types of order discussed at the start of this article: repetitive order, random order, complex specified order. Wanting to put specified complexity on a firmer theoretical basis, Orgel next connects it to information theory:

These vague ideas can be made more precise by introducing the idea of information. Roughly speaking, the information content of a structure is the minimum number of instructions needed to specify the structure. One can see intuitively that many instructions are needed to specify a complex structure. On the other hand, a simple repeating structure can be specified in rather few instructions. Complex but random structures, by definition, need hardly be specified at all. (p. 190)

Orgel’s elaboration here of specified complexity calls for further clarification. His use of the term “information content” is ill-defined. He unpacks it in terms of “minimum number of instructions needed to specify a structure.” This suggests a Kolmogorov information measure. Yet complex specified structures, according to him, require lots of instructions, and so suggest high Kolmogorov information. By contrast, specified complexity as developed in this article requires low Kolmogorov information.

At the same time, for Orgel to write that “complex but random structures … need hardly be specified at all” suggests low Kolmogorov complexity for random structures, which is exactly the opposite of how Kolmogorov information characterizes randomness. For Kolmogorov, the random structures are those that are incompressible, and thus, in Orgel’s usage, require many instructions to specify (not “need hardly be specified at all”).

Perhaps Orgel had something else in mind—I am trying to read him charitably—but from the vantage of information theory, his options are limited. Shannon and Kolmogorov are, for Orgel, the only games in town. And yet, Shannon information, focused as it is on probability rather than instruction sets, doesn’t clarify Orgel’s last remarks. Fortunately, Orgel elaborates on them with three examples:

These differences are made clear by the following example. Suppose a chemist agreed to synthesize anything that could be described accurately to him. How many instructions would he need to make a crystal, a mixture of random DNA-like polymers or the DNA of the bacterium E. coli? (p. 190)

This passage seems promising for understanding what Orgel is getting at with specified complexity. Nonetheless, it also suggests that Orgel is understanding information entirely in terms of instruction sets for building chemical systems, which then weds him entirely to a Kolmogorov rather than Shannon view of information. In particular, nothing here suggests that he will bring both views of information together under a coherent umbrella.

Here is how Orgel elaborates the first example, which is replete with the language of short descriptions (as in the account of specified complexity given in this article):

To describe the crystal we had in mind, we would need to specify which substance we wanted and the way in which the molecules were to be packed together in the crystal. The first requirement could be conveyed in a short sentence. The second would be almost as brief, because we could describe how we wanted the first few molecules packed together, and then say “and keep on doing the same.” Structural information has to be given only once because the crystal is regular. (p. 190)

This example has very much the feel of our earlier example in which Kolmogorov information was illustrated in a sequence of 100 identical coin tosses (0 for tails) described very simply by “repeat ‘0’ 100 times.” For specified complexity as developed in this article, an example like this one by Orgel yields a low degree of specified complexity. It combines both low Shannon information (the crystal forms reliably and repeatedly with high probability and thus low complexity) and low Kolmogorov information (the crystal requires only a short description or instruction set). It exhibits specified non-complexity, or what could be called specified simplicity.

Orgel’s next example, focused on randomness, is more revealing, and indicates a fatal difficulty with his approach to specified complexity:

It would be almost as easy to tell the chemist how to make a mixture of random DNA-like polymers. We would first specify the proportion of each of the four nucleotides in the mixture. Then, we would say, “Mix the nucleotides in the required proportions, choose nucleotide molecules at random from the mixture, and join them together in the order you find them.” In this way the chemist would be sure to make polymers with the specified composition, but the sequences would be random. (p. 190)

Orgel’s account of forming random polymers here betrays information-theoretic confusion. Previously, he was using the terms “specify” and “specified” in the sense of giving a full instruction set to bring about a given structure—in this case, a given nucleotide polymer. But that’s not what he is doing here. Instead, he is giving a recipe for forming random nucleotide polymers in general. Granted, the recipe is short (i.e., bring together the right separate ingredients and mix), suggesting a short description length since it would be “easy” to tell a chemist how to produce it.

But the synthetic chemist here is producing not just one random polymer but a whole bunch of them. And even if the chemist produced a single such polymer, it would not be precisely identified. Rather, it would belong to a class of random polymers. To identify and actually build a given random polymer would require a large instructional set, and would thus indicate high, not low Kolmogorov information, contrary to what Orgel is saying here about random polymers.

Finally, let’s turn to the example that for Orgel motivates his introduction of the term “specified complexity” in the first place:

It is quite impossible to produce a corresponding simple set of instructions that would enable the chemist to synthesize the DNA of E. coli. In this case, the sequence matters: only by specifying the sequence letter-by-letter (about 4,000,000 instructions) could we tell the chemist what we wanted him to make. The synthetic chemist would need a book of instructions rather than a few short sentences. (p. 190)

Given this last example, it becomes clear that for Orgel, specified complexity is all about requiring a long instructional set to generate a structure. Orgel’s takeaway, then, is this:

It is important to notice that each polymer molecule on a random mixture has a sequence just as definite as that of E. coli DNA. However, in a random mixture the sequences are not specified. Whereas in E. coli, the DNA sequence is crucial. Two random mixtures contain quite different polymer sequences, but the DNA sequences in two E. coli cells are identical because they are specified. The polymer sequences are complex but random: although E. coli DNA is also complex, it is specified in a unique way. (pp. 190–191)

This is confused. The reason it’s confused is that Orgel’s account of specified complexity commits a category mistake. He admits that a random sequence requires just as long an instruction set to generate as E. coli DNA because both are, as he puts it, “definite.” Yet with random sequences, he looks at an entire class or range of random sequences whereas with E. coli DNA, he is looking at one particular sequence.

Orgel is correct, as far as he goes, that from an instruction set point of view, it’s easy to generate elements from such a class of random sequences. And yet, from an instruction set point of view, it is no easier to generate a particular random sequence than a particular non-random sequence, such as E. coli DNA. That’s the category mistake. Orgel is applying instruction sets in two very different ways, one to a class of sequences, the other to particular sequences. But he fails to note the difference.

The approach to specified complexity that Winston Ewert and I take, as characterized in this article, takes a different tack. Repetitive order yields high probability and specification, and therefore combines low Shannon and low Kolmogorov information, yielding, as we’ve seen, what can be called specified simplicity. This is consistent with Orgel. But note, our approach yields a specified complexity value (albeit a low one in this case). Specified complexity, as a difference between Shannon and Kolmogorov information, takes continuous values and thus comes in degrees. For repetitive order, specified complexity, as characterized in this article, will thus take on low values.

That said, Orgel’s application of specified complexity to distinguish a random nucleotide polymer from E. coli DNA diverges sharply from how specified complexity as outlined in this article applies to these same polymers. A random sequence, within the scheme outlined in this article, will have large Shannon information but also, because it has no short description, will have large Kolmogorov information, so the two will cancel each other, and the specified complexity of such a sequence will be low or indeterminate.
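The cancellation for random sequences can be illustrated with a rough Python sketch. It makes two loudly labeled assumptions: the chance hypothesis is that each byte is drawn independently and uniformly, so that I(E) is 8 bits per byte, and the length of the zlib-compressed data stands in for |D|. Compressed length is only a crude upper estimate of description length, not Kolmogorov information itself and not the prefix-free language discussed earlier, and the function name is hypothetical.

```python
import random
import zlib


def sc_bound_uniform_bytes(data: bytes) -> int:
    """I(E) - |D| in bits under two assumptions:
    chance hypothesis = independent uniform bytes, so I(E) = 8 * len(data);
    |D| = length of the zlib-compressed data, a crude estimate only.
    """
    shannon_bits = 8 * len(data)
    description_bits = 8 * len(zlib.compress(data, 9))
    return shannon_bits - description_bits


rng = random.Random(0)  # fixed seed so the run is repeatable
random_seq = bytes(rng.getrandbits(8) for _ in range(1000))
repetitive_seq = b"\x00" * 1000  # contrast case: highly compressible

print("random sequence:    ", sc_bound_uniform_bytes(random_seq))
print("repetitive sequence:", sc_bound_uniform_bytes(repetitive_seq))
```

For the random bytes the compressed form is about as long as the original, so the two terms roughly cancel and the estimate sits near or below zero. The repetitive input is included only as a contrast: it scores high here because a uniform chance hypothesis makes it improbable, unlike Orgel’s crystal, whose formation is highly probable under the relevant chemistry.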

On the other hand, for E. coli DNA, within the scheme outlined in this article, there will be work to do in showing that it actually exhibits specified complexity. The problem is that the particular sequence in question will have low probability and thus high Shannon information. At the same time, that particular sequence will be unlikely to have a short exact description. Rather, what will be needed to characterize the E. coli DNA as exhibiting specified complexity within the scheme of this article is a short description to which the sequence answers but which also describes an event of small probability, thus combining high Shannon information with low Kolmogorov information.

Specified complexity as characterized in this article and applied to this example will thus mean that the description will include not just the particular sequence in question but a range of sequences that answer to the description. Note that there is no category mistake here as there was with Orgel. The point of specified complexity as developed in this article always lies in matching events with descriptions of those events, where any particular event counts as described provided it answers to the description. For instance, a die roll exhibiting a 6 answers to the description “an even die roll.”

So, is there a simple description of the E. coli DNA that shows this sequence to exhibit specified complexity in the sense outlined in this article? That’s in fact not an easy question to answer. The truth of Darwinian evolution versus intelligent design hinges on the answer. Orgel realized this when he wrote the following immediately after introducing the concept of specified complexity, though his reference to miracles is a red herring (at issue is whether life is the result of intelligence, and there’s no reason to think that intelligence as operating in nature need act miraculously):

Since, as scientists, we must not postulate miracles we must suppose that the appearance of “life” is necessarily preceded by a period of evolution. At first, replicating structures are formed that have low but non-zero information content. Natural selection leads to the development of a series of structures of increasing complexity and information content, until one is formed which we are prepared to call “living.” (p. 192)

Orgel is here proposing that life evolves to increasing levels of complexity, where at each stage nothing radically improbable is happening. Natural selection is thus seen as a probability amplifier that renders probable what otherwise would be improbable. Is there a simple description to which the E. coli DNA answers and whose corresponding event is highly improbable not just when the isolated nucleotides making up the E. coli DNA are viewed as a purely random mixture but rather by factoring in their evolvability via Darwinian evolution?

That’s a tough question to answer precisely because how to evaluate the probability of forming E. coli DNA, with or without natural selection, is far from clear. Given Orgel’s account of specified complexity, he would have to say that the E. coli DNA exhibits specified complexity. But within the account of specified complexity given in this article, ascribing specified complexity always requires doing some work: finding a description to which an observed event answers, showing the description to be short, and showing that the event precisely identified by the description has small probability, implying high Shannon information and low Kolmogorov information.

For intelligent design in biology, the challenge in demonstrating specified complexity is always to find a biological system that can be briefly described (yielding low Kolmogorov complexity) and whose evolvability, even by Darwinian means, has small probability (yielding high Shannon information). Orgel’s understanding of specified complexity is quite different. In my view, it is not only conceptually incoherent but also stacks the deck unduly in favor of Darwinian evolution.

To sum up, this appendix has presented Orgel’s account of specified complexity at length so that readers can decide for themselves which account of specified complexity they prefer, Orgel’s or the one presented in this article.