Biological "Machine Code"
CS 301 Lecture, Dr. Lawlor
Consider, if you will, water. Water is composed of molecules of hydrogen and oxygen. A *lot* of molecules: one mole of water weighs about 18 grams (a little over one tablespoon), but contains 6.02 x 10^23 water molecules. (For comparison, the total count of all the bits on all the hard disks ever made comes to about 10^22 bits.)
Water is also very fast. Its molecular orientations change in around 50 femtoseconds (10^-15 seconds, millionths of a nanosecond). So in one second of bouncing around, each water molecule explores about 2 x 10^13 random orientations with the other molecules in its immediate neighborhood.
So the molecules in one tablespoon of water are rated at 1.2 x 10^37
orientations/second. That's a ridiculous amount of potential
computational horsepower, if we could figure out how to harness it!
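The arithmetic above can be sanity-checked in a few lines of Python, using the rounded figures from the text:

```python
AVOGADRO = 6.02e23            # water molecules in one mole (~18 g, about a tablespoon)
FLIPS_PER_SECOND = 1 / 50e-15 # one orientation change per ~50 femtoseconds

total = AVOGADRO * FLIPS_PER_SECOND
print(total)   # roughly 1.2e37 orientations/second
```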
How to Harness It
Rather than compute with just plain water, biological systems use
various interesting chemicals dissolved in the water. Information
is encoded in a long linear polymer called DNA, made of a sequence of nucleotide "bases": adenine, guanine, cytosine, and thymine
(which is switched to uracil in RNA). In the "coding" region of
DNA, these nucleotides A, G, C, and T/U are "expressed" into proteins,
the giant molecules that get stuff done in your cells. There are
also long stretches of DNA that do not code for proteins; formerly
called "junk DNA" because we didn't understand its function, these
segments now appear to be crucial in determining how often a given
stretch of DNA "executes" into a protein.
Amazingly, every known living entity uses the same basic coding scheme to get from DNA to proteins: the Universal Genetic Code.
The sequence of DNA that encodes a protein is composed of three-letter
"codons", similar to machine code: each protein begins with an "ATG"
sequence, and subsequent codons represent amino acids
to be added to the protein, until the protein is ended with one of the
three stop sequences, TAA, TAG, or TGA. A typical protein forms
from a chain of between one hundred and one thousand amino
acids--similar to the length of many real functions in a computer program.
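This "machine code" is simple enough to execute by hand, or in a few lines of Python. The sketch below uses only a handful of the 64 codons in the real genetic code (the table is deliberately partial):

```python
# Partial codon table: a few amino acids plus the three stop codons.
CODON_TABLE = {
    "ATG": "M",  # methionine -- also the "start" codon
    "GCT": "A",  # alanine
    "TGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "TAA": None, "TAG": None, "TGA": None,  # the three stop codons
}

def translate(dna):
    """Read three-letter codons from the ATG start until a stop codon."""
    start = dna.find("ATG")
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i+3]]
        if amino is None:       # stop codon: the protein is finished
            break
        protein.append(amino)
    return "".join(protein)

print(translate("ATGGCTTGGAAATAA"))   # -> MAWK
```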
Not only does natural DNA work like a computer program, you can *build*
a computational system using DNA, called "biological computing". The
advantage of biological computing is density--since the bits in your
problem data can be represented by a
single molecule, and the computation proceeds in parallel and in 3D,
you can search enormous search spaces quickly.
Leonard Adleman demonstrated how to use DNA to quickly perform an exhaustive search in parallel.
The idea is to dump all the input molecules into a well-mixed test
tube. The inputs stick together in various ways, some of which
represent the outputs you're looking for. You then have to use
chemistry to winnow out the (huge number of) incorrect outputs until
you're left with the correct outputs, which you can read off.
Specifically, Adleman used the travelling salesman problem (his actual
1994 experiment solved the closely related directed Hamiltonian path
problem). He represented each city with a short sequence of DNA; call
the sequences A through Z. The routes between each pair of cities were
represented by a piece of DNA that sticks ("binds") to one city
and the next, where the length of the strand corresponds to the
distance between the cities. He then mixed all the cities and
routes together, where they stuck together randomly. Now, a
solution to the salesman problem is encoded as a strand of cities and
routes. A solution has two features:
- It visits all the cities. You can wash away strands that
don't match city A by building a gel that binds to A, and washing away
all the strands. Strands that don't have an A won't stick, and
hence will be washed away. You can then dissolve the gel.
Repeat to wash away strands that are missing any other city.
- It's shorter than any other path. You can determine this by sorting the strands by length--in practice, gel electrophoresis pulls shorter (lighter) strands through a gel faster.
Other folks have recently built Turing machines with DNA, and "DNA nanofabrication" is a huge research field nowadays.
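The wash-and-sort procedure above can be mimicked in software. In this toy Python sketch (the four cities and their distances are invented for illustration), random strands play the role of the well-mixed test tube, a filter plays the role of the gel, and a length comparison plays the role of the final separation:

```python
import random

# Hypothetical distances between four "cities" A, B, C, D (made up).
DIST = {("A", "B"): 2, ("B", "C"): 3, ("C", "D"): 1,
        ("A", "C"): 5, ("B", "D"): 4, ("A", "D"): 9}

def dist(x, y):
    return DIST.get((x, y)) or DIST.get((y, x))

def length(strand):
    """Total route length of a strand of cities."""
    return sum(dist(a, b) for a, b in zip(strand, strand[1:]))

CITIES = "ABCD"
random.seed(0)

# Step 1: the test tube -- thousands of randomly assembled strands.
strands = ["".join(random.choice(CITIES) for _ in range(len(CITIES)))
           for _ in range(5000)]

# Step 2: the gel -- wash away strands missing any city.
complete = [s for s in strands if set(s) == set(CITIES)]

# Step 3: the length sort -- the shortest surviving strand is the answer.
best = min(complete, key=length)
print(best, length(best))
```

Of course, a real computer checks these strands one at a time; the point of the DNA version is that chemistry checks them all at once.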
You can get companies online that will synthesize custom genes for you. For example, $100 will buy you 10 nanomoles of fluorescent probe molecules to tag a particular protein or sequence you're interested in. 1 mole is 6.022 x 10^23 molecules, so your 10 nanomoles is actually 6.022 x 10^15
molecules. That's 60 trillion molecules per dollar! (For
comparison, a $100 billion-transistor chip has just 10 million
transistors per dollar.)
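The molecules-per-dollar comparison works out like this, using the numbers from the text:

```python
AVOGADRO = 6.022e23

probe_molecules = 10e-9 * AVOGADRO    # 10 nanomoles for $100
dna_per_dollar = probe_molecules / 100
chip_per_dollar = 1e9 / 100           # a billion-transistor chip for $100

print(dna_per_dollar / chip_per_dollar)   # DNA wins by about six million to one
```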
Biologically Inspired Computing
As components get smaller and faster, ordinary computers are getting
more and more sensitive to small disturbances. Already, cosmic rays can impact server uptimes. Anecdotally, there's a curious fact about space travel. Due to the higher radiation:
- Typical computers lock up about once a day (including flight computers, laptops, digital cameras, etc.) and must be rebooted.
- Human beings have never been observed to lock up, and have never
needed to be rebooted. This is a good thing, because a spacecraft
with an insane or catatonic crew might have trouble returning to Earth.
Human brain cells can be killed in large absolute numbers by chemicals
such as alcohol and a variety of viruses and bacteria (e.g.,
meningitis). Yet the brain remains operational (at degraded
efficiency) despite such abuse, and rarely suffers significant
permanent damage from these injuries. That is, the human brain is
a surprisingly fault-tolerant computer. Unfortunately, we have no
idea how the human brain operates (see Gerstner's online neuron reverse-engineering attempts). But we can infer that it's at least as smart as the reactions of a single cell.
A single cell uses a number of interesting design principles.
Many of these same principles are shared by healthy ecosystems, economic markets,
functioning democracies, and piles of gravel. I've come to call
these principles "ecosystem design":
- Many independent decision-makers. Examples: A cell is a
complicated collection of separate but interacting proteins. A
market is a collection of interacting buyers and sellers. A
gravel pile is a collection of interacting pebbles. Results: no
central decision-maker means no single point of failure, so the loss of
any few small pieces doesn't affect the overall result.
- Dynamic equilibrium--lots of small-scale stuff is changing all
the time, but the overall averages remain quite constant.
Examples: the metabolic rate of a cell is the result of all its pieces,
but it's quite predictable overall. A market's overall prices
tend to remain fairly stable. A gravel pile's average slope can
be predicted to within a few degrees.
As a counterexample, I claim an automobile, CPU, dictatorship, and orderly stack of
bricks use "machine design", not ecosystem design: these systems have
crucial decisionmaker parts (e.g., the braking system, control unit,
dictator, or middle brick) whose failure can dramatically change the
entire system. Machine design requires somebody to go in
and fix these crucial parts now and then, which is painful.
All modern programming languages encourage machine-style design (for
example, what happens to a for loop if the variable "i" freaks
out?). At the moment, nobody knows what an ecosystem-design language would look like.
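To make the contrast concrete, here is a small Python sketch (everything in it is invented for illustration). A single loop counter is a machine-design single point of failure; a majority vote over several independent workers tolerates a few faults:

```python
# "Machine design": one loop counter; a single bit flip corrupts the result.
def machine_sum(values, fault_at=None):
    total, i = 0, 0
    while i < len(values):
        total += values[i]
        i += 1
        if i == fault_at:    # simulate a cosmic-ray bit flip in "i"
            i ^= 4           # flip one bit of the counter, once
            fault_at = None
    return total

# "Ecosystem design": many independent workers; the majority answer wins,
# so a couple of faulty workers don't change the overall result.
def ecosystem_sum(values, workers=9, faulty=2):
    answers = [machine_sum(values, fault_at=2 if w < faulty else None)
               for w in range(workers)]
    return max(set(answers), key=answers.count)   # majority vote

data = list(range(8))                  # correct sum is 28
print(machine_sum(data))               # 28
print(machine_sum(data, fault_at=2))   # wrong: the flipped counter skips values
print(ecosystem_sum(data))             # 28 -- the faulty workers are outvoted
```

The ecosystem version does nine times the work to get the same answer, which is exactly the trade a cell (or a brain) seems happy to make.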