Biological Computing

CS 321 2007 Lecture, Dr. Lawlor

(Warning: Dr. Lawlor is pretty far outside his expertise here!)

Glossary

Example: DNA to Protein

Say in your cell's nucleus, your DNA contains a gene with this unusually-short sequence of nucleotides:
   ...TA ATG CAC GGG GGC GGG TGG GGG CAA CCA TAG AAA G...
 
This will get transcribed into a short string of Messenger RNA with this sequence (replacing T - Thymine, with U - Uracil):
      UA AUG CAC GGG GGC GGG UGG GGG CAA CCA UAG AAA G

Using the cell's ribosomes, this string of Messenger RNA will get executed as follows:
  1. U, A -> nothing happens
  2. AUG -> valid START sequence codon.  The ribosomes bind to this spot, and begin assembling a protein.
  3. CAC -> codon for the Histidine amino acid (table), which is the first amino acid in the new protein.
  4. GGG -> codon for the Glycine amino acid, which is the second amino acid in the protein.  The Glycine sticks to the existing Histidine, forming a two-acid "polypeptide".
  5. GGC -> a different codon for the Glycine amino acid, which is the third amino acid in the chain.  There are 64 possible codons, but only 20 amino acids in use, so some amino acids are represented by several different codons.
  6. GGG -> Glycine again, which gets added as the fourth amino acid.
  7. UGG -> Tryptophan, which gets stuck on as the fifth acid.
  8. GGG -> yet more Glycine.
  9. CAA -> Glutamine.
  10. CCA -> Proline.
  11. UAG -> STOP codon.  The ribosome lets go of the newly formed chain of amino acids, which is a new protein.  The protein floats away.
  12. A, A, A, G, etc. -> nothing happens.  Stuff outside of START and STOP does not bind ribosomes, and so does not make proteins.
So this Messenger RNA has just created a new eight-amino-acid protein:
    Histidine - (Glycine)3 - Tryptophan - Glycine - Glutamine - Proline
(or HGGGWGQP using the confusing amino-acid-to-letter substitution).
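The walkthrough above can be sketched as a tiny interpreter. This is only a minimal sketch of the simplified model in these notes: the names transcribe and translate are made up for illustration, the codon table holds only the codons that appear in this example, and (like the walkthrough) it ignores the fact that in real cells the AUG start codon also codes for methionine.

```python
# Minimal codon table: only the codons used in the example gene above.
CODON_TABLE = {
    "CAC": "His",
    "GGG": "Gly",
    "GGC": "Gly",
    "UGG": "Trp",
    "CAA": "Gln",
    "CCA": "Pro",
    "UAG": "STOP",
}

def transcribe(dna):
    """DNA -> mRNA using the notes' simplification: drop spaces, replace T with U."""
    return dna.replace(" ", "").replace("T", "U")

def translate(mrna):
    """Find the first AUG, then read three letters at a time until a STOP codon."""
    start = mrna.find("AUG")
    if start < 0:
        return []  # no START codon, so no protein
    protein = []
    # Begin with the codon after START, as the walkthrough does.
    for i in range(start + 3, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "STOP":
            break
        protein.append(amino)
    return protein

dna = "TA ATG CAC GGG GGC GGG TGG GGG CAA CCA TAG AAA G"
mrna = transcribe(dna)
protein = translate(mrna)
print("-".join(protein))  # His-Gly-Gly-Gly-Trp-Gly-Gln-Pro
```

Anything before the first AUG or after the STOP codon is simply never read, which matches steps 1 and 12.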

(I'm skipping over lots of complexity here.  Real genes start with a promoter sequence that tends to attract the RNA transcription machinery, and often include introns that get spliced out of the RNA before it's executed into a protein.)

Why You Care: Disease & Bioterror

One particular folding of the protein above is human prion protein 61-68, which is the cause of Creutzfeldt-Jakob disease, a brain-destroying disease that can either be inherited from the bad genes listed above, or acquired by eating the poorly-cooked brains of infected "mad" cows.  The problem with this protein is that it functions as an enzyme--it converts other useful proteins into more copies of itself.  Such self-catalyzing proteins are called prions.   Prions aren't nearly as infectious as viruses (they have to be eaten in large quantities to cause infection, and take years to begin causing problems), but they're incurable and currently mostly undetectable.

Read that again.  The gene sequence above, when executed into a protein, can kill you.  For under a hundred dollars, online you can mail-order physically expressed copies of that gene sequence from a gene synthesis lab.  You can order the copies as fully-assembled proteins (peptides), short RNA or DNA snippets (oligos), or even as working DNA inside living (non-human) cells like bacteria.

Bacteria are just little independent single-cell organisms living in your body.  Viruses are more interesting--they're just DNA (or RNA) in a cheap protein coat.  When executed, that code makes... more viruses. So a virus just hijacks the machinery of a working cell to start manufacturing viruses--nanotechnology used for evil.

Here's the nucleotide sequence for smallpox (variola virus).  It's 185.5 thousand nucleotides long, or 46.4KB in binary form.  Luckily, it's currently not possible to artificially synthesize such extremely long-chain sequences into working DNA (the per-nucleotide error rate is too high), but in a few years these 46.4KB of *binary* data could be converted to *physical* form and cause horrific human suffering!
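The 46.4KB figure follows from storing each nucleotide in two bits (four possible bases). A minimal sketch of that packing follows; the particular bit assignment in CODE and the function names are arbitrary choices for illustration, not any standard file format.

```python
# Arbitrary 2-bit code per base (an assumption; any one-to-one assignment works).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack(seq):
    """Pack a nucleotide string into bytes, four nucleotides per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def packed_size_bytes(n_nucleotides):
    """Bytes needed at 2 bits per nucleotide, rounded up."""
    return (2 * n_nucleotides + 7) // 8

print(pack("ATGCACGG").hex())      # 391a
print(packed_size_bytes(185_500))  # 46375 bytes, about 46.4KB
```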

Also, cancer.  Cancer is very simple--it's when your body's normal cells stop doing what they're supposed to do, and change their own DNA to start reproducing without bound, like little single-cell organisms.  Your genes contain all sorts of interesting hacks to prevent this, like the ticking time-bomb of telomeres at the end of each chromosome, but cancer (evolution at work!) is pretty good at changing the cell DNA to evade these defenses.  A woman, Henrietta Lacks, who died in 1951, had a cervical cancer culture taken that still lives on to this day, having evolved into a successful experimental and wild single-celled organism, which to this day will occasionally infect other people's cancer biopsy results.

Why You Care: Information Density

Again, online you can order fluorescent probe molecules to tag a particular protein or sequence you're interested in.  These probes are short little proteins that have one glowy end (for example, that glows green under UV light), and one "sticky" end, where by "sticky" I mean that end is designed to bind to whatever biological object you like.  For example, say you're interested in determining if a cow brain contains the prions above.  So you design a probe that will stick to the prion.  Then you just wash your cow brain (or plants, or toads, or whatever) with the probes, and then shine a UV light on it--if it glows green, the probes have stuck to prions, so don't eat it!

How expensive are these useful little probes to fabricate?  Well, there's a special where $100 will buy you 1 "nano-mol" of probes.  1 mol is 6.022 x 10^23 molecules (Avogadro's number).  So 1 nano-mol is 10^-9 moles, or 6.022 x 10^14 molecules.  That's 6 trillion probe molecules per dollar!

This is really cheap compared with the price of, for example, cars (0.00009 Kia Rios per dollar) or even fast food (2 Taco Bell tacos per dollar).  It's still cheap compared to CPU transistors (300 million transistors/$100 = 3 million transistors per dollar) or even DRAM storage cells (1GB/$50 = 8 billion bits/$50 = 160 million bits per dollar).
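The per-dollar comparison above is just unit arithmetic, and can be checked with a few lines. This sketch uses only the prices quoted in these notes (which date from 2007); the dictionary name per_dollar is made up for illustration.

```python
AVOGADRO = 6.022e23  # molecules per mole

# Units per dollar, using only the prices quoted in the text above.
per_dollar = {
    "probe molecules": 1e-9 * AVOGADRO / 100,  # 1 nanomole for $100
    "DRAM bits": 8e9 / 50,                     # 1GB (8 billion bits) for $50
    "CPU transistors": 300e6 / 100,            # 300 million for $100
    "tacos": 2.0,
    "Kia Rios": 0.00009,
}

# Print from cheapest-per-unit to most expensive-per-unit.
for item, count in sorted(per_dollar.items(), key=lambda kv: -kv[1]):
    print(f"{item:16s} {count:.3g} per dollar")
```

By these numbers a dollar buys tens of thousands of times more probe molecules than DRAM bits.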

Biological information storage is so cheap, in fact, that almost every cell in your body contains its own complete copy of your DNA.  Human DNA has about 3 billion nucleotide pairs, or 6 billion bits, or 750MB of data--about one CD-ROM worth.  There are something like 5 million cells per cubic centimeter of human flesh, which means (counting only the DNA) the information density of human flesh is over 3,000 terabytes per cubic centimeter!  And that's not even trying very hard--pure DNA could be thousands of times more efficient than this, since DNA is only a tiny portion of the complete cell.
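Here is the density estimate above worked through step by step. It is a back-of-the-envelope sketch using the rough figures from the text (3 billion pairs, 5 million cells per cubic centimeter) and decimal units (1MB = 10^6 bytes, 1TB = 10^12 bytes); the constant names are just for illustration.

```python
BASE_PAIRS = 3e9      # nucleotide pairs in human DNA (figure from the text)
BITS_PER_PAIR = 2     # four possible bases = 2 bits
CELLS_PER_CM3 = 5e6   # rough figure from the text

bytes_per_cell = BASE_PAIRS * BITS_PER_PAIR / 8
terabytes_per_cm3 = bytes_per_cell * CELLS_PER_CM3 / 1e12

print(f"{bytes_per_cell / 1e6:.0f} MB per cell")            # 750 MB per cell
print(f"{terabytes_per_cm3:.0f} TB per cubic centimeter")   # 3750 TB per cubic centimeter
```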

The bottom line is that DNA is a spectacularly awesome information storage mechanism--one pair of nucleotides is only a few dozen atoms across, and stores two bits.  I feel like DNA and proteins represent amazing nanotechnology--atomic-scale fabrication done right.

Why You Care: Processing Speed

We saw above that $1 buys you six trillion (6 x 10^12) probe proteins.  At room temperature, they're all wiggling all over the place at molecular speeds, "trying" to react with something nearby.

For example, this page's NAMD simulations of the cell-membrane protein aquaporin show the crucial atoms inside the protein wiggling around.  The atoms make complete wiggles on a timescale of picoseconds (10^-12 seconds).

Viewed as a computer, this means you've got trillion-way parallelism, and your clock rate is in the terahertz.  This means you're doing trillions of trillions of total wiggles per second--in this case, something like 6 x 10^24 wiggles per second--per dollar!   So if you can figure out how to express your computation in terms of atomic wiggles, you can get absolutely insane performance.
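The throughput estimate above is parallelism times clock rate. As a quick sketch, assuming one complete wiggle per picosecond per molecule (the timescale quoted above) and treating every wiggle as one "operation", which is a generous assumption:

```python
molecules_per_dollar = 6e12   # probe molecules per dollar, from the pricing above
wiggle_period_s = 1e-12       # one complete wiggle per picosecond (assumed)

wiggles_per_second_each = 1 / wiggle_period_s   # about 1e12, i.e. terahertz
total_wiggles_per_second = molecules_per_dollar * wiggles_per_second_each

print(f"{total_wiggles_per_second:.1e} wiggles per second per dollar")
```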

Ecosystem Design

A single cell uses a number of interesting design principles.  First, because everything's on the scale of atoms (and wiggling around like crazy), it's quite easy for things to get knocked out of alignment, for crucial parts to break off, or for random unknown molecules to arrive and disrupt the functioning of the system.  The cell has to work even in the face of all that, and it does a wonderful job of it.  The main trick is simply replication--there are 500 copies of the Messenger RNA for every gene in the cell that matters, so losing one of the copies is no big deal.  It's a totally different design philosophy from the one normal computers are based on.
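To see why replication is such an effective trick, here is a toy calculation. It assumes each copy is destroyed independently with the same probability, which is a modeling assumption of this sketch and not a claim about real cells; the 50% loss rate is likewise an invented number.

```python
def p_all_lost(p_single_loss, copies):
    """Chance that every one of `copies` independent copies is lost."""
    return p_single_loss ** copies

# Hypothetical: each copy has a 50% chance of being destroyed.
for copies in (1, 10, 500):
    print(copies, "copies ->", p_all_lost(0.5, copies))
```

Even with a coin-flip chance of losing any single copy, the chance of losing all 500 at once is vanishingly small.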

Many of these same cell-design principles are shared by healthy ecosystems, economic markets, functioning democracies, and piles of gravel.  I've come to call these principles "ecosystem design".

As an example, I claim an automobile, CPU, dictatorship, and orderly stack of bricks use "machine design", not ecosystem design: these systems have crucial decisionmaker parts (e.g., the braking system, control unit, dictator, or bottom brick) whose failure can dramatically change the entire system.   Machine design requires somebody to go in and fix these crucial parts now and then, which is inefficient and error-prone.

Ecosystems, by contrast, harness the power of probability--chaos--in order to get stuff done.

Fault Tolerance

A computer is really not at all a robust system--outside a very narrow temperature and electrical voltage range, it will stop working.  Computers can be totally destroyed by dust, humidity, or even microscopic conductive "zinc whiskers".

Mammals are, of course, also quite easy to disrupt.  Mammals depend on the circulation of air and blood to continue to operate, and contain these fluids within quite delicate structures, such that poking even a small .223 caliber hole in the aorta, for example, will cause virtually all mammals to stop working.

Even a single cell in some ways functions in a machinelike, non-ecosystem fashion--tearing a hole in a cell wall is called lysis, and results in death.  However, note that many cells are quite difficult to kill.

For example, Deinococcus radiodurans, also known as "Conan the Bacterium", can survive radiation sufficient to kill even cockroaches (by reassembling its own DNA),  hard vacuum (by forming spores), and various noxious chemicals.  A strain of this bacterium was recently engineered to reclaim mercury and toluene-contaminated nuclear waste.

Even the tiny, recently-emerged 5kbp canine parvovirus can survive alcohol, acids, lye, freezing, and 120 degree water.  The only known way to kill it is with bleach, which dissolves its tough coat.

But we can use ecosystem design to keep our machines running even in the face of these threats!  For example, one beautiful design your body uses to repel invaders is a set of proteins with selectively sticky parts. These are designed to stick to foreign material such as a virus, and flag it for disposal by the immune system.  When found, the immune system also creates more proteins with that kind of stickiness.  Even better, each protein, called an antibody, has two identical sticky pads, which tends to bind together antibodies and viruses into long folded-up chains that can easily be identified and destroyed.  Within a few days, the immune system cranks up antibody production to the point where viruses floating around in the blood almost immediately get stuck to antibodies and eliminated.  This is why you're immune to a disease you've already been exposed to, either through catching the disease naturally, or by having the disease proteins artificially introduced into your body during vaccination.

So, the bottom line is that biology is nanotech on an amazing scale, and offers spectacular possibilities for information density, processing speed, and reliability.  The downsides are that designing systems that work is a lot trickier on the small scale, and also the possibility of a mankind-killing genetically-engineered superflu.