with inserts equivalent to one, two, or four bases, the same effect was observed: a disrupted protein. But with an insertion of three, a protein product was still viable. Crick and Brenner deduced from this that the code of DNA works in sequences of three bases. This pattern is known as a reading frame: it places spaces in a length of DNA, exposing the meaningful triplets of letters.
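The logic of that deduction can be illustrated with a minimal sketch. It assumes an arbitrary fifteen-letter stretch of DNA and two hypothetical helper functions, `codons` and `insert`, invented here purely for illustration; it is not a model of the actual experiment.

```python
def codons(seq):
    """Split a sequence into complete triplets, dropping any leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert(seq, pos, bases):
    """Return a copy of seq with extra bases spliced in at position pos."""
    return seq[:pos] + bases + seq[pos:]


# An arbitrary illustrative sequence (an assumption, not experimental data).
original = "cctgggaccaacttc"

print(codons(original))                    # the undisturbed triplets
print(codons(insert(original, 3, "a")))    # one extra base: every later triplet changes
print(codons(insert(original, 3, "aaa")))  # three extra bases: one new triplet, the rest intact
```

With one extra letter, every triplet downstream of the insertion is read out of step. With three, the original triplets reappear after a single newcomer, which is why the protein can survive.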
Starting from there, an American and a German scientist cracked the first coded message in DNA in 1961 using a process of elimination. Instead of trying to work out how a naturally occurring sequence of DNA translated into a protein, Marshall Nirenberg and J. Heinrich Matthaei strung together a length of genetic code that consisted only of the base thymine (the letter T). They inserted this into the mechanics of a working cell and provided it with a ready supply of amino acids from which a protein might be built. They knew that there are only twenty amino acids from which all life is built, so they prepared twenty test tubes, each containing all twenty, and in each tube a different single amino acid within the mixture was tagged with a radioactive label. This meant that if a protein that resulted from their doctored DNA buzzed with radiation, they would be able to identify which amino acid was coded by a triplet of thymine bases. When they extracted the proteins from their various test tubes, the result was nineteen duds and one that set the Geiger counters buzzing. They discovered that a genetic code consisting exclusively of the letter T resulted in a protein consisting exclusively of the amino acid phenylalanine.
And so the nature of the irrepressible code was known. Over the next few years, by varying the template, the unique triplets encoding the other nineteen amino acids were discovered, until by the end of the 1960s we had a complete readout of how DNA encodes proteins.
Here is a small section of DNA, part of a gene:
cctgggaccaacttcgcgaagcgggaagcccggcgg
Here is the same sequence broken into the triplet reading frame, as the cell reads it:
cct  ggg  acc  aac  ttc  gcg  aag  cgg  gaa  gcc  cgg  cgg
And here it is again, with each amino acid (written here in abbreviated form) beneath its codon:
cct  ggg  acc  aac  ttc  gcg  aag  cgg  gaa  gcc  cgg  cgg
Pro  Gly  Thr  Asn  Phe  Ala  Lys  Arg  Glu  Ala  Arg  Arg
That string of amino acids forms part of a protein.
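The same lookup can be sketched in a few lines of Python. This is a minimal illustration assuming the standard genetic code, built here from the conventional 64-letter summary string in t, c, a, g order. The names `CODON_TABLE`, `split_codons`, and `translate` are invented for this sketch and do not come from any particular library.

```python
from itertools import product

# Standard genetic code: one amino-acid letter per codon, with the codons
# enumerated in t, c, a, g order (the last base varies fastest). "*" marks a stop.
BASES = "tcag"
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino
    for triplet, amino in zip(product(BASES, repeat=3), ONE_LETTER)
}

# One-letter codes mapped to the three-letter abbreviations.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Stop",
}


def split_codons(dna):
    """Break a sequence into complete triplets, ignoring any leftover bases."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Return the three-letter amino acid abbreviation for each codon."""
    return [THREE_LETTER[CODON_TABLE[codon]] for codon in split_codons(dna.lower())]


fragment = "cctgggaccaacttcgcgaagcgggaagcccggcgg"
print("  ".join(split_codons(fragment)))
print("  ".join(translate(fragment)))
# Second line printed: Pro  Gly  Thr  Asn  Phe  Ala  Lys  Arg  Glu  Ala  Arg  Arg
```

Note that cgg appears three times in this fragment, and each time it is read as Arg (arginine): the same triplet always spells the same amino acid.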
So how could scientists tell where a gene begins and ends? There is also punctuation in the language of DNA. In the continuous run of As, Ts, Gs, and Cs, the cell knows where a gene begins because, without exception, genes all start with the letters ATG, the so-called start codon, like a capital letter at the beginning of a sentence. Similarly, all genes end with a period, a stop codon, of which there are three: TGA, TAG, and TAA. A reading frame for an entire gene always begins with ATG and ends with one of these three stop codons.
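Reading that punctuation can be sketched as a simple scan: find an ATG, then step along three letters at a time until a stop codon turns up. The function name `find_open_reading_frame` and the example sequence below are made up for illustration. The sketch also assumes a single strand read in one direction, which is a considerable simplification of what real gene-finding software has to do.

```python
STOP_CODONS = {"TGA", "TAG", "TAA"}


def find_open_reading_frame(dna):
    """Return the first stretch that starts with ATG and ends, in the same
    triplet frame, with a stop codon; return None if there is no such stretch."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk forward codon by codon from just after the start codon.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        # No in-frame stop after this ATG, so try the next one.
        start = dna.find("ATG", start + 1)
    return None


# A made-up sequence: two stray letters, a start codon, three codons,
# a stop codon, and two trailing letters.
example = "ccATGcctgggaccTAAgg"
print(find_open_reading_frame(example))  # ATGCCTGGGACCTAA
```

The stop codon only counts if it falls in step with the start codon. The letters TAA straddling two triplets would be read straight through.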
Proteins are, then, long strings of amino acids as decreed by the DNA that encodes them. They perform their functions by folding up into three-dimensional shapes; the grooves, holes, clamps, and pockets in their folded shapes give them all manner of abilities. 5 Proteins also team up to gain new purposes. For example, hemoglobin, which carries oxygen around your body in red blood cells, is made up of four proteins, each carrying a single atom of iron. The astonishing properties of spider silk are the result of the sophisticated complex of different proteins of which it is made, some of them neatly folded, others overlapping to create a tensile strength comparable with that of steel. Some proteins are enzymes, which catalyze bodily reactions, the metabolism in cells that keeps us alive. Others are sensory, like the ones embedded in the rods and cones of your retina, so specialized that they can detect a