The mass of the earth is given by my physics book (Halliday 6th Ed.) as 6 x 10^27 grams. If we made just one molecule of each protein containing n amino acids linked together, when would we run out of material? Make a guess. I found the results surprising.
Assume the earth is made of nothing but hydrogen, oxygen, nitrogen, carbon and sulfur. Clearly not true, but we’re going for what mathematicians call an upper bound. If mathematicians can get away with things like “consider a spherical cow” I can get away with this. (The cognoscenti may wish to go for a least upper bound.) Proteins are linear chains of 20 different amino acids ranging in mass from glycine at 75 Daltons to tryptophan at 204. When two are linked together by an amide (peptide) bond, 18 Daltons of mass is lost (water is split out). So figure the average amino acid at 100 Daltons (roughly).
So there are 20 x 20 = 400 distinct proteins of 2 amino acids, 8000 with 3, 160,000 with 4, 3,200,000 with just 5. Shorties like this are called peptides (or polypeptides) and just when you start calling them proteins seems to be a matter of taste.
We’re figuring the mass of the typical amino acid at 100 Daltons, but a Dalton doesn’t have much mass. It is 1/12 the mass of a single atom of carbon-12, and Avogadro’s number (about 6 x 10^23) of carbon-12 atoms has a mass of 12 grams. So one Dalton has a mass of 1/(6 x 10^23) grams, about 1.7 x 10^-24, which I’ll round to 10^-24 grams.
The number of distinct proteins containing n amino acids is 20^n. The mass of each protein (in Daltons) is (roughly) 100 x n, depending on the amino acids chosen. The mass of the collection of distinct proteins of length n in grams is (20^n) x (100 x n) x (10^-24). It’s clear that we’re over 1 gram for the collection at only 24 amino acids (as 20^24 is much larger than 10^24). How far over? Since 20^24 = 2^24 x 10^24, the 10^24 cancels the 10^-24, leaving 2^24 x 100 x 24 = 40,265,318,400 = 4 x 10^10 grams.
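For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula above. The function name `collection_mass_grams` and the constant names are mine, purely for illustration, and the inputs are the same rough figures used in the text (100 Daltons per amino acid, 10^-24 grams per Dalton).

```python
from fractions import Fraction

AMINO_ACIDS = 20                          # distinct amino acids
DALTONS_PER_RESIDUE = 100                 # rough average assumed in the text
GRAMS_PER_DALTON = Fraction(1, 10**24)    # rough value assumed in the text


def collection_mass_grams(n):
    """Mass in grams of one molecule of every possible protein of length n."""
    number_of_proteins = AMINO_ACIDS ** n
    daltons_per_protein = DALTONS_PER_RESIDUE * n
    return number_of_proteins * daltons_per_protein * GRAMS_PER_DALTON


print(int(collection_mass_grams(24)))  # 40265318400
```

Exact fractions are used instead of floating point so that the result for n = 24 matches the hand calculation digit for digit.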
As noted, the mass of the earth is 6 x 10^27 grams. So we’re not too far away at 24 amino acids: we need another factor of about 10^17. That takes certainly no more than another 17 amino acids, as each one multiplies the count by 20 and 20^17 is much greater than 10^17.
So, the mass of the earth (which isn’t all carbon, hydrogen, etc.) isn’t enough to make just one molecule of each of the possible proteins 41 amino acids long. 41 amino acids is a very small protein (some would call it a polypeptide). Just about every protein of biological interest is much larger. The champ is a muscle protein called titin which has 27,000+ amino acids.
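The figure of 41 is a deliberately generous bound. A short Python sketch can find where the crossover actually falls under the same rough assumptions (100 Daltons per amino acid, 10^-24 grams per Dalton, earth at 6 x 10^27 grams). The function name `smallest_length_exceeding` is hypothetical, chosen only for this illustration.

```python
EARTH_MASS_GRAMS = 6 * 10**27     # figure quoted at the top of the post
DALTONS_PER_GRAM = 10**24         # inverse of the rough 10^-24 g per Dalton
DALTONS_PER_RESIDUE = 100         # rough average assumed in the text


def smallest_length_exceeding(mass_grams):
    """Smallest n for which one molecule of every protein of length n
    outweighs mass_grams, using integer arithmetic throughout."""
    budget_daltons = mass_grams * DALTONS_PER_GRAM
    n = 1
    # 20**n proteins, each weighing about 100 * n Daltons
    while 20**n * DALTONS_PER_RESIDUE * n <= budget_daltons:
        n += 1
    return n


print(smallest_length_exceeding(EARTH_MASS_GRAMS))  # 38
```

With these inputs the loop stops at 38, so the earth runs out a little before 41 amino acids and the conclusion holds with room to spare.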
So what? It means that chemists will never be able to explore more than a tiny morsel of the space of possible proteins. Perhaps computationally we will (I doubt it), but that’s the subject of a future post.
The above is a post I wrote for “The Skeptical Chymist” back in April of 2008 (using the nom de plume Retread). I hoped for a lot of comments (particularly showing how I was wrong, as being correct has a lot of implications). I did get the following interesting comment from Param Priya Singh.
Really Good! However this may not be true. Because the situation which has been discussed is only valid if all possible polypeptides are made- all at once. But in biological reality it may not be the case. What if the sequence space has been explored (by nature) gradually during millions of years? In that case at a particular instance not all, but a limited (but still very large) subset is being explored and is being evolved under the selective pressure. From Param Priya Singh
to which I replied
Param, thanks for your comments. Consider the following: Let us suppose there is a super-industrious post-doc who can make a new protein every nanosecond (reusing the atoms). There are 60 * 60 * 24 * 365 = 31,536,000 ~ 10^7 seconds in a year and 10^10 years (more or less) since the big bang. This is 10^9 * 10^7 * 10^10 = 10^26 different 41 amino acid proteins he could have made since the dawn of time. But there are 20^41 = 2^41 * 10^41 proteins of length 41 amino acids, and 2^41 = 2,199,023,255,552 ~ 10^12. So he has only tested 10^26 of 10^53 possible 41 amino acid proteins in all this time.
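The post-doc estimate can be checked with a few lines of Python. This is only a sketch using the round numbers in the reply (one protein per nanosecond, 10^10 years since the big bang); the variable names are mine.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365     # 31,536,000
PROTEINS_PER_SECOND = 10**9               # one new protein every nanosecond
YEARS_SINCE_BIG_BANG = 10**10             # rough figure assumed in the reply

made = PROTEINS_PER_SECOND * SECONDS_PER_YEAR * YEARS_SINCE_BIG_BANG
possible = 20**41                         # distinct 41 amino acid proteins

print(f"made:     {made:.1e}")                # made:     3.2e+26
print(f"possible: {possible:.1e}")            # possible: 2.2e+53
print(f"fraction: {made / possible:.1e}")     # fraction: 1.4e-27
```

Keeping the exact 31,536,000 seconds per year rather than rounding to 10^7 changes the count by a factor of about three, which makes no difference to the conclusion.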
As per your suggestion, this is making one protein at a time. However, even if the hapless post-doc were able to use the entire mass of the earth (6 x 10^27 grams) every nanosecond to make a different set of proteins (one molecule of each), he would still not have made all the possible proteins as long as either of the two chains of hemoglobin (141 and 146 amino acids) since time began. Hemoglobin just isn’t that big as proteins go (the protein made by the gene mutated in cystic fibrosis has well over 1000 amino acids).
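Here is a rough Python sketch of the earth-per-nanosecond version of the argument. It assumes the same round numbers used throughout (100 Daltons per amino acid, 10^-24 grams per Dalton, 10^10 years since the big bang), and the function name `max_molecules_made` is hypothetical.

```python
import math

EARTH_MASS_DALTONS = 6 * 10**27 * 10**24                  # 6e27 g at ~1e-24 g per Dalton
NANOSECONDS_SINCE_BIG_BANG = 10**9 * (60 * 60 * 24 * 365) * 10**10
DALTONS_PER_RESIDUE = 100                                 # rough average assumed above


def max_molecules_made(n):
    """Upper limit on distinct proteins of length n if the whole earth is
    turned into a fresh batch of them every nanosecond since the big bang."""
    molecules_per_batch = EARTH_MASS_DALTONS // (DALTONS_PER_RESIDUE * n)
    return molecules_per_batch * NANOSECONDS_SINCE_BIG_BANG


for n in (141, 146):                                      # the two hemoglobin chains
    made = max_molecules_made(n)
    possible = 20**n
    print(n, round(math.log10(made)), round(math.log10(possible)), made < possible)
```

For the 141 amino acid chain this gives about 10^74 molecules made against roughly 10^183 possible sequences, a shortfall of more than a hundred orders of magnitude.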
So write in and show me the mistakes in all this. If it stands, this back of the envelope calculation poses severe problems for a very popular theory.