OK protein design class, here’s a breather for you, what with midterms coming up and all. Improving DNA polymerases with over 1000 amino acids was tough even with the quantum computers I handed out. For this assignment, all you have to do is design four 34 amino acid protein modules, each of which recognizes a single DNA base (A, C, G or T), leaving the rest alone. No cheating using zinc fingers, which recognize nucleotide triplets. Unfortunately there aren’t zinc fingers for all 64 possible triplets, so this isn’t an academic exercise. Of course if you succeed, the next assignment is to find a way to string the modules together in a protein (say 13 – 28 of them) so you can recognize any sequence you want. Since 4^13 = 2^26 = 67,108,864, 13 modules aren’t enough to make the sequence you are recognizing unique in the 3,200,000,000 nucleotide human genome. But 4^16 = 2^32 = 4,294,967,296 gets you there nicely (assuming random distribution of A, C, G and T, and that each accounts for 25% of the bases, something we know is NOT true, but let it pass).
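For anyone who wants to check the arithmetic, here is a minimal sketch. It takes the human genome as roughly 3.2 billion nucleotides, keeps the same simplifying assumption of equal and independent base frequencies, and looks at one strand only. The function names are made up for illustration.

```python
# Rough uniqueness check for a target site read by n single-base modules.
# Assumptions (not from any real dataset): genome of ~3.2 billion nucleotides,
# each base equally likely and independent, one strand considered.

GENOME_SIZE = 3_200_000_000


def distinct_sequences(n_modules: int) -> int:
    """Number of different DNA sequences of length n_modules (4 choices each)."""
    return 4 ** n_modules


def expected_chance_hits(n_modules: int, genome_size: int = GENOME_SIZE) -> float:
    """Approximate number of times a given n-mer turns up by chance."""
    return genome_size / distinct_sequences(n_modules)


def min_modules_for_uniqueness(genome_size: int = GENOME_SIZE) -> int:
    """Smallest n for which 4^n exceeds the genome size."""
    n = 1
    while distinct_sequences(n) <= genome_size:
        n += 1
    return n


if __name__ == "__main__":
    for n in (13, 16):
        print(n, distinct_sequences(n), round(expected_chance_hits(n), 2))
    print("minimum modules:", min_modules_for_uniqueness())
```

Under these assumptions a 13 module site is expected to turn up dozens of times by chance, while a 16 module site is expected less than once.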
Tough problem. No cheating using your quantum computers. There are ‘only’ 20^34 = 2^34 * 10^34 = 17,179,869,184 * 10^34 possibilities (roughly 1.7 * 10^44).
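To get a feel for how big that number is, here is a small sketch. The evaluation rate of 10^18 sequences per second is a made-up figure chosen purely for illustration, as is the function name.

```python
# Size of the search space for a 34 amino acid module, 20 choices per position.
AMINO_ACID_COUNT = 20
MODULE_LENGTH = 34
SECONDS_PER_YEAR = 365.25 * 24 * 3600

search_space = AMINO_ACID_COUNT ** MODULE_LENGTH

# 20^34 factors as 2^34 * 10^34, which is where 17,179,869,184 comes from.
assert search_space == 2 ** 34 * 10 ** 34
assert 2 ** 34 == 17_179_869_184


def years_to_enumerate(rate_per_second: float) -> float:
    """Years needed to look at every sequence once at a hypothetical rate."""
    return search_space / rate_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{search_space:.3e} candidate modules")
    # Hypothetical machine checking 10^18 sequences per second.
    print(f"{years_to_enumerate(1e18):.2e} years at 1e18 per second")
```

Even at that imaginary rate, brute force enumeration takes vastly longer than the age of the universe.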
To make things a bit easier, only 2 amino acids (at positions #12 and #13 of the module) account for the specificity. Even if you can’t actually design the module, your superior knowledge of organic chemistry should certainly allow you to choose the 2 amino acids at these positions giving you the specificity. Here there are only 20*20 = 400 possibilities (20*20 – 20 = 380 if you insist the two amino acids differ).
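The count for the two specificity positions can be enumerated directly. This sketch uses the standard one-letter codes for the 20 amino acids; whether a pair with the same amino acid at both positions should count is treated as an open choice, so both totals are shown. The function name is made up for illustration.

```python
from itertools import product

# Standard one-letter codes for the 20 proteinogenic amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_pairs(allow_identical: bool = True) -> list[tuple[str, str]]:
    """Ordered (position 12, position 13) pairs of amino acids."""
    pairs = product(AMINO_ACIDS, repeat=2)
    if allow_identical:
        return list(pairs)
    return [(a, b) for a, b in pairs if a != b]


if __name__ == "__main__":
    print(len(residue_pairs(allow_identical=True)))   # 20 * 20
    print(len(residue_pairs(allow_identical=False)))  # 20 * 20 - 20
```

Either way the list is short enough to go through by hand, unlike the full 34 position search.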
Go to it, lads and lassies.
If you get stuck, have a look at Science vol. 333 pp. 1843 – 1846 ’11. Humble bacteria attacking plants have already done it. Almost enough to make you humble, isn’t it? Of course it all arose by chance, didn’t it? Didn’t it?