Today’s protein design assignment

OK protein design class, here’s a breather for you, what with midterms coming up and all.  Improving DNA polymerases with over 1000 amino acids was tough even with the quantum computers I handed out.  For this assignment, all you have to do is design four 34 amino acid protein modules, each of which recognizes a single DNA base (A, C, G, T), leaving the rest alone.  No cheating using zinc fingers, which recognize nucleotide triplets.  Unfortunately there aren’t zinc fingers for all 64 such triplets, so this isn’t an academic exercise.  Of course, if you succeed, the next assignment is a way to put them together in a protein (say 13 – 28 times) so you can recognize any sequence you want.  4^13 = 2^26 = 67,108,864, which is not enough to make the sequence you are recognizing unique in the 3,200,000,000 nucleotide human genome.  But 4^16 = 2^32 = 4,294,967,296 gets you there nicely (assuming random distribution of A, C, G and T, and that each accounts for 25% of the bases, something we know is NOT true, but let it pass).
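For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes a genome of roughly 3.2 billion nucleotides and the same idealization as above (random sequence, each base at 25%); the function names are made up for illustration.

```python
# Approximate haploid human genome size in nucleotides (an assumption,
# rounded to 3.2 billion).
GENOME_SIZE = 3_200_000_000


def distinct_sequences(n_modules: int) -> int:
    """Number of different DNA sequences n single-base modules can spell."""
    return 4 ** n_modules


def min_modules_for_uniqueness(genome_size: int) -> int:
    """Smallest n for which 4^n exceeds the genome size."""
    n = 1
    while distinct_sequences(n) <= genome_size:
        n += 1
    return n


if __name__ == "__main__":
    for n in (13, 16):
        print(f"4^{n} = {distinct_sequences(n):,}")
    print("modules needed:", min_modules_for_uniqueness(GENOME_SIZE))
```

Under these assumptions 13 modules fall well short of the genome size, and 16 is the first length that clears it.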

Tough problem.  No cheating using your quantum computers.  There are ‘only’ 20^34 = 17,179,869,184 * 10^34 (about 1.7 * 10^44) possibilities.
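The size of that search space is easy to confirm with exact integer arithmetic. A small sketch, assuming only 20 standard amino acids at each of 34 positions (the function name is hypothetical):

```python
def module_search_space(n_amino_acids: int = 20, length: int = 34) -> int:
    """Number of possible sequences of the given length."""
    return n_amino_acids ** length


if __name__ == "__main__":
    total = module_search_space()
    # 20^34 factors as 2^34 * 10^34, which is where 17,179,869,184 comes from.
    print(f"2^34  = {2 ** 34:,}")
    print(f"20^34 = {total:.3e}")
```

Python integers are arbitrary precision, so the factorization can be compared exactly rather than in floating point.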

To make things a bit easier, only 2 amino acids (at positions #12 and #13 of the module) account for the specificity.  Even if you can’t actually design the module, your superior knowledge of organic chemistry should certainly allow you to choose the 2 amino acids at these positions that give you the specificity.  Here there are only 20*20 – 20 = 380 possibilities if the two residues must differ (400 if they may be the same).
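The count for positions 12 and 13 depends on whether the two residues are allowed to be identical. A brief sketch that enumerates the ordered pairs both ways, using the standard one-letter amino acid codes (the function name is made up for illustration):

```python
from itertools import product

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_pairs(allow_identical: bool) -> list[tuple[str, str]]:
    """Ordered (position 12, position 13) pairs of amino acids."""
    return [
        (a, b)
        for a, b in product(AMINO_ACIDS, repeat=2)
        if allow_identical or a != b
    ]


if __name__ == "__main__":
    print(len(residue_pairs(allow_identical=True)))
    print(len(residue_pairs(allow_identical=False)))
```

Either way the list is small enough to read through by hand, which is the point of the hint.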

Go to it lads and lassies.

If you get stuck, have a look at Science vol. 333 pp. 1843 – 1846 ’11.  Humble bacteria attacking plants have already done it.  Almost enough to make you humble, isn’t it?  Of course it all arose by chance, didn’t it?  Didn’t it?


  • Curious Wavefunction  On October 27, 2011 at 3:29 pm

    I think your question makes the assignment sound much more challenging than it likely is, since it gives the impression that we are designing a protein from scratch. But in nature this happens not from scratch but by mixing and matching existing protein modules with less specific functions. Although challenging, the latter is far easier than the former and does not strain credulity.

    In the case of the DNA recognition proteins, the starting point was likely a protein that recognized a couple of different DNA sequences with slight specificity. Evolution then acted on this pre-existing design to build in the exquisite specificity that you note. This is a process that nature uses all the time; just think of the several proteases with common ancestry that branched off to form the highly selective proteins in the blood clotting cascade.

    The problem that you are talking about is really one in protein REdesign rather than completely de novo design. Nature does provide ample reasons to be humble, but not enough to defy belief.

  • luysii  On October 27, 2011 at 4:15 pm

    Fair enough. But one of the points of the article was how little we really understand about the amino acid sequence -> 3D structure transformation, carried out innumerable times each second within us. If we did, we could certainly design such a 34 amino acid sequence (hardly large enough to even be called a protein). Also, this is a perfect subject for CASP (or any software you have at hand). For more on CASP see
