Protecting groups

Most compounds of interest have more than one functional group, and during synthesis it is important to protect all but the one you are operating on (or to use a very selective reagent that ignores the rest).  One interesting compound has a 20-membered ring with several functional groups hanging from it: a phenol, a benzyl, a primary amine, 2 amides, and a large substituent which contains, among other things, a guanidine, an amide, and a cyclic tertiary amide.  Part of the ring is a disulfide.  Remarkably, this compound is made without the use of any protecting groups.

This post was brought to mind by chapter 24 of Clayden, which discusses six protecting groups; the next chapter, on organic synthesis, uses them and more.

Before telling you what the compound is, it’s time for a great quote.

“Remember, if you don’t understand it right away, don’t worry. You never learn anything, you only get used to it.”  

This, from a book on LISP programming (Let’s Talk LISP) by Laurent Siklossy.  I’m not sure anyone programs in LISP anymore, but 30 – 35 years ago, LISP was the language of artificial intelligence and expert systems, something so hot at the time that people making a living by their brains regarded AI the way John Henry looked at the steam drill.  As a neurologist, I was interested in how a computer, which can be completely understood, could mimic brain function. In addition, computers were becoming personal (and even better, affordable) back then, so I had some fun trying to learn LISP and monkeying around with it.   LISP code is incredibly arcane, with more parentheses than most brains can comprehend. After one particularly horrible example, Siklossy put in the quote by way of encouragement. 

Well, the compound in question is arginine vasopressin, which your body makes without the benefit of protecting groups (but using the incredibly complicated ribosomal machine, messenger RNA, transfer RNA and the enzymes which charge transfer RNA with amino acids).  Arginine vasopressin, with its nine amino acids, is very small beer compared to real proteins.

I think we’ve gotten far too used to the immense number of functional groups that proteins have, and the fact that, as far as we know, in the vast majority of cases, they don’t react with each other.  Out of the 20 amino acids, 3 are alcohols and 2 are carboxylic acids, and yet intramolecular lactones haven’t been found, which isn’t to say they aren’t there, but has anyone really looked?  2 more amino acids have amides on their side chains; do they ever react with, or switch places with, the protein backbone?  Then there is lysine with its primary amino group dangling off the peptide backbone, just waiting to get into trouble, thanks to a linear chain of 4 methylenes.

Sometimes rather amazing chemistry does happen deep within proteins.  Here is a post I wrote for Nature Chemistry.  Stuart Cantrill told me that since I wrote it, I can use it elsewhere.

 OCTOBER 30, 2008

Chemiotics: Sherlock Holmes and the Green Fluorescent Protein

Posted on behalf of Retread

Gregory (Scotland Yard): “Is there any other point to which you would wish to draw my attention?”
Holmes: “To the curious incident of the dog in the night-time.”
Gregory: “The dog did nothing in the night-time.”
Holmes: “That was the curious incident.”

The chromophore of green fluorescent protein (GFP) is para-hydroxybenzylidene imidazolinone. It is formed by cyclization of the sequential tripeptide serine (#65), tyrosine (#66), glycine (#67). The chromophore sits in the center of a beta barrel formed by the 238 amino acids of GFP.

What is so curious about this?

Simply put, why don’t things like this happen all the time? Perhaps nothing quite this fancy, but on a more plebeian level consider this: of the twenty amino acids, 2 are carboxylic acids, 2 are amides, 1 is an amine, 3 are alcohols and one is a thiol. One might expect esters, amides, thioesters and sulfides to be formed deep inside proteins. Why deep inside? On the surface of the protein, there is water at 55 molar around to hydrolyze them purely by the law of mass action (releasing about 10 kJ/Avogadro’s number per bond in the process). Some water is present in the X-ray crystallographic structure of proteins, but nothing this concentrated.

The presence of 55 M water bathing the protein surface leads to an even more curious incident, namely why proteins exist at all given that amide hydrolysis is exothermic (as well as entropically favorable). Perhaps this is why proteins contain so many alpha helices and beta sheets — as well as functioning as structural elements they may also serve to hide the amides from water by hydrogen bonding them to each other. Along this line, could this be why the hydrophilic side chains of proteins (arginine, lysine, the acids and the amides) are rather bulky? Perhaps they also function to sterically shield the adjacent amides. After all, why should lysine have 4 methylene groups rather than just one or two?

Now the serine-tyrosine-glycine tripeptide should occur by chance once in every 8000 tripeptides (20 × 20 × 20, assuming all twenty amino acids are equally common). The SwissProt database of proteins contains 144,041,553 amino acids in 399,749 proteins as of 14 October 2008. Does this tripeptide occur about 18,000 times in the database as it should? If it doesn’t, is negative selection preventing it? If it does occur this often, have we missed other chromophores? Are there other tripeptides missing from SwissProt? If there are, does this tell us how to build other chromophores? Or does it tell us something important about protein structure?
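
The back-of-the-envelope estimate above can be sketched in a few lines of Python. This is only an illustration under the post's own assumption that all 20 amino acids are equally likely and independent (real frequencies are not uniform); the function names are made up for the example. It also shows the small correction for the fact that a protein of length n contains only n - 2 overlapping tripeptides.

```python
# Figures quoted in the post (SwissProt, 14 October 2008).
TOTAL_RESIDUES = 144_041_553
TOTAL_PROTEINS = 399_749
N_AMINO_ACIDS = 20


def tripeptide_windows(total_residues: int, total_proteins: int) -> int:
    """Overlapping 3-residue windows, assuming every protein has at least 3 residues."""
    return total_residues - 2 * total_proteins


def expected_hits(windows: float, motif_length: int = 3) -> float:
    """Expected count of one specific motif under a uniform, independent model."""
    return windows / N_AMINO_ACIDS ** motif_length


# Naive estimate: treat the whole database as one long string.
naive = expected_hits(TOTAL_RESIDUES)
# Windowed estimate: tripeptides cannot span two different proteins.
windowed = expected_hits(tripeptide_windows(TOTAL_RESIDUES, TOTAL_PROTEINS))

print(f"naive estimate:    {naive:,.0f}")     # 18,005
print(f"windowed estimate: {windowed:,.0f}")  # 17,905
```

Either way the uniform model predicts roughly 18,000 occurrences; a count far from that would be the interesting result.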

I don’t have the skills to properly interrogate SwissProt or the Protein Data Bank, but I imagine that some of the readership does. Go to it. These are curious incidents indeed.
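
For readers who want to take up the challenge, here is a minimal sketch of how the counting might be done, assuming the sequences are available locally as a FASTA file. The function names and the toy sequences are hypothetical, made up for illustration; nothing here touches the Protein Data Bank or any structural information.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def count_motif(sequence: str, motif: str) -> int:
    """Count occurrences of motif in sequence, including overlapping ones."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def motif_counts(lines: Iterable[str], motif: str = "SYG") -> Dict[str, int]:
    """Map each header to its motif count, keeping only proteins with a hit."""
    counts = {}
    for header, sequence in read_fasta(lines):
        n = count_motif(sequence, motif)
        if n:
            counts[header] = n
    return counts


# Toy input (hypothetical sequences, not real database entries).
toy = [
    ">toy1 hypothetical",
    "MSKGEELFSYGVQ",
    "SYGSYG",
    ">toy2 hypothetical",
    "MAAAK",
]
print(motif_counts(toy))  # {'toy1 hypothetical': 3}
```

With a real download, the same function could be fed an open file handle, e.g. `motif_counts(open("sprot.fasta"))`, where the file name is an assumed local path. Summing the values gives the observed total to compare against the chance expectation.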


  • Alexander  On May 13, 2010 at 7:02 pm

    The S in the SYG sequence is not essential for cyclization. That brings into view quite a lot of proteins.

    BTW, there is at least the HAL enzyme (histidine ammonia-lyase), which has a similar post-translational autocatalytic modification in its active center, and it is exposed to solvent.

  • J-bone  On May 17, 2010 at 3:57 pm

    Just a few things about amino acids (from an organic chemist’s standpoint):

    1) The reason you don’t see the lactones or lactams (actually, if the side chains of aspartate/glutamate or asparagine/glutamine were to cyclize you’d end up with an anhydride or imide) is that in the overwhelmingly aqueous environment of a cell the hydrolysis of these would be heavily favored, leaving you with the linear amino acid.

    2) Lysine stays out of trouble because the free amine is so basic that at neutral pH it will be protonated.

    3) Asparagine and glutamine don’t have their side chains incorporated into peptide backbones because the reactivity of that nitrogen is significantly lowered due to delocalization into the amide, leaving the free amine as the best nucleophile.

  • luysii  On May 19, 2010 at 8:53 am

    J-Bone — thanks for the comments.

    WRT #2 Clayden has the pKaH of n-butyl amine as 10.7, which should be pretty close to that of the lysine side chain. Physiologic pH is 7.3 – 7.4 and is maintained by the body in a very narrow range. So between 1/1000 and 1/10,000 lysines should be unprotonated at any one time. This should be enough for them to react to form something more stable.

    I agree that something like this is unlikely to happen on the surface of a protein for the reasons you state (I noted this rather obliquely in the 3rd paragraph from the end). The chromophore of GFP occurs inside the protective environment of a beta barrel.

    However, polar side groups are well known to occur ‘inside’ proteins. The 4 arginines (with their 3 nitrogens in a guanidino group) on an alpha helix found in the very apolar environment of the cell membrane are used by all sorts of ion channels to sense transmembrane voltage and produce allosteric shifts in the ion channel (e.g. opening and closing) accordingly.
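
The protonation estimate in the reply above (pKaH 10.7, physiologic pH 7.3 to 7.4) can be checked with the Henderson-Hasselbalch relation. This is a minimal sketch: it assumes, as the comment does, that the lysine side chain behaves like n-butylamine in dilute solution, which ignores any shift in pKa caused by the local protein environment. The function name is made up for the example.

```python
def fraction_unprotonated(pka_h: float, ph: float) -> float:
    """Fraction of an amine present as the free base, from Henderson-Hasselbalch.

    [B] / ([B] + [BH+]) = 1 / (1 + 10 ** (pKaH - pH))
    """
    return 1.0 / (1.0 + 10.0 ** (pka_h - ph))


PKA_H = 10.7  # value for n-butylamine quoted from Clayden in the comment

for ph in (7.3, 7.4):
    f = fraction_unprotonated(PKA_H, ph)
    print(f"pH {ph}: free amine fraction = {f:.2e} (about 1 in {1 / f:,.0f})")
```

Under these assumptions the answer comes out at roughly 1 lysine in 2,000 to 2,500, which sits inside the 1/1000 to 1/10,000 range given in the comment.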

  • Yggdrasil  On May 20, 2010 at 1:19 am

    It’s also important to note that proteins turn over inside the cell (old proteins are degraded while new proteins are synthesized to replace them). Protein side chains do become involved in a variety of non-functional side reactions; for example, some may get oxidized, and some can even participate in Maillard reactions with sugars (yes, the same reaction that makes bread into toast). However, it is likely that these side reactions are slow and infrequent, and the rate of protein turnover is likely high enough to limit the number of damaged proteins floating around the cell.

    Interestingly, the process that targets proteins for degradation, ubiquitination, involves the attachment of the small protein ubiquitin to the protein to be degraded (for sake of completeness, I will note that ubiquitination serves other purposes and is not limited to targeting proteins for degradation). This attachment occurs through an isopeptide bond: a lysine in the protein of interest forms a covalent amide bond with the carboxy-terminus of ubiquitin. So, while glutamine and asparagine don’t get mixed up with the backbones of proteins, lysines certainly do.

  • Wavefunction  On May 22, 2010 at 11:17 pm

    A question: why is E2 necessary? It only seems to transfer the peptide from E1 to the substrate. Why not E1 to substrate directly?

  • Yggdrasil  On May 23, 2010 at 12:14 pm

    The answer probably has to do with specificity. Many different cellular systems use the ubiquitination process, and so it must be tightly regulated. The cell wants to be able to degrade protein A without also increasing the degradation of protein B. Consistent with this idea, if you look through the genome you will see that the number of E3 enzymes > the number of E2 enzymes > the number of E1 enzymes. For the most part, each E1 can ubiquitinate many E2s, but each E2 is ubiquitinated by only one E1. Similarly, each E2 could be used by a number of E3s, but each E3 uses only one specific E2. This system would allow E1 enzymes to create pools of ubiquitinated E2 enzymes ready for reaction, and regulating which E1 enzymes are present and active is one level of regulation. Furthermore, controlling which E2 enzymes are available to be ubiquitinated by E1 enzymes provides another level of regulation. Finally, regulating which E3 enzymes are active and which substrates are ready for ubiquitination is yet another level at which the specificity of ubiquitination can be regulated.

    There are also cases where reactions between E2 enzymes create polyubiquitin chains. The interactions between two E2s catalyze the transfer of one ubiquitin to a specific lysine on the other ubiquitin. Here, the E2s regulate the specific identity of the polyubiquitin chain (i.e. which lysine in ubiquitin gets ubiquitinated), an important process because different types of polyubiquitin chains have different fates in the cell.

  • luysii  On July 4, 2010 at 6:32 am

    J-Bone: Have a look at Proc. Natl. Acad. Sci. vol. 107 pp. 11686 – 11691 ’10. The authors made a cyclic pentapeptide, Lysine – Alanine – Alanine – Alanine – Aspartic acid, with an amide link between the side chains of lysine and aspartic acid. The amino group of lysine was acetylated and the carboxyl of aspartic acid was amidated. The compound was stable in human serum for over 24 hours! They found that they could change the 3 alanines to other side chains and still have stability. Their focus was on drug design, but the paper has larger implications: why doesn’t this happen more often? Time to search protein databases for a 1 – 5 relationship between lysine and aspartic acid, and see if any occur on the protein surface.
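
The sequence half of the search proposed above (a lysine at position i with an aspartic acid at position i + 4) is easy to sketch with a regular expression. This is an illustrative example with a made-up function name and a toy sequence; it says nothing about whether a hit lies on the protein surface, which would need structural data.

```python
import re
from typing import List, Tuple

# K, any three residues, then D. The lookahead lets overlapping hits all be found.
LYS_I_ASP_I_PLUS_4 = re.compile(r"(?=(K[A-Z]{3}D))")


def lys_asp_pairs(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based index of K, matched pentapeptide) for every K(i)/D(i+4) pair."""
    return [(m.start(), m.group(1)) for m in LYS_I_ASP_I_PLUS_4.finditer(sequence.upper())]


# Toy sequence (hypothetical, not a real protein).
print(lys_asp_pairs("MKAAADGKKLMDD"))
# [(1, 'KAAAD'), (7, 'KKLMD'), (8, 'KLMDD')]
```

The reverse arrangement (aspartic acid first, lysine four residues later) could be covered with a second pattern, `D[A-Z]{3}K`, if that orientation is also of interest.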


