How ‘simple’ can a protein be and still have a significant biological effect?

Words only have meaning in the context of the much larger collection of words we call language. So it is with proteins. Their only ‘meaning’ is the biologic effects they produce in the much larger collection of proteins, lipids, sugars, metabolites, cells and tissues of an organism.

So how ‘simple’ can a protein be and still produce a meaningful effect? As Bill Clinton would say, that depends on what you mean by simple. Well, one way a protein can be simple is by having only a few amino acids. Met-enkephalin, an endogenous opiate, contains only 5 amino acids. Now many wouldn’t consider met-enkephalin a protein, calling it a polypeptide instead. But the boundary between polypeptide and protein is as fluid and ill-defined as the boundary between a few grains of sand and a pile of it.

Another way to define simple is by having most of the protein made up of just a few of the 20 amino acids. Collagen is a good example. Nearly half of it is glycine and proline (and a modified proline called hydroxyproline), leaving the other 18 amino acids to make up the rest. Collagen is big despite being simple: a single molecule has a mass of 285 kilodaltons.

This brings us to [ Proc. Natl. Acad. Sci. vol. 112 pp. E4717 – E4727 ’15 ]. The authors constructed a protein/polypeptide of 26 amino acids, of which 25 are either leucine or isoleucine. The 26th amino acid is methionine, which is found at the very amino terminal end of all proteins as they are synthesized (remember, the initiator codon always codes for methionine).

What does it do? It causes tumors. How so? It binds to the transmembrane domain of the beta variant of the receptor for Platelet Derived Growth Factor (PDGFRbeta). The receptor, when turned on, causes cells to proliferate.

What is the smallest known oncoprotein? It is the E5 protein of Bovine PapillomaVirus (BPV), which is essentially a free-standing transmembrane domain (and which also binds to PDGFRbeta). It has only 44 amino acids.

Well we have 26 letters + a space. I leave it to you to choose 3 of them, use one of them once, the other two 25 times, with as many spaces as you want and construct a meaningful sequence from them (in any language using the English alphabet).

Just back from an Adult Chamber Music Festival (aka Band Camp for Adults). More about that in a future post.

A very UNtheoretical approach to cancer diagnosis

We have tons of different antibodies in our blood. Without even taking mutation into account we have 65 heavy chain genes, 27 diversity segments, and 6 joining regions for them (making 10,530 possibilities). Then there are 40 genes for the kappa light chains and 30 for the lambda light chains. A given antibody uses one type of light chain or the other, so that is 70 choices, or over 700,000 (70 * 10,530) heavy/light combinations. That’s without the mutations that we know do occur to increase antibody affinity. So the number of antibodies probably ramming around in our blood is over a billion (I doubt that anyone has counted them, just as no one has ever counted the neurons in our brain). Antibodies can bind to anything (sugars, fats), but we think of them as mostly binding to protein fragments.
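The arithmetic is easy to check. Here is a minimal Python sketch using only the segment counts quoted above. The function names are my own, and the light chain step rests on one assumption: kappa and lambda are alternatives (an antibody carries one or the other), so their counts add rather than multiply. Light chain joining segments are ignored, as they are in the text.

```python
# Germline segment counts as quoted in the text.
HEAVY_V = 65   # heavy chain genes
HEAVY_D = 27   # diversity segments
HEAVY_J = 6    # joining regions
KAPPA_V = 40   # kappa light chain genes
LAMBDA_V = 30  # lambda light chain genes


def heavy_chain_combinations() -> int:
    """Number of ways to pick one V, one D and one J segment for the heavy chain."""
    return HEAVY_V * HEAVY_D * HEAVY_J


def light_chain_choices() -> int:
    """Assumption: an antibody uses kappa OR lambda, so the two counts add."""
    return KAPPA_V + LAMBDA_V


def paired_combinations() -> int:
    """Heavy chain combinations multiplied by light chain choices."""
    return heavy_chain_combinations() * light_chain_choices()


if __name__ == "__main__":
    print(heavy_chain_combinations())
    print(paired_combinations())
```

Either way you count it, the germline combinations alone fall far short of a billion. The rest has to come from mutation.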

We also know that cancer is characterized by mutations, particularly in the genes coding for proteins. Many of these mutations have never been seen by the immune system, so they act as neoantigens. So what [ Proc. Natl. Acad. Sci. vol. 111 pp. E3072 – E3080 ’14 ] did was make a chip containing 10,000 peptides, and see which of them were bound by antibodies in the blood.

The peptides were 20 amino acids long, with 17 randomly chosen amino acids, and a common 3 amino acid linker to the chip. While 10,000 seems like a lot of peptides, it is a tiny fraction (actually about 10^-18) of the 20^17 = 2^17 * 10^17 = 1.3 * 10^22 possible 17 amino acid peptides.
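For anyone who wants to verify the size of that fraction, here is a small Python sketch. It assumes, as the calculation in the text does, that each of the 17 random positions can hold any of the 20 standard amino acids. The constant and function names are mine.

```python
from fractions import Fraction

AMINO_ACIDS = 20           # size of the amino acid alphabet (assumed: all 20 used)
RANDOM_POSITIONS = 17      # randomly chosen positions per peptide
PEPTIDES_ON_CHIP = 10_000  # peptides on the array


def peptide_space(alphabet: int = AMINO_ACIDS, length: int = RANDOM_POSITIONS) -> int:
    """Number of distinct peptides of the given length over the alphabet."""
    return alphabet ** length


def chip_fraction() -> Fraction:
    """Exact fraction of the peptide space that the chip samples."""
    return Fraction(PEPTIDES_ON_CHIP, peptide_space())


if __name__ == "__main__":
    print(f"possible peptides: {peptide_space():.2e}")
    print(f"fraction on chip:  {float(chip_fraction()):.2e}")
```

The exact fraction comes out a bit under 10^-18, so the round number in the text is the right order of magnitude.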

The blood was first diluted 500x so blood proteins other than antibodies don’t bind significantly to the arrays. The assay is disease agnostic. The pattern of binding of a given person’s blood to the chip is called an immunosignature.

What did they measure? They took 20 samples from each of five cancer cohorts collected from multiple geographic sites, plus 20 noncancer samples, and generated a reference immunosignature. Then 120 blinded samples from the same diseases gave 95% classification accuracy. To investigate the breadth of the approach and test sensitivity, the immunosignatures of 75% of over 1,500 historical samples (some over 10 years old) comprising 14 different diseases were used as training, then the other 25% were read blind with an accuracy of over 98%. That is not too impressive; they need to get another 1,500 samples. Once you’ve trained on 75% of the sample space, you’d pretty much expect the other 25% to look the same.
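To make the objection concrete, here is a rough Python sketch of a 75/25 holdout split. It is not the paper's procedure, just the generic technique. The function name, the seed, and the use of integers as stand-in sample IDs are all my own assumptions.

```python
import random


def holdout_split(sample_ids, train_fraction=0.75, seed=0):
    """Shuffle the sample IDs and cut them into a training set and a blinded test set."""
    ids = list(sample_ids)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


if __name__ == "__main__":
    # Hypothetical IDs standing in for the 1,500 historical samples.
    train, test = holdout_split(range(1500))
    print(len(train), len(test))
```

Both halves come out of the same shuffled pool, which is the point: the blinded 25% shares collection sites, storage history and disease mix with the 75% used for training. A genuinely new set of samples would be a harder test.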

The immunosignature of a given individual consists of an overlay of the patterns from the binding signals of many of the most prominent circulating antibodies. Some are present in everyone, some are unique.

A 2002 reference (Molecular Biology of the Cell 4th Edition) states that there are 10^9 antibodies circulating in the blood. How can you pick up a signature on 10K peptides from this? Presumably neoantigens from cancer cells elicit higher affinity antibodies than self-antigens. High affinity monoclonals can be diluted hundreds of times without diminishing the signal.

The next version of the immunosignature peptide microarray, now under development, contains over 300,000 peptides.

The implication is that each cancer and each disease produces different antigens and/or different B cell responses to common antigens.

Since the peptides are random, you can’t align the peptides in the signature to the natural proteomic space to find out what the antibody is reacting to.

It’s a completely atheoretical approach to diagnosis, but intriguing. I’m amazed that such a small sample of protein space can produce a significant binding pattern diagnostic of anything.

It’s worth considering just what a random peptide of 17 amino acids actually is. How would you make one up? Would you choose randomly giving all 20 amino acids equal weight, or would you weight the probability of a choice by the percentage of that amino acid in the proteome of the tissue you are interested in? Do we have such numbers? My guess is that proline, glycine and alanine would be the most common amino acids. There is so much collagen around, and these 3 make up a high percentage of the amino acids in the various collagens we have (over 15 at least).
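The two ways of making up a random peptide can be sketched in a few lines of Python. This is only an illustration of the choice, not how the chip's peptides were actually generated. The function names are mine, and the reference sequence in the usage example is a made-up toy string, not real proteome data.

```python
import random
from collections import Counter

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def uniform_peptide(length=17, rng=None):
    """Pick each position with equal probability from all 20 amino acids."""
    if rng is None:
        rng = random.Random()
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def composition_weights(reference):
    """Count how often each amino acid appears in a reference sequence."""
    counts = Counter(reference)
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def weighted_peptide(reference, length=17, rng=None):
    """Pick each position in proportion to the composition of the reference.

    The reference must contain at least one standard amino acid letter.
    """
    if rng is None:
        rng = random.Random()
    weights = composition_weights(reference)
    return "".join(rng.choices(AMINO_ACIDS, weights=weights, k=length))


if __name__ == "__main__":
    rng = random.Random(1)
    toy_reference = "GPAGPPGAP"  # hypothetical collagen-like toy string, not real data
    print(uniform_peptide(rng=rng))
    print(weighted_peptide(toy_reference, rng=rng))
```

With a collagen-heavy reference the weighted peptides come out as little more than glycine, proline and alanine, which shows how much the answer depends on whose composition you pick.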