
Functional MRI research is a scientific sewer

First, a primer on the science underlying functional Magnetic Resonance Imaging (fMRI). Chemists use MRI all the time, but they call it Nuclear Magnetic Resonance. Docs and researchers quickly changed the name to MRI because no one would put their head in something with Nuclear in the name.

There are now noninvasive methods to study brain activity in man. The most prominent one is called BOLD (Blood Oxygen Level Dependent). It is based on the fact that when brain activity increases, blood flow increases way past what is needed. This was actually noted in the 1930s by Wilder Penfield while operating on the brain for epilepsy. When a patient had a seizure on the operating table (they could keep things under control by partially paralyzing the patient with curare), the veins in the area producing the seizure turned red. Recall that oxygenated blood is red, while the deoxygenated blood in veins is darker and somewhat blue. This implied that more blood was getting to the convulsing area than it could use.

BOLD depends on slight differences in the way oxygenated hemoglobin and deoxygenated hemoglobin interact with the magnetic field used in magnetic resonance imaging (MRI). The technique has had a rather checkered history, because very small differences must be measured, and the raw data need a lot of manipulation (never shown in papers). Ten years ago, functional magnetic resonance imaging (fMRI) was called pseudocolor phrenology.

A task or sensory stimulus is given, and the parts of the brain showing an increase in oxygenated hemoglobin are mapped out. As a neurologist, I was naturally interested in this work as far back as the 90s. Very quickly, I smelled a rat. The authors of all the papers always seemed to confirm their initial hunch about which areas of the brain were involved in whatever they were studying. Science just isn’t like that. Look at any issue of Nature or Science and see how many results were unexpected. The results were also largely irreproducible. It got so bad that an article in Science (2 August ’02, p. 749) stated that neuroimaging (e.g. functional MRI) has a reputation for producing “pretty pictures” but not replicable data. It has been characterized as pseudocolor phrenology (or words to that effect). Keep reading; you’re about to find out just why this was.

What was going on? The data was never actually shown, just the authors’ manipulation of it. Acquiring the data is quite tricky: the slightest head movement alters the MRI pattern. Also, the difference in NMR signal between hemoglobin without oxygen and hemoglobin with oxygen is small (only 1 – 2%). Since the technique involves subtracting two data sets for the same brain region, the noise in both data sets adds up, so the variance of the difference is double that of either one.
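To see why subtraction hurts, here is a minimal Python simulation. All the numbers are hypothetical and chosen only for illustration: a resting signal of 100 arbitrary units, a 1.5% task-related increase (inside the 1 – 2% range above), and independent Gaussian noise with a standard deviation of 1 unit on every scan. Real fMRI noise is neither independent nor this simple.

```python
import random
import statistics

# Hypothetical parameters for illustration only (not taken from any real scanner).
BASELINE = 100.0      # resting signal, arbitrary units
BOLD_CHANGE = 0.015   # assumed 1.5% increase during the task
NOISE_SD = 1.0        # assumed per-scan noise standard deviation
N_TRIALS = 100_000    # number of simulated task-minus-rest comparisons


def noisy_scan(true_value, rng):
    """One measurement of a voxel: the true value plus independent Gaussian noise."""
    return true_value + rng.gauss(0.0, NOISE_SD)


def simulate(seed=0):
    """Simulate rest and task scans, then subtract them pairwise."""
    rng = random.Random(seed)
    rest = [noisy_scan(BASELINE, rng) for _ in range(N_TRIALS)]
    task = [noisy_scan(BASELINE * (1 + BOLD_CHANGE), rng) for _ in range(N_TRIALS)]
    diffs = [t - r for t, r in zip(task, rest)]
    return {
        "true_difference": BASELINE * BOLD_CHANGE,
        "mean_difference": statistics.fmean(diffs),
        "var_single_scan": statistics.variance(rest),
        "var_difference": statistics.variance(diffs),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>16}: {value:.3f}")
```

With these made-up numbers the variance of the difference comes out close to twice that of a single scan, so the noise in any one task-minus-rest difference (standard deviation about 1.4 units) is about as large as the true effect (1.5 units). A single comparison tells you very little; many repetitions have to be averaged before the effect stands out.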

Less than two years ago, it was shown that 70% of people having functional MRIs (fMRIs) were asleep during the test, and that until then fMRI researchers hadn’t checked for it. For details, please see
https://luysii.wordpress.com/2014/05/18/how-badly-are-thy-researchers-o-default-mode-network/. You don’t have to go to med school to know that the brain functions quite differently in wake and sleep.

Recent work shows that things are even worse. A devastating report in [ Proc. Natl. Acad. Sci. vol. 113 pp. 7699 – 7600, 7900 – 7905 ’16 ] showed that certain common settings in 3 software packages (SPM, FSL, AFNI) used to analyze fMRI data gave false positive results ‘up to’ 70% of the time. Some 3,500 of the 40,000 fMRI studies in the literature over the past 20 years used these settings. The paper also notes that a bug in one of the packages (in use for 15 years before it was corrected) also led to false positive results.

Here’s a bit more detail on what they did. It turns out that analyzing one voxel (essentially a single three-dimensional MRI pixel) at a time produces valid results. The problem comes when multiple voxels (clusters) are analyzed together. Clusterwise inference considers both the strength of activity at spots throughout the brain and the size of the spots. When a parameter called the cluster defining threshold (CDT) is set too low, the analysis is more likely to produce false positives. This was true for all 3 packages tested. The culprit is the parametric statistical method used for clusters (not for single voxels). It relies on Gaussian Random Field Theory (RFT), which in turn depends on two assumptions: (1) the spatial autocorrelation function has a squared exponential (i.e. Gaussian) shape, and (2) the spatial smoothness of the fMRI signal is constant over the brain. Neither of these assumptions is correct.

Those of you who’ve read Nassim Nicholas Taleb about the stock market know about ‘fat tails’. It turns out that the spatial autocorrelation function has them. Here’s what a fat tail is all about. Human height falls quite nicely into a Gaussian distribution. There are 7 and 8 footers about, but they are rare. If the distribution of human height weren’t Gaussian but had a fat tail, we’d see 12 and 15 footers.
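Here is a small numerical sketch of the height example in Python. The average height (69 inches), standard deviation (3 inches) and world population (8 billion) are round, hypothetical numbers. The fat-tailed curve is a Student t distribution with 3 degrees of freedom, rescaled to the same variance as the Gaussian. It is simply a convenient textbook stand-in whose tail shrinks like a power of the distance rather than exponentially; it is not the autocorrelation function actually measured in the PNAS paper.

```python
import math

# Hypothetical round numbers, chosen only for illustration.
MEAN_HEIGHT_IN = 69.0   # average adult height, inches (5 ft 9 in)
SD_HEIGHT_IN = 3.0      # standard deviation, inches
POPULATION = 8e9        # people on Earth, roughly


def gaussian_tail(z):
    """P(X > z) for a standard normal (thin-tailed) variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def t3_tail(z):
    """P(X > z) for a Student t variable with 3 degrees of freedom,
    rescaled to unit variance. Its tail falls off like z**-3 (a fat tail).

    Uses atan2 so the formula stays accurate when z is large."""
    return (math.atan2(1.0, z) - z / (1.0 + z * z)) / math.pi


def expected_people_taller_than(feet, tail):
    """Expected number of people taller than `feet` under the given tail function."""
    z = (feet * 12 - MEAN_HEIGHT_IN) / SD_HEIGHT_IN
    return POPULATION * tail(z)


if __name__ == "__main__":
    for feet in (7, 8, 12, 15):
        g = expected_people_taller_than(feet, gaussian_tail)
        f = expected_people_taller_than(feet, t3_tail)
        print(f"taller than {feet:>2} ft: Gaussian {g:.3g}   fat-tailed {f:.3g}")
```

Both curves have the same mean and variance, yet under the Gaussian the expected number of 12-footers is effectively zero, while the fat-tailed curve predicts a sizable crowd of them. Assuming a Gaussian-shaped autocorrelation when the real one has fatter tails is roughly analogous: it underestimates how far correlation in the fMRI signal reaches, so large clusters arising from noise turn up more often than the theory predicts.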

If that wasn’t bad enough, the following is even worse (in my opinion). 40% of 241 recent fMRI studies didn’t report using well-known methods to correct for multiple testing. They may have done so, but every biomedical paper and drug study says so explicitly. Not only that, but drug studies are required to explicitly state the hypothesis (or hypotheses) they are testing.
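Why correction matters comes down to one line of arithmetic. If each of m independent tests of a true null hypothesis has a false-positive chance α, the chance of at least one false positive is 1 - (1 - α)^m. The Bonferroni correction, one of the best-known fixes, simply tests each voxel at α/m instead. This Python sketch assumes independent tests, which fMRI voxels are not (neighboring voxels are correlated), and the voxel counts are hypothetical, so it only illustrates the scale of the problem.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n_tests independent
    tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 10, 100, 1_000, 100_000):  # hypothetical numbers of voxels tested
        raw = familywise_error_rate(alpha, m)
        corrected = familywise_error_rate(bonferroni_threshold(alpha, m), m)
        print(f"{m:>7} tests: uncorrected {raw:.3f}, Bonferroni {corrected:.3f}")
```

Uncorrected, 100 tests at p < 0.05 already give better than a 99% chance of at least one "significant" result from pure noise. With Bonferroni, the family-wise rate stays at or just under 5% no matter how many tests are run.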

This is probably why, in the early days, fMRI researchers always confirmed their original hypothesis. They could search the massive fMRI data sets for statistical rarity, and since the data sets were so large, find it and publish it post hoc propter hoc. Possibly they did so out of ignorance, but even so, this is inexcusable.
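The fishing expedition described above is easy to reproduce. In this minimal Python sketch every "voxel" is pure noise: its z statistic is drawn from a standard normal distribution, so by construction there is nothing to find. The voxel count (50,000) and the 0.05 threshold are hypothetical, and real voxels are spatially correlated rather than independent.

```python
import random
from statistics import NormalDist

N_VOXELS = 50_000   # hypothetical number of voxels tested
ALPHA = 0.05        # conventional significance threshold

STANDARD_NORMAL = NormalDist()


def null_z_scores(n, seed=1):
    """z statistics for n voxels where, by construction, nothing is going on."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def two_sided_p(z):
    """Two-sided p-value for a z statistic (lower tail used for numerical accuracy)."""
    return 2 * STANDARD_NORMAL.cdf(-abs(z))


def fishing_expedition(n=N_VOXELS, alpha=ALPHA, seed=1):
    """Count 'significant' noise voxels with and without Bonferroni correction,
    and return the smallest p-value found (the one a post hoc search would report)."""
    ps = [two_sided_p(z) for z in null_z_scores(n, seed)]
    uncorrected_hits = sum(p < alpha for p in ps)
    bonferroni_hits = sum(p < alpha / n for p in ps)
    return uncorrected_hits, bonferroni_hits, min(ps)


if __name__ == "__main__":
    unc, bon, best_p = fishing_expedition()
    print(f"uncorrected hits: {unc}, Bonferroni hits: {bon}, best p: {best_p:.2e}")
```

Roughly 5% of the noise voxels (about 2,500) pass an uncorrected p < 0.05, and the single best one has a tiny p-value that looks impressive in isolation. After a Bonferroni correction, typically none survive. Stating the hypothesis in advance, as drug studies must, is what keeps you from picking the winner after the fact.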