Tag Archives: Mismatch negativity

How Infants learn language – V

Infants don’t learn language like neural nets do. Unlike nets, no feedback is involved, which amazingly, makes learning faster.

As is typical of research in psychology, the hard part is thinking of something clever to do, rather than actually carrying it out.

[ Proc. Natl. Acad. Sci. vol. 117 pp. 26548 – 26549 ’20 ] is a short interview with psychologist Richard N. Aslin. Here’s a link — hopefully not behind a paywall — https://www.pnas.org/content/pnas/117/43/26548.full.pdf.

He was interested in how babies pull out words from a stream of speech.

He took a commonsense argument and ran with it.

“The learning that I studied as an undergrad was reinforcement learning—that is, you’re getting a reward for responding to certain kinds of input—but it seemed that that kind of learning, in language acquisition, didn’t make any sense. The mother is not saying, “listen to this word…no, that’s the wrong word, listen to this word,” and giving them feedback. It’s all done just by being exposed to the language without any obvious reward”

So they performed an experiment whose results surprised them. They made a ‘language’ of speech sounds which weren’t words and presented them 4 per second for a few minutes, to 8 month old infants. There was an underlying statistical structure, as certain sounds were more likely to follow another one, others were less likely. That’s it. No training. No feedback. No nothin’, just a sequence of sounds. Then they presented sequences (from the same library of sounds) which the baby hadn’t heard before and the baby recognized them as different. The interview didn’t say how they knew the baby was recognizing them, but my guess is that they used the mismatch negativity brain potential which automatically arises to novel stimuli.

Had you ever heard of this? I hadn’t but the references to the author’s papers go back to 1996 ! Time for someone to replicate this work.

So our brains have an innate ability to measure statistical probability of distinct events occurring. Even better we react to the unexpected event. This may be the ‘language facility’ Chomsky was talking about half a century ago. Perhaps this innate ability is the origin of music, the most abstract of the arts.

How infants learn language is likely inherently fascinating to many, not just neurologists.

Here are links to some other posts on the subject you might be interested in.

https://luysii.wordpress.com/2013/06/03/how-infants-learn-language-iv/

https://luysii.wordpress.com/2011/10/10/how-infants-learn-language-iii/

https://luysii.wordpress.com/2010/10/03/how-infants-learn-language-ii/

https://luysii.wordpress.com/2010/09/30/how-infants-learn-language/

Why we imperfectly understand randomness the way we do.

The cognoscenti think the average individual is pretty dumb when it comes to probability and randomness. Not so, says a fascinating recent paper [ Proc. Natl. Acad. Sci. vol. 112 pp. 3788 – 3792 ’15 ] http://www.pnas.org/content/112/12/3788.abstract. The average joe (this may mean you) when asked to draw a random series of fifty or so heads and tails never puts in enough runs of heads or runs of tails. This leads to the gambler’s fallacy, that if an honest coin gives a run of say 5 heads, the next result is more likely to be tails.

There is a surprising amount of structure lurking within purely random sequences such as the toss of a fair coin where the probability of heads is exactly 50%. Even with a series with 50% heads, the waiting time for two heads (HH) or two tails (TT) to appear is significantly longer than for an alternation (HT or TH). On average 6 tosses will be required for HH or TT to appear while only an average of 4 are needed for HT or TH.

This is why Joe SixPack never puts in enough runs of Hs or Ts.

Why should the wait be longer for HH or TT even when 50% of the time you get a H or T. The mean time for HH and TT is the same as for HT and TH. The variance is different because the occurrences of HH and TT are bunched in time, while the HT and TH are spread evenly.

It gets worse for longer repetitions — they can build on each other. HHH contains two instances of HH, while alterations do not. Repetitions bunch together as noted earlier. We are very good at perceiving waiting times, and this is probably why we think repetitions are less likely and soon to break up.

The paper goes a lot farther constructing a neural model, based on the way our brains integrate information over time when processing sequences of events. It takes into consideration our perceptions of mean time AND waiting times. We average the two. This produces the best fitting bias gain parameter for an existing Bayesian model of randomness.

See, you’re not as dumb as they thought you were.

Another reason for our behavior comes from neuropsychology and physiological psychology. We have ways to watch the electrical activity of your brain and find out when you perceive something as different. It’s called mismatch negativity (see http://en.wikipedia.org/wiki/Mismatch_negativity for more detail). It a brain potential (called P300) peaking .1 -.25 seconds after a deviant tone or syllable.

Play 5 middle c’s in a row followed by a d than c’s again. The potential doesn’t occur after any of the c’s just after the d. This has been applied to the study of infant perception long before they can speak.

It has shown us that asian and western newborn infants both hear ‘r’ and ‘l’ quite well (showing mismatch negativity to a sudden ‘r’ or ‘l’ in a sequence of other sounds). If the asian infant never hears people speaking words with r and l in them for 6 months, it loses mismatch negativity to them (and clinical perception of them). So our brains are literally ‘tuned’ to understand the language we hear.

So we are more likely to notice the T after a run of H’s, or an H after a run of T’s. We are also likely to notice just how long it has been since it last occurred.

This is part of a more general phenomenon — the ability of our brains to pick up and focus on changes in stimuli. Exactly the same phenomenon explains why we see edges of objects so well — at least here we have a solid physiologic explanation — surround inhibition (for details see — http://en.wikipedia.org/wiki/Lateral_inhibition). It happens in the complicated circuitry of the retina, before the brain is involved.

Philosophers should note that this destroys the concept of the pure (e.g. uninterpreted) sensory percept — information is being processed within our eyes before it ever gets to the brain.

Update 31 Mar — I wrote the following to the lead author

” Dr. Sun:

Fascinating paper. I greatly enjoyed it.

You might be interested in a post from my blog (particularly the last few paragraphs). I didn’t read your paper carefully enough to see if you mention mismatch negativity, P300 and surround inhibition. if not, you should find this quite interesting.

Luysii

And received the following back in an hour or two

“Hi, Luysii- Thanks for your interest in our paper. I read your post, and find it very interesting, and your interpretation of our findings is very accurate. I completely agree with you making connections to the phenomenon of change detection and surround inhibition. We did not spell it out in the paper, but in the supplementary material, you may find some relevant references. For example, the inhibitory competition between HH and HT detectors is a key factor for the unsupervised pattern association we found in the neural model.

Yanlong”

Nice ! ! !