Don’t Cheat Your Way to High-Level Listening Comprehension

(Note: This is an edited version of a post that I originally put here and here. Both of those threads include useful discussions and important clarifications.)

A YouTuber named MattVSJapan once mentioned that reading a lot can be a highly effective method for improving one’s listening comprehension, but that it’s ultimately a bad way of doing so, at least if your goal is to end up with a native-like accent.

His argument is as follows:

  • If you read a lot before you listen enough, you necessarily make up what the language sounds like in your head as you subvocalize during reading. Instead of hearing in your head what you’ve heard people say, you hear something that’s to some extent an arbitrary product of your own imagination. Natives start reading only once they’ve learned to listen and speak well; for many foreigners, on the other hand, reading ability develops far past listening ability, and thus when they subvocalize they create in their head a voice that’s not exactly a realistic model of how natives speak.
  • When you read a lot, you burn into your mind a lot of sentence patterns. And then when you listen, your brain can make up for the gaps in your raw listening ability by systematically forming educated guesses about what words are being spoken in what order, based on what would make sense in the given situation.
  • Adding these two points together, we can derive this inference: Once you’ve spent a long enough time reading a ton but not listening very much, you’ve not only created in your head a partially imagined version of the sound system of the language, but you’ve also removed your brain’s natural incentive to move toward a realistic model of the sound system. Since your brain has a strong predictive base in the language, your brain is able to fill in the gaps created by its deficiency in raw listening ability, and thus no longer has the incentive to increase that raw listening ability. Once you reach this point, no matter how much you go back and listen, your brain will default to taking real input, using its powerful predictive powers to systematically fill in the gaps that your ear misses, and then converting it into the partially imagined sound structure you came up with while subvocalizing.
  • And finally, we must take into account that learning to hear the nuance in a language’s sound system is a prerequisite for learning to produce that nuance in one’s own speech. When speaking, our ability to hear properly is a major part of the trial-and-error system we use to improve our accent over time. If we’re unable to tell the difference between native-like pronunciation of a certain sound and our foreign-like pronunciation, then we have little hope of moving toward native-like pronunciation.

    Thus, with our brain systematically converting real input into our partially imagined language, our accent stagnates. We end up with a foreign accent that gets very hard to fix, since we’re ultimately hearing not precisely what’s said but instead something that’s to some extent based on our imagination.

To be clear, filling the gaps in what your ears directly pick up by auto-completing the parts you didn’t hear, based on the context, the meaning you think the speaker is trying to convey, and the common patterns of words, is actually how listening comprehension is supposed to work in your native language, and thus how it should work in any foreign language that you learn to a high level. This notwithstanding, I still see it as a mistake to gain this pattern-completing ability by reading a lot (like most foreigners), rather than via listening (like all natives). While the endpoint of language acquisition should land you in a place where your listening comprehension is excellent in large part because you’ve developed a powerful predictive base within the language, natives necessarily take one path (reaching fluency by listening, only afterwards learning to read), and foreigners tend to take a different path (spending a huge amount of time reading before reaching fluency). The former works properly, while the latter systematically damages one’s ability to form a proper native-like accent. There’s a major difference between a native hearing a sentence just fine even though one of the words was mumbled by the speaker (a natural situation), and a foreigner understanding a sentence well enough for listening comprehension without having heard some of the words with enough nuance to reproduce them in a way that sounds native or close to native (an unnatural situation caused by problematic methods).

I believe that this line of inquiry may reveal to us the foundation of why foreign speakers generally end up with foreign accents, while native speakers seamlessly go from butchering the language as a small child to producing all the sound patterns properly as an adult. My basic hypothesis is this: Your ears must be far more finely tuned to initially acquire the language through listening (rather than through reading) than to participate in conversations including rapid speech about complex topics, and thus your speech will sound much more natural if you learn first and foremost through listening and then participate in such conversations, than if you acquire a large chunk of the language through reading and then participate. Natives acquire certain aspects of the sound structure not because they need to know them once they’re an adult, but rather because they needed to know them in the past in order to make it through the early stages of language acquisition as a child. Or, to be more precise, in many cases they would need such skills as an adult, since other natives would likely criticize them if their speech didn’t sound natural, but instead of waking up one day and realizing that they should probably watch Dogen’s course on pitch accent, they would have already acquired the system naturally. There’s a difference between the pressures that lead to acquisition during the early stages, and the point of the acquisition itself once you’ve gotten to a high level.

When you listen to a language for the first time, you can’t even tell where one word ends and the next begins, but if you memorize thousands of words in writing then you essentially ‘cheat’. You learn how to take the originally unbroken streams of sound that make up spoken sentences, and break them into a sequence of discrete words, without having to figure it out yourself through listening. To restate the general hypothesis from the previous paragraph in terms of this specific example: it seems likely that there’s a lot of nuance that one’s ear must be tuned to in order to initially take spoken sentences and chunk them into individual words, which would no longer be necessary once one knows enough words and sentence patterns to know what words are being said in what order based less on raw listening prowess and more on non-listening-based knowledge of vocabulary and sentence patterns. Natives would necessarily acquire this nuance since they learn through listening to their parents and other people around them, rather than through memorizing lists of words and reading widely before speaking properly, and this nuanced phonetic awareness then subconsciously informs their pronunciation, pitch accent, and so forth. On the other hand, the average foreign learner doesn’t have to go through this initial process of using only raw listening ability to identify the constituent parts of sentences (since vocabulary lists compiled by others perform that function for them), and as a result they’re likely to end up with a less nuanced ear for the sounds. Our mouth naturally follows our ears, and thus if our ears cheat then our pronunciation doesn’t develop properly.

To make this analysis more concrete, consider a technique that I’ve been using for quite a while: While most learners memorize long lists of words, learn a model for the grammar of the language, and do a lot of reading before they’re able to speak very well, I’ve done most of my acquisition through just listening. When I want to learn new words, I listen to audio (e.g., a YouTube video where a Japanese person is talking about something), and then I try to understand; if a word comes up that I don’t know, I look it up with an app on my phone and then later add it to my SRS. It might seem roundabout to look for words in the wild, especially since I might listen to a video and not make out any new words. Surely I could just grab a frequency list and memorize a bunch of words, which I could then solidify through listening and speaking! But there’s good reason for what I do. The simple fact is that with this method, I’m unable to learn any new words that I can’t hear well enough in normal speech to then look up. This means that it’s unlikely that I’ll learn any word that my ears aren’t pretty well tuned to, since words like that would be hard to hear and then look up. The overall effect of this method is that I end up with a native-like hurdle for learning about the existence of a word and associating it with a meaning: Like a small child growing up immersed in the language, I can learn words only if I can hear them properly. Although my pronunciation, pitch accent, and so on in Japanese certainly have plenty of issues in need of smoothing out, without explicitly working on my accent this method has smoothly and naturally resulted in speech that’s much more natural than that of the average foreigner.

I originally started thinking about this when I asked myself the question: Why do native speakers acquire pitch accent without trying, whereas for foreigners it often goes over their head unless they look into it specifically? Certainly you don’t need a solid ear for pitch accent to understand native speech, so why do natives learn it so seamlessly? While this is an experimental thought, I wondered whether what’s going on is that early in the acquisition process for natives (and those employing a method that’s supposed to approximate native-like acquisition), attention paid to pitch accent helps the learner with tasks like taking the long sequence of phonemes making up a given sentence and chunking them into individual words. Most foreigners, therefore, wouldn’t be subconsciously incentivized to acquire a proper ear for pitch accent, simply because they ‘cheat’ in this regard (by having other people who have already learned the language compile lists of words for them that they can then drill). That is, natives acquire pitch accent by necessity, while foreigners circumvent this and thus end up with good listening comprehension despite not having developed an ear for pitch accent.

In light of this theory, I would recommend acquiring the fundamentals of Japanese through a method called repetitive listening. This is where you search for spoken content that’s interesting to you, and then listen to it over and over until you master it. For example, let’s say you find a YouTube video that’s 5-20 minutes long where a Japanese person is telling a story. You listen to it once and confirm that you’re interested in understanding it. You could then put it on a portable mp3 player, and listen to it a few times every day while taking a walk. Whenever a word comes up whose meaning you don’t know but whose sound structure you can make out well enough to look up, you can look it up with an app like Imiwa. Since Imiwa conveniently saves your search history, every few days you can take your search history and add it to an SRS like Anki. If you use this technique on a regular basis for a long time, it will lead to you acquiring thousands of new words not only in context but also in a way where you weren’t able to ‘cheat’; you acquired them only because your ears were ready for them, and no sooner. This will ensure that you’ll develop listening comprehension in a way that’s natural, rather than using a volatile shortcut that harms your speech. And then once you get to a reasonably high level with listening comprehension, you can safely and effectively move on to incorporating a lot of reading.
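
If you’d rather not retype your lookups into Anki by hand, the last step can be scripted. What follows is only a rough sketch in Python, not a description of how Imiwa exports anything: it assumes you’ve already gotten your lookups into a list of (word, meaning) pairs by whatever means, and the function name history_to_anki_tsv is made up for illustration. It relies only on the fact that Anki can import plain text files with tab-separated fields.

```python
import csv
import io


def history_to_anki_tsv(lookups):
    """Turn (word, meaning) pairs into tab-separated text, one note per line.

    Hypothetical helper: duplicate words are dropped (the first lookup wins),
    so a word you searched several times still produces only one card.
    """
    seen = set()
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for word, meaning in lookups:
        word = word.strip()
        if not word or word in seen:
            continue  # skip blank entries and repeated lookups
        seen.add(word)
        writer.writerow([word, meaning.strip()])
    return buffer.getvalue()


# Made-up sample standing in for a few days of search history.
sample = [("猫", "cat"), ("散歩", "a walk, a stroll"), ("猫", "cat")]
print(history_to_anki_tsv(sample), end="")
```

Saving that output to a .txt file gives you something you can bring in through Anki’s text import, with the word as the first field and the meaning as the second. Dropping repeats is a deliberate choice here: a word you had to look up three times is still one card, and the SRS handles the repetition from there.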