Author Archives: Joseph Pater

More on vowel category learning

I came across two more papers that are well worth a look if you are interested in modeling vowel category learning. This paper by Schwartz and colleagues proposes a model of speech perception with what looks like an interesting connection between production and perception. Relevant to our concerns, it has a brief discussion of feature economy in vowels, with an early Ohala reference, and also some fascinating data on speaker-to-speaker variation in vowel height boundaries (which is correlated with production differences!). This paper by Sonderegger and Yu proposes a Bayesian analysis of compensation for coarticulation, and at the end discusses some possible extensions to the modeling of change. It builds on the work by Feldman and colleagues that I mentioned in class.

Feature economy in vowels?

Seth Cable and Brian Dillon had some interesting comments and questions about feature economy in vowels. I’ve pasted in the discussion below, and just want to add some references here. In a paper in a recent volume, Mackie and Mielke find that feature economy holds of vowel systems that emerge from simulations, even when explicit features aren’t used (!). That paper cites de Boer (2000) on modeling the emergence of vowel inventories, and other work in this vein is cited in the Boersma and Hamann paper we are reading this week (see also B&H for plenty of relevant discussion on dispersion). Brian brings up mixture-of-Gaussians models of vowel category learning (see this week’s Kirby paper for further references). I suspect that the intersection between that work and iterated learning that he suggests below hasn’t really been explored yet. Here’s Brian and colleagues’ Inuktitut paper, and here is a paper that talks about iterated language learning in a Bayesian framework. Also, here is a paper that derives a non-linguistic simplicity bias from a maximally uncommitted prior, and here is a paper that talks about iterated learning using a stipulated simplicity prior (the paper that some of us read with Michael Lavine of Statistics last year).

*****

From Seth: It occurred to me randomly today that vowel systems aren’t generally model pictures of feature economy. Rather, folks tend to think that there’s strong pressure to keep vowels maximally distinct, leading ideally to a three-vowel system of [high front unrounded], [mid low unrounded], and [high back rounded].

In fact, my hunch is that it’s comparatively rare for vowel systems to be perfectly economical wrt features, so that the language exhibits every possible combination of [+/- front], [+/- high], [+/- round].
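
One way to make "perfectly economical" concrete is to count how many of the possible feature combinations an inventory actually uses. The sketch below is only an illustration under assumptions: the function name `economy`, the binary coding of the three features, and the feature values assigned to the example vowels are hypothetical choices, not a measure taken from any of the papers mentioned here.

```python
from itertools import product

# The three binary features named in the discussion.
FEATURES = ("front", "high", "round")


def economy(inventory):
    """Fraction of all possible feature bundles that the inventory uses.

    1.0 means every combination of the features is attested
    (perfectly economical in the sense discussed above).
    """
    possible = set(product((True, False), repeat=len(FEATURES)))
    used = {tuple(vowel[f] for f in FEATURES) for vowel in inventory.values()}
    return len(used) / len(possible)


# Hypothetical coding of a maximally dispersed three-vowel system.
three_vowel = {
    "i": {"front": True, "high": True, "round": False},
    "a": {"front": False, "high": False, "round": False},
    "u": {"front": False, "high": True, "round": True},
}

# A hypothetical system that uses every combination of the three features.
full_system = {
    str(bundle): dict(zip(FEATURES, bundle))
    for bundle in product((True, False), repeat=len(FEATURES))
}

print(economy(three_vowel))  # 3 of 8 bundles used: 0.375
print(economy(full_system))  # 8 of 8 bundles used: 1.0
```

On this coding, the dispersed three-vowel system uses only three of the eight available bundles, which is the tension between dispersion and economy that the question is about.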

If that’s right, what’s going on here? Is there some kind of countervailing pressure to keep vowels distinct? How can we model the interaction between these two pressures?

****

From Brian: 

If that’s right, what’s going on here? Is there some kind of countervailing pressure to keep vowels distinct?

I think that this is a huge part of it. Vowels that are close together are difficult to discriminate, and it is difficult to acquire phonetic boundaries for them; there are a bunch of models that formalize this intuition (Dispersion Theory à la Flemming, Wedel’s agent-based modeling based on communicative efficiency). That’s part of the reason we chose Inuktitut as our test case for modeling the acquisition of vowel categories… with a nice 3/5 vowel system, it’s not too hard to get Mixture Model-type techniques to correctly categorize the vowel space. The difficulty in acquiring vowels in this kind of unsupervised way grows a *lot* when you crowd the space even just a little bit more.
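
The intuition that crowding hurts discriminability can be illustrated with a small calculation. This is a sketch under simplifying assumptions that are not from the discussion itself: two categories on a single acoustic dimension, each Gaussian with the same standard deviation, and equally frequent. In that case the best possible classifier puts the boundary at the midpoint, and its error rate is Φ(−d / 2σ), where d is the distance between the category means. The Hz values below are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_error(mean_a, mean_b, sd):
    """Lowest achievable misclassification rate for two equally frequent
    Gaussian categories that share the standard deviation sd."""
    distance = abs(mean_a - mean_b)
    return normal_cdf(-distance / (2.0 * sd))


# Hypothetical category means on an F1-like axis (Hz), shared sd of 50 Hz.
for separation in (300, 150, 75):
    error = bayes_error(500, 500 + separation, sd=50)
    print(f"separation {separation} Hz: best-case error {error:.4f}")
```

Even an ideal listener's error rate climbs steeply as the means approach each other, and an unsupervised learner that has to discover the categories in the first place can only do worse than this bound.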

How can we model the interaction between these two pressures?

That’s a good question; I don’t know of any work that tries to explore both effects at once. You’re right to point out that very simple economy metrics in these two domains kind of point in different directions, though I think they may actually interact in important ways throughout the learning process rather than being opposing forces on the phonological inventories of a given language. I had a hunch once that the drive towards feature symmetry in a phonological system actually helped alleviate some of the discriminability issues that drive dispersion-type effects, but I never got around to actually figuring out how to put together the model. So, Turkish front vowels are acoustically a mess, in the sense that front rounded /ø/ and /e/ are almost right on top of each other… and /ø/ is about a tenth as frequent as /e/. This made it a nightmare of a learning problem from a machine learning point of view, and a vowel that should’ve been eliminated from the Turkish system on purely dispersion-type grounds. But if the Turkish learner somehow thought “gee, it’d be great if there was a FRONT rounded mid vowel to round out my feature system”, the idea went, then the learner might have an easier time homing in on that mysterious vowel by actively ‘searching’ for a vowel in the space of /ø/… but of course all of that is theorizing about what might happen in a language that actually does have a perfectly symmetric feature system in its vowels.
But I guess if that were actually what were happening in the acquisition process, then I would predict a tendency towards more feature-symmetric vowel systems… which might be comparatively rare as you say. It’d be really interesting to model how strong each of these pressures must be in relation to the other to drive the kinds of patterns we see cross-linguistically.
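
To see why a rare category sitting almost on top of a frequent one is such a hard learning problem, here is a minimal Bayes-rule sketch. Everything numeric in it is an assumption made for illustration: the one-dimensional standardized acoustic axis, the shared standard deviation, and the half-sd separation are hypothetical, and only the roughly 1:10 frequency ratio comes from the description of Turkish /ø/ and /e/ above.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_rare(x, mean_rare, mean_common, sd, prior_rare):
    """Posterior probability that token x belongs to the rare category,
    given two Gaussian categories with a shared sd (Bayes' rule)."""
    rare = prior_rare * gaussian_pdf(x, mean_rare, sd)
    common = (1.0 - prior_rare) * gaussian_pdf(x, mean_common, sd)
    return rare / (rare + common)


# "A tenth as frequent" is a 1:10 ratio, i.e. a prior of 1/11.
PRIOR_RARE = 1.0 / 11.0

# Hypothetical standardized axis: the means are half an sd apart.
at_rare_center = posterior_rare(0.5, mean_rare=0.5, mean_common=0.0,
                                sd=1.0, prior_rare=PRIOR_RARE)
print(f"P(rare | token at the rare category's own center) = {at_rare_center:.3f}")
```

Under these assumptions, even a token produced exactly at the rare category's center is more likely to belong to the frequent category, so a learner guided by the acoustics alone has little evidence for a second category. A prior expectation that the category exists, of the kind the feature-symmetry idea supplies, is one thing that could tip the balance.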

******

From Seth: One quick thought about using dispersion theory, though, is that it seems like there might be a challenge encoding the inputs, just because dispersion theory constraints evaluate entire inventories, whereas feature economy was emerging as a result of learning individual segments…

******

From Brian:

w.r.t. Seth’s comment: I’m not sure I see the challenge encoding the inputs, but I suspect that’s because I’m assuming that dispersion-theoretic pressures aren’t necessarily overtly expressed in any grammatical calculation (hence there is no explicit ‘dispersion-enforcing’ constraint to speak of).
In that case my reference to Flemming is misplaced, I suppose! This is a mostly unexamined intuitive hunch, though, I’ll admit. It just seems that the observations that dispersion theory attempts to account for could find a number of explanations of the sort that we’ve been discussing in class: i.e. being the result of communicative efficiency and learnability, without (necessarily) being explicitly represented in the grammar. Here’s a simulation someone could run to test this idea: do a Kirby-style iterated generational learning simulation of the acquisition of vowel categories using EM-fit mixture models. I’d wager that the vowel centers would drift apart over time in a way that mimics dispersion theory… Now that I say that, I think there’s some stuff out there on this that I’m unaware of. I want to say one of Pierrehumbert’s students did something like this, but my memory fails me right now.
For what it’s worth, I have code that people could use if they want to learn how these models are fit, and there are plenty of off-the-shelf tools in R that people could use if they wanted to do some modeling without going (fully) down the mixture model rabbit hole.
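
For anyone who wants a starting point, the simulation proposed above could be sketched roughly as follows. This is a toy under stated assumptions, not Brian's code: it uses a single acoustic dimension, a fixed number of categories, and standard-library Python only; the function names (`fit_em`, `sample`, `iterate`), the quantile initialization, and the starting values in Hz are all hypothetical. Whether the centers actually drift apart is exactly what a run like this would have to show, and the sketch makes no claim either way.

```python
import math
import random


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_em(data, k, n_iter=50, min_sd=1e-3):
    """Fit a 1-D mixture of k Gaussians by EM; return (weights, means, sds)."""
    xs = sorted(data)
    n = len(xs)
    # Initialise the means at evenly spaced quantiles of the learner's data.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)
    sds = [max(grand_sd, min_sd)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each category for each token.
        resp = []
        for x in xs:
            likes = [w * gaussian_pdf(x, m, s)
                     for w, m, s in zip(weights, means, sds)]
            total = sum(likes) or 1e-300  # guard against underflow to zero
            resp.append([like / total for like in likes])
        # M-step: re-estimate each category from its weighted tokens.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an (almost) empty category where it is
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
    return weights, means, sds


def sample(weights, means, sds, n_tokens, rng):
    """Produce tokens: pick a category by weight, then add Gaussian noise."""
    cats = rng.choices(range(len(weights)), weights=weights, k=n_tokens)
    return [rng.gauss(means[j], sds[j]) for j in cats]


def iterate(weights, means, sds, generations=10, n_tokens=300, seed=0):
    """Chain of learners, each fit only to the previous generation's tokens.
    Returns the sorted category means after each generation."""
    rng = random.Random(seed)
    history = [sorted(means)]
    for _ in range(generations):
        data = sample(weights, means, sds, n_tokens, rng)
        weights, means, sds = fit_em(data, k=len(means))
        history.append(sorted(means))
    return history


# Hypothetical starting point: three categories on an F1-like axis (Hz).
history = iterate([1 / 3] * 3, [300.0, 500.0, 700.0], [60.0] * 3)
for generation, centers in enumerate(history):
    print(generation, [round(c) for c in centers])
```

Each learner sees only unlabeled tokens, so the category labels play no role in transmission. Obvious extensions would be a second formant dimension, letting the learner choose the number of categories, and adding a production or perception bias, which is where any dispersion-like pressure would have to come from.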
****
From Seth: Another couple thoughts:
– Ideally, we’d want to see that, though dispersion forces (whatever those turn out to be) favor the restriction of [round] to back vowels/consonants, within the back segments, the forces of feature economy favor the appearance of [round] on *all* back vowels/consonants.
– To my limited knowledge, the only vocalic feature that one really finds non-economical use of is [round]. Just about any other feature contrast I can think of is usually balanced across the segments (e.g., ATR, breathiness, creakiness, length). If this is right, it would be really neat to see this emerge as well…