I’d like to raise as a discussion topic the question of what data we are trying to explain in generative phonology. In my view, the lack of clarity on this point is a bigger foundational problem for our field than the lack of clarity about the goals we are pursuing, which was one of the discussion points I raised in my mfm fringe workshop presentation. It is a foundational problem not only for what I called Classical Universal Phonology in that presentation, but for just about any approach to phonology one can imagine. I should be clear that I don’t think there needs to be a uniform set of data or goals. Rather, I think we would make quicker progress towards our broader shared goals of understanding the formal structure of phonologies, and of explaining learning and typology, if we made our commitments in these respects more explicit in our work.
To get the discussion going, let me repeat the worry I expressed in the mfm fringe discussion, and mention some other data-related points that came up. When Marc van Oostendorp pressed me on my assertion that data issues were foundational issues, I brought up the lack of a definition of productivity as an example. It is unfortunately all too common that when an analysis or theory fails to capture some data pattern, the claim is made that the pattern is unproductive (e.g. that there are exceptions, that there are no alternations, or that the alternations are limited in some way), without the same scrutiny and criteria being applied to the data that the theory does capture. Probably even more common is that exceptions or variation are abstracted away from, again without any clear criteria for when that can be done. My own belief is that productivity is gradient (see Hayes’ textbook ch. 9), and that we need theories that capture that gradience. But whether we are working with theories that are categorical or gradient in this respect, we need to define productivity if we are going to use it as a criterion for what data we need to explain.
In his question period at the fringe, Michael Becker pressed his interlocutors to provide evidence that the generalizations they saw in existing alternations were in fact encoded as generalizations in speakers’ minds. Becker’s approach, like that of a lot of other current work, is to test productivity experimentally. I’m on board with that program, but I’m also on board with good old analysis of corpus data (where ‘corpus’ includes the data from grammars and dictionaries that phonologists typically study), and I’m starting to get worried about what to do when the two sets of data point in very different directions. For example, the ‘stress penult if heavy’ part of the Latin stress rule is a nearly exception-free pattern in unsuffixed nouns in English. But as Claire Moore-Cantwell (p.c.) reports, it seems that it’s not particularly productive in nonce word productions and judgments. Claire has some good ideas about how to relate the corpus data to the judgments via learning, but it’s clear that the resulting grammar is going to look very different from those posited for English from Chomsky and Halle (1968) onwards.
Wendell Kimper mentioned in his talk the issue that the set of attested human languages appears to be a small sample from the space of possible human languages. There are various statistical measures and data controls that we can use to determine how robust our observed typological generalizations are. But Kimper also reports that vowel harmony, looked at that way, may provide relatively little information, since many of the patterns of each type of harmony come from the same language families. My gut feeling, like Wendell’s I think, is that in those circumstances we should still keep going with the usual practice of just making an attested/unattested cut, and hoping that we are modeling signal rather than noise. But it is a worry, and probably one of the reasons that it’s good that we’re not putting all of our eggs in the typology-modeling basket. A possible strategy is to focus on typological claims with a relatively large scope, for example, the size of stress windows, or the absence of sour grapes-style harmony (and the presence of spreading up to a blocker).
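To make the family-skew worry concrete, here is a minimal Python sketch of one very simple data control: counting how many distinct families attest a pattern alongside how many languages do. This is purely an illustration, not the method Kimper used. The function name `count_by_family`, the record format, and the toy records (placeholder language, family, and pattern labels) are all assumptions made up for the example, not real survey data.

```python
from collections import defaultdict


def count_by_family(records):
    """Map each pattern to (number of languages, number of families).

    `records` is an iterable of (language, family, pattern) tuples.
    This record format is a hypothetical one chosen for the sketch.
    """
    languages = defaultdict(set)
    families = defaultdict(set)
    for language, family, pattern in records:
        languages[pattern].add(language)
        families[pattern].add(family)
    return {p: (len(languages[p]), len(families[p])) for p in languages}


# Placeholder records only: these labels do not refer to real languages,
# families, or harmony patterns.
toy_records = [
    ("Lang1", "FamilyA", "pattern_x"),
    ("Lang2", "FamilyA", "pattern_x"),
    ("Lang3", "FamilyA", "pattern_x"),
    ("Lang4", "FamilyB", "pattern_x"),
    ("Lang5", "FamilyC", "pattern_y"),
    ("Lang6", "FamilyD", "pattern_y"),
]

counts = count_by_family(toy_records)
for pattern, (n_langs, n_fams) in sorted(counts.items()):
    print(pattern, "languages:", n_langs, "families:", n_fams)
```

In the toy records, pattern_x is attested in twice as many languages as pattern_y, but in the same number of families, which is the kind of situation in which raw language counts overstate how much independent evidence we have.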