Please post your comments on the class and/or readings for October 23rd here.
5 thoughts on “Statistical learning and word segmentation”
Lisa
In Aslin and Newport (2012), the authors mention perceptual salience as a factor that constrains statistical pattern learning, and they give examples of when this can occur. Obviously this makes sense when salience is highly predictive of the pattern to be learned, but what do the authors suspect is the contribution of salience alone? In other words, is it possible for salience to be ignored if an experiment is designed to make it utterly non-predictive? How do Aslin and Newport define perceptual salience?
I’ll admit that I am asking this question partially out of personal interest (I am working on a study where we are trying to disentangle the influence of cue “salience”, cue discriminability, and cue predictability on judgment).
Johnson and Tyler (2009) suggest that one of the ways infants could learn word segmentation is by attending to phrase-level prosodic boundaries. This statement reminded me of an article by R. Santos, “Bootstrapping in the acquisition of word stress in Brazilian Portuguese”, Journal of Portuguese Linguistics 2(1), 93–114. It suggests that children acquiring Brazilian Portuguese might start not from the bottom of the prosodic hierarchy (as suggested by Demuth, for example) but from the top. Santos’ article presents evidence that BP-acquiring children start with productions that correspond more to phonological phrases or parts of phonological phrases than to words, and with time they “zoom in” on the precise productions. I couldn’t access the original article, but I found a link to a later work which mentions these findings: http://www.let.leidenuniv.nl/pdf/lucl/lwpl/2.1/santos.pdf
As mentioned in class, one extreme argument posits that differences in transitional probabilities are all that is needed for word learning to occur. Isn’t this just a fancy way of saying that learning will be better for patterns or sequences that are repeated more often than for patterns or sequences that are repeated less often? When studies attempt to isolate effects of “transitional probabilities” by eliminating all other “cues” or differences from the stimuli, are they isolating our ability to learn patterns based on repetition of those patterns, or are they simply isolating our ability to learn repeating patterns when the signal-to-noise ratio is near floor? If the latter is true, the paradigm doesn’t seem very useful. The problem with stripping stimuli of all other cues is the assumption that learning based on transitional probabilities is some kind of reducible or independent process. It is more likely that the additional cues facilitate learning based on transitional probabilities by increasing the signal-to-noise ratio, allowing one to better detect the repeating pattern. A repeating signal that is easier to detect is likely easier to learn than a repeating signal that is harder to detect. An argument for cue-based learning is not necessarily independent of an argument for transitional-probability-based learning. After all, a repeating signal that is easy to detect (i.e., contains additional cues) would be better learned than an equally strong signal (i.e., one that contains the same cues) repeated fewer times. If there is such an interaction, if cues and transitional probabilities are not independent when it comes to learning, why continue using a paradigm that attempts to isolate independent effects?
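For readers who want the quantity in this comment pinned down, here is a minimal sketch of how transitional probabilities are usually estimated from a syllable stream, assuming the common definition TP(y | x) = count(x followed by y) / count(x). The three "words", their syllables, and the presentation order are made up for illustration and are not taken from any of the readings.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate P(y | x) for each adjacent pair (x, y) in the stream."""
    # How often each adjacent pair (x, y) occurs.
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # How often each syllable occurs in a position where something follows it.
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Hypothetical three-syllable "words" (illustrative only).
words = {
    "A": ["bi", "da", "ku"],
    "B": ["pa", "do", "ti"],
    "C": ["go", "la", "bu"],
}

# Hypothetical presentation order with no word repeated back to back.
order = ["A", "B", "C", "A", "C", "B"] * 2

# Concatenate the words into one continuous stream with no pauses.
stream = [syllable for label in order for syllable in words[label]]

tp = transitional_probabilities(stream)

# Within a word the next syllable is fully predictable;
# across a word boundary it is not.
print("bi -> da:", tp[("bi", "da")])
print("ku -> pa:", tp[("ku", "pa")])
```

In this toy stream the within-word pairs have higher transitional probabilities than the pairs that span a word boundary, and that contrast is the cue the segmentation studies manipulate. The sketch only defines the statistic; it says nothing about whether learners can use it independently of other cues.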
Ben, that’s an excellent point, and I think it comes up surprisingly often in a lot of different literatures. There’s an instinct to attempt to reduce complex interactions and find out what is “really” happening by pretending we can pull a Physics and posit a spherical cow rolling down a frictionless hill.
I think it makes more sense to attempt to model your hypothesis in a lot of cases. It’s going to be difficult to ever determine that you’ve finally eliminated all cues while also not affecting the signal-to-noise ratio. I think it’s a more achievable goal to describe a detailed hypothesis of the process at work and test it against existing data.
Dissociation paradigms seem to lead to 30-year-long unsatisfying arguments about what really happened.
It seems to me that one reason people try so hard to isolate transitional probabilities from other learning mechanisms is that this is almost an ideological debate. Generative linguists claim everything is abstract rules, while others claim that everything is statistical. Since it’s as much ideological as intellectual at this point, it makes sense that people are going to great lengths to attempt to separate the two — there’s less room for subtle distinctions when you have a lot of polarized points of view in the conversation.