Category Archives: Computational linguistics

Pater and O’Connor awarded NSF conference grant

Joe Pater (PI) and Brendan O’Connor (Co-PI, Computer Science) have been awarded an NSF Conference Grant entitled “Perceptrons and Syntactic Structures at 60: Computational Modeling of Language”. The meeting will bring together leading researchers in cognitive science and artificial intelligence who specialize in the integration of linguistic theory with statistical approaches, especially neural networks. It will be held in conjunction with the first meeting of the Society for Computation in Linguistics, at the LSA in Salt Lake City in January.

Society for Computation in Linguistics established

From Gaja Jarosz:

The Society for Computation in Linguistics (SCIL) is devoted to facilitating and promoting research on computational and mathematical approaches in Linguistics. SCIL aims to provide a central forum for exchange of ideas and dissemination of original research results on computational approaches in any area of linguistics. These areas include but are not limited to computational models of human linguistic behavior, computational implementations of linguistic theories, and research on the computational properties of human language. In addition to providing a forum for researchers already working in these areas, SCIL seeks to provide an accessible venue for linguists wishing to learn more about computational methods and their applications in linguistics. In addition, SCIL aims to facilitate productive exchange between linguists and computer scientists and engineers working on language technologies. To these ends, SCIL hosts regular meetings (the first of which will be co-located with LSA 2018 in Salt Lake City, Utah) that feature high-quality research presentations and peer-reviewed proceedings to be published with the Association for Computational Linguistics (ACL) Anthology.

Emily Morgan in Psycholinguistics Friday at 10 and CLC Monday at 11

Emily Morgan will be speaking to the Psycholinguistics workshop next Friday March 3rd at 10 am in ILC N400. The title and abstract are below. We’ll also be discussing a (very) related paper of hers and Roger Levy’s in preparation for that visit in a Computational Linguistics Community meeting Monday Feb. 27th at 11 am in ILC N451. The paper is available here:

http://idiom.ucsd.edu/~rlevy/papers/morgan-levy-2015-cogsci.pdf

Title: Generative and Item-Specific Knowledge in Language Processing

Abstract: The ability to generate novel utterances compositionally using generative knowledge is a hallmark property of human language. At the same time, languages contain non-compositional or idiosyncratic items, such as irregular verbs, idioms, etc. In this talk I ask how and why language achieves a balance between these two systems—generative and item-specific—from both the synchronic and diachronic perspectives.

Specifically, I focus on the case of binomial expressions of the form “X and Y”, whose word order preferences (e.g. bread and butter/#butter and bread) are potentially determined by both generative and item-specific knowledge. I show that ordering preferences for these expressions indeed arise in part from violable generative constraints on the phonological, semantic, and lexical properties of the constituent words, but that expressions also have their own idiosyncratic preferences. I argue that both the way these preferences manifest diachronically and the way they are processed synchronically are constrained by the fact that speakers have finite experience with any given expression: in other words, the ability to learn and transmit idiosyncratic preferences for an expression is constrained by how frequently it is used. The finiteness of the input leads to a rational solution in which processing of these expressions relies gradiently upon both generative and item-specific knowledge as a function of expression frequency, with lower frequency items primarily recruiting generative knowledge and higher frequency items relying more upon item-specific knowledge. This gradient processing in turn combines with the bottleneck effect of cultural transmission to perpetuate across generations a frequency-dependent balance of compositionality and idiosyncrasy in the language, in which higher frequency expressions are gradiently more idiosyncratic. I provide evidence for this gradient, frequency-dependent trade-off of generativity and item-specificity in both language processing and language structure using behavioral experiments, corpus data, and computational modeling.
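The frequency-dependent trade-off described in the abstract can be sketched schematically. The toy function below (not Morgan and Levy’s actual model; the weighting function and constant `k` are invented for illustration) blends a generative constraint score with an item-specific preference, with the weight on item-specific knowledge growing as the speaker’s experience with the expression grows:

```python
# Toy sketch of a frequency-dependent blend of generative and
# item-specific ordering preferences for binomials like "bread and butter".
# The weighting scheme and constant k are hypothetical illustrations.

def blended_preference(generative_score, item_score, freq, k=50.0):
    """Estimated preference for the order "X and Y".

    generative_score: preference from violable generative constraints (0-1)
    item_score: idiosyncratic preference for this particular expression (0-1)
    freq: how often the speaker has encountered the expression
    k: hypothetical constant governing how fast experience dominates
    """
    # Weight on item-specific knowledge grows with experience.
    w = freq / (freq + k)
    return w * item_score + (1 - w) * generative_score

# A rare binomial leans on generative constraints...
rare = blended_preference(generative_score=0.8, item_score=0.5, freq=2)
# ...while a frequent one leans on its own idiosyncratic preference.
frequent = blended_preference(generative_score=0.8, item_score=0.5, freq=5000)
```

On this sketch, a low-frequency expression’s predicted preference stays close to the generative score, while a high-frequency expression’s preference converges on its idiosyncratic value, mirroring the gradient trade-off the abstract describes.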

CS 585 Natural Language Processing at Data Science Tea Tues. Dec. 13, 2:45

——————————————————————————–
What: tea, posters and conversations about NLP
When: Tuesday, Dec 13, 2:45 – 3:45 pm
Where: Computer Science Building Rooms 150 & 151
Who: You!  Especially MS & PhD students and faculty interested in data science.
——————————————————————————–

Join us at Data Science Tea for a poster session featuring course projects from Professor Brendan O’Connor’s CS 585 Introduction to Natural Language Processing, including 60+ posters on topics like sentiment analysis, sarcasm detection, identifying portmanteaus, analyzing song lyrics, creating timelines from news, detecting bullying tweets, and more, using machine learning and computational linguistic methods.

Amanda Doucette takes a position with Originate

Amanda Doucette has just accepted a software engineering position with Originate in New York City, starting in July 2017 (recent PhD Presley Pizzo is also with this company). Amanda is completing majors in Computer Science and Linguistics, and is currently working on an Honors Thesis in computational phonology, applying Recurrent Neural Networks to phonological learning.
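For readers unfamiliar with the approach, applying a recurrent network to phonological learning typically means modeling segment sequences, assigning each next segment a probability given the preceding context. The sketch below is a schematic Elman-style forward pass only; the phoneme inventory, sizes, and random weights are placeholders, not the model from the thesis:

```python
# Schematic Elman-style RNN forward pass over a phoneme sequence.
# Inventory, dimensions, and (untrained) random weights are illustrative.
import numpy as np

rng = np.random.default_rng(0)
PHONEMES = ["p", "a", "t", "k"]          # hypothetical toy inventory
V, H = len(PHONEMES), 8                  # vocabulary and hidden sizes

W_xh = rng.normal(0, 0.1, (H, V))        # input-to-hidden weights
W_hh = rng.normal(0, 0.1, (H, H))        # hidden-to-hidden (recurrence)
W_hy = rng.normal(0, 0.1, (V, H))        # hidden-to-output weights

def step(h, phoneme):
    """One recurrent step: update hidden state, return next-segment probs."""
    x = np.zeros(V)
    x[PHONEMES.index(phoneme)] = 1.0     # one-hot encoding of the input
    h = np.tanh(W_xh @ x + W_hh @ h)     # new hidden state carries history
    logits = W_hy @ h
    probs = np.exp(logits - logits.max())
    return h, probs / probs.sum()        # softmax over possible next segments

h = np.zeros(H)
for seg in ["p", "a", "t"]:
    h, probs = step(h, seg)              # probs: distribution over next segment
```

Training such a network on a lexicon lets it implicitly learn phonotactic regularities from sequence statistics, which is one way neural models are brought to bear on phonological learning.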


LSA Special Session on Learning Lexical Specificity in Phonology

Claire Moore-Cantwell (UMass PhD 2016) and Stephanie Shih have organized a special session at the 2017 LSA, which features several UMass graduates as speakers and discussants. See below for the schedule and a summary by Claire and Stephanie.

When/where: Friday, January 6, 2:00pm to 5:00pm @ JW Grand Ballroom 7
Link to LSA website: http://www.linguisticsociety.org/session/symposium-learning-lexical-specificity-phonology

Schedule

Introduction by Joe Pater (2:00-2:10)

Part 1. Allomorphy & Alternations (2:10-3:35)

  • Michael Becker: Affix-specificity makes stress learnable
  • Brian W. Smith: Using phonotactics to learn affix-specific phonology
  • Discussion by Sharon Inkelas, Kie Zuraw

Part 2. Items & Classes (3:35-5:00)

  • Claire Moore-Cantwell: Concurrent learning of the lexicon and phonology
  • Stephanie S. Shih: Learning lexical classes for class-sensitive phonology
  • Discussion by Andries Coetzee, Jennifer Smith

Summary

The interaction of the phonological grammar with the lexicon is a necessary component in the phonological acquisition process and its end state, since the lexicon shapes and is shaped by phonology at potentially every stage of learning. The phonological grammar and lexicon share a complex relationship, as illustrated by the numerous phenomena in which phonological behavior exhibits lexical specificity: morphologically-conditioned phonology, lexical class-sensitive phonology, lexical exceptions to phonological patterns, and phonological variation in the lexicon. This relationship has heavily influenced the development of morphophonological theory. The current state of the field presents new challenges to understanding grammar and the lexicon. Access to natural language quantitative data now allows us to observe not only the empirical extent of lexical specificity across a phonological system but also the push-pull between massive variation and systematicity that exists in natural languages. Newly available empirical tools such as corpus methods, machine learning, and experimental techniques have accelerated investigations of learning and acquisition, as have developments in understanding psycholinguistic influences on phonology. This symposium brings together work that leverages these modern empirical developments and situates this new work within the broader landscape of phonological theory.

The symposium will address the following issues of learning lexical specificity in the grammar: When and how does a learner learn lexical specificity? How does the learner manage lexical specificity and natural language variation? How does lexical sensitivity differ or remain the same for learning alternations and allomorphy versus static lexical phonotactics? What are the relevant lexical items and categories for phonology? How specific does lexical specificity have to be? What is the optimal balance in grammatical design between representational efficiency and predictive accuracy and robustness? How is the trade-off between complexity and adequacy managed in grammar and learning of lexically-sensitive phonological patterns? How do the developing grammar and lexicon interact in learning? How do features of the lexicon such as lexical frequency influence the grammar?

Thanks to Brian Smith for having put this information together in a post!

Jarosz at UCLA

Gaja Jarosz gave an invited colloquium presentation on “Sonority Sequencing in Polish: Interaction of Prior Bias and Experience” at UCLA Friday Nov. 18th. The abstract is below.

Abstract. Recent work on phonological learning has questioned the traditional view that innate principles guide and constrain language development in children and explain universal properties crosslinguistically. In this talk I focus on a particular universal, the Sonority Sequencing Principle (SSP), which governs preferences among sequences of consonants syllable-initially. Experimental evidence indicates that English, Mandarin, and Korean speakers exhibit sensitivity to the SSP even for consonant sequences that never occur syllable-initially in those languages (such as [nb] vs. [bn] in English). There is disagreement regarding the implications of this finding. Berent et al. (2007) argue that these results can only be explained with reference to an innate principle; however, Daland et al. (2011) show that computational models capable of inferring statistical generalizations over sound classes can detect evidence for these preferences based on related patterns in the language input (and therefore no reference to innate principles is required). Building on these studies, I argue that English is the wrong test case: it does not differentiate predictions of these two hypotheses. I examine learning of syllable structure phonotactics in Polish, a language with very different sonority sequencing patterns from English. Polish provides a crucial test case because the lexical statistics contradict the SSP, at least in part. I review developmental evidence indicating that children acquiring Polish are nonetheless sensitive to the SSP, producing larger sonority rises more accurately in spontaneous production (Jarosz 2015, submitted). I then present results from two experiments investigating adult Polish native speakers’ phonotactic knowledge.
The findings indicate that Polish native speakers’ phonotactic preferences are sensitive to the SSP and that this SSP sensitivity is not predicted by the computational models that succeeded for languages like English, Mandarin, and Korean. This suggests a crucial role of an inherent bias or a constraint on generalization from the input. At the same time, native speakers’ sonority-sequencing preferences are not entirely expected on the basis of SSP alone, suggesting an important role for experience as well. I discuss implications for modeling of phonological learning.
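To illustrate the kind of generalization at issue, here is a minimal sketch, much simpler than the learners Daland et al. (2011) evaluate: a model that counts sonority rises in attested two-consonant onsets and scores unattested clusters by how familiar their rise is. The sonority scale and toy “lexicon” below are invented for illustration:

```python
# Minimal sketch of class-based phonotactic generalization: score an
# unattested onset cluster by the attested frequency of its sonority rise.
# The sonority scale and attested onset list are hypothetical toy data.
from collections import Counter

SONORITY = {"p": 1, "b": 1, "t": 1, "d": 1, "k": 1,   # stops
            "f": 2, "s": 2, "z": 2,                   # fricatives
            "m": 3, "n": 3,                           # nasals
            "l": 4, "r": 4}                           # liquids

def train(onsets):
    """Count sonority rises (son(C2) - son(C1)) over attested onsets."""
    return Counter(SONORITY[c2] - SONORITY[c1] for c1, c2 in onsets)

def score(counts, c1, c2):
    """Add-one-smoothed probability of a cluster's sonority rise."""
    delta = SONORITY[c2] - SONORITY[c1]
    possible = 7  # deltas -3..+3 under this four-level scale
    return (counts[delta] + 1) / (sum(counts.values()) + possible)

# Toy English-like attested onsets: all rising-sonority clusters.
attested = [("b", "l"), ("p", "r"), ("f", "l"), ("k", "r"), ("s", "l")]
counts = train(attested)

# Neither [bn] nor [nb] is attested, but [bn] shares its +2 sonority rise
# with attested [fl] and [sl], so the model prefers it over [nb] (-2).
bn, nb = score(counts, "b", "n"), score(counts, "n", "b")
```

A learner of this sort reproduces SSP-like preferences for English-type input without any innate principle, which is why Polish, where the lexical statistics partly contradict the SSP, provides the sharper test described above.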

Lisa Green featured in The Journal of Blacks in Higher Education

From the article (see also earlier post): “A new study by researchers at the University of Massachusetts at Amherst chronicles the use of dialect in online communications using the Twitter app. The authors examined more than 59 million tweets by 2.8 million Twitter users. The goal of the study was to identify online language usage by African Americans so that search engines like Google will be better able to serve a more diverse population of users. The data will help computer programs recognize words, phrases and language patterns that are associated with language spoken by African Americans.”