Category Archives: Computational linguistics

Linzen colloquium Friday April 17 at 3:30

Tal Linzen, Johns Hopkins University, will present “What inductive biases enable human-like syntactic generalization?” in the Linguistics colloquium series at 3:30 Friday April 17. An abstract follows. All are welcome! The Zoom link has already been sent out on department mailing lists. If you did not receive it and would like to attend, please email Brian Dillon for the link.

Abstract
Humans apply their knowledge of syntax in a systematic way to constructions that are rare or absent in their linguistic input. This observation, traditionally discussed under the banner of the poverty of the stimulus, has motivated the assumption that humans are innately endowed with inductive biases that make crucial reference to syntactic structure. The recent applied success of deep learning systems that are not designed on the basis of such biases may appear to call this assumption into question; in practice, however, such engineering success speaks to this question in an indirect way at best, as engineering benchmarks do not test whether the system in fact generalizes as humans do. In this talk, I will use established psycholinguistic paradigms to examine the syntactic generalization capabilities of contemporary neural network architectures. Focusing on the classic cases of English subject-verb agreement and auxiliary fronting in English question formation, I will demonstrate how neural networks with and without explicit syntactic structure can be used to test for the necessity and sufficiency of structural inductive biases, and will present experiments indicating that human-like generalization requires stronger inductive biases than those expressed in standard neural network architectures.

Prickett in Phonology

Brandon Prickett has just published “Learning biases in opaque interactions” in the latest issue of Phonology. Congratulations Brandon!

https://doi.org/10.1017/S0952675719000320

Abstract
This study uses an artificial language learning experiment and computational modelling to test Kiparsky’s claims about Maximal Utilisation and Transparency biases in phonological acquisition. A Maximal Utilisation bias would prefer phonological patterns in which all rules are maximally utilised, and a Transparency bias would prefer patterns that are not opaque. Results from the experiment suggest that these biases affect the learnability of specific parts of a language, with Maximal Utilisation affecting the acquisition of individual rules, and Transparency affecting the acquisition of rule orderings. Two models were used to simulate the experiment: an expectation-driven Harmonic Serialism learner and a sequence-to-sequence neural network. The results from these simulations show that both models’ learning is affected by these biases, suggesting that the biases emerge from the learning process rather than any explicit structure built into the model.

UMass at RecPhon 2019

Many UMass folks past and present were at RecPhon 2019: Recursivity in phonology below and above the word, 21-22 November 2019, Universitat Autònoma de Barcelona, Bellaterra. A number of former UMass visitors were co-organizers: Eulàlia Bonet, Joan Mascaró, Francesc Torres-Tamarit.

Invited speakers and UMass alumni Junko Ito and Armin Mester presented Recursivity in phonology below the word, while invited speaker and UMass alumna Emily Elfner presented Match Theory and Recursion below and above the word: Evidence from Tlingit. Faculty member Kristine Yu presented Computational perspectives on phonological constituency and recursion and graduate student Leland Kusmer presented Minimal prosodic recursion in Khoekhoegowab. Former visitor Gorka Elordieta presented joint work with emeritus faculty member Lisa Selkirk: Phrasing unaccented words in a recursive prosodic structure in Basque.

UMass folks at RecPhon 2019

Eulàlia Bonet, Armin Mester, Emily Elfner, Junko Ito, Kristine Yu, Leland Kusmer, Gorka Elordieta, Joan Mascaró

UMass folks at RecPhon 2019

Francesc Torres-Tamarit, Emily Elfner, Junko Ito, Armin Mester, Kristine Yu, Leland Kusmer, Gorka Elordieta

SENSUS at UMass, April 18-19, 2020

UMass is hosting “Sensus: Constructing meaning in Romance” on April 18-19, 2020. This is a conference on the formal semantics and pragmatics of Romance languages.

Areas: theoretical semantics and pragmatics and their interfaces with other domains, experimental methodologies, fieldwork, the study of variation and computational approaches

Venue: Integrative Learning Center at UMass Amherst (the ILC is a fully accessible building)

Invited speakers:

Luis Alonso-Ovalle
(McGill University)

Mariapaola D’Imperio
(Rutgers University)

Donka Farkas
(UC Santa Cruz)

Organizers: Ana Arregui, María Biezma, Vincent Homer and Deniz Özyıldız

Event sponsored by the Department of Linguistics and the Department of Languages, Literatures and Cultures of UMass Amherst

Contact us at sensus@umass.edu

Details can be found here: http://websites.umass.edu/sensus/

David Smith talk, Monday Nov 18

David Smith (https://www.khoury.northeastern.edu/people/david-smith/) will present “Textual Criticism as Language Modeling: Viral Texts, Networked Authors, and Computational Models of Information Propagation” at 4 pm Monday Nov. 18th in ILC N400. An abstract is below.

This presentation is to a joint meeting of the Initiative for Data Science in the Humanities and the Data Science tea. If you have any questions, contact Joe Pater at pater@umass.edu. David will be available for half-hour meetings from 1 – 3:30 in the Linguistics department – sign up here.

Abstract

The era of mass digitization seems to provide a mountain of source material for scholarship, but its foundations are constantly shifting. Selective archiving and digitization obscures data provenance, metadata fails to capture the presence of texts of mutable genres and uncertain authorship embedded within the archive, and automatic optical character recognition (OCR) transcripts contain word error rates above 30% for even eighteenth-century English. The condition of the mass-digitized text is thus closer to the manuscript sources of an edition than to a scholarly publication. On the computational side, models that treat collections as sets of independent documents fail to capture the processes by which new texts are generated from existing ones.

In this talk, I will discuss several aspects of our work on “speculative bibliography” with computational methods. Starting from a simple model of the composition of historical newspaper pages, with applications to text denoising, I describe models of how texts transform their sources, applied to modern science journalism, medieval Arabic historians, and the generically hybrid forms in nineteenth-century newspapers. I conclude by discussing methods for inferring network structure and mapping information propagation among texts and publications.

This is joint work with Ryan Cordell, Rui Dong, Ansel MacLaughlin, Abby Mullen, Ryan Muther, and Shaobin Xu.

Graf colloquium Friday Nov 8 at 3:30

Thomas Graf, Stony Brook University, will present “Subregular linguistics for linguists” in the Linguistics colloquium series at 3:30 Friday Nov 8. An abstract follows. All are welcome!

 

Abstract

Drawing from computational work that is known as the subregular program, I will argue against two received views in linguistics: “phonology and syntax are very different” and “subcategorization is a solved problem”.

  1. Cognitive parallelism
    Subregular notions of complexity can be applied to strings as well as trees. Doing so reveals that phonology and syntax are remarkably similar (and those parallels even extend into morphology and semantics). For instance, islands and blocking effects are instances of the same computational mechanism.
  2. Subcategorization
    Subcategorization (or c-selection) is rarely studied by linguists, but it is actually a source of tremendous overgeneration. Once again subregular notions of complexity can be used to address this problem. This isn’t just a mathematical exercise, but makes concrete empirical predictions about the nature of category systems, subcategorization, the status of empty heads, the DP-analysis, DM-style roots, and once again highlights parallels to phonology.

The general upshot is that subregular concepts, despite their computational origin, are intuitive and linguistically fertile: they address conceptual issues, bridge gaps between linguistic subfields, and make concrete empirical predictions. Subregular linguistics is just linguistics with some computational flavor sprinkled on top.

Disclaimer: This talk is 100% formula-free.

Phonology/Phonetics/Psycholinguistics Guru: Matt Goldrick

This week (October 21-25) we will have a special visitor in the department, a Phonology/Phonetics/Psycholinguistics Guru, Matt Goldrick! Matt will be visiting the department all week. He will be giving two tutorials and a general talk (see below for schedule). Everyone in the department and beyond is welcome to attend all of these events. The schedule is rather complicated, so please read it carefully – all events are scheduled to take place in N400 on Monday, Tuesday, and Wednesday of next week. Both tutorials are about Gradient Symbolic Representations and involve some hands-on software applications – one is focused on Phonology and the other on Processing. The talk is intended to be a general talk for the whole department. Matt is also available for individual meetings while he is here – please contact him directly about that.

SCHEDULE

Talk – “The acoustic effects of blended representations: co-production”
Tuesday 1:30-2:30

Phonology Tutorial
Gradient Harmonic Grammar (gradient underlying representations and learning models for them)
Instructions: Bring a laptop that can access the internet; you’ll be using Google Sheets to aid in calculations of harmony for candidate sets.

Monday 1:15-2:30
Wednesday 12:30-2
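
For anyone who wants a feel for the harmony calculations before the tutorial, here is a minimal sketch in Python of how harmony is computed for a candidate set in a plain (non-gradient) Harmonic Grammar: each candidate's harmony is the negative weighted sum of its constraint violations, and the candidate with the highest harmony wins. The constraint names, weights, and candidates below are made up for illustration; this is not the tutorial's Google Sheets material and does not model gradient underlying representations.

```python
def harmony(violations, weights):
    """Harmony of one candidate: negative weighted sum of its violations."""
    return -sum(weights[constraint] * count
                for constraint, count in violations.items())


def winner(candidates, weights):
    """Return the name of the candidate with the highest harmony."""
    return max(candidates, key=lambda name: harmony(candidates[name], weights))


# Hypothetical constraint weights and candidate set (illustration only).
weights = {"NoCoda": 2.0, "Max": 3.0}
candidates = {
    "pat": {"NoCoda": 1, "Max": 0},  # keeps the coda consonant
    "pa": {"NoCoda": 0, "Max": 1},   # deletes the coda consonant
}

if __name__ == "__main__":
    for name, violations in candidates.items():
        print(name, harmony(violations, weights))
    print("winner:", winner(candidates, weights))
```

With these assumed weights, deletion is penalized more heavily than having a coda, so the faithful candidate has the higher harmony.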

Psycholinguistics Tutorial
Gradient Symbolic Processing (connectionist implementations of GSR and software for generation, learning, and parsing of CFGs)

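Instructions: Bring a laptop with Jupyter installed (https://www.anaconda.com/distribution/). You’ll need an environment with Python 3, and you should have these libraries available: numpy, matplotlib, pickle, re (pickle and re are part of the Python standard library, so only numpy and matplotlib need to be installed separately).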
Monday 4-5:30
Wednesday 4-5:30
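
To check a laptop against the Psycholinguistics Tutorial instructions above, something like the following sketch can be run in a Jupyter cell or from the command line. It only tests whether each listed library can be imported; the function name `check_environment` is invented for this example and is not part of the tutorial software.

```python
import importlib
import sys

# Libraries listed in the tutorial instructions.
REQUIRED = ["numpy", "matplotlib", "pickle", "re"]


def check_environment(modules=REQUIRED):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Any library reported as MISSING would need to be installed into the environment before the tutorial.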