Gender representation in constructed example sentences – 2023 LSA Institute at UMass Amherst

Hadas Kotek

This talk surveys two projects concerned with the representation of women and men in example sentences in linguistics. (Collaborations with Paola Cépeda, Katharina Pabst, Kristen Syrett, Sarah Babinski, Rikker Dockum, and Christopher Geissler.)

The first half of the talk focuses on constructed example sentences in syntax textbooks. Following the 20th anniversary of Macaulay & Brice’s (1997; henceforth M&B) survey of example sentences in 11 syntax textbooks, we present an analysis of 6 recently published textbooks. We find that the gender skew and stereotypes reported in M&B are almost all still present. Among other findings, men are twice as likely as women to occur as subjects, agents, and experiencers. Men are additionally portrayed as engaging in intellectual activities, having diverse occupations, and engaging in violent activities.

The second project examines example sentences in all papers published in Language, Linguistic Inquiry, and Natural Language & Linguistic Theory between 1997 and 2018. We find striking similarities to prior work, but are able to provide greater detail. Among our findings: men are more than twice as likely as women to occur as subjects, agents, and experiencers; men engage in violence and exhibit negative emotion, while women are often referred to using kinship terms and express positive emotion. These trends remain stable over time and across journals, and do not vary by the language of the example.

I conclude the talk by discussing the importance of raising awareness of biases in research and teaching materials among individual researchers and (especially) instructors, as well as how we can improve on the current standards. Time permitting, I will additionally discuss recent relevant findings concerning gender bias in Large Language Models such as ChatGPT, and the relevance of this kind of work for current advancements in NLP research.