
AI music generation is now really, really good(*)

I just found out how good AI music generation has become, and I’ve spent a little time learning what I could about the technology. This is all completely new to me – I play and record music, and I know something about AI (mostly related to language), but I hadn’t previously looked into the state of the art in music generation in the current era of Large Language Models. I’m pretty blown away by how realistic-sounding AI-generated music can now be, in much the same way that I was blown away by how LLMs could generate English text that is indistinguishable from human writing. There are undoubtedly interesting questions about *how* these models work, and about the extent to which they use processes similar to humans’, parallel to the questions linguists and computer scientists ask about LLMs. So far I haven’t been able to find out anything about how the music models work, and I assume they largely work in a similar way to the language systems. What I have been able to quickly learn something about are the societal implications of the emergence of this technology and the ethical and legal issues around its creation and use, and I’ll share some of that along with a few of my own initial thoughts. If I were teaching a course on AI, I’d definitely have plenty to work with for a few classes on music, and it looks to me like the case of music is a good illustration of some general points about AI and its implications.

So first, why do I think AI music generation is so good? It’s because I’ve just listened to some music that a couple of people have produced using current music generation software. The platform I will focus on is the new version of Suno (v. 4, released in beta Nov. 2024). Michael J. Epstein has made a series of Facebook posts on an experiment he did: creating a song with the new Suno, submitting it to Spotify playlist placement services, and tracking its performance. His first post from Nov. 23, along with a Dec. 3 update, is here, and his Jan. 3 update is here. The song currently has 64,593 listens. As any musician who has music on Spotify knows, this is a lot. My band’s highest number is 5,604, and the reason for it getting that high made for another story.

You can hear the song on Spotify by following the link in the title above, and you can hear a preview here by pushing the play button. It starts with a finger-plucked acoustic guitar accompanying a female voice, with bass, electric guitar and drums coming in as the song progresses. As far as I can tell, nothing about the song sounds “fake”, any more than all recorded music is fake to varying degrees, especially now. There are no analogues of the “extra fingers” that sometimes tip you off that an image is AI-generated (in the Nov. 23 post, Epstein points to some recent work indicating that “[t]he days of extra fingers in AI art…are over”). Epstein notes that none of the playlist curators he submitted his song to “identified anything (at least out loud) as inauthentic about the music” (Jan. 6 comment on Facebook, not linkable).

Some more examples of authentic-sounding AI-generated music, as well as plenty of food for thought on the societal implications of this new technology, can be found in the podcast that Epstein’s Jan. 3 post links to. In the second part of a Dec. 27 episode of “On the Media” entitled “How AI and Algorithms Are Transforming Music”, Mark Henry Phillips talks about the existential crisis that his own experiments with current music generation have created for him as a composer and producer of commercial music. The podcast includes samples of the music from his experiments, and compares them to his non-AI-assisted productions in terms of their quality, and also in terms of the effort needed to produce them. Phillips suspects that he will soon no longer be able to make a living as a musician, as companies start to use the technology directly themselves.

Alongside pointing out the threats that this technology poses for the already precarious ability of musicians and music producers to make a living, Epstein and Phillips note its potential for their own creative practices. This software now allows you to upload recordings that it will extend. They both mention looking forward to using that technology to help finish off demos, and Phillips provides an example of a horn part that the software created for one of his songs. This resonated with me, since I have lots of my own compositions and productions in various stages of incompleteness, so I decided to try Suno myself.

I’m actually not sure if I will wind up using Suno or similar technologies in my own music work, for a few reasons. The first is that I enjoy making music largely because of how different it is from my academic day job in linguistics. I love getting together with the other people in my band and playing our songs – it’s just as much of a chance to hang out with friends as anything else. And when I’m playing music on my own, what I’m usually doing is fooling around on the guitar finding new riffs and chord progressions, and coming up with vocal parts (hence the pile of unfinished songs). I’m not averse to using technology – I enjoy working with Logic recording software for instance – but I find it hard to get going on that, especially after a day of doing my “real” work on the computer.

The second is that I didn’t find my initial experiments with Suno very inspiring. I tried uploading a bit of one of my Voice Memo recordings that had me singing with my acoustic guitar, strumming fast. I gave Suno the style prompts “indie rock, post-punk, indie pop” and asked it to extend the recording. I also gave it some prompts for lyrics. The extensions it created did sound like a human performer, but they were unusable to me. They were very bland sounding, both in style, which ended up being middle-of-the-road pop-folk, and in lyrical and melodic content. I was using the free Suno, and this was the first thing I did. I don’t doubt that if I worked more with the paid version, I could get some usable ideas and sounds, based on Epstein and Phillips’ reports and results. But at least for now, that seems like too much work, and not the type of work I want to be doing when I have time for music.

I also tried just having Suno generate a couple of pieces of instrumental music based on style prompts. I thought these were terrible. Again, they sounded like “real” music (though less so than Epstein and Phillips’ examples), but I didn’t think they were good examples of the styles, and I didn’t enjoy listening to them. The most egregious example was the response to “punk, 1970s, New York City”, which, if forced to categorize it myself, I’d call “video game hair metal”. This experience does make me wonder if there will still be a need for commercial music producers after all (though undoubtedly fewer of them, given the speed at which AI-assisted producers will be able to work).

Music generated by Suno v. 4 Jan. 6, 2025, with style prompts “punk, 1970s, New York City”.

The final reason that I may end up not using this technology is an ethical one. Suno and Udio are being sued by a group of major record labels in lawsuits coordinated by the Recording Industry Association of America (RIAA). In its response to the lawsuits (p. 9), Suno says that its model’s

 “training data includes essentially all music files of reasonable quality that are accessible on the open Internet, abiding by paywalls, password protections, and the like, combined with similarly available text descriptions.”

Suno’s response at the above link is worth reading in full, and a summary of it, along with the RIAA’s reply to it, can be found in this article.

I very much value the protection of creators’ rights, and of their ability to make a living, so there is a big part of me that would be happy to boycott both music and language generation software, insofar as they interfere with these. But I have not thought nearly enough about these issues, and I am certainly also on board with the critiques of the music industry in the Suno response.

In his podcast segment, Phillips draws some analogies between how human musical composition works and how the music generation software works, and suggests that there is a closer link between those than between text generation and writing. It would be interesting from a scientific standpoint to look at those connections in more detail. In linguistics there is currently a great deal of controversy over whether LLMs are useful as models of human language (for two poles of the debate, see Piantadosi and Kodner et al.). There is also something about the way that Phillips describes the human and the computer creation processes that brings to mind a potential argument that using web-available training data is analogous to how humans wind up creating original music. But even in the case of human creation, questions of authorship, copyright, and fair use are incredibly difficult to arbitrate, both ethically and legally.

Update Jan 7: Michael J. Epstein shared with me in a message these details about the process for making his songs in Suno, which highlight how much human intervention and creative choice there is in the work he did, and also how much faster it was than non-AI work: “…it does take a lot of thought and practice to prompt to get what you want. For every song I was posting, it probably took 20 initial generations of it to pick a baseline and then another 20+ section regenerations with dynamic prompting. So, I was not just dropping something in and releasing the output. That said, it’s obviously a trivial amount of work relative to writing and recording songs the way I have been for decades.” And in case you are curious about the playlist placement services he used, I saw this in one of his Facebook comments: “I used Sound Campaign and Playlist Push and had success with Sound Campaign and not much with Playlist Push, but I think it’s more about the genre issue. I do hear many people do not have success with services like these, so it’s definitely hit or miss, and I suspect the more generic, boring, and mainstream your music is, the better it will do…”

*Update Jan 8: I just added a parenthesized asterisk to the title, after having done another quick experiment in Suno. I had already been feeling that I needed to qualify what I meant by “good”, and now I’m sure I need to. What I mean is that Suno seems capable of generating music that is indistinguishable from non-AI music, at least in some genres, and when used correctly. Here’s a good example of how it is not good in a broader sense (besides the questions of ethical and societal goodness).

I asked Suno to generate songs in the style of “Balinese gamelan” and “Javanese gamelan”. I am no expert, but I could likely sort examples of these two styles of music accurately, and I wanted to see how Suno would do on a non-Western musical system. It wound up producing a variety of things that I would label soundtrack or elevator music; as far as I can tell, none of the instruments sounded like they were from a gamelan orchestra, and none of the songs used anything but standard Western musical structure. There are huge issues in AI language generation around its use with low-resource languages, where it creates errorful examples of writing in those languages that might be taken as genuine because they use the correct orthography and get some other things right. At least no one will take these to be examples of gamelan music!

Music generated by Suno v. 4 Jan. 8, 2025 with style prompt “Balinese gamelan”, example 1.
Music generated by Suno v. 4 Jan. 8, 2025 with style prompt “Balinese gamelan”, example 2.
Music generated by Suno v. 4 Jan. 8, 2025 with style prompt “Javanese gamelan”, example 1.
Music generated by Suno v. 4 Jan. 8, 2025 with style prompt “Javanese gamelan”, example 2.

Places We’re Not Allowed: Les Dérailleurs Pandemic “Hit”

In March 2020, I noticed that one of the songs on our 2018 EP started getting a bunch of plays on Spotify. I tweeted about it, asking if anyone knew what I could do to figure out what was going on. Henning Ohlenbusch suggested that it was probably because it was put on a popular playlist, and told me that I could see this in the Spotify Artists profile. He was right.

“Places We’re Not Allowed” had been put on a “Lockdown Playlist”, alongside such other thematically titled songs as “Don’t Stand So Close to Me”. Like most people who care about musicians getting paid, I’m not a fan of Spotify, but it was a fun diversion in those early pandemic days to watch the song get plays, wonder how many it would get, and see where people were listening to it. The graphic above shows that it had gotten 2.5k listeners by April 2020 (the percentage change is presumably due to a bug coming from counting zero as 0.01). As of January 2023 it’s a couple hundred shy of 5k.

Here’s the song on SoundCloud – we put all of our music there so people don’t have to use the streaming services if they don’t want to (and here’s the Spotify link in case you want to help us get to 5k!).

Through some luck, Les Dérailleurs managed to stay active during the pandemic playing in person as well. Two of us are volunteers with Flywheel, and we were able to rehearse in the big space in Easthampton Old Town Hall, many, many feet away from each other. We also got some drum tracks recorded in there (huge room sound!) that we plan to use in an upcoming release. And just as Flywheel was moving out of that space, we got a very nice invitation from Elizabeth MacDuffie and Mark Alan Miller to play at the 15th Meat for Tea release party in March 2021. It was a virtual event, and Mark prerecorded our segment. Here’s “Places”, which we finally released in January 2023.

Live at Sonelab March 2021.

How do you say the second syllable of cotton?

My Sounds of Englishes class is taking over my life. This morning I heard my daughter say “cotton”, and I had to make a recording of her and everyone else in the room. She pronounces the second syllable with a vowel, and it sounds something like “in”. You can hear (and see in Praat) the second syllable in isolation in the video. The video also has me and another person pronouncing it with a syllabic nasal, as it is pronounced in most North American varieties (I think!), and my daughter’s younger brother doing it with a vowel, and also with a [t] between the vowels rather than a glottal stop (all of the others have a glottal stop rather than [t], which again is probably how most adult North Americans say it).

My daughter’s pronunciation of this type of word appears to be a feature of at least some varieties of Western Massachusetts English. I first heard about it maybe 10 years ago, when a student in Sounds of Englishes pointed it out. More recently, I was at a dinner with non-linguist friends and they mentioned that one of their kids pronounces “kitten” in this Western Mass way (they are from elsewhere originally), and the other does not. Coincidentally, a couple of days after that, I came across Joey Stanley’s excellent blog post on Utah English, in which this is a well-known, and somewhat stigmatized, pronunciation. As far as I know, there is no stigma attached to the Western Mass vowel+nasal pronunciation.

Update Nov. 6, 2022: A 2021 paper by Eddington and Brown looks at the production of vowel+nasal second syllables in words of this type in four states, and also at how these productions are perceived in terms of speaker characteristics (e.g. education, place of residence). It increasingly seems like this pronunciation is found everywhere, but that people think it’s a mark of their own region’s dialect.

Update Dec. 13: It seems my daughter has this vowel in other unstressed syllables too. In terms of phonetic transcription, she seems to vary between [ɨ] and [ə] in the second syllable of “salad”, but has consistent [ɨ] in “cotton”. On alternations between these vowels, see the “Rosa’s roses” paper (my daughter seems to have [ɨ] for the second syllable of both “Rosa’s” and “roses”). Also, I just reread the first paragraph of this post and it’s a weird mix of trying to write for a general audience and for linguists. Oh well…


“Canadian” vowel shift in Western Mass

There are a lot of vowels in English, and they don’t seem to be comfortable in the space they are in. They are constantly moving, pushing (or pulling) each other around. The Great Vowel Shift, which happened from about 1400 to 1700, is responsible for a lot of the mess in the English spelling system, with written letters having multiple pronunciations (how many ways can “ough” be pronounced? Ought, though…), and vice versa (how many ways can you write the vowel sound in “ways”?).

Less well known are the modern vowel shifts, but the more they are studied, the more likely it seems that they are happening everywhere English is spoken. William Labov first documented the Northern Cities Shift, which can make “buses” produced by someone in Chicago or Detroit sound like “bosses” to most other North Americans, and “block” sound like “black”.

The Canadian shift (see the concluding paragraph below on its current name) has largely gone in the opposite direction of the Northern Cities Shift. Because of this, when I first moved to Western Massachusetts from Canada, a potential landlord thought I was telling him I had a cot (and seemed puzzled that I thought he needed that information) when I was in fact telling him I had a cat.

A 2022 study by Matt Gardner and Rebecca Roeder provides a particularly clear picture of the Canadian shift in their Figure 8. Each of the arrows corresponds to a vowel whose pronunciation differs according to the age of the speakers (a difference called a change in “apparent time”). The speakers are all from Victoria, British Columbia, and range in age from 14 to 98.

The vowel from “cat” is represented by TRAP in Fig. 8. The diagram plots vowels according to acoustic measurements that correlate with the height and frontness of the tongue. Vowels closer to the top of the vowel plot are produced with the tongue higher in the mouth, and vowels further to the left are articulated with the tongue further toward the front of the mouth. So the new pronunciation of TRAP has a lower, less front vowel.
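If you’d like to try making a plot like this yourself, here is a minimal sketch using the Python praat-parselmouth library: it measures the first two formants (F1 and F2, the standard acoustic correlates of vowel height and frontness) at each vowel’s midpoint, and flips the axes in the usual way. The file names and the one-token-per-vowel setup are hypothetical simplifications; real studies measure many tokens per vowel class and normalize across speakers.

```python
# A minimal F1/F2 vowel plot, assuming praat-parselmouth and matplotlib are
# installed and that each vowel token is saved as a short mono .wav file
# (the file names here are hypothetical).
import parselmouth
import matplotlib.pyplot as plt

tokens = {"TRAP": "trap.wav", "DRESS": "dress.wav",
          "LOT": "lot.wav", "THOUGHT": "thought.wav"}

fig, ax = plt.subplots()
for label, path in tokens.items():
    snd = parselmouth.Sound(path)
    formants = snd.to_formant_burg()        # Burg-method formant tracking
    t = snd.duration / 2                    # measure at the vowel midpoint
    f1 = formants.get_value_at_time(1, t)   # F1: height correlate (Hz)
    f2 = formants.get_value_at_time(2, t)   # F2: frontness correlate (Hz)
    ax.plot(f2, f1, "o")
    ax.annotate(label, (f2, f1))

# Flip both axes so high front vowels land at the top left, as in the plots
# discussed here: higher tongue = lower F1, fronter tongue = higher F2.
ax.invert_xaxis()
ax.invert_yaxis()
ax.set_xlabel("F2 (Hz)")
ax.set_ylabel("F1 (Hz)")
plt.show()
```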

In the Northern Cities Shift, the TRAP vowel has moved frontwards and up, towards the DRESS vowel. You can hear this in a speaker from New York City saying “bad”, or a speaker from Chicago saying any short-a word. In the Canadian shift, the vowel has gone in the other direction, towards what you might be familiar with in many Boston speakers’ pronunciation of “father”. The relatively back position of the Canadian TRAP vowel may be part of why it is used in the Canadian pronunciation of “pasta”, since it is not far from the Italian vowel in that word.

Surprisingly, another recent study, published by Monica Nesbitt and James Stanford in 2021, finds that the TRAP vowel has shifted in the Canadian direction in Western Massachusetts, rather than in the Northern Cities direction. This is shown in a pair of vowel plots in their Figures 5 and 6, showing the positions of the vowels for their oldest and youngest speakers. This diagram uses BAT for the TRAP vowel. For the older speakers, it is remarkably high and front relative to DRESS, but for the younger speakers it is quite a bit lower and a bit less front.

This raises a puzzle for me: why did my young interlocutor mistake my Canadian “cat” for “cot” if we had the same TRAP vowel? My guess is that Western Mass is home to a wide range of vowel systems, including those of natives of other areas with Northern Cities-style raised TRAP and fronted LOT vowels. There is probably also some regional variation within Western Mass along the I-91 corridor. Meghan Armstrong-Abrami, a linguist native to East Hartford, says she thinks people from Holyoke and Springfield often sound similar to Hartford speakers. It would also be useful to know exactly where the Canadian and Western Mass vowels are relative to each other: we can’t tell by comparing these vowel plots, since they use different scales.

If you want to hear a Western Mass speaker, you can listen to a podcast of Bill Dwight, who has been called the spirit of Northampton (he was born in Holyoke). The students in my UMass “Sounds of Englishes” class and I will be listening to him and analyzing his speech, and it will be interesting to see where it lands in the crowded English vowel space.

As you may have noticed in the figure caption for the Canadian shift, it has now been given the name of the low-back-merger shift. The renaming has happened for two reasons. First, it’s clearly not just a Canadian thing: as well as the new Western Mass example, it’s long been known to have happened in California. Second, the new name gives information about an important characteristic of these shifts that we haven’t gotten to yet: the lowering of the TRAP/BAT vowel is accompanied by the merging of the THOUGHT and LOT vowels (the low back vowels). For Canadians, and increasing numbers of speakers in the United States, “cot” (a LOT word) and “caught” (a THOUGHT word) are pronounced the same. And in the vowel plots above from both Canada and Western Mass, you can see that the older speakers produced the THOUGHT and LOT vowels differently, and the younger ones produced them the same. The Western Mass case is particularly interesting because it seems to reverse the chronology of the change from what happened in Canada: according to Nesbitt and Stanford, the lowering of the TRAP vowel happened before the merging of the THOUGHT and LOT vowels.

Update 11/24: Just got myself a copy of Edward McClelland’s How to Speak Midwestern, which is a wonderfully accessible, informative, and fun introduction to the Northern Cities Shift and much more. I read it in one sitting, and feel like he got the linguistics right – I also learned a lot.

Update 12/3: Here is a plot of Bill Dwight’s vowels from a word list reading. As in the above plots, the labels indicate mean values for the vowel classes. He falls in between the two generations whose vowel plots are shown above, in having TRAP lowering but no THOUGHT/LOT merger. TRAP lowering first is the order Nesbitt and Stanford infer from their statistical analysis of a larger set of speakers. He was born in 1955, in between the birth years of the older and younger speakers.

And here is a plot of mine: a more centralized and slightly lower TRAP than Dwight’s, a lower and more central BAN, and no LOT/THOUGHT contrast (or MARY/MERRY/MARRY contrast).

The fact that my TRAP is between Dwight’s LOT and TRAP may well be related to the cot/cat confusion I experienced. It is also interesting that Dwight’s TRAP is not as low and central as the younger speakers in the Nesbitt and Stanford study. It seems likely that TRAP would only be that low and central if LOT is in the further back and higher part of the space, as it is for me and the younger Western Mass speakers.


Labov’s “A Life of Learning”

I sent out a tweet with the slight hope of finding samples of the Martha’s Vineyard diphthong centralizing/raising that Labov studied. Edward Flemming pointed me to Labov’s 2009 Charles Homer Haskins Prize Lecture, which features audio of Donald Poole from 1961, and of five other participants in Labov’s research: Jacob Schissel, NYC 1963; Larry Hawthorne, South Harlem 1967; Celeste Sullivan, South Philadelphia 1973; Jackie Garopedian, Chicago 1986; and Latasha Harris, West Philadelphia 2001. The talk is about what Labov learned from these people, about the nature of language, and much more.

Audio of Labov’s 2009 “A Life of Learning” talk with synced slides

The ACLS site currently only hosts the audio, and the slides were not available on Labov’s website, so I asked him for them. He graciously shared them with me, and you can get them from this link. He also gave me permission to make a video with the synced audio, which you can see in the above embedded YouTube video, or download as an .mp4 file from this link.

Laurel Mackenzie pointed me to a Wayback Machine archive of an ACLS web page that has the complete transcript of the talk, the figures, and the audio.

Josef Fruehwald shared some memories: “[I] was lucky to be there for the talk, unforgettable…When he spoke about the little girl who would fight in school, you could hear a pin drop.”

The talk really is fantastic. My Sounds of Englishes students are now listening to the whole thing, and I’ve given them this assignment:

This video is a retrospective of Bill Labov’s career, and serves as a fantastic introduction to the field he essentially created, sociolinguistics. He chooses to foreground 6 of the people he worked with as participants in his research. We get to hear each of these people speaking. Labov discusses both the importance of *what* they were saying and aspects of linguistic structure.

For each one, please:

1. Write a short reflection of your own on what they are saying.

2. Say something about the linguistic structure of the speech.

For most of them, Labov does talk about the study he was doing, so for those you can simply summarize; in those cases, include Labov’s discussion of social factors. For the ones that Labov does not discuss in terms of structure, you will have to make your own observations. For each of the 12 answers (two for each of the six speakers), write just 2-4 sentences. Submit your response as a .pdf.

Assignment for Ling 370 Fall 2022

UMass and Hampshire County Covid-19 data from the 2021-22 school year

UMass Amherst released weekly new case counts throughout the 2021-22 school year. The following is a comparison of the UMass new case counts to those from the MassDPH for Hampshire County. Most of the UMass cases are presumably also Hampshire County cases. The DPH uses declared county of residency; although the declared residency of the UMass cases is not publicly available, it seems likely that at least 3/4 of the UMass cases appear in the Hampshire County data. The Hampshire County population was 160,830 according to the 2019 census estimate, and the UMass population is 29,300 faculty, staff and students, according to campus communication from the Public Health Promotion Center (Sept. 16, 2021). The UMass population that reports Hampshire County as their residence thus likely makes up about 20% of the county population.

Per capita weekly new case rates for Hampshire County and UMass Amherst. Rates are per 100,000 people, and the weeks end at the indicated dates.

To compare the case numbers from these partially overlapping populations, we can take the standard approach of relativizing the raw numbers to population size, as rates per 100,000 people. 100 new cases per week per 100K and above is considered “high” in the CDC transmission levels, and 200 per 100K is the bar used in their community levels along with hospitalization data. The above graph converts the raw weekly numbers (see the end of this post for those numbers and their source) to per capita rates using the populations from the last paragraph. The UMass rates in this graph range from a low of 34.1 per 100K for the week ending November 19th to a high of 1556.3 for February 15th.
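As a concrete illustration, the conversion is just a few lines of Python. The populations are the ones given above; the raw weekly counts in the example are hypothetical stand-ins, chosen to reproduce two of the rates quoted in this post.

```python
# Converting raw weekly case counts to rates per 100,000 people.
HAMPSHIRE_POP = 160_830   # Hampshire County, 2019 census estimate
UMASS_POP = 29_300        # UMass faculty, staff, and students

def rate_per_100k(new_cases: int, population: int) -> float:
    """Weekly new case count as a rate per 100,000 people."""
    return new_cases / population * 100_000

# Hypothetical example counts: 456 UMass cases in one week, 768 county cases.
print(round(rate_per_100k(456, UMASS_POP), 1))      # 1556.3 per 100K
print(round(rate_per_100k(768, HAMPSHIRE_POP), 1))  # 477.5 per 100K
```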

New case rates are affected by testing, and because there was likely much more (asymptomatic) testing in the UMass population than in the general Hampshire County population, the actual relative disease prevalence was likely somewhat lower than these per capita rate comparisons would indicate. (Test positivity has the reverse bias: more testing tends to lead to a decreased percent positive.) It is therefore generally more informative to focus on changes in relative new case rates across time than to compare UMass to Hampshire County at a single point in time.

At some points, the UMass per capita rates were lower than Hampshire County’s – this was true from the week ending October 5, 2021 through to the end of the first semester, with the exception of November 9th. (It was also true during the winter break, but the UMass population was of course much lower during this period, so these rates are artificially low.) Since there was more testing at UMass, it is likely that there was in fact less disease prevalence there than in Hampshire County as a whole during the first semester after the first month. It is probably relevant that the UMass population had a much higher vaccination rate than the general Hampshire County population, that an indoor mask mandate was in place throughout the first semester, and that UMass was using wastewater surveillance testing with adaptive PCR testing.

The UMass per capita rates were also much higher at some points. This was the case at the beginning of both semesters. The first day of classes in the second semester was January 25. The UMass rate two weeks later was 1419.8 per 100K, compared to 477.5 for Hampshire County. The UMass rates approximated the Hampshire County ones in the middle of the semester, but were more than double on April 5th (348.1 vs. 121.9). The pattern of much higher UMass rates generally continued through to the end of the spring semester. It could well be relevant that the UMass Amherst indoor mask mandate was lifted March 9th.

It is not impossible that the changes in relative new case rates across time are due at least in part to changes in testing, but this seems especially unlikely to have played much of a role in the dramatic increase in the UMass cases relative to Hampshire County in the second part of the second semester. There appear to be no publicly available data that could be used to explore this possibility (see below).

=====

Data details

UMass data are from https://www.umass.edu/coronavirus/dashboard. They are currently unavailable at that site; the data for these graphs were copied from the presentations available during the 2021-22 school year. All of the data used for these graphs can be found in the Numbers file at this link; other formats are available on request.

MassDPH data are from the downloadable dashboard data at https://www.mass.gov/info-details/covid-19-response-reporting. UMass cases seem to appear in the MassDPH data with a report date of about two days later, so the Hampshire County weeks in these graphs were chosen to end two days later, on the Thursday. The dates in the graphs are the Tuesdays.

The raw new case counts are shown below. The UMass data for April 28 (165 new cases) and May 3 (188) are the updated counts supplied the week after. They were originally given as 132 and 156 respectively. There were no UMass data released November 23 or December 28. The numbers from November 30 and January 4 were 77 and 360 respectively. They have been split over the two weeks.

Weekly new case counts for Hampshire County and UMass Amherst. Weeks end at the indicated dates. See the text for more information.

Les Dérailleurs 2003

Frank Sinistra, Rainy Stanford, and me playing as Atomica at Flywheel in 2005.

Les Dérailleurs have a new EP called 2003. Here is a pre-release version on SoundCloud, followed by the cover art by Luke Cavagnac, and the story of how this recording from 2003 is only being released now (and why it’s being released as Les Dérailleurs). It was released on the streaming services August 1, 2022 (release party in Kingston Friday August 5th, 3-5, Black Dog patio; Northampton TBA).

EP cover art by Luke Cavagnac

The Story

In 2003 I had been in Northampton, Massachusetts for a few years, and was really enjoying the music scene (Thurston Moore playing at a bowling alley!), but hadn’t been able to make any connections with people to play with. I decided I would make some demo recordings to try to make some progress on that.

I had just discovered one-time Buzzcock Howard Devoto’s band Magazine, and loved the way the synths sounded, so I bought a Linn guitar pedal. It’s all over the recordings. I was also aiming for a generally disco-punk thing like Gang of Four. For rhythm tracks to play along with, I used the drum machine in my Casio keytar; these were mostly muted in the final mixes, except for a bit in “Work this Thing”, and one beat from my Linn pedal that survived in “Under There”.

My friend and high school bandmate Grant Ethier agreed to add real drum tracks, so I packed up my Apple G3 and Korg mixing board/A-D converter, took them up to Kingston, Ontario, and set them up in his basement. I told him I was hoping for a sort of disco feel, and we listened to Bohannon, another obsession of mine at the time. He nailed the recordings really quickly, and I left that same day with stereo mixes of the drums (he’s a great sound engineer as well as a great drummer). Funnily, the bass drum head in the picture above is a hand-me-down from Grant, with the cover image from the Thirteen Engines Perpetual Motion Machine record (gone now, but I still have the Radio Shack disco ball).

Me and Grant in my parents’ basement, about 1981

I don’t know if I wound up giving the recordings to any prospective musical partners. I met Frank Sinistra, the Atomica bass player in the picture above, around that time. He and I are still playing music together – including two of the songs in these recordings – 20 years later in Les Dérailleurs. I recently listened to the recordings while working on some new ones (hopefully to be released by the end of the summer), and enjoyed them, so I asked Mark Miller to master them. I knew he would get what we were trying to do – I think he did a great job.


Representing and learning stress: Grammatical constraints and neural networks

This new NSF grant is currently being processed, so this information is here temporarily.

Joe Pater (PI), Gaja Jarosz (co-PI), total costs $386,226

Public summary: Languages are systems of remarkable complexity, and linguists and computer scientists have devoted considerable effort to the development of methods for representing those complex systems, as well as computational methods for learning the system of a given language. This effort is driven by the desires to better understand human cognition, and to build better language technologies. This project draws on the theories and methods of both linguistics and computer science to study the learning of word stress, the pattern of relative prominence of the syllables in a word. The stress systems of the world’s languages are relatively well described, and there are competing linguistic theories of how they are represented. This project applies learning methods from computer science to find new evidence to distinguish the competing linguistic theories. It also examines systems of language representation that have been developed in computer science and have received relatively little attention by linguists (neural networks). The research will engage undergraduate and graduate linguistics students at a public university. Linguistics has a much higher proportion of female students than computer science, and this project aims to address gender imbalance in STEM. 

From a linguistic perspective, learning stress involves learning hidden structure, parts of the representation that are not present in the observed data and that must be inferred by the learner. A given pattern of prominence over syllables is often consistent with multiple prosodic representations. The approach to hidden structure learning used in this project applies the general technique of Expectation Maximization, which in pilot work achieved good results on a standard test set. Intriguingly, many of the languages that this learner failed on in the test set are ones that are in fact cross-linguistically unattested. This project expands the set of tested languages to include more of the range of systems found cross-linguistically, and further explores the possibility that typological gaps have learning explanations. It compares hypotheses about the constraints responsible for stress placement by comparing how well they support the learning of attested systems, and whether they can help explain typological gaps. Pilot work also found indications that a neural network could learn generalizable representations of the data; the project is further testing this method. All of the software developed in this project is being made freely available, as is a database of the stress systems of the world’s languages. 
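To give a flavor of what hidden structure learning with Expectation Maximization involves, here is an illustrative toy in Python (not the project’s actual learner, constraint set, or data): each observed stress pattern is compatible with one or more hidden foot parses, and the learner alternates between splitting each observation’s credit over its compatible parses (E-step) and re-estimating the parse-type probabilities from that expected credit (M-step).

```python
# Toy EM over hidden foot parses; the data and parse inventory are
# hypothetical miniatures for illustration only.
from collections import defaultdict

# Observed stress patterns ("1" = stressed syllable) paired with the hidden
# parses consistent with them; the monosyllable is ambiguous between types.
data = [
    ("10",  [("trochee", "(10)")]),
    ("010", [("trochee", "0(10)")]),
    ("01",  [("iamb", "(01)")]),
    ("1",   [("trochee", "(1)"), ("iamb", "(1)")]),
]

probs = {"trochee": 0.5, "iamb": 0.5}   # start from a uniform distribution

for _ in range(25):
    expected = defaultdict(float)
    # E-step: split each observation's single count over its compatible
    # parses, in proportion to the current parse-type probabilities.
    for _, parses in data:
        total = sum(probs[ptype] for ptype, _ in parses)
        for ptype, _ in parses:
            expected[ptype] += probs[ptype] / total
    # M-step: re-estimate parse-type probabilities from the expected counts.
    n = sum(expected.values())
    probs = {ptype: count / n for ptype, count in expected.items()}

# The ambiguous monosyllable's credit drifts toward the majority parse type:
# probs converges to roughly {"trochee": 2/3, "iamb": 1/3}.
print(probs)
```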


Linguistics as cognitive science

Presentation to the College of Humanities and Fine Arts’ 5 at 4 series
March 9, 2022

Pater, Joe. 2019. Generative linguistics and neural networks at 60: foundation, friction, and fusion. Language 95/1, pp. e41-e74. 

Leading questions

How is knowledge of language represented?

How is language learned?

The broader questions of how human knowledge is learned and represented have been given two general kinds of answer in cognitive science since the field emerged in the 1950s.

Birth of cognitive science

Noam Chomsky 1957
From Pater (2019)
Frank Rosenblatt with the Perceptron image sensor late 1950s
From Pater (2019)

1980s: Cognitive science as a field

Cognitive science became a recognized interdisciplinary field in the early 1980s, thanks partly to funding from the Sloan Foundation. Barbara Partee of Linguistics collaborated with Michael Arbib of Computer Science to secure Sloan funding to establish interdisciplinary CogSci at UMass Amherst.

Barbara Partee circa 1977. University Photograph Collection (RG 120_2). Special Collections and University Archives, University of Massachusetts Amherst Libraries

The fight over the English past tense

David Rumelhart from https://www.psychologicalscience.org/observer/david-rumelhart
James McClelland from https://en.wikipedia.org/wiki/James_McClelland_(psychologist)

Rumelhart and McClelland (1986) present a Perceptron-based approach to learning and representing knowledge of the English past tense (e.g. love, loved; take, took):

Scholars of language and psycholinguistics have been among the first to stress the importance of rules in describing human behavior…

We suggest that lawful behavior and judgments may be produced by a mechanism in which there is no explicit representation of the rule. Instead, we suggest that the mechanisms that process language and make judgments of grammaticality are constructed in such a way that their performance is characterizable by rules, but that the rules themselves are not written in explicit form anywhere in the mechanism.

Rumelhart and McClelland (1986), On Learning the Past Tenses of English Verbs, pp. 216-217
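To make the quoted idea concrete, here is a toy perceptron in Python. This is a deliberate miniature, not Rumelhart and McClelland’s actual Wickelfeature model: it learns to sort a handful of verbs into regular and irregular past-tense classes from made-up phonological features, and the resulting behavior is rule-like even though no rule is written down anywhere in the mechanism.

```python
# Hypothetical training data: each verb is a vector of made-up binary
# features, labeled 1 if it takes regular "-ed" and 0 if it is irregular.
# Features: [ends in /t/ or /d/, rhymes with "ing", contains the vowel /i/]
verbs = {
    "love": ([0, 0, 0], 1),   # loved  (regular)
    "walk": ([0, 0, 0], 1),   # walked (regular)
    "sing": ([0, 1, 1], 0),   # sang   (irregular)
    "ring": ([0, 1, 1], 0),   # rang   (irregular)
    "need": ([1, 0, 1], 1),   # needed (regular)
}

weights = [0.0, 0.0, 0.0]
bias = 0.0

def predict(x):
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s > 0 else 0

# Classic perceptron learning rule: nudge the weights toward every
# misclassified item until all items are classified correctly.
for _ in range(10):
    for x, label in verbs.values():
        error = label - predict(x)
        for i in range(len(weights)):
            weights[i] += error * x[i]
        bias += error

# The learned weights encode something like "rhymes with 'ing' -> irregular"
# as a negative weight, with no symbolic rule stored anywhere.
print(weights, bias)
```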
Steven Pinker 1994
Alan Prince 2013 MIT

Twenty years ago, I began a collaboration with Alan Prince that has dominated the course of my research ever since. Alan sent me a list of comments on a paper by James McClelland and David Rumelhart. Not only had Alan identified some important flaws in their model, but pinpointed the rationale for the mechanisms that linguists and cognitive scientists had always taken for granted and that McClelland and Rumelhart were challenging — the armamentarium of lexical entries, structured representations, grammatical categories, symbol-manipulating rules, and modular organization that defined the symbol-manipulation approach to language and cognition. By pointing out the work that each of these assumptions did in explaining aspects of a single construction of language — the English past tense — Alan outlined a research program that could test the foundational assumptions of the dominant paradigm in cognitive science. 

Steven Pinker (2006) Whatever Happened to the Past Tense Debate https://escholarship.org/uc/item/0xf9q0n8

Fusion: Optimality Theory

Paul Smolensky from https://cognitivesciencesociety.org/rumelhart-prize/
John McCarthy from https://en.wikipedia.org/wiki/John_McCarthy_(linguist)

Now: Neural networks’ third wave

Modern computers are getting remarkably good at producing and understanding human language. But do they accomplish this in the same way that humans do? To address these questions, the investigators will derive measures of the difficulty of sentence comprehension by computer systems that are based on deep-learning technology, a technology that increasingly powers applications such as automatic translation and speech recognition systems. They will then use eye-tracking technology to compare the difficulty that people experience when reading sentences that are temporarily misleading, such as “the horse raced past the barn fell,” with the difficulty encountered by the deep-learning systems. 

From Brian Dillon’s 2020 NSF award abstract https://www.nsf.gov/awardsearch/showAward?AWD_ID=2020914&HistoricalAwards=false

This project draws on the theories and methods of both linguistics and computer science to study the learning of word stress, the pattern of relative prominence of the syllables in a word. The stress systems of the world’s languages are relatively well described, and there are competing linguistic theories of how they are represented. This project applies learning methods from computer science to find new evidence to distinguish the competing linguistic theories. It also examines systems of language representation that have been developed in computer science and have received relatively little attention by linguists (neural networks).

From NSF project summary of “Representing and learning stress: Grammatical constraints and neural networks”, Joe Pater PI, Gaja Jarosz co-PI

CDC Covid transmission levels (2021 vs. 2022)

The Shoestring Covid tracker uses the CDC community transmission levels released as part of their Feb. 12, 2021 guidance for school reopening. They are based on a combination of a per capita weekly new case rate and the test positivity rate, as shown in this table:

The Shoestring Covid tracker uses just the new case rate, since the test positivity rate is often not available, and because it usually wouldn’t matter (e.g. it would be very unusual to have a “low” new case rate and the greater than 5% positivity that would change the classification to “moderate”).

These transmission levels can be used for decision making by communities, businesses and institutions, or individuals. For example, from July 27, 2021 until Feb. 25, 2022, the CDC recommended that masks be worn indoors in communities with substantial transmission or greater, that is, with 50 or more new cases per 100K in a week. On Feb. 19, 2022, Bob Wachter, Chair of the UCSF Department of Medicine, published a Twitter thread explaining his reasoning for maintaining a similar level (10 per 100K per day = 70 per week) as a threshold for indoor mask wearing, even given the changed circumstances from mid-2021. (Update 3/24: this piece from Inside Medicine supports the 50 per 100K threshold for universal indoor masking.)

On Feb. 25th 2022, the CDC released new community levels and masking guidance. The new metric uses a single new case rate threshold of 200 per 100K per week, combined with the per capita rate of new Covid-19 hospital admissions, and the percentage of staffed hospital beds occupied by Covid-19 patients, to classify communities as having Low, Medium or High levels. The indoor mask recommendation applies for communities with a High level, which is reached when new cases exceed the 200 per 100K per week rate, and there are either 10 or more new admissions per 100K, or 10% or more hospital beds occupied by Covid-19 patients. This is four times the previous new case rate threshold, plus an added hospitalization rate requirement.
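The difference between the old and new schemes is easy to see written out as code. In this Python sketch, the first function implements the 2021 transmission levels in the case-rate-only simplification the Shoestring tracker uses (the 10 per 100K cutoff for “moderate” is from the CDC’s 2021 table and is not quoted in this post), and the second implements just the “High” condition of the 2022 community levels as described above; the full 2022 table’s Low/Medium branches are omitted.

```python
def transmission_level_2021(cases_per_100k_week: float) -> str:
    """CDC Feb. 2021 community transmission level, new case rate only."""
    if cases_per_100k_week >= 100:
        return "high"
    if cases_per_100k_week >= 50:
        return "substantial"
    if cases_per_100k_week >= 10:   # cutoff from the CDC table (assumption)
        return "moderate"
    return "low"

def high_community_level_2022(cases_per_100k_week: float,
                              admissions_per_100k: float,
                              pct_beds_covid: float) -> bool:
    """True when the CDC Feb. 2022 'High' level is reached, per the
    conditions described above; 'High' is what triggers the indoor
    mask recommendation."""
    return (cases_per_100k_week >= 200
            and (admissions_per_100k >= 10 or pct_beds_covid >= 10))

# A week with 70 new cases per 100K and rising admissions: indoor masking is
# recommended under the 2021 guidance, but not under the 2022 levels.
print(transmission_level_2021(70))           # "substantial"
print(high_community_level_2022(70, 12, 8))  # False
```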

The CDC 2022 guidance is controversial. On the day it was released, the president of the AMA issued a statement that included the following:

But even as some jurisdictions lift masking requirements, we must grapple with the fact that millions of people in the U.S. are immunocompromised, more susceptible to severe COVID outcomes, or still too young to be eligible for the vaccine. In light of those facts, I personally will continue to wear a mask in most indoor public settings, and I urge all Americans to consider doing the same, especially in places like pharmacies, grocery stores, on public transportation…

Gerald E. Harmon MD, President American Medical Association, Feb. 25 2022, https://www.ama-assn.org/press-center/press-releases/ama-statement-cdc-covid-19-updates

The new levels are of limited value for individual decision making. For example, at the Medium level, the guidance states that “[I]f you are immunocompromised or high risk for severe disease [t]alk to your healthcare provider about whether you need to wear a mask and take other precautions (e.g., testing)”. The Medium level could occur with any new case rate. Presumably, a healthcare provider’s advice on masking should take the community transmission level into account, but the new CDC guidance provides no basis on which that could be done. In addition, individuals may want to take precautions as new case rates rise, before hospitalizations begin to increase and trigger a change in the CDC 2022 community level. Furthermore, the new CDC guidance gives no help in determining the circumstances under which individuals might want to wear masks to protect community health, as the president of the AMA urges us to do in the above statement.