The Syntactic Structures Wikipedia page.
Lasnik (2003) Syntactic Structures Revisited.
Adger (2017) on the autonomy of syntax.
Leiber (1975: “Noam Chomsky: A Philosophic Overview”) on the motivation for Chomsky’s (1957) demonstration of the inadequacy of a finite-state characterization of syntax (p. 78; here is the whole chapter).
Pereira (2000) “Formal grammar and information theory: Together again” has a useful demonstration of a simple structured probabilistic model that gives “Colorless green ideas sleep furiously” relatively high probability. See this blog post for a recent discussion of issues around probabilistic formulations of well-formedness.
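To make Pereira’s point concrete, here is a minimal sketch of an aggregate (class-based) bigram model in his spirit: word transitions are factored through hidden classes, so a sentence whose word sequence is unattested can still receive relatively high probability if its class sequence is likely. The classes and all the probabilities below are toy values invented for illustration, not Pereira’s trained parameters.

```python
# A toy aggregate bigram model: P(w_i | w_{i-1}) is approximated by
# P(class(w_i) | class(w_{i-1})) * P(w_i | class(w_i)).
# All numbers are hand-picked for illustration only.

import math

# Hypothetical word classes.
word_class = {
    "colorless": "ADJ", "green": "ADJ",
    "ideas": "N", "sleep": "V", "furiously": "ADV",
}

# Toy P(next class | previous class): adjectives tend to precede
# adjectives and nouns, nouns precede verbs, verbs precede adverbs.
class_bigram = {
    ("ADJ", "ADJ"): 0.3, ("ADJ", "N"): 0.6, ("ADJ", "V"): 0.05, ("ADJ", "ADV"): 0.05,
    ("N", "V"): 0.7, ("N", "N"): 0.1, ("N", "ADJ"): 0.1, ("N", "ADV"): 0.1,
    ("V", "ADV"): 0.5, ("V", "N"): 0.2, ("V", "ADJ"): 0.2, ("V", "V"): 0.1,
    ("ADV", "ADJ"): 0.25, ("ADV", "N"): 0.25, ("ADV", "V"): 0.25, ("ADV", "ADV"): 0.25,
}

# Assume each class holds ~1000 words, uniformly.
P_WORD_GIVEN_CLASS = 1 / 1000

def log_prob(sentence):
    """Log probability of the transitions in a sentence under the class
    bigram model (the first word's unigram probability is omitted, since
    both sentences compared below use the same word set)."""
    words = sentence.split()
    lp = 0.0
    for prev, cur in zip(words, words[1:]):
        p = class_bigram[(word_class[prev], word_class[cur])] * P_WORD_GIVEN_CLASS
        lp += math.log(p)
    return lp

print(log_prob("colorless green ideas sleep furiously"))   # noticeably higher
print(log_prob("furiously sleep ideas green colorless"))   # noticeably lower
```

Even with these crude numbers, the famous sentence comes out about forty times more probable than its reversal, which is the qualitative shape of Pereira’s result.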
Colin Cherry’s (1957) book “On Human Communication” is an accessible overview that includes information theory and linguistic theory (and even Bayesian inference, though applied to detecting signals in noise, not as a cognitive hypothesis). I’ve put the cover and the chapter on information theory here. It was an assigned reading when Barbara Partee was part of the first graduating class in MIT linguistics (thanks to Barbara for her copy!).
D. Terence Langendoen (1967) The Nature of Syntactic Redundancy, pp. 303-314 of Tou (ed.):
It has clearly been demonstrated that the syntactic structure of language is far richer than that of any “transitional probability” or stochastic model. The transformational-generative theory of Chomsky is probably the closest approximation to the “truth” that linguists have yet come up with.
Pullum, Geoffrey K. (2011) On the mathematical foundations of Syntactic Structures. Journal of Logic, Language and Information 20, 277-296.
Ivan Sag’s notes on the history of generative grammar.
Friederici, Angela D. “Processing local transitions versus long-distance syntactic hierarchies.” Trends in Cognitive Sciences 8.6 (2004): 245-247. From the abstract: A recent study by Fitch and Hauser reported that finite-state grammars can be learned by non-human primates, whereas phrase-structure grammars cannot. Humans, by contrast, learn both grammars easily. This species difference is taken as the critical juncture in the evolution of the human language faculty. Given the far-reaching relevance of this conclusion, the question arises as to whether the distinction between these two types of grammars finds its reflection in different neural systems within the human brain.
Artificial grammar learning meets formal language theory: an overview. W. Tecumseh Fitch, Angela D. Friederici. 2012. (part of a theme issue on “Pattern perception and computational complexity”).
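The two stimulus languages behind the Fitch and Hauser work can be stated very compactly. Here is a minimal sketch of recognizers for (AB)^n, which is finite-state, and A^nB^n, which is not: the first needs only two states and no memory, while the second needs a counter whose size grows with the string (a one-counter stand-in for a stack). The alphabet {"A", "B"} stands in for the two syllable classes used as stimuli.

```python
def accepts_ab_n(s):
    """Finite-state recognizer for (AB)^n, n >= 1: two states, no memory."""
    state = "expect_A"
    for sym in s:
        if state == "expect_A" and sym == "A":
            state = "expect_B"
        elif state == "expect_B" and sym == "B":
            state = "expect_A"
        else:
            return False
    return state == "expect_A" and len(s) > 0

def accepts_a_n_b_n(s):
    """Recognizer for A^n B^n, n >= 1: needs a counter, i.e. memory that
    grows without bound as n grows, which no finite-state machine has."""
    count, i = 0, 0
    while i < len(s) and s[i] == "A":   # count the A's
        count += 1
        i += 1
    while i < len(s) and s[i] == "B":   # cancel one counted A per B
        count -= 1
        i += 1
    return i == len(s) and count == 0 and len(s) > 0

assert accepts_ab_n("ABAB") and not accepts_ab_n("AABB")
assert accepts_a_n_b_n("AABB") and not accepts_a_n_b_n("ABAB")
```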
George Miller (2003: 141-142):
During those years I personally became frustrated in my attempts to apply Claude Shannon’s theory of information to psychology. After some initial success I was unable to extend it beyond Shannon’s own analysis of letter sequences in written texts. The Markov processes on which Shannon’s analysis of language was based had the virtue of being compatible with the stimulus–response analysis favored by behaviorists. But information measurement is based on probabilities and increasingly the probabilities seemed more interesting than their logarithmic values, and neither the probabilities nor their logarithms shed much light on the psychological processes that were responsible for them.
I was therefore ready for Chomsky’s alternative to Markov processes. Once I understood that Shannon’s Markov processes could not converge on natural language, I began to accept syntactic theory as a better account of the cognitive processes responsible for the structural aspects of human language. The grammatical rules that govern phrases and sentences are not behavior. They are mentalistic hypotheses about the cognitive processes responsible for the verbal behaviors we observe.
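As a concrete illustration of the information measurement Miller describes, here is a minimal sketch applying Shannon’s measure to letter sequences: it compares per-character entropy under a zeroth-order model (letter probabilities alone) and a first-order Markov model (letter probabilities conditioned on the previous letter). The sample string is a placeholder; Shannon worked from large samples of written English.

```python
import math
from collections import Counter

text = "the theory of information measures the uncertainty of the next letter"

# Zeroth-order model: P(c), ignoring context.
n = len(text)
unigrams = Counter(text)
h0 = -sum((k / n) * math.log2(k / n) for k in unigrams.values())

# First-order Markov model: P(c | previous character).
bigrams = Counter(zip(text, text[1:]))
h1 = 0.0
for (prev, cur), k in bigrams.items():
    p_joint = k / (n - 1)         # P(prev, cur)
    p_cond = k / unigrams[prev]   # rough P(cur | prev); ignores the
                                  # off-by-one from the final character
    h1 -= p_joint * math.log2(p_cond)

print(f"zeroth-order entropy: {h0:.2f} bits/char")
print(f"first-order entropy:  {h1:.2f} bits/char")  # conditioning lowers it
```

The drop from h0 to h1 is the “information” gained by conditioning on context, which is exactly the quantity whose logarithmic form Miller found less interesting than the underlying probabilities themselves.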
This encyclopedia article cites McCulloch and Pitts (1943) as the origin of FSMs.
More context: Automata Studies (1956), Shannon and McCarthy (eds.)
The classic paper of McCulloch and Pitts (1943) showed that all logical functions could be effected by simple mathematical abstractions of neurons. (Incidentally, Kleene (1956), in clarifying the results of McCulloch and Pitts, introduced the connection between “regular expressions” and “finite-state machines,” thus initiating an important part of the field of computer science; but these developments turned away from the problem of brain modelling.)
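To see the McCulloch-Pitts result in miniature: a threshold unit fires (outputs 1) just when the weighted sum of its binary inputs reaches its threshold, and AND, OR, and NOT, a functionally complete set, are each realizable with a single unit, so any Boolean function can be effected by a finite network of them. A minimal sketch:

```python
def mp_neuron(weights, threshold):
    """Return a McCulloch-Pitts style binary threshold unit."""
    def fire(*inputs):
        return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0
    return fire

AND = mp_neuron([1, 1], threshold=2)   # fires only if both inputs fire
OR  = mp_neuron([1, 1], threshold=1)   # fires if either input fires
NOT = mp_neuron([-1],   threshold=0)   # inhibitory input suppresses firing

assert AND(1, 1) == 1 and AND(1, 0) == 0
assert OR(0, 1) == 1 and OR(0, 0) == 0
assert NOT(0) == 1 and NOT(1) == 0
```

Kleene’s (1956) contribution, mentioned in the quote, was to characterize the sets of input sequences such networks can detect, which turn out to be exactly the regular languages, i.e. those recognized by finite-state machines.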
Kyle Johnson’s Introduction to Transformational Grammar class notes.