Chomsky pro and con

The syntax book is coming along– I have about 300 pages written.  This project has required reading more by and about Chomsky than is, perhaps, compatible with mental health.

My general position on Chomsky is to defend him to linguistic outsiders, and complain about him to insiders.

In general the defense is going to be in the book– you can hardly talk about modern syntax without recognizing his influence and his discoveries. Generative grammar (GG) from the ’60s was galvanizing… a huge array of transformations and rules and weird syntax was quickly found.  An early book like John Ross’s Infinite Syntax! (his 1967 dissertation, published under that title in 1986) is highly technical yet displays the contagious exuberance of discovering new things. Whether or not you like the theories, the facts remain, and we can no longer relegate syntax to a six-page section after doing the morphology, as the Port-Royal grammar did.

GG appealed almost at once to computer programmers, which is remarkable if you think about it: few programmers looked at the classic Latin, Greek, or Sanskrit grammars and said I want to program that! If anything, this side of GG’s charm is even more accessible today!  I’ve been creating some web toys for modeling transformations; they let GG come alive in a way that ’70s textbooks couldn’t really show.

So, on to the complaints!  One is more of a sad groan: it is really hard to keep up with Chomskyan syntax– it changes every ten years, often in dramatic ways. And Chomsky’s own books have become increasingly unreadable. I can barely follow The Minimalist Program; it hardly seems to be about language at all.  He seems to prefer abstract pseudo-algebra to discussing actual sentences. The one exception to this generalization is Language and Problems of Knowledge (1988), which was written for the general public and shows that the man can write understandably when he wants to.

Generally speaking, other people have had to tidy up his theories and make them into readable textbooks. I’ve appreciated Andrew Carnie’s Syntax: A Generative Introduction and David Adger’s Core Syntax: A Minimalist Approach.

The dude has a right to change his views over time; still, one might complain that so many of the changes are pointless or don’t seem to move toward greater correctness. Yet he has a way of stating his present views as if they were the only ones possible. In The Minimalist Program (1995), he’s gotten out of the habit of even arguing for his position, or acknowledging that there are any other views at all.

This must put Chomskyan professors into a terrible position.  Imagine teaching, for years, that you must have an N below an N’ below an N” even if you’re dealing with a single word like John, and carefully marking up students’ papers when they lack the extra nodes. And then Chomsky decides one day that all this is unnecessary.  Or, you have NPs for years and then are told that they are DPs.  Or you learn phrase structure rules, only to have them thrown out.  Even without looking at the many other syntactic theories, shouldn’t all this bring in some healthy doubt?

I know it’s almost impossible for humans, but really we should assign our statements a probability value– e.g. it’s 60% likely that the head of “these frogs” is an N, 40% that it’s a Det– and then take serious note of the fact that stacked hypotheses plummet in probability. If you think idea A is 90% probable and so is idea B, then idea AB is 81% probable. And idea ABC is 73% probable, and so on. And anything as complex as X’ theory or Minimalism is made up of dozens of stacked hypotheses.
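The plummet is easy to sketch in a few lines– assuming, generously, that the hypotheses are independent, so their probabilities simply multiply:

```python
# Toy illustration: the joint probability of a stack of hypotheses,
# assuming independence (a generous assumption for a syntactic theory).

def compound_probability(probs):
    """Probability that ALL the hypotheses hold at once."""
    result = 1.0
    for p in probs:
        result *= p
    return result

print(round(compound_probability([0.9] * 2), 3))   # 0.81
print(round(compound_probability([0.9] * 3), 3))   # 0.729
print(round(compound_probability([0.9] * 12), 3))  # a dozen hypotheses: 0.282
```

On this reckoning, a theory built from a dozen 90%-confident assumptions is more likely wrong than right.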

I wish that Chomskyan syntacticians would take a lesson from math, or computer programming: the same problem can be solved in multiple ways. As a simple example, look at the multiple proofs of the Pythagorean Theorem. GG in the 1960s (not just Chomsky) was convinced that there was a right solution to any syntactic problem. (And it tended to see any linguistic problem as a syntactic problem.) This attitude has continued, and it’s rarely acknowledged that it may just be wrong.

So, when we look at Minimalism, and Word Grammar, and Relational Grammar, and Generalized Phrase Structure Grammar, and Construction Grammar, and Lexical Functional Grammar, and Arc Pair Grammar, and so on… it’s possible that none are entirely wrong. It’s quite possible, even likely, that the same language can be described in multiple ways.

Some of these systems probably will die with their creators, and that’s fine. On the other hand, I think relational grammars in general will continue, because they offer a needed corrective. Chomskyan syntax concentrates on constituent structure, and relational grammar on, well, relations between words. You can diagram a sentence either way and learn something each time; each approach is also better for different languages.

(Minimalism makes a great effort to represent relations, and yet does so very clumsily.  Really, try to get a Chomskyan to explain what a “subject” is, or what a “passive” does. Relational grammars start with these things, often not bothering to show the details of constituent structure.)

Another example is case assignment. X’ theory and Minimalism treat this very seriously and in the most cumbersome way. Here I feel that they’ve lost their way: was this really a terrible problem that needed to be solved again? Traditional grammar was actually pretty good at case assignment, and used far simpler terminology. GG’s forte is not handling case, it’s handling transformations.

Another lesson from programming is relevant: elegance is relative to the machine. Syntacticians (again, not only Chomsky) have spent way too much time worrying about the efficiency of their systems. Just one example: X’ theory decides that having general rules VP > V, VP > V NP, VP > V NP PP, etc., is messy and we should instead let entries in the lexicon specify what frame they need: e.g. cry doesn’t need any object, put needs NP PP. Does that make the grammar simpler or not?  Maybe for the grammarian; we don’t know if it’s better or not for the brain.
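Here’s a deliberately crude sketch of what moving the rules into the lexicon might look like. The frames below are illustrative, not a serious fragment of English:

```python
# Subcategorization frames: each verb lists the complements it requires,
# replacing a whole family of VP rules. Frames simplified for illustration.

LEXICON = {
    "cry": [],             # no object at all
    "see": ["NP"],         # a plain object
    "put": ["NP", "PP"],   # an object plus a location
}

def frame_ok(verb, complements):
    """Does this verb accept exactly these complements?"""
    return LEXICON.get(verb) == complements

print(frame_ok("cry", []))            # True
print(frame_ok("put", ["NP", "PP"]))  # True
print(frame_ok("put", ["NP"]))        # False: *put the book
```

Simpler for the grammarian, maybe; whether the brain stores anything like this is exactly the open question.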

The thing is, we know almost nothing about the machine we’re running on, i.e. the brain.  You can’t optimize for a machine if you don’t know its specs (and its hangups and limitations). The very little we do know suggests that our way of thinking about algorithms is probably a very bad way to think about mental abilities. Brains are not like a CPU running a program. They are more like 100 billion CPUs each running an extremely simple program. Their methods (e.g. in the visual system) run toward millions of sub-processes addressing tiny little problems (“is there an edge here?” “did my little bit of data change recently?”).

I don’t think any linguistic theory really makes use of this information, though cognitive linguistics may be getting there. One corollary I’d strongly suggest, though: the brain is probably fine with messy, encyclopedic data in multiple formats. Everything can be linked or stitched together; very little is refactored as new data comes in. Half of the general rules that linguists discover probably don’t exist at all in the brain; they’re historical facts, not brain facts.

I just finished an older introduction to Chomsky, Chomsky’s Universal Grammar by V.J. Cook (1988), and it’s actually more annoying than Chomsky. That’s because it foregrounds all of Chomsky’s worst ideas: universal grammar (UG), the language organ, and principles & parameters. Cook leads off with these things because he presumably finds them the most interesting for the outside world. Ironically, by his own account, these are precisely the ideas that are largely ignored or rejected by psychologists, language acquisition specialists, programmers, and language teachers.

Chomsky’s first books were quite neutral about the psychological status of his grammar– he was after a description of the sentences produced within a language, nothing more, and did not claim that speakers used that grammar in their brains. He has since become ever more convinced that not only is grammar preprogrammed in the brain, it’s programmed according to his system. And yet he develops his system entirely based on intra-theoretical concerns; he has never had any real interest in biological systems, genetics, neural behavior, or even language acquisition.

He even maintains that word meanings are innate, a position which is positively barmy. He finds it perfectly obvious that a word like climb is innately provided by UG. When you hear this sort of thing (Chomsky is not the only offender), take note that words like climb are all they talk about. Did the ancients really have genetically given concepts like airplane, microscope, neutron, phosphate, compiler?  How about scanning electron microscope?  How about generative semantics? It’s simply impossible that our poor little genomes can contain the entire OED; it’s also easily demonstrable that concepts do not neatly coincide between languages. (Takao Suzuki’s Words in Context has a very nice demonstration that words like break are far more complex than they seem, and can’t be simply equated to any one Japanese word.)

On the plus side, innatism on words doesn’t really come up much; on the minus side, Chomsky has doubled down on innatism in syntax, in the form of principles and parameters. This is the idea that UG contains all possible rules, and an actual human language consists of just two things: a small list of binary settings, and a rather longer list of words.
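Taken literally, the claim is that a language is little more than a settings vector plus a dictionary– something like the sketch below. The parameter names are mine, and the inventory of parameters is precisely what’s in dispute:

```python
# A language, on the principles-and-parameters view: a handful of
# binary switches (plus a word list). Names and values are illustrative.

LANGUAGES = {
    "English":  {"pro_drop": False, "head_first": True},
    "Italian":  {"pro_drop": True,  "head_first": True},
    "Japanese": {"pro_drop": True,  "head_first": False},
}

def differs_in(lang_a, lang_b):
    """List the parameters on which two languages disagree."""
    a, b = LANGUAGES[lang_a], LANGUAGES[lang_b]
    return [p for p in a if a[p] != b[p]]

print(differs_in("English", "Italian"))   # ['pro_drop']
print(differs_in("Italian", "Japanese"))  # ['head_first']
```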

The examples that are always trotted out are pro-drop and head position. Supposedly, Italian is pro-drop and English is not– that is, English requires subject pronouns, even meaningless ones (It’s raining), while Italian can omit them (Piove). Only it’s not quite so cut-and-dried as all that: colloquial English drops pronouns readily enough (Gotta go), and the languages lumped together as pro-drop don’t all behave alike.

Head position is a little harder to explain, but it basically means that the ‘important’ word goes first or last. English is head-first, because we have V + O (kill things), prepositions (on top), and head-first NPs (the one who laughs). Japanese is head-last in all these areas.

One problem: this is hardly binary either. There are good arguments that some languages don’t have VPs at all. Chinese has both prepositions and postpositions, and it isn’t alone. English is only mostly SVO; there are exceptions. Relative clauses arguably don’t attach to the noun at all in Hindi, but to the sentence. Chinese has RelN order combined with V+O.

You could ‘solve’ all this by multiplying parameters. But that only reveals the meta-theoretical problem: the parameters notion makes the theory unfalsifiable. For any weird behavior, you just add another parameter. Cook claims that no one’s found any grammars that don’t match UG, as if that’s a point in favor of the theory.  In fact it’s a point against: UG has been made refutation-proof.

Chomskyans do make a pretty strong claim: children should be able to set a parameter based on very little input– maybe just one sentence, Carnie boldly says. And the evidence from language acquisition… does not back this up at all. Children do not show evidence of randomly using one parameter, suddenly learning a setting, and thereafter getting it right. They show little evidence of learning overall abstract rules at all, in fact.  They seem to learn language construction by construction, only generalizing very late, once they know thousands of words.  See my review of Tomasello for more on this. (Also discussed in my ALC.)

Finally, there are the infuriating arguments for the language organ. Chomsky and the Chomskyan textbooks alike invariably bring out the same tepid arguments. For some reason they always bring up Question Inversion. E.g. what’s the question form of this sentence?

The man who is tall is John.

Chomskyans love to run this through a number (usually two) of impossible algorithms. E.g., reverse every word:

*John is tall is who man the?

Or, reverse the first two words:

*Man the who is tall is John?

All this to come up with the apparently amazing fact that we reverse the subject and the verb:

Is the man who is tall John?

This is supposed to demonstrate the importance of constituent structure: the “subject” is not a single word, but the entire phrase The man who is tall. And this is entirely true! Constituent structure is an important building block of syntax, in every goddamn theory there is. It doesn’t prove UG or any version of Chomskyan syntax.

Plus, the “non-UG” alternatives are pure balderdash; even a completely dense child can see that no one talks that way. Cook talks about “imitation” as a (wrong) alternative to UG, but the default position would seem to be that children are trying to imitate adult speech. Their initial questions are very obviously simplified versions of adult questions. E.g. where bear? instead of where is the bear? The only rule the child needs here is to use only the words it understands.

There’s a gotcha in the sample sentence: there are two is’s; a child might be tempted to invert based on the wrong one:

*Is the man who tall is John?
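The contrast is easy to state in code: the naive rule fronts the first is in the string, while the real rule fronts the auxiliary of the whole subject phrase. A toy sketch– the constituent structure is supplied by hand, since parsing is precisely the hard part being assumed away:

```python
# Structure-dependence in question inversion, as a toy contrast.
# The parse of the subject NP is given by hand; nothing here is a parser.

SENTENCE = ["the", "man", "who", "is", "tall", "is", "John"]

def naive_invert(words):
    """Wrong rule: front the first 'is' found in the string."""
    i = words.index("is")
    return [words[i]] + words[:i] + words[i+1:]

def structural_invert(subject_np, aux, rest):
    """Right rule: front the auxiliary that follows the subject NP."""
    return [aux] + subject_np + rest

print(" ".join(naive_invert(SENTENCE)))
# is the man who tall is John -- the impossible *Is the man who tall is John?

print(" ".join(structural_invert(["the", "man", "who", "is", "tall"], "is", ["John"])))
# is the man who is tall John -- the correct Is the man who is tall John?
```

Of course, nothing in this little demonstration is specific to UG; any theory with constituents can state the right rule.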

The claim is that children get this right without ever hearing examples of such nested sentences. This is the “poverty of the stimulus” argument: children learn or know things that they can’t pick up from the evidence. But the Chomskyans never check to see if their assumption is correct. More empirically oriented linguists have, e.g. Geoffrey Sampson, who found that corpora of language use do include quite a large number of  model sentences.

Moreover, the Chomskyans show little interest in what errors children do make, or what that might mean for syntax. It’s not the case that children always get questions right. They particularly get wh- questions wrong: What that man is doing?

Now, all the language organ stuff turns out to be, in the end, not very important. The Chomskyans forget it by Chapter 3 and so can we. Still, it annoys me that Cook or Steven (The Language Instinct) Pinker lay such emphasis on it, as if it were the Key Insight that separates Chomsky from the clueless masses. The fact that the same simple arguments and examples come up in each exposition should be a clue: this is more an incantation than any kind of knowledge.

Ironically, for such a Chomsky booster, Cook has a short passage that makes some trenchant criticisms of X’ theory. “One feels that the same sentences and constructions self-perpetuate themselves in the literature…” He regrets that X’ theory seems to have narrowed its focus considerably: there’s little discussion of the wide variety of constructions that earlier GG looked at.

Whew! Sorry, had to get all that off my chest.  Now to see if I can finish The Minimalist Program.