The second draft is almost done, so it’s time for a page about the book on my site.


What’s in the book?  Well, I just wrote a whole page on that, so just go read it!


I think I’ve written a book. Now we must see whether this is so. As was foretold in the prophecies, this is where I ask for readers.


Contact me if you’re interested and have the time over the next few weeks— markrose at zompist dot com. I usually get more offers than I can handle, so get your offer in fast. 🙂

If you’ve only read the LCK, that’s fine; if you’re a Herr Professor Doktor of linguistics, that’s also fine.

I just finished Language acquisition and conceptual development, edited by Melissa Bowerman and Stephen Levinson (2001), and I want to write down what I learned while it’s still fresh in my mind.

You may recall the book report on Everett & Jackendoff and their feud over innatism. The issue there is Chomsky’s longstanding contention that language learning is far too hard for children, therefore they must have a head start: grammar and vocabulary are already hard-wired into their brains. All they have to do is figure out which of a small series of switches to flip to get the grammar (“oh! verbs go last here!”) and work out that dog means Inbuilt concept #293291.

This book is a report from the trenches of language acquisition; if anyone knows how it goes, these people do. I note, by the way, that this is one of the few fields dominated by women: 20 of the 30 authors of these papers are female. Yay for linguistics!

There is no knockout punch— unsurprisingly, there’s a lot we don’t know about how children learn languages. And this book, at least, doesn’t have too much to say about how children learn syntax, much less whether they do so using Minimalism, Arc Pair Grammar, Role & Reference Grammar, etc. It’s mostly about the first three years, the first words learned, and what that tells us about children’s conceptual system.

The biggest news seems to be:

  • Children understand things far earlier than was once supposed. E.g. Piaget thought that children didn’t acquire the notion of object permanence till 3 years or so; we now know they have it at 5 months. He also thought that children didn’t understand the concept of time till about 8; but in fact they are clearly able to remember and refer to past events, and anticipate and refer to future events, at not much more than 1 year of age.
  • At the same time: universal, basic concepts are more elusive than ever. Languages really do divide up conceptual space differently, and this is evident in children’s speech from the beginning.

The object permanence result is due to better, cleverer technique: rather than relying on the baby’s actions, we only check what they’re looking at. Basically: babies can be surprised, and look longer at unexpected outcomes. So you show them a doll being placed behind a screen, then remove the screen. They’re surprised if they see no doll there, or two dolls.

Many of the authors refer to Quine’s problem. Quine envisioned a linguist eliciting words from a native. A rabbit goes by, and the native says gavagai. Does this mean “rabbit”, or “hop”, or “fluffy tail”, or “unspecified set of rabbit parts”?

Now, the linguists can’t bring themselves to say that Quine is just being a jerk. But there’s a pretty clear answer to this problem: we aren’t tabulae rasae; we’re animals with a hundred-million-year evolutionary history of perceiving objects, especially moving objects, and double especially animals. Some things are very salient for humans— we’re built to see rabbits as objects with a characteristic shape, size, and activity pattern. We’re not built to focus on rabbit tails or miscellaneous rabbit parts.

Early proposals were that children use some all-purpose generalizations: words are likely to refer to the most salient entities; words are normally not synonymous.

Going beyond this, there were assumptions that children would learn nouns before verbs, content words before closed-class form words, shape before materials, and that they would probably learn universal concepts first. This little list of assumptions turns out to be wrong: it depends on the language.

  • Many languages are far more verb-oriented than English. Kids still learn a lot of nouns, but sometimes the proportion of verbs is far higher.
  • Often very specific verbs are learned before abstract spatial words.
  • English children learn to pay the most attention to shape; Maya kids pay the most attention to material.

As for universal concepts, it’s worth looking in detail at an example provided by Levinson. The language is Tzeltal.

Pach-an-a bojch ta y-anil te karton-e.
bowl-put-cause.imp gourd at its-down cardboard-that

The intended translation is “Put the bowl behind the box.” But just about every detail in Tzeltal is different.

  • The shape and spatial information is largely encoded in the verb, not in nouns. Pach– means “place a bowl-shaped vessel upright on a surface.”
  • Corollary: the two NPs refer mostly to material. Bojch is really a word for a gourd; karton can refer to anything made of cardboard.
  • “Behind” is a relative term, which doesn’t exist in Tzeltal. Instead, an absolute frame of reference is used. “Downward” can refer to absolute height, but here it refers to horizontal location, because of a geographical particularity: Tzeltal territory is on a slope, so “downhill” also means “northward”.

Do children really master this system? Of course; they have a pretty good grasp of the slope system by age three. They also master a wide range of very specific verb forms rather than relying heavily, as English-speaking toddlers do, on “up/down”.

Another neat example: English toddlers quickly learn to distinguish “put ON” from “put IN”. Korean children divide up this semantic space quite differently, using at least seven verbs.

  • kkita means “fasten tightly”– this includes putting the top on a pen, placing Lego bricks together, putting a piece in a puzzle, placing a cassette in its box, or buttoning a button.
  • nehta means “place loosely”– e.g. put a book in a bag, or a toy in a box.
  • pwuchita is used for juxtaposing surfaces– e.g. placing a magnet on the fridge.
  • nohta is used for placing things on a horizontal surface.
  • for clothes, you have ssuta for hats, ipta for the body, sinta for the feet.

All this is fascinating because philosophers and linguists are apt to take English categories and assume they are universal concepts: UP, DOWN, IN, ON. Nope, they’re just projecting English words onto Mentalese. There is no stage where children use “universal” concepts before using language-specific ones. (Indeed, there’s evidence that children understand the language-specific concepts well before they can say the words.)

Does all this “affect how you think”? Of course. Levinson tells an amusing anecdote: he almost got his truck stuck in quicksand when his Australian Aborigine companion told him to “swerve north quick”. Levinson just couldn’t calculate where north was fast enough.

There are also interesting tidbits. For instance, did you know that there is a gradient between comitative and instrumental? It goes like this:

1 – give a show with a clown
2 – build a machine with an assistant
3 – captured the hill with his squad
4 – performed an act with an elephant
5 – the blind man crossed the street with his dog
6 – the officer caught the smuggler with a police dog
7 – won the appeal with a highly paid lawyer
8 – found the solution with a computer
9 – hunted deer with a rifle
10 – broke the window with a stone

In English, as you can see, we use “with” for all of these. In a multitude of languages, these meanings are divided up linearly. E.g.

  • Iraqi Arabic: 1-8 vs 9-10
  • Swahili: 1-6 vs 7-10
  • Slovak: 1-9 vs 10
  • Tamil: 1-2 vs 3-10

That’s pretty neat!

Anyway: there’s still a lot of argument on how exactly children learn, whether they start with particular cognitive abilities, whether they have particular linguistic abilities. Many authors point out that innatism doesn’t really help reduce the problem. E.g. to see if dog matches Inbuilt concept #293291, you pretty much have to have a sense of what a dog is. If you have that, what good is the inbuilt concept?

You could try to save innatism by multiplying the number of inbuilt concepts. E.g. you include the 10 steps of the comitative/instrumental gradient, and both Korean and English positioning concepts, and both English and Tzeltal directional systems. But this is only complicating the child’s problem. Rather than finding quick matches between the words they hear and a small number of universal concepts, they have to consider hundreds or thousands of alternative conceptual systems.

It’s also worth pointing out that parents are far more helpful than Quine’s native informant. People don’t just say words at random. As Michael Tomasello emphasizes, language is often presented as a commentary on a situation the child already understands, such as moving toys around with her mother. There’s a lot of repetition; the parents’ language is emphatic and simplified; the parents are not trying to confuse the child with talk of bags of rabbit parts.

BTW, this is in theory the last book I’m consulting for my syntax book.  So, I’ll soon have a first draft, at least.


I wanted to talk about my latest syntax toys, so I decided to post all three of them: ggg, gtg, mg.

To fully understand them, you’ll have to wait for my upcoming syntax book. But in brief: they are all apps for generating sentences.

  • ggg rearranges strings. You can use this for the toy grammars that syntacticians and computer programmers always start their books with, but it can handle everything in Syntactic Structures. I’ve loaded it with some interesting sample grammars. (There’s a little sketch of the basic idea right after this list.)
  • mg is the equivalent for the Minimalist Program. It’s actually way more fun than reading Chomsky, in much the same way it’s much more fun to try painting a watch than to watch paint drying. I’ll explain the basics in another post.
  • gtg rearranges trees. The idea is that the program knows about syntactic structure, so you can have rules that talk about or rearrange an NP, no matter what’s in it.  You can do this in ggg only by writing rules that apply to elements before they’re expanded into subtrees.
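
To give a flavor of what “rearranging strings” means, here’s a minimal sketch in Python of the kind of toy grammar ggg handles. This isn’t ggg’s code or its rule format, just the underlying idea: a grammar is a set of rewrite rules, and you generate a sentence by expanding symbols until only words are left. The grammar and the vocabulary here are made up for illustration.

import random

# A toy phrase-structure grammar of the sort syntax books open with.
# Each symbol maps to the list of ways it can be rewritten.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "NP":   [["Det", "N"], ["Name"]],
    "VP":   [["V", "NP"], ["V"]],
    "Det":  [["the"], ["a"]],
    "N":    [["dog"], ["fish"], ["linguist"]],
    "V":    [["sees"], ["eats"], ["annoys"]],
    "Name": [["Sarah"], ["Chomsky"]],
}

def generate(symbol="S"):
    """Expand a symbol by picking one of its rewrites at random."""
    if symbol not in GRAMMAR:          # a terminal: an actual word
        return [symbol]
    words = []
    for part in random.choice(GRAMMAR[symbol]):
        words.extend(generate(part))
    return words

for _ in range(5):
    print(" ".join(generate()))        # e.g. "the dog annoys Sarah"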

I’m going to talk some more about gtg, since I’ve been working avidly on it for the last few weeks.

I showed some of these to a non-linguist friend, and I think he was polite but didn’t get it. That’s fine; like I say, it requires a book to explain. But from his questions, like “Could you write poetry with it?”, it was clear that he expected it to be something rather different– a wide-ranging text generator.

That is, he was more or less measuring the program’s intelligence by the size of its vocabulary.  gtg only knows about a dozen nouns and a dozen verbs (and some other stuff). It would be possible to add hundreds more, but that’s not the point.  The point is to model basic English syntax.  That’s hard enough!

It’s not an ultra-hard problem by any means, or I couldn’t have done it in a few weeks. On the other hand, I had Chomsky’s and other linguists’ rules to start with!

The thing is, English speakers all know these rules… unconsciously. Which means you’re not impressed when you see someone produce a simple but correct sentence. Well, let’s see how aware you are of the rules.  Here are some variants of sentences:

  • The fish were caught by her
  • She has eaten fish
  • She must like fish
  • She’s eating fish

That’s passive, perfect, modal, and progressive. All four can occur in one sentence. Without trying out alternatives in your head, what order do they appear in?

Here’s another: some sentences require an added do, some don’t:

  • We don’t keep kosher.
  • Did you take out the trash?
  • What does the fox say?
  • We aren’t going to St. Ives.
  • Can’t you keep a secret?

Again, without trying it out in your head, just from general knowledge: can you state when this added do appears?

Or, can you say precisely when you use he and when you use him?  If you are a conlanger or you know an inflected language, you probably immediately think “He is nominative.”  Well, what about in Sarah wants him to move out? Him is the subject of ‘move out’, isn’t it?  (It’s not the object of want. What does Sarah want? “Him”? No, she wants “for him to move out”.)

The rules aren’t terribly difficult… indeed, if you look in the boxes on the gtg page, they’re all right there! But they’re difficult enough to make a fairly involved computing problem.
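
If you want to check yourself on the do question: roughly, negation, questions, and emphatic statements all want a tensed auxiliary, and a dummy do gets inserted when the clause has no auxiliary of its own. Here’s that generalization as a quick Python sketch (my plain restatement, not gtg’s actual rule; it ignores wrinkles like main-verb be and subject questions like “Who left?”):

def needs_do(has_aux: bool, construction: str) -> bool:
    # construction is one of "negation", "question", "emphasis", "plain"
    return construction in {"negation", "question", "emphasis"} and not has_aux

print(needs_do(False, "negation"))   # "We don't keep kosher."        -> True
print(needs_do(True,  "negation"))   # "We aren't going to St. Ives." -> False
print(needs_do(False, "question"))   # "Did you take out the trash?"  -> True
print(needs_do(True,  "question"))   # "Can't you keep a secret?"     -> False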

Now, syntacticians devising rules like to use formal notation… but they almost always supplement it with English descriptions. Programming forces you to be much more explicit.

Now, when I began the program, I started out with rules that looked something like this:

T:+:Neg≠Aux:^do

If you look at mg, the rules are still like that… and since I wrote that a few months ago, I don’t even remember how they work. But besides being unreadable, such rules are very ad hoc, and hide a bunch of details in the program code.

What I ended up doing instead was writing myself a tiny programming language.  This forced me to come up with the smallest steps possible, and to encode as little grammatical information as possible within the program itself.

Here’s an example: the rules for making a sentence negative.

* negative
maybe if Aux lex not insert Neg
maybe if no Aux find T lex not insert Neg

The first line is a comment. The rest are commands.

  • Maybe says that a rule is optional– the program will execute it only sometimes.
  • If looks for something of a particular category, in this case an auxiliary verb. If it’s not found, we skip to the next rule. If it is, we remember the current location.
  • Lex not means to look up the word not in the lexicon and keep it on the clipboard.
  • Insert says to insert what’s on the clipboard into the sentence at the current location.

Note that this mini-language only has two ‘variables’, what I’ve called the clipboard and the current location. I haven’t found a rule yet that requires more than that.
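
To show just how small those steps are, here’s a quick Python sketch of an interpreter for the handful of commands above. It’s an approximation of the idea, not gtg’s actual code: the sentence is a list of (category, word) pairs, and the only state is the clipboard and the current location.

import random

# A stand-in for gtg's real lexicon.
LEXICON = {"not": "not"}

def run_rule(sentence, commands):
    """Apply one rule, e.g. "maybe if Aux lex not insert Neg".split(),
    to a sentence given as a list of (category, word) pairs."""
    sentence = list(sentence)
    clipboard = None                 # the word we last looked up
    location = None                  # where in the sentence we are
    i = 0
    while i < len(commands):
        cmd = commands[i]
        if cmd == "maybe":           # optional rule: apply it only sometimes
            if random.random() < 0.5:
                return sentence      # this time, skip the whole rule
            i += 1
        elif cmd == "if":            # "if CAT" or "if no CAT"
            negated = commands[i + 1] == "no"
            cat = commands[i + 2] if negated else commands[i + 1]
            hits = [n for n, (c, _) in enumerate(sentence) if c == cat]
            if bool(hits) == negated:
                return sentence      # condition failed: skip the rule
            if hits:
                location = hits[0]   # remember where the match was
            i += 3 if negated else 2
        elif cmd == "find":          # move the location to a category
            cat = commands[i + 1]
            location = next(n for n, (c, _) in enumerate(sentence) if c == cat)
            i += 2
        elif cmd == "lex":           # look the word up, keep it on the clipboard
            clipboard = LEXICON[commands[i + 1]]
            i += 2
        elif cmd == "insert":        # put the clipboard in after the location
            sentence.insert(location + 1, (commands[i + 1], clipboard))
            i += 2
    return sentence

# "She must like fish" -> sometimes "She must not like fish"
tokens = [("NP", "she"), ("Aux", "must"), ("V", "like"), ("NP", "fish")]
print(" ".join(w for _, w in run_rule(tokens, "maybe if Aux lex not insert Neg".split())))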

The help file for gtg explains all the commands and annotates each of the grammatical rules I used.

This is not how syntacticians write their rules; but one conclusion I’ve come to after reading a bunch of syntax books is that all the formalisms are far less important than their inventors think. Chomsky started people thinking that there was One True Theory of Syntax, but there isn’t. It’s less like solving the Dirac equation and more like proving the Pythagorean theorem: there are many ways to do it, and the fact that they look and feel different doesn’t mean that most of them are wrong. Writing rules in this simple language worked out for me and it’s no worse than, say, the extremely unintuitive rules of Minimalism.

Can you use these toys for writing grammars for your language or conlang?  Well, best to wait for the book to come out, but in general, sure, you can try.

I have to warn you, though: it’s not quite as straightforward as using the SCA, and plenty of people have trouble with that.  You have to think like a programmer: be able to break a problem into tiny pieces, and work out all the complications.

On the other hand, tools like gtg can help keep you honest: if the rules don’t work, the program produces crappy sentences, so you know something’s wrong. Plus it keeps you thinking about a wide variety of sentences. (Good syntacticians can quickly run through a bunch of transformations in their heads, applying them to some problem. When you’re new to the concept, you can think only about simple sentences and miss 90% of the complications.)

Also, I hope to keep improving the program, so it may be easier later on.

The library has slim pickings on linguistics, but it happened to have a couple of books on opposite sides of the innatism debate: Ray Jackendoff’s Patterns in the Mind, and Daniel Everett’s Language: The Cultural Tool.


File photo of Everett in the Amazon researching Pirahã

Overall judgment: both books are full of interesting things; both are extremely convinced of their position; both reduce their opponents (i.e. each other) to straw men.

It’s a lesson, I suppose, in letting one’s speculations get ahead of the evidence. Many a Chomskyan book has a long annoying section on how children could not possibly learn language; the arguments are always the same and they’re always weak. The Chomskyans’ problem is that they don’t spend five minutes trying to think of, or combat, any alternative position. They present the “poverty of the stimulus” as if it’s an obvious fact, but don’t do any actual research into child language acquisition to show that it’s really a problem.

Yet Everett doesn’t do much better on the other side. He’s all about language as a cultural invention, and he mocks the Chomskyans’ syntax-centrism and their inability to explain how or why Minimalism is embedded in the brain and the genome, but he doesn’t really know how children learn languages either.

My sympathies are far more with Everett, but an honest account has to admit: we just don’t know. Well, the third book I picked up is a massive tome called Language acquisition and conceptual development, so I’ll report back if it turns out we do know.

Sometimes the two authors cover the same facts– e.g. what’s going on with Broca’s and Wernicke’s areas in the brain. Their accounts are different enough that it seems both are cherrypicking the data. Jackendoff doesn’t mention the non-linguistic functions of these areas, while Everett pooh-poohs the idea that they’re language-related at all.

What ends up being most valuable about both books is when they’re talking about other things. Everett is full of stories about Amazonian peoples and languages; Jackendoff has a very good section on ASL.  (They both also have quite a bit of introductory linguistics, which I could have used less of. Sometimes it’s a pity that academics only have two modes, “write for the educated high schooler” and “write for each other”. I suspect their editors overestimated just how many people entirely ignorant of linguistics would read each book. I guess I’m lucky that readers of my more advanced books can be assumed to have read the LCK.)

I don’t mean to sound entirely dismissive. In fact Jackendoff makes the Chomskyan case about as well as it can be made (far better than Chomsky ever does); but if you find him convincing, make sure you read Everett to get a fuller perspective.

(I may also be unfair to Jackendoff calling him a Chomskyan; apparently he’s broken with Minimalism. He also makes a point, when pointing the reader to books on syntax, to include a wide range of theories, something Chomsky himself and his acolytes don’t bother to do.)

I don’t always get a chance to combine linguistics and gaming, so STRAP ON.


So, Overwatch is getting a new hero who’s a hamster. An adorable hamster piloting a deathball.  It’s pretty neat, check it out.

PCGamer has an article today on the hero’s official name, Wrecking Ball, and why many people prefer the hamster’s name, Hammond. Which I kind of do too. Though the French name is even better: Bouledozer.

But the article has a density of linguistic errors that made me simmer.  Kids these days, not learning basic phoneme and allophone theory.  Listen:

The three syllables in Wrecking Ball use three main sounds: the ‘r’ sound, the ‘i’, and the ‘ɔ:’. …you position your tongue and lips very differently when you pronounce these sounds, and you can feel this when you say it. To make the ‘r’ sound in ‘wre’, you curl your tongue up to the roof of your mouth. To make the ‘i’ sound in ‘king’, you keep your tongue up high but bring it forward to the front of your mouth while stretching out your lips. Finally, to make the ‘ɔ:’ sound in ‘ball’, you put your tongue low and bring it to the back of your mouth while also bringing your lips together.

OK, everything sounds complicated when people don’t have the terms to discuss it. There’s only one big error– they’ve confused [i] as in machine with [ɪ] as in bin. You stretch your lips for [i] but not [ɪ]. Anyway, the word isn’t that complex: /rɛkɪŋbɔl/. You pronounce much harder words many times a day. (Try strength, or Martian, or literature.) In rapid speech it will probably simplify to [rɛkɪmbɔl] or [rɛkĩbɔl].

In other words, saying Wrecking Ball puts your tongue and lips all over the place with no clean pattern or loop to connect the sounds.

Huh?  Words do not need any “clean pattern or loop”.  There are some patterns to English words (phonotactics), but “wrecking ball” is absolutely typical English.

And it doesn’t stop there: the ‘wr’ consonant blend is naturally awkward in the same way the word ‘rural’ is awkward, and the hard ‘g’ and ‘b’ in Wrecking Ball put unnatural stops in your speech.

The wr isn’t a blend; it’s a single sound, [r]. Rural is mildly awkward because it has two r sounds, which wrecking ball does not.

Edit: Alert reader John Cowan points out that some speakers do have [i] in final –ing; also that initial /r/ may be always labialized. For me, there’s some lip rounding in /r/ in all positions.

There is no hard g in wrecking. There is no such thing as a hard b.  Stops are not unnatural; heck, let me highlight (in capitals) all the ones the author just used:

aND iT DoesN’T sToP there: the ‘wr’ CoNsoNaNT BleND is Naturally awKwarD iN the saMe way the worD ‘rural’ is awKwarD, aND the harD ‘G’ aND ‘B’ iN wreCKiNG Ball PuT uNNatural sToPs iN your sPeech.

I highlighted nasal stops mostly because the dude is terribly concerned with what the tongue does, and tongue movement for nasal stops is exactly the same as for non-nasal stops.

Compare that to Hammond, paying close attention to the way your mouth moves when you say it. Not only is Hammond two syllables instead of three, it also barely uses your tongue. Your lips and vocal chords do most of the work, which, ironically, is why it seems to roll off the tongue. Plus we get the added alliteration of Hammond the hamster.

Hammond is [hæmnd], with syllabic n. I’ll grant that it’s two syllables long, but I don’t know why the author is so focused on tongue movements– presumably he’s not aware that he’s moving his tongue for æ and the final [nd]?

It’s true that Wrecking Ball contains two liquids, which is hard for some children, but shouldn’t be a problem for adults. (And English’s syllabic n, not to mention the vowel æ, are hard for many foreigners.)

As for alliteration, Hammond Hamster is maybe too cutesy. They didn’t call Winston Gary Gorilla.

(In the French version, Roadhog and Junkrat are Chopper et Chacal, which is actually a pretty nice alliteration, calling out their partnership.)

Of [the longer] names, five end on long vowels: Orisa, Zarya, Symmetra, Zenyatta and Lucio. Interestingly enough, four of these five end on a long ‘a’ because it’s an easy and pretty sound for punctuating names (which, if you’re wondering, is also why so many elves in high fantasy settings have names like Aria).

Argh: these are not long a; that’s the vowel in mate. These end in shwas, [ə].

And while we’re at it, Tolkien is largely to blame for elven names, and in this long list of his elven names, just one has a final -a. He liked final [ɛ] far more. If other writers use more, they are probably thinking vaguely of Latin.

If the dude really doesn’t like the name, all he has to say is:

  • It’s longer
  • It’s final-stressed.

Names are a tiny bit awkward if they have two stressed syllables, especially if they end in one. The only other Overwatch hero with this stress pattern is Soldier 76, and he’s usually just called Soldier. But it’s not that awkward; it’s also found in such common expressions as Jesus Christ, Eastern Bloc, Lara Croft or U.S.A.


After 20 years, I’ve rewritten the Verdurian reference grammar.


The main motivation was my syntax book.  I want to be able to tell conlangers how modern syntax can deepen your conlang, and I figured I should make sure I have a really good example.

Now, you’ll see that I did it without drawing a single syntactic tree. That never seemed to be necessary, though I do have some discussion of transformations, and I mark subclauses and talk about underlying forms. The main influence of modern syntax is in adding more syntactic stuff, and thinking more about how things interrelate.

To put it another way, if you don’t know much modern syntax, you’ll write one relative clause and call it a day.  But once you’re familiar with syntax, you start to think about what you can relativize, and how nonrestrictive relative clauses work, and headless clauses, and what’s the underlying form for headless time clauses, and such things.

I also took the opportunity to add glosses to all the examples, provide a new long sample text, redraw the dialect map, add new mathematical terminology, add pragmatic particles, and in general update the presentation to how I write grammars these days. I also html-ized the Verdurian short story I translated long ago. And subcategorized all the verbs in the dictionary. And provided margins.

FWIW, though much of the content is similar, it’s all been rewritten– I very rarely simply copied-and-pasted. Plenty of little things have been added, and some old bits removed. (E.g. the descriptions of the dialects, which I hope to expand on in more detail.)

An example of a little change: the morphology section no longer goes case by case, a method that makes it hard to look up forms.  And I changed the expository order to nom – acc – dat – gen, which makes it easier to see when the nom/acc forms are the same. (If it’s good enough for Panini, it’s good enough for me.)

Verdurian is still not my favorite language (that would be “whatever language I created last”), but the problems are mostly lexical.  And it’s a little too late to redo the vocabulary yet again.  At least I can say I’m pretty happy with the syntax now…

