

I’ve been working on a few projects. One is making flags for the nations of Ereláe. You can see what I’ve done so far here.

[Image: flags of the nations of Ereláe]

Rather more surprising: I’ve been translating a French novel. No, you haven’t heard of it; that’s why I’m translating it. 🙂  I hope the author won’t mind my naming it: it’s Damien Loch by Shan Millan. It’s a fantasy novel, but a rather satirical and contrarian one. It’s more or less “What if your asshole neighbor turned out to come from another world… but he was still an asshole?”

One of the fascinating bits about translating is how styles and wording differ between even quite similar languages. Conlangers, take note! One way to put it is: if the word-for-word gloss from your language sounds just like English, you probably haven’t worked out your language’s style enough.

For me at least, the problem is that reading the French original, the French style starts to sound natural, and the English ends up strange and wooden. So of course I have to go over the English and make it sound, well, English.

In return, Shan is translating the Language Construction Kit— the book— into French! I tried to do this myself a few years ago, and didn’t get very far, mostly because of the same style problem. I could make a French version myself, but it would be awful.

Anyway, this will eventually be very exciting for English fantasy fans, and French conlangers.

The other project came out of my work on syntax: I decided to finally update the Verdurian grammar. In the syntax book I want to explain how you can use modern syntax to inform your conlang’s grammar, you see, and I thought I’d better do it myself first. (Not that there wasn’t syntax in the grammar before, but now there’s more.)

I’ve also taken the opportunity to make the grammar easier, and harder. Easier, in that I can explain some things better, and get rid of what I now think are confusing presentations. (Also, there will be glosses for all the examples, a practice I now think is indispensable.) Harder, in that I don’t feel that I have to explain basic linguistics in every grammar, especially since the ‘easy’ route is already there in the form of guided lessons.

(No finish date yet, but it shouldn’t be too long.)

(I’m also hoping to include actual Verdurian text, for people who have the right font.)


Is there a good methodology or series of questions one should ask themselves when determining what the “alphabetical order” will be of one’s alphabet or other writing system? Is there any particular reason why “A” should be before “B” and that before “C”?

—Colin

Great question— the answer may be a bit disappointing. The obvious thing to do is to look at natural language alphabetical orders. Only…

  • The alphabet was really only invented once— by the Canaanites, some time after 2000 BC. Everyone else, including the Jews, Arabs, Greeks, and Romans, adapted their system and kept their order.
  • We really don’t know why that order prevailed. No one even seems to have any good guesses. The World’s Writing Systems never, so far as I could see, covers the topic.

(There are actually two attested ancient orders; you can see a comparison here.)

(Also, technically, the Canaanite writing system was a consonantal alphabet, or abjad. Later, partial vowel symbols were used. The Greeks were the first to represent all their vowels.)

So far so disappointing, but we also have the example of the Brāhmī script, which is the ancestor of Devanāgarī and other Indian and SE Asian scripts. This arose around 300 BC, and the interesting thing about it is that its order is phonetically motivated. Letters are grouped by point of articulation (it starts क ख ग घ ka kha ga gha), and the secondary order is from the back of the mouth to the front: ka…, ca…, ṭa…, ta…, pa… Finally there are semivowels and then sibilants. A linguist couldn’t have done a better job. The Brāhmī order very likely influenced the order in Korean and Japanese.

(The a‘s aren’t just part of the letter— in these systems a symbol has an inherent vowel. So क alone is ka. You add diacritics for other vowels: कि ki, कु ku, etc.)
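If you like the Brāhmī approach, it’s easy to generate such an order from a feature table. Here’s a minimal sketch in TypeScript— the inventory and the place/manner rankings below are invented for illustration, not taken from any real script:

```typescript
// Sketch: a Brāhmī-style alphabetical order, sorting letters by place of
// articulation (back of the mouth to front) and then by manner.
// The rank tables and inventory are made up for this example.
const placeRank = { velar: 0, palatal: 1, retroflex: 2, dental: 3, labial: 4 } as const;
const mannerRank = { plain: 0, aspirate: 1, voiced: 2, voicedAspirate: 3, nasal: 4 } as const;

interface Letter {
  glyph: string;
  place: keyof typeof placeRank;
  manner: keyof typeof mannerRank;
}

function brahmiOrder(letters: Letter[]): Letter[] {
  return [...letters].sort(
    (a, b) =>
      placeRank[a.place] - placeRank[b.place] ||
      mannerRank[a.manner] - mannerRank[b.manner]
  );
}

const inventory: Letter[] = [
  { glyph: "ta",  place: "dental", manner: "plain" },
  { glyph: "gha", place: "velar",  manner: "voicedAspirate" },
  { glyph: "ka",  place: "velar",  manner: "plain" },
  { glyph: "pa",  place: "labial", manner: "plain" },
];
console.log(brahmiOrder(inventory).map(l => l.glyph).join(" ")); // "ka gha ta pa"
```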

There’s one other scheme that might appeal to you: mnemonics. A real-world example is the iroha order for the Japanese kana. It’s a poem which includes every character in the syllabary just once, and still serves as an alternative order for the kana.
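The one formal requirement of this scheme is that the poem be a perfect pangram of your syllabary— every character, exactly once. The check is trivial to code; here’s a sketch (the function is my own invention, not from any library):

```typescript
// Sketch: verify that a mnemonic poem uses every character of a syllabary
// exactly once, as the iroha does for the kana.
function isPerfectPangram(poem: string[], syllabary: Set<string>): boolean {
  const seen = new Set<string>();
  for (const ch of poem) {
    if (!syllabary.has(ch) || seen.has(ch)) return false; // unknown or repeated character
    seen.add(ch);
  }
  return seen.size === syllabary.size; // every character accounted for
}
```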

Since there aren’t many real-world examples, I think a conlanger is also entitled to use any crazy system they can come up with…

When you’re reading and writing about syntax, you see syntax everywhere.  E.g., I just found this gem on Twitter:

Quoted in NYT is not something I was expecting to get when becoming a socialist.

Let’s do some syntax!  First, what is this?  It’s not quite like anything else in my bestiary of transformations. It’s an extraposition of a V’ from the VP, but I can’t make parallels with other auxiliaries:

?Quoted in the NYT is not something I was expecting to be.

?Printed by a major publishing firm is not something I was expecting to be.

*Brought down three prime ministers is not what Brenda thought she would ever have.

The first two are maybe marginal. But simpler statements definitely fail:

*Eaten tripe and onions is not something that I have.

*Eating tripe and onions is not something that I am.

*Compared to a troglodyte is not something that I’ve got(ten).

The sentence seems closest to Pseudo-Clefting, but it doesn’t quite work:

What ended the Martian threat was bacteria.

What he dreams of is being profiled by both Forbes and Dungeon.

What he never expected was getting quoted in the NYT.

*What he never expected getting was quoted in the NYT.  

On the other hand, other uses of get seem to work:

A case of 200-year-old wines is not something I was expecting to get from my grandfather.

A rock is what I got.

So I think the best I can come up with is that the Twitter sentence works by analogy from the physical to the auxiliary sense of get. This would help explain why the sentences with get sound better than those with be.

It’s fine, by the way, if you don’t quite accept the original sentence. I’m not 100% sure I do, but I don’t find it clearly ungrammatical either.

If you’re a conlanger, this construction is worth thinking about— not that you should copy it, but are there any other areas where the syntax can be stretched like this? It’s all too easy to just come up with a straightforward example of (say) the passive, and never think about possible, impossible, and in-between variants.

Today’s reading: Grammaire générale et raisonnée de Port-Royal, by Antoine Arnauld and Claude Lancelot, published in 1660.  Let’s call it PRG. You can get it here, in the 1803 edition. My understanding was that it was a precursor to Chomsky, and in fact he claimed as much in a book, Cartesian Linguistics (1966).  Spoiler: it isn’t.

[Image: title page of the Grammaire générale et raisonnée]

Undoubtedly your first question will be: as a grammar from Renaissance times, how does it compare to the work of Šm Fatandor Revouse, Pere aluatas i Caďinor? In overall coverage and linguistic knowledge, it’s fairly similar— for instance, PRG, like Šm Revouse, stumbles in the phonology section through not having any vocabulary or notation for phonetics; both are reduced to talking about letters.

On the other hand, PRG is fairly free of the sort of metaphysical and nationalistic nonsense that Šm Revouse indulges in.  In particular, they never claim that the ancient languages are better than the modern, nor do they try to find spiritual categories or whatever within language. They acknowledge at several points that many aspects of language are arbitrary, and vary between languages. (They do sometimes appeal to the notion of ‘elegance.’)

By the way, see here for an argument from W.K. Percival that there is really no such thing as “Cartesian linguistics” at all, that PRG was not particularly innovative or Cartesian, and that Descartes’ ideas on language, to the extent he had any, bore very little resemblance to Chomsky’s.

Anyway, what is PRG? It’s not really a grammar at all, either of French or the ancient languages. It could be called a sketch of a comparative grammar, or an overview of the concepts needed to study grammar. So it starts with sounds, then discusses nouns, pronouns, adjectives, cases, verbs, etc.  It never gives enough information to fully cover any topic or tell you in detail how a language handles it, but it does define all the grammatical terms, give examples, and opine on the function of each.

Chomsky felt that his notion of “universal grammar” was prefigured here, but I’d say PRG starts from the pretty obvious fact that a similar grammatical analysis can be used for the major languages of Europe. PRG never really runs into a fact about modern French that can’t be described using the terms of classical grammar.  So, for instance, they are perfectly aware that French nouns don’t have case, but they find it useful to relate subjects to the Latin nominative, PPs with de to the genitive, and PPs with à to the dative.

The languages covered are very few: of ancient languages, Latin, Greek, and Hebrew; of modern, French, Italian, and Spanish. There are a couple of references to German; none at all to English, and nothing on languages the authors surely were aware of: Basque, Breton, Dutch, Portuguese, Arabic.

Chomsky went so far as to assert that PRG prefigured his “surface and deep structures”. This is completely absurd; PRG talks about things like subjects and predicates and propositions, but this was bog-standard thinking about language since ancient times. They come a little closer in this passage on adjectives:

When I say Invisible God has created the visible world, three judgements occur in my mind, contained in this proposition. Because I judge first that God is invisible. 2. That he has created the world. 3. That the world is visible. And of these three propositions, the second is the principal one, and the core of the proposition; the first and the third are incidental to it.

The idea that the adjective invisible applied to God represents a proposition God is invisible reoccurs in generative grammar. On the other hand, it is not part of a transformational view of language, nor is it part of a systematic treatment of semantics. It’s really a pretty basic observation about adjectives… if you want to say what an adjective is, you’re almost bound to observe that it says what something is like. It doesn’t mean that you’ve invented deep structure, or phrase structure rules.

There are some interesting bits where the authors try to relate meanings to other meanings. E.g. they say at one point that Petrus vivit ‘Peter lives’ is equivalent to Petrus vivens est ‘Peter is living’, or that Pudet me ‘I am ashamed’ is equivalent to Pudor est tenens mihi ‘Shame is had by me’. You could generously relate this to generative semantics, except backwards: GS tends to make verbs primitive, while PRG tries to restate verbs in terms of adjectives or nouns.

But we really have to avoid overinterpreting texts in terms of current theories. PRG is, by modern standards, hobbled by a lack of semantic terms and frames of reference. The authors didn’t have predicate calculus to think about, or Minsky’s idea of frames, or Fillmore’s idea of semantic roles, or Rosch’s prototypes or fuzzy categories, or Lakoff’s ideas on categories and metaphors.

They’re doing the best they can with the concepts they have. On verbs, for instance, they reject the old idea that verbs represent actions or passions, pointing out (quite rightly) that there are stative verbs which are neither. They propose that the essence of a verb is that it affirms something— that is, it asserts a proposition about something. The prototypical affirmation is the word est “is”, which is why they restate Petrus vivit as Petrus vivens est. Essentially they’re reducing sentences with verbs to things they have already discussed: objects and attributes.

They have a very short chapter on syntax, whose content is rather disappointing. It amounts to these observations:

  • Some words have to agree with each other, to avoid confusion.
  • Some words require each other (e.g. verbs and their subjects), and some words depend on another (e.g. adjectives on nouns).
  • When everything is stated well, without rhetoric, the order of words is “natural” and follows the “natural expression of our thoughts”.
  • However, sometimes people want to be fancy, and they omit words, insert superfluous words, or reverse words.

I’m guessing they were in a hurry to wrap up, because they certainly knew Latin well enough to know that basic sentence order differed between Latin and French, and that Latin’s order could be varied far more freely.

A minor point of interest: PRG frequently, like generative grammar, gives negative examples— things we don’t say. This was by no means common in grammars— Whitney’s Sanskrit grammar, for instance, doesn’t do this.

Should you run out and read it? Eh, probably not, especially as it turns out it’s not a precursor at all to modern syntax. It is interesting if you want to know how early rationalists approached grammar, e.g. if you wanted to write something like Šm Revouse’s grammar for your own conlang.

I’m up to page 220, which probably means I’m half done with the Syntax Construction Kit. So it’s time for another progress report.

The last book I read, Robert Van Valin’s An Introduction to Syntax, is perhaps the least useful on the details of syntax, but the most useful on what syntax has been doing for the last forty years. There are two overall strands:

  • A focus on constituent structure, the path taken by Chomsky.
  • A focus on relations between words: semantic roles, valence, dependencies.

That’s really helpful, and it’s a better framing than the division I learned in the 1980s between Chomskyan syntax and generative semantics.  The problem with that was, in effect, that GS disappeared. So it kind of looked like the Chomskyans won the battle.

But like Sith and Jedi, you can never really get rid of either side in this fight. In many ways GS simply regrouped and came back as Cognitive Linguistics. Plus, it turns out that many of the specific GS ideas that Chomsky rejected in the 1970s… came back in the ’90s as Minimalism. In particular, semantic roles have a place in the theory, and even the semantic breakdown of verbs (show = cause to see) that GS emphasized years ago, and that Chomsky at the time bitterly resisted.

Also, an unexpected side path: in order to understand and explain a lot of modern theories, I’m having to re-read papers I read for my first syntax classes, nearly forty years ago. My professor had pretty good taste in what would prove important.

There are two challenges in writing this sort of book.

  • How to communicate that Chomsky isn’t the only game in town, without simply writing a brusque travelog of maybe a dozen alternatives
  • How to make this useful and interesting for someone who just wants to write conlangs, man

Van Valin scrupulously divides his page count between the constituent and the relational point of view. I will emphasize relations far more than I originally intended to, but I’m still going to focus on constituent structure. Partly that’s because there’s so much to cover, but it’s also because I’ve already written quite a bit about relations and semantics in my previous books.

But in general, I’m trying for breadth of syntactic data, not depth in Minimalism (or any other school). The problem with the latter approach is that you may learn to create a syntactic tree that your professor won’t scribble red marks over, but you won’t learn why that particular tree is so great. Every theory can handle, say, WH-movement.

Hopefully, that will address the second challenge as well.  As the Conlanger’s Lexipedia gives you a huge amount of information about words, my aim with this book is to give you more things to put in your syntax section than you thought possible. And hopefully some pretty weird things. Wait till you see a Bach-Peters sentence.

Plus, web toys! I don’t know why more syntax books haven’t been written by computer programmers; it’s a natural fit. Though I have to say: Chomsky should have run his ideas on Minimalism past a programmer. Some of Minimalism is beautifully simple: you can set out the basic structure of a sentence with a minimum of rules. Then, to handle tense and case, questions, and movement, you have to add an amazing superstructure of arbitrary, hard-to-generalize rules. The idea is to get rid of ‘arbitrary’ rules like Passive, but the contrivances needed to do so seem just as arbitrary to me.


Here it is!

Generated sentence: did that frog not sit on these fat big mice?

Note, it’s not minimal, it’s Minimalist. By that I mean, it’s generated by a program that uses Minimalist theory to build sentences.  Here’s the final tree:

CP
    C
        C:Q
        T
            V:did
            T:Past
    TP
        D
            D:that
            N:frog
        T
            T:<Past>
            VP
                Neg:not
                VP
                    D:<that frog>
                    V
                        V:sit
                        P
                            P:on
                            D
                                D:these
                                N
                                    A:fat
                                    N
                                        A:big
                                        N:mice

Still not clear?  I’ve spent the last few days creating a program to model Minimalism.  And I don’t even like it much as a syntactic theory! But I like it for its ambition: give some simple rules for building up a sentence word by word.  This is not, as you might expect, using phrase structure rules; it really is built up word by word, from the bottom up. And that makes it a natural match for programming.

For instance, the above derivation starts with the word mice, randomly selected from a list of possible nouns. The program then searches the lexicon for things that can be linked with a noun— basically, determiners or adjectives.  Step by step it builds upward until it has a prepositional phrase (PP), then looks for something that can be linked with a PP.

The verb sit is marked in the lexicon as wanting a PP and also a D. We’ve got the PP, so we can merge sit into the tree. The rules do not allow extending the tree downwards, only upwards, so to get a D we have to find another subtree (that frog), then merge to the left.
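To give a flavor of the bottom-up process, here’s a heavily simplified sketch. It’s not my actual program— the lexicon format and the merge rule are stand-ins I made up for this post:

```typescript
// Sketch of bottom-up Merge. Each head lists what categories it selects;
// the derivation only ever grows the tree upward, never downward.
type Cat = "N" | "A" | "D" | "P" | "V";

interface Head { word: string; cat: Cat; selects: Cat[]; }
interface Tree { label: Cat; head: Head; children: Tree[]; }

const lexicon: Head[] = [
  { word: "mice",  cat: "N", selects: [] },
  { word: "big",   cat: "A", selects: ["N"] },
  { word: "fat",   cat: "A", selects: ["N"] },
  { word: "these", cat: "D", selects: ["N"] },
  { word: "on",    cat: "P", selects: ["D"] },
  { word: "sit",   cat: "V", selects: ["P", "D"] },
];

function leaf(h: Head): Tree {
  return { label: h.cat, head: h, children: [] };
}

// Merge a head with an already-built tree. Adjectives adjoin (the noun
// keeps projecting); other heads project their own category.
function merge(h: Head, t: Tree): Tree {
  const label = h.cat === "A" ? t.label : h.cat;
  return { label, head: h, children: [leaf(h), t] };
}

// Build "on these fat big mice" from the bottom up:
let tree = leaf(lexicon[0]); // mice
for (const w of ["big", "fat", "these", "on"]) {
  const head = lexicon.find(h => h.word === w)!;
  if (head.selects.includes(tree.label)) tree = merge(head, tree);
}
```

Merging sit would then still leave its D requirement unfilled, which is why the program has to go build that frog as a separate subtree and merge it to the left.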

The stuff above that… well, that takes a lot more explaining than I can fit in a blog post; you’ll have to wait for the Syntax Construction Kit for that. As a teaser, though, when you see <things in brackets>, they’ve been moved up the tree to another spot; and some of the superstructure handles Do-support— that is, the fact that English requires an inserted do to handle questions that have only bare verbs.
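As a tiny teaser of that superstructure, here’s roughly what Do-support amounts to, reduced to a sketch with invented names— the real program is more involved:

```typescript
// Sketch of Do-support: tense is normally spelled out on the verb, but if
// the verb can't host it (e.g. it's stranded below negation in a question),
// the tense is realized on a dummy "do" instead.
type Tense = "Past" | "Pres";

function inflect(verb: string, t: Tense): string {
  if (t !== "Past") return verb;
  const irregular: Record<string, string> = { do: "did", sit: "sat" };
  return irregular[verb] ?? verb + "ed";
}

function spellOut(verb: string, t: Tense, verbHostsTense: boolean): string[] {
  return verbHostsTense
    ? [inflect(verb, t)]        // "sat"
    : [inflect("do", t), verb]; // "did ... sit", as in the tree above
}
```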

Along the way the program handles determiner agreement (which is why we have these mice),  verbal inflections, and pronoun case (which didn’t happen to be triggered here).

Anyway, I’ll show you the program later; I’m not done with it, though it has about all the features I expect to have.  A lot of it is quite general; you could use it for a conlang or something, if you happened to really like Minimalism.  But some things are pretty kludgy, partly because Minimalism is clunky in spots, partly because English is. Do-support, for instance, is a really weird mechanism.

(Also, I know, didn’t the frog… would be more colloquial, but the current output is at least grammatical, so I may or may not fix that.)


I was out with a friend last night, and he asked about the book I’m working on, and I said it was on syntax.  So he asked, reasonably enough, what’s syntax?

Well, how do you answer that for a non-linguist?  This is what I came up with.

Suppose you want to make a machine that spits out English sentences all day long.  There should be no (or very few) repetitions, and each one should be good English.

How would you make that machine in the simplest way possible?

That is, we’re not interested in a set of rules that require the Ultimate Computer from Douglas Adams’s sf. We know that “make a human being” is a possible answer, but we’re looking for the minimum. (We also, of course, don’t want a machine that can’t do it— that misses some sentences, or spits out errors.  We want the dumbest machine that works.)

One more stipulation: we don’t insist that the sentences be meaningful. We’re not conducting a conversation with the machine. It’s fine if the machine outputs John is a ten foot tall bear. That’s a valid sentence— we don’t care whether or not someone named John is nearby, or if he’s a bear, or if he’s a big or a small bear.

That machine is a generative grammar.

The rules of Chomsky’s Syntactic Structures are in fact such a machine— though a partial one.  And along with the book I’m creating a web tool that lets you define a set of rules and generate sentences— using the Syntactic Structures rules, or any other set.  It works like a charm.  But the SS rules were not, of course, a full grammar.
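To give the idea (the web tool itself will come with the book), here’s a toy version of such a machine. The rules below are made up for illustration— they’re nothing like the full Syntactic Structures set:

```typescript
// A toy generative grammar: rewrite rules plus a random expander.
// Transitive (Vt) and intransitive (Vi) verbs are kept apart so the
// machine never emits bad English like "the frog sits a mouse".
const rules: Record<string, string[][]> = {
  S:  [["NP", "VP"]],
  NP: [["D", "N"]],
  VP: [["Vt", "NP"], ["Vi"]],
  D:  [["the"], ["a"]],
  N:  [["frog"], ["mouse"], ["bear"]],
  Vt: [["sees"], ["likes"]],
  Vi: [["sits"], ["sleeps"]],
};

const pick = <T>(xs: T[]): T => xs[Math.floor(Math.random() * xs.length)];

// Expand a symbol: nonterminals rewrite recursively, terminals are words.
function generate(symbol: string): string[] {
  const expansions = rules[symbol];
  if (!expansions) return [symbol];
  return pick(expansions).flatMap(generate);
}

console.log(generate("S").join(" ")); // e.g. "the frog sees a bear"
```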

Now, besides the amusement value, why do we do this?

  • It’s part of the overall goal of describing language.
  • It puts some interesting lower bounds on any machine that handles language.
  • As a research program, it will uncover a huge store of facts about syntax, most of them never noticed before.  Older styles of grammar were extremely minimal about syntax, because they weren’t asking the right questions.
  • It might help you with computer processing of language.
  • It might tell you something about how the brain works.

I said we wouldn’t worry about semantics, but in practice generative grammar has a lot to say about it. Just as we can’t quite separate syntax from morphology, we can’t quite separate it from semantics and pragmatics.

You might well ask (and in fact you should!), well, how do you make such a machine?  What do the rules look like?  But for that you’ll have to wait for Chapter Two.

At this point I’ve written about 150 pages, plus two web toys.  (One is already available— a Markov text generator.)
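The core of a Markov text generator is only a few lines, by the way. Here’s a minimal word-level sketch (order 1)— the actual toy has more options:

```typescript
// Minimal sketch of a word-level Markov text generator (order 1):
// record which words follow which, then take a random walk.
function buildChain(text: string): Map<string, string[]> {
  const words = text.trim().split(/\s+/);
  const chain = new Map<string, string[]>();
  for (let i = 0; i < words.length - 1; i++) {
    const followers = chain.get(words[i]) ?? [];
    followers.push(words[i + 1]);
    chain.set(words[i], followers);
  }
  return chain;
}

function babble(chain: Map<string, string[]>, start: string, maxWords: number): string {
  const out = [start];
  while (out.length < maxWords) {
    const next = chain.get(out[out.length - 1]);
    if (!next) break; // dead end: the last word never had a follower
    out.push(next[Math.floor(Math.random() * next.length)]);
  }
  return out.join(" ");
}

// Usage: babble(buildChain(someText), "the", 20)
```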

I mentioned before that my syntax books didn’t extend much beyond 1990. Now I’ve got up to 2013, kind of. I read a book of that date by Andrew Carnie, which got me up to speed, more or less, on Chomsky’s middle period:  X-bar syntax, government & binding, principles & parameters. The good news is that all this is pretty compatible with what I knew from earlier works, especially James McCawley.

I’m also awaiting two more books, one on Minimalism, one on Construction Grammar.

Fortunately, I’m not training people to write dissertations in Chomskyan (or any other) orthodoxy… so I don’t have to swallow everything in Chomsky.  (But you know, rejecting Chomsky is almost a full time job. He keeps changing his mind, so you have to study quite a lot of Chomsky before you know all the stuff you can reject.)
