I’ve updated the Numbers from 1 to 10 page!  For the first time in, well, many years.

Note: if you don’t see the new page, because the old page is cached, you may have to hit shift-refresh.

The major features:

  • It now makes extensive use of Unicode to finally present the numbers as they were intended to be seen. (If you can’t see all the characters it’s dredged up, check the notes page for how to download comprehensive Unicode fonts.)
  • As a corollary, I’ve started to include the native writing system for key languages.
  • The families are color-coded to help you navigate.
  • The page uses JavaScript to allow you to customize the results.

Now the story behind the update. The original source file was an enormous Mac Word 5.1 file. To generate the HTML files, I would export the source file to RTF (which is how you were supposed to access .doc files). Then I ran a custom C program that converted the RTF into HTML.

So far so good, only my old PowerPC died a few years back, which meant I could no longer run Mac Word 5.1, which meant I couldn’t generate the RTF or the HTML files, which meant no updates, period.

Sigh. Mac Word 5.1, released in 1991, was a thing of beauty. It had little of the cruft of later versions of Word, I had all the commands in muscle memory, and on the PowerPC it was damn fast. Plus it never crashed. I had to switch to Word 2008 when I needed Unicode, but I kept using 5.1 until I couldn’t. I’ve gotten used to Word 2008, but it is just not the reliable workhorse that 5.1 was. It crashes unpredictably with certain large files, especially if they have a lot of formatting, as many of my books do.

Word 5.1 had a neat feature that I used extensively for the numbers list: you could overtype characters. This was necessary to represent the many, many arcane and wacky characters that linguists have used over the last couple of centuries to write their grammars and wordlists. Word 2008 can read these, but apparently can’t create them.

I had long envisioned a database or a text document that could hold the numbers, letting the web page itself be very simple. I was a bit worried that the database would be huge and slow, but then I remembered that most web pages these days pull down megabytes of cruft.

So, the source file is now plaintext. I still use Word to create it, because it looks better there and I can use bolding to help me navigate, but all I do to make the plaintext file is copy and paste into TextEdit. It turns out that the whole file is only 400K, far smaller than the 1.4M HTML file that was the old mondo partly-Unicoded version. The text file is human-readable, but some pretty simple JavaScript reads and prettifies it for the actual web page.
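To give a flavor of what "read and prettify" can mean, here is a minimal sketch. It is not the page's actual code, and the line format (family, language, and the ten number words separated by tabs) is invented purely for illustration; the names parseNumbers, toTableRow, and escapeHtml are likewise hypothetical.

```javascript
// ASSUMED line format (hypothetical, not the real source file's layout):
//   Family<TAB>Language<TAB>one two three ... ten
function parseNumbers(text) {
  const entries = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines
    const [family, language, words] = line.split("\t");
    if (!family || !language || !words) continue; // skip malformed lines
    entries.push({ family, language, numbers: words.split(/\s+/) });
  }
  return entries;
}

// Escape the characters that would otherwise be read as HTML markup.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build one table row; the family name becomes a CSS class,
// which is one way a stylesheet could color-code the families.
function toTableRow(entry) {
  const familyClass = entry.family.toLowerCase().replace(/[^a-z0-9]+/g, "-");
  const cells = [entry.language, ...entry.numbers]
    .map((cell) => `<td>${escapeHtml(cell)}</td>`)
    .join("");
  return `<tr class="${familyClass}">${cells}</tr>`;
}

// Usage: parse a two-line sample and keep only one family.
const sample =
  "Indo-European\tSpanish\tuno dos tres cuatro cinco seis siete ocho nueve diez\n" +
  "Uralic\tFinnish\tyksi kaksi kolme neljä viisi kuusi seitsemän kahdeksan yhdeksän kymmenen\n";
const uralicRows = parseNumbers(sample)
  .filter((entry) => entry.family === "Uralic")
  .map(toTableRow);
console.log(uralicRows.join("\n"));
```

Because the data stays as plain objects until the last step, filtering by family or region is just an array filter before the rows are built.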

There are probably some typos in the file, from quirks of the old process that I may have missed or mishandled during the conversion. On the other hand, there were a lot of kludges in the old HTML version; the new version is much closer to the original sources.

I haven’t dealt with the sources page yet. (The problems are similar; it will be another fairly tedious project to update the document and access page.)

Edit: The sources page is done now too! As with the numbers page, you can zero in on specific regions.

If you happen to be a linguist or for some other reason study the less-spoken languages, I’m always open to additions and corrections, and finally I can make them again.

 
