At Mefi, someone linked to this very interesting account of failure at a programming startup. There’s a lot to say about Lawrence’s story, starting with the fact that, now 40 years on, dev shops still don’t understand The Mythical Man-Month. Also, that Agile does not give you the magical ability to skip the step where you work out the architecture and internal APIs of your app.
But I want to focus on this: “The idea is brilliant: Natural Language Processing as an interface to interact with big Customer Relationship Management tools such as SAP.”
I’m’a let you finish, but no, it’s not brilliant.
“Instead of the user clicking a button to edit old Contacts, they would type, ‘I want to edit the info about my contact Jenny Hei,’ and we would then find the info about Jenny Hei and offer it to the user so they could edit it. That was the new plan.
“This was a brilliant idea. Salespeople hate software. They are good dealing with people, but they hate buttons and forms and all the other junk that is part of dealing with software.”
People always think that the ideal program would understand English, so all you had to do is talk to it about your problem, and it goes and does it.
At least one early example is Asimov’s “Second Foundation”, from 1949. A teenage girl, Arkady, is using a word processor to type a paper (which happens to give us the exposition for the story). But she’s interrupted by a real-world conversation, the conversation gets recorded in the paper, and she apparently has no way to edit or delete the extra comments– the paper is ruined and she has to start over. What a horrible UI!
Let’s go back to Lawrence’s example. Posit, for a moment, that the UI works as intended: you can type
I want to edit the info about my contact Jenny Hei.
and the app gets ready to do just that. Awesome, right?
Yes, the first time. When the alternative is looking over an unfamiliar program to find the Contacts button… let’s not even talk about having to watch YouTube tutorials to learn how to use the program, as I’ve had to do with Blender… then just speaking an English sentence sounds very attractive.
The second time, that’s OK too. The tenth time, especially ten times in a row… you’re going to wonder if there’s a better way. The thousandth time, you’re going to curse the programmer and his offspring to the third generation.
If you’re a programmer… is this how you want to program? Do you normally write in COBOL?
SUBTRACT A B C FROM D GIVING E
Admit it, you thought it was pretty neat when C let you say
a = a + 1
If you actually had to use Lawrence’s interface, you’d breathe an enormous sigh of relief if someone installed a mod that let you type
EDIT CONTACT JENNY HEI
And you’d be even happier if the mod allowed you to hit the Contacts button, type J in the search box which is automatically enabled, and hit enter.
It’s not that interfaces can’t get too arcane! You can get a lot done if you know EMACS really well… but for most people it’s about as easy to master as quantum mechanics. A WYSIWYG word processor is much nicer. But notice that we don’t edit by saying
HIGHLIGHT THE WORD “CAN”
THE NEXT ONE IN THE FILE PLEASE
REPLACE IT WITH THE WORD “CAN’T”
MOVE THE CURSOR TO THE END OF THE DOCUMENT
Who has time to type all that? Or say it, for that matter?
Would you want to drive your car that way? No, for the same reason you wouldn’t want to drive it with the WASD keys. Spoken language is just not very precise. (Do you think you could direct a robot on how to change lanes? First, how do you communicate exactly how far to turn the wheel? Second, you probably don’t know how yourself— only your cerebellum knows.)
And this is all assuming that you can program a computer to understand spoken commands. Lawrence’s team evidently didn’t realize that what they were asked to do was implement an AI, at a level that has never been done.
What you probably can do is a more forgiving COBOL. That is, you create a toy world, not unlike Terry Winograd’s SHRDLU, and work out an English-like code to manage it which is rigid in its own way, but happens to recognize a lot of keywords in different orders. For instance, maybe it can handle all of
I want to edit the info about my contact Jenny Hei
Edit the record under Contacts for Jenny Hei
Search for Jenny Hei in Contacts.
Find Jenny Hei using the Contacts file and let me edit.
Good work! Now are you quite sure you also allowed these?
I should like to modify the particulars about Jenny Hei, a contact.
Get me Contacts; I’m’a edit Jenny Hei’s record.
Change Jenny Hei’s name to Mei. She’s under Contacts.
That record I added yesterday. Let me change it.
Lemme see if I… I mean, I need to check the spelling… just let me see the H’s, OK? Oh I’m talking about the ‘people I know’ feature.
Is there a Hei in the Contacts thingy? It might be Hai. First name is Jennifer. Did I record it as Jenny?
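For what it’s worth, here is roughly what the “more forgiving COBOL” approach looks like. This is a minimal Python sketch, not anything from Lawrence’s project: the function name parse_command, the keyword set, and the list of known names are all made up for illustration. It just looks for an edit-ish verb, the word “contact” or “contacts”, and a known name, in any order.

```python
import re

# Hypothetical keyword set; a real product would need far more than this.
VERBS = {"edit", "find", "search"}

def parse_command(text, known_names):
    """Return ("edit", "contacts", name), or None if the sentence isn't recognized."""
    lowered = text.lower()
    # Split into bare words so punctuation and word order don't matter.
    words = set(re.findall(r"[a-z]+", lowered))
    if not (words & VERBS):
        return None
    if "contact" not in words and "contacts" not in words:
        return None
    # Accept the first known name that appears anywhere in the sentence.
    for name in known_names:
        if name.lower() in lowered:
            return ("edit", "contacts", name)
    return None
```

This accepts all four sentences in the first list. It returns None for “I should like to modify the particulars about Jenny Hei, a contact” and for “That record I added yesterday. Let me change it.” You can add “modify” and “change” to the keyword set, of course, and then move on to the next sentence it can’t handle.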
Natural language is hard. It’s fractally hard. You may be able to interpret simple sentences– like in all those Infocom games of the ’80s– but actual language just throws on construction after construction. Linguists have been writing about English syntax for more than sixty years and they’re not done yet.
And that’s before we even get into incomplete or ambiguous queries! The user leaves off the key word “Contacts”, or isn’t clear if they’re adding or editing, or gives the name wrong, or gives the name right only it’s recorded wrong in the database, or gives all the edits before saying what they apply to, or the name in question sounds like a command, or the user is malicious and insists that the contact’s name is Robert’); DROP TABLE students;-- …
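That last case, at least, has a known fix, and it has nothing to do with understanding English: never paste the user’s text into a SQL string. Here is a minimal sketch using Python’s built-in sqlite3 module. The contacts table and the find_contact helper are assumptions for the example, not anything from SAP or from Lawrence’s app.

```python
import sqlite3

def find_contact(conn, name):
    """Look up a contact by exact name using a parameterized query."""
    # The ? placeholder passes `name` as data, so it is never parsed as SQL.
    cur = conn.execute("SELECT id, name FROM contacts WHERE name = ?", (name,))
    return cur.fetchall()

# Throwaway in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO contacts (name) VALUES (?)", ("Jenny Hei",))

hostile = "Robert'); DROP TABLE contacts;--"
print(find_contact(conn, hostile))      # no such contact, and the table survives
print(find_contact(conn, "Jenny Hei"))
```

The hostile name is simply a name that isn’t in the table. The other problems in that list don’t have a one-line answer.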
The more you produce the illusion that your app is intelligent, the more users will assume it’s way more intelligent than it is. And when that fails, they will be just as annoyed and frustrated as if they had to learn to push the Contacts button in the first place.
I know a bunch of people are jumping up and down and saying But Siri! Well, first, Google “siri fails”… this could eat up your whole morning. Siri is quite impressive (and has a megacorporation behind its extensive programming), but it has a relatively limited domain (the basic apps on your phone), and also– so far as I know, I don’t have an iPhone– it can’t get deeply into trouble, so its errors are funny rather than devastating.
One of programmers’ oldest dreams, or snares, is to write an interface that’s so simple to use that the business analyst can write most of the app. I’ve fallen for this one myself, more than once! The sad truth is even if you do this task pretty well, non-programmers aren’t going to be able to use it. To program, you have to think like a programmer, and this doesn’t change just because you make the code look like English sentences. I’ve addressed this before; the basic point is, it doesn’t come easily to non-programmers to think in small steps, to remember all the exceptions and hard cases before they come up, or to understand the data structure implied by a process.
Again, this isn’t to say that most app UIs are OK. Nah, they’re mostly horrible. But a) people will learn them anyway if they have to, and b) improving them is almost never a matter of making them more like natural language.
Lawrence notes that “salespeople hate software”, and I’m sure he’s right. However, he focuses on the “forms and buttons”, as if these were the sticking point. They’re not. Salespeople like making money– Joel Spolsky joked that they’re like The Far Side’s Ginger, except that instead of only hearing “Ginger” they only hear “money”. Which is great! Software companies need people to go out and make sales. But salespeople are not jazzed about doing paperwork, or database work, which includes editing the contact page in SAP for Jenny.
The irony is that Lawrence, later in the article, runs into exactly the same situation with other developers, but doesn’t make the connection. What devs hate doing is documentation. Lawrence wants his fellow devs to keep a couple pages in the wiki up to date with their APIs, and they just won’t do it, unless he nags them to death. Is this a UI problem, as Lawrence thinks SAP has? No, it’s a motivation problem, or a mental skillset problem, or something… and whatever it is, it’s even harder than natural language programming.
(All this doesn’t mean a natural language interface would never be a good idea. Though come to think of it, large companies that handle their customer service phone number only through voice recognition… that sucks. It’s too slow and it fails at recognition half the time. So if you’re a programmer writing an app used by the executives of those companies, that’s when you write a program that requires spoken natural language input.)
Edit: One more thought. Talking about editing the contact list presupposes that the user understands “editing” and “the contact list”. In this context, this is supplied by SAP itself: the customers can be presumed to understand that application’s processes and categories. Right? It’d be interesting to know how close the customers’ user model is to the actual workings of the product. (Hint: don’t assume it’s very close.)
If a user agent was really smart enough to understand English, I’d expect it to be smart enough to fill in gaps. Do you need to specify “Contacts” if there is only one Jenny Hei in any name field? If there’s only one Jenny under Contacts, can I leave out “Hei”? Can I define my own categories and tags and use those? If I were talking to a human, I could say “I’m meeting Jenny on the 18th” and they’d break that down into steps: find the table Jenny is in, add a note about the meeting, find the calendar, add the appointment, set up an alarm for the 17th. If your app can’t do all this, you don’t have “natural language processing”, you have a verbose but limited special command language.
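The easy end of that gap-filling can at least be sketched. The following Python is a hypothetical illustration (the names find_matches and resolve, and the dict-of-lists layout, are my assumptions): given a partial name, it searches every table and accepts the result only if exactly one record matches.

```python
def find_matches(query, tables):
    """Return every (table, full_name) whose name contains all the query words."""
    wanted = query.lower().split()
    hits = []
    for table, names in tables.items():
        for full_name in names:
            parts = full_name.lower().split()
            if all(word in parts for word in wanted):
                hits.append((table, full_name))
    return hits

def resolve(query, tables):
    """Return the single match, or None if the query is ambiguous or unknown."""
    hits = find_matches(query, tables)
    return hits[0] if len(hits) == 1 else None

# Made-up data: table name -> list of full names.
tables = {"Contacts": ["Jenny Hei", "Robert Smith"], "Leads": ["Ann Lee"]}
print(resolve("Jenny", tables))   # only one Jenny anywhere, so no need for "Hei" or "Contacts"

tables["Leads"].append("Jenny Wu")
print(resolve("Jenny", tables))   # now ambiguous: the app has to ask
```

That covers the first two questions and nothing else. Breaking “I’m meeting Jenny on the 18th” into the right steps is the part that needs actual intelligence.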