Motivation

This page describes my personal motivation that led to the PULC project.

The motivation

From a very early age, I was fascinated by the notion of an artificial device that exhibits intelligence through behaviors such as understanding language, making decisions, exhibiting personal opinions and preferences, and so on. The thought that there is an object that on the one hand is intelligent and really understands me, but on the other hand is just a machine that humans built, held an almost mystical, spiritual fascination for my imagination.

As a kid watching “The Transformers” on TV as well as “Star Trek: The Next Generation” (Lt. Commander Data was my favorite character) and reading Asimov’s robot stories, I was really excited about the idea of becoming a scientist who builds and programs such artifacts. I imagined that, as in Asimov’s stories, I would come to the lab in the morning, where there would already be an intelligent robot, and I would have a conversation with it, which would be a lot of fun. At some point the robot, being imperfect, would say something strange, and that would be amusing; then I would poke around in the robot’s brain and fix the problem.

Of course, I was very naive. As a kid, it didn’t occur to me that all this was just fiction, and that in reality, humans did not yet know how to build such artifacts. So when I later studied Artificial Intelligence, I was a little disappointed: I had expected to be taught how to build a Data, but all I was taught and read about were very basic things. Still, the spark from my early days did not disappear, and it led me to pursue the current research direction.

One approach: harvesting natural phenomena to advantage

Many areas of engineering face complicated natural phenomena. What engineers usually do in those areas is try to gain some benefit from the phenomena by various means: first picking the low-hanging fruit, and then gradually finding more sophisticated ways to take advantage of them. The latter is done by using more sophisticated models that are often adaptations of scientific theories. The scientists themselves initially model the phenomena using simple approximations and then gradually make the models more sophisticated.

As an example in engineering, consider oil extraction. At first, the easiest thing is to go to those places that have large concentrated repositories of oil, and just pump it out. After these sources are exhausted, more advanced technology needs to be developed to allow the pumps to reach the repositories that are harder to reach, or to allow the extraction of oil from less pure sources, or to increase the efficiency of the process, and so on.

Natural Language is a very complicated phenomenon, so one way to deal with it is to take the same approach. From a scientific perspective, one could first devise crude statistical models of the easiest-to-predict aspects of language or of the most frequent phenomena, and then gradually refine them. From an engineering perspective, one could view the huge collection of text that exists out there as a huge resource and think up various ways in which it could be “mined”, “harvested”, and exploited to advantage. One example is internet search engines. The low-hanging fruit is to use just keyword strings to filter documents. That’s obviously a very crude method. Then a bit more knowledge about words and morphology could be used. That’s still a very crude method, but it improves the quality of the yield a bit, and so on.

Real understanding

This kind of endeavor is not quite what I dreamed about when I was a kid. I did not dream about a device that merely exploits natural language data in some way or another for some useful purpose (although I obviously want to use such devices when they exist, e.g. I benefit very much from better search engines and email spam filters). What I always dreamed about was a device that gives me the feeling it really understands me. Now that I know this is so much harder to do than I thought as a kid, I have revised my goal, not by abandoning my dream, but by narrowing it down. The purpose of the PULC project is still to make the computer really understand the user’s wishes and do so well, even if that means the computer can do only a very small number of things.

A good example of such an application is a dialogue system on a particular topic that employs a textual interface. In such cases, it is quite possible to build very high-quality systems that really understand what you are saying to them and what goal or task you want to achieve. This can be done by collecting many examples of the conversations you want to have with the computer, analyzing them fully, figuring out all the knowledge needed for these exact conversations to take place, and putting it all in the computer. At any stage, what the system can already do, it does very well, but there may be additional input sentences that the system does not yet understand. By iteratively interacting with the system, finding its gaps in understanding, and adding more knowledge to deal with those gaps, a very high-quality system can be constructed that addresses almost all possible conversations on that particular topic.
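To make that workflow concrete, here is a minimal sketch in Python (not the PULC implementation; names such as KNOWN_PATTERNS and understanding_gaps are hypothetical, and the pattern table is a deliberately simplistic stand-in for real linguistic and domain knowledge). The point is only the behavior: answer precisely the inputs that are covered, and record the gaps instead of guessing.

    import re

    # Hypothetical, hand-analyzed knowledge: inputs the designers have fully covered.
    KNOWN_PATTERNS = [
        (re.compile(r"^what time does flight (\w+) leave\??$", re.I),
         lambda m: f"Looking up the departure time of flight {m.group(1)}."),
        (re.compile(r"^book a flight to (\w+)\??$", re.I),
         lambda m: f"Starting a booking dialogue for a flight to {m.group(1)}."),
    ]

    understanding_gaps = []  # inputs the system could not handle; input to the next knowledge-adding pass

    def respond(utterance: str) -> str:
        for pattern, handler in KNOWN_PATTERNS:
            match = pattern.match(utterance.strip())
            if match:
                return handler(match)
        # No guessing: admit the gap and record it for the designers.
        understanding_gaps.append(utterance)
        return "I don't understand. Could you rephrase, or shall I connect you to an operator?"

What the system covers, it handles exactly; everything else produces an honest “I don’t understand” and a logged gap, mirroring the iterative build-up described above.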

Such a system appeals to my original dream because I feel it really understands me. It is also often what the customer wants. When I worked at Baobab Technologies, one of the central requirements that our customers had was that the system should never make guesses about what the user wants. The reason is that if the system makes a wrong guess, the user needs to correct the system. Correction sub-dialogues are notoriously hard to deal with, especially if the system keeps making guesses during those dialogues, which require even more corrections. Human users are already very frustrated with automated systems, and the last thing our customers wanted was to frustrate them even more with systems that keep misunderstanding them. They required that the system know the limits of its own knowledge, and they preferred an “I don’t understand” response (possibly redirecting the user to a human operator) to a potentially wrong guess.

By the way, another requirement of our customers for the banking transactions dialogue system was that the designers should have total control over the NL output that the system generates. This is related to legal issues: the bank feared being sued by the system’s users if the system did not supply them with 100% accurate information. Therefore, we had to use a precise generation algorithm over which we had full control; we could not use an algorithm that approximates this process after being trained on example outputs.

Such a system can be built from reusable components. For example, the parser and much of the general linguistic knowledge are domain-independent and could be used in all instantiations of the system in various domains. So working in a restricted domain does not in any way mean working on a toy problem, or that the system is not portable to other domains.

Statistical techniques can still be useful for certain practical purposes

The most obvious use is to collect data on the conversations that people would like to have with the system (through Wizard-of-Oz experiments) and to calculate statistics on the frequency of inputs in those conversations. These statistics could help the system’s designers prioritize the kinds of knowledge they want to put into the system, so that the knowledge needed for the more frequent inputs is entered first.

The precise models and knowledge put into the system might be too general or generate too many possibilities. Statistics about which of those structures actually turned out to be useful in real conversations could serve as a heuristic for pruning the structures that were never useful (although a more principled solution would be to devise more accurate models and knowledge).

In cases where there is more than one way to interpret what the user said, statistics about previous similar situations could provide a heuristic for prioritizing the interpretations in a list, so that the interpretations that were used more frequently in the past are checked first. Still, in a high-precision system, it is the real modeling and understanding of the domain, and not the heuristics, that ultimately decides whether or not an interpretation makes sense.
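As a small illustration of that division of labor, here is a sketch (again in Python, with hypothetical names such as past_choices and makes_sense_in_domain) in which frequency statistics only order the candidate interpretations, while the precise domain model makes the final decision:

    from collections import Counter

    # Hypothetical log of how often each kind of interpretation was the right one in past conversations.
    past_choices = Counter({"destination-city": 42, "departure-city": 7, "airline-name": 3})

    def pick_interpretation(candidates, makes_sense_in_domain):
        """candidates: list of (kind, interpretation) pairs for an ambiguous input.
        makes_sense_in_domain: predicate supplied by the precise domain model."""
        ranked = sorted(candidates, key=lambda c: past_choices[c[0]], reverse=True)
        for kind, interpretation in ranked:            # try the historically frequent readings first
            if makes_sense_in_domain(interpretation):  # but the model, not the statistics, decides
                return interpretation
        return None  # nothing made sense; better to admit it than to guess

The statistics here only affect the order of checking; an interpretation is accepted only if the domain model itself validates it.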

Another idea, for other applications, is to use high-quality understanding on the inputs that the system has been prepared to deal with well, and to resort to statistics and other heuristics on unexpected inputs, while alerting the user that the response is just a guess rather than an accurate answer.

Statistical methods could also aid the system designers in accumulating knowledge. Rough techniques could be used to “harvest” the low-hanging fruit from texts on the internet by proposing candidate knowledge rules. Of course, being crude approximations, these methods would propose many rules that are wrong. But humans could sift through the proposed rules and select the useful ones to add to the system. It might be easier to supply humans with a first stab at such rules, generated by crude methods, than to have the humans write such rules from scratch.

But all these are heuristic additions. For the kinds of systems that aim at precise understanding of NL, there is no substitute for feeding the computer with high-quality knowledge.

The current state of NLP and why we need more people to work on precise understanding

Almost all people in NLP today take the engineering approach I mentioned before. They try to first harvest as much as they can from the NL input with the simplest statistical methods possible, and only when they hit a limit do they try to make the methods a bit more sophisticated. They are also interested in producing the kinds of applications that can rely on the statistical heuristics alone.

I think that the field of NLP would benefit from more work on precise understanding, for several reasons.

First, I think it is really interesting and fun to work on precise understanding, and it is very rewarding when you feel the system truly understands what you say and what you want from it (especially because computer applications today, not only in NLP, are pretty bad at being intuitive to humans and at understanding what the user wants).

Second, there are many practical applications that could already benefit from precise understanding. One example was mentioned above: dialogue systems for specific domains, such as travel reservations, banking transactions, and technical support, where the computer does not just ask you to “say the name of the destination airport”, but where you can say as much or as little as you want, and the computer asks you for what it needs to know. (I mention a textual interface here because speech recognition today is not accurate enough and introduces noise into the system. Lots of people today are happy chatting with each other using internet messenger applications, so they would be thrilled to have a travel reservation system they could chat with; that would be so much better than the current situation, where people need to poke around various airline and hotel reservation websites.)

Another example is natural language interfaces to databases. Today there are tens of thousands of people who interact with database user interfaces, but these interfaces are crude forms that do not allow users to specify more complex queries. A system that allowed the user to type a short natural language question and provided precise answers would be invaluable. More generally, there exist today knowledge bases that model a specific domain with high quality and completeness, but that knowledge is accessible only to trained experts because the interfaces are very technical. A precise natural language interface to such KBs would give more people access to these sources. As long as statistical techniques for “broad coverage” cannot supply sufficiently precise and high-quality understanding in particular domains of interest, we also need research on high-quality precise understanders.

Finally, a research field is healthier the more different directions of research are pursued. Statistical NLP techniques are very useful for many practical applications, but they currently target “broad coverage” rather than very high-precision deep understanding. Research on precise understanding could contribute to the necessary long-term basic research by providing proof-of-concept implementations that understand more sophisticated NL phenomena. In fact, this is what has already happened: current “broad coverage” statistical parsers are possible thanks to 40 years of linguistic research on syntax that took place before 1990. That research allowed some level of consensus to exist regarding what to put in the 300-page document that specified the annotation scheme for the Penn Treebank.

Related

Logic Puzzles: A New Test-Suite for Compositional Semantics and Reasoning
First chapter of my dissertation.