Here is a very brief overview of a system that can read a logic puzzle text of the sort shown here, and correctly solve it. The code for this system can be found on GitHub.
The main parts
The system has two main tasks:
- Understanding the text
- Solving the puzzle
As mentioned in the previous post, the first task is easy for human readers but hard for computers, while the reverse is true for the second part.
What does it actually mean to “understand the text”? This is a complex question in the philosophy of language, but in the context of our system, it means: computing a precise representation of the information conveyed by the text. The representation is given using some formal language, in which expressions have clear and unambiguous meaning. For example, the truth conditions of the third constraint in the puzzle shown here may be expressed using this first-order logic (FOL) formula:
∀x. [(Room(x) ∧ Exhibited_in(e,x) ∧ Exhibited_in(f,x)) → ¬∃y. [Sculpture(y) ∧ y≠e ∧ y≠f ∧ Exhibited_in(y,x)]]
One of the main challenges we face is that we want to compute such well-defined representations to capture the meaning of natural language expressions that are often ambiguous or incomplete. Thus, for sentences that have some ambiguity, more than one meaning representation should be computed.
Understanding the text
The first task starts by pre-processing the text to identify the preamble, the questions, and each question’s list of choices. Then, each preamble sentence is fed to the NLAnalyzer, which does the entire linguistic and logical analysis of the sentence, and outputs a FOL formula which expresses the truth conditions of the sentence (more than one formula are computed for ambiguous sentences). The assumption in each question, as well as each answer choice, are also processed in a similar manner.
The NLAnalyzer is the major part of the system, and the majority of the posts here will go into a lot of details about it. The main stages of the analyzer are:
- Tokenization
- Morphological analysis
- Syntactic analysis
- Semantic analysis
- Pragmatic analysis
- Logical analysis
Additionally, each type of linguistic phenomenon will require a separate discussion on how it should be handled on one or more of these levels. These phenomena include: events, quantifiers, anaphora, ellipsis, comparative expressions, plurality, reciprocals, and many others.
Solving the puzzle
The goal of this part is to determine for each question which of its answer choices is correct given the assumptions in the preamble and in the question. The output shown to the user is the correct answer for each question, together with their justifications: a linguistic and logical analysis showing how the system reached these conclusions.
If all parts of the input puzzle text (i.e. the preamble, questions, and answer choices) are unambiguous, we get a finite a set S of FOL formulas that encode the main constraints of the puzzle. For each question i, we get a formula Qi that encodes the question’s additional assumption, and a formula Cij (1 ≤ j ≤ 5) that encodes choice j of question i. Most questions have the form: “If <additional assumption>, which of the following statements <instruction>?” where <instruction> has several possibilities, shown in the left column:
Instruction | Find the unique choice j in [1..5] such that: |
must be true / cannot be false | S, Qi ⊨ Cij |
must be false / cannot be true | S, Qi ⊨ ¬Cij |
may be true / not necessarily false | S, Qi ⊭ ¬Cij |
may be false / not necessarily true | S, Qi ⊭ Cij |
The symbol ⊨ means logical entailment in FOL (and ⊭ means: the consequence is not logically entailed by the assumptions). Such logical entailment, or lack of it, can be determined by feeding the formulas to a FOL Theorem Prover. While FOL, in the general case, is undecidable, logic puzzles of the sort we handle here always deal with a small finite set of objects and relations between them (these puzzles are essentially small Constrain Satisfaction Problems). Therefore, a FOL Theorem Prover should always terminate with an answer (either “follows” or “does not follow”).
For each question, exactly one answer choice is correct. So e.g., if question i is “which of the following must be true”, the system needs to find the unique j such that S, Qi ⊨ Cij. If this holds true for no choice, or for more than one choice, something went wrong during the process of understanding the preamble, the question, and/or one or more of the answer choices.
If one or more of the input pieces are ambiguous and produce more than one FOL formula, then all the possible combinations should be considered, until one is found which yields exactly one correct answer choice for each question.
It should also be noted that sometimes not all the information required for solving a logic puzzle is explicitly mentioned in the text. For example, the puzzle here does not mention that a sculpture may not be exhibited in more than one room (simultaneously). Human readers know this from their general world knowledge that a physical object cannot be in more than one location at the same time, but computers lack this knowledge.
In general, representing and acquiring such world knowledge is a very hard problem in artificial intelligence (AI). However, in the case of logic puzzles, this kind of knowledge can be recast as mathematical properties of the relations mentioned in the puzzle. For instance, the unique location constraint on sculptures is equivalent to constraining the mapping from sculptures to rooms to be injective (one-to-one). The computer could systematically search for such implicit constraints that, in combination with the explicitly stated constraints, yield exactly one answer per question.
More information about the puzzle solver will be given in a separate post (I’ll update the link here once it’s ready).
Questions we need to address
To create the system, we need to address the following questions, which are studied in the fields of Computational Semantics and Knowledge Representation and Reasoning:
- What meaning representation language (MRL) should we use to express the meaning of the NL texts?
- In the context of our application, what do particular NL sentences mean? How should their meaning be represented in the MRL?
- How can this representation be calculated from the meaning of functional terms and the morphological and syntactic analysis of the text?
- What algorithms should be used?
- How should we handle ambiguity?
- How can the MRL support the kinds of inference we need for our application?
- How do we fill in the gap of background knowledge that is assumed by the text and is not explicitly stated in it?