Automatic Understanding of Natural Language Logic Puzzles

[This page is a draft. Please help me improve it by sending me comments.]


Logic Puzzle… They are perhaps the most dreaded parts of the GRE and LSAT exams. We can easily understand what these texts mean, the situations and constraints they describe, and the ensuing questions. But actually solving them? Well, that’s a completely different matter! It’s hard for us humans to think in our heads about all the possibilities and constraints, especially under the exam’s time constraints. A few example puzzles are shown here.

But for a computer, it seems these puzzles should be pretty easy. With its capacity for executing millions of operations per second, a computer could breeze through all possibilities and find the solution in a fraction of a second, right?

Well, not so fast. While a computer can definitely solve such puzzles very easily from a formalized version of the puzzles, the problem is that computers have no direct understanding of human languages (like English or Spanish). So reading puzzle texts, understanding what they mean, and automatically translating them to a formalized version to be solved – that’s actually a very big challenge. In fact, this sort of task is beyond the state-of-the-art in the field of Natural Language Understanding.

Why is it so difficult? The main reason is that there is no simple one-to-one correspondence between the puzzle texts and their formalization. A computer system that can automatically translate a puzzle text to its formalized version must rely on intricate linguistic knowledge of syntax and semantics. While part of this knowledge has already been developed during that last few decades in fields such as linguistic semantics and computational semantics, much work remains to be done on extending the scope of this knowledge to support the large variety of linguistic phenomena that appear in puzzle texts.

A big challenge? Great! That’s exactly what we are going to tackle, head on, in this project: Understanding Natural Language Logic Puzzles (UNLLP). The goal of this project is to create a computer system that can understand Natural Language texts of logic puzzles and correctly solve them. This effort originated in my dissertation, and I recently started re-implementing the system with some simplifications and posting its code as open source – you can find it on GitHub. I would like to invite you to join me on this fascinating and fun journey of expanding the abilities of this system, and thereby also expanding our scientific and technical knowledge of the structural semantics of natural language.


Why am I doing this? The overall motivation is to advance the state-of-the-art in Natural Language Understanding, specifically pertaining to the development of the required knowledge of Structural Semantics (aka “Formal Semantics” or “Compositional Semantics”). This is the branch of linguistics which, roughly speaking, deals with the literal meaning of sentences after abstracting away the specific concepts that non-functional terms express. It specifies how the structural meaning of a sentence is related to the syntactic structure of the sentence and the meaning of functional expressions. See this post for a longer explanation.

Why is this important? See this post for a survey of why Structural Semantics is needed for NLU applications, and this paper for more about why working on logic puzzles is particularly suited for pushing the envelope of this knowledge. For a broader motivation regarding precise understanding of NL texts, see this post.


So now, after the preliminaries, let’s get started! Here is a general outline of the posts about the system. These posts explain step by step how to create a system for solving NL logic puzzles. They also explain about the system’s code on GitHub. (The links below will be updated once the posts are ready.)

What is a NL Logic Puzzle?

The kind of Logic Puzzle texts we’ll be working on consists of two parts:

  • A preamble, which is a series of assumptions about relationships between several entities (such as people, objects, locations, and times)
  • Multiple-choice questions, each adding some additional assumption, and usually asking which of the given choices must be true, i.e. can be logically inferred from the assumptions.

Sometimes a question asks which of the choices may be true, or must/may be false, or which choice describes a complete list of objects that satisfy some conditions, etc.

Here is an example of a logic puzzle text (only the first two questions are shown):

Six sculptures -- C, D, E, F, G, and H -- are to be exhibited in rooms 1, 2, and 3 of an art gallery.

* Sculptures C and E may not be exhibited in the same room.
* Sculptures D and G must be exhibited in the same room.
* If sculptures E and F are exhibited in the same room, no other sculpture may be exhibited in that room.
* At least one sculpture must be exhibited in each room, and no more than three sculptures may be exhibited in any room.

1. If sculpture G is exhibited in room 2 and sculpture E is exhibited in room 3, which of the following MUST be true?
  (A) Sculpture C is exhibited in room 1.
  (B) No more than two sculptures are exhibited in room 3.
  (C) Sculptures F and H are exhibited in the same room.
  (D) Three sculptures are exhibited in room 2.
  (E) Sculpture H is exhibited in room 3.

2. If sculptures C and G are exhibited in room 1, which of the following may NOT be a complete list of the sculpture(s) exhibited in room 2?
  (A) Sculpture D
  (B) Sculpture E
  (C) Sculpture F
  (D) Sculptures E and H
  (E) Sculptures F and H

Source: Karl Weber, "The Unofficial Guide to the GRE Test", 2000 edition, ARCO Publishing.

System Overview

Here is a very brief overview of a system that can read a logic puzzle text of the sort shown above, translate it to a logical representation of the information conveyed by the text, and solve the puzzle using this representation. The code for this system can be found on GitHub.

First, the text is pre-processed to identify the preamble, the questions, and each question’s list of choices. Then, each preamble sentence is fed to the NLAnalyzer, which does the entire linguistic and logical analysis of the sentence, and outputs a FOL formula which expresses the truth conditions of the sentence (if the sentence is ambiguous, more than one such formula is produced). For example, the third constraint in the puzzle above gets this FOL formula:

∀x. [(Room(x) ∧ Exhibited_in(e,x) ∧ Exhibited_in(f,x)) →
     ¬∃y. [Sculpture(y) ∧ y≠e ∧ y≠f ∧ Exhibited_in(y,x)]]

The assumption in each question, as well as each answer choice, are also processed in a similar manner. Finally, a a FOL Theorem Prover is used to determine, for each question, which of the answer choices is correct given the assumptions in the preamble and in the question. The answers, together with a complete linguistic and logical analysis, are shown to the user.

The NLAnalyzer

This is the major part of the system, and the majority of the posts here will go into a lot of details about it. The main stages of the analyzer are:

  • Tokenization
  • Morphological analysis
  • Syntactic analysis
  • Semantic analysis
  • Pragmatic analysis
  • Logical analysis

Additionally, each type of linguistic phenomenon will require a separate discussion on how it should be handled on one or more of these levels. These phenomena include: events, quantifiers, anaphora, ellipsis, comparative expressions, plurality, reciprocals, and many others.

The Puzzle Solver

If all the texts (the preamble, questions, and answer choices) are unambiguous, we get a finite a set S of FOL formulas that encode the main constraints of the puzzle. For each question i, we get a formula Qi that encodes the question’s additional assumption, and a formula Cij (1 ≤ j ≤ 5) that encodes choice j of question i. Most questions have the form: “If <additional assumption>, which of the following statements <instruction>?” where <instruction> has several possibilities, shown in the left column:

Instruction Find the unique choice j in [1..5] such that:
must be true / cannot be false S, QiCij
must be false / cannot be true S, Qi ⊨ ¬Cij
may be true / not necessarily false S, Qi ⊭ ¬Cij
may be false / not necessarily true S, QiCij

The symbol ⊨ means logical entailment in FOL (and ⊭ means: the consequence is not logically entailed by the assumptions). Such logical entailment, or lack of it, can be determined by feeding the formulas to a FOL Theorem Prover. While FOL, in the general case, is undecidable, logic puzzles of the sort we are dealing with here always deal with a small finite set of objects and relations between them (these puzzles are essentially small Constrain Satisfaction Problems). Therefore, a FOL Theorem Prover should always terminate with an answer (either “follows” or “dos not follow”).

For each question, exactly one answer choice is correct. So e.g., if the question is “which of the following must be true”, the system needs to find the unique j such that S, QiCij. If this holds true for no choice, or for more than one choice, something went wrong during the process of understanding the preamble, the question, and/or one or more of the answer choices.

If one or more of the input sentences are ambiguous (produce more than one FOL formula), then all the possible combinations should be considered, until one is found which yields exactly one correct answer choice for each question.

It should also be mentioned that sometimes not all the information required for solving a logic puzzle is explicitly mentioned in the text. For example, the puzzle above does not mention that a sculpture may not be exhibited in more than one room (simultaneously). Human readers know this from their general world knowlege that a physical object cannot be in more than one location at the same time.

While In general, representing and acquiring such world knowledge is a very difficult AI
problem, in the case of logic puzzles, this kind of knowledge can often be recast as mathematical properties of the relations mentioned in the puzzle. For instance, the unique location constraint on sculptures is equivalent to constraining the mapping from sculptures to rooms to be injective (one-to-one). The computer could systematically search for such implicit constraints that, in combination with the explicitly stated constraints, yield exactly one answer per question.

More information about the puzzle solver will be given in a separate post (I’ll update the link here once it’s ready).


To create the system, we need to address the following questions, which are studied in the fields of Computational Semantics and Knowledge Representation and Reasoning:

  • What meaning representation language (MRL) should we use to express the meaning of NL texts?
  • In the context of a particular NLU application that interests us, what do particular NL sentences mean? How should their meaning be represented in the MRL?
  • How can this representation be calculated from the meaning of functional terms and the morphological and syntactic analysis of the text?
  • What algorithms should be used?
  • How should we handle ambiguity?
  • How can the MRL support the kinds of inference we need for the application?
  • How do we fill in the gap of background knowledge that is assumed by the text and is not explicitly stated in it?

To be continued…