Daniel Jurafsky and James H. Martin are writing Speech and Language Processing (3rd ed.), and many draft chapters are available online. The 2nd edition, released in 2008, is an introductory text in the field of computational linguistics.

I’d like to summarize the current paradigms for dialog systems presented in Ch. 29: Dialog Systems and Chatbots.

Dialog System Paradigms

Jurafsky splits conversational agents/dialog systems into 2 functional categories:

  1. Task-Oriented Dialog Agents
  2. Chatbots

Task-oriented dialog agents process and produce natural language to accomplish a specific goal, while chatbots are typically designed to carry on open-ended conversations with humans.

Chatbots

There are several strategies that chatbots commonly employ to generate natural language. They tend to do very little conversation modeling; rather, they use simple strategies to map input directly to output without “understanding” either.

Rule-Based

Rule-based chatbots apply transformation rules to the input sequence to create output. One influential rule-based chatbot is the ELIZA system (Weizenbaum 1966).

The ELIZA system worked in a psychological domain, ranking features of the human conversant’s input according to rules. It then transformed the highest-ranked input feature or features according to another ruleset, producing the output.
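
To make the transformation-rule idea concrete, here is a minimal ELIZA-style sketch in Python. The rules and the reflect step are my own toy inventions, not Weizenbaum’s actual rule set:

```python
import re

# Each rule pairs a pattern over the user's turn with a response template;
# rules are tried in priority order, mimicking ELIZA's keyword ranking.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

def reflect(fragment: str) -> str:
    """Swap first-person words for second-person ones, as ELIZA's transforms did."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())

RULES = [
    (re.compile(r"\bi am (?P<x>.+)", re.I), "Why do you say you are {x}?"),
    (re.compile(r"\bi feel (?P<x>.+)", re.I), "Tell me more about feeling {x}."),
    (re.compile(r"\bmy (?P<x>.+)", re.I), "Your {x}. Why do you mention that?"),
]

def respond(turn: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(turn)
        if match:
            return template.format(x=reflect(match.group("x")))
    return "Please go on."  # default when no rule matches

print(respond("I am worried about my exams"))
# -> Why do you say you are worried about your exams?
```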

Corpus-Based

Corpus-based chatbots rely on large bodies of human-generated text rather than transformation rules. They fall into two main groups.

Information Retrieval

Given access to a large corpus of human-generated conversation, information retrieval (IR) chatbots locate an appropriate response in their corpus based on its similarity to the human input, or turn. Cosine similarity is often used as the similarity measure, and corpora can be internet discussions, movie scripts, or the human turns of past chatbot conversations.
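
As a toy illustration of the IR approach, the sketch below uses scikit-learn’s TF-IDF vectors and cosine similarity over a three-line stand-in corpus; a real system would index millions of turns, and might return the response that followed the most similar turn rather than the turn itself:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for a large conversational corpus (e.g., movie-script turns).
corpus = [
    "What time does the movie start?",
    "I loved that restaurant downtown.",
    "The weather has been terrible all week.",
]

vectorizer = TfidfVectorizer()
corpus_vectors = vectorizer.fit_transform(corpus)

def retrieve_response(turn: str) -> str:
    """Return the corpus turn most similar to the user's turn by cosine similarity."""
    turn_vector = vectorizer.transform([turn])
    scores = cosine_similarity(turn_vector, corpus_vectors)[0]
    return corpus[scores.argmax()]

print(retrieve_response("when does the movie begin"))
```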

Each statement in a conversation is referred to as a turn. Some dialog systems can handle only a two-turn conversation, where the human and the system each get one turn; others can handle multi-turn conversations.

Supervised Machine Learning

Supervised machine learning chatbots are more complex than IR systems. They were derived from machine translation systems, modified to generate a new response rather than a translation of the input.
Seq2seq models for response generation use sequence transduction to create output. The chapter gives little implementation detail, and the book’s own chapter on seq2seq models is as yet unreleased, which leaves the papers in the References below as the primary sources. A minimal sketch of the architecture follows.
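
Here is a minimal encoder-decoder sketch in PyTorch, loosely in the spirit of Vinyals and Le (2015). The vocabulary size and dimensions are arbitrary placeholders of my own, and real systems add attention, beam-search decoding, and much more:

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID = 1000, 64, 128  # arbitrary toy sizes

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src, tgt):
        # Encode the user's turn into a hidden state...
        _, hidden = self.encoder(self.embed(src))
        # ...then condition the decoder on it to generate response tokens.
        dec_out, _ = self.decoder(self.embed(tgt), hidden)
        return self.out(dec_out)  # (batch, tgt_len, VOCAB) next-token logits

model = Seq2Seq()
src = torch.randint(0, VOCAB, (1, 7))  # dummy input turn (token ids)
tgt = torch.randint(0, VOCAB, (1, 5))  # dummy response prefix (teacher forcing)
print(model(src, tgt).shape)           # torch.Size([1, 5, 1000])
```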

Task-Oriented Dialog Agents

The previous dialog systems generate responses with the goal of continuing a conversation. Task-oriented natural language generation systems accomplish a designated purpose in a designated domain. This allows them to encode more world knowledge, but they are less flexible in conversation.

Frames, Slots, & Values

Task-oriented dialog systems can book flights, set an alarm clock, or search Google. Since the domain and task are set beforehand, the system designer can create a frame which models all the data necessary for the system to take an action on behalf of the speaker. The frame has several slots, for which values must be obtained from the human speaker.

Slot              Data Type  Value
ORIGIN CITY       city       “”
DESTINATION CITY  city       “”
DEPARTURE TIME    time       “”
DEPARTURE DATE    date       “”
ARRIVAL TIME      time       “”
ARRIVAL DATE      date       “”

Adapted from Ch. 29: Dialog Systems and Chatbots in Daniel Jurafsky and James H. Martin. 2017. Speech and Language Processing. Draft, 3rd edition.
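
A frame like this maps naturally onto a small data structure. Below is a sketch using a Python dataclass; the FlightFrame name and the missing_slots helper are my own, but the slots follow the table above:

```python
from dataclasses import dataclass
from typing import Optional

# A direct encoding of the flight frame above; None marks an unfilled slot.
@dataclass
class FlightFrame:
    origin_city: Optional[str] = None
    destination_city: Optional[str] = None
    departure_time: Optional[str] = None
    departure_date: Optional[str] = None
    arrival_time: Optional[str] = None
    arrival_date: Optional[str] = None

    def missing_slots(self):
        """Return the slots the system still needs to ask the user about."""
        return [name for name, value in self.__dict__.items() if value is None]

frame = FlightFrame(origin_city="Boston")
print(frame.missing_slots())  # every slot except origin_city
```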

Each frame is specific to a conversation domain, and each value is obtained from a human speaker. Task-oriented systems can keep control of the conversation entirely, or allow the user to take the initiative, extracting the required data through semantic parsing. Semantic parsing produces a hierarchical tree over a human-generated sentence. Sequence models can also be used to map user speech to slots in a frame.
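
As a toy stand-in for a semantic parser, the sketch below fills the FlightFrame from the previous example using hand-written patterns of my own; a real system would use a trained parser or sequence model instead:

```python
import re

# Hand-written patterns mapping phrases in the user's turn onto frame slots.
SLOT_PATTERNS = {
    "origin_city": re.compile(r"\bfrom (?P<v>[A-Z][a-z]+)"),
    "destination_city": re.compile(r"\bto (?P<v>[A-Z][a-z]+)"),
    "departure_date": re.compile(r"\bon (?P<v>[A-Z]\w+)"),
}

def fill_frame(turn, frame):
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(turn)
        if match:
            setattr(frame, slot, match.group("v"))
    return frame

frame = fill_frame("I want to fly from Boston to Denver on Tuesday", FlightFrame())
print(frame.missing_slots())  # the two times and the arrival date remain unfilled
```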

Context-Free Grammar

A context-free grammar (CFG) (see Ch. 12) is a set of rules for a “language” from which all the possible statements in that language can be formed. Language is not used here to mean an entire natural language, such as Hungarian or Esperanto, though it could be. A language could be defined for all the text in a children’s book, or a single conversation between two friends. The scope of a CFG is arbitrary. CFGs can be used to parse the conversation of a frame-based dialog system, which is confined to a given domain. Each rule can also be assigned a probability, resulting in a probabilistic context-free grammar (PCFG).
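
To make this concrete, here is a toy flight-domain CFG in NLTK’s grammar notation; the grammar itself is my own invention and is deliberately tiny. NLTK also supports the probabilistic variant via nltk.PCFG.fromstring and nltk.ViterbiParser:

```python
import nltk

# A tiny CFG covering one slice of the flight domain.
grammar = nltk.CFG.fromstring("""
    S -> V NP PP
    V -> 'book'
    NP -> Det N
    Det -> 'a'
    N -> 'flight'
    PP -> P City
    P -> 'to'
    City -> 'Denver' | 'Boston'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("book a flight to Denver".split()):
    tree.pretty_print()  # draws the hierarchical parse tree
```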

I am thinking that the PCFG concept would allow characterization of a Twitter user’s corpus, which could let a frame-based dialog system emerge based on the frequencies of sentence constructions and vocabulary semantic types (from WordNet, etc.).

This idea needs more thought, but it is a direction. I’ll start with Jurafsky’s Chapter 12: Syntactic Parsing.

References

Joseph Weizenbaum. 1966. ELIZA—a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1):36–45.

Oriol Vinyals and Quoc Le. 2015. A neural conversational model. In Proceedings of the ICML Deep Learning Workshop, Lille, France.

Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In ACL 2015, pages 1577–1586.

Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan. 2015. A neural network approach to context-sensitive generation of conversational responses. In NAACL HLT 2015, pages 196–205.

Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, and Dan Jurafsky. 2016. Deep reinforcement learning for dialogue generation. In EMNLP 2016, pages 1192–1202.

Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, and Dan Jurafsky. 2017. Adversarial learning for neural dialogue generation. In EMNLP 2017, pages 2147–2159.

Ryan Lowe, Nissan Pow, Iulian Vlad Serban, Laurent Charlin, Chia-Wei Liu, and Joelle Pineau. 2017. Training end-to-end dialogue systems with the Ubuntu Dialogue Corpus. Dialogue & Discourse, 8(1):31–65.