Jurafsky Textbook Ideas
Daniel Jurafsky and James H. Martin are writing Speech and Language Processing (3rd ed.), and many draft chapters have been released online. The 2nd edition, published in 2008, is an introductory text in the field of computational linguistics.
I’d like to summarize the current paradigms for dialog systems presented in Ch. 29: Dialog Systems and Chatbots.
Dialog System Paradigms
Jurafsky splits conversational agents/dialog systems into two functional categories:
- Task-Oriented Dialog Agents
- Chatbots
Task-oriented dialog agents process and produce natural language to accomplish a specific goal, while chatbots are typically designed to carry on open-ended conversations with humans.
Chatbots
There are several strategies chatbots commonly employ to generate natural language. They tend to do very little conversation modeling; rather, they use certain strategies to map input to an immediate output without “understanding” either one.
Rule-Based
Rule-based chatbots apply transformation rules to an input sequence to create output. One influential rule-based chatbot is the ELIZA system (Weizenbaum, 1966). Here is the Wikipedia link.
The ELIZA system worked in a psychotherapy domain, ranking the features of the human conversant’s input according to rules. It then transformed the highest-ranked feature or features according to another rule set, producing its output.
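The rank-then-transform loop can be sketched in a few lines of Python. This is a minimal illustration of the idea, not Weizenbaum’s actual rules: the patterns, ranks, and pronoun swaps below are all invented for the example.

```python
import re

# Hypothetical ELIZA-style rules: (rank, input pattern, response template).
# The highest-ranked matching rule fires; captured text is echoed back.
RULES = [
    (3, r".*\bI need (.*)", "Why do you need {0}?"),
    (2, r".*\bmy (mother|father)\b.*", "Tell me more about your {0}."),
    (1, r".*\b(sad|depressed)\b.*", "I am sorry to hear you are {0}."),
    (0, r".*", "Please go on."),  # fallback when nothing else matches
]

# Simple first/second-person swap applied to captured groups, as ELIZA did.
SWAPS = {"my": "your", "me": "you", "i": "you", "am": "are"}

def reflect(text):
    return " ".join(SWAPS.get(w.lower(), w) for w in text.split())

def respond(utterance):
    # Try rules from highest rank to lowest; the catch-all rank-0 rule
    # guarantees some response is always produced.
    for rank, pattern, template in sorted(RULES, reverse=True):
        m = re.match(pattern, utterance, re.IGNORECASE)
        if m:
            return template.format(*(reflect(g) for g in m.groups()))

print(respond("I need a vacation"))  # Why do you need a vacation?
```

The fallback rule (“Please go on.”) is what gives ELIZA-style systems their ability to keep a conversation moving without any real model of its content.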
Corpus-Based
Corpus-based chatbots rely on large bodies of human-generated text rather than on transformation rules. They fall into two main groups.
Information Retrieval
Given access to a large corpus of human-generated conversation, information retrieval (IR) chatbots locate an appropriate response in their corpus based on its similarity to the human input, or turn. Cosine similarity is often used as the similarity measure, and the corpus can consist of internet discussions, movie scripts, or the human turns of past chatbot conversations.
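The retrieval step can be sketched with plain bag-of-words vectors and cosine similarity. The tiny prompt/response corpus below is invented for illustration; a real system would index a much larger collection.

```python
import math
from collections import Counter

# A toy corpus of (prompt, response) pairs, invented for this sketch.
CORPUS = [
    ("how are you today", "I am doing well, thanks for asking."),
    ("what is your favorite movie", "I enjoy science fiction films."),
    ("tell me about the weather", "It looks sunny outside."),
]

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def respond(turn):
    # Return the response paired with the most similar stored prompt.
    query = Counter(turn.lower().split())
    best = max(CORPUS, key=lambda pair: cosine(query, Counter(pair[0].split())))
    return best[1]

print(respond("how are you"))  # I am doing well, thanks for asking.
```

A variant strategy retrieves the stored *response* most similar to the user’s turn, rather than matching against stored prompts; both approaches appear in practice.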
Each statement in a conversation is referred to as a turn. Some dialog systems can handle only a two-turn conversation, where the human and the system each get one turn; others can handle multi-turn conversations.
Supervised Machine Learning
Supervised machine learning chatbots are more complex than IR systems. They were derived from machine translation systems, but have been modified to generate a new response rather than translate the input text.
Seq2seq models for response generation use sequence transduction to create output. Little implementation detail is given; the book has an as-yet-unreleased chapter on seq2seq models, which leaves the following papers as references:
- Vinyals, et al., 2015
- Shang et al., 2015
- Sordoni et al., 2015
- Li et al., 2016
- Li et al., 2017
- Lowe et al., 2017
Task-Oriented Dialog Agents
The previous dialog systems generate responses with the goal of continuing a conversation. Task-oriented natural language generation systems accomplish a designated purpose in a designated domain. This allows them to incorporate more world knowledge, but they are less flexible in conversation.
Frames, Slots, & Values
Task-oriented dialog systems can book flights, set an alarm clock, or search Google. Since the domain and task are set beforehand, the system designer can create a frame which models all the data necessary for the system to take an action on behalf of the speaker. The frame has several slots, for which values must be obtained from the human speaker.
| Slot | Data Type | Value |
|---|---|---|
| ORIGIN CITY | city | "" |
| DESTINATION CITY | city | "" |
| DEPARTURE TIME | time | "" |
| DEPARTURE DATE | date | "" |
| ARRIVAL TIME | time | "" |
| ARRIVAL DATE | date | "" |
Adapted from Ch. 29: Dialog Systems and Chatbots in Daniel Jurafsky and James H. Martin. 2017. Speech and Language Processing. Draft, 3rd edition.
Each frame is specific to a conversation domain, and each value is obtained from the human speaker. A task-oriented system can maintain control of the conversation entirely, or allow the user to take the initiative, filling slots from the user’s utterances through semantic parsing. Semantic parsing produces a hierarchical tree over a human-generated sentence. Sequence models can also be used to map user speech to slots in a frame.
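The frame-and-slots idea can be sketched directly in code. The flight-booking slots follow the table above (a subset of them), but the extraction patterns and example utterance are invented for this illustration; real systems use far more robust slot fillers than a few regexes.

```python
import re

# A frame for a flight-booking domain: slots start empty and are
# filled from user utterances. (Subset of the slots in the table above.)
frame = {
    "ORIGIN_CITY": "",
    "DESTINATION_CITY": "",
    "DEPARTURE_DATE": "",
}

# Hypothetical extraction patterns, one per slot.
PATTERNS = {
    "ORIGIN_CITY": r"\bfrom (\w+)",
    "DESTINATION_CITY": r"\bto (\w+)",
    "DEPARTURE_DATE": r"\bon (\w+)",
}

def fill_slots(utterance, frame):
    # Fill any slot whose pattern matches the utterance.
    for slot, pattern in PATTERNS.items():
        m = re.search(pattern, utterance, re.IGNORECASE)
        if m:
            frame[slot] = m.group(1)
    return frame

def missing_slots(frame):
    # The system can ask a follow-up question for each unfilled slot.
    return [slot for slot, value in frame.items() if not value]

fill_slots("I want a flight from Boston to Denver", frame)
print(missing_slots(frame))  # ['DEPARTURE_DATE']
```

Once `missing_slots` is empty, the system has everything it needs to take the action (here, booking the flight) on the speaker’s behalf.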
Context-Free Grammar
A context-free grammar (CFG) (see Ch. 12) is a set of rules for a “language” whereby all the possible statements in that language can be formed. Language is not used here to mean an entire natural language, such as Hungarian or Esperanto, though it could be. A language could be defined for all the text in a children’s book, or a single conversation between two friends. The scope of a CFG is arbitrary. CFGs can be used to parse the conversation of a frame-based dialog system, which is within a given domain. They can also be defined probabilistically, with each rule assigned a probability, resulting in a probabilistic context-free grammar (PCFG).
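A PCFG can be sketched as a mapping from each nonterminal to its possible expansions, each with a probability; generating a sentence means repeatedly sampling an expansion. The toy grammar below is invented for illustration and is far smaller than anything useful.

```python
import random

# A toy PCFG (invented for this sketch). Each nonterminal maps to a list
# of (probability, expansion) pairs; probabilities for a nonterminal sum to 1.
PCFG = {
    "S":  [(1.0, ["NP", "VP"])],
    "NP": [(0.6, ["the", "N"]), (0.4, ["a", "N"])],
    "VP": [(0.7, ["V", "NP"]), (0.3, ["V"])],
    "N":  [(0.5, ["dog"]), (0.5, ["cat"])],
    "V":  [(0.5, ["sees"]), (0.5, ["chases"])],
}

def generate(symbol, grammar):
    # Symbols with no rule in the grammar are terminals (words).
    if symbol not in grammar:
        return [symbol]
    probs, expansions = zip(*grammar[symbol])
    # Sample one expansion according to the rule probabilities.
    expansion = random.choices(expansions, weights=probs)[0]
    words = []
    for s in expansion:
        words.extend(generate(s, grammar))
    return words

print(" ".join(generate("S", PCFG)))
```

The same rule probabilities that drive generation can be estimated from a corpus (by counting how often each rule is used in parsed sentences), which is what makes the PCFG idea attractive for characterizing a particular speaker’s habits.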
I am thinking that the PCFG concept would allow characterization of a Twitter user’s corpus, which could allow a frame-based dialog system to emerge based on the frequencies of sentence constructions and vocabulary semantic types (from WordNet, etc.).
This idea needs more thought, but it is a direction. I’ll start with Jurafsky’s Chapter 12: Syntactic Parsing.