As I mentioned last post, reading textbooks and papers have been my main focus for the last two weeks, and I believe that I’ve settled on a basic strategy on which to base the tweet generation system.

Basic Architecture

Creation of a context-free grammar (CFG) for the entire available corpus of tweets from the @realDonaldTrump twitter account should be possible through a combination of syntactic parsing (either constituency parsing or dependency parsing) and a script for creating rule syntax from the output of the tagging package. CFG rules look like the Python string below.

From the NLTK Grammar Parsing HOWTO guide

"""
S -> NP VP
PP -> P NP
NP -> Det N | NP PP
VP -> V NP | VP PP
Det -> 'a' | 'the'
N -> 'dog' | 'cat'
V -> 'chased' | 'sat'
P -> 'on' | 'in'
"""

Once a CFG is created, sentences can be programmatically generated from the rules. Since tweets are 150 characters max, a single sentence should be sufficient material to create a plausible tweet in most cases. Of course, some of the sentences will be nonsensical, or otherwise out of line with the idiosyncrasies of the corpus. A Markov, or N-gram, model of the corpus can be used to rate each sentence generated from the CFG.

Details

The rating function based on the N-gram model will have to be derived, and the type of N-gram model will have to be chosen (N = 1,2,3,..,n). Tuning of the optimal required threshold for acceptance of a generated sentence will be required to balance variation with imitation.

Improvements

While a CFG and rating function based on an N-gram model seems like a great starting point, the ability to influence on the topic of generated text would be a really great feature.

Semantic role labeling has some promise in this area. It seems like a sentence could be parsed for topics just like I am proposing for phrases. If this is true, I could construct an analogous system as above but for semantic roles. Another approach would be including the semantic roles in the rating function. Either way, there is some room for “reach” features here.

Proposal

At this point, I will finish the written proposal with my advisor, and start the approval process.