Implementation Phase

Spring 2018 will be the implementation period for this M.S. Project. To recap, I’ll be developing a text generation system based on a context-free grammar (CFG) derived from a publicly available corpus of tweets.

System Components

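Twitter API Interface – Downloads the tweet corpus.
Constituency Parser – Parses each tweet into a phrase-structure tree. Stanford CoreNLP will probably do the trick.
CFG Production Generator – Turns the parse trees into production “rules” that define legal sentence structures.
Quasi-Random Sentence Generator – Expands the productions into candidate sentences.
Markov Probability Function – Scores candidates to make sure the tweet sounds like the corpus.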
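
To make the CFG Production Generator step a little more concrete, here is a minimal sketch of how productions could be counted from parser output. It assumes the parser hands back Penn Treebank-style bracketed trees as strings; the function names `parse_bracketed` and `productions` are hypothetical, and the real implementation may well lean on a library instead.

```python
from collections import Counter


def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # leaf word
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return read_node()


def productions(tree, counts=None):
    """Count one production (lhs, rhs) for every internal node of the tree."""
    if counts is None:
        counts = Counter()
    label, children = tree
    # The right-hand side is the child labels, or the word itself for a leaf.
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            productions(child, counts)
    return counts


# Hypothetical parser output for a three-word sentence.
tree = parse_bracketed("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")
for (lhs, rhs), n in productions(tree).items():
    print(lhs, "->", " ".join(rhs), n)
```

Keeping counts rather than a plain set of rules is a deliberate choice in this sketch: the frequencies could later be used to weight the generator toward structures that are common in the corpus.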

Issues

The Stanford CoreNLP library can do constituency parsing, but it’s written in Java, so I’ll need a way to call it from Python.
Since productions in a CFG can be defined recursively, I’ll need to write an algorithm that guarantees the Generator terminates.
The Markov probability function will need tuning. I’ll investigate a machine learning classifier if time permits.
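
Two of these issues are easy to illustrate with a rough sketch. For termination, one option (an assumption on my part, not a settled design) is to give the Generator a depth budget and only let it pick productions that can still finish within that budget. For the Markov function, the sketch below uses a bigram model with add-alpha smoothing, where `alpha` is the kind of knob that would need tuning. The toy grammar and all function names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy grammar; NP is recursive through PP.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("DT", "NN"), ("NP", "PP")],
    "PP": [("IN", "NP")],
    "VP": [("VBZ",), ("VBZ", "NP")],
    "DT": [("the",)], "NN": [("dog",), ("park",)],
    "IN": [("in",)], "VBZ": [("barks",), ("sees",)],
}


def rule_depth(rhs, grammar, depth):
    """Tree depth needed to finish a rule, given known nonterminal depths."""
    return 1 + max((depth.get(s, math.inf) for s in rhs if s in grammar), default=0)


def min_depths(grammar):
    """Shallowest possible derivation for each nonterminal (fixed-point iteration)."""
    depth = {}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                d = rule_depth(rhs, grammar, depth)
                if d < depth.get(lhs, math.inf):
                    depth[lhs] = d
                    changed = True
    return depth


def generate(symbol, grammar, depth, budget, rng):
    """Expand symbol into words, never exceeding budget levels of recursion."""
    if symbol not in grammar:
        return [symbol]               # terminal: emit the word itself
    # Only rules that can still terminate within the remaining budget.
    options = [r for r in grammar[symbol] if rule_depth(r, grammar, depth) <= budget]
    if not options:
        raise ValueError("budget too small to expand %r" % symbol)
    words = []
    for s in rng.choice(options):
        words.extend(generate(s, grammar, depth, budget - 1, rng))
    return words


def train_bigrams(sentences):
    """Count context words and bigrams over tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
        vocab.update(words)
    return contexts, bigrams, len(vocab)


def log_prob(words, model, alpha=1.0):
    """Add-alpha smoothed bigram log-probability of a candidate sentence."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + list(words) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + alpha) / (contexts[a] + alpha * vocab_size))
        for a, b in zip(padded, padded[1:])
    )


depth = min_depths(GRAMMAR)
candidate = generate("S", GRAMMAR, depth, budget=6, rng=random.Random(0))
model = train_bigrams([["the", "dog", "barks"], ["the", "dog", "sees", "the", "park"]])
print(" ".join(candidate), log_prob(candidate, model))
```

Because the budget shrinks by one at every level and a rule is only chosen when its shallowest completion fits, the recursion cannot run away, at the cost of biasing deep branches toward short expansions. Candidates with a low log-probability under the bigram model could then be discarded or regenerated.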

I’ll start with the constituency parser and work down the list. The Twitter API interface should be straightforward, so I’ll save it for last.