New passages from an old book.

There’s nothing as bittersweet as finishing a really great book. What do you do when you complete a great series and are left yearning for more?
If you are like me, you train a machine to read the books and let it generate new (unseen) passages from the book!

About the book

Let me introduce you to my favourite book - The Hitchhiker’s Guide to the Galaxy.

A sci-fi comedy unlike any other, a trilogy in five parts.
The book is smart, funny, well-written and full of wonderful commentary on the human condition.
Plus, the book has some of the best quotes EVER!
Why would you not read it?

Generating new passages

The model is trained on the first two books: The Hitchhiker’s Guide to the Galaxy and The Restaurant at the End of the Universe.
Once I obtained the text version of the books, I divided the text into individual sentences.
Each sentence was then subdivided into words (dealing with punctuation as well).
I used the word2vec algorithm to generate a word embedding for each word in the text’s vocabulary.
(Word embeddings are numerical vectors that place words used in similar contexts close together.)
Generating the dataset:
- Iterate through the entire text and divide it into overlapping, semi-redundant chunks of 31 words each. The first 30 words act as X (the input), while the last word serves as Y (the target).
Train an LSTM model on the X, Y pairs constructed above. (Note that the last layer of the model should be a softmax layer whose size equals the number of distinct words in the vocabulary.) Essentially, the model outputs a probability for every candidate next word, given the preceding 30 words.
Once trained, pick a random sequence of exactly 30 words (preferably from the text itself) and pass it to the LSTM model. Select the next word at random according to the probabilities given by the model, append it, feed the latest 30 words back in, and repeat!
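
The sample-and-repeat loop in the last step can be sketched roughly as follows. This is a minimal illustration, not the actual code behind the model: `toy_predict_probs` is a hypothetical stand-in for the trained LSTM, and the function names, the tiny vocabulary and the one-word seed are assumptions made to keep the example short.

```python
import random

WINDOW = 30  # number of context words the model sees, as described above


def sample_next_word(probabilities, vocabulary, rng=random):
    """Draw one word at random, weighted by the model's probabilities."""
    return rng.choices(vocabulary, weights=probabilities, k=1)[0]


def generate(predict_probs, seed_words, vocabulary, length, rng=random):
    """Sample `length` new words, sliding the context window each step."""
    words = list(seed_words)
    for _ in range(length):
        context = words[-WINDOW:]  # keep only the latest 30 words
        probabilities = predict_probs(context)
        words.append(sample_next_word(probabilities, vocabulary, rng))
    return words[len(seed_words):]


# Hypothetical stand-in for the trained LSTM: it puts all probability on
# the word that follows the last context word in a fixed cycle.
VOCABULARY = ["a", "nice", "cup", "of", "tea"]


def toy_predict_probs(context):
    next_index = (VOCABULARY.index(context[-1]) + 1) % len(VOCABULARY)
    return [1.0 if i == next_index else 0.0 for i in range(len(VOCABULARY))]


# The toy seed is a single word; a real seed would hold 30 words.
new_words = generate(toy_predict_probs, ["a"], VOCABULARY, length=4)
print(" ".join(new_words))  # nice cup of tea
```

With a real model the probability is spread over many words rather than concentrated on one, so repeated runs from the same seed drift apart.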

Example: Consider the text:
“Orbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.” The X,Y pairs generated from this text are:

X	Y
“Orbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think”	“digital”
“this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital”	“watches”
“at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches”	“are”
…	…
“roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat”	“idea”
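
The windowing that produces this table can be sketched in a few lines. This is an illustration under simple assumptions (lowercasing, dropping the final full stop, splitting on whitespace); the tokenisation actually used for the model may differ, and `make_pairs` is a name made up for the example.

```python
def make_pairs(words, window=30):
    """Slide a window over `words`: `window` words form X, the next word is Y."""
    pairs = []
    for start in range(len(words) - window):
        x = words[start:start + window]
        y = words[start + window]
        pairs.append((x, y))
    return pairs


text = (
    "Orbiting this at a distance of roughly ninety-two million miles is an "
    "utterly insignificant little blue green planet whose ape-descended life "
    "forms are so amazingly primitive that they still think digital watches "
    "are a pretty neat idea."
)

# Assumed preprocessing: lowercase, drop the full stop, split on spaces.
words = text.lower().replace(".", "").split()
pairs = make_pairs(words)

print(len(words))              # 37
print(len(pairs))              # 7
print([y for _, y in pairs])   # ['digital', 'watches', 'are', 'a', 'pretty', 'neat', 'idea']
```

The 37-word sentence yields 7 pairs, one for each word after the 30th.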

Usage

Visit http://hg2g-42.herokuapp.com/ (it’s pretty much self-explanatory)
- The top section displays a random one-liner from the original trilogy (of 5 parts).
- You can enter some seed text that my model can use as a starting point (Leave it empty for an arbitrary starting sentence).
- The length of the generated text can also be varied (between 5 and 75 words).
Since the sampling mechanism is random, you will get different (but similar) results even for the same seed text.
For simplicity, the algorithm works only on lowercase letters, so the case of the seed text is irrelevant.
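
Case-insensitive handling of the seed could look something like the sketch below. The regular expression and the name `normalise_seed` are assumptions for illustration only; exactly how the app treats punctuation is not described here.

```python
import re

# Words (optionally joined by hyphens or apostrophes) or single punctuation marks.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*|[.,!?;:]")


def normalise_seed(seed_text):
    """Lowercase the seed and split it into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(seed_text.lower())


print(normalise_seed("Once again, Marvin was"))
# ['once', 'again', ',', 'marvin', 'was']
```

Because the text is lowercased first, "The Heart of Gold" and "the heart of gold" give the same tokens.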

Some good starting points (Found after hours of experimentation ;p)

A nice cup …
Once again, Marvin was …
The heart of gold …
The mice …
The total perspective vortex …
The Sirius cybernetics corporation …
