This application was made for the Johns Hopkins Coursera Data Science Specialization Capstone Project.

Instructions

Click the Application button at the top of the screen, type a phrase in the box, and wait a moment. The application runs the algorithm on your phrase and attempts to predict the next word.

Note: The app may take a few seconds to load initially, but each prediction should be fairly swift.

Background

The algorithm uses an n-gram language model with Katz Backoff, which estimates the conditional probability of each candidate next word given the words before it and backs off to shorter n-grams when a longer one was not observed. It is combined with Good-Turing frequency estimation, which discounts the observed counts to reserve probability for word sequences that don’t exist in the dataset.
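
To make the idea concrete, here is a minimal Python sketch of bigram Katz backoff with a simplified Good-Turing discount. It is an illustration only and not the code behind this application: the toy corpus, the function names (train, discounted_count, next_word_probs, predict), and the cutoff k=5 are all assumptions made for the example, and the discount skips the renormalization used in the full Katz formula.

```python
from collections import Counter, defaultdict


def train(tokens):
    """Collect unigram counts, bigram followers, and bigram count-of-counts."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    count_of_counts = Counter(bigrams.values())  # N_c: number of bigrams seen c times
    followers = defaultdict(dict)
    for (prev, word), c in bigrams.items():
        followers[prev][word] = c
    return unigrams, followers, count_of_counts


def discounted_count(c, count_of_counts, k=5):
    """Simplified Good-Turing estimate c* = (c + 1) * N_{c+1} / N_c.

    Assumption: counts above k, or counts whose N_c or N_{c+1} is zero,
    are left undiscounted, and c* is never allowed to exceed c.
    """
    if c > k:
        return float(c)
    n_c = count_of_counts.get(c, 0)
    n_next = count_of_counts.get(c + 1, 0)
    if n_c == 0 or n_next == 0:
        return float(c)
    return min((c + 1) * n_next / n_c, float(c))


def next_word_probs(model, prev):
    """Return P(word | prev), backing off to unigrams for unseen bigrams."""
    unigrams, followers, count_of_counts = model
    seen = followers.get(prev, {})
    if not seen:
        # Unknown context: fall back entirely to unigram frequencies.
        total = sum(unigrams.values())
        return {w: c / total for w, c in unigrams.items()}
    context_total = sum(seen.values())
    probs = {w: discounted_count(c, count_of_counts) / context_total
             for w, c in seen.items()}
    leftover = 1.0 - sum(probs.values())  # mass freed by discounting
    unseen_total = sum(c for w, c in unigrams.items() if w not in seen)
    if leftover > 0 and unseen_total > 0:
        # Share the leftover mass among unseen words by unigram frequency.
        for w, c in unigrams.items():
            if w not in seen:
                probs[w] = leftover * c / unseen_total
    return probs


def predict(model, phrase):
    """Guess the next word from the last word of the phrase."""
    prev = phrase.lower().split()[-1]
    probs = next_word_probs(model, prev)
    return max(probs, key=probs.get)


corpus = ("the cat sat on the mat the cat ate the fish "
          "the dog sat on the rug").split()
model = train(corpus)
print(predict(model, "I saw the"))
```

A real model would use longer n-grams (for example trigrams backing off to bigrams and then unigrams) and a much larger corpus; the bigram case is used here only because it keeps the backoff step easy to follow.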

To view the code, visit my GitHub repository. For a more detailed explanation of how the model works, see the pitch presentation on RPubs.