You will design and build a word predictor that takes some text as training data.
Provide an API that accepts a word as input and returns the most likely next word according to the training data.
The prediction model could be built with various heuristics, but initially it should be based on bigram frequency (how often each word directly follows another) and should be optimized for the fastest possible prediction time.
Examples:
Training data:
[["I", "am", "Sam"],
["Sam", "I", "am"],
["I", "like", "green", "eggs", "and", "ham"],
["My", "friends", "like", "green", "shirts"]
]
Predictions:
predict("Sam") => "I"
predict("I") => "am"
predict("like") => "green"
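These predictions follow directly from counting bigrams in the training data. The Python sketch below tallies every adjacent word pair and prints the counts for the three queried words. It assumes that pairs are formed only within a sentence, so the last word of one sentence is never paired with the first word of the next; the problem statement does not say either way.

```python
from collections import Counter

# Example training data from the problem statement: a list of sentences,
# each already tokenized into words.
training_data = [
    ["I", "am", "Sam"],
    ["Sam", "I", "am"],
    ["I", "like", "green", "eggs", "and", "ham"],
    ["My", "friends", "like", "green", "shirts"],
]

# Count each (word, next_word) pair, taking pairs only within a sentence
# (assumption: bigrams do not cross sentence boundaries).
bigram_counts = Counter(
    (current, nxt)
    for sentence in training_data
    for current, nxt in zip(sentence, sentence[1:])
)

# Print the successor counts for the words queried in the examples.
for (first, second), count in sorted(bigram_counts.items()):
    if first in ("Sam", "I", "like"):
        print(f"{first} -> {second}: {count}")
# I -> am: 2
# I -> like: 1
# Sam -> I: 1
# like -> green: 2
```

"am" follows "I" twice while "like" follows it once, which is why predict("I") returns "am".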
A natural approach is to precompute, during training, a hash map from each word to its most frequent successor, so that each prediction query is a single lookup (average O(1) time). The tradeoff is spending extra work at training time to make lookups fast later, which matches the requirement to optimize prediction latency.
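A minimal Python sketch of this precompute-then-lookup design follows. The class name `BigramPredictor` is hypothetical, and two behaviors are assumptions because the problem does not specify them: ties between equally frequent successors go to the one seen first in the training data, and a word with no recorded successor yields `None`.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


class BigramPredictor:
    """Hypothetical next-word predictor built from bigram frequencies.

    All counting happens in __init__, so predict() is a single dict lookup.
    """

    def __init__(self, sentences: List[List[str]]) -> None:
        # successor_counts[w][v] = number of times v directly follows w
        # within a sentence (assumption: bigrams do not cross sentences).
        successor_counts: Dict[str, Counter] = defaultdict(Counter)
        for sentence in sentences:
            for current, nxt in zip(sentence, sentence[1:]):
                successor_counts[current][nxt] += 1

        # Keep only the most frequent successor of each word. For equal counts,
        # Counter.most_common keeps the element encountered first, so the earliest
        # successor in the training data wins (assumed tie-break).
        self._best_next: Dict[str, str] = {
            word: counts.most_common(1)[0][0]
            for word, counts in successor_counts.items()
        }

    def predict(self, word: str) -> Optional[str]:
        # One hash lookup: average O(1). Returns None for words never seen
        # with a successor (assumed behavior for unknown input).
        return self._best_next.get(word)


if __name__ == "__main__":
    training_data = [
        ["I", "am", "Sam"],
        ["Sam", "I", "am"],
        ["I", "like", "green", "eggs", "and", "ham"],
        ["My", "friends", "like", "green", "shirts"],
    ]
    predictor = BigramPredictor(training_data)
    print(predictor.predict("Sam"))   # I
    print(predictor.predict("I"))     # am
    print(predictor.predict("like"))  # green
    print(predictor.predict("ham"))   # None
```

Storing only the winning successor keeps memory at one entry per distinct word and makes lookups as cheap as possible. The cost is flexibility: to add training data incrementally or switch to another heuristic later, the full `successor_counts` table would need to be kept instead of discarded.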