Google VO Interview Question Walkthrough: Word Predictor (Bigram Frequency Prediction)

You will design and build a word predictor. This word predictor will take some text as training data.

You need to provide an API which accepts a word as input and then outputs the most likely next word based on the training data.

The prediction model can be constructed with various different heuristics, but initially it should be built based on bigram frequency and should optimize for the fastest prediction time.

Examples:

Training data:

[
  ["I", "am", "Sam"],
  ["Sam", "I", "am"],
  ["I", "like", "green", "eggs", "and", "ham"],
  ["My", "friends", "like", "green", "shirts"]
]

Predictions:

predict("Sam") => "I"
predict("I") => "am"
predict("like") => "green"

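This problem asks you to build a next-word predictor from training text. The core idea is to count how often each bigram (a pair of adjacent words) occurs and, at query time, quickly return the word most likely to follow the given word. A typical implementation uses a hash table to precompute a mapping from each word to a frequency table of its successors. If the most frequent successor of each word is also cached during training, prediction becomes a single hash lookup, O(1) on average; without that cache, each query has to scan the word's successor table, which costs time proportional to the number of distinct successors. When several successors share the highest count, a deterministic choice is needed, either as the problem specifies or by a rule fixed in advance.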
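The approach described above can be sketched roughly as follows. This is a minimal illustration rather than a reference solution: the class name `WordPredictor`, the choice to return `None` for a word that has no recorded successor, and the tie-break rule (highest count first, then the lexicographically smallest word) are all assumptions, since the problem statement does not fix them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


class WordPredictor:
    """Bigram-frequency next-word predictor (hypothetical sketch)."""

    def __init__(self, sentences: Iterable[List[str]]) -> None:
        # counts[current][following] = number of times `following`
        # appears immediately after `current` within a sentence.
        counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        for sentence in sentences:
            for current, following in zip(sentence, sentence[1:]):
                counts[current][following] += 1

        # Cache the best successor of every word during training so that
        # predict() is a single dictionary lookup.
        self._best: Dict[str, str] = {}
        for current, successors in counts.items():
            # Assumed tie-break: highest count, then alphabetical order.
            self._best[current] = min(
                successors, key=lambda w: (-successors[w], w)
            )

    def predict(self, word: str) -> Optional[str]:
        # Assumption: a word with no recorded successor yields None.
        return self._best.get(word)


training_data = [
    ["I", "am", "Sam"],
    ["Sam", "I", "am"],
    ["I", "like", "green", "eggs", "and", "ham"],
    ["My", "friends", "like", "green", "shirts"],
]

predictor = WordPredictor(training_data)
print(predictor.predict("Sam"))   # I
print(predictor.predict("I"))     # am
print(predictor.predict("like"))  # green
```

Bigrams are counted within each sentence only, so the last word of one sentence is not paired with the first word of the next. All of the work happens in the constructor; if training data had to be added incrementally, the cached best successor would need to be updated whenever a count changes, which this sketch does not attempt.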
