McKinsey / VO 面试真题解析:Data Cleaning 字符串博弈

17次阅读
没有评论

In a data cleaning pipeline, engineers optimize text datasets by alternately removing substrings: Alex removes substrings with an odd number of vowels in the first step, followed by Chris removing substrings with an even number of vowels in the next step.

Alex always goes first, then the engineers take turns optimally until no valid substring remains. The goal is to determine which engineer removes the last substring.

Implement a function optimizeDataCleaning to determine the outcome of the cleaning process for each dataset, assuming alternating and optimal cleaning.

The function optimizeDataCleaning takes the following input:

  • string datasets[n]: each string represents a dataset

The function should return an array of strings specifying the result of the cleaning process for each dataset.

Note: Vowels are 'a', 'e', 'i', 'o', and 'u'.

Example

Given n = 2 and datasets = ["git", "dry"]

这道题本质上是一个字符串博弈:Alex 只能删除“元音数为奇数”的子串,Chris 只能删除“元音数为偶数”的子串,而且双方都采取最优策略,判断最终是谁删掉最后一个合法子串。解题时通常不需要真的模拟每一步删除,而是先分析哪些子串在当前规则下可删、谁能形成必胜态,再据此推导最终结果。关键点是字符串中元音分布对可操作子串的影响,并用博弈思维或数学归纳快速判断每个数据集的胜负。

正文完
 0