Amazon is developing a regex matching library for NLP use cases. A prototype regex matching implementation has the following requirements:
- The regex expression contains lowercase English letters,
.,(,), and*. .matches exactly one lowercase English letter.- A regular expression followed by
*matches zero or more occurrences of the regular expression. - If an expression is enclosed in parentheses
(and), it is treated as one expression and any*applies to the whole expression. - It is guaranteed that there is no nested parentheses in the given regex for the prototype.
For example:
- Regex
(ab)*dmatches stringsd,ababd,abd, but notabbd,abab. - Regex
ab*cdmatches stringsabbcd,ad,abd, but notabbd. - Regex
(a.b)*matches stringsa,aa,aaa,b,bbb, and many more, but notac,and, orbcd.
Given an array of strings, for each string determine whether the given regex matches the string or not, and report an array of strings "YES" or "NO" respectively.
This Amazon OA problem asks you to match multiple strings against a simplified regex that supports lowercase letters, wildcard <code>.</code>, repetition with <code>*</code>, and non-nested parenthesized groups. The key is to parse the pattern into repeatable units and then determine whether each target string can be fully matched. A dynamic programming approach is natural here, especially when handling zero-or-more repetition and grouped subexpressions correctly.