Design a system that supports the following operations:
- A user can create a new post and attach one or more hashtags to it.
- A user can search for posts using a boolean expression over hashtags, such as apple AND banana.
- Search results should be updated in near real time when new posts are added.
Functional requirements:
- When a user posts new content, hashtags should be indexed quickly.
- When a user searches, the system should return matching posts with low latency.
- New hashtags and new posts should become searchable as soon as possible.
Non-functional requirements:
- Low latency for reads and writes.
- High availability.
- Scalability for spikes in traffic and data volume.
API design:
- Create a new post:
POST /posts/{post_id} with the post content in the request body.
- Search by hashtag expression:
GET /posts?query=<boolean expression>
- Example:
GET /posts?query=<apple AND banana>
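Because the boolean expression contains spaces, a client would need to URL-encode it before sending the search request. The following is a minimal Python sketch of that step, assuming the angle brackets in the example are placeholder delimiters rather than part of the query. The helper names build_search_url and parse_search_url are hypothetical and not part of the problem statement.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(expression):
    """Return the search request path with the expression URL-encoded."""
    # urlencode escapes spaces as '+', so the expression survives as one parameter.
    return "/posts?" + urlencode({"query": expression})


def parse_search_url(url):
    """Recover the raw boolean expression from a search request path."""
    params = parse_qs(urlsplit(url).query)
    return params["query"][0]


url = build_search_url("apple AND banana")
print(url)                    # /posts?query=apple+AND+banana
print(parse_search_url(url))  # apple AND banana
```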
Design the system architecture, data model, and indexing strategy to support these requirements.
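The core of this problem is designing a highly available, scalable retrieval system that covers posting, hashtag search, and near-real-time updates. On the write path, a post first enters the post service and is then persisted and indexed asynchronously through a message queue. On the read path, the search service breaks the boolean expression down into its hashtags, uses an inverted index to fetch the candidate post IDs for each hashtag, and applies set operations to produce the final result. To handle large data volumes and traffic, the post table and the hashtag index can be stored separately, with the index sharded by hashtag. Hash-based sharding reduces natural skew, and hot hashtags can additionally be split dynamically or isolated on their own shards so that no single shard overheats. A monitoring system tracks hot keys and updates the routing / lookup table, so that new posts become searchable quickly and query latency stays stable.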
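The read path and sharding idea described above can be sketched in Python as follows. This is an in-memory illustration under several assumptions, not a production design: NUM_SHARDS, HOT_TAG_SPLITS, TagIndex, and search are hypothetical names, the hot-tag table is hard-coded where a monitoring system would maintain it, post IDs are assumed to be integers, and the query grammar is limited to AND and OR without parentheses (AND binds tighter). The message queue on the write path is omitted.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4                 # assumed shard count, for illustration only
HOT_TAG_SPLITS = {"news": 3}   # hypothetical lookup table: hot tag -> number of sub-shards


def shard_for(tag, sub):
    """Hash a (tag, sub-shard) pair to a shard id."""
    digest = hashlib.md5(f"{tag}#{sub}".encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


class TagIndex:
    """In-memory stand-in for a sharded inverted index: hashtag -> post ids."""

    def __init__(self):
        # shard id -> (tag, sub-shard) -> set of post ids
        self.shards = defaultdict(lambda: defaultdict(set))

    def add_post(self, post_id, tags):
        for tag in tags:
            tag = tag.lower()
            # A hot tag's posting list is spread over several sub-shards.
            sub = post_id % HOT_TAG_SPLITS.get(tag, 1)
            self.shards[shard_for(tag, sub)][(tag, sub)].add(post_id)

    def lookup(self, tag):
        """Fan out to every sub-shard of the tag and merge the posting lists."""
        result = set()
        for sub in range(HOT_TAG_SPLITS.get(tag, 1)):
            result |= self.shards[shard_for(tag, sub)].get((tag, sub), set())
        return result


def search(index, query):
    """Evaluate an OR of AND-clauses using set operations on posting lists."""
    result = set()
    for clause in query.split(" OR "):
        tags = [t.strip().lower() for t in clause.split(" AND ")]
        # Intersect starting from the smallest posting list.
        postings = sorted((index.lookup(t) for t in tags), key=len)
        result |= set.intersection(*postings)
    return result


index = TagIndex()
index.add_post(1, ["apple", "banana"])
index.add_post(2, ["apple"])
index.add_post(3, ["banana", "news"])
print(search(index, "apple AND banana"))  # {1}
```

Splitting a hot tag by post ID keeps writes for that tag from landing on a single shard, at the cost of a fan-out on reads. Whether that trade-off pays off depends on the read/write mix for the tag.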