Design a system that supports the following operations:
- A user can create a new post and attach one or more hashtags to it.
- A user can search for posts using a boolean expression over hashtags, such as apple AND banana.
- Search results should be updated in near real time when new posts are added.
Functional requirements:
- When a user posts new content, hashtags should be indexed quickly.
- When a user searches, the system should return matching posts with low latency.
- New hashtags and new posts should become searchable as soon as possible.
Non-functional requirements:
- Low latency for reads and writes.
- High availability.
- Scalability for spikes in traffic and data volume.
API design:
- Create a new post: POST /posts/{post_id} with the post content in the request body.
- Search by hashtag expression: GET /posts?query=<boolean expression>
  - Example: GET /posts?query=<apple AND banana>
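Because the boolean expression contains spaces, a client would likely need to URL-encode it before calling the search endpoint. The following is a minimal sketch using Python's standard library. The helper name `build_search_url` and the host `api.example.com` are hypothetical and not part of the specified API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(expression: str, base: str = "https://api.example.com") -> str:
    """Build the search URL for a boolean hashtag expression (hypothetical helper)."""
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return f"{base}/posts?{urlencode({'query': expression})}"


url = build_search_url("apple AND banana")

# A server can recover the original expression from the query string.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Here `url` ends in `/posts?query=apple+AND+banana`, and `decoded` is the original expression again.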
Design the system architecture, data model, and indexing strategy to support these requirements.
This problem is best approached as a real-time hashtag indexing and retrieval system. Write requests go through a post service and message queue so posts can be stored and indexed asynchronously, while reads use an inverted index to evaluate boolean hashtag queries efficiently. To handle scale and availability, split the post store and hashtag-to-post mapping across shards, use hash-based partitioning for more even distribution, and isolate hot hashtags with dynamic sharding when needed. A monitoring layer can detect popular keys and update the shard lookup table so new content becomes searchable quickly without overloading any single database shard.
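The indexing and routing ideas above can be sketched in a single process. This is an illustrative in-memory model rather than a definitive design: each "shard" is just a dictionary, the message queue and monitoring layer are omitted, and the names `HashtagIndex`, `isolate_hot_tag`, and `search` are hypothetical. The use of SHA-256 for partitioning, the default shard count, and the query grammar (AND binding tighter than OR, with parentheses) are assumptions that the problem statement does not fix.

```python
import hashlib
import re
from collections import defaultdict


class HashtagIndex:
    """Inverted index (hashtag -> post IDs) spread over hash-partitioned shards."""

    def __init__(self, num_shards=4):
        self.num_shards = num_shards  # shards used for hash partitioning
        self.shards = [defaultdict(set) for _ in range(num_shards)]
        self.hot_lookup = {}  # shard lookup table: hot hashtag -> dedicated shard

    def _shard_id(self, tag):
        # Hot hashtags are routed through the lookup table; all others by hash.
        if tag in self.hot_lookup:
            return self.hot_lookup[tag]
        digest = hashlib.sha256(tag.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def add_post(self, post_id, tags):
        for tag in tags:
            tag = tag.lower()
            self.shards[self._shard_id(tag)][tag].add(post_id)

    def isolate_hot_tag(self, tag):
        """Move a hot hashtag's postings to a new dedicated shard (assumed policy)."""
        tag = tag.lower()
        if tag not in self.hot_lookup:
            postings = self.shards[self._shard_id(tag)].pop(tag, set())
            self.shards.append(defaultdict(set))
            self.shards[-1][tag] = postings
            self.hot_lookup[tag] = len(self.shards) - 1
        return self.hot_lookup[tag]

    def postings(self, tag):
        tag = tag.lower()
        return set(self.shards[self._shard_id(tag)].get(tag, set()))


def search(index, query):
    """Evaluate a boolean hashtag query with a small recursive-descent parser."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "OR":
            pos += 1
            result = result | parse_and()  # union of posting sets
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "AND":
            pos += 1
            result = result & parse_term()  # intersection of posting sets
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {token!r}")
        return index.postings(token)

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


index = HashtagIndex()
index.add_post(1, ["apple", "banana"])
index.add_post(2, ["apple"])
index.add_post(3, ["banana", "cherry"])
index.isolate_hot_tag("apple")  # e.g. triggered once monitoring flags the tag
index.add_post(4, ["apple", "cherry"])
```

With this data, `search(index, "apple AND banana")` returns the set containing post 1, and posts written after the hot hashtag was isolated (post 4) are still found through the lookup table. In a real deployment, moving postings between shards would have to be coordinated with concurrent writes, which this sketch does not attempt.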