System Design Cheatsheet

Newsfeed

Overview

Requirements

Functional Requirements

  • Post multimedia content (text, image)
  • Browse the feed
  • Interact with content (like, comment)
  • Follow and unfollow users
  • Search for users and hashtags

Non-Functional Requirements

  • Low Latency
  • High Availability
  • Consistency
  • Scalability
  • Security and Privacy

System Architecture

Key Strategies

  • Fan-Out/Fan-In Strategy
  • Read-Your-Writes Consistency
  • Data replication across servers

Architectural Styles

  • Event-Driven Architecture
  • Microservices Structure

Design Patterns

Fan-Out/Fan-In Strategy

  • Push Model: Updates are directly pushed to all followers' feeds.
  • Pull Model: Followers fetch updates at read time from the accounts they follow.
  • Hybrid Model: Combines push for accounts with modest follower counts and pull for accounts with very large followings.

Event Sourcing Pattern

  • Store Events
  • Replay Events
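
A minimal sketch of the two operations above, assuming an in-memory append-only log. The `Event`, `EventStore`, and `like_counts` names and the event types are hypothetical, chosen only for illustration; a real system would persist the log durably.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Event:
    """An immutable fact about something that happened (hypothetical schema)."""
    type: str      # "POST_LIKED" or "POST_UNLIKED" in this sketch
    post_id: str
    user_id: str


@dataclass
class EventStore:
    """Append-only in-memory log standing in for a durable event store."""
    _log: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        # Store Events: entries are only ever appended, never edited in place.
        self._log.append(event)

    def replay(self) -> List[Event]:
        # Replay Events: hand back a copy of the history in recorded order.
        return list(self._log)


def like_counts(events: List[Event]) -> Dict[str, int]:
    """Rebuild per-post like counts by folding over the event history."""
    likers: Dict[str, Set[str]] = {}
    for event in events:
        users = likers.setdefault(event.post_id, set())
        if event.type == "POST_LIKED":
            users.add(event.user_id)
        elif event.type == "POST_UNLIKED":
            users.discard(event.user_id)
    return {post_id: len(users) for post_id, users in likers.items()}


store = EventStore()
store.append(Event("POST_LIKED", "p1", "alice"))
store.append(Event("POST_LIKED", "p1", "bob"))
store.append(Event("POST_UNLIKED", "p1", "alice"))
counts = like_counts(store.replay())
```

Because current state is derived from the log, a new read model (for example, likes per user) can be added later by replaying the same events through a different fold.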

Sharding Pattern

  • Shard by user region
  • Shard by activity level
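
One way the two shard keys above might combine, as a rough sketch: region picks a shard group, and unusually active users are routed to a dedicated shard. The shard names, the `HOT_POSTS_PER_DAY` cutoff, and `pick_shard` itself are assumptions for illustration, not part of the original design.

```python
import hashlib
from typing import Dict, List

# Hypothetical layout: each region owns a small group of shards.
REGION_SHARDS: Dict[str, List[str]] = {
    "us": ["us-0", "us-1"],
    "eu": ["eu-0", "eu-1"],
}
# Hypothetical cutoff: users posting at least this often get a dedicated shard.
HOT_POSTS_PER_DAY = 100


def pick_shard(user_id: str, region: str, posts_per_day: int) -> str:
    """Route a user to a shard by region first, then by activity level."""
    if region not in REGION_SHARDS:
        raise ValueError(f"unknown region: {region}")
    if posts_per_day >= HOT_POSTS_PER_DAY:
        # Shard by activity level: isolate heavy writers from everyone else.
        return f"{region}-hot"
    shards = REGION_SHARDS[region]
    # Stable hash so the same user always lands on the same shard.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

A user whose activity level crosses the cutoff would need their data migrated between shards, which this sketch does not address.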

System Architecture

1. Authentication Flow

  1. The client authenticates by requesting a token from the Token Server, which validates the credentials and issues a signed JWT.
  2. Using that token, the client uploads media to Media Storage. The system then processes the uploads, generates optimized thumbnails, and prepares the content for CDN distribution.
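
To make the token step concrete, here is a minimal sketch of issuing and verifying an HS256-signed JWT using only the standard library. The signing algorithm, the `SECRET` value, the claim set (`sub`, `exp`), and the function names are assumptions; the flow above only says a JWT is issued. In practice a maintained JWT library is the usual choice.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret-change-me"  # assumption: HMAC key held by the Token Server


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> bytes:
    return hmac.new(SECRET, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(user_id: str, ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Build header.payload.signature for an already-validated user."""
    now = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "exp": int(now) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part).encode("utf-8")) for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input))


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the signature and expiry check out, else None."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = _sign(f"{header_b64}.{payload_b64}")
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None  # malformed token
    if not isinstance(payload, dict) or payload.get("exp", 0) < now:
        return None
    return payload.get("sub")
```

Media Storage (or the API Gateway in front of it) would call something like `verify_token` before accepting an upload.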

2. Post Creation Process

  1. The client assembles the post content (text/media) and sends it through the API Gateway, which enforces rate limits and performs preliminary request validation.
  2. The Web Server performs thorough content validation and permission checks. A message queue then buffers accepted posts, smoothing traffic spikes and providing reliable delivery to downstream workers.
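
The gateway's rate limiting could be done several ways; a token bucket is one common choice and is sketched here. The algorithm, the `TokenBucket` name, and the capacity and refill numbers are assumptions, since the flow above does not specify them.

```python
import time
from typing import Optional


class TokenBucket:
    """Per-client bucket: each request spends one token; tokens refill over time."""

    def __init__(self, capacity: int, refill_per_second: float,
                 now: Optional[float] = None) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start full so short bursts are allowed
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Usage: allow bursts of 2 posts, refilling one token per second.
bucket = TokenBucket(capacity=2, refill_per_second=1.0, now=0.0)
results = [bucket.allow(now=0.0) for _ in range(3)]
```

A gateway would typically keep one bucket per user or API key and answer rejected requests with HTTP 429.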

3. Content Distribution Strategy

  1. Work Server queries User Relationship DB to determine follower relationships and distribution method.
  2. Fanout Strategy (< 10K followers): Immediately pushes new content to followers' Timeline Cache, enabling real-time updates.
  3. Pull Strategy (10K followers or more): Implements lazy loading where content is retrieved when followers access their feed, optimizing system resources.
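
The distribution steps above can be sketched in memory as follows. `FeedDistributor` and its dictionaries are hypothetical stand-ins for the Work Server, User Relationship DB, and Timeline Cache; accounts at or above the cutoff are treated as pull here, and ranking and timestamps are left out.

```python
from collections import defaultdict
from typing import Dict, List, Set


class FeedDistributor:
    """Hybrid fan-out: push for small accounts, pull for large ones."""

    def __init__(self, push_threshold: int = 10_000) -> None:
        self.push_threshold = push_threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)    # author -> followers
        self.following: Dict[str, Set[str]] = defaultdict(set)    # user -> authors
        self.timeline_cache: Dict[str, List[str]] = defaultdict(list)
        self.posts_by_author: Dict[str, List[str]] = defaultdict(list)

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def publish(self, author: str, post_id: str) -> None:
        self.posts_by_author[author].append(post_id)
        if len(self.followers[author]) < self.push_threshold:
            # Push: write the post id into every follower's cached timeline now.
            for follower in self.followers[author]:
                self.timeline_cache[follower].append(post_id)
        # Otherwise do nothing here; followers pull at read time.

    def read_feed(self, user: str) -> List[str]:
        feed = list(self.timeline_cache[user])  # pushed entries
        seen = set(feed)
        for author in sorted(self.following[user]):
            if len(self.followers[author]) >= self.push_threshold:
                # Pull: merge in posts from high-follower accounts lazily.
                for post_id in self.posts_by_author[author]:
                    if post_id not in seen:
                        seen.add(post_id)
                        feed.append(post_id)
        return feed


# Usage with a tiny threshold so both paths are exercised.
dist = FeedDistributor(push_threshold=2)
dist.follow("a", "small")
dist.follow("a", "big")
dist.follow("b", "big")
dist.publish("small", "s1")  # 1 follower  -> pushed
dist.publish("big", "b1")    # 2 followers -> pulled on read
```

The de-duplication in `read_feed` covers accounts that cross the threshold after some of their posts were already pushed.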

Database Design

API Design

from __future__ import annotations

from typing import List
from uuid import UUID

# Post, File, validate_user, validate_content, and the db, cache, timeline_cache,
# media_service, message_queue, and ranking_service clients are assumed to be
# provided by the surrounding application.


class NewsFeedAPI:
    def create_post(self, user_id: UUID, content: str, media_files: List[File]) -> Post:
        """Create a new post with optional media attachments"""
        # 1. Validate user and content
        validate_user(user_id)
        validate_content(content)
        # 2. Upload media files to CDN/Storage
        media_urls = media_service.upload_files(media_files)
        # 3. Create post in database
        post = Post(
            user_id=user_id,
            content=content,
            media_urls=media_urls,
        )
        db.posts.insert(post)
        # 4. Publish event to message queue for async processing
        event = {
            "type": "NEW_POST",
            "post_id": post.id,
            "user_id": user_id,
        }
        message_queue.publish("post_events", event)
        return post

    def get_newsfeed(self, user_id: UUID, page: int, limit: int) -> List[Post]:
        """Get user's newsfeed with pagination"""
        # 1. Try to get from cache first; the key includes limit so that
        #    different page sizes do not collide on the same entry
        cache_key = f"feed:{user_id}:{page}:{limit}"
        cached_feed = cache.get(cache_key)
        if cached_feed is not None:  # an empty feed is still a valid cache hit
            return cached_feed
        # 2. Get timeline entries from cache/database
        timeline_entries = timeline_cache.get_entries(
            user_id=user_id,
            offset=page * limit,
            limit=limit,
        )
        # 3. Fetch full post data
        post_ids = [entry.post_id for entry in timeline_entries]
        posts = db.posts.find({"id": {"$in": post_ids}})
        # 4. Apply ranking algorithm
        ranked_posts = ranking_service.rank_posts(posts, user_id)
        # 5. Cache the results
        cache.set(cache_key, ranked_posts, expire=300)  # 5 minutes
        return ranked_posts

Interview Questions

How would you ensure low latency for instant updates across a globally distributed user base?

  • Implement multi-region CDNs with edge caching for static content and frequently accessed posts
  • Use multi-level caching strategy (browser, CDN, application, database) with Redis/Memcached
  • Shard databases by user_id with read replicas for heavy read operations

How would you guarantee consistency in the news feed across multiple devices?

  • Implement version vectors or logical clocks to track update sequence across devices
  • Use optimistic concurrency control with last-write-wins for conflict resolution
  • Maintain a central source of truth with event sourcing for state reconstruction
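
As an illustration of the version-vector idea above, here is a small sketch that compares two device states and merges them. The device names and the `compare`/`merge` helpers are hypothetical.

```python
from typing import Dict

# device id -> number of updates seen from that device
VersionVector = Dict[str, int]


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent' for a relative to b."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b has seen everything a has, and more
    if b_le_a:
        return "after"
    return "concurrent"      # each side has updates the other lacks: a conflict


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Element-wise maximum: the vector after both histories are reconciled."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}


phone = {"phone": 2, "laptop": 1}
laptop = {"phone": 1, "laptop": 2}
relation = compare(phone, laptop)
merged = merge(phone, laptop)
```

Only the "concurrent" case needs a conflict-resolution policy such as last-write-wins; "before" and "after" can simply take the newer state.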

How would you design a system to identify and display trending hashtags in real-time?

  • Use Apache Kafka/Kinesis for real-time stream processing of hashtag usage
  • Implement sliding window counters with Redis for temporal trending analysis
  • Apply decay factors to older data for recency-biased trending algorithms
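
A rough in-memory sketch of sliding-window counting with decay, standing in for the Redis counters mentioned above. The bucket size, window length, decay factor, and the `TrendingTracker` class are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BUCKET_SECONDS = 60   # assumption: one counter bucket per minute
WINDOW_BUCKETS = 60   # assumption: keep the most recent hour
DECAY = 0.9           # assumption: each minute of age reduces a bucket's weight by 10%


class TrendingTracker:
    def __init__(self) -> None:
        # bucket index -> hashtag -> count
        self._buckets: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def record(self, hashtag: str, timestamp: float) -> None:
        self._buckets[int(timestamp // BUCKET_SECONDS)][hashtag] += 1

    def top(self, now: float, k: int = 3) -> List[Tuple[str, float]]:
        current = int(now // BUCKET_SECONDS)
        # Slide the window: drop buckets older than WINDOW_BUCKETS.
        for index in [i for i in self._buckets if i <= current - WINDOW_BUCKETS]:
            del self._buckets[index]
        scores: Dict[str, float] = defaultdict(float)
        for index, counts in self._buckets.items():
            age = current - index
            if age < 0:
                continue  # ignore buckets stamped in the future
            weight = DECAY ** age  # recent buckets count more than older ones
            for tag, count in counts.items():
                scores[tag] += count * weight
        # Highest score first; ties broken alphabetically for stable output.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


tracker = TrendingTracker()
for _ in range(5):
    tracker.record("#old", timestamp=0)
for _ in range(3):
    tracker.record("#new", timestamp=600)
ranking = tracker.top(now=600, k=2)
```

In this example "#new" outranks "#old" despite fewer mentions, because the older bucket has decayed for ten minutes.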

How would you track and display real-time analytics for trending content?

  • Use stream processing (Apache Flink/Storm) for real-time event aggregation
  • Implement counter sharding with eventual consistency for high-volume metrics
  • Maintain pre-aggregated statistics in Redis with periodic persistence to database
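
Counter sharding can be sketched as below: writes land on a randomly chosen shard so that no single key takes every increment, and reads sum the shards. The `ShardedCounter` class is hypothetical, and a list stands in for separate Redis keys or database rows.

```python
import random
from typing import List, Optional


class ShardedCounter:
    """Spread increments over several sub-counters to avoid one hot key."""

    def __init__(self, num_shards: int = 8, rng: Optional[random.Random] = None) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self._shards: List[int] = [0] * num_shards
        self._rng = rng or random.Random()

    def increment(self, amount: int = 1) -> None:
        # A random shard per write keeps concurrent writers from contending.
        self._shards[self._rng.randrange(len(self._shards))] += amount

    def value(self) -> int:
        # Reads pay the cost instead: the total is the sum over all shards.
        return sum(self._shards)


likes = ShardedCounter(num_shards=4, rng=random.Random(0))
for _ in range(1000):
    likes.increment()
total = likes.value()
```

In a distributed deployment the summed value may briefly lag behind in-flight writes, which is the eventual consistency trade-off named above.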

How would you design a system to deliver real-time notifications to millions of users?

  • Use WebSocket connections with connection pooling for persistent connections
  • Implement message queues (RabbitMQ/Kafka) for reliable message delivery
  • Deploy notification service across multiple regions with local presence

How would you handle data hotspots from viral content?

  • Implement adaptive caching with automatic promotion of viral content
  • Use rate limiting and request throttling based on user/content popularity
  • Deploy dynamic read replicas for hot data partitions

How would you ensure data privacy and compliance with regulations?

  • Implement end-to-end encryption for sensitive data with proper key management
  • Use role-based access control with detailed audit logging of all data access
  • Deploy data retention policies with automated deletion/anonymization workflows