Skip to main content
SDMastery
easy7 min readUpdated 2026-06-03

Design Autocomplete for Search

Design a search autocomplete/typeahead system using Trie data structures, prefix matching, and ranking.

Design Autocomplete for Search system design overview showing key components and metrics
High-level overview of Design Autocomplete for Search

Problem Statement

Design a search autocomplete (typeahead) system that returns the top suggestions as a user types each character. The system must serve suggestions with sub-100ms latency for billions of queries per day, ranking results by popularity, recency, and personalization.

Requirements

Design Autocomplete for Search system architecture with service components and data flow
System architecture for Design Autocomplete for Search

Functional

  • Return top-k (e.g., 10) prefix-matched suggestions as the user types
  • Rank suggestions by popularity, freshness, and optionally user history
  • Support multi-language queries and handle typos gracefully
  • Update suggestion corpus as new trending queries emerge

Non-Functional

  • Latency: <100ms p99 per keystroke
  • Availability: 99.99% uptime -- degraded suggestions are better than none
  • Scale: 500M daily active users, ~10B queries/day, peak 200K QPS

Core Architecture

Step-by-step diagram showing how Design Autocomplete for Search works in practice
How Design Autocomplete for Search works step by step
  1. Trie Service -- In-memory distributed Trie storing top-k suggestions at each prefix node. Each node holds a sorted list of up to 25 candidates with their scores. Tries are partitioned by first two characters across nodes.

  2. Ranking Aggregator -- Offline MapReduce pipeline that processes query logs every 15 minutes. Computes popularity scores using exponential time decay: score = count * 0.95^(hours_since_query). Rebuilds Trie snapshots.

  3. Sampling Collector -- Logs a sampled subset (1-in-1000) of all queries to avoid overwhelming storage. Feeds into the Ranking Aggregator. Stored in append-only log (Kafka topic).

Comparison table for Design Autocomplete for Search showing key metrics and tradeoffs
Comparing key aspects of Design Autocomplete for Search
  1. Zookeeper Coordination Layer -- Manages Trie partition assignment across nodes. Handles leader election for Trie rebuild. Coordinates rolling Trie snapshot deployments.

  2. CDN/Edge Cache -- Most popular prefixes (a, th, the, wh, ...) are cached at CDN edge nodes with 5-minute TTL. Handles ~30% of all requests without reaching the Trie service.

Database Choice

Data flow diagram for Design Autocomplete for Search showing request and response paths
Data flow through Design Autocomplete for Search

Trie storage: In-memory data structure (custom or Redis-based) for serving. The Trie is rebuilt offline and serialized as a binary snapshot loaded at startup. Raw query logs are stored in HDFS/S3 for batch processing. Metadata (query-to-score mappings) lives in Cassandra for its write-heavy, append-friendly nature.

Key API Endpoints

text
GET /api/v1/suggestions?prefix=\{query\}&limit=10&lang=en
  -> Returns: \{ suggestions: ["search term 1", "search term 2", ...], latency_ms: 12 \}

POST /api/v1/queries (internal - from sampling collector)
  -> Body: \{ query: "full search term", timestamp: 1700000000, user_id?: "abc" \}

Scaling Insight

Interview tips for Design Autocomplete for Search system design questions
Interview tips for Design Autocomplete for Search

The key scaling trick is two-level caching with prefix partitioning. Short prefixes (1-2 chars) generate massive fan-out and are cached at the CDN edge. Longer prefixes hit the Trie service, which is partitioned by the first two characters of the prefix. This means "go" and "goo" and "google" all hit the same Trie partition. With 26^2 = 676 possible two-char prefixes, you can distribute across ~100 machines with good load balance. Hot prefixes get additional replicas.

Key Tradeoffs

DecisionOption AOption BChosen
Data StructureTrie (prefix tree)Inverted indexTrie -- O(L) lookup where L=prefix length, naturally supports prefix matching
Update FrequencyReal-time per queryBatch every 15 minBatch -- simpler, avoids write contention on Trie nodes, 15-min lag acceptable
RankingGlobal popularity onlyPersonalized per userHybrid -- global Trie + client-side re-ranking with user history for top results

Practical Implementation for .NET Developers

Decision guide showing when to use Design Autocomplete for Search and when to avoid
When to use Design Autocomplete for Search

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Pros and cons analysis of Design Autocomplete for Search for system design decisions
Advantages and disadvantages of Design Autocomplete for Search

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

text
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
Real-world companies using Design Autocomplete for Search in production systems
Real-world examples of Design Autocomplete for Search

This gives you searchable, structured logs in Azure Monitor or Seq.

System-Specific Clarifying Questions

Key components of Design Autocomplete for Search with roles and responsibilities
Key components of Design Autocomplete for Search

Before designing Autocomplete, ask questions specific to THIS system:

  1. Who are the primary users? Understanding the user base shapes every technical decision — consumer apps have different requirements than enterprise B2B systems.
  2. What is the read-to-write ratio? This determines whether you optimize for fast reads (caching, denormalization) or fast writes (write-ahead logs, async processing).
  3. What is the geographic distribution? Users in one country vs. global users fundamentally changes your data replication and CDN strategy.
  4. What is the acceptable latency? Some features need sub-100ms responses, others can tolerate seconds. This determines your caching and architecture strategy.
  5. What is the consistency requirement? Some data (payments, inventory) needs strong consistency. Other data (social feeds, recommendations) can be eventually consistent.

Architecture Deep Dive

The architecture for Autocomplete should be designed around the specific access patterns of the system. Do not apply generic templates — every system has unique hotspots, bottlenecks, and scaling challenges.

Write Path: How does data enter the system? Is it bursty (event-driven, flash sales) or steady (sensor data, logs)? Bursty writes need queuing and backpressure. Steady writes can go directly to the database.

Read Path: How is data consumed? Is it fan-out (one write, many reads like social feeds) or point lookups (one read for specific data like user profiles)? Fan-out reads benefit from pre-computation and caching. Point lookups benefit from efficient indexing.

Hot Spots: Where are the bottlenecks? For Autocomplete, identify the component that will fail first under load and design mitigation strategies: caching, sharding, rate limiting, or async processing.

Sources