easy7 min readUpdated 2026-06-08

Design Autocomplete for Search

Design a search autocomplete/typeahead system using Trie data structures, prefix matching, and ranking.

High-level overview of Design Autocomplete for Search

Problem Statement

Design a search autocomplete (typeahead) system that returns the top suggestions as a user types each character. The system must serve suggestions with sub-100ms latency for billions of queries per day, ranking results by popularity, recency, and personalization.

Requirements

System architecture for Design Autocomplete for Search

Functional

Return top-k (e.g., 10) prefix-matched suggestions as the user types
Rank suggestions by popularity, freshness, and optionally user history
Support multi-language queries and handle typos gracefully
Update suggestion corpus as new trending queries emerge

Non-Functional

Latency: <100ms p99 per keystroke
Availability: 99.99% uptime -- degraded suggestions are better than none
Scale: 500M daily active users, ~10B queries/day, peak 200K QPS

Core Architecture

How Design Autocomplete for Search works step by step

Trie Service -- In-memory distributed Trie storing top-k suggestions at each prefix node. Each node holds a sorted list of up to 25 candidates with their scores. Tries are partitioned by first two characters across nodes.
Ranking Aggregator -- Offline MapReduce pipeline that processes query logs every 15 minutes. Computes popularity scores using exponential time decay: score = count * 0.95^(hours_since_query). Rebuilds Trie snapshots.
Sampling Collector -- Logs a sampled subset (1-in-1000) of all queries to avoid overwhelming storage. Feeds into the Ranking Aggregator. Stored in append-only log (Kafka topic).

Comparing key aspects of Design Autocomplete for Search

Zookeeper Coordination Layer -- Manages Trie partition assignment across nodes. Handles leader election for Trie rebuild. Coordinates rolling Trie snapshot deployments.
CDN/Edge Cache -- Most popular prefixes (a, th, the, wh, ...) are cached at CDN edge nodes with 5-minute TTL. Handles ~30% of all requests without reaching the Trie service.

Database Choice

Data flow through Design Autocomplete for Search

Trie storage: In-memory data structure (custom or Redis-based) for serving. The Trie is rebuilt offline and serialized as a binary snapshot loaded at startup. Raw query logs are stored in HDFS/S3 for batch processing. Metadata (query-to-score mappings) lives in Cassandra for its write-heavy, append-friendly nature.

Key API Endpoints

text

GET /api/v1/suggestions?prefix=\{query\}&limit=10&lang=en
  -> Returns: \{ suggestions: ["search term 1", "search term 2", ...], latency_ms: 12 \}

POST /api/v1/queries (internal - from sampling collector)
  -> Body: \{ query: "full search term", timestamp: 1700000000, user_id?: "abc" \}

Scaling Insight

Interview tips for Design Autocomplete for Search

The key scaling trick is two-level caching with prefix partitioning. Short prefixes (1-2 chars) generate massive fan-out and are cached at the CDN edge. Longer prefixes hit the Trie service, which is partitioned by the first two characters of the prefix. This means "go" and "goo" and "google" all hit the same Trie partition. With 26^2 = 676 possible two-char prefixes, you can distribute across ~100 machines with good load balance. Hot prefixes get additional replicas.

Key Tradeoffs

Decision	Option A	Option B	Chosen
Data Structure	Trie (prefix tree)	Inverted index	Trie -- O(L) lookup where L=prefix length, naturally supports prefix matching
Update Frequency	Real-time per query	Batch every 15 min	Batch -- simpler, avoids write contention on Trie nodes, 15-min lag acceptable
Ranking	Global popularity only	Personalized per user	Hybrid -- global Trie + client-side re-ranking with user history for top results

Practical Implementation for .NET Developers

When to use Design Autocomplete for Search

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Advantages and disadvantages of Design Autocomplete for Search

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

text

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

Real-world examples of Design Autocomplete for Search

This gives you searchable, structured logs in Azure Monitor or Seq.

System-Specific Clarifying Questions

Key components of Design Autocomplete for Search

Before designing Autocomplete, ask questions specific to THIS system:

Who are the primary users? Understanding the user base shapes every technical decision — consumer apps have different requirements than enterprise B2B systems.
What is the read-to-write ratio? This determines whether you optimize for fast reads (caching, denormalization) or fast writes (write-ahead logs, async processing).
What is the geographic distribution? Users in one country vs. global users fundamentally changes your data replication and CDN strategy.
What is the acceptable latency? Some features need sub-100ms responses, others can tolerate seconds. This determines your caching and architecture strategy.
What is the consistency requirement? Some data (payments, inventory) needs strong consistency. Other data (social feeds, recommendations) can be eventually consistent.

Architecture Deep Dive

The architecture for Autocomplete should be designed around the specific access patterns of the system. Do not apply generic templates — every system has unique hotspots, bottlenecks, and scaling challenges.

Write Path: How does data enter the system? Is it bursty (event-driven, flash sales) or steady (sensor data, logs)? Bursty writes need queuing and backpressure. Steady writes can go directly to the database.

Read Path: How is data consumed? Is it fan-out (one write, many reads like social feeds) or point lookups (one read for specific data like user profiles)? Fan-out reads benefit from pre-computation and caching. Point lookups benefit from efficient indexing.

Hot Spots: Where are the bottlenecks? For Autocomplete, identify the component that will fail first under load and design mitigation strategies: caching, sharding, rate limiting, or async processing.

Sources

Reference

Reference Solutionarticle