
Top-k Video Similarity Search - Google ML Coding Interview

Topics:
Graph Algorithms
Pathfinding / Shortest-Path Transforms
Distributed Systems (remote adjacency)
Roles:
Machine Learning Engineer
Data Scientist
Experience:
Mid Level
Senior

Question Description

You are asked to find the top-k most similar videos to a given source in a directed, weighted video-similarity graph where similarity composes multiplicatively along paths (path weight = product of edge weights). Each edge weight is a float (typically in [0,1]) and cycles may exist; the similarity from s to t is the maximum product over all paths from s to t.

In the coding stage you will implement top_k_similar(graph, source_id, k) that returns up to k (video_id, similarity) pairs ordered by decreasing similarity. Expect to discuss numerical stability (products of many small floats), how you handle zero-weight edges, and how to make the search efficient on very large graphs (millions of nodes) when you only need the top-k results.
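A minimal local sketch of this function, assuming the graph is an in-memory adjacency dict of `{node: [(neighbor, weight)]}` with weights in (0, 1]: transform each weight w to -log(w) so maximizing a product becomes minimizing a sum of non-negative costs, then run a Dijkstra-style best-first search that finalizes nodes in order of decreasing similarity and stops after k results.

```python
import heapq
import math

def top_k_similar(graph, source_id, k):
    """Return up to k (video_id, similarity) pairs, most similar first.

    graph: dict mapping node -> list of (neighbor, weight), weights in (0, 1].
    Zero-weight edges are skipped outright (they can never improve a path).
    Works on the -log transform: maximizing a product of weights in (0, 1]
    equals minimizing a sum of non-negative costs, so a Dijkstra-style
    search finalizes nodes in strictly non-increasing similarity order
    and remains correct in the presence of cycles.
    """
    heap = [(0.0, source_id)]  # (cost, node); cost = -sum(log w) on best known path
    finalized = set()
    results = []
    while heap and len(results) < k:
        cost, node = heapq.heappop(heap)
        if node in finalized:
            continue  # stale heap entry; a cheaper path already finalized this node
        finalized.add(node)
        if node != source_id:  # the source itself is not a result
            results.append((node, math.exp(-cost)))
        for neighbor, weight in graph.get(node, []):
            if weight <= 0 or neighbor in finalized:
                continue
            heapq.heappush(heap, (cost - math.log(weight), neighbor))
    return results
```

For example, on `{'s': [('a', 0.9), ('b', 0.5)], 'a': [('b', 0.8)]}`, `top_k_similar(graph, 's', 2)` returns a at 0.9 and b at 0.72, since the indirect path s→a→b (0.9 × 0.8) beats the direct edge (0.5).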

The interview typically flows in two parts: (1) local algorithm plus complexity and edge-case analysis — show that you can transform multiplicative scores (e.g., via a -log transform) into a shortest-path-style best-first search, and explain why the result remains correct in the presence of cycles; (2) system and remote-data awareness — describe how you would adapt the solution when adjacency lists are sharded and must be fetched over RPC. For the non-local stage, expect to specify the required interfaces (for example, fetch_neighbors(node_id) -> List[(neighbor, weight)]), a caching budget, timeout and retry behavior, and the trade-offs among batching RPCs, memory limits, and result latency.
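For the remote stage, one illustrative way to wrap the fetch_neighbors interface is a client that memoizes hot nodes and retries timed-out RPCs with exponential backoff. Here `rpc_fetch` stands in for the real RPC call, and the cache size, retry count, and backoff are placeholder budgets to be tuned per deployment, not prescribed values:

```python
import functools
import time

class RemoteAdjacency:
    """Illustrative client for sharded adjacency lists fetched over RPC.

    rpc_fetch(node_id) -> List[(neighbor, weight)] stands in for the real
    RPC; cache size, retry count, and backoff are placeholder budgets.
    """

    def __init__(self, rpc_fetch, cache_size=100_000, retries=2, backoff_s=0.05):
        self._rpc_fetch = rpc_fetch
        self._retries = retries
        self._backoff_s = backoff_s
        # Memoize hot nodes so repeated expansions don't re-issue RPCs.
        self.fetch_neighbors = functools.lru_cache(maxsize=cache_size)(
            self._fetch_with_retry
        )

    def _fetch_with_retry(self, node_id):
        for attempt in range(self._retries + 1):
            try:
                return tuple(self._rpc_fetch(node_id))
            except TimeoutError:
                if attempt == self._retries:
                    return ()  # degrade gracefully: treat the node as a leaf
                time.sleep(self._backoff_s * (2 ** attempt))  # exponential backoff
```

Returning an empty tuple on exhausted retries trades completeness for bounded latency, which is exactly the kind of trade-off the interviewer will want stated explicitly.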

Skills signaled: graph algorithms (max-product / log-transform + best-first search), numerical stability and precision, large-scale search optimization, and distributed/remote data access patterns (API contracts, caching, failure modes). Use precise complexity and operational constraints when you explain your design.

Common Follow-up Questions

  • How does your approach change if edge weights can exceed 1 (so cycles could amplify similarity)? Explain correctness and termination.
  • What numerical issues arise from multiplying many small floats and how would you address them (e.g., log-transform, underflow, precision)?
  • How would you update top-k results efficiently if edges are added/removed frequently (dynamic graph / incremental updates)?
  • Design a shard-aware coordination strategy to compute global top-k with minimal RPCs and bounded latency — what state must each shard return?
  • If you can only return approximate top-k under strict latency, how would you provide quality/latency trade-offs and error bounds?
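On the underflow follow-up: a product of a few hundred sub-unity weights silently collapses to 0.0 in float64 (the smallest positive double is near 1e-308), destroying all ordering information, while the equivalent log-sum stays comfortably representable. A minimal demonstration:

```python
import math

weights = [0.1] * 400  # 400 edges, each with weight 0.1

product = 1.0
for w in weights:
    product *= w       # 1e-400 underflows float64 to exactly 0.0

log_score = sum(math.log(w) for w in weights)  # 400 * ln(0.1) ~= -921.03

print(product)    # 0.0 -- all ordering information lost
print(log_score)  # finite, so scores remain comparable
```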

Related Questions

1. Find the maximum-probability path between two nodes (max-product path) in a weighted directed graph
2. Implement top-k nearest neighbors for high-dimensional video embeddings (ANN approaches vs exact search)
3. Design a distributed graph traversal that returns top-k reachable nodes with sharded adjacency lists
4. Compute personalized PageRank / truncated random-walk scores for top-k recommendations
5. How to adapt Dijkstra/A* variants for multiplicative edge weights and large-scale graphs
