coding
ByteDance
TikTok
Google

ByteDance Coding Question: Merge Accounts by Email

Topics:
Union-Find
Hash Table
Strings
Roles:
Software Engineer
Backend Engineer
Full Stack Engineer
Experience:
Entry Level
Mid Level
Senior

Question Description

You are given a list of accounts where each account is a list: the first element is a person's name and the rest are their emails. Your task is to merge accounts that belong to the same person — two accounts belong to the same person if they share at least one email — and return each merged account with the name followed by the account's unique, lexicographically sorted emails.

How the interview will flow

  • Clarify requirements first: confirm whether names can differ for the same email, whether input contains duplicates, and expected output ordering constraints.
  • Propose approaches: a Union-Find (disjoint set) that unions emails within the same account, or build an undirected graph with emails as nodes and run DFS/BFS to gather connected components. Use a hash map to map each email to a name and to map email to its DSU parent or adjacency list.
  • Implement: focus on correct data structures, deduplication, sorting, and robust edge-case handling (single-email accounts, identical accounts, empty input).
  • Analyze complexity: time dominated by union/find operations and sorting of emails per component; space for hash maps and DSU arrays.

Skill signals you should demonstrate

  • Strong use of Hash Tables and Union-Find (or graph traversal) and correct string handling
  • Ability to reason about complexity and memory trade-offs
  • Attention to edge cases and output formatting (unique and sorted emails)

This question tests practical data-structure design and clean implementation skills commonly asked in coding screens and onsite rounds.

Common Follow-up Questions

  • How would you modify your solution if two accounts with the same email have different names? Which name do you keep and how do you decide?
  • Implement the merge using a graph + DFS instead of Union-Find — compare time, space, and implementation complexity between the two.
  • How do you scale this solution for millions of accounts and emails that don't fit in memory? Describe an external-memory or distributed approach.
  • If emails can be normalized (e.g., ignore dots in Gmail local part), where in your pipeline do you apply normalization and how does it affect deduplication?

Related Questions

1Find connected components in an undirected graph (using Union-Find/DFS)
2Find Duplicate File in System — group files by identical content
3Design an email deduplication system with normalization rules
4Accounts Merge variant: merge by phone number or mixed identifiers

Explore More Questions

Practice This Question with AI

Get real-time hints, detailed requirements, and insightful analysis of the question.

Merge Accounts by Email — ByteDance Coding Interview Prep | Voker