ByteDance Coding Question: Merge Accounts by Email

Question Description

You are given a list of accounts where each account is a list: the first element is a person's name and the rest are their emails. Your task is to merge accounts that belong to the same person — two accounts belong to the same person if they share at least one email — and return each merged account with the name followed by the account's unique, lexicographically sorted emails.

How the interview will flow

  • Clarify requirements first: confirm whether names can differ for the same email, whether input contains duplicates, and expected output ordering constraints.
  • Propose approaches: a Union-Find (disjoint set) that unions emails within the same account, or build an undirected graph with emails as nodes and run DFS/BFS to gather connected components. Use a hash map to map each email to a name and to map email to its DSU parent or adjacency list.
  • Implement: focus on correct data structures, deduplication, sorting, and robust edge-case handling (single-email accounts, identical accounts, empty input).
  • Analyze complexity: time dominated by union/find operations and sorting of emails per component; space for hash maps and DSU arrays.

Skill signals you should demonstrate

  • Strong use of Hash Tables and Union-Find (or graph traversal) and correct string handling
  • Ability to reason about complexity and memory trade-offs
  • Attention to edge cases and output formatting (unique and sorted emails)

This question tests practical data-structure design and clean implementation skills commonly asked in coding screens and onsite rounds.

Common Follow-up Questions

  • How would you modify your solution if two accounts with the same email have different names? Which name do you keep and how do you decide?
  • Implement the merge using a graph + DFS instead of Union-Find — compare time, space, and implementation complexity between the two.
  • How do you scale this solution for millions of accounts and emails that don't fit in memory? Describe an external-memory or distributed approach.
  • If emails can be normalized (e.g., ignore dots in Gmail local part), where in your pipeline do you apply normalization and how does it affect deduplication?

Related Questions

1LinkedIn T9 Coding Question: Phone-Keyboard Word Match
2Adobe Coding Question: Shortest Subarray with K Distinct
3Cisco Coding Interview: Look-and-Say Next Term
4Google Coding Question: Online Longest Subarray with Average
5LRU Cache Implementation - Oracle Coding Interview
6Find connected components in an undirected graph (using Union-Find/DFS)
7Find Duplicate File in System — group files by identical content
8Design an email deduplication system with normalization rules
9Accounts Merge variant: merge by phone number or mixed identifiers

Explore More Questions

Practice This Question with AI

Get real-time hints, detailed requirements, and insightful analysis of the question.

Merge Accounts by Email — ByteDance Coding Interview Prep | Voker