← ~/lc-grind/posts

#169 - Majority Element

September 9, 2025

Majority Element

Problem Description

Given an array nums of size n, return the majority element.

The majority element is the element that appears more than ⌊n / 2⌋ times. You may assume that the majority element always exists in the array.

Example 1:

Input: nums = [3,2,3]
Output: 3

Example 2:

Input: nums = [2,2,1,1,1,2,2]
Output: 2

Constraints:

  • n == nums.length
  • 1 <= n <= 5 * 104
  • -109 <= nums[i] <= 109

Follow-up: Could you solve the problem in linear time and in O(1) space?

Solution

1. Problem Deconstruction

Technical Restatement:
Given an array nums of length n, identify the element x such that the frequency of x in nums strictly exceeds n/2\lfloor n/2 \rfloor. The solution must efficiently determine this element under the guarantee of its existence, optimizing for time and space complexity.

Beginner-Friendly Explanation:
Imagine you have a list of numbers. One number appears in the list more than half the time. Your task is to find that number. For example, if the list has 7 numbers, the winning number must appear at least 4 times. The challenge is to find it as quickly as possible without using too much computer memory.

Mathematical Formulation:
Let SS be a multiset of size nn, and let f(x)f(x) denote the frequency of element xx in SS. The majority element mm satisfies:

f(m)>n2f(m) > \left\lfloor \frac{n}{2} \right\rfloor

This can be equivalently expressed as:

mSsuch thati=0n11nums[i]=m>n2\exists m \in S \quad \text{such that} \quad \sum_{i=0}^{n-1} \mathbb{1}_{nums[i] = m} > \frac{n}{2}

where 1\mathbb{1} is the indicator function.

Constraint Analysis:

  • n == nums.length: The input size is explicitly defined, allowing precise complexity calculations.
  • 1 <= n <= 5 * 10^4:
    • Limitation: Rules out O(n²) solutions (e.g., brute force checking all pairs would require ~1.25e9 operations worst-case, which is computationally infeasible).
    • Edge Case: Arrays of size 1 must return the single element immediately.
  • -10^9 <= nums[i] <= 10^9:
    • Limitation: Values can be very large or negative, making value-based indexing (e.g., using the number as a direct array index) impossible. Hash maps must be used instead.
    • Edge Case: Negative numbers must be handled correctly in hash functions and comparisons.

2. Intuition Scaffolding

Analogies:

  1. Real-World Metaphor (Election):
    Think of the array as votes in an election. The majority element is the candidate who receives more than half the votes. The Boyer-Moore algorithm is like a ballot counter who notes a candidate’s lead, canceling out opposing votes pairwise.

  2. Gaming Analogy (Team Selection):
    In a game where players choose factions, the majority faction is the one with over half the players. You can simulate a “battle” where two players from different factions cancel each other. The last faction standing is the majority.

  3. Math Analogy (Frequency Dominance):
    The majority element’s frequency minus the sum of all other frequencies is at least 1. This imbalance allows a linear cancellation process to correctly identify the element.

Common Pitfalls:

  1. Sorting Misinterpretation: Assuming the median is always the majority element. While true, sorting is O(n log n), which is suboptimal.
  2. Hash Map Overcomplication: Using a hash map counts all frequencies, which is O(n) time but also O(n) space, missing the O(1) space challenge.
  3. Partial Cancellation Errors: Incorrectly implementing the cancellation logic by not resetting the count properly when candidates change.
  4. Unverified Candidates: In variants where the majority might not exist, failing to verify the candidate after the cancellation phase.
  5. Global Frequency Assumption: Assuming the entire array must be processed to identify the majority, whereas local tracking suffices.

3. Approach Encyclopedia

1. Brute Force (Check Every Element)

  • Logic: For each element, scan the entire array to count its occurrences. Return the first element whose count exceeds n/2.
  • Pseudocode:
    for i from 0 to n-1:
        count = 0
        for j from 0 to n-1:
            if nums[j] == nums[i]:
                count += 1
        if count > n/2:
            return nums[i]
    
  • Complexity: Time: O(n²) due to nested loops. Space: O(1).
  • Visualization:
    nums: [3, 2, 3]
    Check 3: count=2 → 2 > 1.5? ✓ → return 3
    

2. Sorting and Median Selection

  • Logic: After sorting, the majority element must occupy the middle position (and potentially adjacent positions) due to its frequency dominance.
  • Pseudocode:
    sort(nums)
    return nums[n // 2]  # Integer division for 0-indexing
    
  • Complexity: Time: O(n log n) dominated by sorting. Space: O(1) if sorted in-place.
  • Visualization:
    Sorted: [2, 3, 3] → median at index 1 is 3
    

3. Hash Map Frequency Counting

  • Logic: Traverse the array, counting occurrences of each element in a hash map. Then, find the key with count > n/2.
  • Pseudocode:
    count_map = {}
    for num in nums:
        count_map[num] = count_map.get(num, 0) + 1
        if count_map[num] > n/2:
            return num
    
  • Complexity: Time: O(n) for single pass. Space: O(n) for storing counts.
  • Visualization:
    nums: [2,2,1,1,1,2,2]
    Count: {2:4, 1:3} → return 2 when count reaches 4
    

4. Boyer-Moore Voting Algorithm (Optimal)

  • Logic: Maintain a candidate and a count. Traverse the array: if count is 0, set current element as candidate. If current element equals candidate, increment count; else decrement. The majority element survives the cancellation.
  • Pseudocode:
    candidate = None
    count = 0
    for num in nums:
        if count == 0:
            candidate = num
        count += (1 if num == candidate else -1)
    return candidate
    
  • Complexity: Time: O(n) with one pass. Space: O(1).
  • Visualization:
    nums: [2,2,1,1,1,2,2]
    Step: [2] (candidate=2, count=1)
           [2,2] (candidate=2, count=2)
           [2,2,1] (candidate=2, count=1)
           [2,2,1,1] (candidate=2, count=0)
           [2,2,1,1,1] (candidate=1, count=1)
           [2,2,1,1,1,2] (candidate=1, count=0)
           [2,2,1,1,1,2,2] (candidate=2, count=1)
    Return 2
    

4. Code Deep Dive

Boyer-Moore Implementation (Python):

def majorityElement(nums):
    candidate = None
    count = 0
    for num in nums:
        if count == 0:
            candidate = num
        count += (1 if num == candidate else -1)
    return candidate

Line-by-Line Annotations:

  1. candidate = None: Initializes the potential majority element.
  2. count = 0: Tracks the net advantage of the candidate over other elements.
  3. for num in nums: Iterates through each element exactly once.
  4. if count == 0:: If the net advantage is zero, the current segment has no majority, so reassign candidate.
  5. count += (1 if num == candidate else -1): Adjusts the net advantage—increment for matches, decrement for mismatches.
  6. return candidate: The algorithm ensures the majority element survives all cancellations.

Edge Case Handling:

  • Example 1 ([3,2,3]):
    • Step 1: num=3, count=0 → candidate=3, count=1
    • Step 2: num=2, count=1 → count=0 (cancel 3 and 2)
    • Step 3: num=3, count=0 → candidate=3, count=1 → return 3
  • Example 2 ([2,2,1,1,1,2,2]):
    • The visualization above shows how the candidate changes but ultimately corrects to 2.
  • Single Element ([5]):
    • candidate=5, count=1 → returns 5 correctly.

5. Complexity War Room

Hardware-Aware Analysis:

  • The Boyer-Moore algorithm uses two integer variables (candidate and count), consuming ~16 bytes on a 64-bit system.
  • For n=50,000, the loop has 50,000 iterations, which fits comfortably in L1/L2 cache due to sequential access, ensuring high performance.

Industry Comparison Table:

Approach Time Space Readability Interview Viability
Brute Force O(n²) O(1) 10/10 ❌ Fails large n
Sorting O(n log n) O(1) 9/10 ✅ Acceptable
Hash Map O(n) O(n) 10/10 ✅ Good
Boyer-Moore O(n) O(1) 7/10 ✅ Optimal

6. Pro Mode Extras

Variants:

  • Majority Element II (LC 229): Find all elements appearing more than n/3 times.
    • Solution: Extend Boyer-Moore with two candidates and two counters.
  • Non-Guaranteed Majority: Verify the candidate by a second pass to count occurrences if existence isn’t guaranteed.

Interview Cheat Sheet:

  • First Mention: Always state time and space complexity upfront.
  • Boyer-Moore Insight: Describe it as “simulating pairwise cancellation between majority and non-majority elements.”
  • Validation: Even if the problem assumes existence, note that in real-world scenarios, verification is needed.
  • Trade-offs: Hash maps are more intuitive and acceptable unless O(1) space is explicitly required.