Post

[SolveSQL] VIP of Cities

Solution for SolveSQL VIP of Cities using CTEs and window ranking.

[SolveSQL] VIP of Cities

Problem

SolveSQL VIP of Cities

Find the VIP customer for each city. A VIP is the customer with the highest total spending in that city, excluding returned transactions.

The important detail is that spending should be based on the net transaction amount, not just the raw price.


Data Model

TableColumns UsedMeaning
transactionscity_idCity where the transaction happened
transactionscustomer_idCustomer who made the transaction
transactionstotal_priceOriginal transaction amount
transactionsdiscount_amountDiscount applied to the transaction
transactionsis_returnedWhether the transaction was returned

Query Goal

Translate the problem into SQL steps:

  1. Exclude returned transactions before aggregation.
  2. Compute each customer’s net spending in each city.
  3. Rank customers inside each city by total spending.
  4. Return only the top-ranked customer or customers for each city.

Expected Result / Result Schema

ColumnMeaning
city_idCity identifier
customer_idVIP customer identifier in that city
total_spentCustomer’s total net spending in that city

The result has one row per city-level VIP. If multiple customers tie for the highest spending in the same city, the query should return all tied VIP customers.


Strategy

This is a Top per Group problem.

A common mistake is to compute MAX(total_spent) per city and then try to recover the matching customer. That approach becomes awkward when there are ties or when the result must include columns from the winning row.

Instead, split the query into two CTEs:

  1. customer_spending: one row per (city_id, customer_id) with total net spending.
  2. ranked_spending: one row per customer-city pair with a city-local spending rank.

Then the outer query keeps only spending_rank = 1.


Step-by-Step Analysis

flowchart TD
    A["transactions"] --> B["Filter returned rows<br>is_returned = 0"]
    B --> C["Group by city_id and customer_id"]
    C --> D["Compute SUM(total_price - discount_amount)"]
    D --> E["Rank within each city<br>RANK() OVER (PARTITION BY city_id)"]
    E --> F["Keep spending_rank = 1"]
    F --> G["Return city_id, customer_id, total_spent"]
    style G fill:#90EE90
  1. WHERE is_returned = 0 removes returned transactions before they can affect the totals.
  2. SUM(total_price - discount_amount) calculates net spending.
  3. GROUP BY city_id, customer_id creates one spending total per customer per city.
  4. RANK() assigns rank 1 to the highest spender or spenders in each city.
  5. The outer query filters spending_rank = 1.

Solution

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
WITH
  customer_spending AS (
    SELECT
      city_id,
      customer_id,
      SUM(total_price - discount_amount) AS total_spent
    FROM
      transactions
    WHERE
      is_returned = 0
    GROUP BY
      city_id,
      customer_id
  ),
  ranked_spending AS (
    SELECT
      city_id,
      customer_id,
      total_spent,
      RANK() OVER (
        PARTITION BY city_id
        ORDER BY
          total_spent DESC
      ) AS spending_rank
    FROM
      customer_spending
  )
SELECT
  city_id,
  customer_id,
  total_spent
FROM
  ranked_spending
WHERE
  spending_rank = 1
ORDER BY
  city_id,
  customer_id;

Clause-by-Clause

customer_spending

This CTE defines the row granularity for the ranking step: one row per customer in each city.

WHERE is_returned = 0 belongs here because returned transactions should never be included in the spending total. The net spending expression is:

1
SUM(total_price - discount_amount) AS total_spent

This is the value that determines VIP status.

ranked_spending

This CTE ranks customers within each city:

1
2
3
4
5
RANK() OVER (
  PARTITION BY city_id
  ORDER BY
    total_spent DESC
) AS spending_rank

PARTITION BY city_id restarts the ranking for every city, and ORDER BY total_spent DESC puts the highest spender first.

Final Filter

The final query keeps spending_rank = 1.

This filter must happen in the outer query because window function results are not available to the same SELECT statement’s WHERE clause. The CTE gives the window result a name first, then the outer query filters it.


Pitfalls

  • Filtering returned transactions too late: is_returned = 0 must be applied before GROUP BY, otherwise returned orders can inflate spending.
  • Using gross spending: VIP status depends on total_price - discount_amount, not total_price alone.
  • Using ROW_NUMBER() without a tie-breaker: ROW_NUMBER() would keep only one customer when two customers tie. RANK() preserves tied VIPs.
  • Filtering window results in the wrong phase: spending_rank = 1 belongs in the outer query, not in the same query block that computes RANK().
  • Omitting final ordering: ORDER BY city_id, customer_id makes the output deterministic and easier to verify.

Key Takeaways

PointDescription
Top per GroupRank rows inside each group, then keep rank 1
RANK()Preserves ties when multiple rows share the top value
WHERE before GROUP BYExclude rows that should not contribute to aggregates
Window filter phaseCompute window results in a CTE, then filter in the outer query
This post is licensed under CC BY 4.0 by the author.