Amazon Business Intelligence SQL Interview Questions

Common Table Expression(CTE) Type:

  1. Highest cost orders
    Find the customer with the highest daily total order cost between 2019-02-01 to 2019-05-01. If customer had more than one order on a certain day, sum the order costs on daily basis. Output their first name, total cost of their items, and the date.
    👉 Question Source

     with my_cte as (
       (SELECT 
           first_name, order_date, SUM(total_order_cost) AS total_order_cost
       FROM
           (SELECT 
               first_name, order_date, total_order_cost
           FROM
               customers AS c
           INNER JOIN orders AS o ON c.id = o.cust_id
           ORDER BY 1 , 2) AS t
       GROUP BY 1 , 2);
    
     SELECT * FROM my_cte
     WHERE total_order_cost = (SELECT MAX(total_order_cost)
             FROM my_cte WHERE order_date BETWEEN '2019-02-01' AND '2019-05-01');
  2. Best Selling Item
    Find the best selling item for each month where the biggest total invoice was paid. The best selling item is calculated using the formula (unitprice * quantity). Output the description of the item along with the amount paid.
    👉 Question Source

     with cte as (
       SELECT 
           description, TO_CHAR(invoicedate, 'YYYY-MM') AS year_month,
           quantity * unitprice AS amount_paid
       FROM online_retail
       ORDER BY 1)
    
     SELECT description, amount_paid
     FROM cte WHERE
         (year_month , amount_paid) IN 
         (SELECT year_month, MAX(amount_paid) AS amount_paid
             FROM my_cte GROUP BY 1)
     ORDER BY year_month;
  3. Host Popularity Rental Prices
    You’re given a table of rental property searches by users. The table consists of search results and outputs host information for searchers. Find the minimum, average, maximum rental prices for each host’s popularity rating. The host’s popularity rating is defined as below. Output host popularity rating and their minimum, average and maximum rental prices.
    0 reviews: New
    1 to 5 reviews: Rising
    6 to 15 reviews: Trending Up
    16 to 40 reviews: Popular
    more than 40 reviews: Hot
    👉 Question Source

     with cte as (
         SELECT 
             CONCAT(price, room_type, host_since, zipcode) AS host_id,
             number_of_reviews, price,
             CASE
                 WHEN number_of_reviews = 0 THEN 'New'
                 WHEN number_of_reviews BETWEEN 1 AND 5 THEN 'Rising'
                 WHEN number_of_reviews BETWEEN 6 AND 15 THEN 'Trending Up'
                 WHEN number_of_reviews BETWEEN 16 AND 40 THEN 'Popular'
                 WHEN number_of_reviews > 40 THEN 'Hot'
             END AS host_popularity
         FROM
             airbnb_host_searches
         GROUP BY 1 , 2 , 3)
    
       SELECT 
           host_popularity, MIN(price), AVG(price), MAX(price)
       FROM cte
       GROUP BY 1

Window Functions Type:

  1. Monthly percentage difference
    Given a table of purchases by date, calculate the month-over-month percentage change in revenue. The output should include the year-month date (YYYY-MM) and percentage change, rounded to the 2nd decimal point, and sorted from the beginning of the year to the end of the year. The percentage change column will be populated from the 2nd month forward and can be calculated as ((this month’s revenue - last month’s revenue) / last month’s revenue)*100.
    👉 Question Source

     SELECT 
         year_month, round((revenue_month - lag(revenue_month) over (w))/
         lag(revenue_month) over (w)*100,2) as rev_diff_pct
     FROM
         (SELECT TO_CHAR(created_at, 'YYYY-MM') AS year_month,
                 SUM(value) AS revenue_month
         FROM sf_transactions
         GROUP BY 1
         ORDER BY 1) AS t
         window w as (order by year_month);
  2. Unique Salaries
    Find the top three unique salaries for each department. Output the department name and the top 3 unique salaries by each department. Order your results alphabetically by department and then by highest salary to lowest.
    👉 Question Source

     SELECT DISTINCT
         department, salary
     FROM
         (SELECT 
             department, salary,
             dense_rank() over(partition by department order by salary desc) as departmentwise_salary_rank
         FROM
             twitter_employee
         ORDER BY 1, 2 DESC, 3) t1
     WHERE
         departmentwise_salary_rank < 4
     ORDER BY 1, 2 DESC;
  3. Top Percentile Fraud
    ABC Corp is a mid-sized insurer in the US and in the recent past their fraudulent claims have increased significantly for their personal auto insurance portfolio. They have developed a ML based predictive model to identify propensity of fraudulent claims. Now, they assign highly experienced claim adjusters for top 5 percentile of claims identified by the model. Your objective is to identify the top 5 percentile of claims from each state. Your output should be policy number, state, claim cost, and fraud score.
    👉 Question Source

     SELECT 
         *
     FROM
         (SELECT policy_num, state, claim_cost, fraud_score,
         ntile(100) over (partition by state order by fraud_score desc) as percentile_score from fraud_score) 
         AS t1
     WHERE
         percentile_score < 6;
  4. Find the top 5 highest paid and top 5 least paid employees in 2012
    Find the top 5 highest paid and top 5 least paid employees in 2012. Output the employee name along with the corresponding total pay with benefits. Sort records based on the total payment with benefits in ascending order.
    👉 Question Source

     SELECT 
         employeename, totalpaybenefits
     FROM
         (SELECT 
             employeename, totalpaybenefits,
             dense_rank() over(order by totalpaybenefits asc) as least_salary_rank,
             dense_rank() over(order by totalpaybenefits desc) as highest_salary_rank,
         FROM
             sf_public_salaries
         WHERE year = '2012'
         ORDER BY highest_salary_rank ASC) AS t1
     WHERE
         least_salary_rank < 6
             OR highest_salary_rank < 6
     ORDER BY 2;
  5. Activity Rank
    Find the email activity rank for each user. Email activity rank is defined by the total number of emails sent. The user with the highest number of emails sent will have a rank of 1, and so on. Output the user, total emails, and their activity rank. Order records by the total emails in descending order. Sort users with the same number of emails in alphabetical order. In your rankings, return a unique value (i.e., a unique rank) even if multiple users have the same number of emails.
    👉 Question Source

     SELECT 
         user1 AS user, total_emails,
         dense_rank() over (order by total_emails desc, user1) as activity_rank
     FROM
         (SELECT 
             from_user AS user1, COUNT(from_user) AS total_emails
         FROM
             google_gmail_emails
         GROUP BY 1
         ORDER BY 2 DESC) AS t1
  6. Growth of Airbnb
    Estimate the growth of Airbnb each year using the number of hosts registered as the growth metric. The rate of growth is calculated by taking ((number of hosts registered in the current year - number of hosts registered in the previous year) / the number of hosts registered in the previous year) * 100. Output the year, number of hosts in the current year, number of hosts in the previous year, and the rate of growth. Round the rate of growth to the nearest percent and order the result in the ascending order based on the year. Assume that the dataset consists only of unique hosts, meaning there are no duplicate hosts listed.
    👉 Question Source

     SELECT 
         year, current_yr_host_cnt,
         lag(current_yr_host_cnt,1) over(w) as previous_yr_host_cnt,
         (current_yr_host_cnt - lag(current_yr_host_cnt,1) over(w))*100/current_yr_host_cnt as growth_rate
     FROM
         (SELECT 
             TO_CHAR(host_since::date, 'YYYY') AS year,
                 COUNT(id) AS current_yr_host_cnt
         FROM
             airbnb_search_details
         GROUP BY 1
         ORDER BY 2) AS t1
     window w as (order by year)

Join Type:

  1. Finding User purchases
    Write a query that’ll identify returning active users. A returning active user is a user that has made a second purchase within 7 days of any other of their purchases. Output a list of user_ids of these returning active users.
    👉 Question Source

     SELECT DISTINCT
         a.user_id
     FROM
         amazon_transactions a
             INNER JOIN
         amazon_transactions b ON a.user_id = b.user_id AND a.id != b.id
             AND b.created_at BETWEEN (a.created_at AND a.created_at + INTERVAL 7 DAY)::date
     order by 1
  2. Favorite Host Nationality
    For each guest reviewer, find the nationality of the reviewer’s favorite host based on the guest’s highest review score given to a host. Output the user ID of the guest along with their favorite host’s nationality.

Both the from_user and to_user columns are user IDs.
👉 Question Source

    SELECT DISTINCT
        from_user, nationality
    FROM
        (SELECT 
            from_user, to_user, from_type, to_type, review_score, nationality,
            dense_rank() over (partition by from_user order by review_score desc) as review_rank
        FROM
            airbnb_reviews r
        JOIN airbnb_hosts AS h ON r.to_user = h.host_id
        WHERE
            from_type = 'guest' AND to_type = 'host') AS t1
    WHERE
        review_rank = 1;
        

Case Statement Type:

  1. Finding User purchases
    Write a query that’ll identify returning active users. A returning active user is a user that has made a second purchase within 7 days of any other of their purchases. Output a list of user_ids of these returning active users.
    👉 Question Source

     SELECT 
         user_id
     FROM
         (SELECT user_id,
                 COUNT(
                 CASE WHEN lead_created_at BETWEEN created_at AND 
                 ((created_at + INTERVAL '7 days')::date) THEN user_id
                 END) AS user_cnt
         FROM
             (SELECT 
             user_id, created_at,
             lead(created_at,1) over (partition by user_id order by user_id, created_at) as lead_created_at
         FROM
             amazon_transactions) AS t
         WHERE
             lead_created_at IS NOT NULL
         GROUP BY 1) AS t1
     WHERE
         user_cnt > 0;
Simranpal Singh Kohli
Simranpal Singh Kohli
Data Scientist