Amazon Business Intelligence SQL Interview Questions
Common Table Expression(CTE) Type:
Highest cost orders
Find the customer with the highest daily total order cost between 2019-02-01 to 2019-05-01. If customer had more than one order on a certain day, sum the order costs on daily basis. Output their first name, total cost of their items, and the date.
👉 Question Sourcewith my_cte as ( (SELECT first_name, order_date, SUM(total_order_cost) AS total_order_cost FROM (SELECT first_name, order_date, total_order_cost FROM customers AS c INNER JOIN orders AS o ON c.id = o.cust_id ORDER BY 1 , 2) AS t GROUP BY 1 , 2); SELECT * FROM my_cte WHERE total_order_cost = (SELECT MAX(total_order_cost) FROM my_cte WHERE order_date BETWEEN '2019-02-01' AND '2019-05-01');
Best Selling Item
Find the best selling item for each month where the biggest total invoice was paid. The best selling item is calculated using the formula (unitprice * quantity). Output the description of the item along with the amount paid.
👉 Question Sourcewith cte as ( SELECT description, TO_CHAR(invoicedate, 'YYYY-MM') AS year_month, quantity * unitprice AS amount_paid FROM online_retail ORDER BY 1) SELECT description, amount_paid FROM cte WHERE (year_month , amount_paid) IN (SELECT year_month, MAX(amount_paid) AS amount_paid FROM my_cte GROUP BY 1) ORDER BY year_month;
Host Popularity Rental Prices
You’re given a table of rental property searches by users. The table consists of search results and outputs host information for searchers. Find the minimum, average, maximum rental prices for each host’s popularity rating. The host’s popularity rating is defined as below. Output host popularity rating and their minimum, average and maximum rental prices.
0 reviews: New
1 to 5 reviews: Rising
6 to 15 reviews: Trending Up
16 to 40 reviews: Popular
more than 40 reviews: Hot
👉 Question Sourcewith cte as ( SELECT CONCAT(price, room_type, host_since, zipcode) AS host_id, number_of_reviews, price, CASE WHEN number_of_reviews = 0 THEN 'New' WHEN number_of_reviews BETWEEN 1 AND 5 THEN 'Rising' WHEN number_of_reviews BETWEEN 6 AND 15 THEN 'Trending Up' WHEN number_of_reviews BETWEEN 16 AND 40 THEN 'Popular' WHEN number_of_reviews > 40 THEN 'Hot' END AS host_popularity FROM airbnb_host_searches GROUP BY 1 , 2 , 3) SELECT host_popularity, MIN(price), AVG(price), MAX(price) FROM cte GROUP BY 1
Window Functions Type:
Monthly percentage difference
Given a table of purchases by date, calculate the month-over-month percentage change in revenue. The output should include the year-month date (YYYY-MM) and percentage change, rounded to the 2nd decimal point, and sorted from the beginning of the year to the end of the year. The percentage change column will be populated from the 2nd month forward and can be calculated as ((this month’s revenue - last month’s revenue) / last month’s revenue)*100.
👉 Question SourceSELECT year_month, round((revenue_month - lag(revenue_month) over (w))/ lag(revenue_month) over (w)*100,2) as rev_diff_pct FROM (SELECT TO_CHAR(created_at, 'YYYY-MM') AS year_month, SUM(value) AS revenue_month FROM sf_transactions GROUP BY 1 ORDER BY 1) AS t window w as (order by year_month);
Unique Salaries
Find the top three unique salaries for each department. Output the department name and the top 3 unique salaries by each department. Order your results alphabetically by department and then by highest salary to lowest.
👉 Question SourceSELECT DISTINCT department, salary FROM (SELECT department, salary, dense_rank() over(partition by department order by salary desc) as departmentwise_salary_rank FROM twitter_employee ORDER BY 1, 2 DESC, 3) t1 WHERE departmentwise_salary_rank < 4 ORDER BY 1, 2 DESC;
Top Percentile Fraud
ABC Corp is a mid-sized insurer in the US and in the recent past their fraudulent claims have increased significantly for their personal auto insurance portfolio. They have developed a ML based predictive model to identify propensity of fraudulent claims. Now, they assign highly experienced claim adjusters for top 5 percentile of claims identified by the model. Your objective is to identify the top 5 percentile of claims from each state. Your output should be policy number, state, claim cost, and fraud score.
👉 Question SourceSELECT * FROM (SELECT policy_num, state, claim_cost, fraud_score, ntile(100) over (partition by state order by fraud_score desc) as percentile_score from fraud_score) AS t1 WHERE percentile_score < 6;
Find the top 5 highest paid and top 5 least paid employees in 2012
Find the top 5 highest paid and top 5 least paid employees in 2012. Output the employee name along with the corresponding total pay with benefits. Sort records based on the total payment with benefits in ascending order.
👉 Question SourceSELECT employeename, totalpaybenefits FROM (SELECT employeename, totalpaybenefits, dense_rank() over(order by totalpaybenefits asc) as least_salary_rank, dense_rank() over(order by totalpaybenefits desc) as highest_salary_rank, FROM sf_public_salaries WHERE year = '2012' ORDER BY highest_salary_rank ASC) AS t1 WHERE least_salary_rank < 6 OR highest_salary_rank < 6 ORDER BY 2;
Activity Rank
Find the email activity rank for each user. Email activity rank is defined by the total number of emails sent. The user with the highest number of emails sent will have a rank of 1, and so on. Output the user, total emails, and their activity rank. Order records by the total emails in descending order. Sort users with the same number of emails in alphabetical order. In your rankings, return a unique value (i.e., a unique rank) even if multiple users have the same number of emails.
👉 Question SourceSELECT user1 AS user, total_emails, dense_rank() over (order by total_emails desc, user1) as activity_rank FROM (SELECT from_user AS user1, COUNT(from_user) AS total_emails FROM google_gmail_emails GROUP BY 1 ORDER BY 2 DESC) AS t1
Growth of Airbnb
Estimate the growth of Airbnb each year using the number of hosts registered as the growth metric. The rate of growth is calculated by taking ((number of hosts registered in the current year - number of hosts registered in the previous year) / the number of hosts registered in the previous year) * 100. Output the year, number of hosts in the current year, number of hosts in the previous year, and the rate of growth. Round the rate of growth to the nearest percent and order the result in the ascending order based on the year. Assume that the dataset consists only of unique hosts, meaning there are no duplicate hosts listed.
👉 Question SourceSELECT year, current_yr_host_cnt, lag(current_yr_host_cnt,1) over(w) as previous_yr_host_cnt, (current_yr_host_cnt - lag(current_yr_host_cnt,1) over(w))*100/current_yr_host_cnt as growth_rate FROM (SELECT TO_CHAR(host_since::date, 'YYYY') AS year, COUNT(id) AS current_yr_host_cnt FROM airbnb_search_details GROUP BY 1 ORDER BY 2) AS t1 window w as (order by year)
Join Type:
Finding User purchases
Write a query that’ll identify returning active users. A returning active user is a user that has made a second purchase within 7 days of any other of their purchases. Output a list of user_ids of these returning active users.
👉 Question SourceSELECT DISTINCT a.user_id FROM amazon_transactions a INNER JOIN amazon_transactions b ON a.user_id = b.user_id AND a.id != b.id AND b.created_at BETWEEN (a.created_at AND a.created_at + INTERVAL 7 DAY)::date order by 1
Favorite Host Nationality
For each guest reviewer, find the nationality of the reviewer’s favorite host based on the guest’s highest review score given to a host. Output the user ID of the guest along with their favorite host’s nationality.
Both the from_user
and to_user
columns are user IDs.
👉 Question Source
SELECT DISTINCT
from_user, nationality
FROM
(SELECT
from_user, to_user, from_type, to_type, review_score, nationality,
dense_rank() over (partition by from_user order by review_score desc) as review_rank
FROM
airbnb_reviews r
JOIN airbnb_hosts AS h ON r.to_user = h.host_id
WHERE
from_type = 'guest' AND to_type = 'host') AS t1
WHERE
review_rank = 1;
Case Statement Type:
Finding User purchases
Write a query that’ll identify returning active users. A returning active user is a user that has made a second purchase within 7 days of any other of their purchases. Output a list of user_ids of these returning active users.
👉 Question SourceSELECT user_id FROM (SELECT user_id, COUNT( CASE WHEN lead_created_at BETWEEN created_at AND ((created_at + INTERVAL '7 days')::date) THEN user_id END) AS user_cnt FROM (SELECT user_id, created_at, lead(created_at,1) over (partition by user_id order by user_id, created_at) as lead_created_at FROM amazon_transactions) AS t WHERE lead_created_at IS NOT NULL GROUP BY 1) AS t1 WHERE user_cnt > 0;