Do you really know SQL? 🤔

AADI JAIN
6 min readFeb 18, 2024

--

I bet after reading this block you’ll change your perception about this question.

So, if your answer is yes, do answer my second question.
Do you know how to write SQL queries or, Do you know how SQL works?

So, both are different things and generally, we tend to know how to write SQL queries, how to join two tables, how to select fields, how to order on a particular column, etc. But today, we are going to look at the topline of how SQL works!!!

Let’s start with the basics:

Assume, we have this query and this will give us the desired results. But, ever thought about what flow this query will execute?

SELECT customers_id, COUNT(order_id) as total_orders, SUM(order_amount) as total_spent,
FROM customers
JOIN orders
ON customers.id = orders.customer_id
WHERE order_date >= '2023-01-01'
GROUP BY customer_id
HAVING total_spent >= 1000
total_spent DESC
limit 10;

If you observe the below diagram, you’ll see the actual order of the execution of an SQL query. Surprised right? The first SELECT statement we write gets executed near to last.

Source: https://youtu.be/BHwzDmr6d7s?si=WB_sOWfBHmdf1F5E

Mentioned clauses are the most used while writing a query, but let’s focus on how we get the result that we expect from our SQL snippet.

I hope you have a basic understanding of the execution order of SQL. Now, let’s come to an interesting part where we’re gonna learn how to write optimized queries that do not make your CPU sad.

EXPLAIN in SQL, make it your best friend:

There’s nothing better to understand how the database management system (DBMS) executes a given query. When you prefix a SELECT statement with EXPLAIN, the DBMS provides information about the query execution plan, which includes details about how the query will be processed and what operations the DBMS will perform to retrieve the requested data.

Let’s understand the EXPLAIN in SQL with examples.

EXPLAIN SELECT Customers.customerID, COUNT(orderID) as total_orders
FROM Customers
JOIN Orders ON Customers.customerID = Orders.customerID
Output of the above query

id: This is a unique identifier for each step in the query execution plan. In this case, there’s only one step, so it’s labeled as “1.”

select_type: This indicates the type of SELECT statement being executed. In this case, it’s labeled as “SIMPLE,” which typically means a straightforward SELECT query without complex subqueries or unions.

table: This specifies the table being accessed in the query execution plan. The output shows two tables: “Orders” and “Customers.”

partitions: If the table is partitioned, this column would indicate the partitions involved in the query. In this case, it’s empty, indicating that partitions are not used.

type: This column describes the type of access method used for retrieving rows from the table. In the case of “Orders,” it shows “index,” indicating that the DBMS is using an index to access rows from the “Orders” table. The term “eq_ref” in the “type” column of the EXPLAIN output refers to an “equi-join using a unique index.

possible_keys: This column lists the indexes that the DBMS could potentially use to optimize the query. For the “Orders” table, the index “CustomerID” is listed.

key: This column specifies the index actually chosen by the DBMS for the table access. For the “Orders” table, the same index “CustomerID” is used.

key_len: This column indicates the length of the index key being used. In this case, it’s “9,” which represents the number of bytes used for the index key.

ref: This column shows the columns or expressions that are used with the index to retrieve rows from the table. Here, it shows “development.Orders.CustomerID,” indicating that the “CustomerID” column from the “Orders” table is used with the index.

rows: This column estimates the number of rows that the DBMS expects to examine when executing the query. For the “Orders” table, it’s “4893.”

filtered: This column represents the percentage of rows that pass the filter conditions, based on the estimated number of rows. Here, it’s “100.0,” indicating that all rows are likely to pass the filter conditions.

Extra: This column provides additional information about the query execution. In this case, it shows “Using index” for both tables, indicating that the DBMS is able to satisfy the query using only the index without accessing the actual table data. This can be beneficial for performance.

Ways to optimize your SQL queries:

1. Use Indexes — The Hero of RDMS databases

Let’s talk about what are indexes, why to use indexes, how to use indexes, and when not to use indexes.
Remember, the phone book diary where names are always written in the sorted order of the alphabet, and you could jump on to whatever name by simply jumping on the page with the matching first alphabet. Similarly, indexes act as a roadmap, allowing the database management system (DBMS) to quickly locate rows based on the values of specific columns.
The primary objective of indexes is to make the query results faster, by improving sorting, and grouping. This also means, that over-use of indexes can make the other operations like INSERT, DELETE, and UPDATE slow, as the database management system not only adds the new rows but also updates the indexes to reflect the changes.
By default, when we add an index to any column. It creates a B-Tree Index on that column.

The B-tree simplifies the binary search tree by allowing nodes with more than two children.

2. Use EXISTS() instead of COUNT()

EXISTS() checks if a subquery returns any rows, making it more efficient than COUNT() which counts all matching rows.

3. Avoid Subqueries in WHERE Clause

Subqueries in the WHERE clause can be inefficient. It’s often better to use joins or other techniques to achieve the same result.

4. Use LIMIT to sample query results

LIMIT restricts the number of rows returned by a query, useful for sampling data or limiting the impact of expensive queries.

5. Avoid using OR in join queries

OR conditions in join queries can lead to inefficient query plans. It’s better to use multiple join conditions or other techniques to achieve the same result.

6. Avoid != or <> (not equal) operator

The != or <> operator can be less efficient than using other comparison operators. It’s better to use NOT IN or NOT EXISTS where applicable.

7. SELECT fields instead of using SELECT

Specifying specific fields in the SELECT statement reduces data transfer and can improve query performance.

8. Monitor and Tune

Regularly monitor query performance using database profiling tools and query execution plans. Identify slow-performing queries and analyze their execution plans to identify opportunities for optimization.

Conclusion:

In conclusion, understanding how SQL works and employing effective optimization techniques are essential for ensuring optimal performance in database-driven applications. By familiarizing yourself with the order of execution of SQL queries and implementing strategies such as indexing, minimizing joins, and optimizing predicates, you can improve query performance and enhance the overall efficiency of your database operations. Remember that query optimization is an ongoing process, and regular monitoring and tuning are essential for maintaining optimal performance as data volumes and usage patterns evolve.

--

--

AADI JAIN

A learner, who learn things and try to express my learning by writing it down. Trying to be a good engineer :)