Identifying & Optimizing Heavy Queries in PostgreSQL

birkan atıcı
4 min read · Jan 11, 2023

As a software engineer, one of the most critical aspects of building and maintaining a scalable and performant system is ensuring that the database runs efficiently. One common issue that can arise with databases, particularly in high-traffic systems, is the presence of heavy queries that degrade the performance of the entire system.


This post will discuss how to identify and optimize heavy queries in PostgreSQL, a widely-used open-source relational database management system.

Identifying Heavy Queries


The first step in optimizing heavy queries is identifying which queries are causing the most performance issues.

PostgreSQL provides a few different ways to gather query performance information. One of the most commonly used tools is the pg_stat_statements extension, which tracks the execution time and the number of calls for every statement the server runs.

To enable the pg_stat_statements extension, you can add the following line to the postgresql.conf file:

shared_preload_libraries = 'pg_stat_statements'

Then, you’ll need to restart the PostgreSQL service for the change to take effect.
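
After the restart, create the extension in the database you want to profile (a minimal sketch; assumes a role with the CREATE privilege on that database):

-- Install the extension so its view and functions become available
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Once the extension is created, you can use the following query to retrieve a list of the top slowest queries: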

-- On PostgreSQL 12 and earlier, the column is named total_time
-- instead of total_exec_time
SELECT query, calls, total_exec_time, rows,
       100.0 * total_exec_time / sum(total_exec_time) OVER () AS percentage
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

This query will return the twenty most time-consuming statements, along with the number of calls, the cumulative execution time, the number of rows returned, and each statement’s share of the total execution time.
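
When you want to measure the effect of an optimization, it can help to reset the collected statistics and let them accumulate again from a clean slate:

-- Discard all statistics gathered so far by pg_stat_statements
SELECT pg_stat_statements_reset();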

Another tool for identifying heavy queries is the pg_stat_activity view, which shows one row per server process along with the query it is currently executing. You can use the following query to list the active queries and how long each has been running:

SELECT pid, clock_timestamp() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()  -- exclude this monitoring query itself
ORDER BY runtime DESC;

This query will return the process ID, how long each query has been running, and the query text for every active backend, longest-running first.
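
If one of those queries has clearly run away, you can stop it without restarting the server. A minimal sketch, where 12345 stands in for the offending pid from the query above:

-- Ask the backend to cancel its current query
SELECT pg_cancel_backend(12345);

-- If cancelling is not enough, terminate the whole connection
SELECT pg_terminate_backend(12345);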

And if you are an AWS user, AWS provides several ways to monitor the performance of your PostgreSQL cluster:

  • Amazon RDS Performance Insights: This performance monitoring feature is available for Amazon RDS instances, including those running PostgreSQL. It visualizes database load over time and breaks it down by SQL statement, wait event, user, or host, so you can sort individual queries by how much load they generate and identify the slowest-performing ones.
  • AWS CloudWatch: This monitoring service lets you view metrics and logs for all your AWS resources, including RDS instances. You can track instance-level metrics such as CPU utilization, freeable memory, IOPS, and connection counts, create alarms that notify you when thresholds are breached, and search the published PostgreSQL logs to better understand what happened.
  • AWS RDS Enhanced Monitoring: This optional feature streams real-time operating-system metrics for your RDS instances at up to one-second granularity, including per-process CPU and memory usage, giving you a finer-grained view than the standard CloudWatch metrics.

By using the above AWS tools, you can monitor the performance of your PostgreSQL cluster and identify slow-performing queries.

Optimizing Heavy Queries

Once you’ve identified the heavy queries that are causing performance issues, the next step is to optimize them. There are several strategies you can use to optimize queries, including:

  • Indexing: One of the most common causes of slow queries is the lack of appropriate indexes. By adding indexes to frequently searched columns, PostgreSQL can quickly locate the relevant rows without scanning the entire table (see the sketch after this list).
  • Partitioning: For large tables, partitioning can significantly improve query performance. Partitioning divides a table into smaller, more manageable pieces that can be scanned separately, which reduces the amount of data each query has to read (a range-partitioning sketch also follows below).
  • Caching: By using a caching mechanism, you can decrease the load on the database server by storing query results in memory to avoid computing them repeatedly.
  • Normalization: If the table is not normalized, you may end up with duplicate data, and queries can become slow. Normalization helps to reduce data redundancy and improve the efficiency of queries.
  • Rewriting the queries: Another common cause of slow queries is poor query design. You can often improve performance by rewriting a query to use more efficient constructs, such as selecting only the columns you need instead of SELECT *, replacing a correlated subquery with a join, or avoiding leading-wildcard LIKE '%term' patterns that prevent index use.
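
To make the indexing point concrete, here is a minimal sketch using a hypothetical orders table and customer_id column; EXPLAIN ANALYZE lets you verify that the planner actually picks up the new index:

-- Before: a sequential scan here suggests a missing index
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Add an index on the frequently searched column;
-- CONCURRENTLY avoids blocking writes while the index builds
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);

-- After: the plan should now show an Index Scan instead of a Seq Scan
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;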

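And a hedged sketch of declarative range partitioning, using a hypothetical events table split by month so that date-bounded queries only touch the relevant partitions:

-- The parent table declares the partitioning scheme but stores no rows itself
CREATE TABLE events (
    id         bigserial,
    payload    text,
    logged_at  timestamptz NOT NULL
) PARTITION BY RANGE (logged_at);

-- One partition per month; queries filtering on logged_at are
-- automatically pruned to the matching partitions
CREATE TABLE events_2023_01 PARTITION OF events
    FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
CREATE TABLE events_2023_02 PARTITION OF events
    FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
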
In addition to the above strategies, you should also follow general performance-tuning best practices: keep your planner statistics up to date, keep your database cluster healthy, and run VACUUM and ANALYZE regularly.
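
Autovacuum normally takes care of this in the background, but you can also run it by hand on a busy table; another small sketch with the hypothetical orders table:

-- Reclaim dead tuples and refresh planner statistics in one pass
VACUUM (ANALYZE, VERBOSE) orders;

-- Or refresh the statistics alone, which is much cheaper
ANALYZE orders;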

Conclusion

In this post, we’ve discussed how to identify and optimize heavy queries in PostgreSQL. We saw how to use the pg_stat_statements extension and pg_stat_activity view to identify slow-performing queries. We covered several strategies for optimizing those queries, including indexing, partitioning, rewriting the query, caching, and normalization. By following these best practices and continuously monitoring query performance, you can ensure that your PostgreSQL database is running efficiently and supporting the performance and scalability of your overall system.

Cheers!
