API with NestJS #17. Offset and keyset pagination with PostgreSQL and TypeORM
November 9, 2020
By Kate Angelou
As our database grows, so do the results of our queries. Returning a lot of data in our API might not be the best approach performance-wise. Dividing our content into multiple pages and solutions like infinite scrolling have been around for quite some time. In this article, we explore ways of implementing pagination and point out their pros and cons.
You can find all of the code from this series in this repository.
Offset and Limit
Let’s start with the following, straightforward query:
SELECT * FROM post
ORDER BY id ASC
The above returns all of the records from the post table. To make the order of the results predictable, we sort them by id; without an ORDER BY clause, PostgreSQL does not guarantee any particular order.
The first step in implementing pagination is to limit the number of results. We can do that using the LIMIT clause.
SELECT * FROM post
ORDER BY id ASC
LIMIT 10
Now, instead of getting all of the posts, we get just the first ten of them. Assuming the ids are consecutive and start at 1, these are the posts with ids from 1 to 10.
To have fully functional pagination, we need to specify the starting point of our query. To do that, we can use the OFFSET keyword. With it, we can say how many rows we want to skip.
SELECT * FROM post
ORDER BY id ASC
OFFSET 10
LIMIT 10
The above query skips the first ten posts and still returns ten posts, so, assuming consecutive ids, we get the elements with ids from 11 to 20.
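To make the semantics of OFFSET and LIMIT concrete, here is a small TypeScript sketch that applies the same steps to an in-memory array: order by id, skip `offset` rows, then keep at most `limit` rows. The `PostRow` type and the `offsetPage` helper are hypothetical and only mimic what PostgreSQL does; notice that the skipped rows still have to be ordered and walked over before they are thrown away.

```typescript
interface PostRow {
  id: number;
  title: string;
}

// Mimics: SELECT * FROM post ORDER BY id ASC OFFSET <offset> LIMIT <limit>
function offsetPage(rows: PostRow[], offset: number, limit: number): PostRow[] {
  // ORDER BY id ASC (copy first so the input array is not mutated)
  const sorted = [...rows].sort((a, b) => a.id - b.id);
  // OFFSET skips rows, LIMIT caps how many of the remaining rows are returned
  return sorted.slice(offset, offset + limit);
}

// Forty posts with ids 1..40, deliberately stored out of order
const allPosts: PostRow[] = Array.from({ length: 40 }, (_, i) => ({
  id: 40 - i,
  title: `Post ${40 - i}`,
}));

const secondPage = offsetPage(allPosts, 10, 10);
console.log(secondPage.map((post) => post.id)); // ids 11 through 20
```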
If we want to order the elements differently while paginating, we only need to modify our ORDER BY clause.
Implementing offset and limit with TypeORM
We want the users to provide the offset and the limit through query parameters. To implement this, let's use the knowledge we've gained in previous parts of this series, including the class-validator and class-transformer libraries.
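The job of such a parameters class is to turn raw query string values (which always arrive as strings) into numbers and reject invalid ones. As a dependency-free sketch of those checks, the hypothetical `parsePaginationParams` below does by hand what class-transformer's type conversion and class-validator's decorators would do declaratively. Both parameters are optional, and the bounds (offset at least 0, limit at least 1) are assumptions rather than values taken from the article.

```typescript
// Hypothetical stand-in for a class-validator/class-transformer DTO.
interface PaginationParams {
  offset?: number;
  limit?: number;
}

type RawQuery = Record<string, string | undefined>;

// Converts a raw query value to an integer that is at least `min`.
// Returns undefined when the parameter is absent, because it is optional.
function toBoundedInt(name: string, raw: string | undefined, min: number): number | undefined {
  if (raw === undefined || raw === '') {
    return undefined;
  }
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min) {
    throw new Error(`${name} must be an integer greater than or equal to ${min}`);
  }
  return value;
}

function parsePaginationParams(query: RawQuery): PaginationParams {
  return {
    offset: toBoundedInt('offset', query.offset, 0),
    limit: toBoundedInt('limit', query.limit, 1),
  };
}

// e.g. GET /posts?offset=20&limit=10
console.log(parsePaginationParams({ offset: '20', limit: '10' })); // { offset: 20, limit: 10 }
```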
Implementing offset-based pagination is very easy with TypeORM. Aside from returning an array of posts, we also want to return the total number of posts. Thanks to that, our frontend can calculate the number of available pages.
Instead of calling the postsRepository.find() and postsRepository.count() methods separately, we can use postsRepository.findAndCount(). It returns both the requested page of posts and the total number of matching posts, where the total ignores the skip and take options.
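The shape of that result can be sketched without a database. The hypothetical `findAndCountPosts` below imitates what `findAndCount` returns when called with `order`, `skip` and `take`: a tuple of the current page and the total number of posts, where the total is not affected by `skip` and `take`. The `pageCount` helper shows how a frontend could turn that total into a number of pages.

```typescript
interface PostRow {
  id: number;
  title: string;
}

// In-memory imitation of:
//   postsRepository.findAndCount({ order: { id: 'ASC' }, skip: offset, take: limit })
// The count covers every post and ignores skip and take.
function findAndCountPosts(rows: PostRow[], offset = 0, limit?: number): [PostRow[], number] {
  const sorted = [...rows].sort((a, b) => a.id - b.id);
  const end = limit === undefined ? undefined : offset + limit;
  return [sorted.slice(offset, end), sorted.length];
}

// Number of pages the frontend can render for a given page size.
function pageCount(total: number, limit: number): number {
  return Math.ceil(total / limit);
}

const posts: PostRow[] = Array.from({ length: 25 }, (_, i) => ({
  id: i + 1,
  title: `Post ${i + 1}`,
}));

const [items, count] = findAndCountPosts(posts, 20, 10);
console.log(items.length, count, pageCount(count, 10)); // 5 25 3
```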
In one of the previous parts of this series, we integrated our posts with Elasticsearch. Fortunately, adding offset-based pagination there is effortless: we need to pass two additional parameters, from, which is Elasticsearch's name for the offset, and size.
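A sketch of such a search request, assuming the Elasticsearch search API's `from` and `size` parameters. The `buildPostsSearch` helper, the `posts` index name, and the `multi_match` query on `title` and `content` fields are hypothetical placeholders; the resulting object is the kind of input an Elasticsearch client's search call accepts.

```typescript
interface PostsSearchRequest {
  index: string;
  from: number;
  size: number;
  body: {
    query: {
      multi_match: { query: string; fields: string[] };
    };
  };
}

// Hypothetical helper: builds a search request with offset-based pagination.
// Elasticsearch calls the offset `from` and the limit `size`.
function buildPostsSearch(text: string, offset = 0, limit = 10): PostsSearchRequest {
  return {
    index: 'posts', // assumed index name
    from: offset,
    size: limit,
    body: {
      query: {
        multi_match: { query: text, fields: ['title', 'content'] }, // assumed fields
      },
    },
  };
}

const searchRequest = buildPostsSearch('nestjs', 20, 10);
console.log(searchRequest.from, searchRequest.size); // 20 10
```

By default, Elasticsearch rejects requests where `from + size` exceeds 10,000 (the `index.max_result_window` setting), which is another sign that deep offset pagination does not scale well.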
The solution with offset and limit seems to be the most widely used. Unfortunately, its performance might fall short of our expectations.
An essential thing to keep in mind is that the database still needs to compute the rows skipped by the OFFSET. First, it orders the rows according to our ORDER BY clause, either by sorting them or by walking an index. Then, Postgres discards as many rows as the OFFSET specifies. The bigger the offset, the more work is spent on rows that never reach the user.
Aside from the performance, another important thing to consider is consistency. We want an element to appear in the results exactly once. Let’s imagine the following situation:
one user fetches page number one with posts
meanwhile, the second user creates a new post – after sorting, it ends up on page number one
the first user fetches the second page
Because of the above, the last element of the first page shows up again at the top of the second page. What's even worse, the first user never sees the element that has been added to the first page.
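The scenario above can be reproduced with a small in-memory simulation. In this sketch, the names are hypothetical and the posts are sorted newest first (by id descending, an assumption made so that a new post lands on page one). Fetching page two after the insert returns the post that was already the last one on page one, and the new post is never seen.

```typescript
interface PostRow {
  id: number;
  title: string;
}

const PAGE_SIZE = 3;

// ORDER BY id DESC OFFSET (page - 1) * PAGE_SIZE LIMIT PAGE_SIZE: newest posts first
function newestFirstPage(rows: PostRow[], page: number): PostRow[] {
  const sorted = [...rows].sort((a, b) => b.id - a.id);
  const offset = (page - 1) * PAGE_SIZE;
  return sorted.slice(offset, offset + PAGE_SIZE);
}

const table: PostRow[] = [1, 2, 3, 4, 5, 6].map((id) => ({ id, title: `Post ${id}` }));

// 1. The first user fetches page one: ids 6, 5, 4
const pageOne = newestFirstPage(table, 1);

// 2. Meanwhile, a second user creates a post that sorts onto page one
table.push({ id: 7, title: 'Post 7' });

// 3. The first user fetches page two: ids 4, 3, 2 (post 4 is shown twice)
const pageTwo = newestFirstPage(table, 2);

console.log(pageOne.map((p) => p.id), pageTwo.map((p) => p.id));
```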
Advantages
While the offset approach has its cons, it is still common. Due to its simplicity, it is straightforward to implement. Also, it is easy to change the column that we use for sorting, including the usage of multiple columns. Because of that, it is a viable solution in many cases, especially if the offsets are expected to stay small and occasional inconsistencies in the results are acceptable.
Keyset pagination
While offset-based pagination can be useful, its performance degrades as the offset grows, so sometimes we might want to avoid it.
One of the ways to do so is to implement keyset pagination. Instead of using the OFFSET clause, we use the WHERE clause to select the data we haven't fetched yet.
Let’s start with a simple query:
SELECT * FROM post
ORDER BY id ASC
LIMIT 10
The above query gets us the first ten posts. Let’s assume that the id of the last post was 20. With this assumption, we can run this query:
SELECT * FROM post
WHERE id > 20
ORDER BY id ASC
LIMIT 10
The above query gets us the next ten posts with an id greater than 20. Then, we can take the id of the last post in these results and rerun the query with it. Repeating this gives us a simple and efficient pagination mechanism.
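Here is the same idea as an in-memory TypeScript sketch: each page corresponds to `WHERE id > lastId ORDER BY id ASC LIMIT n`, and the id of the last row on one page becomes the starting point for the next. `keysetPage` and `fetchAllPages` are hypothetical helpers; in a real database, an index on `id` lets Postgres jump straight to the starting point instead of filtering every row like this array version does.

```typescript
interface PostRow {
  id: number;
  title: string;
}

// Mimics: SELECT * FROM post WHERE id > <lastId> ORDER BY id ASC LIMIT <limit>
function keysetPage(rows: PostRow[], lastId: number | undefined, limit: number): PostRow[] {
  return rows
    .filter((row) => lastId === undefined || row.id > lastId)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit);
}

// Walks through every page, using the last id of each page as the next starting point.
function fetchAllPages(rows: PostRow[], limit: number): number[][] {
  const pages: number[][] = [];
  let lastId: number | undefined = undefined;
  for (;;) {
    const page = keysetPage(rows, lastId, limit);
    if (page.length === 0) {
      break;
    }
    pages.push(page.map((row) => row.id));
    lastId = page[page.length - 1].id;
  }
  return pages;
}

// Ids with gaps, as they might look after some posts were deleted
const rows: PostRow[] = [3, 8, 9, 15, 20, 21, 30].map((id) => ({ id, title: `Post ${id}` }));

console.log(fetchAllPages(rows, 3)); // [ [ 3, 8, 9 ], [ 15, 20, 21 ], [ 30 ] ]
```

Gaps in the ids do not matter here, because each page starts right after the last id actually returned rather than at a computed position.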
This exposes the biggest drawback of keyset pagination, though. To get a page, we need to know the last element of the previous set of results. This makes skipping several pages at once impossible.
Fortunately, most of the time, users go straight to the next page. To cover all of the cases, we can implement both the offset-based approach and the keyset pagination.
If we want to change the column that we order our elements by, we need to change both the ORDER BY and the WHERE clauses.
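When the new ordering column is not unique, such as a title, the keyset also needs a tie-breaker: with `WHERE title > 'B'` alone, the remaining posts titled "B" would be skipped. A common approach, assumed here rather than taken from the article, is to order by `title, id` and compare the pair, which PostgreSQL supports with row comparison: `WHERE (title, id) > ($1, $2) ORDER BY title ASC, id ASC`. The sketch below mimics that in memory; `isAfter` and `keysetByTitle` are hypothetical names, and JavaScript's plain string comparison is not the same as PostgreSQL's collation-aware ordering.

```typescript
interface PostRow {
  id: number;
  title: string;
}

interface TitleCursor {
  title: string;
  id: number;
}

// Row-wise comparison, like (title, id) > (cursor.title, cursor.id) in SQL:
// compare titles first and fall back to ids only when the titles are equal.
function isAfter(row: PostRow, cursor: TitleCursor): boolean {
  if (row.title !== cursor.title) {
    return row.title > cursor.title;
  }
  return row.id > cursor.id;
}

// ORDER BY title ASC, id ASC
function byTitleThenId(a: PostRow, b: PostRow): number {
  if (a.title !== b.title) {
    return a.title < b.title ? -1 : 1;
  }
  return a.id - b.id;
}

// Mimics: WHERE (title, id) > (...) ORDER BY title ASC, id ASC LIMIT <limit>
function keysetByTitle(rows: PostRow[], cursor: TitleCursor | undefined, limit: number): PostRow[] {
  return rows
    .filter((row) => cursor === undefined || isAfter(row, cursor))
    .sort(byTitleThenId)
    .slice(0, limit);
}

const titled: PostRow[] = [
  { id: 1, title: 'B' },
  { id: 2, title: 'A' },
  { id: 3, title: 'B' },
  { id: 4, title: 'C' },
];

const firstPage = keysetByTitle(titled, undefined, 2); // posts 2 ("A") and 1 ("B")
const cursorRow = firstPage[firstPage.length - 1];
const nextPage = keysetByTitle(titled, { title: cursorRow.title, id: cursorRow.id }, 2); // posts 3 and 4
```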
Implementing keyset pagination with TypeORM
Adding keyset pagination is not difficult with TypeORM. First, let’s add another query parameter called startId to our PaginationParams.
Along the way, we face a small issue with the count of our elements. When we call postsRepository.findAndCount with a WHERE clause, it returns only the number of matching posts, not the total number of posts. Because of that, we need to count all of the posts separately.
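A minimal in-memory sketch of that logic, with hypothetical names: the page is filtered with `id > startId`, while the total comes from a separate count over all posts, so the frontend still learns how many posts exist. With TypeORM, the page could come from a find call using `where: { id: MoreThan(startId) }`, and the total from a `count()` call without that condition.

```typescript
interface PostRow {
  id: number;
  title: string;
}

interface KeysetParams {
  startId?: number;
  limit?: number;
}

function getPostsAfter(rows: PostRow[], { startId, limit = 10 }: KeysetParams) {
  // WHERE id > startId (only when startId is given) ORDER BY id ASC LIMIT limit
  const items = rows
    .filter((row) => startId === undefined || row.id > startId)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit);
  // Separate count without the WHERE condition: counting with it would only
  // include the posts after startId.
  const count = rows.length;
  return { items, count };
}

const stored: PostRow[] = Array.from({ length: 30 }, (_, i) => ({ id: i + 1, title: `Post ${i + 1}` }));
const { items, count } = getPostsAfter(stored, { startId: 20, limit: 5 });
console.log(items.map((p) => p.id), count); // [ 21, 22, 23, 24, 25 ] 30
```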
We can also achieve the above result with Elasticsearch by adding a condition on the id of a post to our query.
In this very simple example, we count the posts separately. If you would rather use other pagination approaches for performance reasons, Elasticsearch has its own built-in methods of pagination, such as the search_after parameter and the scroll API.
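A sketch of such a query body, assuming the same hypothetical `title` and `content` fields as before and an `id` field that is mapped as a number in the index. The keyset condition becomes a `range` filter with `gt` inside a `bool` query, and the results are sorted by `id` so that consecutive requests continue where the previous one ended. `buildKeysetSearch` is a hypothetical helper name.

```typescript
interface MultiMatchQuery {
  multi_match: { query: string; fields: string[] };
}

interface IdRangeFilter {
  range: { id: { gt: number } };
}

interface KeysetSearchBody {
  size: number;
  sort: Array<{ id: 'asc' | 'desc' }>;
  query: { bool: { must: MultiMatchQuery; filter?: IdRangeFilter } };
}

// Hypothetical helper: keyset pagination expressed as an Elasticsearch query body.
function buildKeysetSearch(text: string, startId?: number, limit = 10): KeysetSearchBody {
  const bool: KeysetSearchBody['query']['bool'] = {
    must: { multi_match: { query: text, fields: ['title', 'content'] } },
  };
  if (startId !== undefined) {
    // Like WHERE id > startId: only documents after the last one the client saw
    bool.filter = { range: { id: { gt: startId } } };
  }
  return {
    size: limit,
    sort: [{ id: 'asc' }], // the keyset needs a stable order on the same field
    query: { bool },
  };
}

const keysetBody = buildKeysetSearch('nestjs', 20, 5);
console.log(JSON.stringify(keysetBody.query.bool.filter)); // {"range":{"id":{"gt":20}}}
```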
The most apparent drawback of keyset pagination is that we need to know the element that we want to start with. Fortunately, we can overcome it by mixing in some offset-based pagination, as in the examples above.
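A sketch of that mix, under the same in-memory assumptions as before: the optional `startId` narrows the rows as in keyset pagination, and the optional `offset` then skips rows within that narrowed set, so a client can jump a few pages ahead of a known element. `getPostsMixed` is a hypothetical name; with TypeORM, this roughly corresponds to combining a `where` condition with `skip` and `take` in one find call.

```typescript
interface PostRow {
  id: number;
  title: string;
}

interface MixedParams {
  startId?: number;
  offset?: number;
  limit?: number;
}

// WHERE id > startId (when given) ORDER BY id ASC OFFSET offset LIMIT limit
function getPostsMixed(rows: PostRow[], { startId, offset = 0, limit = 10 }: MixedParams): PostRow[] {
  return rows
    .filter((row) => startId === undefined || row.id > startId)
    .sort((a, b) => a.id - b.id)
    .slice(offset, offset + limit);
}

const postTable: PostRow[] = Array.from({ length: 100 }, (_, i) => ({ id: i + 1, title: `Post ${i + 1}` }));

// Starting after post 40, skip two pages of ten: returns posts 61 to 70
const jumped = getPostsMixed(postTable, { startId: 40, offset: 20, limit: 10 });
```

The offset still has its usual cost, but the database only has to walk the skipped rows after `startId` rather than everything from the beginning of the table.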
Another consideration is that the column we use in the WHERE clause should be indexed for keyset pagination to perform well. Fortunately, the PostgreSQL documentation states that Postgres automatically creates a unique index for every primary key constraint. Therefore, keyset pagination over ids should be fast out of the box.
Also, ordering the results by text fields might be tricky if we want to implement natural sorting. If you want to read more about using the < operator with strings, read this answer on StackOverflow.
Advantages
One of the main advantages of keyset pagination is a performance improvement over the offset-based approach. Also, it solves the inconsistency issue that we experience with the offset-based approach. If the user adds or removes elements between fetching pages, it does not cause element duplicates or omissions.
Summary
In this article, we've implemented two types of pagination with PostgreSQL and TypeORM. We've pointed out the advantages and disadvantages of both the offset-based approach and the keyset pagination. While neither of them is ideal, they make a good combination that covers a variety of cases.
Since in this series we've also used Elasticsearch, we didn't forget about it when implementing the pagination. While keyset pagination might not be a perfect fit for Elasticsearch, Elasticsearch has other built-in methods of pagination.