engineering grouparoo node.js typescript
2021-03-04 - Originally posted at https://www.grouparoo.com/blog/sql-dialect-differences
Like many applications, Grouparoo stores data in a relational database. Unlike most applications, Grouparoo works with 2 different types of databases - Postgres and SQLite. We enable our customers to run Grouparoo in a number of different ways - on their laptop with no external decencies, and as part of a large cluster with many servers processing data in parallel. When running Grouparoo locally, you can use SQLite so no other dependencies are needed, and in the production cluster, you can use a hosted version of Postgres provided by your hosting provider.
Grouparoo uses the Sequelize Object Relational Mapper, or
ORM, along with sequelize-typescript so we can work with the same Objects in our codebase, regardless of the database providing persistence. Sequelize does a great job of abstracting away the differences between the database types... most of the time. In this blog post, I’ll be sharing the times when the differences in the SQL implementations of Postgres and SQLite matter.
Postgres supports both the
iLike operators for comparing strings, with the
i indicating case-insensitive matching (Postgres Docs). That means you can choose, per query, if you are ignoring case or not:
However, in SQLite, all string comparisons are case-insensitive (and there is no
iLike function (SQLite Docs). Instead, if you really want your
like function to be made case-sensitive, you would use the
case_sensitive_like PRAGMA (SQLite Docs)... but that’s a database-wide change that you likely don’t want to use.
In the Grouparoo application, this distinction shows up in a number of places, with the most interesting being that we need to provide different rules that can be used to calculate Group membership. If you visit the groups config page and check out the options for string or email comparisons between Postgres and SQLite, you’ll see the difference.
Postgres ships with a number of handy date and time functions with a consistent API, like
date_trunc. (Postgres Docs) SQLite instead chose to rely on the C-like
strftime function (SQLite Docs). Both are popular ways to deal with time, but very different ways of approaches.
For example, if we want to count up how many events occurred per hour:
While not necessarily a user-facing problem, there are quite a few places in the Grouparoo codebase were we calculate rollups like these, and need to make different queries depending on the database in use.
Sequelize helps you to write rather complex queries in a database-agnostic way. Consider the following query that asks for all the types of events that exist, and returns the count, first occurrence and most recent occurrence. e.g.: we might learn that there have been 100
pageview events, with the first one on Jan 1 and the most recent one today. This Sequelize query works for both Postgres and SQLite!
However, the resulting objects differ slightly
types.min will be a JS
Date object from Postgres and a
string from SQLite. They will need to be converted to the same type in your application code.
SQLite does not have Boolean columns, and uses integers instead. When using an ORM that supports the boolean type, most of the time it knows to covert the database’s
false, but when accessing properties directly it may not. This appears regularly with Sequelize’s
instance.getDataValue() method. Conversely, Postgres boolean values are always properly cast.
SQLite can only handle one transaction at a time. This makes sense, as it’s quite literally reading and writing a file on disk. Postgres, on the other hand, can handle many transactions at once and does a great job of merging the results and avoiding deadlocks. If you using Node.JS like Grouparoo is, even a single process can generate many transactions - you might be processing many API requests in parallel, or in the case of Grouparoo, running many background tasks at once.
To help avoid SQLite deadlocks (which look like
SequelizeTimeoutError: SQLITE_BUSY: database is locked), we limit the number of workers we run against a SQLite database to 1.
Sequelize has a bug in which a migration against a table that has an index against 2 columns will make those columns unique, even if they wen’t before the migration. To mitigate this, we do not use compound indexes in the Grouparoo application.
While this list may seem long, the vast majority of the Grouparoo codebase works exactly the same regardless of if you are backing the application with SQLite or Postgres. The Sequelize team did a great job abstracting most of the dialect nuances away.