<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>Brandur Leach | CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/author/brandur-leach/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/author/brandur-leach</link>
<image><url>https://www.crunchydata.com/build/_assets/brandur-leach.png-SSR7JKTU.webp</url>
<title>Brandur Leach | CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/author/brandur-leach</link>
<width>200</width>
<height>200</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Thu, 25 Sep 2025 11:00:00 EDT</pubDate>
<dc:date>2025-09-25T15:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ Postgres 18: OLD and NEW Rows in the RETURNING Clause ]]></title>
<link>https://www.crunchydata.com/blog/postgres-18-old-and-new-in-the-returning-clause</link>
<description><![CDATA[ Postgres 18 now lets you see both old and new data when you add the RETURNING clause to an UPDATE statement ]]></description>
<content:encoded><![CDATA[ <p>Postgres 18 <a href=https://www.postgresql.org/about/news/postgresql-18-released-3142/>was released today</a>. Well down the page from headline features like async I/O and UUIDv7 support, we get this nice little improvement:<blockquote><p>This release adds the capability to access both the previous (OLD) and current (NEW) values in the RETURNING clause for INSERT, UPDATE, DELETE and MERGE commands.</blockquote><p>It's not a showstopper the way async I/O is, but it <em>is</em> one of those small features that's invaluable in the right situation.<p>A simple demonstration with <code>UPDATE</code> to get all old and new values:<pre><code class=language-sql>UPDATE fruit
SET quantity = 300
WHERE item = 'Apples'
RETURNING OLD.*, NEW.*;

 id |  item  | quantity | id |  item  | quantity
----+--------+----------+----+--------+----------
  5 | Apples |      200 |  5 | Apples |      300
(1 row)
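
-- (For reference, the assumed setup behind this example:)
-- CREATE TABLE fruit (id bigserial PRIMARY KEY, item text, quantity int);
-- INSERT INTO fruit (item, quantity) VALUES ('Apples', 200);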
</code></pre><h2 id=detecting-new-rows-with-old-on-upsert><a href=#detecting-new-rows-with-old-on-upsert>Detecting new rows with <code>OLD</code> on upsert</a></h2><p>Say we're doing an upsert and want to differentiate between whether a row sent back by <code>RETURNING</code> was one that was newly inserted or an existing row that was updated. This was possible before, but relied on an unintuitive check on <code>xmax = 0</code> (see the very last line below):<pre><code class=language-sql>INSERT INTO webhook (
    id,
    data
) VALUES (
    @id,
    @data
)
ON CONFLICT (id)
    DO UPDATE SET id = webhook.id -- force upsert to return a row
RETURNING webhook.*,
    (xmax = 0) AS is_new;
</code></pre><p>The statement relies on <code>xmax</code> being set to zero for a fresh insert as an artifact of Postgres' locking implementation (see a <a href=https://stackoverflow.com/a/39204667>full explanation for why this happens</a>). It works, but isn't a guaranteed part of the API, and could conceivably change at any time.<p>In Postgres 18, we can reimplement the above so it's more legible and doesn't rely on implementation details. It's easy too -- just check whether <code>OLD</code> is null in the returning clause:<pre><code class=language-sql>INSERT INTO webhook (
    id,
    data
) VALUES (
    @id,
    @data
)
ON CONFLICT (id)
    DO UPDATE SET id = webhook.id -- force upsert to return a row
RETURNING webhook.*,
    (OLD IS NULL)::boolean AS is_new;
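-- OLD is entirely NULL when no existing row conflicted (a fresh insert);
-- on the DO UPDATE path it carries the row's pre-update values.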
</code></pre><p>Access to <code>OLD</code> and <code>NEW</code> will undoubtedly find many other uses, but this is one example that lets us improve pre-18 code right away. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Brandur.Leach@crunchydata.com (Brandur Leach) ]]></author>
<dc:creator><![CDATA[ Brandur Leach ]]></dc:creator>
<guid isPermaLink="false">75c16f175890e6bc129bc2ce95db52f65a2524e67283b61425649a9d791a8270</guid>
<pubDate>Thu, 25 Sep 2025 11:00:00 EDT</pubDate>
<dc:date>2025-09-25T15:00:00.000Z</dc:date>
<atom:updated>2025-09-25T15:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Don't mock the database: Data fixtures are parallel safe, and plenty fast ]]></title>
<link>https://www.crunchydata.com/blog/dont-mock-the-database-data-fixtures-are-parallel-safe-and-plenty-fast</link>
<description><![CDATA[ Brandur reviews why data fixtures are better than a database mock for testing. ]]></description>
<content:encoded><![CDATA[ <p>The API powering our <a href=https://www.crunchydata.com/products/crunchy-bridge>Crunchy Bridge</a> product is written in Go, a language that provides a good compromise between productivity and speed. We're able to keep good forward momentum on getting new features out the door, while maintaining an expected latency of low double digits of milliseconds for most <a href=https://docs.crunchybridge.com/api-concepts/getting-started>API endpoints</a>.<p>A common pitfall for new projects in fast languages like Go is that their creators, experiencing a temporary DX sugar high from compile and runtime speeds faster than any they've previously encountered in their careers, become myopically focused on performance above anything else, and start making performance optimizations with bad cost/benefit tradeoffs.<p>The textbook example of this is the database mock. Here's a rough articulation of the bull case for this idea: CPUs are fast. Memory is fast. Disks are slow. Why should tests have to store data to a full relational database with all its associated bookkeeping when that could be swapped out for an ultra-fast, in-memory key/value store? Think of all the time that could be saved by skipping that pesky <code>fsync</code>, not having to update that plethora of indexes, and foregoing all that expensive WAL accounting. Database operations measured in hundreds of microseconds or even, <em>gasp</em>, <em>milliseconds</em>, could plausibly be knocked down to tens of microseconds instead.<h2 id=mock-everything-test-nothing><a href=#mock-everything-test-nothing>Mock everything, test nothing</a></h2><p>Anyone who's substantially journeyed down the path of database mocks will generally tell you that it leads nowhere good.
They <em>are</em> fast (although disk speed has improved by orders of magnitude over the last decade), but every other one of their aspects leaves something to be desired.<p>A fatal flaw is that an in-memory mock bears no resemblance to a real database and the exhaustive constraints that real databases put on input data. Consider, for example, whether a mock would fail like a database in any of these scenarios:<ul><li>A value is inserted for a column that doesn't exist.<li>A value of the wrong data type for a column is inserted.<li>Duplicate values are inserted such that a <code>UNIQUE</code> constraint would not be satisfied.<li>A value inserted into a foreign key column has no match in the referenced table.<li>The conditions of a <code>CHECK</code> constraint aren't met.</ul><p>The likelihood is that it wouldn't. The database mock would dumbly accept mocked test data that was completely invalid, and the code under test would melt down spectacularly once it hit production with errors like this one:<pre><code class=language-sql>ERROR: insert or update on table "cluster" violates foreign key constraint "cluster_team_id_fkey"
    (SQLSTATE 23503)
</code></pre><p>And the trouble with mocks doesn't stop there:<ul><li><p>There isn't a query engine to determine what mocked data should be returned, so that has to be mocked too. Sometimes that might work, but it could also just be hopelessly wrong, and there's no way to catch those errors except in production.<pre><code class=language-ruby>expect_any_instance_of(Cluster).to receive(:where)
  .with("id IN (?, ?, ?)", 1, 2, 3)
  .and_return([cluster1, cluster2, cluster4])
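  # No query engine validates this stubbed return value -- it might be right,
  # or (as with cluster4 standing in for id 3 here) hopelessly wrong.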
</code></pre><li><p>From a human perspective, writing mock code (imagine having to write <code>expect(...).to receive(...)</code> chained across multiple objects) is laborious, error-prone, and slow! By comparison, inserting rows into and querying a database are faster and easier.<li><p>In languages like Go without dynamic typing, it's difficult to write a general purpose mocking framework, which often leaves it to the app developer to write and maintain their own internal mocking platform made up of interfaces and mock structs. The interfaces add a suboptimal layer of indirection to code, making it harder to make good use of IDE features like jump-to-definition.</ul><p>With the widespread use of mocks, you may have to consider that because so much of the stack under exercise is synthetic, you're really just testing that you got your <em>mocks</em> right rather than testing that code actually works.<h2 id=fixtures-are-fast-enough><a href=#fixtures-are-fast-enough>Fixtures are fast enough</a></h2><p>Hopefully this has done something to convince you that database mocks aren't an appropriate way to test code intended for production, but even more relevant is that the entire premise behind their use is flawed!<p>The principal case for database mocks starts from the notion that database access is unacceptably slow, and if that were ever true, it certainly isn't today. On my commodity laptop, inserting a reasonably complex object with over a dozen columns and multiple foreign keys and constraints takes about 100µs.
That's ten objects that'll fit in a millisecond, and using techniques like <a href=https://brandur.org/fragments/go-test-tx-using-t-cleanup>test transactions</a> and <a href=https://brandur.org/fragments/parallel-test-bundle>ubiquitous use of <code>t.Parallel()</code></a>, it's entirely parallelizable.<p>To hold up our large, mature app as an example, we have a little under 4,900 tests that run in ~23s uncached:<pre><code>$ PLATFORM_RUN_ID=$(uuidgen) gotestsum ./... -- -count=1
✓  apiendpoint (235ms)
✓  apierror (370ms)
✓  apiexample (483ms)
...
✓  util/urlutil (1.058s)
✓  util/uuidutil (1.084s)
✓  validate (1.077s)

DONE 4876 tests, 4 skipped in 23.156s
</code></pre><p>We have strong conventions around the use of database fixtures in tests, which are exactly like inserting a normal record except that they come with defaults that make their use faster, easier, and more concise:<pre><code class=language-go>package dbfactory

type MultiFactorOpts struct {
    ID          *uuid.UUID              `validate:"-"`
    AccountID   uuid.UUID               `validate:"required"`
    ActivatedAt *time.Time              `validate:"-"`
    ExpiresAt   *time.Time              `validate:"-"`
    Kind        *dbsqlc.MultiFactorKind `validate:"-"`
}

func MultiFactor(ctx context.Context, t *testing.T, e db.Executor, opts *MultiFactorOpts) *dbsqlc.MultiFactor {
    t.Helper()

    validateOpts(t, opts)

    var (
        num          = nextNumSeq()
        numFormatted = formatNumSeq(num)
    )

    multiFactor, err := dbsqlc.New().MultiFactorInsert(ctx, e, dbsqlc.MultiFactorInsertParams{
        ID:          ptrutil.ValOrDefaultFunc(opts.ID, func() uuid.UUID { return ptesting.ULID(ctx).New() }),
        AccountID:   opts.AccountID,
        ActivatedAt: ptrutil.TimeSQLNull(opts.ActivatedAt),
        ExpiresAt:   ptrutil.TimeSQLNull(opts.ExpiresAt),
        Kind:        string(ptrutil.ValOrDefault(opts.Kind, dbsqlc.MultiFactorKindTOTP)),
        Name:        fmt.Sprintf("%s no. %s", ptrutil.ValOrDefault(opts.Kind, dbsqlc.MultiFactorKindTOTP), numFormatted),
    })
    require.NoError(t, err)

    return multiFactor
}
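
// Illustrative call site (names outside this excerpt are assumed): defaults
// cover everything except the required AccountID, keeping tests to one line:
//
//     mf := dbfactory.MultiFactor(ctx, t, tx, &#38;dbfactory.MultiFactorOpts{AccountID: account.ID})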
</code></pre><p>With constructs like Go's <code>var ( ... )</code> block, they even look pretty when assembling long series of them in test cases:<pre><code class=language-go>func TestClusterServiceActionRestart(t *testing.T) {
    t.Parallel()

    setup := func(t *testing.T) (*testBundle, context.Context) {
        t.Helper()

        var (
            account = dbfactory.Account(ctx, t, tx, &#38;dbfactory.AccountOpts{})
            team    = dbfactory.Team(ctx, t, tx, &#38;dbfactory.TeamOpts{})
            _       = dbfactory.AccessGroupAccount_Admin(ctx, t, tx, team.ID, account.ID)
            cluster = dbfactory.Cluster(ctx, t, tx, &#38;dbfactory.ClusterOpts{TeamID: team.ID})
        )
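
        // ctx and tx (elided from this excerpt) come from test setup; tx is a
        // test transaction rolled back in t.Cleanup, so fixtures created by
        // parallel tests never collide.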
</code></pre><p>I wrote a plugin to measure how many test fixtures are generated during the course of a complete run of the test suite, and found the number to be a little north of 18,000:<pre><code>=# select * from test_stat;
                  id                  |          created_at           | num_fixtures
--------------------------------------+-------------------------------+--------------
 9E06C8B9-EA6E-490F-A0D3-1A18310376CF | 2025-05-28 07:42:49.500298-07 |        18132
</code></pre><p>An imperfect calculation would suggest we're generating 18k fixtures / 23 seconds = 780 fixtures/s. This doesn't account at all for tests that don't need database access or non-fixture database operations, so we're really averaging more like a few thousand database operations per second of testing.<h2 id=summary-fast-fixtures-total-parallelization-good-constraints><a href=#summary-fast-fixtures-total-parallelization-good-constraints>Summary: Fast fixtures, total parallelization, good constraints</a></h2><p>To sum it up, here's how to design a test suite that's fast and thorough:<ul><li><p>Don't mock databases. A little extra speed isn't worth the dramatic reduction in test fidelity.<li><p>Make database use in tests easy with a fixture framework that does most of the work for you. It can even be homegrown (ours is) as long as it's easy to use and establishes strong convention.<li><p>Make up for any lost speed by using techniques like test transactions to maximize parallel throughput. Databases are built to accommodate this.<li><p>With database mocks in the rear view mirror, take advantage of all the nice constraints RDBMSes offer like strongly defined schema, data types, check constraints, and foreign keys. Each of these features that catches a mistake during tests is one less bug to fix in production.</ul> ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Brandur.Leach@crunchydata.com (Brandur Leach) ]]></author>
<dc:creator><![CDATA[ Brandur Leach ]]></dc:creator>
<guid isPermaLink="false">e096698bb24d99ba32cb7a8f54b0a602908696aba70c504b7a3ce3665035ed38</guid>
<pubDate>Thu, 29 May 2025 09:00:00 EDT</pubDate>
<dc:date>2025-05-29T13:00:00.000Z</dc:date>
<atom:updated>2025-05-29T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Real World Performance Gains With Postgres 17 B-tree Bulk Scans ]]></title>
<link>https://www.crunchydata.com/blog/real-world-performance-gains-with-postgres-17-btree-bulk-scans</link>
<description><![CDATA[ Brandur digs into one of Postgres 17's updates that make b-tree index scans faster. He's tested performance against the Crunchy Bridge API and the results are pretty impressive. ]]></description>
<content:encoded><![CDATA[ <p>With RC1 freshly cut, the release of Postgres 17 is right on the horizon, giving us a <a href=https://www.postgresql.org/docs/17/release-17.html>host of features, improvements, and optimizations</a> to look forward to.<p>As a backend developer, one in particular pops off the page, distinguishing itself amongst the dozens of new release items:<blockquote><p>Allow btree indexes to more efficiently find a set of values, such as those supplied by IN clauses using constants (Peter Geoghegan, Matthias van de Meent)</blockquote><p>The B-tree is by far Postgres' most common and best optimized index, used for lookups on a table's primary key or secondary indexes, and undoubtedly powering all kinds of applications all over the world, many of which we interact with on a daily basis.</p><!--more--><p>During lookups, a B-tree is scanned, with Postgres descending through its hierarchy from the root until it finds a target value on one of its leaf pages. Previously, multi-value lookups like <code>id IN (1, 2, 3)</code> or <code>id = any('{1,2,3}')</code> would require that process be repeated multiple times, once for each of the requested values. Although not perfectly efficient, it wasn't a huge problem because B-tree lookups are very fast. It'd take an extremely performance-sensitive user to even notice the deficiency.<p>As of <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5bf748b86bc6786a3fc57fc7ce296c37da6564b0">a Postgres 17 enhancement to nbtree's <code>ScalarArrayOp</code> execution</a>, that's no longer always the case.
Any particular scan with multiple scalar inputs will consider all those inputs as it's traversing a B-tree, and where multiple values land on the same leaf page, they're retrieved together to avoid repetitive traversals.<p>A narrowly focused script to <a href=https://gist.github.com/benoittgt/ab72dc4cfedea2a0c6a5ee809d16e04d>demonstrate the original problem</a> shows a dramatic performance increase before and after the <code>ScalarArrayOp</code> improvement, so we already know the gains are very real. With Postgres 17 so close to hand, we wanted to measure what kind of gain a realistic web app might expect from the optimization by testing it against the real API service that powers <a href=https://crunchybridge.com/>Crunchy Bridge</a>.<p>In our experiment we saw roughly a 30% improvement in throughput and a 20% drop in average request time -- promising to say the least. Read on for details.<h2 id=list-endpoints-and-eager-loading><a href=#list-endpoints-and-eager-loading>List endpoints and eager loading</a></h2><p>The API is a production-grade (i.e. has bells and whistles like auth, telemetry, and defensive hardening) program written in Go. I chose its <code>GET /teams/:id/members</code> endpoint (team member list) as a test dummy because it's a good middle ground between performance and sophistication. Substantial enough to benefit from the index improvements, but simple enough to stay easy to understand.<p>It returns a list of team member API resources:<pre><code class=language-go>// A team member.
type TeamMember struct {
    apiresource.APIResourceBase

    // Primary ID of the team member record.
    ID eid.EID `json:"id" validate:"-"`

    // Properties of the account associated with the team member.
    Account *Account `json:"account" validate:"-"`

    // The role assigned to the team member.
    Role dbsqlc.TeamMemberRole `json:"role" validate:"required,teammemberrole"`
}

// Account information nested in a team member.
type Account struct {
    // Primary ID of the account.
    ID eid.EID `json:"id" validate:"-"`

    // Email associated with the account.
    Email string `json:"email" validate:"required,email,apistring200"`

    // Indicates that the account has a password set, as opposed to being
    // SSO-only with no usable password. It's possible for an account to have
    // both a password and a federated identity through a provider like Google
    // or Microsoft.
    HasPassword *bool `json:"has_password" validate:"required"`

    // Indicates that the account has a federated identity for single sign-on
    // through an identity provider like Google or Microsoft.
    HasSSO *bool `json:"has_sso" validate:"required"`

    // Whether the account has at least one activated multi-factor source.
    MultiFactorEnabled *bool `json:"multi_factor_enabled" validate:"required"`

    // Full name associated with the account.
    Name string `json:"name" validate:"required,apistring200"`
}
</code></pre><p>The team member itself is minimal, containing only ID and role as properties of its own, but embedding an account with detail on the user that's linked to the team member (a team member in this instance can be thought of as a join table between an account and a team).<p>An account has obvious properties like an email and name, but also a few less common ones like <code>has_password</code> and <code>multi_factor_enabled</code> which are used while rendering a list of team members in the UI to show badges next to each person for security features like "SSO-only (password-less) account" or "multi-factor enabled", thereby letting an admin vet the security posture of everyone on their team, giving them the information they need to reach out to team members who for example don't have MFA enabled.<p>These specifics aren't important, but demonstrative of a common pattern in which multiple database records are needed to render a final product. Team members and accounts are backed directly by their own database models, but although they're booleans, <code>has_password</code> and <code>multi_factor_enabled</code> need to load federated identity and multi-factor credential records for the associated account.<p>The simplest possible version of loading a page of team members and rendering API resources is roughly:<pre><code class=language-ruby>fetch_team_member_page().map do |team_member|
  account = fetch_account(team_member.account_id)
  render_team_member(team_member,
    account: render_account(account))
end
</code></pre><p>Our version looks more like:<pre><code class=language-ruby>team_members = fetch_team_member_page()
bundle = fetch_load_bundle(team_members)
team_members.map do |team_member|
  render_team_member(bundle, team_member,
    account: render_account(bundle, team_member.account_id))
end
</code></pre><p>The key difference is that there's no data loaded (e.g. <code>fetch_account</code>) in the loop. Instead, we use <a href=https://brandur.org/two-phase-render>Two-phase Load and Render</a>, a technique where all the database records needed to render a set of API resources are loaded in bulk on a single pass, making N+1s difficult to write.<h2 id=bulk-query-patterns><a href=#bulk-query-patterns>Bulk query patterns</a></h2><p>Fetching a page of team members looks exactly how you'd expect (queries have been simplified for brevity):<pre><code class=language-sql>SELECT * FROM team_member WHERE team_id = &#60;team_id>;
</code></pre><p>A set of account IDs is extracted from the result, and a couple more lookups are made with them:<ul><li>Select account records for each account ID extracted from team members:</ul><pre><code class=language-sql>SELECT * FROM account WHERE id = any(&#60;account1>, &#60;account2>, ...);
</code></pre><ul><li>Fetch federated identities for the accounts so that we can populate properties like <code>has_sso</code>.</ul><pre><code class=language-sql>SELECT * FROM account_federated_identity WHERE account_id = any(&#60;account1>, &#60;account2>, ...);
</code></pre><ul><li>And likewise, multi factors for setting a value to <code>multi_factor_enabled</code>.</ul><pre><code class=language-sql>SELECT * FROM multi_factor WHERE account_id = any(&#60;account1>, &#60;account2>, ...);
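-- Before Postgres 17, each value in any(...) meant a separate descent of the
-- B-tree; 17 fetches values that land on the same leaf page together.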
</code></pre><p>It's okay if the details specific to our app are a little fuzzy, but notice broadly that:<ul><li><p>Like any web app or API, a number of different database models are interwoven to render the final product. We're using only four on this endpoint, but for a complex app, rendering even one page might require the use of hundreds of different models.<li><p>Lookups make heavy use of <code>id = any(...)</code>, where the set being queried might be fairly large. Our API's default page size is 100, so given a full page each <code>any(...)</code> contains 100 account IDs.</ul><p>And while we're using two-phase load and render, <a href=https://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations>eager loading</a> like found in frameworks like Ruby on Rails will generate similar query patterns.<h2 id=inducing-load><a href=#inducing-load>Inducing load</a></h2><p>We'll use the excellent <a href=https://github.com/tsliwowicz/go-wrk>go-wrk</a> to benchmark the API, making sure to do so over a sustained period (60 seconds) to compensate for a cold start and caching.<p>In a typical web app it's common for database calls to make up the lion's share of the time spent servicing a request, and that's true of our team member list endpoint, but there's a reasonable amount of non-database work happening too. The incoming request is parsed, sent through a middleware stack, its auth checked, telemetry/logging emitted, a response serialized, and so on.<p>We've left in this extra overhead on purpose. 
It's possible to demonstrate extreme performance benefits using large quantities of synthetic data combined with carefully crafted queries, but we're trying to demonstrate how the index lookup improvements will benefit a realistic use case.<p>In pursuit of having a reasonable set of data to test with, I generated a team with 100 (our default page size) team members/accounts along with associated records like federated identities and activated multi factors.<p>Benchmarked with Postgres 16:<pre><code>$ go-wrk -d 60 -H 'Authorization: Bearer cbkey_dbGR3HgJkeFyJ8VUXAXeQHlnb5gIlZdoNYoNI51jmCVH6V' -M GET http://localhost:5222/teams/matjsvug6vb7javsjsugxbjtiy/members
Running 60s test @ http://localhost:5222/teams/matjsvug6vb7javsjsugxbjtiy/members
  10 goroutine(s) running concurrently
74272 requests in 59.977486758s, 2.54GB read
Requests/sec:           1238.33
Transfer/sec:           43.35MB
Overall Requests/sec:   1237.71
Overall Transfer/sec:   43.33MB
Fastest Request:        2.427ms
Avg Req Time:           8.074ms
Slowest Request:        147.039ms
Number of Errors:       0
10%:                    2.841ms
50%:                    3.105ms
75%:                    3.206ms
99%:                    3.283ms
99.9%:                  3.285ms
99.9999%:               3.285ms
99.99999%:              3.285ms
stddev:                 3.934ms
</code></pre><p>And on Postgres 17:<pre><code>$ go-wrk -d 60 -H 'Authorization: Bearer cbkey_4SgqjRk3B9lcZp7sIb8vWZiJQRtT2MUr4cn7SBapnC2tTX' -M GET http://localhost:5222/teams/matjsvug6vb7javsjsugxbjtiy/members
Running 60s test @ http://localhost:5222/teams/matjsvug6vb7javsjsugxbjtiy/members
  10 goroutine(s) running concurrently
94484 requests in 59.978741362s, 3.23GB read
Requests/sec:           1575.29
Transfer/sec:           55.14MB
Overall Requests/sec:   1574.54
Overall Transfer/sec:   55.12MB
Fastest Request:        1.943ms
Avg Req Time:           6.347ms
Slowest Request:        97.279ms
Number of Errors:       0
10%:                    2.424ms
50%:                    2.713ms
75%:                    2.806ms
99%:                    2.877ms
99.9%:                  2.879ms
99.9999%:               2.88ms
99.99999%:              2.88ms
stddev:                 2.441ms
</code></pre><p>Highlights in graph form: <img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/f448666b-4af5-4803-841a-06125cb8c000/public><p>The jump from Postgres 16 to 17 shows a ~30% improvement in throughput (1,238 RPS to 1,575 RPS) and 20% drop (8 ms to 6.3 ms) in average request time. That's not full multiples like a synthetic benchmark would produce, but for a real world application, a 20% across-the-board drop in request time is a <em>big</em> deal. There are many, many developers over the years, including this author, who've spent many more hours on optimizations that yielded far less.<p>The design and implementation of our Go-based API is admittedly pretty bespoke, but I'd expect to see gains not too far off this in applications making heavy use of eager loading in frameworks like Rails.<p>The flagship features of any new release tend to get the most glory, but these sorts of invisible but highly impactful optimizations are just as good. There's nothing quite like the satisfaction of seeing your entire stack get faster from just pushing an upgrade button! ]]></content:encoded>
<category><![CDATA[ Postgres 17 ]]></category>
<author><![CDATA[ Brandur.Leach@crunchydata.com (Brandur Leach) ]]></author>
<dc:creator><![CDATA[ Brandur Leach ]]></dc:creator>
<guid isPermaLink="false">4de2e1ee4c9fcf3b20c3a0897f12e201b18b6afdb38034acf6132e98aeb25a97</guid>
<pubDate>Mon, 23 Sep 2024 10:15:00 EDT</pubDate>
<dc:date>2024-09-23T14:15:00.000Z</dc:date>
<atom:updated>2024-09-23T14:15:00.000Z</atom:updated></item></channel></rss>