<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>Greg Smith | CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/author/greg-smith/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/author/greg-smith</link>
<image><url>https://www.crunchydata.com/build/_assets/default.png-W4XGD4DB.webp</url>
<title>Greg Smith | CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/author/greg-smith</link>
<width>256</width>
<height>256</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Tue, 19 Nov 2024 09:30:00 EST</pubDate>
<dc:date>2024-11-19T14:30:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ Loading the World! OpenStreetMap Import In Under 4 Hours ]]></title>
<link>https://www.crunchydata.com/blog/loading-the-world-openstreetmap-import-in-under-4-hours</link>
<description><![CDATA[ Greg has a full OSM load for the entire world running in record time. He digs into tuning and recent software and hardware updates that make a full planet run in less than 4 hours. ]]></description>
<content:encoded><![CDATA[ <p>The OpenStreetMap (OSM) database builds almost 750GB of location data from a single file download. <a href=https://www.openstreetmap.org/>OSM</a> loads notoriously take a full day to run. A fresh OpenStreetMap load involves both a massive write process and large index builds. It is a great performance stress-test bulk load for any Postgres system. I use it to stress the latest PostgreSQL versions and state-of-the-art hardware. The stress test validates new tuning tricks and identifies performance regressions.<p>Two years ago, I presented (<a href="https://www.youtube.com/watch?v=BCMnu7xay2Y">video</a> / <a href=https://www.slideshare.net/slideshow/speedrunnin-the-open-street-map-osm2pgsql-loader/254313657>slides</a>) at PostGIS Day on challenges of this workload. In honor of this week’s <a href=https://www.crunchydata.com/community/events/postgis-day-2024>PostGIS Day 2024</a>, I’ve run the same benchmark on Postgres 17 and the very latest hardware. The findings:<ul><li><strong>PostgreSQL</strong> keeps getting better! Core improvements sped up index building in particular.<li>The <strong>osm2pgsql</strong> loader got better too! New takes on indexing speed things up.<li><strong>Hardware</strong> keeps getting better! It has been two years since my last report and the state-of-the-art has advanced.</ul><h2 id=tune-your-instrument><a href=#tune-your-instrument>Tune Your Instrument</a></h2><p>First, we are using bare metal hardware—a server with 128GB RAM—so let’s tune Postgres for loading and to match that server:<pre><code>max_wal_size = 256GB
shared_buffers = 48GB
effective_cache_size = 64GB
maintenance_work_mem = 20GB
work_mem = 1GB
</code></pre><p>Second, let’s prioritize bulk load. The following settings do not make sense for a live system under read/write load, but they will improve performance for this bulk load scenario:<pre><code class=language-bash>checkpoint_timeout = 60min
synchronous_commit = off
# if you don't have replication:
wal_level = minimal
max_wal_senders = 0
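# note: wal_level = minimal requires max_wal_senders = 0 (set above),
# and while in effect it also disables WAL archiving and pg_basebackup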
# if you believe my testing, these make
# things faster too:
fsync = off
autovacuum = off
full_page_writes = off
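# caution (not in the original post): with fsync and full_page_writes off,
# an OS crash mid-load can corrupt the cluster. Plan to redo the import
# rather than recover it, and restore fsync, full_page_writes, and
# autovacuum to their defaults (all on) before taking real traffic.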
</code></pre><p>It’s also possible to tweak the <a href=https://www.postgresql.org/docs/current/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-BACKGROUND-WRITER>background writer</a> for the particular case of massive data ingestion, but for bulk loads without concurrency it doesn’t make a large difference.<h2 id=how-postgresql-has-improved><a href=#how-postgresql-has-improved>How PostgreSQL has Improved</a></h2><p>In 2022, testing on that year's new AMD AM5 hardware loaded the data in just under 8 hours with Postgres 14. Today the amount of data in the OSM Planet files has grown another 14%. Testing with Postgres 17 still halves the load time, with the biggest drops coming from software improvements in the PG14-16 time-frame.<p><img alt="osm building time" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/d4f15c2b-2a9f-4d95-f397-00101bc60900/public><p>The benchmark orchestration and metrics framework here is my <a href=https://github.com/gregs1104/pgbench-tools>pgbench-tools</a>. 
Full hardware details are published to <a href=https://browser.geekbench.com/user/232126>GeekBench</a>.<h3 id=gist-index-building-in-postgresql-15><a href=#gist-index-building-in-postgresql-15>GIST Index Building in PostgreSQL 15</a></h3><p>The biggest PostgreSQL speed gains are from <a href="https://www.youtube.com/watch?v=TG28lRoailE">improvements in the GIST index building code</a>.<p>The new code pre-sorts index pages before merging them, and for large GIST index builds the performance speed-up can be substantial, as <a href=https://osm2pgsql.org/news/2023/01/22/faster-with-postgresql15.html>reported by the author</a> of osm2pgsql.<p>My tests showed going from PostgreSQL 14 to 15 delivered:<ul><li>16% speedup<li>15% size reduction<li>86% GIST index build speedup!</ul><p><img alt="osm index building time" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/7fd95086-d103-449e-1336-23566d73b500/public><p>There have been further improvements in PostgreSQL 16 and 17 in <a href=https://www.crunchydata.com/blog/real-world-performance-gains-with-postgres-17-btree-bulk-scans>B-Tree index building</a>, but this osm2pgsql benchmark does not really show them. The GIST index build times wash out the other index builds.<h2 id=how-osm2pgsql-has-improved><a href=#how-osm2pgsql-has-improved>How osm2pgsql has improved</a></h2><p>In Q3 2022, osm2pgsql 1.7 made a technique called the <a href=https://osm2pgsql.org/doc/manual-v1.html#bucket-index-for-slim-mode>Middle Way Node Index ID Shift</a> the new <strong>default</strong>.<p><a href=https://osm2pgsql.org/doc/manual-v1.html#bucket-index-for-slim-mode>Middle Way Node Index ID Shift</a> is a clever design approach that compresses the database's largest index, trading off lookup and update performance for a smaller footprint. It uses a Partial Index to merge nearby values together into less fine-grained sections. When an index is used frequently, this would waste too many CPU cycles. 
Similar to hash bucket collisions, partial indexes have to constantly exclude non-matched items. That chews through extra CPU on every read. In addition, because individual blocks hold so many more values, the locking footprint for updates increases proportionately. However, for large but infrequently used indexes like this one, those are satisfactory trade-offs.<p>Applying that improvement dropped my loading times by 37% and cut the database size from 1000GB to under 650GB. Total time at the terabyte size had crept upward to near 10 hours. The speed-up drove it back below 6 hours.<p>The osm2pgsql manual shows the details in its <a href=https://osm2pgsql.org/doc/manual-v1.html#update-for-expert-users>Update for Expert Users</a>. I highly recommend that section and its <a href=https://blog.jochentopf.com/2023-07-25-improving-the-middle-of-osm2pgsql.html>Improving the middle</a> blog entry. It's a great study of how PG's flexible indexing system lets applications optimize for their exact workload.<h2 id=how-hardware-has-improved><a href=#how-hardware-has-improved>How hardware has improved</a></h2><h3 id=ssd-write-speed><a href=#ssd-write-speed>SSD Write Speed</a></h3><p>During data import, the osm2pgsql workload writes heavily at medium <a href=https://www.techtarget.com/searchstorage/definition/queue-depth>queue depths</a> for hours. The best results come from SSDs with oversized <a href=https://www.advantech.com/en/resources/news/maximizing-ssd-performance-with-slc-cache#1>SLC caches</a> that also balance cleanup compaction of that cache. The later CREATE TABLE AS (CTAS) sections of the build reach the drive's peak read/write speeds.<p>I saw 11GB/s from a <a href="https://www.google.com/search?client=safari&amp;rls=en&amp;q=Crucial+T705+PCIe+5.0&amp;ie=UTF-8&amp;oe=UTF-8">Crucial T705</a> PCIe 5.0 drive the week (foreshadowing!) 
I was running that with an Intel i9-14900K:<p><img alt="read write for osm" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/3a260fb6-37cc-43e3-f9ad-930ffe535200/public><p>osm2pgsql has a tuning parameter named <code>--number-processes</code> that guides how many parallel operations the code tries to spawn.<p>For the server and memory I used in this benchmark, increasing <code>--number-processes</code> from my earlier 2 to 5 worked well. However, be careful: you can easily go too far! Bumping up this parameter increases memory usage too. Going wild on the concurrent work will run you out of memory and put you into the hands of the Linux Out of Memory (OOM) killer.<h3 id=processor-advances><a href=#processor-advances>Processor advances</a></h3><p>Obviously, every year processors get a little better, but they do so in different ways and at different rates.<p>For late 2023 testing against PostgreSQL 15 and 16, an <strong><a href=https://www.intel.com/content/www/us/en/products/sku/230493/intel-core-i513600k-processor-24m-cache-up-to-5-10-ghz/specifications.html>Intel i5-13600K</a></strong> overtook the earlier <strong>AMD Ryzen 7 7700X</strong>. There was another small bump in 2024 upgrading to an <strong>i9-14900K</strong>.<p>But this is a demanding regression test workload, and it only took a few weeks of running the OSM workload to trigger the <strong>i9-14900K</strong>’s <a href=https://community.intel.com/t5/Blogs/Tech-Innovation/Client/Intel-Core-13th-and-14th-Gen-Desktop-Instability-Root-Cause/post/1633239>voltage bugs</a> to the point where my damaged CPU could not even finish the test.<p>Thankfully I was able to step away from those issues when <strong>AMD's 9600X</strong> launched. 
Here are the latest results from PG17 on an AMD 9600X, with the same SK41 2TB drive as I tested in 2022 for my <a href="https://www.youtube.com/watch?v=BCMnu7xay2Y">PostGIS Day talk</a>.<h2 id=my-best-osm-import-results-to-date><a href=#my-best-osm-import-results-to-date>My best OSM import results to date</a></h2><pre><code>2024-10-15 10:03:41  [00] Reading input files done in 7851s (2h 10m 51s).
2024-10-15 10:03:41  [00]   Processed 9335778934 nodes in 490s (8m 10s) - 19053k/s
2024-10-15 10:03:41  [00]   Processed 1044011263 ways in 4301s (1h 11m 41s) - 243k/s
2024-10-15 10:03:41  [00]   Processed 12435485 relations in 3060s (51m 0s) - 4k/s
2024-10-15 10:03:41  [00] Overall memory usage: peak=158292MByte current=157746MByte...
2024-10-15 11:32:13  [00] osm2pgsql took 13162s (3h 39m 22s) overall.
</code></pre><p>Completed in <strong>less than 4 hours</strong>!<p>PostgreSQL 17 is about 3% better on this benchmark than PostgreSQL 16 when replication is used, thanks to improvements in its WAL infrastructure.<p>I look forward to following up on this benchmark in more detail, after my scorched Intel system is fully running again! Like the speed of the Postgres ecosystem, the pile of hardware I've benchmarked to death grows every year. ]]></content:encoded>
<category><![CDATA[ Spatial ]]></category>
<category><![CDATA[ Postgres 17 ]]></category>
<author><![CDATA[ Greg.Smith@crunchydata.com (Greg Smith) ]]></author>
<dc:creator><![CDATA[ Greg Smith ]]></dc:creator>
<guid isPermaLink="false">9e2c4eee6da51e862d11c5257ce74d187841b9a7deed0a7f62b44557857a70d8</guid>
<pubDate>Tue, 19 Nov 2024 09:30:00 EST</pubDate>
<dc:date>2024-11-19T14:30:00.000Z</dc:date>
<atom:updated>2024-11-19T14:30:00.000Z</atom:updated></item>
<item><title><![CDATA[ PostgreSQL on Linux: Counting Committed Memory ]]></title>
<link>https://www.crunchydata.com/blog/postgresql-on-linux-counting-committed-memory</link>
<description><![CDATA[ By default Linux uses a controversial (for databases) memory extension feature called Overcommit. How that interacts with PostgreSQL is covered in the Managing Kernel Resources section of the PG manual. ]]></description>
<content:encoded><![CDATA[ <p>By default Linux uses a controversial (for databases) memory extension feature called <a href=https://www.kernel.org/doc/Documentation/vm/overcommit-accounting>overcommit</a>. How that interacts with PostgreSQL is covered in the <a href=https://www.postgresql.org/docs/current/kernel-resources.html>Managing Kernel Resources</a> section of the PG manual.<p>Overcommit allows processes to pre-allocate virtual memory beyond even server RAM. Those allocations are only nailed down to real memory, <strong>committed</strong> to use the kernel's terminology, when they are actually used. This lets applications have a flatter memory model without having to grapple with virtual memory coding. This model improves how effectively swap can work as well.<p>If you upgraded PostgreSQL or increased your server's <code>shared_buffers</code> setting recently, you may find a larger chunk of memory is now listed in Linux's "Committed" section that wasn't noticeable before. Let's walk through enough of this area to interpret the associated system memory metrics.<h2 id=shared-memory-history><a href=#shared-memory-history>Shared memory history</a></h2><p>In PostgreSQL versions up to 9.2, the shared memory block needed to run the server was allocated directly as UNIX System V shared memory. Documentation from that era gave an estimate of memory needed in that block. The <a href=https://www.postgresql.org/docs/9.2/kernel-resources.html>9.2 Kernel Resources</a> page has it in Table 17-2 "PostgreSQL Shared Memory Usage".<p>Starting in PostgreSQL 9.3, <em>"PostgreSQL normally allocates a very small amount of System V shared memory, as well as a much larger amount of POSIX (mmap) shared memory"</em>, quoting the <a href=https://www.postgresql.org/docs/10/kernel-resources.html>10.0 Kernel Resources</a>. The system then commits the <code>shared_buffers</code> memory to pin it down and initialize it. 
That's why the shared/committed balance of newer Postgres servers will look very different from older versions. The memory use formula numbers were made largely obsolete by this change, and that table was impossible to maintain well in the documentation anyway. That's why the level of detail was reduced when switching to the new <em>mmap</em> allocation style.<h2 id=pg10-example><a href=#pg10-example>PG10 example</a></h2><p>This example uses the PostgreSQL 10 included with Ubuntu 18.04; you can use any Linux distribution albeit with different service control scripts. Start with the server down (more on the right syntax below) and look at the memory use:<pre><code class=language-shell>$ service postgresql stop
$ cat /proc/meminfo | grep Commit
CommitLimit:    10252072 kB
Committed_AS:     806928 kB
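# note (not in the original post): CommitLimit works out to
# swap + RAM * vm.overcommit_ratio/100, and it is only enforced when
# vm.overcommit_memory=2; check your system's policy with:
$ sysctl vm.overcommit_memory vm.overcommit_ratio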
</code></pre><p>On this 16GB RAM server, that gives <code>CommitLimit</code>=10252072kB ≈ <em>10GB</em>. 
Currently locked down, committed RAM is <code>Committed_AS</code>=806928kB ≈ <em>800MB</em>. This is memory dedicated to the core Linux operating system and its utilities. You might conclude that this OS as configured requires nearly 1GB to run at all, which is accurate.<p>On this server, starting the database correctly means I have to drop back to my user account to use <code>sudo</code>. You can easily give those powers to the <code>postgres</code> Linux account instead; it's just not necessary on my test system. The proper <em>systemd</em> call to stop and start the database on this server uses <code>systemctl</code>. Here are some alternate forms of startup lines you might need to use instead:<pre><code class=language-shell>gsmith@hydra:~$ sudo systemctl start postgresql@10-main
postgres@hydra:~$ service postgresql start
postgres@hydra:~$ pg_ctlcluster 10 main start # Debian/Ubuntu, PG10
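postgres@hydra:~$ pg_lsclusters # Debian/Ubuntu postgresql-common: list clusters and whether each is online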
</code></pre><p>Confirm the database just restarted:<pre><code class=language-shell>postgres@hydra:~$ ps -eaf | grep postgres
postgres  8022     1  0 06:40 ?        00:00:00 /usr/lib/postgresql/10/bin/postgres -D /var/lib/postgresql/10/main -c config_file=/etc/postgresql/10/main/postgresql.conf
postgres  8024  8022  0 06:40 ?        00:00:00 postgres: 10/main: checkpointer process
postgres  8025  8022  0 06:40 ?        00:00:00 postgres: 10/main: writer process
postgres  8026  8022  0 06:40 ?        00:00:00 postgres: 10/main: wal writer process
postgres  8027  8022  0 06:40 ?        00:00:00 postgres: 10/main: autovacuum launcher process
postgres  8028  8022  0 06:40 ?        00:00:00 postgres: 10/main: stats collector process
postgres  8029  8022  0 06:40 ?        00:00:00 postgres: 10/main: bgworker: logical replication launcher
postgres@hydra:~$ date
Sat May  1 06:42:45 EDT 2021
</code></pre><p>And check the biggest user of committed memory, <code>shared_buffers</code>:<pre><code class=language-shell>postgres@hydra:~$ psql -c "show shared_buffers"
 shared_buffers
----------------
 4GB
</code></pre><p>Now let's look at memory again:<pre><code class=language-shell>postgres@hydra:~$  cat /proc/meminfo | grep Commit
CommitLimit:    10252072 kB
Committed_AS:    5115160 kB
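# The growth over the stopped-server reading is the new shared memory block:
$ echo $((5115160 - 806928))
4308232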
</code></pre><p><code>Committed_AS</code> jumped to 5115160 kB ≈ <em>4.9GB</em>. Since it was 800MB before, that means the database server committed a new 4308232kB ≈ <em>4.1GB</em> on startup. That's the shared memory block, which includes <code>shared_buffers</code> plus some overhead for clients and other shared state.<h2 id=digging-into-the-memory><a href=#digging-into-the-memory>Digging into the memory</a></h2><p>You can see more about where the memory is going when using the <code>pmap</code> utility. While most of the bytes are <code>shared_buffers</code>, the bulk of the text output is links to various shared libraries. 
Here's a grep command that screens most of the trivia out:<pre><code class=language-shell>postgres@hydra:~$ pmap -x 8022 | egrep -v "anon|lib|ld|locale"
Address           Kbytes     RSS   Dirty Mode  Mapping
00005637cb721000    7012    3492       0 r-x-- postgres
00005637cbffa000     136     136     136 r---- postgres
00005637cc01c000      52      52      52 rw--- postgres
00007f919cb75000 4317408  108240  108240 rw-s- zero (deleted)
00007f92ae841000       8       4       4 rw-s- PostgreSQL.158420325
00007f92ae843000       4       4       4 rw-s-   [ shmid=0x48000 ]
00007ffec9539000     132      32      32 rw---   [ stack ]
---------------- ------- ------- -------
total kB         4492416  124396  110252
</code></pre><p>The key block is obviously this one:<pre><code class=language-txt>00007f919cb75000 4317408  108240  108240 rw-s- zero (deleted)
</code></pre><p>That shows 4317408k of zeroed-out buffer space holding shared_buffers, of which 108240k is currently nailed down in RAM, its <dfn>resident set size</dfn> (<abbr>RSS</abbr>). That RSS chunk is the overhead Postgres needs to run, things similar to what the old documentation put into the "Shared Memory Usage" table.<p>Most people find this information easier to track on a hot server using the <code>top</code> command. For Postgres <code>top -c</code> is recommended because it will decode what all the database processes are doing. <code>top</code> output from this server shows the big virtual memory block in the <code>VIRT</code> column:<pre><code class=language-txt>  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 8025 postgres  20   0 4492412  36020  33992 S   0.0  0.2   0:00.15 postgres: 10/main: writer process
 8026 postgres  20   0 4492412  21412  19388 S   0.0  0.1   0:00.42 postgres: 10/main: wal writer process
 8028 postgres  20   0  175124   4396   2260 S   0.0  0.0   0:00.26 postgres: 10/main: stats collector process
 8024 postgres  20   0 4493760  62308  59052 S   0.0  0.4   0:00.65 postgres: 10/main: checkpointer process
 8029 postgres  20   0 4492724   4984   2840 S   0.0  0.0   0:00.00 postgres: 10/main: bgworker: logical replication launcher
 8027 postgres  20   0 4492816   6800   4552 S   0.0  0.0   0:00.14 postgres: 10/main: autovacuum launcher process
</code></pre><p>Dealing with shared memory on modern PostgreSQL and Linux versions is far improved from the old days when you had to endlessly tweak kernel parameters just to make a database run. There is still another level of work to support <em>Huge Pages</em>, which I'll demonstrate next time.</p><style>mjx-container[jax=SVG]{direction:ltr}mjx-container[jax=SVG]>svg{overflow:visible;min-height:1px;min-width:1px}mjx-container[jax=SVG]>svg a{fill:blue;stroke:blue}mjx-container[jax=SVG][display=true]{display:block;text-align:center;margin:1em 0}mjx-container[jax=SVG][display=true][width=full]{display:flex}mjx-container[jax=SVG][justify=left]{text-align:left}mjx-container[jax=SVG][justify=right]{text-align:right}g[data-mml-node=merror]>g{fill:red;stroke:red}g[data-mml-node=merror]>rect[data-background]{fill:yellow;stroke:none}g[data-mml-node=mtable]>line[data-line],svg[data-table]>g>line[data-line]{stroke-width:70px;fill:none}g[data-mml-node=mtable]>rect[data-frame],svg[data-table]>g>rect[data-frame]{stroke-width:70px;fill:none}g[data-mml-node=mtable]>.mjx-dashed,svg[data-table]>g>.mjx-dashed{stroke-dasharray:140}g[data-mml-node=mtable]>.mjx-dotted,svg[data-table]>g>.mjx-dotted{stroke-linecap:round;stroke-dasharray:0,140}g[data-mml-node=mtable]>g>svg{overflow:visible}[jax=SVG] mjx-tool{display:inline-block;position:relative;width:0;height:0}[jax=SVG] mjx-tool>mjx-tip{position:absolute;top:0;left:0}mjx-tool>mjx-tip{display:inline-block;padding:.2em;border:1px solid #888;font-size:70%;background-color:#f8f8f8;color:#000;box-shadow:2px 2px 5px #aaa}g[data-mml-node=maction][data-toggle]{cursor:pointer}mjx-status{display:block;position:fixed;left:1em;bottom:1em;min-width:25%;padding:.2em .4em;border:1px solid #888;font-size:90%;background-color:#f8f8f8;color:#000}foreignObject[data-mjx-xml]{font-family:initial;line-height:normal;overflow:visible}mjx-container[jax=SVG] path[data-c],mjx-container[jax=SVG] use[data-c]{stroke-width:3}</style> ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Greg.Smith@crunchydata.com (Greg Smith) ]]></author>
<dc:creator><![CDATA[ Greg Smith ]]></dc:creator>
<guid isPermaLink="false">https://blog.crunchydata.com/blog/postgresql-on-linux-counting-committed-memory</guid>
<pubDate>Fri, 11 Jun 2021 05:00:00 EDT</pubDate>
<dc:date>2021-06-11T09:00:00.000Z</dc:date>
<atom:updated>2021-06-11T09:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ PostgreSQL 13 Benchmark: Memory Speed vs. TPS ]]></title>
<link>https://www.crunchydata.com/blog/postgresql-13-benchmark-memory-speed-vs.-tps</link>
<description><![CDATA[ Today I'm changing the memory speed on my main test system, going from 2133MHz to 3200MHz, and measuring how that impacts PostgreSQL SELECT results. I'm seeing a 3% gain on this server, but as always with databases that's only on a narrow set of in-memory use cases. ]]></description>
<content:encoded><![CDATA[ <p>Some people are obsessed with sports or cars. I follow <a href=/blog/postgresql-benchmarks-apple-arm-m1-macbook-pro-2020>computer hardware</a>. The PC industry has overclocking instead of nitrous, plexi cases instead of chrome, and RGB lighting as its spinning wheels.<p>The core challenge I enjoy is cascading small improvements to see if I can move a bottleneck. The individual improvements are often just a few percent. Percentage gains can compound as you chain them together.<p>Today I'm changing the memory speed on my main test system, going from 2133MHz to 3200MHz, and measuring how that impacts PostgreSQL SELECT results. I'm seeing a 3% gain on this server, but as always with databases that's only on a narrow set of in-memory use cases. Preview:<p><img alt=scaling-sets loading=lazy src=https://f.hubspotusercontent00.net/hubfs/2283855/scaling-sets.png><h2 id=why-more-benchmarks><a href=#why-more-benchmarks>Why more benchmarks?</a></h2><p>The industry around PC gaming has countless performance tests at the micro and macro level. A question I took on this year is how to take some useful metrics or test approaches from PC benchmarking and apply them even to virtual database instances.<p>There was a big constraint: I could only use SQL. As much as I enjoy the tinkering side of real hardware, a lot of the customers we support at Crunchy Data are provided a virtual database instance instead of a dedicated server. For PostgreSQL, I call these "Port 5432" installs, because the only access to the server is a connection to the database's standard port number. Disk seek test? You can't run <code>fio</code> or <code>iozone</code> on port 5432. Memory speed? There's no <code>STREAM</code> or <code>Aida</code> on 5432. You can tunnel system calls through PostgreSQL's many server-side languages. 
That only goes so far when the software in each database container is shrunk to a minimum viable installation.<p>The improvements I've put into <a href=https://github.com/gregs1104/pgbench-tools>pgbench-tools</a> this year let me chug through an entire grid of client/size workloads, and my <a href=/blog/postgresql-13-upgrade-and-performance-check-on-ubuntu>last blog</a> went over upgrading to PostgreSQL 13 on this AMD Ryzen 9 3950X server. Part of what I'm doing here today is proving to myself the toolkit is good enough to measure a small gain, given that pgbench itself is not the most consistent benchmark.<h2 id=memory-tweaking-theory><a href=#memory-tweaking-theory>Memory tweaking theory</a></h2><p>On a lot of server installs tuning memory is something only the hardware vendor ever does. To respect that, my initial PG13 comparisons left memory at its platform default speed: 2133MHz. The memory I'm using, G.SKILL F4-3600C19-16GVRB, can in theory run at 3600MHz.<p>Most desktop-class motherboards have 4 RAM slots and run fastest when only two are used. Effectively this G.SKILL pair can bond into a dual-channel configuration at 3600MHz. But the minute I try to fill all four slots, that speed is impossible. Performance doesn't scale up to quad channels; instead you get dual channels that are each split across two DIMMs. Juggling that adds just enough latency that the motherboard and CPU can't run at the maximum dual-channel speed anymore. Fully loaded with RAM, the best I can do on this hardware is running memory at 3200MHz. There's a similar but even worse trade-off buying big servers, as the buffering needed to handle very large amounts of RAM adds enough latency to pull down single core results.<p>Pounding a server with pgbench generates enough heat and random conditions that I rejected outright overclocking some years ago. 
I once lost my entire winter holiday chasing a once per day PG crash on my benchmark server, all from the CPU overheating just enough to <a href=https://postgrespro.com/list/thread-id/1813321>flip one bit</a>.<h2 id=quantifying-speed-improvements><a href=#quantifying-speed-improvements>Quantifying speed improvements</a></h2><p>The graph above shows an even increase in speed across all the sizes tested (up to 256GB=4X RAM), which is a nice start. Looking instead at the client count gives a different pattern:<p><img alt=clients-sets loading=lazy src=https://f.hubspotusercontent00.net/hubfs/2283855/clients-sets.png><p>There is a clear trend that high client counts are getting more of a boost from the faster memory than low ones. That is exactly what you'd expect and hope for. More clients means more pressure to move memory around, and anything you can do to accelerate that helps proportionately.<p><code>pgbench-tools</code> puts all the results in a database, so I can write simple SQL to analyze the workload grid:<pre><code class=language-pgsql>SELECT set,clients,ROUND(AVG(tps)) FROM test_stats WHERE set>10
GROUP BY set,clients ORDER BY set,clients
\crosstabview
</code></pre><table><thead><tr><th>clients<th>1<th>2<th>4<th>8<th>16<th>32<th>64<th>128<tbody><tr><td>2133<td>8354<td>16508<td>34366<td>64859<td>106305<td>165437<td>196002<td>219186<tr><td>3000<td>8293<td>16574<td>34267<td>65612<td>107867<td>169757<td>202697<td>231303<tr><td>3200<td>8459<td>16954<td>35221<td>67392<td>109688<td>170635<td>203348<td>232132</table><p>To make these examples cleaner, in the first column I replaced the set identifier number with the actual speed in MHz. The 128 client results are notably better. At 1 client the run-to-run variation noise was bigger than the regression, showing the bizarre result that 3000MHz memory ran slower than 2133MHz. I can make problems like that go away by running a lot more tests until the averages settle down; that didn't seem necessary here. I have a follow-up article coming where I dig into single client speeds more carefully.<p>I also like to look at the maximum rate any test runs at. Averages can hide changes to a distribution. You can't fake legitimately running faster than ever before. Considering only the best out of the runs that fit into each summary cell, which normally is the scale=100 1.6GB result, gives:<pre><code class=language-pgsql>SELECT set,clients,max(tps) FROM test_stats WHERE set>10
GROUP BY set,clients ORDER BY set,clients
\crosstabview
</code></pre><table><thead><tr><th>clients<th>1<th>2<th>4<th>8<th>16<th>32<th>64<th>128<tbody><tr><td>2133<td>22319<td>41535<td>81216<td>149678<td>233058<td>377269<td>347553<td>367855<tr><td>3000<td>22361<td>42217<td>83650<td>153980<td>234985<td>379450<td>348910<td>369827<tr><td>3200<td>23213<td>42806<td>84694<td>157181<td>237486<td>386854<td>352344<td>375003</table><p>Re-scaling to percentages and eliminating the 3000MHz middle step:<table><thead><tr><th>2133-3200<th>1<th>2<th>4<th>8<th>16<th>32<th>64<th>128<th>Median<tbody><tr><td>Avg TPS<td>+1.3%<td>+2.7%<td>+2.5%<td>+3.9%<td>+3.2%<td>+3.1%<td>+3.7%<td>+5.9%<td>+3.2%<tr><td>Max TPS<td>+4.0%<td>+3.1%<td>+4.3%<td>+5.0%<td>+1.9%<td>+2.5%<td>+1.4%<td>+1.9%<td>+2.8%</table><p>Since increasing memory speed gives a 2.8-3.2% gain overall depending on how you slice the results, I'm happy to call that a solid 3% gain across the grid. Light client counts gain the least, with a low at 1 client of only a 1.3% average gain. When overloaded with a full 128 clients, average throughput increased by up to 5.9%.<p>If you'd like to read another perspective on this topic, Puget Systems has a nice article on <a href=https://www.pugetsystems.com/labs/articles/RealityCapture-CPU-Performance-AMD-Ryzen-9-3950X-1607/>CPU Performance: AMD Ryzen 9 3950X</a>. They find a similarly sized gain to what I measured here, and their commentary about larger memory capacity is in line with my comments above. ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Greg.Smith@crunchydata.com (Greg Smith) ]]></author>
<dc:creator><![CDATA[ Greg Smith ]]></dc:creator>
<guid isPermaLink="false">https://blog.crunchydata.com/blog/postgresql-13-benchmark-memory-speed-vs.-tps</guid>
<pubDate>Wed, 30 Dec 2020 04:00:00 EST</pubDate>
<dc:date>2020-12-30T09:00:00.000Z</dc:date>
<atom:updated>2020-12-30T09:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ PostgreSQL Benchmarks: Apple ARM M1 MacBook Pro 2020 ]]></title>
<link>https://www.crunchydata.com/blog/postgresql-benchmarks-apple-arm-m1-macbook-pro-2020</link>
<description><![CDATA[ This week Apple started delivering Macs using their own Apple Silicon chips, starting with a Mac SoC named the M1. M1 uses the ARM instruction set and claims some amazing acceleration for media workloads. I wanted to know how it would do running PostgreSQL, an app that's been running on various ARM systems for years. The results are great! ]]></description>
<content:encoded><![CDATA[ <p>This week Apple started delivering Macs using their own Apple Silicon chips, starting with a Mac SoC named the M1. M1 uses the ARM instruction set and claims some amazing acceleration for media workloads. I wanted to know how it would do running PostgreSQL, an app that's been running on various ARM systems for years. The results are great!<p>The OSS community around the Homebrew project already qualified their PostgreSQL package as working on M1, and with some recompiling work that all worked as expected:<pre><code class=language-shell>$ /opt/homebrew/bin/psql -c "select version()"
PostgreSQL 13.0 on arm-apple-darwin20.1.0, compiled by
Apple clang version 12.0.0 (clang-1200.0.32.28), 64-bit
</code></pre><p>I need some additional software for my benchmark toolkit, and the only compile problems I saw were with Qt and Python's numpy; both looked straightforward to fix once someone gets to them.<p>My last <a href=/blog/postgresql-benchmarks-apple-intel-macbook-pro-2011-2019>blog entry</a> introduced my basic method of using <a href=https://github.com/gregs1104/pgbench-tools>pgbench-tools</a> to look at past MacBook Pro models. I said there Apple needed to exceed "15K TPS single/60K TPS all core" on PostgreSQL to fully embarrass Intel. Well, they outperformed expectations:<p><img alt=MacYearly-M1 loading=lazy src=https://f.hubspotusercontent00.net/hubfs/2283855/MacYearly-M1.png><p>32K single/92K all core is so fast for a laptop that I need to pull in some other hardware to put it into perspective. Here's a data table for all the results behind the graph, plus two generations of AMD's Ryzen desktop hardware:<table><thead><tr><th>server<th>1<th>2<th>4<th>8<th>16<th>32<tbody><tr><td>2011 16GB MacBookPro8,2<td>7252<td>14644<td>20471<td>30749<td>32894<td>32647<tr><td>2012 16GB MacBookPro9,1<td>7781<td>15861<td>22380<td>34743<td>38294<td>36754<tr><td>2015 16GB MacBookPro11,4<td>9770<td>17795<td>22372<td>38341<td>45048<td>43497<tr><td>2017 16GB Intel NUC7i3BNB<td>12789<td>20870<td>33649<td>31053<td>32029<td>32409<tr><td>2019 16GB MacBookPro16,1<td>14353<td>27588<td>43784<td>45089<td>61603<td>58705<tr><td>2019 64GB MacBookPro16,1<td>14105<td>28733<td>46836<td>61167<td>62083<td>69101<tr><td>2019 16GB Intel NUC10i5FNB<td>15444<td>27496<td>43341<td>70015<td>61927<td>62584<tr><td>2020 8GB MacBookPro17,1<td>32198<td>52828<td>96536<td>97042<td>95130<td>92663<tr><td>2018 64GB Ryzen 7 2700X<td>11624<td>22153<td>41648<td>69399<td>138431<td>123466<tr><td>2019 64GB Ryzen 9 3950X<td>37768<td>69162<td>133943<td>206684<td>258722<td>306185</table><p>This graph is amazing to me:<p><img alt=MacAMD loading=lazy src=https://f.hubspotusercontent00.net/hubfs/2283855/MacAMD.png>Of 
course Intel has Xeon processors that have pushed single-core performance higher than these laptop-oriented Intel results. But look at that big cluster below 5 clients, showing how long they've been stuck in the same performance range when power and heat are limited. I mentioned last time that Intel had only doubled performance across the 8 years of MacBook models I looked at, which is not industry-leading progress.<p>AMD has been doing a lot better, getting their single-core boost competitive in their 3000 series. Even last year's 3950X with its mandatory water cooling is barely faster than the M1 until you hit 8 clients.<p>If Apple can push the M1 design into larger amounts of memory and add a few more cores, it could be a fierce midsize server competitor. That's not going to disrupt the big industry push toward hosting things on giant cloud systems, where data centers want >=48 processors for a server to be worth installing. There are cloud-scale ARM servers out there, and Apple's ARM instruction set Macs make developing for that platform easier. I'm looking forward to the competition of a four-way race between Intel, AMD, Apple, and the other ARM designers.<p>The M1 is a great step forward for developers who can take advantage of it. Let's hope the obvious virtualization issues are sorted out in the near future. A lot of developers need tools like Docker and VMs to build modern cloud software. Until then, the M1 Macs aren't suitable for everyone. 
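To put the comparison in concrete terms, here's a quick sketch of the speedup math using the single-client and best-observed TPS numbers from the results table above (the figures are copied from the table; the script itself is illustrative, not part of the pgbench-tools toolchain):

```python
# TPS figures copied from the results table above: single-client (1) and the
# best value observed at any client count. The comparison math is a sketch.
intel_2019 = {"single": 14353, "peak": 61603}   # 2019 16GB MacBookPro16,1
apple_m1 = {"single": 32198, "peak": 97042}     # 2020 8GB MacBookPro17,1

for metric in ("single", "peak"):
    ratio = apple_m1[metric] / intel_2019[metric]
    print(f"{metric}: M1 is {ratio:.2f}x the 2019 Intel MacBook Pro")
```

Roughly a 2.2x single-client and 1.6x best-case improvement over the 2019 Intel model, consistent with the "32K single/92K all core" summary above.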
Make sure you understand your requirements and what's supported before you consider buying one.<p>To make this article complete, here's the detailed list of the hardware I tested for this tour of benchmark results, and you can drill into detail about the systems I have here by digging into my <a href=https://browser.geekbench.com/user/gregs1104>Geekbench Profile</a>.<table><thead><tr><th>System<th>CPU Model<th>CPUs<tbody><tr><td>2011 16GB MacBookPro8,2<td>Intel i7-2860QM CPU @ 2.50GHz<td>8<tr><td>2012 16GB MacBookPro9,1<td>Intel i7-3615QM CPU @ 2.30GHz<td>8<tr><td>2015 16GB MacBookPro11,4<td>Intel i7-4770HQ CPU @ 2.20GHz<td>8<tr><td>2017 16GB Intel NUC7i3BNB<td>Intel i3-7100U CPU @ 2.40GHz<td>4<tr><td>2019 16GB MacBookPro16,1<td>Intel i7-9750H CPU @ 2.60GHz<td>12<tr><td>2019 64GB MacBookPro16,1<td>Intel i9-9980HK CPU @ 2.40GHz<td>16<tr><td>2019 16GB Intel NUC10i5FNB<td>Intel i5-10210U CPU @ 1.60GHz<td>8<tr><td>2020 8GB MacBookPro17,1<td>Apple M1<td>8<tr><td>2018 64GB Ryzen X470<td>AMD Ryzen 7 2700X<td>16<tr><td>2019 64GB Ryzen X570<td>AMD Ryzen 9 3950X<td>32</table> ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Greg.Smith@crunchydata.com (Greg Smith) ]]></author>
<dc:creator><![CDATA[ Greg Smith ]]></dc:creator>
<guid isPermaLink="false">https://blog.crunchydata.com/blog/postgresql-benchmarks-apple-arm-m1-macbook-pro-2020</guid>
<pubDate>Fri, 20 Nov 2020 04:00:00 EST</pubDate>
<dc:date>2020-11-20T09:00:00.000Z</dc:date>
<atom:updated>2020-11-20T09:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ PostgreSQL Benchmarks: Apple Intel MacBook Pro, 2011-2019 ]]></title>
<link>https://www.crunchydata.com/blog/postgresql-benchmarks-apple-intel-macbook-pro-2011-2019</link>
<description><![CDATA[ Apple's Intel-based laptops are very popular among developers, and that's as true of people who work on PostgreSQL as of any other group. Tomorrow, the first shipping Apple laptops running on ARM CPUs instead of Intel are expected. ]]></description>
<content:encoded><![CDATA[ <p>Apple's Intel-based laptops are very popular among developers, and that's as true of people who work on PostgreSQL as of any other group. Tomorrow, the first shipping Apple laptops running on ARM CPUs instead of Intel are expected. That is likely to include at least a 13" MacBook Pro. I decided to prepare for that with a survey of PostgreSQL performance on my small herd of Apple laptops. Mine are all the 15" or newer 16" models.<p>Crunchy Data has already started digging into PostgreSQL on ARM performance as part of <a href=https://crunchybridge.com/>Crunchy Bridge</a>, such as <a href=https://www.postgresql.org/message-id/CACN56%2BP1astF5zvocrT7--Mu2dQWFS0eQ31xNmX%3Db%3D98y9fMSw%40mail.gmail.com>microbenchmarking on AWS Graviton 2</a>. The OSS community around the Mac Homebrew tools seems <a href=https://github.com/Homebrew/brew/issues/7857>ready for the ARM transition</a> too. I'm hopeful that with some work, the new Apple ARM hardware can be as performant running Postgres as the Intel chips they replace. My results here say that ideally, Apple Silicon would hit 15K TPS single/60K TPS all core, or at least get close. Who wants to make an over/under bet?<p>I benchmarked them all with a consistent toolchain and method, using my <a href=https://github.com/gregs1104/pgbench-tools>pgbench-tools</a> software. That runs lots of PostgreSQL performance tests at various database sizes and client counts. For these MacBook CPU tests, the most useful ones for general CPU performance used single-row "point" SELECT statements against a 1.6GB database, which is 100 on pgbench's size scale factor. Performance on that specific benchmark hasn't changed a lot (for laptop-sized workloads) in the last few versions of PostgreSQL. 
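As a rule of thumb, pgbench's scale factor translates into rows and bytes predictably; here's a minimal sketch of that arithmetic (the ~16MB-per-unit size constant is an approximation back-derived from the 1.6GB figure here, not an official pgbench number):

```python
# pgbench sizing rule of thumb: each scale-factor unit adds 100,000 rows to
# pgbench_accounts. The ~16 MB/unit on-disk estimate is an approximation
# inferred from the 1.6GB-at-scale-100 figure above, not pgbench output.
ROWS_PER_SCALE = 100_000
MB_PER_SCALE = 16

def pgbench_size(scale):
    """Return (pgbench_accounts rows, approximate on-disk size in GB)."""
    return scale * ROWS_PER_SCALE, scale * MB_PER_SCALE / 1024

rows, gb = pgbench_size(100)
print(f"scale=100: {rows:,} rows, ~{gb:.1f} GB")
```

At scale 100 the whole dataset fits comfortably in these machines' RAM, which is what keeps this test CPU-bound rather than I/O-bound.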
Some of the results on older systems used PG11; most used PG12.<p>Here's the hardware I tested for this tour of benchmark results, and you can drill into detail about the systems I have here by digging into my <a href=https://browser.geekbench.com/user/gregs1104>Geekbench Profile</a>.<table><thead><tr><th>System<th>CPU Model<th>CPUs<tbody><tr><td>2011 16GB MacBookPro8,2<td>Intel i7-2860QM CPU @ 2.50GHz<td>8<tr><td>2012 16GB MacBookPro9,1<td>Intel i7-3615QM CPU @ 2.30GHz<td>8<tr><td>2015 16GB MacBookPro11,4<td>Intel i7-4770HQ CPU @ 2.20GHz<td>8<tr><td>2017 16GB Intel NUC7i3BNB<td>Intel i3-7100U CPU @ 2.40GHz<td>4<tr><td>2019 16GB MacBookPro16,1<td>Intel i7-9750H CPU @ 2.60GHz<td>12<tr><td>2019 64GB MacBookPro16,1<td>Intel i9-9980HK CPU @ 2.40GHz<td>16<tr><td>2019 16GB Intel NUC10i5FNB<td>Intel i5-10210U CPU @ 1.60GHz<td>8</table><p>Since I stayed away from laptops with Apple's butterfly keyboard, there's a gap in my data around 2017. Intel had its own problems during this period too. I did pick up a cheap Intel 7th generation i3 CPU NUC in 2017, so I substituted that onto the chart. Its single-core performance fell midway between the 2015 and 2019 MacBook models I had. The NUC hardware isn't a proper laptop chip, but the thermal limits of the form factor make it perform more like a laptop CPU than a desktop one. I also added a 2019 10th gen i5 NUC for comparison. 
The two NUCs are running Ubuntu Linux instead of Mac OS.<p><img alt="CPU comparison" loading=lazy src=https://f.hubspotusercontent00.net/hubfs/2283855/CPU%20comparison.png><p>I like data tables as much as charts, and parts of the story here are easier to see that way:<table><thead><tr><th>server<th>1<th>2<th>4<th>8<th>16<th>32<tbody><tr><td>2011 16GB MacBookPro8,2<td>7252<td>14644<td>20471<td>30749<td>32894<td>32647<tr><td>2012 16GB MacBookPro9,1<td>7781<td>15861<td>22380<td>34743<td>38294<td>36754<tr><td>2015 16GB MacBookPro11,4<td>9770<td>17795<td>22372<td>38341<td>45048<td>43497<tr><td>2017 16GB Intel NUC7i3BNB<td>12789<td>20870<td>33649<td>31053<td>32029<td>32409<tr><td>2019 16GB MacBookPro16,1<td>14353<td>27588<td>43784<td>45089<td>61603<td>58705<tr><td>2019 64GB MacBookPro16,1<td>14105<td>28733<td>46836<td>61167<td>62083<td>69101<tr><td>2019 16GB Intel NUC10i5FNB<td>15444<td>27496<td>43341<td>70015<td>61927<td>62584</table><p>It's nice to see that both single-core and multi-core results have doubled during this 8-year stretch. ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Greg.Smith@crunchydata.com (Greg Smith) ]]></author>
<dc:creator><![CDATA[ Greg Smith ]]></dc:creator>
<guid isPermaLink="false">https://blog.crunchydata.com/blog/postgresql-benchmarks-apple-intel-macbook-pro-2011-2019</guid>
<pubDate>Mon, 09 Nov 2020 04:00:00 EST</pubDate>
<dc:date>2020-11-09T09:00:00.000Z</dc:date>
<atom:updated>2020-11-09T09:00:00.000Z</atom:updated></item></channel></rss>