Thomas Munro



Tags:   postgresql    microsoft    freebsd    new-zealand    committer    scheme   
Category:   Interviews   
Interviewed by: Andreas Scherbaum

PostgreSQL is the World’s most advanced Open Source Relational Database. The interview series “PostgreSQL Person of the Week” presents the people who make the project what it is today. Read all interviews here.

Please tell us about yourself, and where you are from.

I’m from New Zealand, and I live in Wellington, the tiny capital city. I’m a PostgreSQL developer and committer, and I’m employed by Microsoft to do that as part of the Citus Data group.

Thomas Munro

Thomas Munro

How do you spend your free time? What are your hobbies?

Family, hacking code, cycling, fondly remembering the time when we were allowed to travel.

Any Social Media channels of yours we should be aware of?

How would your ideal weekend look like?

A decent bike ride in good weather with friends, maybe visiting a new city, finishing up at an outdoor restaurant.

When did you start using PostgreSQL, and why?

A couple of decades ago, some friends and I ran a travel blogging website as a side-project. It started life as a beige PC running Linux/Apache/MySQL (gasp!) under the stairs at a friend’s house, because he worked at an ISP and had the bandwidth required coming into his house. Eventually we owned a bunch of chunky loud servers in a rented rack at a hosting facility (before the cloud was a thing). We wrote our own code to render maps from GIS data; it was a lot of fun!

One day we had a huge spike in popularity due to media coverage and learned the hard way that MySQL (in the time of MyISAM, ancient history now) couldn’t handle concurrent readers and writers, and table-level locking caused disastrous performance cliffs for our workload.

So, we migrated the system to PostgreSQL. It was slower at individual queries than MyISAM, but that didn’t matter, it allowed the concurrency we needed. It also had foreign key constraints and other nice features. That old website of ours is long gone – it didn’t survive the arrival of Big Social Media and much better map technologies – but my interest in PostgreSQL and database internals dates from those days of running (what we quaintly thought of as) high load systems on a shoe-string budget.

Do you remember which version of PostgreSQL you started with?

I don’t recall the exact version but it was 7.something. You needed cron jobs to run VACUUM. I vaguely recall installing and playing with a 6.x release too, before the WAL was added, but I didn’t use it seriously. It used to flush dirty pages from the entire buffer pool to disk twice on every commit.

My first patch wasn’t until much, much later: SKIP LOCKED in 9.5.

I did some undergraduate courses in languages, linguistics and computer science in the mid 90s, but I didn’t stick around to complete a degree. I met a lot of smart people there, and I was lucky enough to get a part-time student job writing factory automation software. I’d already tinkered … a lot … with a Sinclair 8 bit computer and later an Amiga as a kid, but that job gave me access to Unix systems, compilers, databases, networks, books, and most importantly people who could pass on some elements of what I think of as the open Unix culture. It was a different time – the dot com boom was happening, there were a lot of opportunities if you knew how to code and my friends and I were itching to get out and see the world. With hindsight, if I’d stuck it out at university I’d have graduated into the dot com crash, so I regret-but-also-don’t-regret that youthful impatience.

What other databases are you using? Which one is your favorite?

Having worked with all the major commercial relational databases as an application developer in data-heavy industries like fintech over a couple of decades, I am under no illusions about the fact that each of them can beat PostgreSQL in some respect or other. But PostgreSQL is my favourite… it belongs to all of us and we have the power to keep making it better!

Right at the moment I’m trying to make crash recovery and streaming replication faster. I also have a project to try to tame the query executor’s memory usage.

How do you contribute to PostgreSQL?

Writing code, fixing bugs, discussing ideas with other contributors, reviewing and committing patches.

Any contributions to PostgreSQL which do not involve writing code?

One thing I tinker on is cfbot.cputube.org, which tries to bridge the gap between our traditional mailing list-based development culture and the modern pull-request-with-continuous-integration workflow style. I hope it’s useful as a way to start reviewing patches: look for something that’s currently not passing CI, see if you can figure out why, and while you’re there, have a bit of a look around the patch and see what else you can find to comment on, from language to logic to usability. Before you know it, you have a review you can post on the mailing list.

What is your favorite PostgreSQL extension?

Probably PostGIS. It’s a testament to the extensibility that POSTGRES’s founders designed at Berkeley decades ago. Even today, I doubt there are many other systems that could have a full featured GIS system as an externally developed plug-in module.

There’s another extension that I’d really love to see: PL/Scheme, to write procedures in the beautiful Scheme programming language. Someone wrote it years ago based on Guile, but it wasn’t maintained and no longer works. I managed to revive it partially a couple of years ago. I’d like to come back to that one day soon.

What is the most annoying PostgreSQL thing you can think of? And any chance to fix it?

The worst thing about PostgreSQL is the need to freeze (rewrite) data due to transaction ID wraparound. Currently I think the best way forward is to adopt 64 bit transaction IDs in enough places so that we don’t need to freeze committed transactions anymore. Some small initial steps have been taken and I think we’ll get there. Doing the same for aborted transactions will require some more complicated machinery.

What is the feature you like most in the latest PostgreSQL version?

PostgreSQL 13 has a new space efficient representation of duplicate keys in btrees. Thanks to Anastasia Lubennikova and Peter Geoghegan.

Adding to that, what feature/mechanism would you like to see in PostgreSQL? And why?

Thinking big: One day I hope we’re multithreaded, running exactly one thread per CPU core and executing queries compiled to cooperative machine code coroutines. Yeah, I know, it’s not much to ask for.

Could you describe your PostgreSQL development toolbox?

I use old school Unix development tools. I do most of my editing in Emacs (real programming) or Vim (quick changes), and I primarily work on FreeBSD and Debian GNU/Linux operating systems with the system compilers, Clang and GCC respectively. As for other tools, GDB, LLDB, perf, dtrace, etc, and I tend to write throw-away scripts in Python. I sometimes use continuous integration systems to get my development branches tested on other OSes.

Which skills are a must have for a PostgreSQL developer/user?

Developing any software requires a blend of curiosity and stubborn persistence. Because it’s open source and consensus-driven, developers also need a lot of energy for communication, openness to others’ ideas and patience.

Which PostgreSQL conferences do you visit? Do you submit talks?

PGCon in Ottawa, PostgreSQL Down Under in Melbourne/Sydney, and occasionally others. I live far away from … well, everywhere, but it’s always well worth the trip to meet all the people you’ve interacted with in the online community to find out what they have cooking, and meet new people too. I usually submit talks about areas I’ve been working on. I collect them here.

Do you think Postgres has a high entry barrier?

No! You can get it running within minutes through your favourite package manager. You can find drivers for any language in minutes. You can pick up any book on SQL or PostgreSQL, or transfer your knowledge from any other SQL database pretty easily. You can ask questions on the psql-general mailing list, the IRC channel, the Slack channel, and you’ll usually find people willing to help you with any problem.

What is your advice for people who want to start PostgreSQL developing - as in, contributing to the project. Where and how should they start?

Create a community account on postgresql.org, join the pgsql-hackers mailing list, choose a patch from commitfest.postgresql.org, try it out, and write about your experiences with it. If you need help figuring out how to patch and compile and run PostgreSQL, don’t be afraid to ask. Or send in a new patch, or an idea for a patch, of your own. Many good first projects consist not of picking something from any preexisting list of ideas, but rather solving a real world problem you or a customer ran into: perhaps something that’s hard to migrate from another system, or that doesn’t work well for your use case, that sort of thing, and you fix it. It’s pretty likely you’re not the only one who’s run into that problem, but you can be the one to fix it! Or perhaps you have some much loftier project in mind, such as a major new feature. In that case, get talking to other developers about it, it’s highly likely that others have ideas and possibly even relevant active projects or earlier stalled work that could help you on your way. Many successful projects involve smaller features that are developed as stepping stones that are useful in their own right, but open the way to bigger features.

Do you think PostgreSQL will be here for many years in the future?

Absolutely. There is no company to go out of business and stop making it or take it away from you, and there is a whole community of people and companies incentivised to keep making it better.

Would you recommend Postgres for business, or for side projects?

Yes!

Are you reading the -hackers mailinglist? Any other list?

I follow many (but definitely not all) threads on pgsql-hackers, and occasionally pgsql-general. I also try to monitor pgsql-bugs for things that I can help with.

What other places do you hang out?

Sometimes I hang out in #postgresql on the Freenode IRC network as macdice.

Which other Open Source projects are you involved or interested in?

I am a minor contributor and committer on the FreeBSD operating system. Most of my contributions to user space and kernel code there are related in some way to making PostgreSQL run better. Here’s some more information about that.