PostgreSQL is the World’s most advanced Open Source Relational Database. The interview series “PostgreSQL Person of the Week” presents the people who make the project what it is today. Read all interviews here.
Please tell us about yourself, your hobbies and where you are from.
I’m a middle-aged guy from the blessed west coast of Canada, where the sky and the mountains meet the sea. Raising a family takes up most of my time, but I’ve managed to carve out personal time for fitness recently, and exploring the area on my bike. I grew up in the interior of British Columbia, but have lived in Victoria, on Vancouver Island, for most of my adult life.
Any Social Media channels of yours we should be aware of?
When did you start using PostgreSQL, and why?
I became a PostgreSQL user in the late 90’s, for no fault of my own.
I was working on a consulting project where we were calculating land use statistics for every watershed in British Columbia. Things like percentage logged area, length of roads build, number of stream crossings, amount of steep slopes – numbers that can be used to quantify environmental health.
Over every watershed, and over a hundred metrics, that added up to a lot of data, and a lot of data processing. It took months to process the whole province.
So that we could easily stop and restart processing, and apply multiple workstations to the problem, we stored all the interim and final data in a database – PostgreSQL.
Do you remember which version of PostgreSQL you started with?
PostgreSQL 6.3. I didn’t choose PostgreSQL, the project technical lead did, but I came to love it as the first relational database server I got competent in. Because we were basically using PostgreSQL as a processing queue manager, transactional integrity was an important feature, so MySQL wasn’t going to cut it.
Have you studied at a university? If yes, was it related to computers? Did your study help you with your current job?
I did a BSc in Mathematics and an MSc in statistics, but only did a handful of computer science courses. Frankly, I didn’t find any of it very interesting. For my graduate work in statistics, I did a lot of work in SPlus, which was the language cloned by R, but still very much as a user.
I finally got turned on by computing when I installed Linux on my home computer.
My wife was doing a GIS program and wanted to experiment with GRASS GIS, an open source GIS released by the US federal government. So we needed to install “that Linux thing” first, to then install GRASS.
And installing “that Linux thing” turned out to be both incredibly fiddly and surprisingly engaging, like solving a big mental jigsaw puzzle. I remember staying up until 2AM getting X windows running for the first time, juggling “modelines” entries. (I managed to avoid blowing up my monitor.)
From there, building my own software, writing scripts in Perl, all seemed to flow naturally. The power that Unix systems puts in the hands of users was intoxicating, and always a contrast to the guardrails and limitations I ran into trying to solve problems with Windows and proprietary software. So I kept on going.
What other databases are you using? Which one is your favorite?
For various purposes in my consulting career I’ve used Oracle and MySQL. I find I can tolerate modern versions of MySQL. Oracle is always laughably terrible to use, though I can understand the appeal to people who bill by the hour.
On which PostgreSQL-related projects are you currently working?
I’m most known for my work on the PostGIS project, a “geospatial database” extension to PostgreSQL. PostGIS provides geometry, geography and raster types for PostGIS, along with all the supporting index and function bindings you need to make the types useful. I like to tell GIS people that PostGIS gives them “GIS without the GIS”, because the functionality is rich enough to solve any spatial analysis problem in SQL.
I also have a bunch of small side projects, small extensions that bind existing libraries to PostgreSQL that people might find useful:
- pgsql-http binds libcurl, so that it’s possible to make HTTP requests in PostgreSQL. It provides functions like http_get() and http_post().
- pgsql-gzip binds zlib, which is useful for people using pgsql-http, since sometimes HTTP payloads are encoded with gzip compression. It provides functions like gzip() and gunzip().
- pgsql-ogr-fdw binds gdal as a PostgreSQL foreign data wrapper (FDW). The OGR subsystem of GDAL exposes dozens of geospatial formats (Oracle Spatial, Excel, SQL Server, FGDB, GPKG and more) with an API that maps nicely to tables. It’s easily my most complex side project, and it’s used a surprising amount by people connecting to spreadsheets and other odd formats OGR supports.
How do you contribute to PostgreSQL?
I contribute small patches to PostgreSQL in places where my needs as an extension author run into limitations of the core. My two biggest patches to date were:
- A patch that allowed extension operators to be pushed down over FDW connections. For PostGIS, this means you could run a spatial query on the local system and have the spatial filters pushed down to the remote.
- A patch that allowed compressed TOASTed objects to be partially decompressed when reading a slice from the object. For PostGIS, this allows us to read the header off of a large spatial object without reading and decompressing the whole thing first. It also incidentally sped up the core left() and substr() functions substantially for large text strings.
Any contributions to PostgreSQL which do not involve writing code?
What is your favorite PostgreSQL extension?
Aside from PostGIS, I have gotten the most regular use from fuzzystrmatch. So many practical problems involve fuzzy string matching.
What is the most annoying PostgreSQL thing you can think of? And any chance to fix it?
The way the versioning of extensions intersects with packaging is incredibly frustrating. Extensions have to be built against particular versions of PostgreSQL, so there’s a version dependency there. But at the same time, there’s no rule that says one extension version has to be tied to a particular PostgreSQL version, and nor should there be: PostGIS 2.5 works against PgSQL 10, 11 and 12.
But the extension system includes an extension.control file which has no version information on it. So packagers put it into all the versioned packages of extensions. Which means two versions cannot coexist, because they both contain copies of the extension.control file.
In general, the cases of “upgrade from this version of this server and this version of an extension to a new server and new extension” are not well tested and lead to problems.
What is the feature you like most in the latest PostgreSQL version?
We got a huge win from a completely ignored feature that Tom Lane added in PostgreSQL 12 called the “support functions API”. It lets you add magic around functions to do things like dynamically re-cost functions, or rewrite the plan around functions.
It’s quite deep magic, but for practical purposes it allowed us to index-enable and re-cost a bunch of spatial functions in ways that have made spatial queries much more amenable to parallel plans.
So with PostgreSQL 12 and PostGIS 3, spatial users finally have full access to the parallel execution features that other users have been enjoying since PostgreSQL 10.
Which skills are a must have for a PostgreSQL developer/user?
The number one skill is “try it and see”. How does this function interact with this data, with this transactional state? Try it and see!
For a developer, the patience to read the code carefully is huge. It’s a huge code base, and there’s almost always something you can use already there, avoid wheel reinvention.
Do you use any git best practices, which makes working with PostgreSQL easier?
Nope, I only use bad git practices.
Which PostgreSQL conferences do you visit? Do you submit talks?
PgCon is an essential conference for anyone interested in core development or high-end production use, to understand the internals better.
PgConf.EU is a great conference with a more user focus.
Do you think Postgres has a high entry barrier?
There’s not great materials for folks with no SQL background to get started, nor is there nicely graduated learning material for folks learning the more complex aspects of SQL. Our core site seems to operate on the assumption that people already know standard SQL, and concentrate more on teaching them the things that make PostgreSQL unique as a SQL database. This is fine for folks migrating from Oracle or MySQL but it leaves the field open for folks choosing between “NoSQL database X and SQL database Y”. The barriers could be lower, for sure.
What is your advice for people who want to start PostgreSQL developing - as in, contributing to the project. Where and how should they start?
Have a reason to want to contribute. A practical problem is far more likely to result in contributions that have practical upsides, and thus will be accepted.
Do you think PostgreSQL will be here for many years in the future?
In a world where unemployment systems are still running on hand-cranked COBOL, there’s no doubt that a reliable data store like PostgreSQL will be around as long as there is electricity to run it.
Would you recommend Postgres for business, or for side projects?
I’d recommend it for both. One transition I’m waiting to see is the movement of systems of record onto PostgreSQL. Maybe I’ll be waiting a long time, because big organisations don’t change databases frequently – maybe every 25 years or so. While I am glad to see businesses and governments using PostgreSQL in production more and more, I still see lots of web-first projects, and not so many core critical systems, so that’s the tipping point I’m waiting for – when systems of record get migrated.
Are you reading the -hackers mailinglist? Any other list?
What other places do you hang out?
Which other Open Source projects are you involved or interested in?
Crunchy Data has incubated a couple spatial web services projects, pg_tileserv and pg_featureserv, that I’ve had a hand in. They are small projects meant to make building web mapping applications on PostgreSQL/PostGIS easier.
Also, the core computational geometry algorithms in PostGIS come from a library, GEOS, which I’m active in maintaining and developing. GEOS has gotten a lot faster and more stable over the past couple releases, and those improvements are showing up in PostGIS now.