Josh Berkus



Tags:   postgresql    oregon    red hat    kubernetes    k8s    pottery    food    cooking    core    emeritus   
Category:   Interviews   
Interview conducted by: Andreas Scherbaum

PostgreSQL is the World’s most advanced Open Source Relational Database. The interview series “PostgreSQL Person of the Week” presents the people who make the project what it is today. Read all interviews here.

Please tell us about yourself, and where you are from.

I’m from America. I mean, seriously, I grew up in DC, Ohio, Florida, Texas, and California, and now I live in Oregon. So really all over the USA.

How do you spend your free time? What are your hobbies?

Well, first, I’m a potter. I make a wide variety of functional ceramics, including mugs, bowls, and stuff with slugs and cats on them. I used to make mugs and cups with Slonik on them; quite a few folks in the community have one of these, since I gave them out to conference speakers. You can find my pottery on fuzzychef.com. And because these things happen, I also run the website for the Oregon Pottery Association.

Second, I cook a lot. I had two years’ experience as a pastry chef, and ever since have been into cooking every way I could. I particularly like learning regional cooking from all around the world, including Greek, Georgian, Turkish, Moroccan, Portuguese, Ukrainian, Shanghainese, and many others. I’ve made use of my conference travel to find out about, and buy, ingredients from everywhere. You can find more of my cooking on fuzzychef.org.

Any Social Media channels of yours we should be aware of?

Last book you read? Or a book you want to recommend to readers?

Just finished “We Could Be Heroes” by Mike Chen; it was a fun read that turns superhero tropes upside-down. In non-fiction, I’m currently reading “American Cheese”, which is a fun dive into the zany world of competition cheesemongering.

What’s still on your bucket list?

Easy snap-on lids for 2 gallon buckets. For some reason they haven’t been available since the start of the pandemic. I use a lot of these for pottery.

When did you start using PostgreSQL, and why?

So Bruce Momjian and I started using PostgreSQL for the same reason: we both worked on Microsoft SQL Server for our day jobs, and hated how broken it was, and wanted a database that we could fix. In fact, I found out much later that Bruce actually worked on a support ticket for me when he was working for a vendor whose SQL-Server-based software I was deploying at a client. SQL Server was Sybase 6.5; Microsoft bought the old version off Sybase for cheap, slapped a new GUI on it, and didn’t fix any of the bugs.

This led me to PostgreSQL in 1998. Back then, MySQL didn’t even have joins, so I couldn’t consider it a real database. But PostgreSQL needed a lot of work just to install. This led to a fateful weekend in late 1998, when I got two different email messages that kinda set the course of the next 18 years for me. I reported a bug in an MS SQL forum, and got threatened by Microsoft for reporting “false information”.

The next day, a Sunday, I reported a SQL parsing bug on a PostgreSQL mailing list. Tom Lane, who I didn’t realize had joined the project only months before, answered my bug report with “You’re right, that’s broken. Try this patch”. I thanked him and then had to find out what a patch was and what you did with it.

I think after that it was pretty inevitable that more and more of my consulting practice centered around PostgreSQL, until in 2004 I stopped supporting MS SQL entirely.

Do you remember which version of PostgreSQL you started with?

6.5

I have a bachelor’s degree in fine art, from Pitzer College. So I’m one of the many people who, in the ’90s, taught themselves programming and computers because of the tech boom. I paid for my art degree partly by working in the university computer lab. I just didn’t see it as a career at the time, because when I went to school, CS was a focus in the math department.

What really helped me with working in open source, though, were the three years I was a fundraiser for the San Francisco Opera, and the year I spent as a labor organizer for the United Farm Workers. Fundraising, labor organizing, and running an open source project have a lot in common – they’re all about motivating people.

What other databases are you using? Which one is your favorite?

I still love PostgreSQL, and if I don’t need something else specific, that’s the one I run (usually in a container these days). Interesting other databases I like lately have included:

  • CitusDB (which I helped launch)
  • Etcd, because if you want unkillable redundancy for really simple data, it’s hard to beat.
  • Redis, because if you actually make full use of the core database, you can see what a “proto-relational” database might be like, and there are some cool uses of it.
  • Kafka, because I love streaming databases and I don’t feel like they’ve gotten enough attention or adoption (sob PipelineDB & TelegraphCQ). Streaming DBs really cover a bunch of common use cases that are poorly covered by current storage-oriented databases, but it seems really hard to get folks to understand that.
  • CockroachDB is my other sad story; it was the near-perfect fusion of PostgreSQL and Etcd for me, but then they made it not open source and I can’t use it anymore.

You’re Core Emeritus, and left the Core Team in 2016. Why did you leave, and what have you been doing since?

I joined Core in 2003 and had a really good 13 years there. One of my goals when I joined was to put PostgreSQL in the top three or four commercial databases by adoption and respect, and our community did that over the next decade (although how it actually unfolded was a bit surprising).

Given that, by 2016 my best work on behalf of PostgreSQL was behind me. It was time to step down and provide opportunities for others, who still had unimplemented new ideas, to lead.

At the same time, I had become involved in the Kubernetes community, and was spending an increasing amount of time on it. This didn’t jibe with my consulting work, because database users were slow to adopt the new technology (for some very good reasons). Red Hat offered me a position that let me work on Kubernetes and related technologies full-time, and it seemed like the right time to make the change. I wasn’t the first PostgreSQL advocacy leader, or the last.

So, I stepped down and other people like Jonathan Katz and Claire Giordano stepped up. It’s how open source is supposed to work.

Really, nothing anymore. Even Patroni is now in the hands of other people who have a lot more time to work on it than I do. Which is awesome! It’s a great day when folks being paid to improve a project on the clock can take it over.

Now, if we add new database enablement tooling to Kubernetes, I’ll probably come back to doing PostgreSQL-on-Kubernetes stuff. I’ve found it works well if I create some prototypes and then the commercial providers pick them up and make them production-ready. But nothing’s planned right now.

What is your favorite PostgreSQL lesser-known feature?

I don’t think most developers appreciate what a great tool PostgreSQL is for CSV processing. I throw it into automated workflows all the time as a way to clean up and transform CSV data. It’s both easier and more powerful than client-side code.

What is the most annoying PostgreSQL thing you can think of? And any chance to fix it?

We still have over 200 configuration variables. I tried for years to reduce the number, largely without success. I’m hoping that someone else will succeed where I didn’t.

What is the feature you like most in the latest PostgreSQL version?

I’m gonna say being able to limit WAL retention for Replication Slots. This eliminates a whole set of failure cases for automated management of Postgres replication.

Adding to that, what feature/mechanism would you like to see in PostgreSQL? And why?

Consensus replication, or some other form of multi-writer replication, would be the killer feature. It would let PostgreSQL take over the use-cases of several other databases, and it would make packaging and managing cloud-native PostgreSQL so much easier. Patroni has several thousand lines of code related to managing “who is the current primary” that could go away.

Do you think Postgres has a high entry barrier?

I think Postgres has a medium entry barrier, which is one of the things that’s helped it succeed. No, it’s not as easy as Redis, but it’s not as hard as Cassandra or Oracle.

Do you think PostgreSQL will be here for many years in the future?

Oh, yes. Postgres has been incorporated, either as a database, or as code, into so many other projects and products now that it’s guaranteed to be around for at least another 10 years even if all development stopped. And I have faith that the PostgreSQL community will continue to innovate, making it relevant well beyond that.