Jeff Davis



Tags: postgresql, microsoft, citus, committer, california, range-types, exclusion-constraints, checksums, hashagg-disk
Category: Interviews
Interview conducted by: Andreas Scherbaum

PostgreSQL is the World’s most advanced Open Source Relational Database. The interview series “PostgreSQL Person of the Week” presents the people who make the project what it is today.

Please tell us about yourself, and where you are from.

I’m married, and I have two children and a labrador. I grew up in a suburb in central California.

How do you spend your free time? What are your hobbies?

I’m learning Spanish, and I have some sailing lessons planned for next month. My dog is active enough that I count her as a hobby, too.

Last book you read? Or a book you want to recommend to readers?

The last book I read was The Catcher in the Rye by J.D. Salinger. I’m reading a biography right now: The Rise of Theodore Roosevelt by Edmund Morris. Next up is either The Trial by Kafka or Antifragile by Nassim Taleb.

For database books, I recommend An Introduction to Database Systems by C. J. Date. It has shaped my expectations of what’s possible. The author’s other books spend too much time criticizing SQL, though, which detracts from some of his insights.

Any favorite movie, or show?

MacGyver. Perhaps that’s why my patches resemble duct tape and paper clips in some places.

What does your ideal weekend look like?

Hike on Saturday with my family, and a small gathering with a few people in the evening. On Sunday, just cooking, eating, and movies.

What’s still on your bucket list?

Go to Peru (and a few other places in Latin America), and travel more in the middle of the U.S.

When did you start using PostgreSQL, and why?

At my first job when I was 16, I was trying out MySQL for a web application. My boss said “try out PostgreSQL, it’s more free” (note that this was before MySQL was GPL). I used both for quite a while, but Postgres was much more pleasant to use.

How did you get hooked on PostgreSQL?

I was hooked when I sent my first few emails to the Postgres mailing lists. I was new to open source, and obviously very inexperienced. When I sent a bug report, I expected something along the lines of “thank you, we will look into that,” but what I saw was an immediate and frank discussion of the problem, followed by a fix.

I don’t think that was very common at the time, and it really made me feel like a part of things rather than an outsider. A big thanks to Tom, Bruce, and Jan.

Do you remember which version of PostgreSQL you started with?

6.5.3. I filed my first bug report against 7.0.0.

I studied Electrical Engineering at UCSD, but by the time I was in university I was already sure I’d be a software engineer. I had already been programming for a couple of years, and had already been using Postgres.

Columnar Compression, which is part of the Citus extension. It allows the user to select some tables to be stored as “columnar”, which is much more compact and also faster to scan, but slower for single-tuple operations.

What is your favorite PostgreSQL extension?

Citus ;-) It’s a scale-out Postgres solution in a pure extension (meaning that it works on standard Postgres; it’s not a fork), which also includes complementary functionality like columnar compression.

What is the most annoying PostgreSQL thing you can think of? And any chance to fix it?

Postgres is still very intertwined with the filesystem, and it causes a variety of major and minor annoyances. For instance, different files have different caching and checksums, and achieve durability and atomicity in different ways. Also, something like unlogged tables makes it hard to tell whether the data was reset or not.

Not likely to be fixed soon, but I think it will improve.

What is the feature you like most in the latest PostgreSQL version?

Disk-based hash aggregation, which I authored. I got a lot of great help and it turned out better than I imagined. It protects users against runaway memory usage for certain kinds of queries, and enables faster execution in some cases as well.
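
As a rough illustration of the general idea (a toy sketch, not PostgreSQL's executor code), a hash aggregate can cap the number of groups it keeps in memory and write rows for any further groups to temporary files, which are then aggregated in later passes. The function name `hash_aggregate_sum`, its parameters, and the use of `pickle` for spill files are all assumptions made for this example.

```python
import pickle
import tempfile


def _read_spilled(f):
    """Yield the (key, value) rows previously written to a spill file."""
    f.seek(0)
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def hash_aggregate_sum(rows, max_groups, num_partitions=4, level=0):
    """Yield (key, sum) pairs, holding at most max_groups groups in memory per pass.

    Hypothetical helper for illustration only.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    groups = {}        # in-memory hash table for this pass
    spill_files = {}   # partition number -> temporary file of spilled rows
    for key, value in rows:
        if key in groups:
            groups[key] += value
        elif len(groups) < max_groups:
            groups[key] = value
        else:
            # Table is full: route the row to a partition file instead of growing.
            part = hash((level, key)) % num_partitions
            f = spill_files.get(part)
            if f is None:
                f = spill_files[part] = tempfile.TemporaryFile()
            pickle.dump((key, value), f)
    # Groups that fit in memory are complete, because every row of a spilled
    # key went to disk and never touched the in-memory table.
    yield from groups.items()
    groups.clear()
    # Each partition holds a disjoint set of keys, so it can be aggregated alone.
    for f in spill_files.values():
        with f:
            yield from hash_aggregate_sum(
                _read_spilled(f), max_groups, num_partitions, level + 1
            )


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("d", 6), ("c", 7)]
totals = dict(hash_aggregate_sum(rows, max_groups=2))
# totals == {"a": 4, "b": 7, "c": 11, "d": 6}
```

With `max_groups=2`, only `a` and `b` are aggregated in memory on the first pass. Rows for `c` and `d` are spilled and summed in a later pass, so memory use stays bounded regardless of how many distinct keys appear.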

Adding to that, what feature/mechanism would you like to see in PostgreSQL? And why?

A more flexible type system. It would enable new type system features like generics and algebraic types, which are popular in programming languages for good reasons. I think these type system features are even more important for databases to make schemas feel more flexible without losing the structure.

Postgres has been a leader of advanced types: JSON, Range Types, etc. But I think we could do much more if we could freely combine types at query time rather than requiring explicit entries in pg_type for each one. For example, support for arrays means that each type has two entries in pg_type: one for the base type, and one for the array type. That doesn’t scale well, and it introduces a lot of implementation ugliness.
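
One way to picture the difference, as a toy sketch only: in a catalog-style design every usable type, including each array type, is a separate named entry, whereas a compositional design builds types from constructors on demand. The classes `Base`, `Array`, and `Range` and the helper `type_name` are hypothetical names for this example and do not reflect how PostgreSQL represents types internally.

```python
from dataclasses import dataclass

# Catalog style: each array type needs its own entry next to the base type.
# In pg_type, the array type for int4 is named "_int4".
catalog = {"int4", "_int4", "text", "_text"}


# Compositional style: types are values built from a few constructors.
@dataclass(frozen=True)
class Base:
    name: str


@dataclass(frozen=True)
class Array:
    element: object  # any type expression


@dataclass(frozen=True)
class Range:
    subtype: object  # any type expression


def type_name(t):
    """Render a type expression as a readable string."""
    if isinstance(t, Base):
        return t.name
    if isinstance(t, Array):
        return type_name(t.element) + "[]"
    if isinstance(t, Range):
        return "range<" + type_name(t.subtype) + ">"
    raise TypeError(f"not a type expression: {t!r}")


# An array of ranges over int4, combined on the spot with no catalog entry.
combined = Array(Range(Base("int4")))
assert type_name(combined) == "range<int4>[]"
```

In the compositional version, nesting a new combination costs nothing up front, while the catalog version would need another registered entry for each one.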

Could you describe your PostgreSQL development toolbox?

Ubuntu, emacs, git, gcc, gdb, llvm, lldb.

Which PostgreSQL conferences do you visit? Do you submit talks?

PGCon almost always. It’s also good to visit academic conferences or other open source conferences occasionally.

Do you think Postgres has a high entry barrier?

I think we’re at the point where SQL itself is the biggest barrier to entry for PostgreSQL.

What is your advice for people who want to start developing PostgreSQL, as in, contributing to the project? Where and how should they start?

Be persistent, but have an open mind. It often just takes a long time to go from an idea to an accepted patch, and it often changes form in the process.

Do you think PostgreSQL will be here for many years in the future?

Yes. Postgres is the most adaptable database system, and has always responded well to the inevitably-changing world. The rising interest in JSON and flexible schemas was seen as an existential threat to the relational world at first, but Postgres was able to adapt very quickly and successfully.

Postgres’s rapid pace of development and very powerful extension APIs make me think it will continue to adapt quickly in the future.

For what purposes would you recommend Postgres?

Postgres is a great default choice whenever you need a database. It’s general-purpose enough to solve a large variety of cases out-of-the-box, while its extensibility also serves many special cases well. It’s also pleasant to use.

I am skeptical of special-purpose database systems unless they integrate really well with general-purpose systems. Special-purpose systems are necessary sometimes, but your use case will almost always grow into areas where they are terrible. Data is more valuable when combined with other data, so there will be constant pressure for your use case to grow and merge with other use cases – don’t expect to stay within the sweet spot of a special-purpose database system.

There are other good general-purpose systems besides Postgres, but that decision is more complex. I’m biased strongly towards Postgres, so to avoid too much advocacy, I’ll leave it at: “Postgres is a great default”.

Which other Open Source projects are you involved or interested in?

Citus, Rust.