PostgreSQL is the World’s most advanced Open Source Relational Database. The interview series “PostgreSQL Person of the Week” presents the people who make the project what it is today. Read all interviews here.
LinkedIn, Twitter, and blogs
Tell us about your present job, and how it relates to PostgreSQL
I came to PostgreSQL only relatively recently, in the spring of 2019, when I left my job in the PL/SQL Team at Oracle HQ to join Yugabyte Inc. This is an exciting Silicon Valley startup, one of whose founders had been a close colleague of mine in the PL/SQL team. My first blog post in my new job, Why I Moved from Oracle to YugaByte, explains how I was easily persuaded to make this change after close to thirty years with Oracle.
Really — you had no exposure to PostgreSQL before 2019?
Yes, that’s true. There, I said it.
What does Yugabyte do?
In short, it developed, and continues to develop and maintain, YugabyteDB — an open source, distributed PostgreSQL database.
Tell us more
PostgreSQL users, especially in the present Cloud Era, often say that they want horizontal scalability. Well, YugabyteDB does exactly this — and it does it in the only sensible way that it could be done. It uses the actual open source PostgreSQL “upper half” C code layer for SQL processing. And it replaces PostgreSQL’s “lower half” storage layer with Yugabyte-authored code that brings intrinsic scalability with global distribution, and intrinsic fault tolerance. It uses a distributed transactional scheme to replicate every persisted datum (using an arbitrarily large replication factor).
Critically, with respect to the ethos of PostgreSQL, the entirety of YugabyteDB is fully open source (using the Apache 2.0 model). The code is here: github.com/yugabyte/yugabyte-db. Of course, Yugabyte Inc has a scheme to make money. This is based, as you’d expect, on licencable management software and on a fully managed cloud service.
There’s a nice two-part overview of YugabyteDB’s architectural scheme on our blog site. Start with Distributed PostgreSQL on a Google Spanner Architecture — Storage Layer. It links to Part Two about the SQL processing layer.
What’s your role at Yugabyte?
Early on in the life of Yugabyte Inc, my colleagues had to make a key decision about the user documentation for the SQL and PL/pgSQL functionality, as the application developer needs to understand this: (1) should it be simply a pointer to the PostgreSQL documentation? Or (2) should the YugabyteDB documentation provide its own version of this. They chose the second option. The PostgreSQL documentation serves as the definitional authority. But it is famously terse; and code examples are sparse, to say the least. The YugabyteDB documentation aims to complement this by providing detailed accounts of the semantics and by providing many rich code examples and use-case illustrations.
My job, together with others, is to write this documentation.
Do you enjoy writing documentation like this?
Yes, very much. It’s often said that the best way to learn something deeply is to explain it to others. And I’ve certainly acquired a pretty deep understanding of what I’ve written about — for example, window functions, aggregate functions, the “with” clause (and especially the recursive variant for graph traversal), arrays, JSON, and the story of the date-time data types and interval arithmetic. Naturally, then, when I’m not writing English prose, I’m designing code illustrations. And to this end, I use psql with vanilla PostgreSQL, and Yugabyte’s psql clone with YugabyteDB. I have to ensure that the developer experience of the syntax and semantics in these two environments is indistinguishable. And indeed it is.
What version of PostgreSQL is YugabyteDB built on?
Right now, YugabyteDB, Version 2 series, uses the Version 11.2 PostgreSQL code. The plan, of course, is to update presently to a current version. However, for my tests with vanilla PostgreSQL, I always make sure that I use the current version. It seems that very many of the improvements between 11.2 and the current version have to do with the “lower half” storage layer and so these have no consequence for YugabyteDB.
Do you use an IDE?
No I don’t — at least not for SQL and PL/pgSQL code. I realize that this makes me sound like a luddite, but my particular focus is to design the minimum, self-contained, code examples that will bring the teaching benefit that the current context of explanation needs. So I rarely have the problems that IDEs solve (like “Where is this element defined and used?” across a large interconnected code base). However, I do need to work with plain text files that follow several different conventions in addition to those of SQL and PL/plSQL (for example, and especially, Markdown source). That’s why I prefer to use my favorite generic plain text editor.
I have written a fair amount of Python code to orchestrate the kinds of tests to which psql is not well oriented. For example, when I need to investigate and explain the business of transaction isolation levels, many points are best made using two concurrent sessions that interleave possibly contending operations in a well-controlled mutually interleaving order. Python supports this effortlessly while psql simply is of no help here. I do use the PyCharm IDE when I work with Python code and I appreciate most of its benefits. I have to say, though, that I find the visual distraction brought by coloring different kinds of code element differently to be net detrimental.
Do you use Git?
I said earlier that the source code of YugabyteDB is managed here: github.com/yugabyte/yugabyte-db. This repository includes the various source artifacts (Markdown source, images, a scheme for creating SQL syntax diagrams, and so on) that the generator for our documentation site uses. So, in that sense, I use Git all the time. But, before you ask, I use it in order to survive and I don’t have any tips to pass on except the blindingly obvious: don’t simply rely on a cookbook of obscure magic spells but, rather, do your diligent best to acquire, and to continue to enrich, a robust mental model of how it all works. This is gold-dust when (as can happen, especially late in the evening) you get an obscure “Computer says no” error.
What’s the best advice you’ve ever been given?
“Don’t worry. This will pass. They’re not going to fire you — so just stick to your guns and say no.”
How do you feel about PostgreSQL?
What is your favorite PostgreSQL extension?
At the risk of sounding boring: “pgcrypto” so that I can use the random number generator, “gen_random_uuid()”; and “tablefunc” so that I can use “normal_rand()” to generate normally distributed random numbers to support various kinds of test. I’m surprised that these basic features aren’t part of the native functionality.
What annoys you most in PostgreSQL?
Given my many years with Oracle Database’s PL/SQL, you won’t be surprised by my answer. It’s that PL/pgSQL doesn’t support inner subprograms or packages. The lack of these critical notions for building modular, reusable, software comes close to defeating the use of PL/pgSQL for programming in the large.
I’m also frustrated by the fact that, while plsql’s metacommand “\i” has its “\ir” counterpart that uses a relative path, its “\o” metacommand has no such counterpart. This makes it very hard to compose bigger test scripts from smaller components (with each component in its own directory subtree). The “\copy” metacommand suffers from the identical problem (no relative path notion for either reading or writing).
What features would you like to see in a future PostgreSQL version?
I’d like support for those missing relative path notions in psql that I just mentioned. I’d also welcome a psql scheme for declaring strongly typed client-side variables that you can bind to placeholders in SQL statements — just like Oracle’s SQL*Plus has had since the dawn of time.
But far more critically, I’d of course like to see support for inner subprograms and packages in PL/pgSQL. I’d also like to see richer notions for transaction control in stored procedures — including, and especially, autonomous transactions. You need these to commit incident reports into tables without interfering with the application’s overall scheme for transaction control. This, by the way, is very easy, and is therefore common practice, in Oracle Database.
You can see that others bemoan this lack because Internet search turns up hits like “PostgreSQL does not explicitly support Autonomous Transactions. We recommend that refactoring is done to eliminate the requirement of using Autonomous Transactions. If that is not possible, then the workaround is to use the PostgreSQL dblink.” That advice isn’t viable for the use-case that I just sketched.
Do you think that PostgreSQL will be here for many years in the future?
I sincerely hope so. Yugabyte Inc pays my salary — and they’ve bet their farm on it!
Would you recommend Postgres for business or for side projects?
Well, PostgreSQL implements a robust, multi-user, transactional scheme that brings a reliable database of record for use cases where concurrent activity in the absence of such a scheme has the potential to bring wrong results. This sounds far more like a business use case than a side project.
Having said this, If I’m interrupted while composing a SQL code example by a phone call that needs me to do some quick formula evaluation, then I can get the answer quicker using an ad hoc “select” statement than by switching context and opening a spreadsheet application. As an extension of this, if I have some data in a spreadsheet that I want to analyze in the way that a PostgreSQL window function supports, then given that I already know how to go about it, it’s remarkably easy to get the data into a table (using psql’s “\copy” metacommand) and then to get my answer by typing SQL — especially with the help of a few subqueries inside a “with” clause. But I’d hardly recommend learning PostgreSQL just so that you can support that kind of ad hoc requirement!
Do you think that Postgres has a high entry barrier?
That’s a bit like asking “Does learning to play the guitar have a high barrier?”. There’s no getting away from native aptitude (or the lack thereof) for certain kinds of learning challenges. The main challenge a complete novice faces is learning SQL itself — in other words, learning to think declaratively using set theory and formal logic. I’m thinking about application design and implementation rather than administering the ongoing health of a deployed production system. I won’t comment on things from the administrator viewpoint.
Which skills are a must have for a PostgreSQL developer/user?
Well, beyond an ineffable native aptitude for the art and craft of programming (in the most general sense), it’s the same as for most things in life: readiness to study, to learn from colleagues, to experiment, and to recognize that to err is human.
How do you engage with the Postgres community?
Which PostgreSQL conferences do you visit? Do you submit talks?
Well… I was accepted to give two papers at the big New York conference in 2020. But that event was a covid casualty. (I did one of the talks as part of the online series that took the cancelled conference’s place.) Pre-covid, I did speak at a few in-person Postgres meetups and webinars. And I have two talks scheduled for September and October 2021 in the Postgres Conference Webinar Series.
I’ve also spotted and reported a few PostgreSQL bugs.
Do you read the @lists.postgresql.org channels like pgsql-general, pgsql-hackers, and their cousins?
I don’t read them regularly in the way that one reads a daily newspaper. But I do find them useful when I have questions — especially of the kind “Is the test case that I show you here evidence of a bug or of my faulty mental model?” I’ve received both answers over time!
Tell us about your earlier working life
Where did you work before Yugabyte, and when did you enter the database field?
It’s the same answer for both questions.
Relatively late in my working life, and not long after my return from working in Norway (see below) I found myself back in the UK, living very close to Windsor Castle and looking for a new job. I soon found out that Oracle Corporation had its UK HQ not far away and I sent them a speculative “Got a job for me?” letter. It turned out that they had — and so I joined their CASE development team (who remembers CASE?) in January 1990 working in UI design. That was fun, but I missed programming and so I worked for a while in the text retrieval development team. This was pre-Internet, and of course long before Google and the like. “Text retrieval” meant a purpose-built bolt-on to the basic Oracle Database to manage inverted indexes and a pretty complex API to execute text queries using a special extension to SQL. On the strength of that experience, I switched into Oracle’s consulting outfit and spent a happy time as the EMEA Text expert, traveling around for short-duration gigs supporting field engineers with pre- and post-sales activities. That got me a reputation (as a useful nuisance) with the Text team at Oracle HQ and I was offered the chance to join them and to relocate to the San Francisco Bay Area. Who would turn down such a chance? Not me — and I’ve been living in San Francisco since the mid-nineties.
I had a few different roles and after a while settled into a niche that suited me very well. I was the product manager for PL/SQL and then, later, also for edition-based redefinition. (I was also seconded for a while to work as the product manager for Oracle Multitenant during the period before it was released for general use.)
Did you have a working life before databases?
Yes, after my MPhil, I trained as a school teacher and then taught physics and mathematics for three years to teenagers in a couple of schools on the south coast of England. But I realized that I missed programming and found a job at the University of Oxford implementing algorithms, using FORTRAN, in the general field of image processing and pattern recognition. Back then, it was a new and exciting field. But the state-of-the-art was way behind what everybody takes for granted now. That work took me to a job at the Norwegian Computing Center in Oslo and then to a start-up (which failed).
My six years in Oslo were critical for my software engineering education. If you’re lucky enough to work with peers who have vast experience, imagination, patience, and a passion for explaining things then you can receive a much better education than on any formal training course — and such was my good fortune. The theory of object-oriented programming, and the world’s first language that embodied these ideas, Simula, were invented at the Norwegian Computing Center. (Kristen Nygaard’s office was just a few doors down the corridor from mine.) I had several years writing Simula, apprenticed to world-class experts in the field.
Where did you grow up and where were you educated?
I spent all of my childhood in Birmingham UK. Then I went to the University of Manchester. My BSc degree was called simply Physics. But many of the classes were Mathematics. At that time, the word “software” wasn’t in general use. People simply said “programming” and I learned to program in a long-forgotten language called Atlas Autocode — designed to run on the University’s one and only computer. After Manchester, I did a so-called MPhil in experimental physics at the University of Sussex (close to Brighton on the coast south of London). There I learned the tools of the trade for building bespoke apparatus: precision metal working (lathes, milling machines and the like) and, believe it or not, plumbing. I’ll say no more. But it set me up well for a lifetime of do-it-yourself home renovation projects. I also had to write lots of programs, in FORTRAN on punched cards, using 3D geometry to process the data that my experiments churned out on paper tape. Things were so very different back then!
How do you spend your free time?
What are your hobbies?
Pre-covid, I used to do a long bike ride every weekend (over the Golden Gate Bridge from home and on into Marin and beyond). I just couldn’t do that wearing a mask. But I’m looking forward to kick-re-starting that hobby any day now. I was working enthusiastically to comprehend and speak, and to read and write, standardized modern Chinese — a.k.a. 普通话。Lockdown made that fall beside the wayside too…
Would you like to recommend any books to readers?
“The Sense of Style” by Stephen Pinker. “Applied Mathematics for Database Professionals” by Toon Koppelaars. Both are ripping good yarns. I highly recommend each of them.
Any favorite movie, or show?
TV shows: Lilyhammer [sic]. Wallander — both with Rolf Lassgård and with Krister Henriksson. The Nordic Noire takes me back to my Nordic Period. Movies: “Pathfinder” — a Norwegian production from 1987 set in the far north of Scandinavia about a thousand years ago. The actors were all Sámi people; and the dialog was entirely in their language.
How would you spend your ideal weekend?
Biking. Chinese lesson. Entertaining friends at home. I’m very much hoping that all this will be coming back soon!
What’s still on your bucket list?
What I just said about my ideal weekend. And some international travel to see family and friends in Europe and China.