PostgreSQL is the World’s most advanced Open Source Relational Database. The interview series “PostgreSQL Person of the Week” presents the people who make the project what it is today. Read all interviews here.
Please tell us about yourself, your hobbies and where you are from.
I am a developer working on Greenplum at VMware. I live in the Bay Area. In my spare time, I enjoy coding side projects, taking computer science courses, running, traveling, and baking. I am an adventurous eater and love trying new foods.
Any Social Media channels of yours we should be aware of?
- Twitter - I don’t post much, though
- My fork of Postgres
- Current branch I am most actively developing on (as of time of writing)
When did you start using PostgreSQL, and why?
I started using PostgreSQL in tools I built while working in data management consulting at PwC. I wrote a C-language extension of the core format() function which could interpolate rows and hstores. PostgreSQL had all of the features I was looking for and was extensible. I liked that it was open source so that I could debug problems I was having. I got more involved with Postgres development after joining Pivotal as a software engineer working on Greenplum, an MPP fork of Postgres.
Do you remember which version of PostgreSQL you started with?
I think I started with Postgres 9.1 or 9.2.
Have you studied at a university? If yes, was it related to computers? Did your study help you with your current job?
I studied literary critical theory and film in college. I did take some computer science courses and was interested in programming. I think that my background in media studies gives me an unusual perspective. I learned to think about complex systems in a creative way that differs from traditional systems programming models.
What other databases are you using? Which one is your favorite?
On which PostgreSQL-related projects are you currently working?
I’ve been hacking on a feature called adaptive hashjoin. It is a fallback strategy for disk-based hashjoin. When the inner side of a multi-batch hashjoin is particularly skewed, instead of exceeding work_mem, hashjoin will divide the offending batch’s inner side into “stripes” and load them into the hashtable one at a time, probe the hashtable, then reset it and repeat until that inner side batch is exhausted. It ends up being a kind of block nested hash-loop join.
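The stripe-by-stripe fallback described above can be sketched in a few lines of Python. This is only an illustration of the block nested hash-loop idea, not the Postgres implementation. The function name `join_batch_in_stripes`, the `max_rows_in_memory` parameter (standing in for work_mem, counted in rows rather than bytes), and the in-memory lists used in place of batch files on disk are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import islice


def join_batch_in_stripes(inner_rows, outer_rows, key, max_rows_in_memory):
    """Inner-join one batch, loading the inner side one stripe at a time.

    Hypothetical sketch: max_rows_in_memory stands in for work_mem, and
    the row lists stand in for the batch files a real executor would read.
    outer_rows must be re-iterable (e.g. a list), since it is scanned
    once per stripe.
    """
    if max_rows_in_memory < 1:
        raise ValueError("max_rows_in_memory must be at least 1")

    results = []
    inner_iter = iter(inner_rows)
    while True:
        # Load the next stripe of the inner side into a fresh hash table.
        stripe = list(islice(inner_iter, max_rows_in_memory))
        if not stripe:
            break  # the inner side of this batch is exhausted
        hashtable = defaultdict(list)
        for row in stripe:
            hashtable[key(row)].append(row)

        # Probe the hash table with the whole outer side of the batch.
        for outer in outer_rows:
            for inner in hashtable.get(key(outer), ()):
                results.append((inner, outer))
        # The hash table is dropped here (the "reset") before the next stripe.
    return results


# Example: a skewed inner side (three rows share key 1) and room for two rows.
inner = [(1, "a"), (1, "b"), (1, "c"), (2, "d")]
outer = [(1, "x"), (2, "y"), (3, "z")]
pairs = join_batch_in_stripes(inner, outer, key=lambda r: r[0], max_rows_in_memory=2)
```

The sketch covers inner joins only. An outer join would also need to track which outer rows found a match across all stripes, which is left out here.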
I’m currently changing the logic for falling back on the serial version, which is on my fork of Postgres on a branch. This is the diff with master.
How do you contribute to PostgreSQL?
I contribute to PostgreSQL through patch review and patch authorship. I have reviewed a patch in most of the commitfests for the last few years.
If I don’t have much time that month, I usually sign up to review a small patch and try it out (check that it works as advertised and that it passes regression and TAP tests). If I don’t have time to review the code itself, I call that out in my review so that another reviewer will sign up to do the code review.
If I have more time, I do code review.
If I have a lot of time and am interested in a particular patch, I’ll either reach out to the author and see if there is anything I can do to help or offer on the thread to pick up a task that seems like it needs to get done.
In the past, I have helped with performance testing, writing sections of the feature (such as column projection for disk-based hash aggregation), and small tasks like rebasing the patch and updating the patchset so it passes tests on the commitfest patch tester.
Not everything I helped with ended up being used in the final version but participating has taught me a lot.
Developing Greenplum, we sometimes encounter bugs or inefficiencies in Postgres code and report them. Often, we will develop and propose a patch to fix them.
Any contributions to PostgreSQL which do not involve writing code?
I have spoken at PGCon and PostgresOpen on hacking the Postgres planner. I’m speaking at virtual PGCon this year with Jeff Davis on work_mem and the Postgres query executor.
I have attended PUG meetups. I planned and hosted a patch review meetup for the South Bay PUG. It went pretty well, so we are planning on hosting more meetups for patch review, and, eventually, Postgres hacking.
What is the feature you like most in the latest PostgreSQL version?
I’m excited for disk-based hash aggregation. It was committed recently. I helped out a bit with it and got some of my team members involved in developing it. It is always exciting to me when features are added to Postgres that support large data sets and analytical workloads.
Adding to that, what feature/mechanism would you like to see in PostgreSQL? And why?
I would like to see adaptive hashjoin added to Postgres. Allowing users to determine the memory budget for their database is an important feature, both for users and cloud providers. And for that contract to work, that memory budget should be respected.
Could you describe your PostgreSQL development toolbox?
I use Ubuntu about 75% of the time and macOS the other 25%. I use Vim for writing code and editing text. I use an IDE (CLion) for reading code and debugging. In terms of hours per day, I spend more time debugging and reading code than writing it. I started using Ubuntu more often because gdb had follow-fork and lldb didn’t, which was fairly essential for debugging multi-process code. Since then, I’ve found other appealing aspects of working on Linux, but I am definitely looking forward to trying some other distros.
Which skills are a must have for a PostgreSQL developer/user?
I think diverse skill sets are important. Everyone works in a different way. I learn a lot from pairing with other Postgres developers.
I don’t have as many occasions to use Postgres at scale these days, so it is important for me to talk with users whenever I have the chance. My partner is a Postgres user and I learn a lot from his experiences. I also meet and talk to users at conferences and meetups.
The skill I’ve found most useful as a hacker is viewing “getting stuck” as a learning experience. There are a lot of times when something doesn’t make sense, and when I’ve treated it as something I needed to just “get through”, I didn’t learn as much from it.
Do you use any git best practices that make working with PostgreSQL easier?
Rebase your development branches often and save yourself headaches from big merge conflicts. Thanks to Alvaro Herrera for this tip for generating patches for the mailing list:
git format-patch -vN origin/master
(where N is the version of the patchset you are posting and origin is the Postgres git remote) generates good patch files to attach in emails to hackers.
Oh, and “set ft=mail” in Vim when composing your email is nice.
On patchset best practices: When creating a patchset for the mailing list, look at it as a polished product, like a deck or a report, and think about how you would consume it if you weren’t involved in writing it. For example, commits doing refactoring to make room for your code should usually be in separate patches which come first in the patchset.
Which PostgreSQL conferences do you visit? Do you submit talks?
I enjoy PGCon a lot. It feels like summer camp for Postgres developers. There are hackers in the community from all over the world who I’ve had a chance to meet in person at PGCon, which is a lot of fun. It will be interesting to see what that experience is like this year with it being virtual. I like PostgresOpen because it has fewer tracks so there are more shared experiences.
I’ve submitted talks to most of the conferences I attend.
I would mention that it is also a good experience to go to a conference that is not about Postgres specifically and spread the word about Postgres. Lots of people are interested and it is a good way to grow the community as well as to get ideas for ways to improve it by listening to what developers like and don’t like about their current databases.
I spoke at All Things Open last year on the Postgres planner and got a lot of good feedback from people who wanted to learn more about hacking on Postgres.
Do you think Postgres has a high entry barrier?
I can’t speak to other people’s experiences. For me, hacking on Postgres was one of my first experiences with development, so there were things I struggled with that were related more to development in general than to Postgres specifically.
There were parts of the Postgres development process which were harder for me to learn than their alternatives. I found emailing patches harder to get used to than submitting PRs. I think that a lot of the decisions the community has made about the code base and development process make sense once you understand the rationale. I like having the opportunity to learn from a community that has been working with the same code base for many years, through many trends in software development, and that has tried different methods and seen what works and what doesn’t.
If you are just interested in getting involved in open source software development in general, Postgres might be less satisfying as a patch author because it can take a long time to write a patch that gets committed. However, if you are interested in Postgres specifically and ready to spend some time getting there, there are lots of people in the community who are more than happy to help.
I would add to this that Postgres hackers care a lot about developer usability. You may not find the quality-of-life features that you have had in other codebases, but the goal that hackers have is that you can
git clone [postgres git remote]
./configure && make install
And then you should be able to run tests or initialize the database and start it and it should just “work”. In practice, this doesn’t always hold (if you are missing a dependency, etc), but the goal is that you shouldn’t need a day to get up and running with Postgres development.
The community also cares a lot about things that slow down iteration time during development, such as the runtime of the regression test suite.
What is your advice for people who want to start developing PostgreSQL, as in, contributing to the project? Where and how should they start?
Patch review is a good place to get started with developing PostgreSQL. Sign up to review a patch on the commitfest app.
Another good way to get started hacking on Postgres is to write extensions. If you have some Postgres feature that you would like to see in core and you want to write it, I recommend proposing it on the hackers mailing list. If you can get others interested in developing your feature, they can help you to figure out how to develop it. Another way to build momentum for your idea is to propose it during a lightning talk at a Postgres conference or as a session at an unconference.
Are you reading the -hackers mailing list? Any other list?
What other places do you hang out?
I am on the Greenplum open source Slack.
Which other Open Source projects are you involved or interested in?
I’m a developer in the Greenplum community.
I’m hoping to get more involved in other open source projects soon. I work on small personal projects, but I would love to get more involved in another open source community. I haven’t picked one yet, though.
Anything else you would like to add?
A few other contributors and I are looking for volunteers to participate in a few pair programming sessions. We are interested in people who want to get started with hacking on Postgres, as well as people who want to push forward patches they already have in progress.