Marco Slot



Tags: postgresql, citus, microsoft, extensions, distributed-systems, sharding, netherlands, pg_cron
Category: Interviews
Interviewed by: Andreas Scherbaum

PostgreSQL is the World’s most advanced Open Source Relational Database. The interview series “PostgreSQL Person of the Week” presents the people who make the project what it is today. Read all interviews here.

Please tell us about yourself, and where you are from.

I’m originally from a small city by the sea called Den Helder in the Netherlands, but nowadays I live in Haarlem with my wife and cat by a tiny canal.

I entered the PostgreSQL world when I joined a small Turkish start-up called Citus Data in 2014, which has been an exciting adventure. Before Citus Data I was doing a PhD in distributed systems and working for Amazon Web Services on CloudFront and Route 53. I knew very little about databases, but I loved working on distributed systems and Citus Data was working on scaling out PostgreSQL, which seemed interesting. Working with PostgreSQL code turned out to be really enjoyable to the point where I started dedicating a lot of time to our PostgreSQL extensions and everything related to PostgreSQL (conferences, blog posts, writing new tools and extensions, helping users).

I mainly work on the Citus extension, which is an open source extension that transparently shards tables across many PostgreSQL servers. It is used for really large databases (up to a petabyte), and especially for Software-as-a-Service applications and real-time analytics dashboards. Since most PostgreSQL features work as they normally do, but operations are routed or parallelized across a cluster, Citus turns PostgreSQL into a really powerful distributed database. In 2019 Microsoft acquired Citus Data, so now I do all the same things at Microsoft.

How do you spend your free time? What are your hobbies?

I like cycling and walking, preferably on mountains and in different countries, although the latter has been a little tricky lately. Fortunately, Haarlem is a beautiful city and within cycling distance of the dunes and the sea. I also like to write code, mainly PostgreSQL extensions.

Any Social Media channels of yours we should be aware of?

What’s a book you want to recommend to readers?

I think “Designing Data-Intensive Applications” by Martin Kleppmann is the most important book to read for anyone working with a non-trivial amount of data.

Any favorite movie, or show?

I think the series I most enjoyed recently was The Marvelous Mrs. Maisel.

What would your ideal weekend look like?

It could be exploring little old towns with my wife, a long mountain hike, or coding a new PostgreSQL extension.

What’s still on your bucket list?

Kamchatka and sub-orbital space flight.

What is the best advice you ever got?

The most powerful force of innovation is when technology enables the business to grow, and the business enables the technology to grow.

When did you start using PostgreSQL, and why?

I first used PostgreSQL in 2010 when I was working on a traffic simulation for my PhD and needed to extract OpenStreetMap data. PostgreSQL with PostGIS was just the right tool for the job, but I only started using/hacking it on a daily basis when I joined Citus Data.

Do you remember which version of PostgreSQL you started with?

I think 8.4.

Yes, I studied about as much Computer Science as one can. I did a B.Sc. in Computer Science and an M.Sc. in Parallel and Distributed Computer Systems at VU University Amsterdam when Andy Tanenbaum was still teaching there, and then a Ph.D. in cooperative self-driving cars at Trinity College Dublin. It prepared me for my current job very well, although I didn’t actually learn much about databases. The VU has a strong (distributed) systems discipline, and a Ph.D. can be surprisingly good preparation for a start-up and for leadership roles, mainly because it requires a lot of discipline and self-management. And after working on cooperative self-driving cars for a while, distributed databases seem relatively easy.

What other databases are you using? Which one is your favorite?

We use a bit of SQL Server at Microsoft, which is a fine database. Of course, PostgreSQL is my favourite. I spend about as much time in psql as I do in bash, so PostgreSQL is like my second OS. I do find myself typing \d instead of ls into bash an awful lot. I also like abusing systems like Azure Blob Storage and S3 as databases for personal projects.

Mainly Citus and Azure Database for PostgreSQL, in particular Hyperscale (Citus). My team also maintains many other PostgreSQL extensions such as pg_auto_failover, pg_cron, postgresql-hll, postgresql-topn, cstore_fdw.

How do you contribute to PostgreSQL?

I like working on open source PostgreSQL extensions (apart from it being my job), because users will be able to deploy my code immediately and I can control the scope to match my time limitations. Unfortunately, I haven’t really had enough time (or patience) to work on PostgreSQL code directly, but I think that enabling people to scale out PostgreSQL through Citus, or to very easily set up high availability through pg_auto_failover, are pretty significant contributions by my team. pg_cron is an extension that I wrote mostly by myself, and it has solved the common problem of scheduling jobs in your PostgreSQL database without requiring separate job scheduling infrastructure.
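A minimal sketch of what scheduling a job with pg_cron can look like from the shell, assuming the extension is already loaded and created in the target database and that psql can connect to it. The function name nightly_vacuum_sql, the 03:00 schedule, and the database name in the usage comment are illustrative choices, not details from the interview; cron.schedule is the function pg_cron provides for registering a job.

```shell
# Hypothetical helper: print the SQL that registers a nightly job with pg_cron.
# The schedule uses standard cron syntax (minute hour day month weekday).
nightly_vacuum_sql() {
    cat <<'SQL'
SELECT cron.schedule('0 3 * * *', 'VACUUM');
SQL
}

# Example use (assumes a database named "postgres" with pg_cron set up):
#   nightly_vacuum_sql | psql -d postgres
```

Keeping the SQL in a small function makes it easy to inspect the statement before sending it to a real database.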

Any contributions to PostgreSQL which do not involve writing code?

I think one thing to call out is that we donated 1% of Citus Data shares to the PostgreSQL foundations before the acquisition by Microsoft. That probably ended up being one of the biggest donations to PostgreSQL ever, and I’m really glad and proud we were able to do that as a company.

What is your favorite PostgreSQL extension?

After Citus, it’s probably pg_partman, because it gives you simple and straightforward time partitioning.

What is the most annoying PostgreSQL thing you can think of? And any chance to fix it?

Logical replication slots are not replicated to a hot standby, causing failovers to break logical replication.

What is the feature you like most in the latest PostgreSQL version?

The custom table access methods are going to open up a lot of new innovation. I also love all index improvements.

Adding to that, what feature/mechanism would you like to see in PostgreSQL? And why?

I do have a long wish list, but I don’t want to be greedy. If we can fix the logical replication slot issue I’ll be very happy.

Could you describe your PostgreSQL development toolbox?

Mostly just typical Linux command line tools like tmux, vim, make, psql, git, grep, find, sed.

I tend to do very little customization of my development environment, mainly because I always work on distributed systems and spend a lot of time being connected to different machines where I end up tripping over any non-standard habits. I also found my productivity has a lot more to do with external factors than my development environment.

I do use Ctrl+R a lot to search backwards in command history, and I love that it works consistently across bash and psql. If I work on PostgreSQL code on a new machine, I might generate some artificial bash history like:

find src/ -name "*.c" | sed -e 's/^/vim /' >> ~/.bash_history

That way, I can quickly find any of the PostgreSQL source files in bash using Ctrl+R without extra tooling. Similarly, I like adding -- comments to my queries in psql to find them later using Ctrl+R by typing a few characters from the comment.
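The same idea can be wrapped in a small function, shown here as a sketch rather than the exact command used above. The function name history_lines is hypothetical, and including header files and sorting the output are additions for illustration.

```shell
# Hypothetical helper (not from the interview): print a "vim <path>" line
# for every .c and .h file below the directory given as the first argument.
# LC_ALL=C keeps the sort order independent of the user's locale.
history_lines() {
    find "$1" -type f \( -name '*.c' -o -name '*.h' \) | LC_ALL=C sort | sed -e 's/^/vim /'
}

# Example use from the top of a PostgreSQL checkout:
#   history_lines src/ >> ~/.bash_history
#   history -n    # load the newly appended lines into the current bash session
```

Printing the lines instead of appending them directly makes it possible to review the output before touching the real history file.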

Which skills are a must have for a PostgreSQL developer/user?

I believe it’s important for PostgreSQL users to write SQL whenever possible. Whatever convenience your framework provides to avoid boilerplate, SQL provides a lot more convenience when it comes to actually solving your problem and will likely do it in a much more efficient way. It’s also important to spend time in psql or another interactive tool to get a feeling for what’s going on in your database.

Which PostgreSQL conferences do you visit? Do you submit talks?

I’ve spoken at PGConf.EU, PGConf.Russia, FOSDEM, PGCon, PostgresOpen, local meetups, various PGDays. I have been on a little conference hiatus, which was unintentionally extended by COVID-19, but would love to go back and meet with the community.

Do you think Postgres has a high entry barrier?

Most of the time no, but performance tuning can be very hard, especially at scale. You need to pick the right storage format, partitioning, indexes, triggers, transformations, etc. to get good performance for all your queries, and this can involve a lot of complicated steps. I sometimes wonder: Could we build an optimizer that, rather than working on a single SQL query, looks across all the SQL queries and their performance expectations, then considers all the options for configuring the database and picks the one with the lowest overall cost?

What is your advice for people who want to start developing PostgreSQL, as in, contributing to the project? Where and how should they start?

I think adding custom types (with relevant functions) is easier and much more powerful than most people realize. I’d love to see PostgreSQL support for popular data formats like YAML, RSS, EXIF, iCal, protobufs. Once PostgreSQL has support for a data format then, given all of its other capabilities, it automatically becomes the world’s most powerful tool for building applications using that data format. We’ve seen that with PostGIS and JSONB, but there is a lot more room for innovation on this front.

Are you reading the -hackers mailing list? Any other list?

I mainly lurk on pgsql-hackers all the time.

What other places do you hang out?

I’m mainly on the Citus Data Slack and sometimes on the PostgreSQL Slack.

Which other Open Source projects are you involved or interested in?

I think V8 and SQLite are my favourite open source projects to play with other than PostgreSQL. The notion of having a battle-tested, isolated programming runtime like V8 that you can add to your software gives you many interesting possibilities that go far beyond Chromium and Node.js. Embedding a tiny SQL database into your software is similarly powerful.

Anything else you would like to add?

Thanks for doing these interviews. It’s a nice way to get to know the community now that we cannot visit conferences.