Nikolay Samokhvalov



Tags:   postgresql    community    database-experiments    russian    postgres-ai    database-lab    change-management   
Category:   Interviews   
Interviewed by: Andreas Scherbaum

PostgreSQL is the World’s most advanced Open Source Relational Database. The interview series “PostgreSQL Person of the Week” presents the people who make the project what it is today. Read all interviews here.

Please tell us about yourself and where you are from.

I’m from Russia, a small place in the Penza region. Born in 1981 and moved to the Moscow area in 1998 – to study Mathematics and Physics at Moscow Institute of Physics and Technology (MIPT, “Phystech”), quite famous in Russia and outside it. I chose Computer Science and, particularly, Databases specialization almost in the beginning – and stick to it, without regrets. Databases are at heart of nearly any computer system!

At the end of the second part of my life (“Moscow period”), in 2013 – right before the Crimea crisis – I decided to move to the U.S., and have lived in California since then.

Nikolay Samokhvalov

Nikolay Samokhvalov

I’ve got bachelor and master degrees in MIPT (it’s a university, but they keep “Institute” in the name not to change the brand), with so-called “base institute” at Institute for System Programming (ISP) – a research institute with rich history, where I studied Computer Science from grand Russian researchers, who worked on first computers in USSR (including BESM-6 created in the 1960s and used in space programs such as Appolo-Soyuz mission).

In 1998, I didn’t have my own computer. I remember that we studied the Turing machine, and we’ve got a homework task to write 2 programs on a piece of paper (I guess they are classic):

  1. Take a number written as a sequence of “I” signs on an infinite belt. Use commands “shift left”, “shift right”, “read from the current cell”, “write I” – and increment the number (add another “I”). The difficulty is: it’s unknown where the number is; you need to find it first
  2. There are two numbers at unknown positions; you need to find them, erase them, and write their sum. I remember that I wrote both programs. One was something like 37 commands, and the second one – ~140. Another guy from our group wrote a simulator for the Turing machine, using C (obviously, he had a computer). Then the professor asked him to enter my programs. Both worked! I was extremely happy. And we both got excellent grades for the semester – only two of us, among ~20 group mates. I came from a small town, without strong education at school, but at that point I realized that I do like programming and I can achieve good results.

I’ve got a lot of fundamental and theoretical knowledge. I cannot say that I use everything daily and cannot say that I remember everything well, but understanding Computer Science’s fundamentals is definitely helpful.

After working with Oracle and SQL Server in the early 2000s, I was lucky to be a co-founder of the historically first Russian social network, MoiKrug (acquired by Yandex). We needed to choose a free DBMS. Of course, I first chose MySQL – it was version 3.something. After 2 weeks of working with it, I almost fell into depression. But in 2005, we hired a guy from MSU, who studied astronomy, and he mentioned Oleg Bartunov and …Postgres (Oleg is also an astronomer). I spent 1 day to migrate all the codebase from MySQL to Postgres and became “Postgresman”, a Russian community activist.

Since then, I have used Postgres in all my projects (including 3 social networks: MoiKrug.ru, MirTesen.ru, and Postila.ru). Also helped dozens of companies consulting them on how to optimize performance and scalability. In 2005-2007 I was also working on my Ph.D., and the topic was “XML in the context of relational databases”. I tried to be a Postgres hacker but eventually abandoned it since I don’t like long release cycles and low-level programming (I have high respect for all people who write C code). But I managed to participate in the XML features development in 2006-2007; it was an interesting experiment for me – but eventually, others (first of all, Tom Lane, Peter Eisentraut, Pavel Stehule) did the majority of that work.

In 2007, together with Oleg Bartunov and Teodor Sigaev, I visited Ottawa and PGCon – my first talk in English with Peter Eisentraut. It was a very stressful experience. But I learned a lot. That year was life-changing – we launched regular Postgres meetups in Moscow. I started a PostgreSQL consulting practice and joined the program committee of the just-founded Highload++ conference series – I remember how I immediately invited Bruce Momjian when the idea of inviting foreign speakers popped up. Highload++ is now famous and huge (3000+ attendees, 150+ talks on a single event). Bruce became a usual suspect at Russian conferences since then, and the audience loves his talks.

I moved to California in 2013. Sold my stakes in Russian businesses, social networks, and by 2017 entirely switched to the technical topic, Database. It means – Postgres. I relaunched my consulting practice in the U.S. I quickly realized that I don’t like the fact that many tasks are poorly automated: SQL performance troubleshooting and optimization, health checks, change management, and testing of so-called database migrations (not a good term, but it’s widely used – I mean schema changes and complex data changes going through some version control system such as Flyway, Sqitch, Liquibase, Rails DB migrations, and so on). I decided that I’ll continue with consulting, but I’ll automate everything I can do. I do not want to spend my life on boring tasks – I want the highest level of automation possible. This is how Postgres.ai started.

Let me advertise the open-source projects that I (together with others from the Postgres.ai team) created following this philosophy:

  • Database Lab Engine (DLE) – this is the main product we are focusing on now. It uses thin-cloning (either ZFS, which is the default, or LVM) of PostgreSQL databases. You can clone a 10 TiB database in just 5 seconds. You can run 20 database clones on a single machine with a single data volume. Those clones are read-write and fully independent. It gives an unbelievable power for development and testing. Moreover, DLE can receive new WALs and replay them – basically, being a “super replica”, multi-head follower. It does feel like magic and allows you to perform SQL optimization tasks, verify database migrations, export data not touching production nodes, time-travel, and troubleshoot plans. I see a big future for it, and it will revolutionize how people work with Postgres daily.

  • SQL Optimization Chatbot (“Joe bot”) – a chatbot (currently available on Slack and Postges.ai GUI) that helps developers optimize SQL. It works on top of DLE. Users ask to EXPLAIN some query, Joe implicitly creates a fresh clone (usually, of production, unmodified or modified), and runs EXPLAIN, then EXPLAIN (ANALYZE, BUFFERS) there. The timing might be different from production (we’re working on the “estimator” feature right now) because of filesystem differences and differences in caches’ states. Still, the EXPLAIN plan structure and data volumes (rows, buffers) are the same. Then users can create a new index and re-check the plan again. This tool drastically changes how people optimize SQL. And how they learn SQL!

  • postgres_dba – provides an interactive menu in your psql, where you can run scripts such as bloat estimation or a list of the largest tables (it has a toolkit that includes many tools other people developed and I polished, but you can add your own scripts too).

  • postgres-checkup – a boring tool that automates regular health-checking of PostgreSQL databases. It’s something that monitoring doesn’t provide at all or usually only provides poorly. And it’s something that helps you control the database health over time: bloat, unused / redundant / invalid / missing indexes, int4 PKs, deep query analysis based on pg_stat_statements, and pg_stat_kcache, and so on. I have big plans for version 2, but currently, everyone is busy with Database Lab – so if you’re reading this and you’re ready to help (on either paid or free basis) with postgres-checkup version 2, ready to write code in Go, please reach out to me!

How do you spend your free time? What are your hobbies?

During COVID times, it’s VR. I’m a big fan of a new game, “Population:One”. Two significant differences compared to regular computer games: 1) you’re on your feet and moving a lot, 2) the microphone is on by default, and you constantly talk to people. I even have made a few new friends in various countries via this game. This is my way to survive COVID times.

My main hobby, though, is freediving. I’ve studied it at Molchanovs school, the best on the planet. And managed to hold breath for up to 5 minutes and dive up to 50 meters, not breathing using just a monofin. I’m a big fan of my teacher, Alexey Molchanov, the world champion, and multiple records holder; he just has got another record, diving 80 meters deep on one breath under the ice of Baikal lake.

Any Social Media channels of yours we should be aware of?

LinkedIn and Twitter are for Postgres stuff; everything else is for family and non-technical things.

Last book you read? Or a book you want to recommend to readers?

I don’t read fiction books — my childhood was filled with them, and it’s enough. The latest book (still reading, though): Systems Performance: Enterprise and the Cloud. All Gregg’s materials are like the Polar star to me – this is where I want to head; this is how good content should look like. And it’s partially also a reason why I’m still not blogging – I have many drafts that are just drafts that need to be improved.

I recommend not only reading but watching good technical content on YouTube. Of course, I recommend checking out our young channel, Postgres.tv – make sure you’ve checked not only the streams Ilya Kosmodemyansky and I organized but also our playlists from numerous Postgres events. This is a logical next step of our long-term efforts of maintaining the #RuPostgres community of Russian-speaking PostgreSQL users. We moved most of our activities to “purely online” a few years ago, long before the COVID lockdown happened – mainly because many Russian-speaking community members now live all over the world: Germany, the USA, Australia, and so on.

So, back to what I recommend in terms of content consumption. There is lots of good technical material on YouTube. Not only Postgres, of course. For example, check out Andy Pavlo’s courses on databases at CMU. I always have at least 10 hours of new materials to watch downloaded on my phone (YouTube Premium needed) – this is very useful when you travel or just walk your dog.

I hope we’ll have even more good technical videos in the future.

Any favorite movie or show?

Right now, it’s “Putin’s Palace” by Navalny’s team (it has English subtitles) and “Gold Mine” about Lukashenko (unfortunately, only in Russian). In general, public investigations based on publicly available information, tech, internet – it’s both a great show and big hope that our society will be developing and improving. Separately, I do like investigations by Bellingcat and particularly Christo Grozev and the fact that they use SQL extract facts from raw data (although, they chose not the best DBMS for that! :) )

How would your ideal weekend look like?

Friday evening is dinner with my wife. I try not to work on Saturdays. I have 4 kids – from 1 to 16 years old, so I always have a lot to do. Sunday morning is also family time, but usually more about family business rather than the rest. Sunday evening is when my work week already starts. It’s morning in Novosibirsk and Moscow, and the majority of the Postgres.ai team is now in those time zones.

What’s still on your bucket list?

  • Create technology that will help every business avoid downtime related to databases while making changes at a high pace. My team and I are working hard on it already, it’s called Database Lab, but we are only at the very beginning of our journey.

  • Return to regular freediving practices.

  • Start blogging about Postgres performance and scalability regularly.

Do you remember which version of PostgreSQL you started with?

Sure. 8.0, it was 2005. But especially I remember 8.3 – it was 2008, that’s when the XML data type and functions were released, and it became obvious to me that I’ll use PostgreSQL for a long time.

How do you contribute to PostgreSQL? Any contributions to PostgreSQL which do not involve writing code?

I don’t change the PostgreSQL code myself these days, unfortunately. I have a strong personal preference to work with modern tools such as GitLab or GitHub, follow workflows based on structured code review with per-line comments (MRs and PRs correspondingly), be able to use CI/CD pipelines, and so on. At some point, I decided that work via email is not for me – although I still do it once per year when translating PostgreSQL press releases to Russian, and each time it confirms my conclusions and feelings.

But I have other activities that – I believe – will help to develop the PostgreSQL ecosystem:

  • I (with the Postgres.ai team, of course) develop open-source products – see the list above. Currently, I aim to solve the problem of the unpredictability of database changes, including schema changes – and I think Database Lab (with its open-source core component, Database Lab Engine) will be the tool that will solve it for many development teams.
  • Community activities:
    • #RuPostgres
    • Postgres.tv
    • I regularly speak at conferences myself and participate as a member of program committees to make conferences better

What is your favorite PostgreSQL extension?

I love and hate pg_stat_statements .

What is the most annoying PostgreSQL thing you can think of? And any chance to fix it?

To me personally, the most annoying thing in the Postgres ecosystem and in how people work with Postgres is the presence of gaps in the optimization process and lack of good testing of changes. We need to test changes and troubleshoot/optimize SQL at a much higher pace. Much, much higher: 10x more tests, 100x more. But the traditional processes are expensive, slow, and painful. Usually, people tend to share imperfect dev & test environments, they have collisions, and eventually, they tend to skip essential checks. Hence, they get mistakes and see downtime caused by poorly tested database migrations. I explained it above – with Database Lab, I’m going to change it. Something special happens when you have 1 database and 1 engineer, not 1 database and 10 engineers.

Let me name a few:

  • Outdated tooling for development: sending emails with attached .diff or .patch files (and sometimes with .docx), lack of MRs/PRs with good search, lack of centralized CI/CD tooling. I know the consensus is different from my vision, so there are low chances that it will be changed.
  • BUFFERS is not default in EXPLAIN ANALYZE. It should be on by default – BUFFERS are very important in optimization. In OLTP databases, we aim to work with as little data as possible (this is why indexes exist). You can see row numbers, planned and actual plans in the EXPLAIN ANALYZE output, but the BUFFERS option gives you even more useful information. My advice: always write EXPLAIN (ANALYZE, BUFFERS) instead of EXPLAIN ANALYZE. Vik Fearing created a patch to make BUFFERS default. However, it was initially rejected. But there are chances it will make its way to Postgres at some point.
  • Everyone hates VACUUM and bloat. I’m no different here. However, wearing a “PostgreSQL consultant” hat, I cannot complain :)

What are the features you like most in the latest PostgreSQL version?

Btree optimization (work by Peter Geoghegan, Anastasia Lubennikova, etc.) is huge in Postgres 12 and 13.

Observability improvements – new data in pg_stat_statements, EXPLAIN output, log sampling capabilities – those are great things as well.

Unfortunately, I don’t see PostgreSQL 13 in heavily loaded systems in big projects yet. So ask me once again in a couple of years :-) Also, we had an excellent Zoom/YouTube with Ilya Kosmodemyansky, discussing Postgres 13 features.

Adding to that, what feature/mechanism would you like to see in PostgreSQL? And why?

We definitely need a way to track what particular queries (with parameters) are good examples of the most problematic query groups in pg_stat_statements – maybe including plans. You can build something with existing tools (e.g., auto_explain and deal with log parsing – perhaps with pgBadger); there are promising efforts to create a pg_stat_statements alternative, like pg_stat_monitor from Percona. But there is still a huge gap here that makes the SQL optimization process not smooth, not seamless. I discussed it in my SQL optimization training sessions at PGCon 2020 and PgConf.online 2021. And looks like people agree that this is an important topic – at the recent PgConf conference, this talk took the 1st place among 50+ based on the audience’s evaluation.

More thoughts in random order:

  • Connection Pooler needs to be in the Postgres core
  • The backup system eventually needs to be in the core (and yes, I also think that dumps are not backups, and the documentation needs to be fixed)
  • Better sharding tools are definitely needed, not sure if in the core, though (the only thing I’m jealous of MySQL users is Vitess)

Could you describe your PostgreSQL development toolbox?

I’m a big fan of the terminal. Therefore, I spend most of my time dealing with Postgres in tmux + psql (with postgres_dba and pspg, highly recommend both) + vim.

For optimization, I use many tools. Check out my “Seamless SQL Optimization” talks (slides, updated 2021-03-01) – I give an overview of many tools that can help you improve your optimization process.

And of course, I use Database Lab and work with clones a lot. Right now, I already cannot imagine working without them.