Reading time: 9 minutes
PostgreSQL is the World’s most advanced Open Source Relational Database. The interview series “PostgreSQL Person of the Week” presents the people who make the project what it is today. Read all interviews here.
Please tell us about yourself, and where you are from.
My name is Frits Hoogland, I am born, raised and living in Lutjebroek, which is a tiny village in the Netherlands (of which most people don’t know it actually exists). I am married to Suzanne and have a son named Jason.
My first contact with computers was the MSX home computer with the Z80A CPU, which was an 8 bits CPU that allowed 16 bits operations by combining CPU registers. It’s probably learning some of these CPU specifics that sparked my interest into wanting to understand the fundamental working of computer systems.
After school I first went on programming, and then got into networking in the early 1990’s with Novell NetWare, which slowly moved to Windows NT because of the market situation. During that time I got interested in Unix and databases, for which (one of) the main databases at that time was Oracle. For this too I applied my approach of trying to get a fundamental understanding of how it actually works.
After spending years with Dutch consulting companies I ended up in an American company called Enkitec, where we did specialized consulting and implementation of Oracle and Oracle Exadata. The success of the company led to it getting acquired by a huge consultancy company, which ultimately led to me and many of my peers deciding to move on.
At that time I had read about Yugabyte, which is a distributed database that allows itself to be used as a SQL database, and decided to talk to them and got hired. The SQL layer of Yugabyte is entirely provided by the PostgreSQL database, which is how I got into PostgreSQL.
With PostgreSQL I try to apply my approach of trying to get a fundamental understanding of how things actually work. The PostgreSQL database being open source and therefore allowing it to be able to see and read the code greatly helps for this.
How do you spend your free time? What are your hobbies?
In my free time I like to play sports, for which I like running, and I play table tennis and participate in table tennis competitions.
I also like listening to music and playing music on bass guitar, for which my main music taste is rock and hard rock.
Any Social Media channels of yours we should be aware of?
When did you start using PostgreSQL, and why?
I started using PostgreSQL around 2007 with PostgreSQL version 7. At first it was just more personal interest, especially because of PostgreSQL being open source, which made it possible to understand what is going on by reading the sourcecode. That lead to reusing some SQL scripts that I had found from Glenn Fawcett which persisted some of the pg_stat* views for historical use, for which I extended it to store all of the available views in order to build a history. Once I was happy about it, I decided to open source the code under the name of ‘pgstatspack’.
At that time I did some work which involved PostgreSQL as a database, but once I became more specialized as Exadata consultant, this faded away.
Do you remember which version of PostgreSQL you started with?
The PostgreSQL version I started with was PostgreSQL version 7.4, which came with RedHat Enterprise Linux packages as an RPM.
What other databases are you using? Which one is your favorite?
I am (YugabyteDB is) using another database named ‘rocksdb’, which is the actual database that provides the physical storage with YugabyteDB. This is a very interesting database, because it uses LSM-Tree based processing, which is fundamentally different from PostgreSQL’s block based processing. One of the key characteristics of an LSM-Tree based storage engine is that it only ever appends: even a delete is an append to the database.
A simplified explanation of how this works is that in the YugabyteDB PostgreSQL layer, whenever we need to write data, we transform the PostgreSQL level data into a protobuf and send it to the DocDB layer. In the storage server in the DocDB layer, the protobuf is used to generate our WAL, and insert the data into the rocks db memtable, where every change to a column in a row or to the row itself such as a delete is an insert.
To read a MVCC version of a row, the read request uses the rocks db memtable or sst files to read the row data using the hybrid logical clock time. If a row is deleted, it will find the ‘tombstone’ marker of the row and consider the row deleted. If a row exists, it will read the column hybrid time consistent column version.
How do you contribute to PostgreSQL?
I am contributing to PostgreSQL specifically by researching how PostgreSQL works and interacts with the operating system. The results of that can be found in my blog.
What is your favorite PostgreSQL extension?
I do believe that extensions are aids to extend functionality when there is a specific need, and therefore for me I do not have a specific extension that is a favorite, but rather certain extensions that are needed in specific situations. If I were forced to pick one, it would probably be pg_hint_plan, because there can be situations that the planner cannot or cannot easily create the desired plan.
Having said that: most people do consider pg_stat_statements to be a core component of the database, however it is a (core) extension, and as such the most useful extension without a doubt is pg_stat_statements.
What is the most annoying PostgreSQL thing you can think of? And any chance to fix it?
The most annoying thing right now for me is that there is no statistic in the database that registers the full SQL execution time. There is a possibility to record the full SQL execution time though, via the parameter
log_min_duration_statement, but that records it in the logfile. The time registered in
pg_stat_statements is the execution phase time only for versions up to 12, which is confusingly called
total time, and more aptly named
total_exec_time in versions starting from 13. However, a SQL execution can contain 4 phases. In my opinion, when trying to understand query response time, it is very important to understand the full database processing time, not only the time in the phases that are expected to take the majority of the time.
What is the feature you like most in the latest PostgreSQL version?
The feature I am most happy with in recent PostgreSQL versions is that the current database activity view (
pg_stat_activity) can show the
query_id which is the same
query_id as queryid in
pg_stat_statements, so a query can be factually identified between the two. This can be difficult without the
pg_stat_activity contains the
raw SQL, and
pg_stat_statements contains the normalized SQL.
Do you use any git best practices, which makes working with PostgreSQL easier?
This question is probably aimed at developers. I am developing software too, but have been more of a DBA most of the time. However, I would urge people in a DBA role, or even an operator like kind of role to keep on reading: everyone using SQL and/or scripts should use git, and if they don’t: learn git. You will thank me later.
git is the standard version control system today. It is reasonably easy to use for simple things, like adding new files and adding changes, and is a bit harder for more complicated things, but learning git and learning how to use it is something that will pay off sooner or later. It is an absolute godsend to be able to see older versions of your script, or have the ability to store scripts in a central place (GitHub/GitLab), and have the ability to clone the script-set somewhere else in an extremely easy way, provided the git executable is installed.
Which PostgreSQL conferences do you visit? Do you submit talks?
I do submit talks for PostgreSQL conferences.
Recent conferences I have visited are the PostgreSQL User Group NL in Amsterdam, PGConf.EU Berlin 2022 and Scale X19 in Los Angeles. I would recommend and advise anyone interested in PostgreSQL to visit conferences in person. Not only is it fun to attend presentations and meet up with a lot of people that you otherwise would only see online, but it also supports the conference and the community, and in a lot of cases is very inspiring. If you have doubts about it, take my advice and go: I am confident you will be pleasantly surprised!
Do you think PostgreSQL will be here for many years in the future?
Nobody can predict the future. However, we can look at the current and immediate past situations. I do believe that PostgreSQL itself is solid and provides all common functionality that a relational database needs, which is supported by a steady growth figure in the db-engines ranking overview, where the other classic relational databases such as Oracle, MySQL and SqlServer are slowly declining. However, I do not know where the db-engines gets its usage data from, so I cannot judge if, and if so how biased that data is.
Another thing that stands out to me is that lots of database related projects and products are using and/or re-using the PostgreSQL query layer. To me this means PostgreSQL is considered the standard by the general database community.
Would you recommend PostgreSQL for business, or for side projects?
To me, recommending PostgreSQL or any of the products that use or reuse PostgreSQL, including the distributed database that the company I work for makes is not a question. In the past, databases like Oracle included unique features in the area of scalability and availability, and therefore were the only choice if any of these features were required. This situation has changed, and open source databases can provide most features that previously were the unique domain of proprietary databases. This doesn’t mean open source databases, including PostgreSQL, do things exactly like proprietary databases do: every database implements its features on its own, but the general features in the area of scalability and availability are there today.
So yes, I would recommend PostgreSQL for both business and for side projects, if only to have an option to move away from the annoying sales practices that can happen with proprietary databases.
Which other Open Source projects are you involved or interested in?
I am working as a developer advocate for Yugabyte, and our YugabyteDB database is completely open source. So that means that I am involved in another open source project, outside of some small personal open source projects for utilities for using with YugabyteDB.