Mark Wong



Tags:   postgresql    postgres    pg_top    crochet    julia    pl/julia    2ndquadrant    pgus    amigurumi    pg_systat    pg_proctab   
Category:   Interviews   
Interviewed by: Andreas Scherbaum

PostgreSQL is the world’s most advanced Open Source Relational Database. The interview series “PostgreSQL Person of the Week” presents the people who make the project what it is today.

Please tell us about yourself, your hobbies and where you are from.

I am originally from Portland, Oregon, and I currently live there. I work for 2ndQuadrant, which provides a range of Postgres products and services. I’ve taken up a variety of hobbies over the years. A couple of my favorites are medium and large format photography, and crocheting stuffed animals. You may be familiar with one particularly cute little elephant, endearingly named Chelnik by Gabrielle Roth.

When did you start using PostgreSQL, and why?

I started using PostgreSQL because of work.

My career started with an internship and then full-time work at Sequent Computer Systems, which IBM later acquired. Through IBM I was contracted out to the Open Source Development Labs (OSDL), and after being laid off I was hired directly by the OSDL shortly afterward.

Throughout those initial years I worked on developing and publishing TPC benchmarks at Sequent and IBM. When I got involved at the OSDL, they wanted to use TPC-derived workloads to show where a completely open source solution based on Linux stood in the database world. We started with SAP DB, as it was named at the time, but we weren’t able to get enough traction with them and began to look at other open source database management systems.

And that led me to Postgres.

How do you contribute to PostgreSQL? Any contributions to PostgreSQL which do not involve writing code?

My contributions have changed and varied over the years.

Initially I did a lot of systems performance testing and developed benchmarking kits. I had access to fairly large systems and tried to help evaluate performance-related development work in the Linux and Postgres communities. My access to those kinds of resources has since changed as my employers have changed.

I got involved with the Portland PostgreSQL Users Group shortly after Selena Deckelmann and Gabrielle Roth created it. It’s been a great experience networking with enthusiasts, local businesses, and universities. Portland State University has a theoretical and applied database management research group, Datalab, and some of its professors and Ph.D. candidates have joined us over the years. Today I help Grant Holly plan monthly meetings.

I’ve created a variety of little tools over the years. I think the most notable one is pg_top, a Postgres-specific fork of top with some Postgres-specific functionality. A couple of companion tools are also worth mentioning: pg_proctab and pg_systat. pg_proctab is a Postgres extension that gives the database access to the operating system process table. pg_systat is a recent fork of systat for monitoring Postgres-related statistics. I had implemented some of that in pg_top, but I felt systat had an interface better suited to that purpose.

A new project of mine is PL/Julia, which adds Julia support to Postgres. Maybe I can also help expand the Postgres documentation on writing procedural language handlers. It’s still a work in progress, but hopefully it is far enough along that others will get involved once I get the word out that it exists.

Over the years I’ve mentored various Postgres projects through the Google Summer of Code program with OSDL as well as the Postgres community. I don’t think I’ll have time to mentor this year, but I expect there are still plenty of good projects and people involved this year.

I’m currently serving on the United States PostgreSQL Association Board of Directors. We underwrite events like PostgresOpen and some one-day events, most recently in San Francisco. In addition, I am on the Expo Committee with Michael Brewer and Jonathan Katz, which plans and staffs exhibitions for Postgres around North America.

Recently I was invited to join the Postgres Funds Group, where I’m one of many who review purchase requests for things like swag and equipment for the community infrastructure, to name just a couple.

I believe I took a rather unconventional academic path where I let a summer job immediately after high school shape what I studied in college. I was already set to attend the School of Engineering at Tufts University but I had an internship with the Technical Marketing group at Sequent before setting off to college.

That summer was my introduction to database systems, hardware, and a great work environment. Because of my internship, I started taking a closer look at the computer science program at Tufts, and at graduate programs in general, to plan out my academic path. What I found at the time was that most computer science graduate programs didn’t require an undergraduate major in computer science, and that database systems courses were mostly offered at the graduate level.

That made my decisions pretty easy at that point. I set out to find a program that would prepare me for graduate school and let me round out my skills. Four years later, I graduated from Tufts with a degree in Civil Engineering, passed the FE/EIT exam in the Commonwealth of Massachusetts, enrolled in the Computer Science and Engineering program at the Oregon Graduate Institute of Science and Technology, and simultaneously started working full time. I only completed the Master’s program, but I tried to do it in a way that would let me continue into the Ph.D. program if I was motivated enough.

I imagine it sounds odd to choose a Civil Engineering program after what I said about having an internship at a pioneering company in symmetric multiprocessing systems. The Civil Engineering program at Tufts was very accommodating of the courses I wanted to take without adding to a normal course load. It allowed me to fit in core computer science classes as well as writing, project management, and human factors courses. In hindsight, I feel I made the right choice in taking a broad range of classes rather than concentrating mainly on one area.

Do you think Postgres has a high entry barrier? What is your advice for people who want to start developing PostgreSQL, as in contributing to the project? Where and how should they start?

I think getting involved with relational database management systems is generally more difficult than with many other areas of the industry, although some things are easier these days because more content is freely available online.

It’s harder for students to prepare because I don’t think very many universities give students enough, if any, exposure to database management systems at the undergraduate level. Operating systems courses can help someone get familiar with database management systems internals since databases can be thought of as a specialized operating system, but that’s just one aspect.

Internships and job opportunities, technical and non-technical, are also relatively scarce, which I think reflects how few database companies and database jobs there are relative to the rest of the industry.

My suggestion for those about to start school at the undergraduate or graduate level is to look into the career services and co-op programs available. This isn’t one-sided, though. Opportunities could be better if more database companies had the capacity to get involved in the same programs.

There are additional avenues for more seasoned people to network these days, with resources like Meetup to help find regional communities. Projects developed in the open provide more opportunities for people to just jump in, with publicly available mailing lists, forums, chat rooms, source code, etc.

Which other Open Source projects are you involved or interested in?

I find operating systems interesting, in particular the things they do to make the best use of hardware, such as parallelism, memory management, and file systems. I’ve really only worked with Linux, but some of the recent operating systems topics I’ve read about are DragonFly BSD, its HAMMER file system, and Linux’s btrfs.

collectd is a system statistics collection daemon that I think is a good tool. Not only does it gather system statistics, but you can also write custom SQL queries to run against Postgres and collect database statistics.

I analyze simple datasets on occasion, primarily system statistics, and I use that as an excuse to introduce myself to new programming languages. I found myself learning a little bit of R and Julia for doing simple statistical calculations and data visualization.

Anything else you would like to add?

If you’re looking for free crochet patterns, I originally started making elephants from a Lion Brand pattern. Sometime later I came across another pattern and started using that one instead.

Thanks for taking the time to do this interview with me!