Fun with backups

Backing a large database up can be tricky.

The default Postgres backup facility works very well on relatively small databases. However, as the complexity increases and the number of tables grows, it becomes slower and slower. On a 10,000-table database pg_dumpall could spend several hours just on gathering database structure information.

One particular reason for that is locking: pg_dump sets a lock on every single table it backs up, and those locks are expensive. A remedy to this problem could be disabling table locks if pg_dump supported such an option. Unfortunately, it does not, so there is no choice but brute force.

Grab the sources, find pg_dump.c, locate and comment out this fragment completely:

if (tblinfo[i].dobj.dump && tblinfo[i].relkind == RELKIND_RELATION)
         {
             resetPQExpBuffer(query);
             appendPQExpBuffer(query,
                               "LOCK TABLE %s IN ACCESS SHARE MODE",
                          fmtQualifiedId(tblinfo[i].dobj.namespace->dobj.name,
                                         tblinfo[i].dobj.name));
             do_sql_command(g_conn, query->data);
         }

Then rebuild the whole source tree without installing it (just make should be enough), locate pg_dump executable, rename it to something like pg_dump_nolock and place under /usr/local/bin or similar location.

The performance gain depends on the schema size, in my case it was more than 100%.

This approach is not for everybody, though. As no locks are applied to the tables, no backup consistency is guaranteed. It has to be ensured by some other means like time-split backup/upgrade procedures, filesystem-level locks, etc. This, however, is rarely an issue: production database schemas don't change often.

Dynamic queries: the Postgres way

Have you ever been in a situation when you needed a query to be generated and executed dynamically as a result of another query?

If yes, read further: there is a simple and elegant way to achieve that.

First, create this function:

CREATE OR REPLACE FUNCTION dselect(varchar)
  RETURNS SETOF record AS $$
    DECLARE rec record;
    BEGIN
      FOR rec IN EXECUTE $1 LOOP
        RETURN NEXT rec;
      END LOOP;
    END
  $$ LANGUAGE 'plpgsql';

Second, umm.. well, that's it.

Anything you pass as a parameter will be interpreted and executed as an SQL statement.

The function is SELECTable. That is, you can use it in SELECTs:

SELECT * FROM dselect('SELECT id, name FROM users') AS t(id int, name varchar);

Please note the "AS t(id int, name varchar)" part. Postgres has no idea about what this function returns, so a column definition list should be provided. If not, Postgres will complain:

SELECT * FROM dselect('SELECT id, name FROM users');
ERROR:  a column definition list is required for functions returning "record"

Of course, the column definition list depends on the query and should match the actual query results.

So why this function is needed at all?

Because in situations like this:

CREATE SCHEMA sc_foo;
CREATE TABLE sc_foo.activity (date date, descr text);
INSERT INTO sc_foo.activity (date, descr) VALUES ('2009-08-07', 'went there');
INSERT INTO sc_foo.activity (date, descr) VALUES ('2009-06-04', 'hanging around');

CREATE SCHEMA sc_bar;
CREATE TABLE sc_bar.activity (date date, descr text);
INSERT INTO sc_bar.activity (date, descr) VALUES ('2009-10-11', 'came here');

You can do this:

SELECT 
    nspname 
FROM
    pg_namespace 
WHERE
    nspname LIKE 'sc_%'  AND
    (SELECT date FROM dselect('SELECT max(date) FROM ' || nspname || '.activity') AS t(date date)) > '2009-09-09';

 nspname 
---------
 sc_bar
(1 row)

In one pass this query goes over all the schemas whose names start with 'sc_', grabs the latest date from the schema's activity tables and matches the result against the provided date.

Of course, this approach is quite ineffective. Each time the function is called, a query is parsed and executed using a separate plan. A simple UNION of two queries would do the same, but... what if there are ten schemas? How about a hundred? I'm actually working with a database containing thousands of them.

I use this function when I need to collect statistics or do some db administration tasks, it saves me a lot of time.

So you want it fast

Postgres can be slow sometimes. There could be a whole bunch of reasons why: a slow disk system, suboptimal SQL queries or even more subtle things like inefficient schema layout.

But before diving into the realm of fine SQL tuning it is always worth to make one simple check. It costs nothing yet it can save countless hours of chasing non-existing issues in non-relevant areas.

Where is my cache, dude?

Postgres is sensitive to the filesystem cache. To be completely precise, any RDBMS that stores data in the file system (as opposed to storing it in a separate partition) suffers from the file system cache issues.

Please note, this is in no way related to the Postgres shared buffers, which is a totally different thing. What I am talking about is the operating system file cache, an area in the main memory where the operating system stores often read and written files for quicker access. The size of that area is extremely important for the normal Postgres work.

So, here is the very first step towards blazing fast Postgres:

$ cat /proc/meminfo | grep -P '(^Mem|^Cached)'

You will see something like this:

MemTotal:       509884 kB
MemFree:        177016 kB
Cached:         254920 kB

This is an example of a healthy, albeit small system. In fact, it is one of my virtual machines. It has enough free memory for emergencies like vacuums; it has about 250 megabytes of the file system cache, everything is fine.

Now look at this:

$ cat /proc/meminfo | grep -P '(^Mem|^Cached)'
MemTotal:      1747764 kB
MemFree:        14376 kB
Cached:         7944 kB

See the problem? Nearly all the memory is taken, apparently by some processes. Because no memory is left for caching, there is nothing to speed up file access with. How can Postgres possibly be fast in this environment?

By the way, the same information (and a whole lot more) can be obtained with the top utility. It's an interactive program, but it can be run in the batch mode: top -b -n 1

Here is how the top's output looks like:

top - 18:24:50 up  4:56,  1 user,  load average: 0.00, 0.00, 0.00
Tasks:  89 total,   1 running,  88 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.1%sy,  0.1%ni, 99.6%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    509884k total,   333180k used,   176704k free,    11976k buffers
Swap:  548568k total,        0k used,  548568k free,   255152k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                     
1 root      20   0 10308  716  600 S  0.0  0.1   0:01.18 init                                                        
2 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kthreadd                                                    
3 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
.....

The memory usage information is in the top part.

The line starting with "Mem:" contains some essential memory information. This line is self-explanatory:

Mem:    509884k total,   333180k used,   176704k free,    11976k buffers

The "buffers" field shows how much memory is allocated for... well... the system buffers. It is a specialized area in the memory being kept just in case. When something happens, like a program allocates a lot of memory, this area is used first. The system slices an appropriate chunk of memory off the buffers and gives it to the program. To remedy the loss it usually drops a part of the file system cache.

When there is no cache left, the system starts unloading inactive processes to the disk. This process is called swapping and is very hurtful for the system performance. By any means swapping should be avoided.

The line coming after the memory status is about Swap--the area on the disk where swapped processed are loaded out. It looks like this:

Swap:  548568k total,        0k used,  548568k free,   255152k cached

Zero swap usage, all the swap area is free. That's fine, this is how it supposed to be.

Some people suggest to turn swapping off completely by disabling the swap area. It's not a good idea. Swap is a safety measure designed to prevent system crashes in situations where no physical memory is left. It is pretty much like a safety belt: having it on will not prevent a system from accidents, but will let its data survive.

The last number in the line is for the file system cache. Oddly enough, it appears here among swap metrics, which confuses a lot of people, including myself.

Now a few words on how to make the file system cache bigger. Unfortunately there is no magical config parameter for that. The system will make the decision itself based on how much free memory is available. It's a wise idea not to allow processes take more than 50% of the physical memory. Say there is 1G of RAM. Very roughly the postmaster process takes about 30M, ten connections about 10M each, various Postgres services 50M altogether, then a 150M shared memory area for the Postgres shared buffers. Add 150-200M for various operating system services. All this gives about 500M of the process memory while the rest half of the gigabyte is available for caches and buffers. An ideal situation for a database under 500M.

If processes take more than 1/2 of the main memory, the situation can be improved by eliminating unnecessary ones (slocate is the candidate #1), decreasing the Postgres shared buffers and so on. It's a good idea not to have Postgres and Apache running on the same machine for this very reason.

Surprisingly often, this is all you need to make Postgres a happier, faster program.

Installing Tsearch2 on Postgres 8.2

Before we start I assume you know how to create new PostgresSQL databases and have access to the root and postgres system accounts.

Among other new shiny features, PostgreSQL 8.3 has Tsearch2 as a contrib package; that is, full text search is available in 8.3 out of the box. That’s really awesome news because enabling it manually on 8.2 databases can be tricky. If you have any chance to upgrade to 8.3, do it. This also makes sense because the latest version runs much, much faster.

Tsearch2 is a part of the postgresql-contrib package. Assuming your operating system is Ubuntu, simply run sudo apt-get install postgresql-contrib and that’s it.

We need a database to start with. It’s really a good idea to try it first on a separate test database. If something goes wrong, you always can drop it and create a new test database without losing any valuable data.

So, let’s create our test database:

$ createdb -E utf8 test

I always use utf8 as the database encoding unless I have a really good reason not to do that. Using utf8 will save you lots of time and probably some gray hair. Trust me, utf8 is your friend; I’ve learned that the hard way.

Now, if you are a happy 8.3 user then all is ready to go. If for some reason the upgrade is not possible, you will have to configure your databases manually to enable Tsearch2.

This is how. First, enable plpgsql, one of the stored procedure languages:

$ sudo -u postgres psql test -c "create language plpgsql"

Now, create service Tsearch2 tables and all necessary supporting functions, data types, etc.

$ sudo -u postgres psql test < /usr/share/postgresql/8.2/contrib/Tsearch2.sql

You have to do both things as a postgres user because otherwise you will get a bunch of permission errors. Unfortunately, all the new objects will be owned by the postrgres user, which makes them essentially unusable by anyone else. To solve this problem, use Eugene Morozov’s grantall.sh script.

#!/bin/sh
# Usage: grantall database role
OBJECTS=$(psql -t -c '\dt' $1 | cut -f 4 -d ' ')
OBJECTS="$OBJECTS $(psql -t -c '\ds' $1 | cut -f 4 -d ' ')"
OBJECTS=$(echo $OBJECTS | sed 's/ /, /g')
SQL="GRANT ALL ON $OBJECTS TO $2"
psql -c "$SQL" $1
psql -c "GRANT ALL ON SCHEMA public TO $2" $1

This script is so useful, so I’d suggest saving it as ~/bin/grantall.sh. Now make it executable:

$ chmod a+x ~/bin/grantall.sh

Then run it.

$ sudo -u postgres ~/bin/grantall.sh test user

User is your database user name, which almost certainly matches your system user name.

Doing this every time you want to enable full text search in a database can be annoying. There is a neat trick that can help. When PostgeSQL creates a new database it actually makes a copy of a special template database, template1 by default. If you install Tsearch2 there, every time you create a new database, it will automatically copy Tsearch2 objects from there.

Unfortunately, the copied objects will retain their original permissions, which means even though you installed Tsearch2 into the template1 database, you still have to run the grantall.sh script.

Now it’s time to start a PostgreSQL command line. Let’s say ‘test’ is the database’s name.

$ psql test
Welcome to psql 8.2.7, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

test=>

Let’s check that it is working properly.

test=> select to_tsvector('simple', 'full text search') @@ to_tsquery('simple', 'text');
?column?
----------
t
(1 row)

We’re there, Hurray!