Last week we ran into a product degradation where a user with an enormous number of tables was seeing timeouts trying to fetch a list of their cluster’s databases from one of our API endpoints. The endpoint had been using pg_database_size() to get a size for each database, with size serving as a rough proxy for activity to decide which database should take precedence in the UI.
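That kind of lookup is roughly the shape of the query below (a sketch, not the endpoint’s actual SQL; pg_database_size accepts either a database name or its oid):

-- Rank databases by physical size, skipping templates (a sketch of this
-- kind of lookup, not the endpoint's real query).
SELECT datname, pg_database_size(oid) AS size
FROM pg_database
WHERE NOT datistemplate
ORDER BY size DESC;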
I verified the problem locally by creating a database with a heck of a lot of tables:
$ createdb many_table_test
$ for i in {1..220000}; do
    psql postgres:///many_table_test -c "CREATE TABLE table_$i (id bigserial PRIMARY KEY)";
done
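(If you’re reproducing this yourself, a single server-side DO block is far faster than 220,000 separate psql invocations. A sketch, assuming Postgres 11+ so that COMMIT is allowed inside DO:)

$ psql many_table_test <<'SQL'
DO $$
BEGIN
    FOR i IN 1..220000 LOOP
        -- Same table definition as the shell loop above.
        EXECUTE format('CREATE TABLE table_%s (id bigserial PRIMARY KEY)', i);

        -- Commit in batches so a single transaction doesn't have to hold
        -- locks on hundreds of thousands of new relations at once.
        IF i % 1000 = 0 THEN
            COMMIT;
        END IF;
    END LOOP;
END
$$;
SQL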
And watching pg_database_size become degenerately slow:
# select pg_database_size('many_table_test');
 pg_database_size
------------------
       5180509331
(1 row)

Time: 22739.724 ms (00:22.740)
23 seconds to calculate the size of a database on a fast, local disk.
A look at Postgres’ source makes the problem quickly apparent. Calculating database size involves descending through every one of its files on disk and adding them all up:
/* Return physical size of directory contents, or 0 if dir doesn't exist */
static int64
db_dir_size(const char *path)
{
    int64       dirsize = 0;

    ...

    while ((direntry = ReadDir(dirdesc, path)) != NULL)
    {
        if (stat(filename, &fst) < 0)
            ...

        dirsize += fst.st_size;
    }

    return dirsize;
}
Every table (all 200k+ of them) is stored as at least one separate file on disk:
$ psql river_test

river_test=# SELECT pg_relation_filepath('river_job');
 pg_relation_filepath
----------------------
 base/305728/305756

river_test=# SELECT pg_relation_filepath('river_leader');
 pg_relation_filepath
----------------------
 base/305728/305783
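To get a sense of how many files that directory walk has to stat, pg_ls_dir can count the entries in the database’s directory (it’s restricted to superusers unless explicitly granted; a sketch):

-- Count files in the current database's directory under the data directory.
SELECT count(*)
FROM pg_ls_dir('base/' || (SELECT oid FROM pg_database
                           WHERE datname = current_database()));

Each table in the test database also brings a primary key index and a bigserial sequence along with it, so the walk ends up stat’ing several files for every one of the 220,000 tables.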
I patched the immediate problem by removing uses of pg_database_size and falling back instead to pg_stat_database, which includes a number of statistics that work as rough proxies for size/activity. I used xact_commit, the number of committed transactions. A call to pg_stat_reset() would reset the number, but in any active database it’d grow quickly again.
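Roughly the shape of the replacement (a sketch, not the endpoint’s exact SQL; pg_stat_database reads from Postgres’s statistics counters, so it doesn’t touch relation files at all and comes back instantly even with hundreds of thousands of tables):

-- Rank databases by committed transactions as a rough activity proxy.
-- On newer Postgres versions a NULL datname row represents shared
-- objects; it's skipped here.
SELECT datname, xact_commit
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY xact_commit DESC;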