PostgreSQL audit logging
per user for GDPR
Answer
PostgreSQL does not log queries per named user by default. To comply with GDPR Article 32, install the pgaudit extension, add it to shared_preload_libraries, and configure pgaudit.log = 'read,write,ddl' with pgaudit.log_relation = on. This gives you a query-level log per PostgreSQL role. The problem: pg_audit logs the database role — not the human. If your application uses a shared role (app_user), every log line is identical regardless of who ran the query. This guide covers the complete setup and both structural gaps pg_audit cannot close.
What PostgreSQL logs without pg_audit
Out of the box, PostgreSQL logs connection events and server errors. Individual queries are not logged unless log_statement is set. Even with log_statement = 'all', the default log line looks like this:
2024-01-15 14:23:47 UTC [18392] app_user@production LOG: statement: SELECT id, email FROM users WHERE id = 4892
The role shown is app_user — a shared application credential used by all engineers connecting to production. There is no information about which human ran this query. Under GDPR, this log is non-compliant: it records what happened, but not who did it.
Who ran this query?
Unknown. app_user is shared across all engineers.
Which tables contain personal data?
Inferable from schema, not from log without pg_audit.
Was access authorized?
Unverifiable. No per-human policy engine.
Can this log be deleted?
Yes. Any superuser with server access can truncate it.
Installing and configuring pg_audit
pg_audit is a PostgreSQL extension maintained by the pgAudit project. It must be loaded at server start — you cannot enable it via SQL alone after the server is running.
Step 1 — Install the extension package
# Debian/Ubuntu sudo apt install postgresql-16-pgaudit # RHEL/Rocky Linux (PGDG repo) sudo dnf install pgaudit_16 # macOS — no Homebrew formula; compile from source git clone https://github.com/pgaudit/pgaudit.git && cd pgaudit make install USE_PGXS=1 PG_CONFIG=$(pg_config --bindir)/pg_config
Step 2 — Add to shared_preload_libraries
Edit postgresql.conf or use ALTER SYSTEM. A server restart is required.
-- In psql (requires superuser) ALTER SYSTEM SET shared_preload_libraries = 'pgaudit'; -- Then restart PostgreSQL, then create the extension CREATE EXTENSION IF NOT EXISTS pgaudit;
Step 3 — Configure audit parameters
-- Log reads and writes on all relations ALTER SYSTEM SET pgaudit.log = 'read,write,ddl'; -- Log the specific relation (table/view) for each statement ALTER SYSTEM SET pgaudit.log_relation = 'on'; -- Include bind parameters (required for GDPR data subject queries) ALTER SYSTEM SET pgaudit.log_parameter = 'on'; SELECT pg_reload_conf(); -- Verify settings are active SHOW pgaudit.log; -- read,write,ddl
Per-role configuration (recommended)
If you have per-operator database roles, configure auditing at the role level to avoid logging high-volume application read traffic:
-- Create per-operator roles CREATE ROLE engineer_alice LOGIN PASSWORD '...'; -- Enable auditing only for operator roles ALTER ROLE engineer_alice SET pgaudit.log = 'read,write'; ALTER ROLE engineer_alice SET pgaudit.log_relation = 'on';
The pg_audit log format
A pg_audit log entry consists of the standard PostgreSQL log line prefix followed by the AUDIT: marker and nine comma-separated fields. Here is a real example:
2024-01-15 14:23:47 UTC [18392] app_user@production LOG: AUDIT: SESSION,1,1,READ,SELECT,TABLE,public.users,"SELECT id, email, last_login FROM users WHERE id = 4892",<not logged>
2024-01-15 14:23:47 UTC [18392]Standard PostgreSQL log prefix. PID is the backend process.app_user@productionPostgreSQL role — NOT the human operator. This is the core GDPR problem.SESSIONSESSION (entire session) or OBJECT (specific object grants). See FAQ.1Integer ID for this statement within the session.1Sub-statement ID — increments for CTEs, function calls within the statement.READREAD, WRITE, FUNCTION, ROLE, DDL, MISC, or MISC_SET.SELECTThe SQL command tag: SELECT, INSERT, UPDATE, DELETE, COPY, TRUNCATE, etc.TABLETABLE, INDEX, SEQUENCE, VIEW, MATERIALIZED VIEW, COMPOSITE TYPE, FOREIGN TABLE, FUNCTION, SCHEMA, UNKNOWN.public.usersSchema-qualified object name. Requires pgaudit.log_relation = on."SELECT id, email..."The full SQL statement text.<not logged>Bind parameters if pgaudit.log_parameter = on. Essential for data subject queries.The shared role problem
Most production applications connect via a single shared role — app_user, postgres, or a service account. When an engineer connects to production via a database GUI, psql, or an internal tool, they authenticate as that shared role. Every pg_audit log line shows the same role regardless of who ran the query.
A supervisory authority reviewing a GDPR breach will ask: "Can you show us every query Alice ran on the users table on January 15?" If Alice authenticated as app_user, the answer is: no, because you cannot distinguish Alice's queries from the other 12 engineers who also connected as app_user that day.
Workaround: SET LOCAL session variable
The most common workaround is to inject the human identity as a PostgreSQL session variable at the start of each transaction:
-- Your application wraps every DB operation: BEGIN; SET LOCAL app.current_user = 'alice@company.com'; SELECT id, email, last_login FROM users WHERE id = $1; COMMIT; -- pg_audit logs the query under app_user. -- Your application log must separately record -- that alice@company.com initiated this session.
This approach has two compliance problems. First, the app.current_user value is set by application code — it is not verified by the database. Any code with database access can write any value. Second, the correlation between the session variable and the pg_audit log line lives in two separate log streams. A complete audit trail requires joining them, which introduces both technical complexity and gaps.
Approach comparison
All approaches measured against the three GDPR Article 32 requirements that matter for a supervisory authority inquiry.
| Approach | Per-user attribution | Tamper-proof | GDPR Art. 32 | Superuser deletable | Effort |
|---|---|---|---|---|---|
| PostgreSQL default logs | ❌ Role only | ❌ No | ❌ No | ✅ Yes | None |
| pg_audit (shared app role) | ❌ Role only | ❌ No | ❌ No | ✅ Yes | Low |
| pg_audit + SET LOCAL | ⚠️ App-level | ❌ No | ⚠️ Partial | ✅ Yes | Medium |
| pg_audit + per-user roles | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes | High |
| pg_audit + external SIEM | ✅ Yes | ⚠️ Depends | ⚠️ Partial | ❌ No | Very high |
| Scalple | ✅ Yes | ✅ INSERT-only | ✅ Yes | ❌ No | None |
Two structural gaps pg_audit cannot close
Log files are superuser-deletable
pg_audit writes to the PostgreSQL log file — typically /var/log/postgresql/postgresql.log. A user with server access can truncate, rotate, or delete this file. In cloud-managed PostgreSQL, the cloud provider controls log retention. Neither is structurally tamper-evident. Under GDPR, an audit log that a superuser can modify is not a reliable record for a supervisory authority inquiry. A cloud provider's retention guarantee is contractual, not structural.
Database role ≠ human identity
pg_audit records the PostgreSQL role, not the authenticated human. The database has no concept of a human identity separate from a role. If your team uses shared credentials, per-user roles, or any connection pooler that multiplexes connections, the log cannot reliably attribute a query to a specific person. GDPR requires individual attribution. A log line showing app_user satisfies no supervisory authority.
How Scalple closes both gaps
Scalple operates as a database access layer — engineers authenticate to Scalple with their individual identity (SSO, MFA), not to PostgreSQL directly. The database role is managed by Scalple, never exposed to the engineer. Every operation is logged in an INSERT-only audit table before the query executes. The audit schema revokes DELETE and UPDATE at the database level — no superuser can modify entries. The log shows the verified human identity, the exact query, and a tamper-evident timestamp. When your DPA asks for every query Alice ran on January 15, you export it in under 30 seconds.
-- Scalple audit log (INSERT-only, human-attributed) SELECT audit_time, human_identity, -- alice@company.com, not app_user command, object_schema, object_name, statement, row_count, duration_ms FROM scalple_audit_log WHERE human_identity = 'alice@company.com' AND audit_time::date = '2024-01-15' ORDER BY audit_time;
PostgreSQL audit logging — frequently asked questions
Does PostgreSQL log queries per user by default?
No. PostgreSQL's default logging captures connection events, server errors, and slow queries (if log_min_duration_statement is configured), but does not log individual queries per user by default. Even when log_statement = 'all' is set, it logs all queries but attributes them to the PostgreSQL role — typically a shared application user like app_user or postgres — not to the human operator who initiated the request. For GDPR Article 32, attribution to a role is insufficient: you need attribution to a named, verified individual.
What does a pg_audit log line look like?
A pg_audit log entry follows the format: AUDIT: <AUDIT_TYPE>,<STATEMENT_ID>,<SUBSTATEMENT_ID>,<CLASS>,<COMMAND>,<OBJECT_TYPE>,<OBJECT_NAME>,<STATEMENT>,<PARAMETER>. A real example: '2024-01-15 14:23:47 UTC [18392] app_user@production LOG: AUDIT: SESSION,1,1,READ,SELECT,TABLE,public.users,"SELECT id, email FROM users WHERE id = 4892",<not logged>'. The prefix before AUDIT: is the standard PostgreSQL log line prefix showing timestamp, PID, and — critically — the database role (app_user), not the human operator. The PARAMETER field shows '<not logged>' unless pgaudit.log_parameter = on is set.
Is pg_audit alone sufficient for GDPR Article 32 compliance?
No, for two structural reasons. First, pg_audit logs the PostgreSQL role, not a human identity. If your application connects with a shared role, every audit line shows app_user — making the log legally non-attributable. Second, pg_audit writes to the PostgreSQL log file or syslog. Both are modifiable by a superuser. A supervisory authority will ask whether the log could have been altered after a breach. With file-based logging, it could. GDPR requires tamper-evident storage — structurally impossible to modify, not just access-controlled.
How do I attribute PostgreSQL queries to a human user, not a shared role?
There are three approaches, each with trade-offs. First: per-operator database roles — create one PostgreSQL role per human (alice, bob), issue individual credentials, and configure pgaudit.log per role. This is the most GDPR-correct but breaks most connection poolers and adds high operational overhead. Second: SET LOCAL with a session variable — at the start of each transaction, execute SET LOCAL app.current_user = 'alice@company.com', then read that variable in your log pipeline. The pg_audit log line still shows the application role, but your application log correlates the session. This is application-level only and can be spoofed. Third: a dedicated access layer like Scalple that authenticates the human before creating the database session and logs human identity independently of the database role.
What is the difference between SESSION and OBJECT audit mode in pg_audit?
SESSION audit mode (pgaudit.log = 'read,write,ddl') logs all statements of the specified classes for the entire session, regardless of which object is accessed. It is configured globally or per role. OBJECT audit mode logs statements that affect specific objects — tables, views, sequences — that have been granted to the pgaudit role. OBJECT mode is more granular (you can audit SELECT on the users table without auditing every other read) but requires explicit GRANT statements to the pgaudit role for each object you want to monitor. For GDPR compliance, you typically want SESSION mode with pgaudit.log = 'read,write' plus pgaudit.log_relation = on to capture the specific table names.
Can a database superuser delete pg_audit logs?
Yes. pg_audit writes to the standard PostgreSQL log file (typically /var/log/postgresql/postgresql.log or equivalent). A user with filesystem access to the server — including any superuser who can execute system commands via COPY TO or untrusted PLs — can truncate, rotate, or delete those files. In cloud-managed PostgreSQL (AWS RDS, Google Cloud SQL, Azure Database), the log file is managed by the cloud provider and not directly deletable by the database superuser, but it is accessible to cloud provider employees and can be disabled via configuration. No file-based logging is structurally tamper-evident. An INSERT-only database table — where DELETE and UPDATE are revoked at the database level — provides structural tamper evidence.
pg_audit is a good start. It is not an audit trail.
Scalple provides the human-attributed, INSERT-only audit trail that GDPR Article 32 requires — without per-operator database roles, SET LOCAL hacks, or SIEM integrations. Deploy in your EU infrastructure in under 10 minutes.