Query WarehousePG data from ClickHouse, and write data back to WarehousePG, without ETL pipelines or data duplication. ClickHouse connects to WarehousePG over the standard Postgres wire protocol through three built-in mechanisms:
- Table function — ad-hoc, one-off queries with no permanent object in ClickHouse.
- Table engine — repeated access, with the connection defined once as a named table.
- Database engine — exposing an entire WarehousePG database, with all tables becoming queryable automatically.
Note
All three mechanisms connect to the WarehousePG coordinator only and don't engage WarehousePG's massively parallel processing (MPP) architecture. Queries run through the coordinator as single-node Postgres queries. SELECT and INSERT are supported across all three mechanisms. UPDATE, DELETE, TRUNCATE, and DDL operations aren't supported. Make those changes directly in WarehousePG.
Configuring WarehousePG to accept connections from ClickHouse
On the coordinator host, confirm
$COORDINATOR_DATA_DIRECTORY/postgresql.confhaslisten_addressesset to'*'or your network range. If not, update it:listen_addresses = '*'
Add an entry to
pg_hba.confthat allows connections from your ClickHouse host or hosts. Each node in a ClickHouse cluster connects to the coordinator independently, so the entry must cover all node addresses. A CIDR range works if all nodes are on the same subnet:echo "host <database> <user> <clickhouse-host-network>/24 md5" >> $COORDINATOR_DATA_DIRECTORY/pg_hba.conf
Replace
<clickhouse-host-network>with the network address of your ClickHouse host or subnet.Reload the coordinator configuration:
gpstop -uFrom any ClickHouse node, open a
clickhouse-clientsession and verify the WarehousePG coordinator is reachable:-- in ClickHouse SELECT nspname FROM postgresql('<coordinator-ip>:<port>', '<database>', 'pg_namespace', '<user>', '<password>', 'pg_catalog') LIMIT 1;
A successful connection returns a row. An authentication error confirms the network path is open but credentials or
pg_hba.confneed attention.
Using the table function
Use the postgresql table function to query a WarehousePG table inline, without creating any object in ClickHouse. Each call opens a connection, runs the statement, and closes it.
The function takes five positional arguments: host:port, database, table, user, and password. An optional sixth argument specifies the schema name and defaults to public. All arguments are positional, so you can't omit the password even if WarehousePG uses trust authentication. Pass an empty string '' in that case.
Open a
clickhouse-clientsession on any node, then pass the connection details inline with each query:-- in ClickHouse SELECT * FROM postgresql('<coordinator-ip>:<port>', '<database>', '<table>', '<user>', '<password>') ORDER BY id;
Use
INSERT INTO TABLE FUNCTIONto write rows to WarehousePG:-- in ClickHouse INSERT INTO TABLE FUNCTION postgresql('<coordinator-ip>:<port>', '<database>', '<table>', '<user>', '<password>') (id, region, amount) VALUES (100, 'new-row', 999);
Using the table engine
Use the PostgreSQL table engine to register a WarehousePG table as a named object in ClickHouse and query it by name. Queries go directly to WarehousePG in real time, with no data copied or cached locally.
Creating the linked table
In WarehousePG, create the table to expose to ClickHouse:
-- in WarehousePG CREATE TABLE analytics_data ( id integer, event_type varchar(50), event_time timestamp, value numeric(12, 4) ) DISTRIBUTED BY (id); INSERT INTO analytics_data (id, event_type, event_time, value) VALUES (1, 'page_view', '2025-01-01 10:00:00', 1.0), (2, 'click', '2025-01-01 10:05:00', 2.5);
From any ClickHouse node, open a
clickhouse-clientsession and create a table mapped to the WarehousePG table usingENGINE = PostgreSQL. On a cluster, useON CLUSTER <cluster-name>to create the table on all nodes, where<cluster-name>is your ClickHouse cluster name (check withSELECT DISTINCT cluster FROM system.clusters):-- in ClickHouse CREATE TABLE whpg_analytics ON CLUSTER <cluster-name> ( id Int32, event_type String, event_time DateTime, value Decimal(12, 4) ) ENGINE = PostgreSQL( '<coordinator-ip>:<port>', '<database>', 'analytics_data', '<user>', '<password>' );
The
ENGINE = PostgreSQLsyntax takes five positional arguments:host:port, database, table, user, and password. An optional sixth argument specifies the schema (defaults topublic). If WarehousePG is configured withtrustauthentication, pass an empty string''for the password.In
clickhouse-client, verify the table was created with the expected column types:-- in ClickHouse DESCRIBE TABLE whpg_analytics;
Or view the full
CREATE TABLEstatement:-- in ClickHouse SHOW CREATE TABLE whpg_analytics;
The password appears as
[HIDDEN]in the output.
Reading data
Query the ClickHouse table as you would any local table:
-- in ClickHouse SELECT * FROM whpg_analytics;
┌─id─┬─event_type─┬─────────event_time─┬──────value─┐ │ 1 │ page_view │ 2025-01-01 10:00:00│ 1.0000 │ │ 2 │ click │ 2025-01-01 10:05:00│ 2.5000 │ └────┴────────────┴────────────────────┴────────────┘
Data inserted into WarehousePG after the ClickHouse table is created is immediately visible on the next SELECT. ClickHouse doesn't cache WarehousePG data locally when using the PostgreSQL engine.
Writing data
Use INSERT on the ClickHouse table to write data to WarehousePG. ClickHouse translates the insert into a COPY ... FROM STDIN statement on the WarehousePG side:
-- in ClickHouse INSERT INTO whpg_analytics (id, event_type, event_time, value) VALUES (3, 'purchase', '2025-01-01 10:10:00', 99.99);
Confirm the row is visible by querying the table in WarehousePG:
-- in WarehousePG SELECT * FROM analytics_data;
Using the database engine
Use the PostgreSQL database engine to expose an entire WarehousePG database in ClickHouse. Every table becomes queryable automatically without individual table definitions.
Create the database:
-- in ClickHouse CREATE DATABASE whpg_db ON CLUSTER <cluster-name> -- ClickHouse cluster name ENGINE = PostgreSQL('<coordinator-ip>:<port>', '<database>', '<user>', '<password>');
List the available tables:
-- in ClickHouse SHOW TABLES FROM whpg_db;
Query any table directly by its WarehousePG name:
-- in ClickHouse SELECT count() FROM whpg_db.analytics_data;
Storing credentials in a named collection
Store connection credentials once in a named collection and reference it by name across queries and CREATE TABLE statements. Named collections work with all three mechanisms.
Enable named collection management for your ClickHouse user by creating
/etc/clickhouse-server/users.d/named_collection_control.xmlon each server node with the following content:<clickhouse> <users> <default> <named_collection_control>1</named_collection_control> </default> </users> </clickhouse>
From
clickhouse-clienton each server node, reload the configuration (server nodes only, not Keeper nodes):-- in ClickHouse SYSTEM RELOAD CONFIG; SYSTEM RELOAD USERS;
Create the named collection on all server nodes using
ON CLUSTER <cluster-name>:-- in ClickHouse CREATE NAMED COLLECTION whpg ON CLUSTER <cluster-name> AS host='<coordinator-ip>', port=<port>, user='<user>', password='<password>', database='<database>';
Reference it by name in any of the three mechanisms:
-- in ClickHouse -- Table function SELECT * FROM postgresql(whpg, table = '<table>'); -- Table engine CREATE TABLE <table-name> (id Int32, event_type String, value Decimal(12, 4)) ENGINE = PostgreSQL(whpg, table = '<table>'); -- Database engine CREATE DATABASE whpg_db ON CLUSTER <cluster-name> ENGINE = PostgreSQL(whpg);
Understanding the query flow
Use EXPLAIN to see how data moves between the systems:
-- in ClickHouse EXPLAIN SELECT id, value FROM whpg_analytics WHERE id = 1;
┌─explain──────────────────────────────────────────────────────────────────────────────────────┐ │ Expression ((Project names + (Projection + Change column names to column identifiers))) │ │ ReadFromPostgreSQL │ └───────────── ───────────────────────────────────────────────────────────────────────────────────┘
The ReadFromPostgreSQL step establishes a TCP connection to the coordinator, sends the SQL query, and converts the row-oriented stream into ClickHouse's columnar format. Simple filters and column projections are pushed down to WarehousePG, so only the matching rows and columns travel over the network. Each query opens a fresh connection and closes it on completion.