EDB Docs - EDB Postgres AI for ClickHouse v26.3 - Connecting ClickHouse to WarehousePG

Query WarehousePG data from ClickHouse, and write data back to WarehousePG, without ETL pipelines or data duplication. ClickHouse connects to WarehousePG over the standard Postgres wire protocol through three built-in mechanisms:

Table function — ad-hoc, one-off queries with no permanent object in ClickHouse.
Table engine — repeated access, with the connection defined once as a named table.
Database engine — exposing an entire WarehousePG database, with all tables becoming queryable automatically.

Note

All three mechanisms connect to the WarehousePG coordinator only and don't engage WarehousePG's massively parallel processing (MPP) architecture. Queries run through the coordinator as single-node Postgres queries. SELECT and INSERT are supported across all three mechanisms. UPDATE, DELETE, TRUNCATE, and DDL operations aren't supported. Make those changes directly in WarehousePG.

Configuring WarehousePG to accept connections from ClickHouse

On the coordinator host, confirm $COORDINATOR_DATA_DIRECTORY/postgresql.conf has listen_addresses set to '*' or your network range. If not, update it:
```
listen_addresses = '*'
```
Add an entry to pg_hba.conf that allows connections from your ClickHouse host or hosts. Each node in a ClickHouse cluster connects to the coordinator independently, so the entry must cover all node addresses. A CIDR range works if all nodes are on the same subnet:
```
echo "host    <database>    <user>    <clickhouse-host-network>/24    md5" >> $COORDINATOR_DATA_DIRECTORY/pg_hba.conf
```
Replace <clickhouse-host-network> with the network address of your ClickHouse host or subnet.
Reload the coordinator configuration:
```
gpstop -u
```
From any ClickHouse node, open a clickhouse-client session and verify the WarehousePG coordinator is reachable:
```
-- in ClickHouse
SELECT nspname
FROM postgresql('<coordinator-ip>:<port>', '<database>', 'pg_namespace', '<user>', '<password>', 'pg_catalog')
LIMIT 1;
```
A successful connection returns a row. An authentication error confirms the network path is open but credentials or pg_hba.conf need attention.

Using the table function

Use the postgresql table function to query a WarehousePG table inline, without creating any object in ClickHouse. Each call opens a connection, runs the statement, and closes it.

The function takes five positional arguments: host:port, database, table, user, and password. An optional sixth argument specifies the schema name and defaults to public. All arguments are positional, so you can't omit the password even if WarehousePG uses trust authentication. Pass an empty string '' in that case.

Open a clickhouse-client session on any node, then pass the connection details inline with each query:

-- in ClickHouse
SELECT *
FROM postgresql('<coordinator-ip>:<port>', '<database>', '<table>', '<user>', '<password>')
ORDER BY id;

Use INSERT INTO TABLE FUNCTION to write rows to WarehousePG:

-- in ClickHouse
INSERT INTO TABLE FUNCTION
  postgresql('<coordinator-ip>:<port>', '<database>', '<table>', '<user>', '<password>')
  (id, region, amount)
VALUES (100, 'new-row', 999);

Using the table engine

Use the PostgreSQL table engine to register a WarehousePG table as a named object in ClickHouse and query it by name. Queries go directly to WarehousePG in real time, with no data copied or cached locally.

Creating the linked table

In WarehousePG, create the table to expose to ClickHouse:

-- in WarehousePG
CREATE TABLE analytics_data (
    id         integer,
    event_type varchar(50),
    event_time timestamp,
    value      numeric(12, 4)
) DISTRIBUTED BY (id);

INSERT INTO analytics_data (id, event_type, event_time, value)
VALUES
    (1, 'page_view', '2025-01-01 10:00:00', 1.0),
    (2, 'click',     '2025-01-01 10:05:00', 2.5);

From any ClickHouse node, open a clickhouse-client session and create a table mapped to the WarehousePG table using ENGINE = PostgreSQL. On a cluster, use ON CLUSTER <cluster-name> to create the table on all nodes, where <cluster-name> is your ClickHouse cluster name (check with SELECT DISTINCT cluster FROM system.clusters):
```
-- in ClickHouse
CREATE TABLE whpg_analytics ON CLUSTER <cluster-name>
(
    id         Int32,
    event_type String,
    event_time DateTime,
    value      Decimal(12, 4)
)
ENGINE = PostgreSQL(
    '<coordinator-ip>:<port>',
    '<database>',
    'analytics_data',
    '<user>',
    '<password>'
);
```
The ENGINE = PostgreSQL syntax takes five positional arguments: host:port, database, table, user, and password. An optional sixth argument specifies the schema (defaults to public). If WarehousePG is configured with trust authentication, pass an empty string '' for the password.
In clickhouse-client, verify the table was created with the expected column types:
```
-- in ClickHouse
DESCRIBE TABLE whpg_analytics;
```
Or view the full CREATE TABLE statement:
```
-- in ClickHouse
SHOW CREATE TABLE whpg_analytics;
```
The password appears as [HIDDEN] in the output.

Reading data

Query the ClickHouse table as you would any local table:

-- in ClickHouse
SELECT * FROM whpg_analytics;

Output

┌─id─┬─event_type─┬─────────event_time─┬──────value─┐
│  1 │ page_view  │ 2025-01-01 10:00:00│     1.0000 │
│  2 │ click      │ 2025-01-01 10:05:00│     2.5000 │
└────┴────────────┴────────────────────┴────────────┘

Data inserted into WarehousePG after the ClickHouse table is created is immediately visible on the next SELECT. ClickHouse doesn't cache WarehousePG data locally when using the PostgreSQL engine.

Writing data

Use INSERT on the ClickHouse table to write data to WarehousePG. ClickHouse translates the insert into a COPY ... FROM STDIN statement on the WarehousePG side:

-- in ClickHouse
INSERT INTO whpg_analytics (id, event_type, event_time, value)
VALUES (3, 'purchase', '2025-01-01 10:10:00', 99.99);

Confirm the row is visible by querying the table in WarehousePG:

-- in WarehousePG
SELECT * FROM analytics_data;

Using the database engine

Use the PostgreSQL database engine to expose an entire WarehousePG database in ClickHouse. Every table becomes queryable automatically without individual table definitions.

Create the database:

-- in ClickHouse
CREATE DATABASE whpg_db ON CLUSTER <cluster-name>  -- ClickHouse cluster name
  ENGINE = PostgreSQL('<coordinator-ip>:<port>', '<database>', '<user>', '<password>');

List the available tables:

-- in ClickHouse
SHOW TABLES FROM whpg_db;

Query any table directly by its WarehousePG name:

-- in ClickHouse
SELECT count() FROM whpg_db.analytics_data;

Storing credentials in a named collection

Store connection credentials once in a named collection and reference it by name across queries and CREATE TABLE statements. Named collections work with all three mechanisms.

Enable named collection management for your ClickHouse user by creating /etc/clickhouse-server/users.d/named_collection_control.xml on each server node with the following content:

<clickhouse>
    <users>
        <default>
            <named_collection_control>1</named_collection_control>
        </default>
    </users>
</clickhouse>

From clickhouse-client on each server node, reload the configuration (server nodes only, not Keeper nodes):
```
-- in ClickHouse
SYSTEM RELOAD CONFIG;
SYSTEM RELOAD USERS;
```

Create the named collection on all server nodes using ON CLUSTER <cluster-name>:

-- in ClickHouse
CREATE NAMED COLLECTION whpg ON CLUSTER <cluster-name> AS
  host='<coordinator-ip>', port=<port>, user='<user>',
  password='<password>', database='<database>';

Reference it by name in any of the three mechanisms:

-- in ClickHouse

-- Table function
SELECT * FROM postgresql(whpg, table = '<table>');

-- Table engine
CREATE TABLE <table-name> (id Int32, event_type String, value Decimal(12, 4))
ENGINE = PostgreSQL(whpg, table = '<table>');

-- Database engine
CREATE DATABASE whpg_db ON CLUSTER <cluster-name>
ENGINE = PostgreSQL(whpg);

Understanding the query flow

Use EXPLAIN to see how data moves between the systems:

-- in ClickHouse
EXPLAIN SELECT id, value FROM whpg_analytics WHERE id = 1;

Output

┌─explain──────────────────────────────────────────────────────────────────────────────────────┐
│ Expression ((Project names + (Projection + Change column names to column identifiers)))       │
│   ReadFromPostgreSQL                                                                           │
└────────────────────────────────────────────────────────────────────────────────────────────────┘

The ReadFromPostgreSQL step establishes a TCP connection to the coordinator, sends the SQL query, and converts the row-oriented stream into ClickHouse's columnar format. Simple filters and column projections are pushed down to WarehousePG, so only the matching rows and columns travel over the network. Each query opens a fresh connection and closes it on completion.

Connecting ClickHouse to WarehousePG v26.3