Connecting ClickHouse to WarehousePG v26.3

Query WarehousePG data from ClickHouse, and write data back to WarehousePG, without ETL pipelines or data duplication. ClickHouse connects to WarehousePG over the standard Postgres wire protocol through three built-in mechanisms:

  • Table function — ad-hoc, one-off queries with no permanent object in ClickHouse.
  • Table engine — repeated access, with the connection defined once as a named table.
  • Database engine — exposing an entire WarehousePG database, with all tables becoming queryable automatically.
Note

All three mechanisms connect to the WarehousePG coordinator only and don't engage WarehousePG's massively parallel processing (MPP) architecture. Queries run through the coordinator as single-node Postgres queries. SELECT and INSERT are supported across all three mechanisms. UPDATE, DELETE, TRUNCATE, and DDL operations aren't supported. Make those changes directly in WarehousePG.

Configuring WarehousePG to accept connections from ClickHouse

  1. On the coordinator host, confirm $COORDINATOR_DATA_DIRECTORY/postgresql.conf has listen_addresses set to '*' or your network range. If not, update it:

    listen_addresses = '*'
  2. Add an entry to pg_hba.conf that allows connections from your ClickHouse host or hosts. Each node in a ClickHouse cluster connects to the coordinator independently, so the entry must cover all node addresses. A CIDR range works if all nodes are on the same subnet:

    echo "host    <database>    <user>    <clickhouse-host-network>/24    md5" >> $COORDINATOR_DATA_DIRECTORY/pg_hba.conf

    Replace <clickhouse-host-network> with the network address of your ClickHouse host or subnet.

  3. Reload the coordinator configuration:

    gpstop -u
  4. From any ClickHouse node, open a clickhouse-client session and verify the WarehousePG coordinator is reachable:

    -- in ClickHouse
    SELECT nspname
    FROM postgresql('<coordinator-ip>:<port>', '<database>', 'pg_namespace', '<user>', '<password>', 'pg_catalog')
    LIMIT 1;

    A successful connection returns a row. An authentication error confirms the network path is open but credentials or pg_hba.conf need attention.

Using the table function

Use the postgresql table function to query a WarehousePG table inline, without creating any object in ClickHouse. Each call opens a connection, runs the statement, and closes it.

The function takes five positional arguments: host:port, database, table, user, and password. An optional sixth argument specifies the schema name and defaults to public. All arguments are positional, so you can't omit the password even if WarehousePG uses trust authentication. Pass an empty string '' in that case.

  1. Open a clickhouse-client session on any node, then pass the connection details inline with each query:

    -- in ClickHouse
    SELECT *
    FROM postgresql('<coordinator-ip>:<port>', '<database>', '<table>', '<user>', '<password>')
    ORDER BY id;
  2. Use INSERT INTO TABLE FUNCTION to write rows to WarehousePG:

    -- in ClickHouse
    INSERT INTO TABLE FUNCTION
      postgresql('<coordinator-ip>:<port>', '<database>', '<table>', '<user>', '<password>')
      (id, region, amount)
    VALUES (100, 'new-row', 999);

Using the table engine

Use the PostgreSQL table engine to register a WarehousePG table as a named object in ClickHouse and query it by name. Queries go directly to WarehousePG in real time, with no data copied or cached locally.

Creating the linked table

  1. In WarehousePG, create the table to expose to ClickHouse:

    -- in WarehousePG
    CREATE TABLE analytics_data (
        id         integer,
        event_type varchar(50),
        event_time timestamp,
        value      numeric(12, 4)
    ) DISTRIBUTED BY (id);
    
    INSERT INTO analytics_data (id, event_type, event_time, value)
    VALUES
        (1, 'page_view', '2025-01-01 10:00:00', 1.0),
        (2, 'click',     '2025-01-01 10:05:00', 2.5);
  2. From any ClickHouse node, open a clickhouse-client session and create a table mapped to the WarehousePG table using ENGINE = PostgreSQL. On a cluster, use ON CLUSTER <cluster-name> to create the table on all nodes, where <cluster-name> is your ClickHouse cluster name (check with SELECT DISTINCT cluster FROM system.clusters):

    -- in ClickHouse
    CREATE TABLE whpg_analytics ON CLUSTER <cluster-name>
    (
        id         Int32,
        event_type String,
        event_time DateTime,
        value      Decimal(12, 4)
    )
    ENGINE = PostgreSQL(
        '<coordinator-ip>:<port>',
        '<database>',
        'analytics_data',
        '<user>',
        '<password>'
    );

    The ENGINE = PostgreSQL syntax takes five positional arguments: host:port, database, table, user, and password. An optional sixth argument specifies the schema (defaults to public). If WarehousePG is configured with trust authentication, pass an empty string '' for the password.

  3. In clickhouse-client, verify the table was created with the expected column types:

    -- in ClickHouse
    DESCRIBE TABLE whpg_analytics;

    Or view the full CREATE TABLE statement:

    -- in ClickHouse
    SHOW CREATE TABLE whpg_analytics;

    The password appears as [HIDDEN] in the output.

Reading data

Query the ClickHouse table as you would any local table:

-- in ClickHouse
SELECT * FROM whpg_analytics;
Output
┌─id─┬─event_type─┬─────────event_time─┬──────value─┐
│  1 │ page_view  │ 2025-01-01 10:00:001.0000 │
│  2 │ click      │ 2025-01-01 10:05:002.5000 │
└────┴────────────┴────────────────────┴────────────┘

Data inserted into WarehousePG after the ClickHouse table is created is immediately visible on the next SELECT. ClickHouse doesn't cache WarehousePG data locally when using the PostgreSQL engine.

Writing data

Use INSERT on the ClickHouse table to write data to WarehousePG. ClickHouse translates the insert into a COPY ... FROM STDIN statement on the WarehousePG side:

-- in ClickHouse
INSERT INTO whpg_analytics (id, event_type, event_time, value)
VALUES (3, 'purchase', '2025-01-01 10:10:00', 99.99);

Confirm the row is visible by querying the table in WarehousePG:

-- in WarehousePG
SELECT * FROM analytics_data;

Using the database engine

Use the PostgreSQL database engine to expose an entire WarehousePG database in ClickHouse. Every table becomes queryable automatically without individual table definitions.

  1. Create the database:

    -- in ClickHouse
    CREATE DATABASE whpg_db ON CLUSTER <cluster-name>  -- ClickHouse cluster name
      ENGINE = PostgreSQL('<coordinator-ip>:<port>', '<database>', '<user>', '<password>');
  2. List the available tables:

    -- in ClickHouse
    SHOW TABLES FROM whpg_db;
  3. Query any table directly by its WarehousePG name:

    -- in ClickHouse
    SELECT count() FROM whpg_db.analytics_data;

Storing credentials in a named collection

Store connection credentials once in a named collection and reference it by name across queries and CREATE TABLE statements. Named collections work with all three mechanisms.

  1. Enable named collection management for your ClickHouse user by creating /etc/clickhouse-server/users.d/named_collection_control.xml on each server node with the following content:

    <clickhouse>
        <users>
            <default>
                <named_collection_control>1</named_collection_control>
            </default>
        </users>
    </clickhouse>
  2. From clickhouse-client on each server node, reload the configuration (server nodes only, not Keeper nodes):

    -- in ClickHouse
    SYSTEM RELOAD CONFIG;
    SYSTEM RELOAD USERS;
  3. Create the named collection on all server nodes using ON CLUSTER <cluster-name>:

    -- in ClickHouse
    CREATE NAMED COLLECTION whpg ON CLUSTER <cluster-name> AS
      host='<coordinator-ip>', port=<port>, user='<user>',
      password='<password>', database='<database>';
  4. Reference it by name in any of the three mechanisms:

    -- in ClickHouse
    
    -- Table function
    SELECT * FROM postgresql(whpg, table = '<table>');
    
    -- Table engine
    CREATE TABLE <table-name> (id Int32, event_type String, value Decimal(12, 4))
    ENGINE = PostgreSQL(whpg, table = '<table>');
    
    -- Database engine
    CREATE DATABASE whpg_db ON CLUSTER <cluster-name>
    ENGINE = PostgreSQL(whpg);

Understanding the query flow

Use EXPLAIN to see how data moves between the systems:

-- in ClickHouse
EXPLAIN SELECT id, value FROM whpg_analytics WHERE id = 1;
Output
┌─explain──────────────────────────────────────────────────────────────────────────────────────┐
│ Expression ((Project names + (Projection + Change column names to column identifiers)))       │
│   ReadFromPostgreSQL                                                                           │
└────────────────────────────────────────────────────────────────────────────────────────────────┘

The ReadFromPostgreSQL step establishes a TCP connection to the coordinator, sends the SQL query, and converts the row-oriented stream into ClickHouse's columnar format. Simple filters and column projections are pushed down to WarehousePG, so only the matching rows and columns travel over the network. Each query opens a fresh connection and closes it on completion.