Writing Hybrid search queries v7

The patterns below cover every supported combination of vector search with keyword and relational filters. Complete the relevant setup page before using a pattern that requires native full-text search (FTS):

Vector + relational filter

Filter semantic search results by any column in the source table. The filter is applied at the same query level as the vector ORDER BY, allowing iterative index scans to produce exactly LIMIT results.

SET hnsw.iterative_scan = relaxed_order;

SELECT s.id, s.description,
       kb.value <=> aidb.kb_query_encode('public.my_kb_vector', 'summer fashion')::vector AS distance
FROM my_kb_vector kb
JOIN my_source_table s ON s.id::TEXT = kb.source_id
WHERE s.category_id = 2                          -- relational filter on source table
ORDER BY kb.value <=> aidb.kb_query_encode('public.my_kb_vector', 'summer fashion')::vector
LIMIT 10;

Vector + native PostgreSQL FTS

Use standard PostgreSQL tsvector / ts_rank as the keyword leg. This example assumes a search_vector tsvector column on the source table, populated and indexed as described in Setting up native PostgreSQL FTS.

SET hnsw.iterative_scan = relaxed_order;

SELECT s.id, s.title,
       kb.value <=> aidb.kb_query_encode('public.pipeline_docs', 'connection pooling')::vector AS distance,
       ts_rank_cd(s.search_vector, plainto_tsquery('english', 'connection pooling'))           AS fts_score
FROM public.pipeline_docs AS kb
JOIN my_docs AS s ON s.id::TEXT = kb.source_id
WHERE s.search_vector @@ plainto_tsquery('english', 'connection pooling')  -- keyword filter
ORDER BY kb.value <=> aidb.kb_query_encode('public.pipeline_docs', 'connection pooling')::vector
LIMIT 10;

The WHERE clause acts as a hard keyword filter: only rows that contain the search terms are considered. Results are then ordered by semantic similarity. To surface results that match either signal — not just both — use weighted linear fusion instead.

Weighted linear fusion

Weighted linear fusion combines normalized vector similarity and full-text scores into a single composite score using two explicit weights — one for semantic relevance, one for keyword relevance — that must sum to 1.0. Unlike RRF, which uses only rank positions, weighted linear fusion respects actual score magnitudes, making it useful when you want to control exactly how much each signal contributes to the final ranking.

When to use weighted linear fusion versus RRF

Weighted linear fusionRRF
Uses actual score valuesYesNo (rank positions only)
Requires score normalizationYesNo
Tunable per use caseYes (explicit weights)Partially (smoothing constant k)
Robust to score scale differencesNo (normalize carefully)Yes (rank-based, scale-independent)
Best forWorkloads where score magnitude carries informationGeneral-purpose merging of heterogeneous result sets

Adjust the weights based on your workload:

Use caseSuggested weights
Concept/meaning is most important0.7 semantic, 0.3 keyword
Exact terminology matters as much as meaning0.5 semantic, 0.5 keyword
Keyword precision is critical (for example, code or rule search)0.3 semantic, 0.7 keyword

With native PostgreSQL FTS:

Because vector distances and ts_rank values use different scales, both legs are normalized to [0, 1] before combining. Cosine distance (range [0, 2]) is converted to similarity using (2.0 - distance) / 2.0. ts_rank_cd already returns values in roughly [0, 1].

WITH semantic AS (
    SELECT source_id,
           (2.0 - (value <=> aidb.kb_query_encode('public.pipeline_docs', 'connection pooling')::vector)) / 2.0
               AS vec_score
    FROM public.pipeline_docs
    ORDER BY value <=> aidb.kb_query_encode('public.pipeline_docs', 'connection pooling')::vector
    LIMIT 50
),
keyword AS (
    SELECT id::TEXT AS source_id,
           ts_rank_cd(search_vector, plainto_tsquery('english', 'connection pooling')) AS fts_score
    FROM my_docs
    WHERE search_vector @@ plainto_tsquery('english', 'connection pooling')
    ORDER BY fts_score DESC
    LIMIT 50
)
SELECT
    COALESCE(s.source_id, k.source_id)                             AS id,
    0.6 * COALESCE(s.vec_score, 0) + 0.4 * COALESCE(k.fts_score, 0) AS fusion_score
FROM semantic s
FULL OUTER JOIN keyword k ON s.source_id = k.source_id
ORDER BY fusion_score DESC
LIMIT 10;

Worked example: relational predicate filtering

This example shows how to combine semantic vector search with a relational WHERE filter to narrow results to a specific category. A pipeline embeds product names into a knowledge base, and queries then retrieve semantically similar results filtered by category.

Setting up the source table and pipeline

  1. Create the source table:
CREATE TABLE products (
    id           SERIAL PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
  1. Populate it with sample data:
INSERT INTO products (product_name, category) VALUES
    ('Hamburger',    'meat'),
    ('Cheeseburger', 'meat'),
    ('Fish n Chips',  'fish'),
    ('Fries',         'vegetarian'),
    ('Burrito',       'meat'),
    ('Pizza',         'meat'),
    ('Sandwich',      'meat'),
    ('Veggie Burger', 'vegetarian'),
    ('Kebab',         'meat');
  1. Create a pipeline to embed product_name into a knowledge base:
SELECT aidb.create_pipeline(
    name               => 'food_products',
    source             => 'products',
    source_key_column  => 'id',
    source_data_column => 'product_name',
    step_1             => 'KnowledgeBase',
    step_1_options     => aidb.knowledge_base_config(
                              model       => 'bert',
                              data_format => 'Text'
                          )
);
  1. Run the pipeline to generate embeddings:
SELECT aidb.run_pipeline('food_products');

Standard semantic query (no filter)

This baseline query returns the top semantic matches with no constraints, so you can compare the result set against the filtered query that follows.

SELECT * FROM aidb.retrieve_text('public.pipeline_food_products', 'fast food', topk => 4);
Output
 key |     value     |      distance      | part_ids | pipeline_name | intermediate_steps
-----+---------------+--------------------+----------+---------------+--------------------
 8   | Veggie Burger | 0.9955745765192575 | {0}      | food_products | []
 6   | Pizza         | 1.0280680840689795 | {0}      | food_products | []
 4   | Fries         | 1.0630385212203115 | {0}      | food_products | []
 7   | Sandwich      | 1.0903594734917037 | {0}      | food_products | []

Custom query with relational filter

This query adds a WHERE clause on the source table to restrict results to a single category, while keeping the vector ORDER BY at the same query level so iterative index scans can satisfy the LIMIT efficiently.

SET hnsw.iterative_scan = relaxed_order;

SELECT vector.source_id, source.product_name, source.category
FROM public.pipeline_food_products AS vector
JOIN products AS source ON source.id::TEXT = vector.source_id
WHERE source.category = 'vegetarian'                                                         -- relational filter
ORDER BY vector.value <=> aidb.kb_query_encode('public.pipeline_food_products', 'fast food')::vector
LIMIT 10;
Output
 source_id | product_name  |  category
-----------+---------------+------------
 8         | Veggie Burger | vegetarian
 4         | Fries         | vegetarian

Only vegetarian items are returned, ordered by semantic relevance to "fast food".