The patterns below cover every supported combination of vector search with keyword and relational filters. Complete the relevant setup page before using a pattern that requires native full-text search (FTS):
Vector + relational filter
Filter semantic search results by any column in the source table. The filter is applied at the same query level as the vector ORDER BY, allowing iterative index scans to produce exactly LIMIT results.
SET hnsw.iterative_scan = relaxed_order; SELECT s.id, s.description, kb.value <=> aidb.kb_query_encode('public.my_kb_vector', 'summer fashion')::vector AS distance FROM my_kb_vector kb JOIN my_source_table s ON s.id::TEXT = kb.source_id WHERE s.category_id = 2 -- relational filter on source table ORDER BY kb.value <=> aidb.kb_query_encode('public.my_kb_vector', 'summer fashion')::vector LIMIT 10;
Vector + native PostgreSQL FTS
Use standard PostgreSQL tsvector / ts_rank as the keyword leg. This example assumes a search_vector tsvector column on the source table, populated and indexed as described in Setting up native PostgreSQL FTS.
SET hnsw.iterative_scan = relaxed_order; SELECT s.id, s.title, kb.value <=> aidb.kb_query_encode('public.pipeline_docs', 'connection pooling')::vector AS distance, ts_rank_cd(s.search_vector, plainto_tsquery('english', 'connection pooling')) AS fts_score FROM public.pipeline_docs AS kb JOIN my_docs AS s ON s.id::TEXT = kb.source_id WHERE s.search_vector @@ plainto_tsquery('english', 'connection pooling') -- keyword filter ORDER BY kb.value <=> aidb.kb_query_encode('public.pipeline_docs', 'connection pooling')::vector LIMIT 10;
The WHERE clause acts as a hard keyword filter: only rows that contain the search terms are considered. Results are then ordered by semantic similarity. To surface results that match either signal — not just both — use weighted linear fusion instead.
Weighted linear fusion
Weighted linear fusion combines normalized vector similarity and full-text scores into a single composite score using two explicit weights — one for semantic relevance, one for keyword relevance — that must sum to 1.0. Unlike RRF, which uses only rank positions, weighted linear fusion respects actual score magnitudes, making it useful when you want to control exactly how much each signal contributes to the final ranking.
When to use weighted linear fusion versus RRF
| Weighted linear fusion | RRF | |
|---|---|---|
| Uses actual score values | Yes | No (rank positions only) |
| Requires score normalization | Yes | No |
| Tunable per use case | Yes (explicit weights) | Partially (smoothing constant k) |
| Robust to score scale differences | No (normalize carefully) | Yes (rank-based, scale-independent) |
| Best for | Workloads where score magnitude carries information | General-purpose merging of heterogeneous result sets |
Adjust the weights based on your workload:
| Use case | Suggested weights |
|---|---|
| Concept/meaning is most important | 0.7 semantic, 0.3 keyword |
| Exact terminology matters as much as meaning | 0.5 semantic, 0.5 keyword |
| Keyword precision is critical (for example, code or rule search) | 0.3 semantic, 0.7 keyword |
With native PostgreSQL FTS:
Because vector distances and ts_rank values use different scales, both legs are normalized to [0, 1] before combining. Cosine distance (range [0, 2]) is converted to similarity using (2.0 - distance) / 2.0. ts_rank_cd already returns values in roughly [0, 1].
WITH semantic AS ( SELECT source_id, (2.0 - (value <=> aidb.kb_query_encode('public.pipeline_docs', 'connection pooling')::vector)) / 2.0 AS vec_score FROM public.pipeline_docs ORDER BY value <=> aidb.kb_query_encode('public.pipeline_docs', 'connection pooling')::vector LIMIT 50 ), keyword AS ( SELECT id::TEXT AS source_id, ts_rank_cd(search_vector, plainto_tsquery('english', 'connection pooling')) AS fts_score FROM my_docs WHERE search_vector @@ plainto_tsquery('english', 'connection pooling') ORDER BY fts_score DESC LIMIT 50 ) SELECT COALESCE(s.source_id, k.source_id) AS id, 0.6 * COALESCE(s.vec_score, 0) + 0.4 * COALESCE(k.fts_score, 0) AS fusion_score FROM semantic s FULL OUTER JOIN keyword k ON s.source_id = k.source_id ORDER BY fusion_score DESC LIMIT 10;
Worked example: relational predicate filtering
This example shows how to combine semantic vector search with a relational WHERE filter to narrow results to a specific category. A pipeline embeds product names into a knowledge base, and queries then retrieve semantically similar results filtered by category.
Setting up the source table and pipeline
- Create the source table:
CREATE TABLE products ( id SERIAL PRIMARY KEY, product_name TEXT NOT NULL, category TEXT NOT NULL );
- Populate it with sample data:
INSERT INTO products (product_name, category) VALUES ('Hamburger', 'meat'), ('Cheeseburger', 'meat'), ('Fish n Chips', 'fish'), ('Fries', 'vegetarian'), ('Burrito', 'meat'), ('Pizza', 'meat'), ('Sandwich', 'meat'), ('Veggie Burger', 'vegetarian'), ('Kebab', 'meat');
- Create a pipeline to embed
product_nameinto a knowledge base:
SELECT aidb.create_pipeline( name => 'food_products', source => 'products', source_key_column => 'id', source_data_column => 'product_name', step_1 => 'KnowledgeBase', step_1_options => aidb.knowledge_base_config( model => 'bert', data_format => 'Text' ) );
- Run the pipeline to generate embeddings:
SELECT aidb.run_pipeline('food_products');
Standard semantic query (no filter)
This baseline query returns the top semantic matches with no constraints, so you can compare the result set against the filtered query that follows.
SELECT * FROM aidb.retrieve_text('public.pipeline_food_products', 'fast food', topk => 4);
key | value | distance | part_ids | pipeline_name | intermediate_steps
-----+---------------+--------------------+----------+---------------+--------------------
8 | Veggie Burger | 0.9955745765192575 | {0} | food_products | []
6 | Pizza | 1.0280680840689795 | {0} | food_products | []
4 | Fries | 1.0630385212203115 | {0} | food_products | []
7 | Sandwich | 1.0903594734917037 | {0} | food_products | []Custom query with relational filter
This query adds a WHERE clause on the source table to restrict results to a single category, while keeping the vector ORDER BY at the same query level so iterative index scans can satisfy the LIMIT efficiently.
SET hnsw.iterative_scan = relaxed_order; SELECT vector.source_id, source.product_name, source.category FROM public.pipeline_food_products AS vector JOIN products AS source ON source.id::TEXT = vector.source_id WHERE source.category = 'vegetarian' -- relational filter ORDER BY vector.value <=> aidb.kb_query_encode('public.pipeline_food_products', 'fast food')::vector LIMIT 10;
source_id | product_name | category -----------+---------------+------------ 8 | Veggie Burger | vegetarian 4 | Fries | vegetarian
Only vegetarian items are returned, ordered by semantic relevance to "fast food".