Semantic Search
The semantic search API searches one or more collections and returns the most relevant indexed documents for the supplied search terms.
Search is performed in stages:
- Each query term is embedded with the internal DocumentQuery task.
- Indexed documents were previously embedded with the internal DocumentRetrieval task.
- The search prefilters candidates with compact embedding hashes.
- Candidate documents are scored with embedding similarity.
- The configured reranker can adjust the final order.
- Results are filtered by minScore, sorted by score, and limited by top.
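The last stage above (filter, sort, limit) can be sketched in a few lines. This is only an illustration of the described behavior, not the service's implementation: the `Scored` type and `finalize` function are hypothetical names, and treating the `minScore` threshold as inclusive is an assumption.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    """Hypothetical candidate: a document ID with its final relevance score."""
    document_id: str
    score: float


def finalize(candidates: list[Scored], min_score: float = 0.2, top: int = 5) -> list[Scored]:
    """Filter by min_score, sort by score descending, and keep at most `top` results."""
    # Assumption: a score equal to min_score is kept (inclusive threshold).
    kept = [c for c in candidates if c.score >= min_score]
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:top]


if __name__ == "__main__":
    sample = [Scored("a", 0.91), Scored("b", 0.12), Scored("c", 0.47)]
    for item in finalize(sample, min_score=0.2, top=2):
        print(item.document_id, item.score)
```

The defaults mirror the documented defaults for the search API (minScore 0.2, top 5).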
After creating a collection, use its collection ID in the collections array when searching.
Warning
Semantic search incurs cost. Query embedding cost is based on the search term tokens. The smart reranker also incurs reranking cost based on the query and candidate document tokens.
Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| collections | string[] | Required | Collection IDs to search. Each collection must belong to the authenticated account. |
| term | string | Required if terms is absent | One search term. |
| terms | string[] | Required if term is absent | One or more search terms. |
| top | number | 5 | Maximum number of documents returned. Current validation allows 1 to 128. |
| minScore | number | 0.2 | Minimum final relevance score. Current validation allows values from 0.01 to 0.99. |
| reranker | string | lexical | lexical, smart, or none. |
| includeReferences | boolean | false | Includes related documents with the same reference ID when a matched document has a reference. |
The response includes the matched document ID, collection ID, document name, document content, metadata, score, and referenced documents when reference expansion is enabled.
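As a rough sketch, a client could assemble and pre-validate the request body before sending it. The parameter names and ranges come from the table above; the `build_search_body` helper is hypothetical, and the search endpoint URL is not stated in this section, so the example stops at building the JSON body.

```python
import json

RERANKERS = {"lexical", "smart", "none"}


def build_search_body(collections, terms, top=5, min_score=0.2,
                      reranker="lexical", include_references=False):
    """Build a search request body, mirroring the documented validation ranges."""
    if not collections:
        raise ValueError("collections is required")
    if not terms:
        raise ValueError("at least one search term is required")
    if not 1 <= top <= 128:
        raise ValueError("top must be between 1 and 128")
    if not 0.01 <= min_score <= 0.99:
        raise ValueError("minScore must be between 0.01 and 0.99")
    if reranker not in RERANKERS:
        raise ValueError("reranker must be lexical, smart, or none")

    body = {
        "collections": list(collections),
        "top": top,
        "minScore": min_score,
        "reranker": reranker,
        "includeReferences": include_references,
    }
    # Send a single term as `term`, several as `terms`.
    if len(terms) == 1:
        body["term"] = terms[0]
    else:
        body["terms"] = list(terms)
    return body


if __name__ == "__main__":
    body = build_search_body(
        ["019b80d5-cee2-7010-ab22-f676271af866"],
        ["como cancelar assinatura anual sem multa"],
        top=3,
    )
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Validating on the client is optional; it only surfaces range errors before a billable request is made.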
Reranking
AIVAX can apply reranking after vector candidates are found.
Available rerankers:
| Reranker | Cost | Behavior |
|---|---|---|
| none | No reranking cost | Uses vector similarity only. |
| lexical | No reranking cost | Applies a local boost based on lexical matches, fuzzy token matches, and term proximity in the document name and content. |
| smart | Reranking cost applies | Uses Cloudflare Workers AI (@cf/baai/bge-reranker-base) to rescore candidate documents against the full query. |
The default reranker is lexical. To disable reranking, send "reranker": "none". To use the model-based reranker, send "reranker": "smart".
Note
Reranking does not search additional documents. It only reorders candidates already found by the vector search stage.
Multiple Terms
Multiple terms work as a ranked union by best match, not as a mandatory intersection.
Each document is compared against all supplied terms. The document score uses the best match among those terms. A document can rank well by matching one term strongly, even if it does not match the other terms.
Example:
Searching for:
cancelamento
multa
reembolso
in the suporte and contratos collections means:
best support or contract documents that match cancelamento or multa or reembolso
It does not mean:
documents that match cancelamento and multa and reembolso at the same time
It also does not mean:
documents that exist in both support and contracts
If the user intent is one composite idea, send that idea as one term:
cancelamento de assinatura anual sem multa
Use multiple terms when you want to cover synonyms, alternative phrasings, or several acceptable retrieval paths.
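The "ranked union by best match" rule can be illustrated with a small sketch. The function name and the per-term scores below are made up for illustration; only the combination rule (a document's score is its best score across all terms) comes from the description above.

```python
def best_match_ranking(term_scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Combine per-term scores into one ranking using each document's best match."""
    best: dict[str, float] = {}
    for per_document in term_scores.values():
        for document_id, score in per_document.items():
            # Keep the highest score seen for this document across all terms.
            if score > best.get(document_id, float("-inf")):
                best[document_id] = score
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative scores only: doc-refund matches a single term, but strongly.
    scores = {
        "cancelamento": {"doc-cancel": 0.62, "doc-mixed": 0.41},
        "multa": {"doc-mixed": 0.44},
        "reembolso": {"doc-refund": 0.80},
    }
    for document_id, score in best_match_ranking(scores):
        print(document_id, score)
```

In this sample, doc-refund ranks first even though it matches only one term, while doc-mixed matches two terms weakly and ranks last.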
Search Quality
A complete query usually performs better than a list of disconnected keywords because it preserves the relationship between concepts.
Prefer:
como cancelar assinatura anual sem multa
Over:
cancelamento
assinatura
multa
Tune top and minScore together:
- Lower minScore values return more candidates and more noise.
- Higher minScore values reduce noise but may return few or no results.
- Higher top values are useful when the answer must compare several policies, procedures, or source excerpts.
- Lower top values are better for direct FAQ-style answers.
If search returns poor results:
- Confirm that the documents are indexed.
- Query the collection directly before testing through an AI Gateway.
- Compare short queries, complete questions, and alternative phrasings.
- Check whether the relevant document is too short, too long, or not self-contained.
- Check whether the query language matches the document language.
- If the gateway rewrites questions before searching, test with the plain query path to isolate rewriting issues.
MCP
You can expose RAG collections as MCP (Model Context Protocol) tools. This lets compatible MCP clients search a collection directly.
Endpoint:
https://inference.aivax.net/v1/mcp/collections
Headers:
| Header | Description | Default |
|---|---|---|
| Authorization | Bearer token of your API key. | Required |
| X-Mcp-Collection-Id | One or more collection IDs. Use commas for multiple collections. | Required |
| X-Mcp-Collection-Name | Collection name used to generate tool names. | collection |
| X-Mcp-Reranker | lexical, smart, or none. | lexical |
| X-Mcp-Top-K | Maximum number of results to return. | 5 |
| X-Mcp-Min-Score | Minimum relevance score greater than 0 and up to 1.0. | 0.4 |
| X-Mcp-Use-References | Current server behavior enables references when this header value is none; omit the header to disable references. | disabled |
| X-Mcp-Allow-Write | Use yes to expose document write and delete tools. | disabled |
| X-Mcp-Naming-Convention | default or agent. | default |
Configuration Example
Visual Studio Code:
{
  "servers": {
    "my-rag-collection-mcp": {
      "type": "http",
      "url": "https://inference.aivax.net/v1/mcp/collections",
      "headers": {
        "Authorization": "Bearer {your_api_key}",
        "X-Mcp-Collection-Id": "019b80d5-cee2-7010-ab22-f676271af866",
        "X-Mcp-Collection-Name": "my_collection",
        "X-Mcp-Top-K": "5",
        "X-Mcp-Min-Score": "0.4",
        "X-Mcp-Use-References": "none"
      }
    }
  }
}
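For MCP clients that are configured in code rather than through a JSON file, the same headers can be assembled programmatically. The sketch below is a hypothetical helper (`build_mcp_headers` is not part of any AIVAX SDK); the header names, defaults, and accepted values are taken from the Headers table, including the documented behavior that X-Mcp-Use-References enables references when its value is none.

```python
def build_mcp_headers(api_key, collection_ids, collection_name="collection",
                      reranker="lexical", top_k=5, min_score=0.4,
                      use_references=False, allow_write=False,
                      naming_convention="default"):
    """Build the HTTP headers for the collections MCP endpoint."""
    if not collection_ids:
        raise ValueError("at least one collection ID is required")
    if reranker not in ("lexical", "smart", "none"):
        raise ValueError("reranker must be lexical, smart, or none")
    if not 0 < min_score <= 1.0:
        raise ValueError("min_score must be greater than 0 and at most 1.0")
    if naming_convention not in ("default", "agent"):
        raise ValueError("naming_convention must be default or agent")

    headers = {
        "Authorization": f"Bearer {api_key}",
        # Multiple collection IDs are comma-separated.
        "X-Mcp-Collection-Id": ",".join(collection_ids),
        "X-Mcp-Collection-Name": collection_name,
        "X-Mcp-Reranker": reranker,
        "X-Mcp-Top-K": str(top_k),
        "X-Mcp-Min-Score": str(min_score),
        "X-Mcp-Naming-Convention": naming_convention,
    }
    if use_references:
        # Per the Headers table, the value none currently enables references.
        headers["X-Mcp-Use-References"] = "none"
    if allow_write:
        # Only send this for trusted clients; it exposes write and delete tools.
        headers["X-Mcp-Allow-Write"] = "yes"
    return headers


if __name__ == "__main__":
    for name, value in build_mcp_headers("{your_api_key}", ["019b80d5-cee2-7010-ab22-f676271af866"],
                                         collection_name="my_collection",
                                         use_references=True).items():
        print(f"{name}: {value}")
```

Leaving out the optional headers keeps the server defaults: references disabled and write tools hidden.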
Generated Tools
With the default naming convention, the read tool is named:
{collection_name}_search
It accepts:
search_terms (string[]): one or more search terms.
The MCP read tool enforces two request-shaping limits:
- At most 10 search terms per call.
- At most 500 total characters across all search terms.
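A caller can check these two limits before invoking the tool. This is a minimal sketch with a hypothetical helper name; it counts characters with Python's `len`, which is an assumption, since the server may count characters differently (for example, by bytes).

```python
MAX_TERMS = 10          # at most 10 search terms per call
MAX_TOTAL_CHARS = 500   # at most 500 characters across all terms


def check_search_terms(search_terms: list[str]) -> list[str]:
    """Raise ValueError if the terms exceed the MCP read tool limits."""
    if not search_terms:
        raise ValueError("provide at least one search term")
    if len(search_terms) > MAX_TERMS:
        raise ValueError(f"too many search terms: {len(search_terms)} > {MAX_TERMS}")
    # Assumption: length is measured in Unicode code points.
    total_chars = sum(len(term) for term in search_terms)
    if total_chars > MAX_TOTAL_CHARS:
        raise ValueError(f"search terms too long: {total_chars} > {MAX_TOTAL_CHARS} characters")
    return list(search_terms)


if __name__ == "__main__":
    print(check_search_terms(["cancelamento", "multa", "reembolso"]))
```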
When X-Mcp-Allow-Write is disabled, only the search tool is exposed. This is the recommended mode for assistants that only need to read a knowledge base.
When X-Mcp-Allow-Write: yes is sent, the server also exposes document creation/update and delete tools. Enable this only for trusted clients, because a model with write access can change collection contents.
Use collection MCP when an external model or MCP client should decide when to search. For a typical AIVAX chat client, it is often simpler to attach the collection directly to the AI Gateway and let the gateway RAG pipeline retrieve documents automatically.