Table of Contents

Plans and Limits

AIVAX has three account plans: Free, Pro, and Max. The current plan is stored on the account and controls model access, commissions, rate limits, RAG quotas, tool limits, storage quota, conversation retention, and subscription-model reserve windows.

For commercial subscription prices and plan packaging, use the AIVAX pricing page. This page documents the technical limits implemented by the API.

How limits are enforced

Limits are enforced at different layers:

  • Authentication rejects missing, expired, or unknown API keys.
  • Public API keys are restricted to public routes and have key-level and per-IP request and token limits.
  • Balance middleware rejects billable requests when the account balance is below the required minimum.
  • Storage middleware rejects requests when account storage exceeds the plan quota.
  • Inference checks model access, request rate, input-token rate, BYOK rate, and Free-plan context size.
  • RAG checks collection count, search rate, insertion rate, and JSONL import size.
  • Built-in tools check daily service limits.
  • Batch processing checks how many workflow items can be processed per day.

Reference:

Plan summary

Feature Free Pro Max
Model access Low-price/basic models Advanced models All models
Inference commission multiplier 1.25x 1.05x 1.00x
BYOK requests 30/min 200/min Unlimited
Maximum context 65,536 input tokens enforced at runtime Model or gateway limit Model or gateway limit
Subscription model reserve None 250 units/6h and 3,000 units/week 1,000 units/6h and 15,000 units/week
Storage quota 30 MB 2 GB 20 GB
Conversation retention 2 hours 2 days 30 days
Support level Email Priority Dedicated

Integrated inference

Integrated model requests are rate-limited by request count and input tokens.

Plan Request limit Input-token limit
Free 20/min and 500/day 1,000,000/min
Pro 200/min 20,000,000/min
Max Unlimited Unlimited

Model rate-limit groups adjust request-count thresholds:

Rate-limit group Threshold multiplier
Common 1.0x
Discounted 0.5x
Low 0.3x
Free 0.1x

For example, a Pro account normally has 200 integrated-model requests per minute. With a Discounted model group, the adjusted threshold is 100 requests per minute.

BYOK inference

BYOK means the gateway uses a provider key configured on the gateway instead of an integrated AIVAX model. BYOK still passes through AIVAX infrastructure and is rate-limited by plan:

Plan BYOK request limit
Free 30/min
Pro 200/min
Max Unlimited

Public API keys

Public keys have additional limits independent of the account plan.

Scope Request limits
Per remote address 3/5s, 20/min, 300/hour, 1,000/day
Global per key 10/5s, 60/min, 1,500/hour, 10,000/day
Scope Token limits
Per remote address 100,000/5min, 500,000/30min, 2,000,000/6h, 5,000,000/day
Global per key 500,000/5min, 2,000,000/30min, 10,000,000/6h, 25,000,000/day

Public keys are only accepted on routes marked public by the backend. In this checkout, public routes include RAG semantic search, RAG answer generation, and chat completions. For chat completions, public keys also require a full AI Gateway UUID, restrict request parameters, and strip server-side tool surfaces. See Authentication.

RAG and collection limits

Feature Free Pro Max
Collections 5 Unlimited Unlimited
Semantic searches 20/min 500/min 3,000/min
Document insertions 500/day 10,000/day Unlimited
JSONL documents per import request 1,000 10,000 1,000,000
Compound file processing Not available 3 files/day 10 files/day

The JSONL import endpoint rejects a request when it reaches the plan's per-request document limit.

Built-in tool limits

Tool category Free Pro Max
Web search 15/day 1,000/day 10,000/day
X/Twitter search Not available 1,000/day 10,000/day
Advanced web search Not available 100/day 1,000/day
Document and web page generation 5/day 1,000/day 50,000/day
Image generation and editing 5/day 500/day 5,000/day
General service actions 30/day 5,000/day 100,000/day
Bash commands 30/hour 1,500/hour 10,000/hour

General service actions cover service operations wired through the shared ServicesGeneral limiter.

Batch processing

Batch imports have request-size guards, and workflow processing has a daily plan limit.

Feature Free Pro Max
Workflow items processed 500/day 100,000/day Unlimited
Files per import request 1,000 1,000 1,000
Total import size 100 MB/request 100 MB/request 100 MB/request
Single imported file size 10 MB 10 MB 10 MB

Batch is asynchronous. If processing is paused or fails because of quota, retry after the quota window resets or upgrade the account.