The Inference API provides an OpenAI-compatible interface for chat completions, streaming chat completions, server-side web search, embeddings, and model listing. Standard OpenAI-compatible SDKs and HTTP clients can connect to the API by pointing them at the Tempico endpoints and supplying a Tempico API key.
Authentication
All API requests require an API key. Bearer authentication, sent as the Authorization: Bearer <API_KEY> header, is the recommended method.
For compatibility, the API also accepts the x-api-key: <API_KEY> header.
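The two header styles can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names `auth_headers` and `build_request` are hypothetical, and the base URL is taken from the curl examples in this document. The request is only built, never sent.

```python
import urllib.request
from typing import Dict

# Base URL as it appears in the curl examples in this document.
BASE_URL = "https://api.tempico.com/v1"


def auth_headers(api_key: str, use_x_api_key: bool = False) -> Dict[str, str]:
    """Return headers for one of the two supported authentication styles."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if use_x_api_key:
        # Compatibility style.
        return {"x-api-key": api_key}
    # Recommended style.
    return {"Authorization": f"Bearer {api_key}"}


def build_request(path: str, api_key: str, use_x_api_key: bool = False) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for the given path."""
    return urllib.request.Request(
        BASE_URL + path,
        headers=auth_headers(api_key, use_x_api_key),
    )
```

Reading the key from an environment variable, as the curl examples do with `$API_KEY`, keeps it out of source code.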
List models
Returns available model IDs for the current account. Returned id values can be used as the model field in chat completion and embedding requests.
{
  "object": "list",
  "data": [
    {
      "id": "kimi-k2.7-code:1t",
      "object": "model",
      "created": 0,
      "owned_by": "tempicolabs",
      "context_window": 262144,
      "capabilities": [
        "chat"
      ],
      "max_output_tokens": 65536
    },
    {
      "id": "embeddinggemma:300m",
      "object": "model",
      "created": 0,
      "owned_by": "tempicolabs",
      "context_window": 2048,
      "capabilities": [
        "embeddings"
      ]
    }
  ]
}
| Field | Description |
| --- | --- |
| id | Model ID used in API requests. |
| created | Model creation timestamp when available. |
| context_window | Maximum context length supported by the model, in tokens. |
| capabilities | Supported API features for the model, such as chat or embeddings. |
| max_output_tokens | Maximum output token limit when available. |
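As a rough sketch of how a client might use this response, the snippet below picks model IDs by capability from an already-parsed model list. The function name `models_with_capability` is hypothetical, and the sample data simply mirrors the example response above. Because `max_output_tokens` is only present "when available", the code treats it as optional.

```python
from typing import Any, Dict, List, Optional

# Trimmed copy of the example model list shown above.
SAMPLE_MODELS: Dict[str, Any] = {
    "object": "list",
    "data": [
        {
            "id": "kimi-k2.7-code:1t",
            "context_window": 262144,
            "capabilities": ["chat"],
            "max_output_tokens": 65536,
        },
        {
            "id": "embeddinggemma:300m",
            "context_window": 2048,
            "capabilities": ["embeddings"],
        },
    ],
}


def models_with_capability(models_response: Dict[str, Any], capability: str) -> List[str]:
    """Return the IDs of all models that list the given capability."""
    return [
        model["id"]
        for model in models_response.get("data", [])
        if capability in model.get("capabilities", [])
    ]


def max_output_tokens(model: Dict[str, Any]) -> Optional[int]:
    """Return the output token limit, or None when the field is absent."""
    return model.get("max_output_tokens")


chat_models = models_with_capability(SAMPLE_MODELS, "chat")
embedding_models = models_with_capability(SAMPLE_MODELS, "embeddings")
```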
Chat completions
Generates a model response from a conversation in OpenAI chat format. The endpoint supports standard JSON responses, streaming output, and server-side web search.
Web search
Server-side web search works as a tool available to the model during chat completion generation. For most chat models, web search is enabled by default. The model decides when search is needed and what query should be used.
Search behavior can be controlled through the prompt. System or user messages can define when search should be used, what sources to prefer, and when the model should answer without searching.
Set web_search to false to disable server-side search for a specific request.
curl https://api.tempico.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.7-code:1t",
    "messages": [
      {
        "role": "system",
        "content": "You are a concise technical assistant."
      },
      {
        "role": "user",
        "content": "Search the web and summarize the current Python 3.13 release status."
      }
    ],
    "max_tokens": 800,
    "temperature": 0.2,
    "web_search": true,
    "web_search_options": {
      "search_context_size": "medium",
      "user_location": {
        "type": "approximate",
        "approximate": {
          "country": "US"
        }
      },
      "safesearch": "moderate"
    }
  }'
| Field | Description |
| --- | --- |
| model | ID of the model that generates the response. |
| messages | Conversation history sent to the model in OpenAI chat format. |
| max_tokens | Maximum number of tokens the model can generate in the response. |
| web_search | Enables or disables server-side web search. Enabled by default for most models. |
| search_context_size | Amount of search context passed to the model, set inside web_search_options. Supported values are low, medium, and high. |
| country | Country code used as the location hint, set inside web_search_options.user_location.approximate. |
| safesearch | Search safety preference passed to the search backend, set inside web_search_options. |
| accept_language | Language preference for search results. |
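The request body above can also be assembled programmatically. The following is a minimal sketch with a hypothetical helper, `build_chat_payload`, which only builds the JSON body and does not send it. It omits `web_search` unless the caller sets it, so the server-side default applies, and it validates `search_context_size` against the three documented values.

```python
import json
from typing import Any, Dict, List, Optional

ALLOWED_CONTEXT_SIZES = {"low", "medium", "high"}  # values listed in the table above


def build_chat_payload(
    model: str,
    messages: List[Dict[str, str]],
    *,
    max_tokens: Optional[int] = None,
    web_search: Optional[bool] = None,
    search_context_size: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a chat completion request body as a plain dict."""
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if web_search is not None:
        # Leave the field out entirely to keep the model's default behavior.
        payload["web_search"] = web_search
    if search_context_size is not None:
        if search_context_size not in ALLOWED_CONTEXT_SIZES:
            raise ValueError(f"unsupported search_context_size: {search_context_size!r}")
        payload["web_search_options"] = {"search_context_size": search_context_size}
    return payload


# Example: disable server-side search for a single request.
no_search = build_chat_payload(
    "kimi-k2.7-code:1t",
    [{"role": "user", "content": "Explain what a context window is."}],
    max_tokens=200,
    web_search=False,
)
body = json.dumps(no_search)  # string suitable for an HTTP request body
```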
Example Response
{
  "id": "chatcmpl-0000000000000000",
  "object": "chat.completion",
  "created": 0,
  "model": "kimi-k2.7-code:1t",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The answer text appears here."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 120,
    "completion_tokens": 64,
    "total_tokens": 184
  }
}
| Field | Description |
| --- | --- |
| finish_reason | Reason generation stopped, such as reaching a stop condition or token limit. |
| usage | Token usage information for the request. |
| prompt_tokens | Number of input tokens processed by the model. |
| completion_tokens | Number of output tokens generated by the model. |
| total_tokens | Sum of input and output tokens. |
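A client typically needs only the answer text, the stop reason, and the token counts from this response. The sketch below shows one way to extract them from an already-parsed response; `summarize_completion` is a hypothetical name. It assumes that a token-limit stop is reported as `"length"`, as in the OpenAI convention; this document only shows `"stop"`, so treat that value as an assumption.

```python
from typing import Any, Dict


def summarize_completion(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the answer text, stop reason, and token usage out of a response."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    finish_reason = choice.get("finish_reason")
    return {
        "text": choice["message"]["content"],
        "finish_reason": finish_reason,
        # Assumption: "length" marks a response cut off by the token limit.
        "truncated": finish_reason == "length",
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }
```

Checking `truncated` before using the text is a cheap way to notice that `max_tokens` was too low for the request.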
Streaming chat completions
Returns a chat completion as a stream of Server-Sent Events. Each event contains a partial response chunk. The stream ends with data: [DONE].
curl https://api.tempico.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.7-code:1t",
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": "Write a short Python example using requests."
      }
    ]
  }'
Example Stream Response
data: {
  "id": "chatcmpl-0000000000000000",
  "object": "chat.completion.chunk",
  "created": 0,
  "model": "kimi-k2.7-code:1t",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant"
      },
      "finish_reason": null
    }
  ]
}

data: {
  "id": "chatcmpl-0000000000000000",
  "object": "chat.completion.chunk",
  "created": 0,
  "model": "kimi-k2.7-code:1t",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "import requests\n\n"
      },
      "finish_reason": null
    }
  ]
}

data: {
  "id": "chatcmpl-0000000000000000",
  "object": "chat.completion.chunk",
  "created": 0,
  "model": "kimi-k2.7-code:1t",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}

data: [DONE]
| Field | Description |
| --- | --- |
| stream | Enables streaming mode when set to true. The response is returned as Server-Sent Events instead of one JSON object. |
| choices | Array of streamed output choices. For normal single-response generation, this usually contains one item. |
| index | Position of the choice in the choices array. |
| delta | Incremental update for the assistant message. This object can contain role, content, or be empty in the final chunk. |
| finish_reason | Indicates why generation ended. The value is null while generation is still running. |
| data: [DONE] | Final stream marker. No more chunks are sent after this event. |
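Reassembling the full message from a stream means concatenating each chunk's `delta.content` until the `[DONE]` marker arrives. The sketch below does this over an iterable of text lines; `collect_stream_text` is a hypothetical helper. It assumes each event's JSON arrives on a single `data:` line, which is the usual Server-Sent Events wire form, whereas the example above is pretty-printed for readability.

```python
import json
from typing import Iterable, List, Optional, Tuple


def collect_stream_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Concatenate delta.content from SSE lines; return (text, finish_reason)."""
    parts: List[str] = []
    finish_reason: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # final stream marker, no more chunks follow
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch follows only the first choice
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason
```

The function accepts any iterable of strings, so it can be fed decoded lines from whichever HTTP client is in use.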
Embeddings
Embeddings convert text input into vectors for semantic search, similarity matching, clustering, and ranking.
curl https://api.tempico.com/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma:300m",
    "input": "Represent this text as a vector for semantic search."
  }'
Example Response
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0123,
        -0.0456,
        0.0789
      ]
    }
  ],
  "model": "embeddinggemma:300m",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12
  }
}
| Field | Description |
| --- | --- |
| input | Text or array of texts used to create embeddings. |
| embedding | Vector representation of the input text. |
| prompt_tokens | Number of input tokens processed by the model. |
| total_tokens | Total tokens counted for the request. |
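For semantic search and similarity matching, returned vectors are commonly compared with cosine similarity. The sketch below ranks the items of an embeddings response against a query vector. Both function names are hypothetical, and cosine similarity is a common choice for illustration rather than something this API prescribes.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], embeddings_response: Dict[str, Any]
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for each embedding, best match first."""
    scored = [
        (item["index"], cosine_similarity(query, item["embedding"]))
        for item in embeddings_response["data"]
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The `index` field in each result refers back to the position of the text in the `input` array, so the ranked indices map directly onto the original texts.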