Inference API

The Inference API provides an OpenAI-compatible interface for chat completions, streaming chat completions, server-side web search, embeddings, and model listing. Standard OpenAI-compatible SDKs and HTTP clients can connect to it by using the Tempico base URL, https://api.tempico.com/v1, and a Tempico API key.

Authentication

All API requests require an API key. Bearer authentication is the recommended method.

Authorization: Bearer <API_KEY>

For compatibility, the API also accepts the key in an x-api-key: <API_KEY> header.
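Both header styles can be produced by a small helper. This is a minimal Python sketch; the function name auth_headers is hypothetical and not part of any Tempico SDK.

```python
def auth_headers(api_key: str, use_bearer: bool = True) -> dict:
    """Return the HTTP header that carries a Tempico API key.

    Bearer authentication is the recommended method; the x-api-key
    header is the compatibility alternative.
    """
    if not api_key:
        raise ValueError("an API key is required for every request")
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"x-api-key": api_key}
```

For POST requests, merge the result with Content-Type: application/json, and keep the key out of source code (for example, read it from an environment variable).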

List models

Returns the models available to the current account. Each returned id can be used as the model field in chat completion and embedding requests.

GET https://api.tempico.com/v1/models

Example Response
{
  "object": "list",
  "data": [
    {
      "id": "kimi-k2.7-code:1t",
      "object": "model",
      "created": 0,
      "owned_by": "tempicolabs",
      "context_window": 262144,
      "capabilities": [
        "chat"
      ],
      "max_output_tokens": 65536
    },
    {
      "id": "embeddinggemma:300m",
      "object": "model",
      "created": 0,
      "owned_by": "tempicolabs",
      "context_window": 2048,
      "capabilities": [
        "embeddings"
      ]
    }
  ]
}

Field | Description
--- | ---
id | Model ID used in API requests.
created | Model creation timestamp, when available.
context_window | Maximum context length supported by the model, in tokens.
capabilities | API features the model supports, such as chat or embeddings.
max_output_tokens | Maximum number of tokens the model can generate in a single response, when available.
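Because each entry lists its capabilities, a client can pick a suitable model programmatically instead of hard-coding an id. The Python sketch below uses only the standard library; fetch_models and model_ids_with_capability are hypothetical helper names, and only the URL and JSON field names come from this page.

```python
import json
import urllib.request

MODELS_URL = "https://api.tempico.com/v1/models"


def fetch_models(api_key: str) -> dict:
    """Call GET /v1/models and return the decoded JSON body."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def model_ids_with_capability(models: dict, capability: str) -> list:
    """Return the id of every listed model whose capabilities include `capability`."""
    return [
        model["id"]
        for model in models.get("data", [])
        if capability in model.get("capabilities", [])
    ]
```

For example, model_ids_with_capability(fetch_models(key), "embeddings") would return ["embeddinggemma:300m"] for a listing like the one above.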

Chat completions

Generates a model response from a conversation in OpenAI chat format. The endpoint supports standard JSON responses, streaming output, and server-side web search.

POST https://api.tempico.com/v1/chat/completions

Web search

Server-side web search is available to the model as a tool during chat completion generation and is enabled by default for most chat models. The model decides when a search is needed and which query to run.

Search behavior can be controlled through the prompt. System or user messages can define when search should be used, what sources to prefer, and when the model should answer without searching.

Set web_search to false to disable server-side search for a specific request.

curl https://api.tempico.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.7-code:1t",
    "messages": [
      {
        "role": "system",
        "content": "You are a concise technical assistant."
      },
      {
        "role": "user",
        "content": "Search the web and summarize the current Python 3.13 release status."
      }
    ],
    "max_tokens": 800,
    "temperature": 0.2,
    "web_search": true,
    "web_search_options": {
      "search_context_size": "medium",
      "user_location": {
        "type": "approximate",
        "approximate": {
          "country": "US"
        }
      },
      "safesearch": "moderate"
    }
  }'
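Field | Description
--- | ---
model | ID of the model to use, as returned by the List models endpoint.
messages | Conversation history sent to the model in OpenAI chat format.
max_tokens | Maximum number of tokens the model can generate in the response.
temperature | Sampling temperature. Lower values make the output more focused and deterministic.
web_search | Enables or disables server-side web search. Enabled by default for most models.
search_context_size | Amount of search context passed to the model, set under web_search_options. Supported values are low, medium, and high.
country | Country code used as an approximate location hint, set under web_search_options.user_location.approximate (for example, US).
safesearch | Search safety preference passed to the search backend, set under web_search_options (for example, moderate).
accept_language | Language preference for search results.
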
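The request body from the curl example can also be assembled and sent from code. This is a minimal Python sketch using only the standard library; build_chat_payload, post_json, and their defaults are illustrative, and only the JSON field names come from the table. Omitting web_search_options when search is disabled is a choice made by this sketch, not documented API behavior.

```python
import json
import urllib.request
from typing import Optional

CHAT_URL = "https://api.tempico.com/v1/chat/completions"
CONTEXT_SIZES = ("low", "medium", "high")


def build_chat_payload(
    model: str,
    messages: list,
    *,
    max_tokens: int = 800,
    temperature: float = 0.2,
    web_search: bool = True,
    search_context_size: str = "medium",
    country: Optional[str] = None,
) -> dict:
    """Assemble a /v1/chat/completions request body from the documented fields."""
    if search_context_size not in CONTEXT_SIZES:
        raise ValueError(f"search_context_size must be one of {CONTEXT_SIZES}")
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "web_search": web_search,
    }
    if web_search:
        # Search options are only attached when search is enabled for this request.
        options = {"search_context_size": search_context_size}
        if country:
            options["user_location"] = {
                "type": "approximate",
                "approximate": {"country": country},
            }
        payload["web_search_options"] = options
    return payload


def post_json(url: str, payload: dict, api_key: str) -> dict:
    """POST `payload` as JSON with Bearer authentication and decode the reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

To disable search for a single request, pass web_search=False to build_chat_payload and send the result with post_json(CHAT_URL, payload, api_key).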
Example Response
{
  "id": "chatcmpl-0000000000000000",
  "object": "chat.completion",
  "created": 0,
  "model": "kimi-k2.7-code:1t",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The answer text appears here."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 120,
    "completion_tokens": 64,
    "total_tokens": 184
  }
}

Field | Description
--- | ---
finish_reason | Reason generation stopped, such as reaching a stop condition or the token limit.
usage | Token usage information for the request.
prompt_tokens | Number of input tokens processed by the model.
completion_tokens | Number of output tokens generated by the model.
total_tokens | Sum of input and output tokens.
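Reading a non-streaming response usually means taking the first choice's message and the usage block. The Python sketch below does that; ChatResult and parse_chat_completion are hypothetical names, and the field paths follow the example response above.

```python
from typing import NamedTuple, Optional


class ChatResult(NamedTuple):
    text: str
    finish_reason: Optional[str]
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def parse_chat_completion(response: dict) -> ChatResult:
    """Extract the first choice's text and the token usage from a chat.completion body."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    usage = response.get("usage") or {}
    return ChatResult(
        text=first.get("message", {}).get("content") or "",
        finish_reason=first.get("finish_reason"),
        prompt_tokens=usage.get("prompt_tokens", 0),
        completion_tokens=usage.get("completion_tokens", 0),
        total_tokens=usage.get("total_tokens", 0),
    )
```

A finish_reason other than "stop" suggests the answer may be incomplete, for example because max_tokens was reached.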

Streaming chat completions

Returns a chat completion as a stream of Server-Sent Events. Each event contains a partial response chunk. The stream ends with data: [DONE].

POST https://api.tempico.com/v1/chat/completions

curl https://api.tempico.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.7-code:1t",
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": "Write a short Python example using requests."
      }
    ]
  }'
Example Stream Response
data: {
  "id": "chatcmpl-0000000000000000",
  "object": "chat.completion.chunk",
  "created": 0,
  "model": "kimi-k2.7-code:1t",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant"
      },
      "finish_reason": null
    }
  ]
}

data: {
  "id": "chatcmpl-0000000000000000",
  "object": "chat.completion.chunk",
  "created": 0,
  "model": "kimi-k2.7-code:1t",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "import requests\n\n"
      },
      "finish_reason": null
    }
  ]
}

data: {
  "id": "chatcmpl-0000000000000000",
  "object": "chat.completion.chunk",
  "created": 0,
  "model": "kimi-k2.7-code:1t",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}

data: [DONE]

Field | Description
--- | ---
stream | Enables streaming mode when set to true. The response is returned as Server-Sent Events instead of a single JSON object.
choices | Array of streamed output choices. For normal single-response generation, it usually contains one item.
index | Position of the choice in the choices array.
delta | Incremental update to the assistant message. It can contain role or content, or be empty in the final chunk.
finish_reason | Why generation ended. The value is null while generation is still running.
data: [DONE] | Final stream marker. No more chunks are sent after this event.
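A client consumes the stream by reading data: lines, decoding each chunk, and concatenating the delta content until data: [DONE]. This Python sketch uses only the standard library and collects choice 0 only. It assumes each event's JSON payload arrives on a single data: line, which is standard SSE framing; the example chunks above appear to be pretty-printed for readability. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator

CHAT_URL = "https://api.tempico.com/v1/chat/completions"


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the content fragments of choice 0 from chat.completion.chunk events."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # final marker: no more chunks follow
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if choice.get("index", 0) == 0 and content:
                yield content


def stream_chat(payload: dict, api_key: str) -> Iterator[str]:
    """POST a request with stream=true and yield text fragments as they arrive."""
    request = urllib.request.Request(
        CHAT_URL,
        data=json.dumps({**payload, "stream": True}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        # The HTTP response iterates line by line, matching SSE's line-based framing.
        yield from iter_stream_text(line.decode("utf-8") for line in response)
```

Printing each fragment with print(fragment, end="", flush=True) shows the answer as it is generated.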

Embeddings

Embeddings convert text input into vectors for semantic search, similarity matching, clustering, and ranking.

POST https://api.tempico.com/v1/embeddings

curl https://api.tempico.com/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma:300m",
    "input": "Represent this text as a vector for semantic search."
  }'

Example Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0123,
        -0.0456,
        0.0789
      ]
    }
  ],
  "model": "embeddinggemma:300m",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12
  }
}

Field | Description
--- | ---
input | Text or array of texts to embed.
embedding | Vector representation of the input text.
prompt_tokens | Number of input tokens processed by the model.
total_tokens | Total tokens counted for the request.
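For semantic search, a common approach is to embed the query and the candidate texts in one request (input accepts an array) and rank candidates by cosine similarity. This Python sketch uses only the standard library; cosine similarity is a conventional choice, not one stated on this page, and embed and rank_by_similarity are hypothetical names. Results are re-sorted by their index field so that vectors line up with the inputs.

```python
import json
import math
import urllib.request

EMBEDDINGS_URL = "https://api.tempico.com/v1/embeddings"


def embed(texts: list, api_key: str, model: str = "embeddinggemma:300m") -> list:
    """Request one embedding per text; results are ordered by their `index` field."""
    request = urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps({"model": model, "input": texts}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        items = json.load(response)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda item: item["index"])]


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norms


def rank_by_similarity(query: list, candidates: list) -> list:
    """Return candidate indices ordered from most to least similar to `query`."""
    scores = [cosine_similarity(query, candidate) for candidate in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

Typical use: vectors = embed([query, *documents], key), then rank_by_similarity(vectors[0], vectors[1:]) gives document positions from best to worst match.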

Pricing and Feature Availability

Infrastructure Unit size

Datacenter | LM1 (Estonia) | SG1 (Singapore) | USW2 (USA)
--- | --- | --- | ---
CPU | 72 MHz | 48 MHz | 32 MHz
RAM | 48 MB | 48 MB | 48 MB
Storage size | 576 MB | 576 MB | 480 MB
IOPS | 12 | 12 | 12
Infrastructure Unit price | 0.00074 €/h | 0.00241 €/h | 0.0033 €/h
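The unit figures support a rough sizing estimate. This Python sketch assumes that resources scale linearly with the number of Infrastructure Units, that billing is proportional to hours used, and that a month averages 730 hours; none of these assumptions is stated in the table. The LM1 values come from the table, and the function names are hypothetical.

```python
import math

# Per-unit resources and hourly price for LM1 (Estonia), taken from the table above.
LM1_UNIT = {"cpu_mhz": 72, "ram_mb": 48, "storage_mb": 576}
LM1_PRICE_EUR_PER_HOUR = 0.00074
HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12 months


def units_needed(requirements: dict, unit: dict) -> int:
    """Smallest unit count covering every requested resource (assumes linear scaling)."""
    return max(math.ceil(requirements[name] / unit[name]) for name in requirements)


def monthly_cost(units: int, price_per_hour: float, hours: float = HOURS_PER_MONTH) -> float:
    """Estimated cost in euros for running `units` units for `hours` hours."""
    return units * price_per_hour * hours
```

Under these assumptions, 720 MHz of CPU and 1024 MB of RAM in LM1 need 22 units (RAM is the limiting resource), or roughly 11.88 € per month.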

PaaS add-ons

Dedicated NAT gateway: 0.03014 €/h (LM1), 0.0399 €/h (SG1), 0.0433 €/h (USW2)
GPU RTX 5000 Ada: 0.78 €/h
GPU RTX PRO 6000 Blackwell: 1.31 €/h
Paid TLS certificate: from 16.50 €/year
Dedicated IPv4: 7.23 €/month (LM1), 4 €/month (SG1), 3.83 €/month (USW2)
Dedicated IPv6: free (LM1), varies (SG1), varies (USW2)
Dedicated IPv4 subnet: varies
Dedicated IPv6 subnet: varies

CI/CD add-ons

CI/CD runner: 0.01628 €/h (LM1), varies (SG1), varies (USW2)

Environment-dependent features

Stateful Firewall
TLS with Let's Encrypt
Web Application Firewall
Platform built-in logging
ARM CPUs
Anti-DDoS

Add-on

Add-on

Cloud-native services

Add-on

GPU-accelerated computing

Add-on

Custom PCI passthrough

Log storage: 0.1 GB/month free, optional expansion at 2 €/GB/month, 60 days maximum
Log ingestion: first 10 GB/month free, then 0.06 €/GB/month
Log retrieval: 1.50 €/GB/month