Inferences

Serverless Inference provides managed access to AI models across language, vision, embedding, reranking, moderation, and speech-to-text workloads. Applications can start using an inference product with an API key and endpoint, without managing the underlying compute, scaling, or deployment infrastructure.

Introduction

Use Inferences to integrate AI model capabilities into applications and workflows through managed SITE Cloud AI Platform products.

Overview


LLM Inference provides access to large language models for chatbots, summarization, structured output, and custom NLP workflows.

Vision Inference supports multimodal use cases that combine image and text understanding, such as captioning, classification, visual question answering, and multimodal chat.

Embedding Inference generates vector embeddings for semantic search, clustering, retrieval-augmented generation, recommendation systems, and related workflows.

Reranking Inference reorders candidate search or retrieval results based on semantic relevance.

Moderation Inference detects, classifies, and manages unsafe or undesired content across text-based inputs and outputs.

Audio Inference converts spoken audio into structured text for transcription, analysis, call analytics, meeting transcription, and downstream language understanding.

Inference products can be combined. For example, Embedding Inference and Reranking Inference can be used together to build search, recommendation, and knowledge retrieval workflows.
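The Embedding plus Reranking combination can be sketched as a two-stage pipeline. The first stage ranks documents by cosine similarity between embedding vectors and keeps the top k candidates. The second stage reorders only those candidates by a relevance score. This is a minimal, self-contained sketch under stated assumptions. The vectors are hand-written toy values standing in for Embedding Inference output. `toy_rerank_score` is a hypothetical stand-in for a Reranking Inference call. Neither reflects the actual API or response format.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    docs: dict[str, Sequence[float]],
    rerank_score: Callable[[str, str], float],
    k: int = 3,
) -> list[str]:
    """Stage 1: keep the k docs whose embeddings are closest to the query.
    Stage 2: reorder those k candidates by the reranker's relevance score."""
    candidates = sorted(
        docs, key=lambda d: cosine_similarity(query_vec, docs[d]), reverse=True
    )[:k]
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)


# Hypothetical placeholder for a Reranking Inference call: counts shared words.
# A real reranking model would score semantic relevance instead.
def toy_rerank_score(query: str, doc: str) -> float:
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


# Toy 3-dimensional "embeddings" standing in for Embedding Inference output.
docs = {
    "reset your api key in cloud portal": [0.9, 0.1, 0.0],
    "api key rotation policy": [0.8, 0.3, 0.1],
    "gpu quota limits": [0.1, 0.9, 0.2],
    "billing for inference usage": [0.2, 0.2, 0.9],
}
query = "how do i reset an api key"
query_vec = [0.8, 0.3, 0.1]

print(retrieve_then_rerank(query, query_vec, docs, toy_rerank_score, k=2))
# → ['reset your api key in cloud portal', 'api key rotation policy']
```

In this toy data, embedding similarity alone puts "api key rotation policy" first. The reranking stage moves the document about resetting a key to the top. Reranking only the top k candidates keeps the more expensive relevance scoring limited to a small set.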

Getting Started

The API key creation process is the same for all Serverless Inference products. The example below uses LLM Inference.

1. In Cloud Portal, open the inference product you want to use.
2. Click Create to open the creation modal.
3. Select the model you want to use from the available list.
4. Select the tenant and business group.
5. Enter the API key name.
6. Optionally, add a description.
7. Click Generate to create the API key.
8. Copy the API key and endpoint.
9. Review the generated inference information, including name, description, business group, business group ID, model, status, creation details, and endpoint.

Model availability

The list of supported models is updated regularly. Check the model list in Cloud Portal for the models currently available.

Making an Inference Request

After creating an API key, send requests to the endpoint of the selected Serverless Inference product, authenticating each request with that API key.

Request and response schemas are available in the AI Platform API documentation:

Inference API documentation
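The following is a minimal sketch of calling an endpoint from Python using only the standard library. It makes several assumptions. The API key is assumed to be sent as a Bearer token in the `Authorization` header, and requests and responses are assumed to be JSON. The environment variable names `SITE_INFERENCE_ENDPOINT` and `SITE_INFERENCE_API_KEY` are hypothetical, as are the payload fields. Confirm the actual authentication header and request schema in the Inference API documentation before relying on it.

```python
import json
import os
import urllib.request


def build_inference_request(endpoint: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to an inference endpoint.

    ASSUMPTION: the API key is passed as a Bearer token. Check the
    Inference API documentation for the real header and schema.
    """
    return urllib.request.Request(
        url=endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def main() -> None:
    # Hypothetical variable names: keep the values copied from Cloud Portal
    # in the environment instead of hard-coding the API key in source.
    endpoint = os.environ.get("SITE_INFERENCE_ENDPOINT")
    api_key = os.environ.get("SITE_INFERENCE_API_KEY")
    if not (endpoint and api_key):
        print("Set SITE_INFERENCE_ENDPOINT and SITE_INFERENCE_API_KEY first.")
        return
    # Hypothetical payload: replace with the schema from the API documentation.
    payload = {"model": "<model-name>", "input": "Hello"}
    print(send(build_inference_request(endpoint, api_key, payload)))


if __name__ == "__main__":
    main()
```

Building the request separately from sending it lets you inspect or unit-test the headers and body without network access. This helps because the endpoint itself is reachable only from the SITE Cloud environment.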

Endpoint and documentation access

The LLM Inference endpoint shown in Cloud Portal and the documentation mentioned above are accessible only with an active connection to the SITE Cloud environment. They are not accessible from the public internet.
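Because of this restriction, a request failure may be a network problem rather than an API key problem. The sketch below is a hypothetical pre-check helper, not part of any SITE Cloud tooling. It tests only whether a TCP connection to the endpoint's host and port succeeds. It assumes the default ports 443 for `https` and 80 for `http` when the endpoint URL does not specify one.

```python
import socket
from urllib.parse import urlsplit


def endpoint_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds.

    This checks network reachability only; it does not validate the API key.
    """
    parts = urlsplit(endpoint)
    if parts.hostname is None:
        raise ValueError(f"endpoint has no host: {endpoint!r}")
    # Assumed defaults when the URL has no explicit port.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # `timeout` bounds the connect attempt; DNS lookup uses the
        # system resolver's own timeout.
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Given the access restriction above, a `False` result from a machine outside the SITE Cloud environment is expected.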

Glossary

Term	Description
API Key	A secure access token used to authenticate requests to SITE Cloud APIs.
LLM	A large language model used for text generation, summarization, chatbots, and related language tasks.
VLM	A vision-language model that works with both image and text inputs.
Embedding	A numerical representation of text or data that captures semantic meaning and similarity.
Reranking	Reordering search or retrieval results based on semantic relevance.
Inference	Running data through a trained model to generate an output.
Serverless	A managed execution model where scaling and resource allocation are handled by SITE Cloud.
Token	A unit of text used by language models.