
Document Parsing

Document Parsing extracts text and structural information from documents, scanned files, and images.

Use it to convert raw documents into machine-readable output for AI and retrieval workflows.

Introduction

Document Parsing supports OCR, chunking, and multimodal document understanding. Each variant is optimized for a different parsing use case.

Overview

Extracts text and structural information from documents using Optical Character Recognition.

Splits parsed documents into smaller semantic chunks for RAG workflows.

Parses multimodal documents that contain text and images.

Adds chunking for multimodal documents.

Chunking behavior

When chunking is enabled, images are excluded from the output to keep chunk generation text-focused.

Input Format

Document Parsing accepts file content in either of the following ways:

1. Base64‑encoded JSON

{
  "base64_string": "JVBERi0xLjUKJdDUxdgKNSAwIG9iago8PC9UeXBlIC9..."
}

Encode the source file before adding it to the request body.
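As an illustration of the encoding step, here is a minimal Python sketch that reads a file and wraps its Base64 text in the JSON shape shown above. The helper name build_base64_payload is hypothetical. Only the base64_string key comes from this page; the endpoint URL and authentication header are not specified here and are left out.

```python
import base64
import json
from pathlib import Path


def build_base64_payload(path: str) -> str:
    """Return a JSON request body holding the file content as Base64 text.

    Hypothetical helper: only the "base64_string" key is taken from the
    documented request shape; everything else is illustrative.
    """
    raw = Path(path).read_bytes()
    # b64encode returns bytes; decode to ASCII so the value is JSON-serializable.
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps({"base64_string": encoded})
```

A PDF file always yields a value beginning with JVBERi0, the Base64 form of the %PDF- file signature, which matches the sample string above.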

2. File Upload (multipart/form‑data)

Send the file as a standard multipart/form-data request with a file field.
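To show what such a request body looks like, the following Python sketch builds one with the standard library only. It assumes the form field is literally named file, and the helper name build_multipart_body is hypothetical. No request is sent, because the endpoint URL and authentication scheme are not given on this page.

```python
import mimetypes
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart_body(path: str, field_name: str = "file") -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data.

    Returns (body, content_type). The default field name "file" is an
    assumption based on the wording "a file field".
    """
    file_path = Path(path)
    # A random boundary that is very unlikely to occur in the file content.
    boundary = uuid.uuid4().hex
    mime_type = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{file_path.name}"\r\n'
        f"Content-Type: {mime_type}\r\n\r\n"
    ).encode("utf-8")
    # The closing delimiter carries two trailing hyphens.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    content_type = f"multipart/form-data; boundary={boundary}"
    return head + file_path.read_bytes() + tail, content_type
```

The returned content type goes in the Content-Type header of the POST request and the returned bytes form the request body. Most HTTP client libraries can build this body for you, so a hand-built version like this is mainly useful for seeing what goes over the wire.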

Supported File Formats

Document Parsing supports these formats:

Type       Base64 Formats                        File Upload Formats
Documents  pdf, docx, pptx, xlsx, html, md, csv  pdf, docx, pptx, xlsx
Images     jpeg, png

Getting Started

  1. In Cloud Portal, navigate to Document Parsing.
  2. Click Create.
  3. Select the variant you want to use.
  4. Select the tenant and business group.
  5. Enter the API key name.
  6. Optionally add a description.
  7. Click Create.
  8. Copy the generated API key.

API Reference

Request and response schema details are available in the AI Platform API documentation.

Endpoint and documentation access

The Document Parsing endpoints and documentation mentioned above are accessible only with an active connection to the SITE Cloud environment. They are not accessible from the public internet.