Document Parsing
Document Parsing extracts text and structural information from documents, scanned files, and images.
Use it to convert raw documents into machine-readable output for AI and retrieval workflows.
Introduction
Document Parsing supports OCR, chunking, and multimodal document understanding. Each variant is optimized for a different parsing use case.
Overview
- Extracts text and structural information from documents using Optical Character Recognition (OCR).
- Splits parsed documents into smaller semantic chunks for RAG workflows.
- Parses multimodal documents that contain text and images.
- Adds chunking for multimodal documents.
Chunking behavior
When chunking is enabled, images are excluded from the output to keep chunk generation text-focused.
Input Format
Document Parsing accepts file content in either of the following ways:
1. Base64‑encoded JSON
{
"base64_string": "JVBERi0xLjUKJdDUxdgKNSAwIG9iago8PC9UeXBlIC9..."
}
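As a minimal sketch of how such a body could be produced, the Python snippet below Base64-encodes raw file bytes and wraps them in the `base64_string` field shown above. The helper name `build_base64_payload` is hypothetical, and the snippet only builds the JSON body; the endpoint URL and authentication details are not covered here.

```python
import base64
import json


def build_base64_payload(file_bytes: bytes) -> str:
    """Return a JSON body with the file content Base64-encoded (hypothetical helper)."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return json.dumps({"base64_string": encoded})


# Example with in-memory bytes; in practice, read them from a file opened in "rb" mode.
body = build_base64_payload(b"%PDF-1.5\n")
```

Encoding from bytes rather than from decoded text keeps binary formats such as PDF intact.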
2. File Upload (multipart/form-data)
Send the file as a standard multipart/form-data request with a file field.
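The upload option could be sketched as follows using only the Python standard library. Several details are assumptions rather than documented facts: the form field is assumed to be named `file`, the URL is a placeholder, the helper name `build_multipart_request` is hypothetical, and no API key header is set because its name is not specified here. The request is built but not sent.

```python
import urllib.request
import uuid


def build_multipart_request(url: str, filename: str, file_bytes: bytes,
                            content_type: str = "application/pdf") -> urllib.request.Request:
    """Build a multipart/form-data POST with a single file part (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    # Part headers; the field name "file" is an assumption.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary that terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + file_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Placeholder URL, not a real endpoint.
request = build_multipart_request(
    "https://example.invalid/document-parsing", "sample.pdf", b"%PDF-1.5\n"
)
```

Passing the resulting object to `urllib.request.urlopen` would send it once the real endpoint and authentication header are filled in.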
Supported File Formats
Document Parsing supports these formats:
| Type | Base64 Formats | File Upload Formats |
|---|---|---|
| Documents | pdf, docx, pptx, xlsx, html, md, csv | pdf, docx, pptx, xlsx |
| Images | jpeg, png |
Getting Started
- In Cloud Portal, navigate to Document Parsing.
- Click Create.
- Select the variant you want to use.
- Select the tenant and business group.
- Enter the API key name.
- Optionally add a description.
- Click Create.
- Copy the generated API key.
API Reference
Request and response schema details are available in the AI Platform API documentation.
Endpoint and documentation access
The Document Parsing endpoints and the documentation mentioned above are accessible only with an active connection to the SITE Cloud environment; they are not accessible from the public internet.