Ingesting documents

Memic’s ingestion pipeline takes a file, chunks it, generates embeddings, and indexes it for semantic search. This guide walks through the flow and common patterns.

The upload flow

Every file upload is a three-step process:

Init — call POST /files/init with the filename and size. Memic returns a presigned upload URL.
Upload — PUT the file contents directly to the presigned URL. This bypasses the Memic API entirely so large files don’t hit request limits.
Confirm — call POST /files/{id}/confirm to signal upload is complete. Memic queues the file for processing.

The Python SDK wraps all three steps into a single client.files.upload(...) call, but if you’re calling the API directly (or uploading from a browser), you’ll do each step yourself. See the cURL quickstart for the raw HTTP flow.
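
If you are doing the steps yourself, the flow can be sketched roughly as follows. This is a minimal sketch, not the SDK's implementation. The `http` argument is assumed to be the `requests` module or any object with compatible `post` and `put` methods. The request keys (`filename`, `size`), the response keys (`id`, `upload_url`), and the `headers` argument carrying your credentials are all assumptions; check the API reference for the real names.

```python
import os


def upload_file(http, base_url, headers, path):
    """Run init -> upload -> confirm for one file and return its file id.

    `http` is assumed to expose requests-style post()/put() calls whose
    responses have raise_for_status() and json(). Field names are assumed.
    """
    # Step 1: init. Send the filename and size, get back a presigned URL.
    init = http.post(
        f"{base_url}/files/init",
        json={"filename": os.path.basename(path), "size": os.path.getsize(path)},
        headers=headers,
    )
    init.raise_for_status()
    info = init.json()
    file_id, upload_url = info["id"], info["upload_url"]  # assumed keys

    # Step 2: upload. PUT the bytes straight to the presigned URL. This
    # request does not go to the Memic API, so no API headers are sent.
    with open(path, "rb") as fh:
        put = http.put(upload_url, data=fh)
    put.raise_for_status()

    # Step 3: confirm. Tell Memic the upload finished so it queues processing.
    confirm = http.post(f"{base_url}/files/{file_id}/confirm", headers=headers)
    confirm.raise_for_status()
    return file_id
```

Passing the file object as `data` lets a requests-style client stream the body instead of loading the whole file into memory.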

Processing lifecycle

After confirm, a file goes through these statuses:

pending → processing → ready
                    ↘ failed

pending — queued, not yet picked up by a worker
processing — being chunked, embedded, and indexed
ready — searchable
failed — processing failed (check /files/{id}/status for the reason)
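
The lifecycle above is a small state machine, and it can help to encode it on the client side so unexpected status values fail loudly. A minimal sketch, reading the transitions off the diagram as drawn; `FileStatus`, `is_terminal`, and `can_transition` are hypothetical names, not part of the SDK:

```python
from enum import Enum


class FileStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"


# Allowed moves: pending -> processing -> ready, or processing -> failed.
TRANSITIONS = {
    FileStatus.PENDING: {FileStatus.PROCESSING},
    FileStatus.PROCESSING: {FileStatus.READY, FileStatus.FAILED},
    FileStatus.READY: set(),
    FileStatus.FAILED: set(),
}


def is_terminal(status):
    """True once a file can no longer change status (ready or failed)."""
    return not TRANSITIONS[FileStatus(status)]


def can_transition(old, new):
    """True if the lifecycle allows moving from `old` to `new`."""
    return FileStatus(new) in TRANSITIONS[FileStatus(old)]
```

`FileStatus("something_else")` raises `ValueError`, which surfaces any status string this sketch does not know about.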

Typical times: a PDF under 10 MB is usually ready within 30 seconds. Larger files or spreadsheets can take longer.

Polling vs waiting

For interactive flows, poll /files/{id}/status every 2–5 seconds until the status is ready or failed. The endpoint has a higher rate limit than search/chat specifically for this use case. For batch uploads where you don’t need immediate feedback, upload everything and check each file’s status in a single pass at the end.
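
A polling loop for the interactive case might look like the sketch below. It stops on either terminal status and gives up after a deadline so a stuck file cannot block the caller forever. `get_status` is a hypothetical zero-argument callable that returns the current status string (for example by requesting /files/{id}/status); the 3-second interval and 300-second timeout are assumed defaults, not documented values.

```python
import time


def wait_until_done(get_status, interval=3.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the file is ready or failed, or raise TimeoutError.

    `sleep` and `clock` are injectable so the loop can be tested without
    real waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("ready", "failed"):
            return status
        # Stop if the next poll would land past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"file still {status!r} after {timeout} seconds")
        sleep(interval)
```

The function returns the final status rather than raising on failed, leaving it to the caller to decide how to report a processing failure.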

Supported file types

PDF — with full text extraction and table detection
Plain text (.txt, .md)
Microsoft Office (.docx, .pptx, .xlsx)
Web formats (.html)
Image formats (.png, .jpg) — text extracted via OCR

Files outside this list are rejected at init time with invalid_file_type.
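
To avoid a round trip for files that will be rejected anyway, a client can pre-check the extension. A minimal sketch, assuming the check is driven by file extension and is case-insensitive (neither is confirmed here); the server remains the source of truth, and `is_supported` is a hypothetical helper:

```python
from pathlib import Path

# Extensions taken from the list above. Variants that are not listed
# (for example ".jpeg" or ".htm") are deliberately left out.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".txt", ".md", ".docx", ".pptx", ".xlsx", ".html", ".png", ".jpg",
}


def is_supported(path):
    """Client-side guess at whether init would accept this file type."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```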

Maximum file size

The current limit is 100 MB per file. For larger files, split them into logical sections (e.g. chapters) and upload as separate files.
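
The size limit can also be checked locally before calling init. In this sketch, "100 MB" is read conservatively as 100 × 10^6 bytes, which is an assumption: a file that passes is under the limit whichever MB convention the server uses. `check_size` is a hypothetical helper.

```python
import os

# Assumption: the conservative (decimal) reading of 100 MB.
MAX_FILE_BYTES = 100 * 10**6


def check_size(path, limit=MAX_FILE_BYTES):
    """Return the file size in bytes, or raise ValueError if over the limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes, over the {limit}-byte limit")
    return size
```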

Batch uploads

The Python SDK provides a batch helper that uploads multiple files in parallel:

results = client.files.upload_batch([
    "./doc1.pdf",
    "./doc2.pdf",
    "./doc3.pdf",
])

Under the hood this calls init + PUT + confirm for each file with controlled concurrency.
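
If you are not using the SDK, a similar effect can be sketched with a thread pool. This is not how upload_batch is implemented; it only illustrates the idea of bounded parallelism. `upload_one` is a hypothetical callable that runs init, PUT, and confirm for one path, and the default of 4 workers is an assumed value.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_many(upload_one, paths, max_workers=4):
    """Upload paths in parallel, at most `max_workers` at a time.

    Returns {path: result or exception}, so one failed upload does not
    abort the rest of the batch.
    """
    def attempt(path):
        try:
            return upload_one(path)
        except Exception as exc:  # report the error per file and keep going
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with paths.
        return dict(zip(paths, pool.map(attempt, paths)))
```

Threads suit this workload because each upload spends most of its time waiting on the network.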

Replacing a file

To update the content of a file, upload the new version as a new file. Memic treats filenames as human labels, not unique keys, so an upload with the same filename creates a new file_id rather than overwriting the existing file. To replace the old version, upload the new file, then delete the old one with DELETE /files/{old_id}.
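
A replace helper might look like the sketch below. Waiting for the new file to become ready before deleting the old one is a design choice added here, not something the API requires: it keeps the old version searchable if the new one fails to process. All three callables are hypothetical stand-ins for your own upload, polling, and delete code.

```python
def replace_file(upload, wait_until_done, delete, new_path, old_id):
    """Upload a new version, then delete the old file.

    upload(path) returns the new file_id, wait_until_done(file_id) returns
    the final status string, and delete(file_id) issues
    DELETE /files/{file_id}.
    """
    new_id = upload(new_path)
    status = wait_until_done(new_id)
    if status != "ready":
        # Keep the old version if the new one did not process.
        raise RuntimeError(f"new file {new_id} ended as {status!r}; old file kept")
    delete(old_id)
    return new_id
```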

Related

Search guide: once files are ready, query them.
API reference: full file endpoint docs.
