DocExtract API Documentation

Extract structured data from invoices, receipts, and bank statements with our powerful document processing API.

Node.js
npm i docextract
Python
pip install docextract
Go
go get docextract
Ruby
gem install docextract

Authentication

All API requests require authentication using an API key. Include your API key in the X-API-Key header with every request.

HTTP Header
X-API-Key: your_api_key_here
Never expose your API key in client-side code. Always make API requests from your backend server.

You can generate API keys from your Dashboard. Each key can have specific scopes and IP restrictions.
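
As a minimal sketch of the header in practice, the snippet below builds (but does not send) an authenticated request using only Python's standard library. The `build_request` helper and the `DOCEXTRACT_API_KEY` environment variable name are assumptions made for illustration; the endpoint path comes from the Auto Extract section.

```python
import os
import urllib.request

API_BASE = "https://api.docextract.io"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Return a POST request carrying the X-API-Key header (nothing is sent)."""
    if not api_key:
        raise ValueError("API key is empty; refusing to build an unauthenticated request")
    return urllib.request.Request(
        url=API_BASE + path,
        method="POST",
        headers={"X-API-Key": api_key},
    )


if __name__ == "__main__":
    # Read the key from the server's environment, never from client-side code.
    # ASSUMPTION: the variable name DOCEXTRACT_API_KEY is a convention, not a requirement.
    key = os.environ.get("DOCEXTRACT_API_KEY", "your_api_key_here")
    req = build_request("/v1/extract/auto", key)
    print(req.get_method(), req.full_url)
```

Keeping the key in an environment variable on the backend avoids committing it to source control or shipping it to browsers.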

Quickstart

Extract data from a document in just a few lines of code:

JavaScript
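const { DocExtract } = require('docextract');

const client = new DocExtract({ apiKey: 'your_api_key' });

// CommonJS modules have no top-level await, so wrap the call in an async function
async function main() {
  const result = await client.extract({
    file: './invoice.pdf',
    type: 'auto'
  });

  console.log(result.extractedData);
  // { vendor: 'Acme Corp', total: 1250.00, date: '2026-01-10', ... }
}

main().catch(console.error);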
Python
from docextract import DocExtract

client = DocExtract("your_api_key")

result = client.extract(
    file="./invoice.pdf",
    type="auto"
)

print(result.extracted_data)
# {'vendor': 'Acme Corp', 'total': 1250.00, 'date': '2026-01-10', ...}
cURL
curl -X POST https://api.docextract.io/v1/extract/auto \
  -H "X-API-Key: your_api_key" \
  -F "file=@invoice.pdf"

Auto Extract

POST /v1/extract/auto

Automatically detect document type and extract structured data.

Request Parameters

Parameter         Type     Required  Description
file              file     required  The document file (PDF, PNG, JPG, TIFF)
include_raw_text  boolean  optional  Include raw OCR text in response (default: false)
webhook_url       string   optional  URL to receive async processing results

Response

200 OK
{
  "success": true,
  "documentType": "invoice",
  "detectedType": "invoice",
  "confidence": {
    "overall": 0.94,
    "fields": {
      "vendor": 0.98,
      "total": 0.95
    }
  },
  "extractedData": {
    "vendor": "Acme Corporation",
    "invoiceNumber": "INV-2026-001",
    "date": "2026-01-10",
    "dueDate": "2026-02-10",
    "subtotal": 1150.00,
    "tax": 100.00,
    "total": 1250.00,
    "currency": "USD",
    "lineItems": [...]
  },
  "processingTimeMs": 1234
}
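
The per-field confidence scores lend themselves to a manual-review gate. The sketch below is one possible approach, not part of the SDK: the `low_confidence_fields` helper and the 0.9 threshold are assumptions, and it treats any extracted field without a per-field score as needing review, since the sample response only lists scores for some fields.

```python
from typing import Any


def low_confidence_fields(response: dict[str, Any], threshold: float = 0.9) -> list[str]:
    """Return the names of extracted fields that should be reviewed by a person.

    A field is flagged when its per-field confidence is below the threshold,
    or when the response carries no per-field score for it at all.
    """
    scores = response.get("confidence", {}).get("fields", {})
    extracted = response.get("extractedData", {})
    return sorted(
        name for name in extracted
        if scores.get(name, 0.0) < threshold
    )


# Trimmed-down version of the sample response shown above.
sample = {
    "success": True,
    "confidence": {"overall": 0.94, "fields": {"vendor": 0.98, "total": 0.95}},
    "extractedData": {
        "vendor": "Acme Corporation",
        "total": 1250.00,
        "dueDate": "2026-02-10",
    },
}

print(low_confidence_fields(sample))        # ['dueDate']  (no per-field score)
print(low_confidence_fields(sample, 0.96))  # ['dueDate', 'total']
```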

Extract Invoice

POST /v1/extract/invoice

Extract structured data specifically from invoices.

Extracted Fields

Field          Type    Description
vendor         string  Vendor/supplier name
invoiceNumber  string  Invoice reference number
date           string  Invoice date (ISO 8601)
dueDate        string  Payment due date
subtotal       number  Subtotal before tax
tax            number  Tax amount
total          number  Total amount
currency       string  Currency code (USD, EUR, etc.)
lineItems      array   List of line items
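
Since `subtotal`, `tax`, and `total` are returned as separate numbers, a simple arithmetic check can catch some OCR misreads. The following is a heuristic sketch under the assumption that total equals subtotal plus tax; invoices with discounts, shipping, or other adjustments may legitimately fail it. The `totals_are_consistent` helper name is made up for this example.

```python
from decimal import Decimal


def totals_are_consistent(data: dict, tolerance: str = "0.01") -> bool:
    """Check that subtotal + tax matches total within a small tolerance."""
    # Convert through str() so JSON floats such as 1150.0 become exact decimals.
    subtotal = Decimal(str(data["subtotal"]))
    tax = Decimal(str(data["tax"]))
    total = Decimal(str(data["total"]))
    return abs(subtotal + tax - total) <= Decimal(tolerance)


# Values taken from the Auto Extract sample response.
invoice = {"subtotal": 1150.00, "tax": 100.00, "total": 1250.00}

print(totals_are_consistent(invoice))                         # True
print(totals_are_consistent({**invoice, "total": 1520.00}))   # False
```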

Extract Receipt

POST /v1/extract/receipt

Extract structured data from receipts and purchase records.

Extracted Fields

Field          Type    Description
merchant       string  Store/merchant name
date           string  Transaction date
time           string  Transaction time
items          array   Purchased items
subtotal       number  Subtotal amount
tax            number  Tax amount
total          number  Total amount
paymentMethod  string  Payment method used

Extract Bank Statement

POST /v1/extract/bank-statement

Extract transaction data from bank statements.

Extracted Fields

Field            Type    Description
bankName         string  Financial institution name
accountNumber    string  Account number (masked)
statementPeriod  object  Start and end dates
openingBalance   number  Starting balance
closingBalance   number  Ending balance
transactions     array   List of transactions
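
The opening and closing balances allow a reconciliation check against the extracted transactions. The shape of each transaction is not documented here, so this sketch assumes every entry has a signed `amount` (credits positive, debits negative); both that field name and the `reconciles` helper are hypothetical.

```python
from decimal import Decimal


def reconciles(statement: dict) -> bool:
    """True when opening balance plus all transaction amounts equals closing balance."""
    opening = Decimal(str(statement["openingBalance"]))
    closing = Decimal(str(statement["closingBalance"]))
    # ASSUMPTION: each transaction has a signed "amount" (credits > 0, debits < 0).
    movement = sum(
        (Decimal(str(tx["amount"])) for tx in statement["transactions"]),
        Decimal("0"),
    )
    return opening + movement == closing


statement = {
    "openingBalance": 1000.00,
    "closingBalance": 1175.50,
    "transactions": [{"amount": 250.00}, {"amount": -74.50}],
}

print(reconciles(statement))  # True
```

A failed reconciliation usually points to a missed or misread transaction and is a reasonable trigger for manual review.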

Detect Document Type

POST /v1/detect

Detect document type without extracting data.

200 OK
{
  "type": "invoice",
  "confidence": 0.96,
  "signals": ["invoice_number", "due_date", "line_items"]
}
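
One use for the detect response is routing a document to a type-specific endpoint, with a fallback to auto extraction when the detector is unsure. The sketch below assumes the `type` values `receipt` and `bank_statement` (only `invoice` appears in this documentation) and an arbitrary 0.8 confidence cutoff; `choose_endpoint` is a hypothetical helper.

```python
ENDPOINTS = {
    "invoice": "/v1/extract/invoice",
    # ASSUMPTION: these two type strings are guesses; only "invoice" is documented.
    "receipt": "/v1/extract/receipt",
    "bank_statement": "/v1/extract/bank-statement",
}
FALLBACK = "/v1/extract/auto"


def choose_endpoint(detection: dict, min_confidence: float = 0.8) -> str:
    """Pick an extraction endpoint from a /v1/detect response."""
    if detection.get("confidence", 0.0) < min_confidence:
        return FALLBACK
    # Unknown or missing types also fall back to auto extraction.
    return ENDPOINTS.get(detection.get("type"), FALLBACK)


detected = {
    "type": "invoice",
    "confidence": 0.96,
    "signals": ["invoice_number", "due_date", "line_items"],
}

print(choose_endpoint(detected))                                  # /v1/extract/invoice
print(choose_endpoint({"type": "invoice", "confidence": 0.41}))   # /v1/extract/auto
```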

Node.js SDK

Full-featured SDK with TypeScript support and async handling.

Installation
npm install docextract
JavaScript
const { DocExtract } = require('docextract');
const fs = require('fs');

// Initialize client
const client = new DocExtract({
  apiKey: process.env.DOCEXTRACT_API_KEY
});

// Extract from file path
const result = await client.extract({
  file: './documents/invoice.pdf',
  type: 'auto'
});

// Extract from buffer
const buffer = fs.readFileSync('./invoice.pdf');
const result2 = await client.extract({
  file: buffer,
  filename: 'invoice.pdf',
  type: 'invoice'
});

// Extract from URL
const result3 = await client.extractFromUrl({
  url: 'https://example.com/invoice.pdf',
  type: 'auto'
});

// Batch processing
const results = await client.extractBatch({
  files: ['./inv1.pdf', './inv2.pdf', './inv3.pdf'],
  type: 'invoice',
  concurrency: 3
});

console.log(result.extractedData);

TypeScript Support

TypeScript
import { DocExtract, InvoiceData, ExtractionResult } from 'docextract';

const client = new DocExtract({ apiKey: process.env.DOCEXTRACT_API_KEY! });

const result: ExtractionResult<InvoiceData> = await client.extract({
  file: './invoice.pdf',
  type: 'invoice'
});

// Fully typed extracted data
console.log(result.extractedData.vendor);
console.log(result.extractedData.total);
console.log(result.extractedData.lineItems);

Python SDK

Pythonic SDK with sync and async support.

Installation
pip install docextract
Python
import os
from docextract import DocExtract

# Initialize client
client = DocExtract(api_key=os.environ["DOCEXTRACT_API_KEY"])

# Extract from file path
result = client.extract(
    file="./documents/invoice.pdf",
    type="auto"
)

# Extract from bytes
with open("invoice.pdf", "rb") as f:
    result = client.extract(
        file=f.read(),
        filename="invoice.pdf",
        type="invoice"
    )

# Extract from URL
result = client.extract_from_url(
    url="https://example.com/invoice.pdf",
    type="auto"
)

print(result.extracted_data)

Async Support

Python (Async)
import asyncio
import os
from docextract import AsyncDocExtract

async def process_documents():
    client = AsyncDocExtract(api_key=os.environ["DOCEXTRACT_API_KEY"])

    # Process multiple documents concurrently
    tasks = [
        client.extract(file="inv1.pdf"),
        client.extract(file="inv2.pdf"),
        client.extract(file="inv3.pdf")
    ]

    results = await asyncio.gather(*tasks)

    for result in results:
        print(result.extracted_data)

asyncio.run(process_documents())

cURL Examples

Auto Extract from File
curl -X POST https://api.docextract.io/v1/extract/auto \
  -H "X-API-Key: your_api_key" \
  -F "file=@invoice.pdf"
Extract with Options
curl -X POST https://api.docextract.io/v1/extract/invoice \
  -H "X-API-Key: your_api_key" \
  -F "file=@invoice.pdf" \
  -F "include_raw_text=true"
Extract from URL
curl -X POST https://api.docextract.io/v1/extract/auto \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/invoice.pdf"}'
Detect Document Type
curl -X POST https://api.docextract.io/v1/detect \
  -H "X-API-Key: your_api_key" \
  -F "file=@invoice.pdf"

Error Handling

The API uses standard HTTP status codes and returns detailed error information.

Code  Error               Description
400   VALIDATION_ERROR    Invalid request parameters
401   INVALID_API_KEY     Invalid or missing API key
403   FORBIDDEN           Insufficient permissions
404   NOT_FOUND           Resource doesn't exist
413   FILE_TOO_LARGE      File exceeds 10 MB limit
415   UNSUPPORTED_FORMAT  File format not supported
422   EXTRACTION_FAILED   Document couldn't be processed
422   UNREADABLE_PDF      PDF text extraction failed
422   LOW_QUALITY_IMAGE   Image quality too low for OCR
429   RATE_LIMITED        Too many requests
500   SERVER_ERROR        Internal server error
Error Response
{
  "success": false,
  "error": {
    "code": "UNREADABLE_PDF",
    "message": "Could not extract text from PDF",
    "details": {
      "suggestion": "Upload a higher quality image or native PDF"
    }
  }
}
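
For callers using raw HTTP rather than an SDK, the error body can be flattened into something easier to act on. This is a sketch, not the SDK's behavior: `parse_error` is a hypothetical helper, and treating only 429 and 500 as retryable is an assumption based on the usual meaning of those status codes (the 4xx errors in the table generally call for a different request or file).

```python
import json

# ASSUMPTION: only rate limiting and server errors are worth retrying as-is.
RETRYABLE_STATUS = {429, 500}


def parse_error(status: int, body: str) -> dict:
    """Flatten an error response into code, message, suggestion, and a retry hint."""
    payload = json.loads(body)
    err = payload.get("error", {})
    return {
        "code": err.get("code", "UNKNOWN"),
        "message": err.get("message", ""),
        "suggestion": err.get("details", {}).get("suggestion"),
        "retryable": status in RETRYABLE_STATUS,
    }


body = """
{
  "success": false,
  "error": {
    "code": "UNREADABLE_PDF",
    "message": "Could not extract text from PDF",
    "details": {"suggestion": "Upload a higher quality image or native PDF"}
  }
}
"""

info = parse_error(422, body)
print(info["code"], info["retryable"])  # UNREADABLE_PDF False
```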

SDK Error Handling

JavaScript
import { DocExtract, DocExtractError } from 'docextract';

try {
  const result = await client.extract({ file: './doc.pdf' });
} catch (error) {
  if (error instanceof DocExtractError) {
    console.log(error.code);       // 'UNREADABLE_PDF'
    console.log(error.message);    // 'Could not extract text from PDF'
    console.log(error.statusCode); // 422
    console.log(error.suggestion); // 'Upload a higher quality...'
  }
}
Python
from docextract import DocExtract, DocExtractError

try:
    result = client.extract(file="doc.pdf")
except DocExtractError as e:
    print(e.code)        # 'UNREADABLE_PDF'
    print(e.message)     # 'Could not extract text from PDF'
    print(e.status_code) # 422
    print(e.suggestion)  # 'Upload a higher quality...'

Rate Limits

Rate limits vary by plan:

Plan        Requests/min  Requests/day  File Size
Free        10            100           5 MB
Starter     60            5,000         10 MB
Pro         300           50,000        25 MB
Enterprise  Custom        Custom        Custom

Rate limit headers are included in all responses:
X-RateLimit-Limit - Maximum requests allowed
X-RateLimit-Remaining - Requests remaining
X-RateLimit-Reset - Unix timestamp when limit resets
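
A client can use these headers to pause before it hits a 429. The sketch below assumes `X-RateLimit-Reset` is expressed in seconds since the Unix epoch and that header names are looked up with the exact casing shown; `seconds_until_allowed` is a hypothetical helper.

```python
import time
from typing import Mapping, Optional


def seconds_until_allowed(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request (0.0 if quota remains)."""
    now = time.time() if now is None else now
    # If the header is missing, assume quota is available rather than stalling.
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # ASSUMPTION: the reset value is a Unix timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


exhausted = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1000",
}

print(seconds_until_allowed(exhausted, now=940.0))  # 60.0
```

Passing `now` explicitly keeps the function easy to test; in production code the default of `time.time()` would be used.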

Webhooks

Receive async notifications when documents are processed.

Configuring Webhooks

Pass a webhook_url parameter with your extraction request:

Request with Webhook
curl -X POST https://api.docextract.io/v1/extract/auto \
  -H "X-API-Key: your_api_key" \
  -F "file=@invoice.pdf" \
  -F "webhook_url=https://yourapp.com/webhooks/docextract"

Webhook Payload

{
  "event": "extraction.completed",
  "timestamp": "2026-01-11T10:30:00Z",
  "data": {
    "id": "ext_abc123",
    "documentType": "invoice",
    "extractedData": { ... },
    "confidence": {
      "overall": 0.94
    },
    "processingTimeMs": 1234
  }
}

Verifying Webhooks

All webhook requests include an X-DocExtract-Signature header for verification.

Node.js Verification
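const crypto = require('crypto');

function verifyWebhook(payload, signature, secret) {
  // Pass the raw request body as payload; re-serialized JSON may not match byte for byte
  if (typeof signature !== 'string') {
    return false;
  }
  const expected = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex');
  const received = Buffer.from(signature);
  const computed = Buffer.from(expected);
  // timingSafeEqual throws on unequal lengths, so reject those up front
  if (received.length !== computed.length) {
    return false;
  }
  return crypto.timingSafeEqual(received, computed);
}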
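
For Python backends, an equivalent check can be sketched with the standard library. It follows the same scheme as the Node.js function above, a hex-encoded HMAC-SHA256 of the raw request body; that the header carries exactly this value with no prefix is an assumption inferred from that example. `verify_webhook` is a hypothetical helper name.

```python
import hashlib
import hmac


def verify_webhook(payload: bytes, signature: str, secret: str) -> bool:
    """Return True if signature is the hex HMAC-SHA256 of payload under secret."""
    expected = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    # compare_digest takes time independent of where the inputs differ and
    # returns False (rather than raising) when the lengths do not match.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


secret = "whsec_example"  # placeholder, not a real secret format
body = b'{"event": "extraction.completed"}'
good_signature = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()

print(verify_webhook(body, good_signature, secret))  # True
print(verify_webhook(body, "deadbeef", secret))      # False
```

Verify against the bytes exactly as received, before any JSON parsing, so that whitespace or key-order changes cannot invalidate a genuine signature.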
Webhooks are retried up to 3 times with exponential backoff if your endpoint returns a non-2xx status code.
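
Because deliveries are retried on any non-2xx response, the same event can arrive more than once, so a handler is safer when it is idempotent. The sketch below deduplicates on `data.id` from the payload shown earlier; that this id stays the same across retries is an assumption, and the in-memory set stands in for whatever persistent store a real service would use. `WebhookHandler` is a hypothetical class.

```python
import json


class WebhookHandler:
    """Processes each extraction at most once, keyed by data.id."""

    def __init__(self) -> None:
        self._seen: set[str] = set()   # replace with a database table in production
        self.processed: list[dict] = []

    def handle(self, raw_body: bytes) -> int:
        """Return the HTTP status code the endpoint should respond with."""
        try:
            event = json.loads(raw_body)
            extraction_id = event["data"]["id"]
        except (ValueError, KeyError, TypeError):
            return 400  # malformed payload
        if extraction_id in self._seen:
            return 200  # duplicate delivery: acknowledge without reprocessing
        if event.get("event") == "extraction.completed":
            self.processed.append(event["data"])
        self._seen.add(extraction_id)
        return 200


handler = WebhookHandler()
delivery = json.dumps({
    "event": "extraction.completed",
    "timestamp": "2026-01-11T10:30:00Z",
    "data": {"id": "ext_abc123", "documentType": "invoice"},
}).encode("utf-8")

print(handler.handle(delivery))   # 200
print(handler.handle(delivery))   # 200 (retry is acknowledged but ignored)
print(len(handler.processed))     # 1
```

Responding quickly with a 2xx and doing heavy work afterwards reduces the chance of a timeout triggering an unnecessary retry.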