Data extraction can be challenging because source content varies widely in layout, template, and file type. The Box Extract API provides two AI-powered endpoints, structured extraction and freeform extraction, that standardize how you extract metadata from files in Box. With these endpoints, you can:
  • Extract specific fields into a consistent schema or Box metadata template (structured extraction).
  • Use enhanced extraction for more complex use cases and improved accuracy powered by advanced AI models (structured extraction).
  • Extract content when the target fields are not known ahead of time (freeform extraction).
  • Run extractions without building your own orchestration, permission checks, or throttling logic around the extraction workflow.
  • Extract data from documents in various languages and file formats, including scanned documents and photos.
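The difference between the two extraction modes can be sketched as request payloads. This is a minimal illustration, not the definitive request format: the body keys `items`, `prompt`, and `fields`, and the helper names `freeform_payload` and `structured_payload`, are assumptions made here for illustration, so check the API reference pages for the exact schema.

```python
from __future__ import annotations

import json


def freeform_payload(file_id: str, prompt: str) -> dict:
    """Freeform extraction: the target fields are not known ahead of time,
    so the request carries a natural-language prompt (assumed key: "prompt")."""
    return {
        "prompt": prompt,
        "items": [{"type": "file", "id": file_id}],
    }


def structured_payload(file_id: str, fields: list[dict]) -> dict:
    """Structured extraction: the caller lists the fields it expects back
    (assumed key: "fields"), which yields a consistent response schema."""
    return {
        "items": [{"type": "file", "id": file_id}],
        "fields": fields,
    }


if __name__ == "__main__":
    # Hypothetical file ID and field definitions, for illustration only.
    print(json.dumps(freeform_payload("12345", "List the parties and key dates."), indent=2))
    print(json.dumps(structured_payload("12345", [{"key": "effective_date", "type": "date"}]), indent=2))
```

The structured payload trades flexibility for predictability: because the fields are declared up front, downstream code can rely on the same keys in every response.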
To see what these endpoints make possible, consider the following use cases across industries:
Recommended endpoint: structured metadata extraction (POST /2.0/ai/extract_structured)
Ideal for high-volume, standardized documents where you need predictable data types:
  • Automated data entry: Structured metadata extraction returns a consistent JSON response every time, thanks to a preconfigured metadata template.
  • Invoices and purchase orders: Extract line items, totals, and dates.
  • Client contracts: Parse standardized fields like “Effective Date” or “Total Contract Value” to update CRM records.
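As a rough sketch of the invoice use case above, the following builds (but does not send) a request to POST /2.0/ai/extract_structured using only the Python standard library. The base URL `https://api.box.com`, the body layout, the field keys, and the field `type` values are assumptions for illustration; `build_structured_request` and `INVOICE_FIELDS` are hypothetical names, and the access token and file ID are placeholders.

```python
from __future__ import annotations

import json
import urllib.request

API_BASE = "https://api.box.com"  # assumed base URL

# Hypothetical field definitions for an invoice; keys and types are illustrative.
INVOICE_FIELDS = [
    {"key": "invoice_date", "type": "date", "description": "Date the invoice was issued"},
    {"key": "total", "type": "float", "description": "Total amount due"},
]


def build_structured_request(token: str, file_id: str, fields: list[dict]) -> urllib.request.Request:
    """Build a POST request for structured extraction without sending it."""
    body = {
        "items": [{"type": "file", "id": file_id}],  # assumed body layout
        "fields": fields,
    }
    return urllib.request.Request(
        API_BASE + "/2.0/ai/extract_structured",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_structured_request("YOUR_ACCESS_TOKEN", "12345", INVOICE_FIELDS)
    print(req.get_method(), req.full_url)
```

To send the request, pass the returned object to `urllib.request.urlopen` with a valid access token. Keeping construction separate from sending makes the payload easy to inspect and test.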
Use advanced features like Optical Character Recognition (OCR) and the enhanced extraction agent for regulated sectors:
  • Know Your Customer (KYC) documents: Verify a user’s identity by extracting text from scanned passports or driver’s licenses.
  • Loan origination: Automate income verification by pulling data from scanned utility bills or pay stubs.
  • Clinical trial enrollment: Extract patient criteria from medical forms to match candidates with trials.
  • Regulatory submissions: Organize and validate the data required for submissions to agencies such as the FDA or EMA.
  • Permit applications: Accelerate zoning approvals by validating required documentation.
  • Public records requests: Automatically classify and prioritize documents for public records requests, such as those made under the Freedom of Information Act (FOIA).

Core benefits of using the Box Extract API

  • Managed scaling: The Box Extract API is designed to handle queues and rate limits, so you can run high-volume processing.
  • No defensive code: The Box Extract API is model-agnostic, so you can switch between supported LLMs with minimal code changes and reduce vendor lock-in.
  • Security and compliance: All extracted data inherits the enterprise-grade security and governance policies of the Box platform.

Next steps

Get started with the Box Extract API using practical, example-led quick start guides, API reference pages, and extensive developer guides: