Data extraction can be challenging because source content varies widely in layout, template, and file type. The Box AI Extract API provides two AI-powered endpoints, structured extraction and freeform extraction, that standardize how you extract metadata from files in Box. With these endpoints, you can:
  • Extract specific fields into a consistent schema or Box metadata template (structured extraction).
  • Use enhanced extraction for more complex use cases and improved accuracy powered by advanced AI models (structured extraction).
  • Extract content when the target fields are not known ahead of time (freeform extraction).
  • Run extractions without building your own orchestration, permission checks, or throttling logic around the extraction workflow.
  • Extract data from documents in various languages and file formats, including scanned documents and photos.
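As a rough illustration of the freeform case described above, the request body might be built as in the sketch below. The endpoint path `/2.0/ai/extract`, the `prompt` and `items` field names, and the helper `build_freeform_request` are assumptions made for this example and are not stated in this overview; check the API reference before relying on them.

```python
import json

# Assumed path for the freeform endpoint; this overview does not state it.
EXTRACT_FREEFORM_PATH = "/2.0/ai/extract"


def build_freeform_request(file_id, prompt):
    """Build a freeform extraction body (hypothetical helper).

    Freeform extraction takes a natural-language prompt instead of a
    fixed list of fields, which suits documents whose target fields
    are not known ahead of time.
    """
    return {
        "prompt": prompt,
        "items": [{"id": file_id, "type": "file"}],
    }


freeform_body = build_freeform_request(
    "1234567890",  # placeholder file ID
    "List the parties, the effective date, and any renewal terms.",
)
print(json.dumps(freeform_body, indent=2))
```

The body carries no schema, so the shape of the answer depends on the prompt rather than on a predefined template.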
To see what these endpoints can do, consider the following use cases across industries:
Recommended endpoint: structured metadata extraction (POST /2.0/ai/extract_structured)
Ideal for high-volume, standardized documents where you need predictable data types:
  • Automated data entry: Structured metadata extraction returns a consistent JSON response every time, thanks to a preconfigured metadata template.
  • Invoices and purchase orders: Extract line items, totals, and dates.
  • Client contracts: Parse standardized fields like “Effective Date” or “Total Contract Value” to update CRM records.
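A minimal sketch of a structured extraction call for the invoice use case might look like the following. The endpoint path comes from the text above; the host `https://api.box.com`, the `items` and `fields` body layout, the field types, and both helper functions are assumptions for illustration rather than a confirmed schema.

```python
import json
import urllib.request

# Path taken from the text above; the host is an assumption.
EXTRACT_STRUCTURED_URL = "https://api.box.com/2.0/ai/extract_structured"


def build_structured_request(file_id, fields):
    """Build a structured extraction body (hypothetical helper).

    `fields` is a list of (key, type, description) tuples. The body
    layout is an assumed shape, not a confirmed schema.
    """
    return {
        "items": [{"id": file_id, "type": "file"}],
        "fields": [
            {"key": key, "type": field_type, "description": description}
            for key, field_type, description in fields
        ],
    }


def extract_structured(access_token, body):
    """POST the body with a bearer token and return the decoded JSON."""
    request = urllib.request.Request(
        EXTRACT_STRUCTURED_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


invoice_body = build_structured_request(
    "1234567890",  # placeholder file ID
    [
        ("invoice_date", "date", "Date the invoice was issued"),
        ("total_amount", "float", "Grand total including tax"),
    ],
)
print(json.dumps(invoice_body, indent=2))
```

Only the body is built and printed here; calling `extract_structured(token, invoice_body)` would send it, given a valid access token. Declaring the fields up front is what makes the response shape predictable across a batch of similar documents.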
Use advanced features like Optical Character Recognition (OCR) and the enhanced extraction agent for regulated sectors:
  • Know Your Customer (KYC) documents: Verify a user’s identity by extracting text from scanned passports or driver’s licenses.
  • Loan origination: Automate income verification by pulling data from scanned utility bills or pay stubs.
  • Clinical trial enrollment: Extract patient criteria from medical forms to match candidates with trials.
  • Regulatory submissions: Organize and validate the data required for submissions to agencies such as the FDA or EMA.
  • Permit applications: Accelerate zoning approvals by validating required documentation.
  • Public records requests: Automatically classify and prioritize documents for public requests such as Freedom of Information Act (FOIA).
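For the regulated-sector cases above, opting in to the enhanced extraction agent could plausibly be a small addition to a structured extraction body, as sketched below. The `ai_agent` key, the identifier `enhanced_extract_agent`, and the helper `with_enhanced_agent` are assumptions for this example; confirm the exact values in the API reference.

```python
import copy
import json

# Assumed agent reference; verify the type and id against the API reference.
ENHANCED_AGENT = {"type": "ai_agent_id", "id": "enhanced_extract_agent"}


def with_enhanced_agent(body):
    """Return a copy of `body` that requests the enhanced agent.

    The original body is left untouched so the same base request can
    be reused with or without the enhanced agent.
    """
    enhanced = copy.deepcopy(body)
    enhanced["ai_agent"] = dict(ENHANCED_AGENT)
    return enhanced


kyc_body = {
    "items": [{"id": "1234567890", "type": "file"}],  # placeholder file ID
    "fields": [
        {"key": "full_name", "type": "string"},
        {"key": "document_number", "type": "string"},
        {"key": "expiry_date", "type": "date"},
    ],
}
enhanced_kyc_body = with_enhanced_agent(kyc_body)
print(json.dumps(enhanced_kyc_body, indent=2))
```

Keeping the agent choice in a separate step means the field definitions stay the same whichever agent processes the document.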

Core benefits of using the Box AI Extract API

  • Managed scaling: The Box AI Extract API is designed to handle queues and rate limits, so you can run high-volume processing.
  • No defensive code: The Box AI Extract API is model-agnostic, so you can switch between supported LLMs with minimal code changes and reduce vendor lock-in.
  • Security and compliance: All extracted data inherits the enterprise-grade security and governance policies of the Box platform.

Next steps

Get started with the Box AI Extract API using practical, example-led quick start guides, API reference pages, and extensive developer guides: