Enterprise procurement documents rarely contain just a few flat fields. A single supplier agreement can include vendor contact details, delivery milestones, ship-to locations, and payment terms across paragraphs and tables. This tutorial shows you how to turn that complexity into clean, structured output. You use the struct and table field types in Box AI structured extraction to define a schema that matches the shape of the document, then call the extract endpoint with the Enhanced Extract Agent to return data that is ready for downstream systems, automations, and Box metadata.

What you are building

By the end of this tutorial, you have a working Python project that:
  • Authenticates to Box and reads a supplier agreement stored in a Box folder.
  • Defines an extraction schema that uses a struct field for grouped vendor details and a table field for a repeating delivery schedule.
  • Calls the POST /2.0/ai/extract_structured endpoint with the Enhanced Extract Agent.
  • Maps the response into a procurement record that you can push to an ERP, procurement platform, or project tracker.

Why struct and table

The POST /2.0/ai/extract_structured endpoint supports two complex field types in addition to scalar types such as string, float, date, enum, and multiSelect:
Field type | Returns | Use it for
struct | A single nested JSON object | Related values that belong together, such as a vendor’s legal name, address, and contact details.
table | An array of JSON objects with the same schema | Repeating rows, such as delivery milestones, line items, or payment schedules.
These types let you mirror how the data is actually used. Instead of parsing one long vendor block or collapsing a delivery schedule into a single string after extraction, you map the result directly into a supplier master record or procurement workflow.
For a full reference of both field types and their supported sub-field types, see the Box AI structured extraction documentation.
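As a small illustration of the difference between the two types, the sketch below reads a struct result as a dictionary and a table result as a list of dictionaries. The sample is hand-written and trimmed from the example output shown later in this tutorial; the variable names are illustrative only.

```python
# Hand-written sample shaped like an extraction answer; values are illustrative.
answer = {
    "vendor": {  # struct: one nested object
        "legal_name": "Nexus Industrial Solutions Ltd.",
        "country": "United Kingdom",
    },
    "delivery_schedule": [  # table: one object per row, same keys in each row
        {"phase": "Phase 1 - Infrastructure", "delivery_date": "2026-07-15"},
        {"phase": "Phase 2 - Networking", "delivery_date": "2026-08-01"},
    ],
}

# A struct is read by key, like any dictionary.
vendor_name = answer["vendor"]["legal_name"]

# A table is iterated row by row.
delivery_dates = [row["delivery_date"] for row in answer["delivery_schedule"]]

print(vendor_name)
print(delivery_dates)
```

Because each type already has the shape the consumer expects, no string parsing is needed after extraction.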

Prerequisites

Before you start, make sure you have the following:
  • A free Box developer account, which gives you access to the Box AI API.
  • A Box application configured with Client Credentials Grant authentication.
  • Python 3.11 or higher.
  • A supplier agreement uploaded to Box. You can download a sample agreement and upload it to a Box folder. Note its file ID from the URL. For example, if the URL is https://app.box.com/file/123456789, the file ID is 123456789.
  • The following scopes enabled on your app:
    • Read and write all files and folders stored in Box
    • Manage AI

Step-by-step process

This tutorial uses one Box Platform capability and one Box AI agent:
Component | Purpose | API
Box AI Extract | Pull grouped and repeating structured data from the agreement | POST /2.0/ai/extract_structured
Enhanced Extract Agent | Improve accuracy for nested and repeating fields | ai_agent.id = enhanced_extract_agent
The Enhanced Extract Agent is a predefined Box AI agent, so you do not create or configure it yourself. You reference it by its ID (enhanced_extract_agent) in the ai_agent parameter of your extraction request, which you build in the "Build the extraction function" step below. To learn more, see the Enhanced Extract Agent reference.

Step 1: Set up the development environment

  1. Open your terminal and create a new project directory:
mkdir supplier-extraction && cd supplier-extraction
  2. Create and activate a Python virtual environment:
python3 -m venv .venv
source .venv/bin/activate
After activation, your terminal prompt shows (.venv) at the beginning. This confirms you are working inside the virtual environment.
Every time you open a new terminal window or tab, re-activate the virtual environment by running source .venv/bin/activate from the project directory. If you see ModuleNotFoundError when running commands, it usually means the venv is not activated.
  3. Install the required packages:
pip install box-sdk-gen python-dotenv
  4. Create a .env file to store your credentials, then add the following content. Replace the placeholder values with your actual credentials from the Box Developer Console:
BOX_CLIENT_ID=your_client_id
BOX_CLIENT_SECRET=your_client_secret
BOX_ENTERPRISE_ID=your_enterprise_id
AGREEMENT_FILE_ID=your_file_id
Never commit .env files to version control. Add .env to your .gitignore.
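Before moving on, it can help to confirm that none of the four values is missing or still set to its placeholder. The following is a minimal sketch that uses only the standard library; the helper name missing_env_vars and the "your_" placeholder check are assumptions made for this example, not part of the Box SDK.

```python
import os
from typing import Mapping

REQUIRED_VARS = (
    "BOX_CLIENT_ID",
    "BOX_CLIENT_SECRET",
    "BOX_ENTERPRISE_ID",
    "AGREEMENT_FILE_ID",
)
# Assumption: placeholders look like the sample values in the .env template.
PLACEHOLDER_PREFIX = "your_"


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset, empty, or placeholders."""
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value or value.startswith(PLACEHOLDER_PREFIX):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Call load_dotenv() first if the values live in a .env file.
    problems = missing_env_vars()
    if problems:
        print("Missing or placeholder values:", ", ".join(problems))
    else:
        print("All required variables are set.")
```

Passing the environment as an argument keeps the check easy to exercise with a plain dictionary.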

Step 2: Authenticate the Box client

Create a file called box_client.py and add the following code. It builds an authenticated client that you use to call Box AI through the SDK.
import os
from dotenv import load_dotenv
from box_sdk_gen import BoxClient, BoxCCGAuth, CCGConfig

load_dotenv()

def get_box_client() -> BoxClient:
    config = CCGConfig(
        client_id=os.getenv("BOX_CLIENT_ID"),
        client_secret=os.getenv("BOX_CLIENT_SECRET"),
        enterprise_id=os.getenv("BOX_ENTERPRISE_ID"),
    )
    auth = BoxCCGAuth(config=config)
    return BoxClient(auth=auth)
Client Credentials Grant is recommended for server-to-server automations where no end user is present. For other authentication options, see the Box authentication documentation.
A CCG application acts as a separate service account user that does not automatically have access to your content. Invite the service account email (found in the Developer Console under General Settings) as a collaborator on the folder that contains your agreement. Without access, API calls return 404 Not found.
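To confirm the collaboration is in place before you run an extraction, you could ask the client which identity it is using and whether it can read the file. This is a rough sketch: check_access is a hypothetical helper, and it assumes the client is the BoxClient returned by get_box_client(), using the SDK's users.get_user_me and files.get_file_by_id calls.

```python
def check_access(client, file_id: str) -> dict:
    """Report which identity the client uses and whether it can read the file.

    `client` is assumed to be the BoxClient returned by get_box_client().
    """
    me = client.users.get_user_me()
    report = {"login": me.login, "file_name": None, "error": None}
    try:
        report["file_name"] = client.files.get_file_by_id(file_id).name
    except Exception as exc:  # broad on purpose: this sketch only reports the failure
        report["error"] = str(exc)
    return report
```

If error is set, compare the reported login with the service account email you invited to the folder.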

Step 3: Define the extraction schema

Create a file called schema.py and add the following code. The schema describes the agreement using two complex fields:
  • vendor is a struct field that groups related vendor details into one nested object.
  • delivery_schedule is a table field that returns one row per milestone.
Each complex field requires a fields array that defines its sub-fields. Sub-fields support scalar types only. Nested struct or table types are not allowed.
from box_sdk_gen import CreateAiExtractStructuredFields

EXTRACTION_FIELDS = [
    CreateAiExtractStructuredFields(
        key="agreement_effective_date",
        display_name="Effective date",
        type="date",
        prompt="The date the agreement takes effect.",
    ),
    CreateAiExtractStructuredFields(
        key="payment_terms",
        display_name="Payment terms",
        type="string",
        prompt="The payment terms, for example Net 30.",
    ),
    CreateAiExtractStructuredFields(
        key="vendor",
        display_name="Vendor",
        type="struct",
        prompt="The vendor or supplier issuing the agreement.",
        fields=[
            CreateAiExtractStructuredFields(key="legal_name", type="string", display_name="Legal name"),
            CreateAiExtractStructuredFields(key="address", type="string", display_name="Address"),
            CreateAiExtractStructuredFields(key="country", type="string", display_name="Country"),
            CreateAiExtractStructuredFields(key="contact_person", type="string", display_name="Contact person"),
            CreateAiExtractStructuredFields(key="email", type="string", display_name="Email"),
            CreateAiExtractStructuredFields(key="phone", type="string", display_name="Phone"),
        ],
    ),
    CreateAiExtractStructuredFields(
        key="delivery_schedule",
        display_name="Delivery schedule",
        type="table",
        prompt="Each delivery milestone defined in the agreement.",
        fields=[
            CreateAiExtractStructuredFields(key="phase", type="string", display_name="Phase"),
            CreateAiExtractStructuredFields(key="items", type="string", display_name="Items"),
            CreateAiExtractStructuredFields(key="delivery_date", type="date", display_name="Delivery date"),
            CreateAiExtractStructuredFields(key="destination", type="string", display_name="Destination"),
            CreateAiExtractStructuredFields(key="status", type="string", display_name="Status"),
        ],
    ),
]
Table extraction is not limited to visually formatted tables. The table type extracts repeating data whether it appears as a grid, key-value pairs, a form layout, or plain prose.
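The sub-field rules for this step (every struct or table field needs a fields array, and sub-fields must be scalar) can be checked locally before you send a request. The sketch below works on plain dictionaries that mirror the schema rather than on SDK objects. The function name schema_problems is hypothetical, and the scalar set contains only the types named in this tutorial, so treat it as an assumption.

```python
# Assumption: the scalar types are the ones named in this tutorial.
SCALAR_TYPES = {"string", "float", "date", "enum", "multiSelect"}
COMPLEX_TYPES = {"struct", "table"}


def schema_problems(fields: list[dict]) -> list[str]:
    """Return human-readable problems found in a list of field definitions."""
    problems = []
    for field in fields:
        key = field.get("key", "<missing key>")
        if field.get("type") not in COMPLEX_TYPES:
            continue  # scalar top-level fields have no sub-field rules
        sub_fields = field.get("fields") or []
        if not sub_fields:
            problems.append(
                f"{key}: a {field['type']} field needs a non-empty fields array"
            )
        for sub in sub_fields:
            if sub.get("type") not in SCALAR_TYPES:
                problems.append(
                    f"{key}.{sub.get('key')}: sub-field type "
                    f"{sub.get('type')!r} is not scalar"
                )
    return problems
```

An empty list means the schema passes both rules; anything else names the offending field.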

Step 4: Build the extraction function

Create a file called extract.py and add the following code. It uses the SDK’s create_ai_extract_structured method to send the schema to Box AI and specifies the Enhanced Extract Agent, which improves accuracy for nested and repeating fields.
from box_sdk_gen import AiItemBase, AiAgentReference, AiAgentReferenceTypeField
from box_client import get_box_client
from schema import EXTRACTION_FIELDS

def extract_agreement(file_id: str) -> dict:
    client = get_box_client()

    response = client.ai.create_ai_extract_structured(
        items=[AiItemBase(id=file_id)],
        fields=EXTRACTION_FIELDS,
        ai_agent=AiAgentReference(
            id="enhanced_extract_agent",
            type=AiAgentReferenceTypeField.AI_AGENT_ID,
        ),
    )

    # The extracted values are returned under the "answer" key of the response.
    return response.to_dict()["answer"]
The Enhanced Extract Agent is not strictly required, but it improves results for richer schemas and complex document layouts, especially when nested and repeating fields are involved.
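If you call the endpoint without the SDK, the same request can be expressed as a JSON body. The sketch below only builds and prints that body; it does not send it. The property names (items, fields, ai_agent, and the "ai_agent_id" type value) reflect my understanding of the REST request and should be checked against the API reference. The helper name build_extract_payload is hypothetical.

```python
import json


def build_extract_payload(file_id: str, fields: list[dict]) -> dict:
    """Assemble a JSON body for POST /2.0/ai/extract_structured (sketch)."""
    return {
        "items": [{"id": file_id, "type": "file"}],
        "fields": fields,
        # Reference the predefined agent by ID rather than configuring one.
        "ai_agent": {"type": "ai_agent_id", "id": "enhanced_extract_agent"},
    }


payload = build_extract_payload(
    "123456789",  # example file ID from the prerequisites
    [
        {"key": "payment_terms", "type": "string"},
        {
            "key": "vendor",
            "type": "struct",
            "prompt": "The vendor or supplier issuing the agreement.",
            "fields": [{"key": "legal_name", "type": "string"}],
        },
    ],
)
print(json.dumps(payload, indent=2))
```

Seeing the raw body can make it easier to compare what the SDK sends with what the API reference describes.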

Step 5: Map the structured output

Create a file called app.py and add the following code. Box AI returns the vendor field as a nested object and the delivery_schedule field as a list of rows. This script runs the extraction and maps the result into a compact procurement record, with vendor details at the top level and one entry per milestone, that is ready for a downstream system.
import json
import os
from dotenv import load_dotenv
from extract import extract_agreement

load_dotenv()

def to_procurement_record(answer: dict) -> dict:
    # "or" also covers a value that comes back as null, not only a missing key.
    vendor = answer.get("vendor") or {}
    schedule = answer.get("delivery_schedule") or []

    return {
        "vendor_name": vendor.get("legal_name"),
        "vendor_country": vendor.get("country"),
        "vendor_contact": vendor.get("contact_person"),
        "vendor_email": vendor.get("email"),
        "payment_terms": answer.get("payment_terms"),
        "effective_date": answer.get("agreement_effective_date"),
        "milestones": [
            {
                "phase": row.get("phase"),
                "due": row.get("delivery_date"),
                "destination": row.get("destination"),
                "status": row.get("status"),
            }
            for row in schedule
        ],
    }

if __name__ == "__main__":
    file_id = os.getenv("AGREEMENT_FILE_ID")
    if not file_id:
        raise SystemExit("Set AGREEMENT_FILE_ID in your .env file.")

    answer = extract_agreement(file_id)
    print("Raw extraction:")
    print(json.dumps(answer, indent=2))

    record = to_procurement_record(answer)
    print("\nProcurement record:")
    print(json.dumps(record, indent=2))
At this point, your project directory should contain the following files:
supplier-extraction/
├── .env
├── .venv/
├── app.py
├── box_client.py
├── extract.py
└── schema.py

Step 6: Run the extraction

Make sure you are in the supplier-extraction directory with the virtual environment activated, then run the script:
python3 app.py
Box AI returns the struct field as a single nested object and the table field as a list of objects. Your output looks similar to this:
{
  "agreement_effective_date": "2026-06-01",
  "payment_terms": "Net 45",
  "vendor": {
    "legal_name": "Nexus Industrial Solutions Ltd.",
    "address": "47 Canary Wharf, Level 12, London, E14 5AB",
    "country": "United Kingdom",
    "contact_person": "Oliver Hartmann",
    "email": "o.hartmann@nexusindustrial.co.uk",
    "phone": "+44 20 7946 0832"
  },
  "delivery_schedule": [
    {
      "phase": "Phase 1 - Infrastructure",
      "items": "Server Racks (qty 6), UPS (qty 3)",
      "delivery_date": "2026-07-15",
      "destination": "Chicago, IL",
      "status": "Scheduled"
    },
    {
      "phase": "Phase 2 - Networking",
      "items": "48-Port Switches (all), Fibre Cables (all)",
      "delivery_date": "2026-08-01",
      "destination": "Chicago, IL",
      "status": "Scheduled"
    },
    {
      "phase": "Phase 3 - Security",
      "items": "NGFW Cluster (all), Remaining Racks",
      "delivery_date": "2026-08-20",
      "destination": "San Francisco, CA",
      "status": "Pending"
    },
    {
      "phase": "Phase 4 - Services",
      "items": "Installation, Training",
      "delivery_date": "2026-09-01",
      "destination": "San Francisco, CA",
      "status": "Pending"
    }
  ]
}
The script then prints the mapped procurement record, which you can send to your ERP, procurement platform, or project tracker.
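Many downstream systems import tabular data, so one possible hand-off is a CSV with one row per milestone. The sketch below assumes the record has the shape returned by to_procurement_record; the helper names milestone_rows and to_csv are hypothetical, and which vendor columns to repeat on each row is an arbitrary choice for illustration.

```python
import csv
import io


def milestone_rows(record: dict) -> list[dict]:
    """Repeat shared columns on every milestone so each row stands alone."""
    shared = {
        "vendor_name": record.get("vendor_name"),
        "payment_terms": record.get("payment_terms"),
    }
    return [{**shared, **milestone} for milestone in record.get("milestones") or []]


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The header comes from the first row, which works here because every milestone produced by to_procurement_record has the same keys.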

Troubleshooting

If you see ModuleNotFoundError, your virtual environment is not activated. Run source .venv/bin/activate from the project directory before running any python3 commands. Each new terminal tab needs its own activation.
If authentication fails, check your .env file:
  • Verify BOX_CLIENT_ID and BOX_CLIENT_SECRET match the values in Developer Console > Configuration.
  • Confirm BOX_ENTERPRISE_ID is your enterprise ID.
  • Ensure your app is authorized in the Developer Console and uses Client Credentials Grant.
If API calls return 404 Not found, the service account does not have access to the file. Invite the service account email (found in Developer Console > General Settings) as a collaborator on the folder that contains your agreement.
If the request fails because of the schema, remember that nested struct and table types are not supported as sub-fields, and that a struct or table field must include a fields array. Confirm your sub-fields use only scalar types, and add a prompt to the complex field to clarify what to extract.

Scaling to production

To make agreements searchable and routable inside Box, write the flattened top-level values back to the file as a metadata instance, then use metadata queries to filter by vendor, country, or effective date. See the Invoice intake automation tutorial for an end-to-end metadata write-back pattern.
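One way to prepare the values for write-back is to flatten the answer into a single level of keys. The sketch below is an assumption-heavy illustration: the function name to_metadata_values and the key naming (for example vendor_legal_name) are hypothetical, and your metadata template would need fields with matching keys and types.

```python
def to_metadata_values(answer: dict) -> dict:
    """Flatten scalar fields and struct sub-fields; skip table fields."""
    flat = {}
    for key, value in answer.items():
        if isinstance(value, dict):
            # struct: prefix each sub-field with the parent key
            for sub_key, sub_value in value.items():
                if sub_value is not None:
                    flat[f"{key}_{sub_key}"] = sub_value
        elif isinstance(value, list):
            continue  # table rows are left out of this sketch
        elif value is not None:
            flat[key] = value
    return flat
```

Table rows are skipped here because a list of row objects has no single-value equivalent; summarize them first (for example, a milestone count) if you want them on the file.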
Instead of running the script manually, register a webhook on your agreements folder so each upload triggers extraction. See the Box webhooks documentation on signature verification to validate incoming requests in production.
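Once a request has been validated, the handler needs the ID of the uploaded file to pass to extract_agreement. The sketch below assumes the webhook payload carries a trigger name and a source object with type and id, which is my understanding of the FILE.UPLOADED event; file_id_from_event is a hypothetical helper, and the payload shape should be confirmed against the webhook documentation.

```python
from typing import Optional


def file_id_from_event(event: dict) -> Optional[str]:
    """Return the uploaded file's ID, or None if the event should be ignored."""
    # Assumption: upload events use the trigger name "FILE.UPLOADED".
    if event.get("trigger") != "FILE.UPLOADED":
        return None
    source = event.get("source") or {}
    if source.get("type") != "file":
        return None
    return source.get("id")
```

Returning None for anything unexpected lets the handler acknowledge the request without starting an extraction.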
Add a field-level prompt to guide extraction, and keep the Enhanced Extract Agent for multi-page agreements with dense tables. For consistent schemas across many documents, define the fields in a metadata template instead of inline.

Next steps

Invoice intake automation

Watch a folder for new invoices, extract fields, and write them back as searchable metadata.

Extract API reference

See the full API specification for structured extraction.
Last modified on June 24, 2026