Use `struct` and `table` field types in Box AI structured extraction to define a schema that matches the shape of the document. Then call the extract endpoint with the Enhanced Extract Agent to return data that is ready for downstream systems, automations, and Box metadata.
What you are building
By the end of this tutorial, you have a working Python project that:
- Authenticates to Box and reads a supplier agreement stored in a Box folder.
- Defines an extraction schema that uses a `struct` field for grouped vendor details and a `table` field for a repeating delivery schedule.
- Calls the `POST /2.0/ai/extract_structured` endpoint with the Enhanced Extract Agent.
- Maps the response into a procurement record that you can push to an ERP, procurement platform, or project tracker.
Why struct and table
The `POST /2.0/ai/extract_structured` endpoint supports two complex field types in addition to scalar types such as `string`, `float`, `date`, `enum`, and `multiSelect`:
| Field type | Returns | Use it for |
|---|---|---|
| `struct` | A single nested JSON object | Related values that belong together, such as a vendor’s legal name, address, and contact details. |
| `table` | An array of JSON objects with the same schema | Repeating rows, such as delivery milestones, line items, or payment schedules. |
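To make the two shapes concrete, here is a minimal sketch that checks whether a returned value matches its field type. It assumes the answer has already been parsed from JSON into Python dicts and lists; `matches_shape` is a hypothetical helper, not part of the Box SDK, and the sample values are illustrative rather than real API output.

```python
from typing import Any


def matches_shape(field_type: str, value: Any) -> bool:
    """Return True if `value` has the shape expected for a complex field type.

    struct -> one JSON object (a dict)
    table  -> a list of JSON objects (dicts) that all share the same keys
    """
    if field_type == "struct":
        return isinstance(value, dict)
    if field_type == "table":
        if not isinstance(value, list):
            return False
        if not all(isinstance(row, dict) for row in value):
            return False
        # Every row should expose the same set of sub-field keys.
        key_sets = {frozenset(row) for row in value}
        return len(key_sets) <= 1
    raise ValueError(f"not a complex field type: {field_type!r}")


# Hypothetical values shaped like the two field types.
vendor = {"legal_name": "Example Supplies Ltd", "city": "Austin"}
schedule = [
    {"milestone": "Pilot shipment", "due_date": "2025-03-01"},
    {"milestone": "Full rollout", "due_date": "2025-06-01"},
]
print(matches_shape("struct", vendor))   # True
print(matches_shape("table", schedule))  # True
print(matches_shape("table", vendor))    # False
```

The sketch treats a table as well-formed only when every row has the same keys; relax that check if your responses omit empty sub-fields.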
Prerequisites
Before you start, make sure you have the following:
- A free , which gives you access to the Box AI API.
- A Box application configured with Client Credentials Grant authentication.
- Python 3.11 or higher.
- A supplier agreement uploaded to Box. You can download a and upload it to a Box folder. Note its file ID from the URL. For example, if the URL is https://app.box.com/file/123456789, the file ID is 123456789.
- The following scopes enabled on your app:
  - Read and write all files and folders stored in Box
  - Manage AI
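If you script against several documents, a small helper can pull the ID out of a URL in the format described above. This is a hypothetical sketch that assumes the ID is the all-digit path segment directly after `file`.

```python
from urllib.parse import urlparse


def file_id_from_url(url: str) -> str:
    """Return the numeric file ID from a Box web URL such as
    https://app.box.com/file/123456789 (query strings are ignored)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    for i, segment in enumerate(segments[:-1]):
        # Assumption: the ID is the segment right after "file".
        if segment == "file" and segments[i + 1].isdigit():
            return segments[i + 1]
    raise ValueError(f"no file ID found in {url!r}")


print(file_id_from_url("https://app.box.com/file/123456789"))  # 123456789
```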
Step-by-step process
This tutorial uses one Box Platform capability and one Box AI agent:
| Component | Purpose | API |
|---|---|---|
| Box AI Extract | Pull grouped and repeating structured data from the agreement | POST /2.0/ai/extract_structured |
| Enhanced Extract Agent | Improve accuracy for nested and repeating fields | ai_agent.id = enhanced_extract_agent |
To use the Enhanced Extract Agent, pass its ID (`enhanced_extract_agent`) in the `ai_agent` parameter of your extraction request, which you build in the Build the extraction function step. To learn more, see the reference.
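As a sketch of where that parameter sits, the following builds a JSON request body for `POST /2.0/ai/extract_structured` as a plain Python dict. It assumes the REST body shape `items` / `fields` / `ai_agent`, with the agent passed as a reference object of type `ai_agent_id`. `build_extract_body` is a hypothetical helper; the SDK method used later in this tutorial wraps the same request.

```python
import json
from typing import Any

# Reference to the Enhanced Extract Agent (ai_agent.id = enhanced_extract_agent).
ENHANCED_EXTRACT_AGENT = {"type": "ai_agent_id", "id": "enhanced_extract_agent"}


def build_extract_body(file_id: str, fields: list[dict[str, Any]]) -> dict[str, Any]:
    """Assemble the JSON body for POST /2.0/ai/extract_structured."""
    return {
        # The file Box AI reads.
        "items": [{"type": "file", "id": file_id}],
        # Inline schema: scalar, struct, and table field definitions.
        "fields": fields,
        # Route the request to the Enhanced Extract Agent (copied to avoid sharing state).
        "ai_agent": dict(ENHANCED_EXTRACT_AGENT),
    }


body = build_extract_body(
    "123456789",
    [{"key": "effective_date", "type": "date", "description": "Agreement start date"}],
)
print(json.dumps(body, indent=2))
```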
Set up the development environment
- Open your terminal and create a new project directory:
- Create and activate a Python virtual environment:
Your terminal prompt now shows `(.venv)` at the beginning. This confirms you are working inside the virtual environment. If you open a new terminal, run `source .venv/bin/activate` from the project directory. If you see `ModuleNotFoundError` when running commands, it usually means the virtual environment is not activated.
- Install the required packages:
- Create a `.env` file to store your credentials, then add the following content. Replace the placeholder values with your actual credentials from the Box Developer Console:
Authenticate the Box client
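The client built in this step needs three Client Credentials Grant values. As a minimal sketch of that construction, assuming the `box-sdk-gen` package (imported as `box_sdk_gen`) and the `.env` variable names `BOX_CLIENT_ID`, `BOX_CLIENT_SECRET`, and `BOX_ENTERPRISE_ID`: `load_ccg_settings` and `get_box_client` are hypothetical names, and loading `.env` into the process environment is left out.

```python
import os
from collections.abc import Mapping

REQUIRED_KEYS = ("BOX_CLIENT_ID", "BOX_CLIENT_SECRET", "BOX_ENTERPRISE_ID")


def load_ccg_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read Client Credentials Grant settings, failing fast if any are missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing settings: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}


def get_box_client(settings: Mapping[str, str]):
    """Build an authenticated BoxClient (assumes box-sdk-gen is installed)."""
    # Imported here so the settings helper works without the SDK installed.
    from box_sdk_gen import BoxCCGAuth, BoxClient, CCGConfig

    config = CCGConfig(
        client_id=settings["BOX_CLIENT_ID"],
        client_secret=settings["BOX_CLIENT_SECRET"],
        enterprise_id=settings["BOX_ENTERPRISE_ID"],
    )
    return BoxClient(auth=BoxCCGAuth(config=config))
```

Usage: `client = get_box_client(load_ccg_settings())`. Failing early on missing settings gives a clearer error than a rejected token request.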
Create `box_client.py` and add the following code. It builds an authenticated client that you use to call Box AI through the SDK.
Define the extraction schema
Create `schema.py` and add the following code. The schema describes the agreement using two complex fields:
- `vendor` is a `struct` field that groups related vendor details into one nested object.
- `delivery_schedule` is a `table` field that returns one row per milestone.
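A sketch of what that schema can look like, written as plain dictionaries in the JSON shape the REST endpoint accepts (`key`, `type`, `displayName`, `description`, `prompt`, `fields`). The sub-field keys (`legal_name`, `address`, `contact_email`, `milestone`, `due_date`, `quantity`) are hypothetical; adjust them to your agreement. The hypothetical `validate_schema` helper enforces the rule for complex fields: a non-empty `fields` array made of scalar sub-fields only.

```python
SCALAR_TYPES = {"string", "float", "date", "enum", "multiSelect"}
COMPLEX_TYPES = {"struct", "table"}

SCHEMA = [
    {
        "key": "vendor",
        "type": "struct",
        "displayName": "Vendor",
        "description": "The supplier that is party to the agreement",
        "prompt": "Extract the supplier's details, not the buyer's.",
        "fields": [
            {"key": "legal_name", "type": "string", "description": "Registered legal name"},
            {"key": "address", "type": "string", "description": "Registered address"},
            {"key": "contact_email", "type": "string", "description": "Primary contact email"},
        ],
    },
    {
        "key": "delivery_schedule",
        "type": "table",
        "displayName": "Delivery schedule",
        "description": "One row per delivery milestone",
        "fields": [
            {"key": "milestone", "type": "string", "description": "Milestone name"},
            {"key": "due_date", "type": "date", "description": "Date the delivery is due"},
            {"key": "quantity", "type": "float", "description": "Units to deliver"},
        ],
    },
]


def validate_schema(fields: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the schema looks valid."""
    problems = []
    for field in fields:
        key, ftype = field.get("key"), field.get("type")
        if ftype in COMPLEX_TYPES:
            subfields = field.get("fields")
            if not subfields:
                problems.append(f"{key}: {ftype} field needs a non-empty 'fields' array")
                continue
            for sub in subfields:
                # Nested struct/table (or unknown types) are rejected as sub-fields.
                if sub.get("type") not in SCALAR_TYPES:
                    problems.append(f"{key}.{sub.get('key')}: sub-fields must be scalar")
        elif ftype not in SCALAR_TYPES:
            problems.append(f"{key}: unknown type {ftype!r}")
    return problems


print(validate_schema(SCHEMA))  # []
```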
Each `struct` or `table` field must include a `fields` array that defines its sub-fields. Sub-fields support scalar types only; nested `struct` or `table` types are not allowed.
Build the extraction function
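Under the hood, extraction is a single REST call. As an alternative to the SDK method used in this step, here is a standard-library sketch that posts a body to `https://api.box.com/2.0/ai/extract_structured` with a bearer token. `build_extract_request` and `run_extract` are hypothetical names, and obtaining the access token is assumed to happen elsewhere.

```python
import json
import urllib.request
from typing import Any

EXTRACT_URL = "https://api.box.com/2.0/ai/extract_structured"


def build_extract_request(access_token: str, body: dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for structured extraction."""
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_extract(access_token: str, body: dict[str, Any], timeout: float = 120.0) -> dict[str, Any]:
    """Send the request and return the parsed JSON response."""
    request = build_extract_request(access_token, body)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```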
Create `extract.py` and add the following code. It uses the SDK’s `create_ai_extract_structured` method to send the schema to Box AI and specifies the Enhanced Extract Agent, which improves accuracy for nested and repeating fields.
Map the structured output
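The mapping step can be sketched as a pure function. It assumes the extracted values are available as a dict keyed by schema key (for example, the `answer` object of the response), with `vendor` as a nested dict and `delivery_schedule` as a list of row dicts. The record layout (`vendor_*` columns plus a sorted `milestones` list) is a hypothetical format for a downstream system, and the sample values are illustrative.

```python
from typing import Any


def to_procurement_record(answer: dict[str, Any]) -> dict[str, Any]:
    """Flatten the struct field and normalize the table field into one record."""
    vendor = answer.get("vendor") or {}
    rows = answer.get("delivery_schedule") or []
    return {
        # struct: one nested object -> prefixed top-level columns
        **{f"vendor_{key}": value for key, value in vendor.items()},
        # table: list of rows, sorted by due date
        # (assumes ISO 8601 date strings, which sort correctly as text)
        "milestones": sorted(rows, key=lambda row: row.get("due_date") or ""),
        "milestone_count": len(rows),
    }


answer = {
    "vendor": {"legal_name": "Example Supplies Ltd", "contact_email": "ops@example.com"},
    "delivery_schedule": [
        {"milestone": "Full rollout", "due_date": "2025-06-01"},
        {"milestone": "Pilot shipment", "due_date": "2025-03-01"},
    ],
}
record = to_procurement_record(answer)
print(record["vendor_legal_name"])           # Example Supplies Ltd
print(record["milestones"][0]["milestone"])  # Pilot shipment
```

Keeping the mapping free of API calls makes it easy to test against saved responses.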
Create `app.py` and add the following code. Box AI returns the `vendor` field as a nested object and the `delivery_schedule` field as a list of rows. This script runs the extraction and maps the result into a flat record that is ready for a downstream system.
Run the extraction
From the `supplier-extraction` directory, with the virtual environment activated, run the script:
Box AI returns the `struct` field as a single nested object and the `table` field as a list of objects. Your output looks similar to this:
Troubleshooting
ModuleNotFoundError: No module named '...'
Run `source .venv/bin/activate` from the project directory before running any `python3` commands. Each new terminal tab needs its own activation.
invalid_client: The client credentials are invalid
Check the values in your `.env` file:
- Verify `BOX_CLIENT_ID` and `BOX_CLIENT_SECRET` match the values in Developer Console > Configuration.
- Confirm `BOX_ENTERPRISE_ID` is your enterprise ID.
- Ensure your app is authorized in the Developer Console and uses Client Credentials Grant.
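A quick local check can catch common `.env` mistakes before you call Box. This sketch flags missing or empty values, stray whitespace, and an enterprise ID that is not all digits. Treating enterprise IDs as numeric and the `diagnose_ccg_env` name are assumptions, and the check cannot tell whether the credentials themselves are valid or whether the app is authorized.

```python
import os
from collections.abc import Mapping


def diagnose_ccg_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems found in Client Credentials Grant settings."""
    problems = []
    for key in ("BOX_CLIENT_ID", "BOX_CLIENT_SECRET", "BOX_ENTERPRISE_ID"):
        value = (env.get(key) or "").strip()
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value != env[key]:
            # Whitespace copied from the Developer Console breaks authentication.
            problems.append(f"{key} has leading or trailing whitespace")
    enterprise_id = (env.get("BOX_ENTERPRISE_ID") or "").strip()
    if enterprise_id and not enterprise_id.isdigit():
        problems.append("BOX_ENTERPRISE_ID should be the numeric enterprise ID")
    return problems


for problem in diagnose_ccg_env(os.environ):
    print(problem)
```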
404 Not Found
Sub-field values come back empty or merged
Nested `struct` and `table` types are not supported as sub-fields, and a `struct` or `table` field must include a `fields` array. Confirm your sub-fields use only scalar types, and add a `prompt` to the complex field to clarify what to extract.
Scaling to production
Write results back to Box metadata
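Box metadata templates hold flat values (`string`, `float`, `date`, `enum`, `multiSelect`), so nested results need flattening before they can be written back. A minimal sketch, assuming a hypothetical template whose keys are the camelCase form of the schema keys (for example `vendorLegalName`) and a string field that stores the delivery schedule as JSON. Serializing the table into one string is only one design choice; one field per milestone or a separate system of record are alternatives. The resulting dict is the kind of body sent to `POST /2.0/files/{file_id}/metadata/enterprise/{template_key}`.

```python
import json
from typing import Any


def _camel(*parts: str) -> str:
    """Join snake_case parts into a camelCase key: ("vendor", "legal_name") -> "vendorLegalName"."""
    words = [w for part in parts for w in part.split("_") if w]
    return words[0] + "".join(w.capitalize() for w in words[1:])


def to_metadata_values(
    answer: dict[str, Any],
    table_keys: frozenset[str] = frozenset({"delivery_schedule"}),
) -> dict[str, Any]:
    """Flatten extraction results into scalar values for a (hypothetical) metadata template."""
    values: dict[str, Any] = {}
    for key, value in answer.items():
        if key in table_keys:
            # table: serialized as a JSON string in a single string field
            values[_camel(key, "json")] = json.dumps(value or [])
        elif isinstance(value, dict):
            # struct: one metadata field per sub-field
            for sub_key, sub_value in value.items():
                values[_camel(key, sub_key)] = sub_value
        else:
            # scalar fields pass through unchanged
            values[_camel(key)] = value
    return values
```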
Trigger extraction automatically
Tune accuracy for complex layouts
Give each field a clear `description` and `prompt` to guide extraction, and keep the Enhanced Extract Agent for multi-page agreements with dense tables. For consistent schemas across many documents, define the fields in a instead of inline.