
# Automate invoice intake with Box AI Extract

> Build an accounts payable automation that watches a Box folder for new invoices, extracts structured fields with Box AI, and writes metadata back to each file.

export const RelatedLinks = ({title, items = []}) => {
  const getBadgeClass = badge => {
    if (!badge) return "badge-default";
    const badgeType = badge.toLowerCase().replace(/\s+/g, "-");
    return `badge-${badge === "ガイド" ? "guide" : badgeType}`;
  };
  if (!items || items.length === 0) {
    return null;
  }
  return <div className="my-8">
      {}
      <h3 className="text-sm font-bold uppercase tracking-wider mb-4">{title}</h3>

      {}
      <div className="flex flex-col gap-3">
        {items.map((item, index) => <a key={index} href={item.href} className="py-2 px-3 rounded related_link hover:bg-[#f2f2f2] dark:hover:bg-[#111827] flex items-center gap-3 group no-underline hover:no-underline border-b-0">
            {}
            <span className={`px-2 py-1 rounded-full text-xs font-semibold uppercase tracking-wide flex-shrink-0 ${getBadgeClass(item.badge)}`}>
              {item.badge}
            </span>

            {}
            <span className="text-base">{item.label}</span>
          </a>)}
      </div>
    </div>;
};

export const SignupCTA = ({children}) => {
  return <div className="flex flex-wrap items-center gap-4 p-5 rounded-lg border border-gray-200 dark:border-gray-700 my-6" style={{
    background: "linear-gradient(135deg, rgba(0, 97, 213, 0.06), rgba(0, 97, 213, 0.02))"
  }}>
      <div className="flex-1 text-sm leading-relaxed text-gray-700 dark:text-gray-300" style={{
    minWidth: "280px"
  }}>
        {children}
      </div>
      <div className="flex flex-col items-center gap-2">
        <a href="https://account.box.com/signup/developer#ty9l3" className="signup-cta-button inline-flex items-center whitespace-nowrap px-5 py-2 text-sm font-semibold text-white no-underline">
          Get started for free
        </a>
        <a href="https://account.box.com/developers/console" className="signup-cta-login text-xs text-gray-500 dark:text-gray-400 no-underline whitespace-nowrap">
          Already have an account? Log in
        </a>
      </div>
    </div>;
};

export const Link = ({href, children, className, ...props}) => {
  const localizedHref = href;
  return <a href={localizedHref} className={className} {...props}>
      {children}
    </a>;
};

Manual invoice processing is slow, error-prone, and expensive to scale. Every PDF that lands in an inbox needs someone to open it, read line items, and key data into a spreadsheet or ERP.

This tutorial builds an automated intake service that eliminates that manual work. When a new PDF arrives in a designated Box folder, the service calls Box AI to extract predictable fields - vendor name, invoice number, total, dates - and writes those values back to the file as Box metadata.

## What you are building

By the end of this tutorial, you will have a working Python service that:

* Receives Box webhook notifications when new files arrive in a designated `Invoices Inbox` folder.
* Calls the Box AI structured extract endpoint with a metadata template.
* Writes the extracted key-value pairs back to the file as Box metadata.
* Optionally logs extracted totals for downstream ERP integration.

## Prerequisites

Before you start, make sure you have the following:

* A <Link href="https://www.box.com/pricing">Box Enterprise account</Link> with Box AI enabled.
* A Box application configured with **Client Credentials Grant** authentication.
* Python 3.11 or higher.
* The following scopes enabled on your app:
  * Read all files and folders stored in Box
  * Write all files and folders stored in Box
  * Manage AI
  * Manage webhooks

## Step-by-step process

This solution uses three Box Platform capabilities:

| Component          | Purpose                                       | API                                             |
| ------------------ | --------------------------------------------- | ----------------------------------------------- |
| **Webhooks**       | Detect new files arriving in the inbox folder | `POST /2.0/webhooks`                            |
| **Box AI Extract** | Pull structured fields from each invoice PDF  | `POST /2.0/ai/extract_structured`               |
| **Metadata**       | Write extracted values back to the file       | `POST /2.0/files/:id/metadata/:scope/:template` |

<Steps>
  <Step title="Create the metadata template">
    The metadata template defines the fields Box AI extracts. Create it once, and every invoice processed by the service returns values in this shape.

    <Tip>
      This step requires Admin access. If you do not have access, contact your Box administrator.
    </Tip>

    1. Open the [Box Admin Console](https://app.box.com/master) and select **Metadata**.
    2. In the **Invoices** tab, click **New** and name it `Invoice`.
    3. Add the following fields:

    | Field name     | Type   | Description                                                 |
    | -------------- | ------ | ----------------------------------------------------------- |
    | Vendor Name    | Text   | The name of the vendor or supplier issuing the invoice      |
    | Invoice Number | Text   | The unique invoice identifier                               |
    | Invoice Date   | Date   | The date the invoice was issued                             |
    | Due Date       | Date   | The payment due date                                        |
    | Total Amount   | Number | The total amount due, including taxes and fees              |
    | Currency       | Text   | The three-letter currency code (for example, USD, EUR, GBP) |

    4. Copy the template key from under the **Template Name**. You need it in a later step.
    5. Click **Save**.

    For a detailed walkthrough, see <Link href="https://support.box.com/hc/en-us/articles/360044194033-Customizing-Metadata-Templates">Customizing Metadata Templates</Link>.
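
    If you would rather script the template than click through the Admin Console, the same fields can be expressed as a request body for `POST /2.0/metadata_templates/schema`. The following is a minimal sketch and not part of the original walkthrough: the template key `invoice`, the camelCase field keys (chosen to match the sample output later in this tutorial), and the helper names are assumptions. Box generates the keys for you when you create the template in the Admin Console, so confirm the actual values before relying on them.

    ```python
    import json

    # Display names from the table above, paired with the API field types
    # that correspond to Text ("string"), Date ("date"), and Number ("float").
    INVOICE_FIELDS = [
        ("Vendor Name", "string"),
        ("Invoice Number", "string"),
        ("Invoice Date", "date"),
        ("Due Date", "date"),
        ("Total Amount", "float"),
        ("Currency", "string"),
    ]


    def to_camel_case(display_name: str) -> str:
        """Turn 'Vendor Name' into 'vendorName' (an assumed key convention)."""
        first, *rest = display_name.split()
        return first.lower() + "".join(word.capitalize() for word in rest)


    def build_template_payload(template_key: str = "invoice") -> dict:
        """Build a request body for POST /2.0/metadata_templates/schema."""
        return {
            "scope": "enterprise",
            "displayName": "Invoice",
            "templateKey": template_key,
            "fields": [
                {"type": field_type, "key": to_camel_case(name), "displayName": name}
                for name, field_type in INVOICE_FIELDS
            ],
        }


    if __name__ == "__main__":
        print(json.dumps(build_template_payload(), indent=2))
    ```

    The sketch only builds the payload. Sending it requires an access token with permission to manage enterprise metadata templates.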
  </Step>

  <Step title="Create the invoices inbox folder">
    Create a dedicated folder in Box to serve as the invoice drop zone.

    1. In Box, create a new folder called `Invoices Inbox`.
    2. Note the **folder ID** from the URL. For example, if the URL is `https://app.box.com/folder/123456789`, the folder ID is `123456789`.
    3. **Share the folder with your application's service account.** This is required because CCG applications act as a separate service account user that does not automatically have access to your content.

    <Warning>
      **This step is critical.** Without it, all API calls return 404 "Not found" errors.

      To find your service account email: go to the [Developer Console](https://app.box.com/developers/console), open your app, and look under **General Settings** for the **Service Account ID** (it looks like `AutomationUser_xxxxx_xxxxxx@boxdevedition.com`).

      Invite this email as a **collaborator** on the folder with the **Editor** role. Editor access is required because the app needs to write metadata back to files.
    </Warning>
  </Step>

  <Step title="Set up the development environment">
    1. Open your terminal and create a new project directory:

    ```bash theme={null}
    mkdir invoice-intake && cd invoice-intake
    ```

    2. Create and activate a Python virtual environment:

    ```bash theme={null}
    python3 -m venv .venv
    source .venv/bin/activate
    ```

    After activation, your terminal prompt shows `(.venv)` at the beginning. This confirms you are working inside the virtual environment.

    <Note>
      Every time you open a new terminal window or tab, you must re-activate the virtual environment by running `source .venv/bin/activate` from the project directory. If you see `ModuleNotFoundError` when running commands, it usually means the venv is not activated.
    </Note>

    3. Install the required packages:

    ```bash theme={null}
    pip install box-sdk-gen flask python-dotenv
    ```

    4. Create a `.env` file to store your credentials, then add the following content. Replace the placeholder values with your actual credentials from the Box Developer Console:

    ```bash theme={null}
    BOX_CLIENT_ID=your_client_id
    BOX_CLIENT_SECRET=your_client_secret
    BOX_ENTERPRISE_ID=your_enterprise_id
    BOX_METADATA_TEMPLATE_KEY=your_metadata_template_key
    INVOICES_FOLDER_ID=your_folder_id
    ```

    <Warning>
      Never commit `.env` files to version control. Add `.env` to your `.gitignore`.
    </Warning>

    <Note>
      **Understanding environment variables:** The `.env` file stores sensitive values (your actual credentials). Your Python code reads these values by referencing their **names** using `os.getenv("VARIABLE_NAME")`. For example, `os.getenv("BOX_CLIENT_ID")` looks up the value stored next to `BOX_CLIENT_ID=` in your `.env` file. When you copy the code in the following steps, keep the quoted variable names exactly as shown. Do not replace them with your actual credentials.
    </Note>
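
    A missing value in `.env` tends to surface only later, as an authentication or API error, so it can help to check the configuration at startup. The sketch below is an optional addition rather than part of the tutorial's code. The helper name `missing_env_vars` is hypothetical, and the variable names are the ones from the `.env` file above.

    ```python
    import os

    # Names of the variables defined in the .env file above.
    REQUIRED_VARS = (
        "BOX_CLIENT_ID",
        "BOX_CLIENT_SECRET",
        "BOX_ENTERPRISE_ID",
        "BOX_METADATA_TEMPLATE_KEY",
        "INVOICES_FOLDER_ID",
    )


    def missing_env_vars(env=None, required=REQUIRED_VARS) -> list:
        """Return the required variable names that are unset or blank."""
        env = os.environ if env is None else env
        return [name for name in required if not (env.get(name) or "").strip()]


    if __name__ == "__main__":
        missing = missing_env_vars()
        if missing:
            print("Missing settings: " + ", ".join(missing))
        else:
            print("All required settings are present.")
    ```

    In the service itself, you could call `missing_env_vars()` right after `load_dotenv()` and stop with a clear message if the list is not empty.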
  </Step>

  <Step title="Authenticate the Box client">
    Create a new file called `box_client.py` in your project directory. Open the file and paste the following code:

    ```python theme={null}
    import os
    from dotenv import load_dotenv
    from box_sdk_gen import (
        BoxClient,
        BoxCCGAuth,
        CCGConfig,
    )

    load_dotenv()

    def get_box_client() -> BoxClient:
        config = CCGConfig(
            client_id=os.getenv("BOX_CLIENT_ID"),
            client_secret=os.getenv("BOX_CLIENT_SECRET"),
            enterprise_id=os.getenv("BOX_ENTERPRISE_ID"),
        )
        auth = BoxCCGAuth(config=config)
        return BoxClient(auth=auth)
    ```

    <Tip>
      Client Credentials Grant is recommended for server-to-server automations where no end user is present. For other authentication options, see <Link href="/guides/authentication/select">Select an authentication method</Link>.
    </Tip>
  </Step>

  <Step title="Build the extraction function">
    Create a new file called `extract.py` and paste the following code. This is the core of the service. It takes a file ID, calls Box AI to extract fields using your metadata template, and returns the structured result.

    ```python theme={null}
    import os
    from dotenv import load_dotenv
    from box_sdk_gen import (
        AiItemBase,
        BoxClient,
        CreateAiExtractStructuredMetadataTemplate,
        CreateAiExtractStructuredMetadataTemplateTypeField,
    )

    load_dotenv()

    def extract_invoice_fields(client: BoxClient, file_id: str) -> dict:
        template_key = os.getenv("BOX_METADATA_TEMPLATE_KEY")

        result = client.ai.create_ai_extract_structured(
            items=[AiItemBase(id=file_id)],
            metadata_template=CreateAiExtractStructuredMetadataTemplate(
                template_key=template_key,
                type=CreateAiExtractStructuredMetadataTemplateTypeField.METADATA_TEMPLATE,
                scope="enterprise",
            ),
        )

        return result.to_dict()["answer"]
    ```

    The metadata template tells Box AI exactly which fields to look for and what data types to return. This means the response shape is predictable and consistent, regardless of how each vendor formats their invoices.
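
    A predictable shape does not guarantee that every field has a value. If an invoice omits a due date, for example, the corresponding field may come back empty. The following sketch assumes (the tutorial does not confirm this) that missing fields appear as `None` or blank strings, and that you would rather skip them than write blanks to the file. The helper name `drop_empty_fields` is hypothetical.

    ```python
    def drop_empty_fields(answer: dict) -> dict:
        """Return a copy of the extracted answer without None or blank-string values.

        Zero is kept, because 0 can be a legitimate total.
        """
        cleaned = {}
        for key, value in answer.items():
            if value is None:
                continue
            if isinstance(value, str) and not value.strip():
                continue
            cleaned[key] = value
        return cleaned


    if __name__ == "__main__":
        sample = {"vendorName": "ACME Corp", "dueDate": None, "currency": "", "totalAmount": 0}
        print(drop_empty_fields(sample))
    ```

    One possible use is to pass the result of `drop_empty_fields(...)` to the metadata step instead of the raw answer.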
  </Step>

  <Step title="Write metadata back to the file">
    Create a new file called `metadata.py` and paste the following code. After extraction, this function attaches the extracted values to the file as a metadata instance:

    ```python theme={null}
    import os
    from dotenv import load_dotenv
    from box_sdk_gen import BoxClient, CreateFileMetadataByIdScope

    load_dotenv()

    def apply_metadata(client: BoxClient, file_id: str, metadata: dict) -> dict:
        template_key = os.getenv("BOX_METADATA_TEMPLATE_KEY")

        attached = client.file_metadata.create_file_metadata_by_id(
            file_id=file_id,
            scope=CreateFileMetadataByIdScope.ENTERPRISE,
            template_key=template_key,
            request_body=metadata,
        )

        return attached.to_dict()
    ```

    Once metadata is attached, the extracted fields become searchable, filterable, and visible in the Box web app. You can use <Link href="/guides/metadata/queries">metadata queries</Link> to find all invoices above a certain amount, filter by vendor, or build dashboards in Box Apps.
  </Step>

  <Step title="Create the webhook listener">
    Create a new file called `app.py` and paste the following code. This Flask application receives webhook notifications when new files arrive in the inbox folder:

    ```python theme={null}
    import os
    import json

    from dotenv import load_dotenv
    from flask import Flask, request, jsonify

    from box_client import get_box_client
    from extract import extract_invoice_fields
    from metadata import apply_metadata

    load_dotenv()
    app = Flask(__name__)

    @app.route("/webhook", methods=["POST"])
    def handle_webhook():
        payload = request.get_json()

        if payload.get("trigger") != "FILE.UPLOADED":
            return jsonify({"status": "ignored"}), 200

        file_id = payload["source"]["id"]
        file_name = payload["source"]["name"]

        if not file_name.lower().endswith(".pdf"):
            return jsonify({"status": "skipped, not a PDF"}), 200

        client = get_box_client()

        extracted = extract_invoice_fields(client, file_id)
        print(f"Extracted from {file_name}: {json.dumps(extracted, indent=2)}")

        apply_metadata(client, file_id, extracted)
        print(f"Metadata applied to file {file_id}")

        return jsonify({"status": "processed", "file_id": file_id}), 200

    if __name__ == "__main__":
        app.run(port=5000)
    ```

    <Note>
      In production, you should verify webhook signatures to confirm that requests originate from Box. See <Link href="/guides/webhooks/v2/signatures-v2">Verify webhook signatures</Link> for implementation details.
    </Note>
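
    As a rough illustration of what that verification involves, the sketch below recomputes the signature with the standard library only. It assumes the scheme described in the linked guide: an HMAC-SHA256 over the raw request body followed by the delivery timestamp, base64-encoded, sent in the `BOX-SIGNATURE-PRIMARY` and `BOX-SIGNATURE-SECONDARY` headers alongside `BOX-DELIVERY-TIMESTAMP`, with deliveries older than ten minutes rejected. The function names are hypothetical, so check the details against the guide before using this in production.

    ```python
    import base64
    import hashlib
    import hmac
    from datetime import datetime, timedelta, timezone
    from typing import Mapping, Optional


    def compute_signature(body: bytes, timestamp: str, key: str) -> str:
        """Base64-encoded HMAC-SHA256 of the raw body followed by the timestamp."""
        message = body + timestamp.encode("utf-8")
        digest = hmac.new(key.encode("utf-8"), message, hashlib.sha256).digest()
        return base64.b64encode(digest).decode("ascii")


    def is_fresh(timestamp: str, now: datetime, max_age: timedelta = timedelta(minutes=10)) -> bool:
        """Return False for unparseable, timezone-less, or too-old timestamps."""
        try:
            sent_at = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        except ValueError:
            return False
        if sent_at.tzinfo is None:
            return False
        return now - sent_at <= max_age


    def is_valid_delivery(
        body: bytes,
        headers: Mapping[str, str],
        primary_key: str,
        secondary_key: str,
        now: Optional[datetime] = None,
    ) -> bool:
        """Accept a delivery that is fresh and whose primary or secondary signature matches."""
        now = now or datetime.now(timezone.utc)
        timestamp = headers.get("BOX-DELIVERY-TIMESTAMP", "")
        if not is_fresh(timestamp, now):
            return False
        candidates = [
            (primary_key, headers.get("BOX-SIGNATURE-PRIMARY", "")),
            (secondary_key, headers.get("BOX-SIGNATURE-SECONDARY", "")),
        ]
        for key, received in candidates:
            if not key or not received:
                continue
            expected = compute_signature(body, timestamp, key)
            # Constant-time comparison avoids leaking how many characters matched.
            if hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8")):
                return True
        return False
    ```

    In the Flask handler, this could be called as `is_valid_delivery(request.get_data(), request.headers, primary_key, secondary_key)` before the payload is parsed, returning a 4xx response when it fails. The signature must be computed over the raw bytes, not over re-serialized JSON.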

    At this point, your project directory should contain the following files:

    ```
    invoice-intake/
    ├── .env
    ├── .venv/
    ├── app.py
    ├── box_client.py
    ├── extract.py
    └── metadata.py
    ```
  </Step>

  <Step title="Test the integration">
    You can test the extraction pipeline locally without setting up a public URL or webhook. This step simulates what Box would send when a new file arrives.

    <Note>
      This step requires **two terminal windows** open at the same time. Terminal 1 runs the Flask server (which must stay running). Terminal 2 sends a test request to it.
    </Note>

    **Terminal 1 - start the server:**

    Make sure you are in the `invoice-intake` directory and the virtual environment is activated:

    ```bash theme={null}
    cd ~/invoice-intake  # adjust the path if you created the project elsewhere
    source .venv/bin/activate
    python3 app.py
    ```

    You should see:

    ```
    * Running on http://127.0.0.1:5000
    ```

    Leave this terminal running.

    **Terminal 2 - send a test request:**

    Open a new terminal tab or window. If you have not done so yet, upload a sample invoice PDF to the `Invoices Inbox` folder. Then send a simulated webhook payload using curl, replacing `<FILE_ID>` with the **file ID** of that PDF.

    <Warning>
      Use the file ID, not the folder ID. The file ID comes from the file's URL: `https://app.box.com/file/123456789` → the file ID is `123456789`. The folder ID comes from a different URL pattern: `https://app.box.com/folder/987654321`.
    </Warning>

    ```bash theme={null}
    curl -X POST http://127.0.0.1:5000/webhook \
      -H "Content-Type: application/json" \
      -d '{
        "trigger": "FILE.UPLOADED",
        "source": {
          "id": "<FILE_ID>",
          "name": "sample-invoice.pdf"
        }
      }'
    ```

    **Check the result:**

    Switch back to Terminal 1. You should see the extracted fields printed, followed by a confirmation that metadata was applied:

    ```
    Extracted from sample-invoice.pdf: {
      "vendorName": "ACME Corp",
      "invoiceNumber": "INV-001",
      "invoiceDate": "2025-03-15T00:00:00Z",
      "dueDate": "2025-04-15T00:00:00Z",
      "totalAmount": 1250.0,
      "currency": "USD"
    }
    Metadata applied to file 123456789
    ```

    Open the file in Box and click the **Metadata** tab to verify the values were written correctly.
  </Step>

  <Step title="Register a webhook (production)">
    The local curl test simulates what Box sends, but in production Box must deliver real webhook notifications on its own. This requires a publicly accessible HTTPS endpoint. Once you have one, register the webhook on the inbox folder:

    ```bash theme={null}
    curl -X POST https://api.box.com/2.0/webhooks \
      -H "Authorization: Bearer <ACCESS_TOKEN>" \
      -H "Content-Type: application/json" \
      -d '{
        "target": {
          "id": "<FOLDER_ID>",
          "type": "folder"
        },
        "address": "<webhook_url>",
        "triggers": ["FILE.UPLOADED"]
      }'
    ```

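    Replace `<ACCESS_TOKEN>` with a valid access token for your app, `<FOLDER_ID>` with your invoices folder ID, and `<webhook_url>` with the public HTTPS URL of your service (for example, a tunnel URL during development) with `/webhook` appended.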

    Once registered, any PDF uploaded to the folder automatically triggers extraction and metadata application.
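
    Because the rest of the service is written in Python, you may prefer to register the webhook from a script. The sketch below builds the same request as the curl command using only the standard library. The helper name `build_webhook_request` is hypothetical, and obtaining the access token is left to you.

    ```python
    import json
    import urllib.request

    WEBHOOKS_URL = "https://api.box.com/2.0/webhooks"


    def build_webhook_request(access_token: str, folder_id: str, address: str) -> urllib.request.Request:
        """Build (but do not send) the POST request that registers the webhook."""
        if not address.startswith("https://"):
            raise ValueError("The webhook address must be an HTTPS URL")
        body = {
            "target": {"id": folder_id, "type": "folder"},
            "address": address,
            "triggers": ["FILE.UPLOADED"],
        }
        return urllib.request.Request(
            WEBHOOKS_URL,
            data=json.dumps(body).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {access_token}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
    ```

    Passing the returned object to `urllib.request.urlopen(...)` sends the request. Keeping construction separate from sending makes the payload easy to inspect before anything reaches Box.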
  </Step>
</Steps>

## Troubleshooting

<AccordionGroup>
  <Accordion title="ModuleNotFoundError: No module named '...'">
    Your virtual environment is not activated. Run `source .venv/bin/activate` from the project directory before running any `python3` commands. Each new terminal tab needs its own activation.
  </Accordion>

  <Accordion title="invalid_client: The client credentials are invalid">
    Check your `.env` file:

    * Verify `BOX_CLIENT_ID` and `BOX_CLIENT_SECRET` match the values in Developer Console > Configuration.
    * Confirm `BOX_ENTERPRISE_ID` is your enterprise ID (found in Admin Console > Account & Billing, or Developer Console > icon in the top-right > Copy Enterprise ID).
    * Ensure your app is authorized in the Developer Console.
    * Make sure the app type is Client Credentials Grant.
  </Accordion>

  <Accordion title="404 Not Found">
    The service account does not have access to the file or folder. Invite the service account email (found in Developer Console > General Settings) as a collaborator with **Editor** role on the folder containing your invoice files.
  </Accordion>

  <Accordion title="metadata_template must have both scope and template_key">
    The `BOX_METADATA_TEMPLATE_KEY` value in your `.env` file is missing or empty. Add the template key you noted when creating the metadata template in Step 1.
  </Accordion>
</AccordionGroup>

## Optional: push totals to an ERP

Once you have structured metadata, pushing data downstream is straightforward. Add an ERP integration step after metadata is applied:

```python theme={null}
def push_to_erp(extracted: dict, file_id: str):
    """Send extracted totals to your ERP system."""
    erp_payload = {
        "vendor": extracted.get("vendorName"),
        "invoice_number": extracted.get("invoiceNumber"),
        "total": extracted.get("totalAmount"),
        "currency": extracted.get("currency"),
        "due_date": extracted.get("dueDate"),
        "source_file_id": file_id,
    }
    # Replace with your ERP's API endpoint
    # requests.post("https://erp.example.com/api/invoices", json=erp_payload)
    print(f"ERP payload ready: {erp_payload}")
```
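
ERP APIs often expect plain calendar dates and fixed-precision amounts rather than timestamps and floats. The sketch below shows one way to normalize the extracted values first. It assumes the field keys from the sample output in this tutorial, timestamps in the ISO 8601 form shown there, and amounts with two decimal places. None of this is dictated by Box or by any particular ERP, and the helper name `normalize_for_erp` is hypothetical.

```python
from datetime import datetime
from decimal import Decimal


def normalize_for_erp(extracted: dict) -> dict:
    """Return JSON-ready values: ISO calendar dates and two-decimal totals."""

    def to_iso_date(value):
        if not value:
            return None
        # Replace a trailing "Z" so older Python versions can parse the timestamp.
        return datetime.fromisoformat(value.replace("Z", "+00:00")).date().isoformat()

    total = extracted.get("totalAmount")
    currency = (extracted.get("currency") or "").strip().upper()
    return {
        "vendor": extracted.get("vendorName"),
        "invoice_number": extracted.get("invoiceNumber"),
        # Decimal(str(...)) avoids carrying binary floating-point noise into the total.
        "total": None if total is None else str(Decimal(str(total)).quantize(Decimal("0.01"))),
        "currency": currency or None,
        "due_date": to_iso_date(extracted.get("dueDate")),
    }
```

Currencies without two minor units (such as JPY) would need a different quantization step, so treat the two-decimal rule as a placeholder.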

## Scaling to production

<AccordionGroup>
  <Accordion title="Handle duplicate deliveries">
    Box webhook delivery can produce duplicates. Make your handler idempotent by checking whether metadata already exists on the file before processing. Use `GET /2.0/files/:id/metadata/enterprise/:template` and skip extraction if an instance is already present.
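
    The check-then-process pattern can be sketched independently of the SDK. In the example below, `has_metadata` and `process` are hypothetical callables that you supply. In the real service, `has_metadata` would wrap the `GET` request above and treat a 404 response as "no instance yet" (an assumption to confirm against the API reference), and `process` would run extraction and apply metadata.

    ```python
    from typing import Callable


    def process_once(
        file_id: str,
        has_metadata: Callable[[str], bool],
        process: Callable[[str], None],
    ) -> str:
        """Run process(file_id) unless the file already carries invoice metadata."""
        if has_metadata(file_id):
            return "skipped"
        process(file_id)
        return "processed"


    if __name__ == "__main__":
        tagged = set()  # in-memory stand-in for "metadata exists on the file"

        def fake_process(file_id: str) -> None:
            tagged.add(file_id)

        print(process_once("123", tagged.__contains__, fake_process))  # first delivery
        print(process_once("123", tagged.__contains__, fake_process))  # duplicate delivery
    ```

    This check narrows the window for duplicates but does not close it: two deliveries handled at the same moment could both pass the check before either writes metadata.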
  </Accordion>

  <Accordion title="Process at scale with event streams">
    For high-volume environments, consider using <Link href="/guides/events/enterprise-events/for-enterprise">enterprise events</Link> instead of webhooks. Enterprise events provide a durable, polling-based stream that is better suited to batch processing thousands of invoices.
  </Accordion>

  <Accordion title="Use the Enhanced Extract Agent for complex invoices">
    If your invoices contain complex layouts, multi-page line items, or non-standard formatting, use the <Link href="/guides/box-ai/quick-start/box-ai-extract-enhanced">Enhanced Extract Agent</Link> for improved accuracy. Start by creating a reference to the agent, which you then pass in your extraction call:

    ```python theme={null}
    from box_sdk_gen import AiAgentReference, AiAgentReferenceTypeField

    enhanced_agent = AiAgentReference(
        id="enhanced_extract_agent",
        type=AiAgentReferenceTypeField.AI_AGENT_ID,
    )
    ```
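
    The snippet above only constructs the agent reference. At the REST level, the reference travels in an `ai_agent` object next to `items` and `metadata_template` in the body of `POST /2.0/ai/extract_structured`. The sketch below builds that body as a plain dictionary so the placement is visible. The helper name `build_extract_request` is hypothetical. With the Python SDK, the equivalent is presumably an `ai_agent` argument to `create_ai_extract_structured`; treat that parameter name as an assumption and confirm it in the SDK reference.

    ```python
    def build_extract_request(file_id: str, template_key: str, use_enhanced_agent: bool = False) -> dict:
        """Build a JSON body for POST /2.0/ai/extract_structured."""
        body = {
            "items": [{"id": file_id, "type": "file"}],
            "metadata_template": {
                "template_key": template_key,
                "type": "metadata_template",
                "scope": "enterprise",
            },
        }
        if use_enhanced_agent:
            # Same id and type as the AiAgentReference in the snippet above.
            body["ai_agent"] = {"type": "ai_agent_id", "id": "enhanced_extract_agent"}
        return body
    ```

    Making the agent opt-in keeps the default path unchanged, so you can enable it only for the vendors whose invoices need it.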
  </Accordion>
</AccordionGroup>

## Next steps

<CardGroup cols={2}>
  <Card title="Sales RFP answer bank" href="/guides/tutorials/sales-rfp-answer-bank" icon="magnifying-glass" arrow="true">
    Build an AI-powered knowledge base for sales teams using Box Hubs and Box AI.
  </Card>

  <Card title="Extract API reference" href="/reference/post-ai-extract-structured" icon="code" arrow="true">
    See the full API specification for structured extraction.
  </Card>
</CardGroup>

<RelatedLinks
  title="RELATED GUIDES"
  items={[
{ label: translate("Extract metadata from file (structured)"), href: "/guides/box-ai/ai-tutorials/extract-metadata-structured", badge: "GUIDE" },
{ label: translate("Extract structured data quick start"), href: "/guides/box-ai/quick-start/box-ai-extract", badge: "QUICKSTART" },
{ label: translate("Extract APIs overview and use cases"), href: "/guides/box-ai/ai-tutorials/extract-use-cases", badge: "GUIDE" },
{ label: translate("Verify webhook signatures"), href: "/guides/webhooks/v2/signatures-v2", badge: "GUIDE" },
{ label: translate("Working with metadata"), href: "/guides/metadata/index", badge: "GUIDE" }
]}
/>
