# Structured extraction with the Box AI Enhanced Extract Agent in Python

> Learn how to use the Box AI Enhanced Extract Agent to automatically extract data from large, complex documents using the Box Python SDK.

export const SignupCTA = ({children}) => {
  return <div className="flex flex-wrap items-center gap-4 p-5 rounded-lg border border-gray-200 dark:border-gray-700 my-6" style={{
    background: "linear-gradient(135deg, rgba(0, 97, 213, 0.06), rgba(0, 97, 213, 0.02))"
  }}>
      <div className="flex-1 text-sm leading-relaxed text-gray-700 dark:text-gray-300" style={{
    minWidth: "280px"
  }}>
        {children}
      </div>
      <div className="flex flex-col items-center gap-2">
        <a href="https://account.box.com/signup/developer#ty9l3" className="signup-cta-button inline-flex items-center whitespace-nowrap px-5 py-2 text-sm font-semibold text-white no-underline">
          Get started for free
        </a>
        <a href="https://account.box.com/developers/console" className="signup-cta-login text-xs text-gray-500 dark:text-gray-400 no-underline whitespace-nowrap">
          Already have an account? Log in
        </a>
      </div>
    </div>;
};

export const Link = ({href, children, className, ...props}) => {
  const localizedHref = href;
  return <a href={localizedHref} className={className} {...props}>
      {children}
    </a>;
};

<Link href="/ai">Box AI</Link> lets developers extract key-value pairs from documents with a single API call.
The Enhanced Extract Agent uses advanced reasoning models to turn complex, unstructured document content into
structured metadata without manual data entry, which streamlines document processing for invoices, forms,
contracts, and other business documents.

This quick start demonstrates how to configure the Box Python SDK and use Box AI to extract data from a stock purchase
agreement stored in Box.

<SignupCTA>
  A free developer account gives you access to the Box AI API. Try document summarization, question answering, and metadata extraction through the API.
</SignupCTA>

<Steps>
  <Step title="Create and configure a Box application">
    The first step for any Box Platform integration is to create and configure a Box
    application.

    1. Go to <Link href="https://app.box.com/developers/console">Box Developer Console</Link>.
    2. For this quick start, create an App with the `Client Credentials Grant`
       application type.
    3. Once the app is created, enable the following scopes:
       * Read all files and folders stored in Box
       * Write all files and folders stored in Box
       * Manage AI

    For more information about creating a new Box application, see <Link href="/guides/getting-started/first-application#create-your-first-application">Create your first application</Link>.
  </Step>

  <Step title="Upload a test file">
    Next, select a file to test. For this quick start, use this
    <Link href="/static/quickstarts/extract/files/enhanced-extract-file.pdf">sample stock purchase agreement</Link>.

    1. Download the test document, and then drag and drop it into your Box account.
    2. Get the file ID by opening the file in Box and inspecting the URL.
       The last part of the path is your file ID. For example, the URL might look like this:
       `https://app.box.com/file/2064123286902`

       In this case, the file ID is `2064123286902`.
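
    If you would rather not copy the ID by hand, the URL rule above can be sketched in a few lines of Python.
    The helper name `file_id_from_url` is hypothetical and not part of the Box SDK, and the sketch assumes the
    URL follows the `https://app.box.com/file/<id>` pattern shown in this step.

    ```python
    from urllib.parse import urlparse


    def file_id_from_url(url: str) -> str:
        """Return the numeric file ID from a Box file URL (hypothetical helper)."""
        # Ignore the query string and fragment, then split the path into segments.
        segments = [part for part in urlparse(url).path.split("/") if part]
        # Assumption: the path ends in ".../file/<numeric id>", as in the example URL.
        if len(segments) < 2 or segments[-2] != "file" or not segments[-1].isdigit():
            raise ValueError(f"Not a recognized Box file URL: {url}")
        return segments[-1]


    print(file_id_from_url("https://app.box.com/file/2064123286902"))
    ```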
  </Step>

  <Step title="Configure the environment">
    Now set up your development environment to run the extraction. For this quick start, use Python and the
    Box Python SDK version 10. Make sure you have Python 3.11 or higher installed on your machine.

    1. Create a new directory for your project and navigate into it.
    2. Create a virtual environment:
       ```bash theme={null}
       python3 -m venv .venv
       source .venv/bin/activate
       ```
    3. Install the Box Python SDK:
       ```bash theme={null}
       pip install "boxsdk~=10"
       ```
    4. Install the `python-dotenv` package to load environment variables from the `.env` file:
       ```bash theme={null}
       pip install python-dotenv
       ```
    5. Create a `.env` file in the root of your project directory and add the following
       environment variables, replacing the placeholder values with your developer token
       and the file ID from the previous step:
       ```bash theme={null}
       BOX_DEVELOPER_TOKEN=your_box_developer_token
       BOX_FILE_ID=your_box_file_id
       ```

    6. To get your developer token, go to the Box Developer Console, open your app, and navigate to
       the **Configuration** tab. Click **Generate Developer Token** to create a new token, and then
       paste it into the `.env` file as the value of `BOX_DEVELOPER_TOKEN`.

    <Tip>
      For simplicity, this quick start uses a short-lived developer token. In production, you
      should authenticate using your app’s configured method (for example, Client Credentials Grant)
      instead of a developer token.
    </Tip>
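
    The production setup mentioned in the tip could look roughly like the sketch below. It relies on the SDK's
    `CCGConfig` and `BoxCCGAuth` classes. The environment variable names (`BOX_CLIENT_ID`, `BOX_CLIENT_SECRET`,
    `BOX_ENTERPRISE_ID`) and both helper functions are assumptions made for this example, not names that this
    quick start defines.

    ```python
    import os
    from typing import Mapping

    # Assumed variable names for this sketch; the quick start's .env file does not define them.
    CCG_ENV_NAMES = {
        "client_id": "BOX_CLIENT_ID",
        "client_secret": "BOX_CLIENT_SECRET",
        "enterprise_id": "BOX_ENTERPRISE_ID",
    }


    def read_ccg_settings(env: Mapping[str, str]) -> dict:
        """Collect the Client Credentials Grant settings, failing fast if any is missing."""
        missing = [name for name in CCG_ENV_NAMES.values() if not env.get(name)]
        if missing:
            raise ValueError(f"Missing environment variables: {', '.join(missing)}")
        return {key: env[name] for key, name in CCG_ENV_NAMES.items()}


    def get_ccg_client(env: Mapping[str, str] = os.environ):
        """Build a BoxClient that authenticates with Client Credentials Grant."""
        # Imported lazily so read_ccg_settings can be used without the SDK installed.
        from box_sdk_gen import BoxCCGAuth, BoxClient, CCGConfig

        config = CCGConfig(**read_ccg_settings(env))
        return BoxClient(auth=BoxCCGAuth(config=config))
    ```

    With this in place, a call to `get_ccg_client()` after `load_dotenv()` would take the place of the
    developer-token client used in the rest of this quick start.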
  </Step>

  <Step title="Create the enhanced-extract.py file">
    With your development environment ready, you can create the Python script that extracts data from the document using Box AI.

    1. Create a new file named `enhanced-extract.py` in the root of your project directory and add the following code:

       ```python theme={null}
       import os

       from dotenv import load_dotenv

       from box_sdk_gen import (
           AiAgentReference,
           AiAgentReferenceTypeField,
           AiItemBase,
           BoxClient,
           BoxDeveloperTokenAuth,
           CreateAiExtractStructuredFields,
           CreateAiExtractStructuredFieldsOptionsField
       )

       load_dotenv()

       developer_token = os.getenv("BOX_DEVELOPER_TOKEN")
       file_id = os.getenv("BOX_FILE_ID")

       def get_box_client(token: str | None) -> BoxClient:
           # Check the token that was passed in, not the module-level variable.
           if not token:
               raise ValueError("BOX_DEVELOPER_TOKEN is not set in environment variables.")

           auth = BoxDeveloperTokenAuth(token=token)
           client = BoxClient(auth=auth)

           return client

       def main():
           client = get_box_client(token=developer_token)

           me = client.users.get_user_me()
           print(f"My user ID is {me.id}")

       if __name__ == "__main__":
           main()
       ```

       This code loads the environment variables from the `.env` file, initializes the Box SDK client,
       and prints the current user's ID to validate that the client is working correctly.

    2. Run the script using the following command in your terminal:

       ```bash theme={null}
       python enhanced-extract.py
       ```

       If the Box SDK client is set up correctly, the console displays your user ID. For example:

       ```bash theme={null}
       My user ID is 123456789
       ```
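
    The same kind of sanity check can be applied to the file ID before you add the extraction code. The sketch
    below is an optional addition, not part of the quick start's script: `describe_file` is a hypothetical helper
    that calls the SDK's `client.files.get_file_by_id` method and returns a short label, so a wrong `BOX_FILE_ID`
    or a missing permission surfaces early. The stand-in client at the bottom exists only so that the snippet
    runs without credentials.

    ```python
    from types import SimpleNamespace


    def describe_file(client, file_id: str) -> str:
        """Fetch the file's basic details and return a short label (hypothetical helper)."""
        if not file_id:
            raise ValueError("BOX_FILE_ID is not set in environment variables.")
        # With a real BoxClient, this requests the file's details from Box.
        file = client.files.get_file_by_id(file_id)
        return f"{file.name} (ID {file.id})"


    # Stand-in for a BoxClient so the example runs offline.
    # The ID and file name are the example values used in this quick start.
    fake_file = SimpleNamespace(id="2064123286902", name="enhanced-extract-file.pdf")
    fake_client = SimpleNamespace(
        files=SimpleNamespace(get_file_by_id=lambda file_id: fake_file)
    )

    print(describe_file(fake_client, "2064123286902"))
    ```

    In `main`, you could call `print(describe_file(client, file_id))` right after printing the user ID.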
  </Step>

  <Step title="Extract data">
    With a working Box SDK client, you can add the code to extract data from the document using Box AI.

    1. Between the `get_box_client` function and the `main` function, add the following function:

       ```python theme={null}
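       def extract_metadata(client: BoxClient, file_id: str) -> dict:
           # Reference the Enhanced Extract Agent by its agent ID.
           enhanced_extract_config = AiAgentReference(
               id="enhanced_extract_agent",
               type=AiAgentReferenceTypeField.AI_AGENT_ID
           )

           # Each field describes one value to extract and the prompt that guides the extraction.
           fields = [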
               CreateAiExtractStructuredFields(
                   key="parties",
                   display_name="Parties",
                   description="The named parties involved",
                   prompt="A comma separated list of the named parties involved",
                   type="string",
               ),
               CreateAiExtractStructuredFields(
                   key="effectiveDate",
                   display_name="Effective date",
                   description="The effective date of the contract",
                   prompt="The effective date of the contract",
                   type="date",
               ),
               CreateAiExtractStructuredFields(
                   key="purchasePrice",
                   display_name="Purchase price",
                   description="The purchase price stated in the contract",
                   prompt="The purchase price stated in the contract",
                   type="float",
               ),
               CreateAiExtractStructuredFields(
                   key="summary",
                   display_name="Summary",
                   description="A summary of the contract in 50 words or less",
                   prompt="A summary of the contract in 50 words or less including key obligations",
                   type="string",
               ),
               CreateAiExtractStructuredFields(
                   key="recommendation",
                   display_name="Recommendation",
                   description="Should we make this purchase?",
                   prompt="Given the financial details, would you recommend proceeding with the purchase? Answer Yes or No.",
                   type="enum",
                   options=[
                       CreateAiExtractStructuredFieldsOptionsField(key="Yes"),
                       CreateAiExtractStructuredFieldsOptionsField(key="No"),
                   ],
               ),
           ]

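           # Send the file and the field definitions to Box AI in a single request.
           metadata = client.ai.create_ai_extract_structured(
               [AiItemBase(id=file_id)],
               fields=fields,
               ai_agent=enhanced_extract_config,
           )

           # Return only the extracted key-value pairs.
           return metadata.to_dict()["answer"]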
       ```

       This function calls the Box AI `create_ai_extract_structured` method to extract metadata from the specified file.
       It takes your `BoxClient` and the file ID as arguments and returns the extracted metadata as a dictionary.

       <Tip>
         The `fields` parameter defines the specific data points to extract from the document. Alternatively, you can
         reference a metadata template key to extract the fields defined in a Box metadata template.
       </Tip>

    2. Add the function call to extract the metadata in the `main` function. Ensure that the new `main` function contains the following logic:

       ```python theme={null}
       def main():
           client = get_box_client(token=developer_token)

           me = client.users.get_user_me()
           print(f"My user ID is {me.id}")

           metadata = extract_metadata(client=client, file_id=file_id)

           print(f"\n\nExtracted Metadata: {metadata}")
       ```

       The SDK handles the API call to Box AI and returns the extracted metadata as an
       <Link href="/reference/resources/ai-extract-structured-response">`AiExtractStructuredResponse`</Link> object. In this quick start,
       the code converts this object to a dictionary and returns the `answer` field, which contains the extracted
       key-value pairs.

    3. Run the script again to print the extracted metadata and verify that the extraction succeeded:

       ```bash theme={null}
       python enhanced-extract.py
       ```

       If the extraction succeeds, the console displays your user ID followed by the metadata extracted from the stock purchase agreement. For example:

       ```bash theme={null}
       My user ID is 123456789

       Extracted Metadata: {'parties': 'Argyle LLP, Suregood Family Trust', 'effectiveDate': '2023-03-31', 'purchasePrice': 231000000, 'summary': 'Argyle LLP agrees to purchase 51% of Erebor Life, Inc. from Suregood Family Trust for $231,000,000. The Seller must operate the business normally until closing, and the Buyer must pay the purchase price. The agreement is effective March 31, 2023.', 'recommendation': 'Yes'}
       ```
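
    In the sample output above, the values in the `answer` dictionary are strings and numbers, so downstream code
    may want to convert them into richer types. The sketch below shows one possible way to do that for the five
    fields defined in this quick start. `parse_answer` and the `Contract` dataclass are hypothetical names, and the
    sketch assumes the date arrives as an ISO `YYYY-MM-DD` string, as it does in the sample output.

    ```python
    from dataclasses import dataclass
    from datetime import date


    @dataclass
    class Contract:
        parties: list[str]
        effective_date: date
        purchase_price: float
        summary: str
        recommended: bool


    def parse_answer(answer: dict) -> Contract:
        """Convert the extracted key-value pairs into typed Python values."""
        recommendation = answer["recommendation"]
        if recommendation not in ("Yes", "No"):
            raise ValueError(f"Unexpected recommendation: {recommendation!r}")
        return Contract(
            # The prompt asked for a comma separated list, so split on commas.
            parties=[name.strip() for name in answer["parties"].split(",")],
            # Assumes an ISO date string such as "2023-03-31".
            effective_date=date.fromisoformat(answer["effectiveDate"]),
            purchase_price=float(answer["purchasePrice"]),
            summary=answer["summary"],
            recommended=recommendation == "Yes",
        )


    # Values copied from the sample output above; the summary is shortened.
    sample = {
        "parties": "Argyle LLP, Suregood Family Trust",
        "effectiveDate": "2023-03-31",
        "purchasePrice": 231000000,
        "summary": "Argyle LLP agrees to purchase 51% of Erebor Life, Inc.",
        "recommendation": "Yes",
    }
    contract = parse_answer(sample)
    print(contract.parties, contract.effective_date, contract.purchase_price)
    ```

    Splitting on commas is a simplification: it would break for a party whose name itself contains a comma,
    such as `Erebor Life, Inc.`, so a different separator in the prompt may be a safer choice for real documents.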
  </Step>
</Steps>

## Resources

* <Link href="https://github.com/box-community/box-quickstarts/tree/main/box-ai-enhanced-extract-quickstart">Final code</Link>
* <Link href="/reference/post-ai-extract-structured">Extract Structured Metadata API Reference</Link>
* <Link href="https://github.com/box/box-python-sdk/tree/main">Box Python SDK</Link>
