This tutorial walks you through building a retrieval-augmented generation (RAG) workflow by embedding Box content into a Weaviate vector database and using Weaviate’s Query Agent to answer questions about your data. The complete recipe is available as a Jupyter Notebook in the Weaviate recipes repository.

Overview

What is Weaviate?

Weaviate is an open-source vector database built for speed, scale, and AI-driven search. It stores data as objects and vectors, letting you combine semantic search via embeddings with structured filtering. Weaviate is cloud native, fault tolerant, and integrates directly with large language models (LLMs).

How Box and Weaviate create an end-to-end RAG solution

RAG is a technique that pairs vector search (retrieval) with a language model (generation) to answer questions using your own data. The flow works as follows:
  • Content storage: Box holds your files (PDFs, docs, text reports, and other supported formats).
  • Embedding creation: Text is extracted from your Box files, chunked, and converted into vector embeddings using Weaviate Embeddings.
  • Querying: Weaviate’s Query Agent takes a natural language question, generates the necessary search and aggregation queries, and returns a single answer. Because the agent plans its own queries, this approach is called agentic RAG.
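The chunking part of the embedding step can be sketched in plain Python. This is a minimal illustration, not the notebook's actual implementation: the function name `chunk_text`, the character-based window, and the default sizes are all assumptions chosen for the example.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Overlap keeps a sentence that straddles a boundary visible in both
    neighbouring chunks, which tends to help retrieval.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks
```

Each returned chunk would then be stored as one object in Weaviate, where the embedding service vectorizes it. Smaller chunks give more precise matches, while larger chunks keep more context per match.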

Prerequisites

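Before you begin, make sure you have the following: a Box developer token, and a Weaviate Cloud cluster with its URL and API key. The two sections below explain how to get each.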

Get a Box developer token

  1. In the Box Developer Console, click New App in the top right corner.
  2. Enter an app name and select OAuth 2.0 as the authentication method.
  3. Click Create App.
  4. Under Application Scopes, add read/write scopes for files if not already enabled, then click Save Changes.
  5. From the Configuration tab, copy and save the developer token. You will need it for the notebook.
Developer tokens are valid for 60 minutes. If your session takes longer, you will need to generate a new token.

Create a Weaviate cluster

  1. Log in to Weaviate Cloud.
  2. Create a new cluster from the dashboard. You can name it whatever you like.
  3. Once the cluster is ready, go to the Details tab and note the cluster URL and API key.
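With the cluster URL and API key noted, connecting from Python might look like the following sketch. It assumes the `weaviate-client` v4 package is installed. `normalize_cluster_url` is a hypothetical convenience helper and not part of the client library.

```python
def normalize_cluster_url(url: str) -> str:
    """Trim whitespace and a trailing slash, and add https:// if missing."""
    url = url.strip().rstrip("/")
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    return url


def connect(cluster_url: str, api_key: str):
    """Open a connection to a Weaviate Cloud cluster.

    The imports are local so this module still loads in environments
    where weaviate-client is not installed.
    """
    import weaviate
    from weaviate.classes.init import Auth

    return weaviate.connect_to_weaviate_cloud(
        cluster_url=normalize_cluster_url(cluster_url),
        auth_credentials=Auth.api_key(api_key),
    )
```

After connecting, `client.is_ready()` confirms the cluster is reachable, and `client.close()` releases the connection when you are done.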

Run the recipe

Clone the repository

Clone or download the Weaviate recipes repository:
git clone https://github.com/weaviate/recipes.git
Navigate to the Box integration folder:
cd recipes/integrations/data-platforms/box
Open the Jupyter Notebook (weaviate_box.ipynb) in your development environment.
[Figure: Weaviate recipes repository structure]

Configure authentication

The notebook includes a step to set authentication variables. Update the code block in step 3 with:
  • Your Box developer token
  • Your Weaviate cluster URL and API key
[Figure: Authentication variables in the notebook]

Run the notebook

Execute each cell in the notebook sequentially. The notebook:
  1. Uploads demo files to Box (or uses files you provide).
  2. Extracts text content from the Box files.
  3. Chunks the text and creates vector embeddings in Weaviate.
  4. Uses the Weaviate Query Agent to answer questions about the content.
The repository includes a demo_files folder with four 10-K financial reports for testing. You can replace these with your own files if you prefer to work with different content.
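To give a feel for how files are read back from Box, here is a sketch that builds requests against the Box REST API using only the standard library. The notebook may well use a Box SDK instead, so treat the function names as illustrative. The two endpoints, listing a folder's items and downloading a file's content, are part of the public Box API, and folder ID `0` refers to the root folder.

```python
from urllib.request import Request, urlopen

BOX_API = "https://api.box.com/2.0"


def _authorized(url: str, token: str) -> Request:
    """Create a GET request carrying the developer token."""
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def folder_items_request(token: str, folder_id: str = "0") -> Request:
    """Request the items in a folder (JSON response)."""
    return _authorized(f"{BOX_API}/folders/{folder_id}/items", token)


def file_content_request(token: str, file_id: str) -> Request:
    """Request the raw bytes of one file."""
    return _authorized(f"{BOX_API}/files/{file_id}/content", token)


def fetch(request: Request) -> bytes:
    """Send a prepared request and return the response body."""
    with urlopen(request) as response:
        return response.read()
```

The content endpoint returns the file's raw bytes, so binary formats such as PDF still need a text-extraction step before chunking.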
The final cell demonstrates querying your data. You can modify the query in step 7 to ask different questions based on your content.
[Figure: Final answer from the Weaviate Query Agent]
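The querying step can be wrapped in a small helper like the sketch below. It assumes the `weaviate-agents` package, where the agent is imported from `weaviate.agents.query` and the response exposes a `final_answer` attribute. Depending on the installed version, the method that runs a question may be `ask` or `run`, so the helper tries both. The names `make_query_agent` and `ask` are hypothetical.

```python
def make_query_agent(client, collection_name: str):
    """Create a Query Agent over one collection (requires weaviate-agents)."""
    from weaviate.agents.query import QueryAgent

    return QueryAgent(client=client, collections=[collection_name])


def ask(agent, question: str) -> str:
    """Send one natural language question and return the answer text.

    Accepts any object with an `ask` or `run` method whose result has a
    `final_answer` attribute, which also makes the helper easy to test
    with a stand-in object.
    """
    method = getattr(agent, "ask", None) or getattr(agent, "run")
    response = method(question)
    return response.final_answer
```

Changing the question passed to `ask` has the same effect as editing the query in the notebook's final cell.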

Next steps

Upload additional files to Box such as annual reports, articles, or any documents you want to search, and rerun the notebook to embed them in Weaviate.
Adjust the system_prompt parameter to change the agent’s behavior. For example, you can request more detailed analysis or a specific response format.
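As an illustration, a system prompt can be assembled from a few options and then passed to the agent. The helper below is hypothetical and the wording of the prompt is only an example. Passing the result as `system_prompt=` when constructing the Query Agent is an assumption based on the parameter named above.

```python
def build_system_prompt(
    detail: str = "concise",
    response_format: str = "plain paragraphs",
) -> str:
    """Compose an instruction string that shapes the agent's answers."""
    return (
        "You answer questions about documents stored in Box. "
        f"Give {detail} answers, name the source document when possible, "
        f"and format the response as {response_format}."
    )


# Assumed usage (requires weaviate-agents and an open client):
# agent = QueryAgent(
#     client=client,
#     collections=["MyCollection"],  # hypothetical collection name
#     system_prompt=build_system_prompt("detailed", "a bulleted list"),
# )
```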
Weaviate offers additional agent types. The Transformation Agent can preprocess your data, and the Personalization Agent can tailor responses to individual users.

Resources

Weaviate recipe

The complete Jupyter Notebook in the Weaviate recipes repository.

Weaviate Query Agent

Learn about Weaviate’s agentic RAG capabilities.

Weaviate Embeddings

Documentation for Weaviate’s embedding service.

Box Developer Community

Share feedback and get support from other Box developers.