Automatically Identify and Label Photos with Microsoft Computer Vision API

This post was written by Daichi Ishida, Sales Engineer - Technical Specialist at Box

With modern AI platforms, you can automatically index your photos based on what's in them. In this example, we explain how to send photos stored in Box to Azure's Computer Vision API, and then add a description and metadata to each photo based on its content.

Here is a diagram showing the call flow:

Prerequisites for this setup:

  1. Azure developer account
  2. A Box application with a web integration callback
  3. A web server to receive the web callback and process your request (This can also be hosted on Azure depending on what kind of setup you want to use)

First, sign up for a developer account with Microsoft Azure and start up a Cognitive Service instance. Check out this guide for setting up an Azure Cognitive Service instance.

Once you've finished signing up with Azure, take note of the endpoint URL and the Ocp-Apim-Subscription-Key. We'll be using both later.

Now, navigate to your Box Developer console and set up a web integration. You need to specify the Client Callback URL; this is the web server that receives the call from the Box web application. Specify #file_id# as the callback parameter. We will use it to fetch the photo to send to Azure and to update the file with the parsed data.

From here you'll need to create an application to receive the parameter and pass it on to Azure.

Azure Computer Vision API returns analyzed data in a JSON format. Here's an example response from Azure.

            "text":"a cat laying on a sofa",

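For orientation, here is a minimal sketch of how such a response could be read with nothing but Ruby's standard library. The sample body below is illustrative: its values are made up, and only the nesting (description, then tags and captions, then text) mirrors the fields this post reads. The helper name first_caption is hypothetical.

```ruby
require 'json'

# Illustrative response body. The values are invented for this sketch; only
# the nesting (description -> tags / captions -> text) follows the fields
# that the code in this post reads.
SAMPLE_RESPONSE = <<~JSON
  {
    "description": {
      "tags": ["cat", "sofa", "indoor"],
      "captions": [
        { "text": "a cat laying on a sofa", "confidence": 0.92 }
      ]
    }
  }
JSON

# Hypothetical helper: returns the first caption's text, or nil when the
# response carries no captions at all.
def first_caption(body)
  parsed = JSON.parse(body)
  captions = parsed.dig('description', 'captions') || []
  first = captions.first
  first && first['text']
end

puts first_caption(SAMPLE_RESPONSE)
```

Returning nil for an empty captions list lets the caller decide whether to skip the update or write a fallback description.
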
In the following example, we are going to parse the description field and apply the data as a description using the Box Ruby SDK.

require 'sinatra'
require 'boxr'
require 'rest-client'
require 'hashie'
require 'json'

post '/describe_photo' do
  #parse the query parameter from the web integration request
  fileid = params[:fileid]
  #ACCESS_TOKEN is a placeholder for your Box access token
  client = Boxr::Client.new(ACCESS_TOKEN)
  dlphoto = client.download_url(fileid, version: nil)

  #include the download url in your request to your Azure instance
  payload = {url: dlphoto}

  #send the data to Azure (replace '' with the endpoint URL you noted earlier)
  request = RestClient.post '', payload.to_json, {content_type: :json, "Ocp-Apim-Subscription-Key" => "xxxxxxxxxxxxxxxxx"}

  #parse the response and wrap it in a Hashie::Mash for dot access to its fields
  hash = JSON.parse request.body
  obj = Hashie::Mash.new(hash)

  #set description field
  desc = obj.description.captions[0].text

  #update the file with the parsed description
  client.update_file(fileid, description: desc.to_s)
end
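
If you would rather not depend on rest-client, the same request can be assembled with Ruby's standard library. The following is only a sketch: build_analyze_request is a hypothetical helper name, and the example.invalid URLs and the key are placeholders standing in for your own endpoint, subscription key, and Box download URL.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) the POST request that
# carries the image URL to the Azure endpoint.
def build_analyze_request(endpoint, subscription_key, image_url)
  request = Net::HTTP::Post.new(URI(endpoint))
  request['Content-Type'] = 'application/json'
  request['Ocp-Apim-Subscription-Key'] = subscription_key
  # The body is a JSON object with a single "url" field, as in the post.
  request.body = { url: image_url }.to_json
  request
end

# Placeholder values for illustration only.
req = build_analyze_request('https://example.invalid/analyze',
                            'xxxxxxxxxxxxxxxxx',
                            'https://example.invalid/photo.jpg')
puts req.body
```

To actually send it, the request object can be passed to Net::HTTP#request inside a Net::HTTP.start block opened with use_ssl: true for the endpoint's host.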

You can also extract the tags from the JSON and store them as metadata in Box.

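hash = JSON.parse request.body
obj = Hashie::Mash.new(hash)
tags = obj.description.tags

metadata = {}
tags.each_with_index {|tag, i|
  #push the desired value in to a hash (the "tagN" key names are only an illustration)
  metadata["tag#{i + 1}"] = tag
}
#include the metadata hash in your metadata request
client.create_metadata(fileid, metadata)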
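
How you shape the tags before storing them is up to you. As one possible approach, the sketch below collapses the tag list into a single key-value pair. The helper name tags_to_metadata, the "tags" key, and the comma separator are all assumptions made for illustration, not something Box or Azure prescribes.

```ruby
# Hypothetical helper: collapses a list of tags into one metadata entry.
# Blank and duplicate tags are dropped before joining.
def tags_to_metadata(tags, key: 'tags')
  cleaned = tags.map { |t| t.to_s.strip }.reject(&:empty?).uniq
  { key => cleaned.join(', ') }
end

p tags_to_metadata(['cat', 'sofa', 'cat', ' indoor '])
```

A single joined value keeps the metadata compact; one key per tag is the alternative if you need each tag stored as its own field.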

You can now right click on a photo in Box and send it to Azure for image recognition!

API Documentation

The first API call in the Ruby example above uses the Box Update File Info endpoint. The second API call in the Ruby example above uses the Box Add Metadata to a File endpoint. To explore all the Box API features check out our API reference documentation.

Getting Started with Box API

This tutorial highlighted the power of the Box API. If you want to test out Box API in your application, click here to create a free developer account.

If you have any questions about this tutorial, please feel free to ask in the Developer forum within Box Community.
