This tutorial demonstrates how to use the OpenAI API’s batch endpoint to process many requests asynchronously at a 50% discount, with results returned within a 24-hour completion window. The service is ideal for jobs that don’t require immediate responses.
Eligible researchers can email research-programming@wharton.upenn.edu to start the process of obtaining an OpenAI API key.
Import the necessary libraries:
import os
import json
import time

import pandas as pd
from dotenv import load_dotenv
from openai import OpenAI
Load environment variables and initialize the OpenAI client:
# Load environment variables from .env file
load_dotenv()
# Read API key from environment variables
api_key = os.getenv("OPENAI_API_KEY")
# Initialize OpenAI client
client = OpenAI(api_key=api_key)
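If the `.env` file is missing or the variable name is misspelled, `os.getenv` returns `None` and the problem only shows up later as an authentication error. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of the original tutorial:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value
```

With this helper, the key lookup above could be written as `api_key = require_env("OPENAI_API_KEY")`, called after `load_dotenv()`.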
Load the list of famous people from a CSV file and display the first 10 entries:
df = pd.read_csv('famous_people.csv')
df.head(10)
Output:
| | id | prompt |
|---|---|---|
| 0 | 1 | Fela Kuti |
| 1 | 2 | Marie Curie |
| 2 | 3 | Albert Einstein |
| 3 | 4 | Nelson Mandela |
| 4 | 5 | Mahatma Gandhi |
| 5 | 6 | Frida Kahlo |
| 6 | 7 | Winston Churchill |
| 7 | 8 | Che Guevara |
| 8 | 9 | Bruce Lee |
| 9 | 10 | Serena Williams |
Create tasks for the batch endpoint using Structured Outputs:
Structured Outputs ensure that each batch result follows the same JSON schema, making the results easier to parse reliably.
birthplace_schema = {
    "type": "object",
    "properties": {
        "city_or_town": {"type": "string"},
        "country": {"type": "string"}
    },
    "required": ["city_or_town", "country"],
    "additionalProperties": False
}
tasks = []
for _, row in df.iterrows():
    person_id = str(row["id"])
    person_name = row["prompt"]
    task = {
        "custom_id": person_id,
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": "gpt-4o-mini",
            "input": [
                {
                    "role": "system",
                    "content": "Identify factual birthplaces of well known individuals. Return only the requested structured data."
                },
                {
                    "role": "user",
                    "content": f"Identify the place of birth for {person_name}."
                }
            ],
            "text": {
                "format": {
                    "type": "json_schema",
                    "name": "birthplace",
                    "strict": True,
                    "schema": birthplace_schema
                }
            }
        }
    }
    tasks.append(task)
Inspect the first task:
tasks[0]
Save the tasks to a JSONL file:
file_name = "batch_tasks_birthplaces.jsonl"
with open(file_name, "w", encoding="utf-8") as file:
    for task in tasks:
        file.write(json.dumps(task) + "\n")
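Before uploading, it can be worth reading the file back to confirm that every line parses and that no `custom_id` is repeated, since results are matched to rows by that value. The sketch below is an optional local check; the function name `check_batch_file` is hypothetical, and the set of required keys is an assumption taken from the task dictionaries built above:

```python
import json


def check_batch_file(path: str) -> int:
    """Parse every line of a JSONL batch file and return the number of tasks.

    Raises ValueError on a missing top-level key or a duplicate custom_id.
    """
    required = {"custom_id", "method", "url", "body"}
    seen = set()
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            task = json.loads(line)
            missing = required - task.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            if task["custom_id"] in seen:
                raise ValueError(
                    f"line {line_no}: duplicate custom_id {task['custom_id']!r}"
                )
            seen.add(task["custom_id"])
    return len(seen)
```

For example, `check_batch_file(file_name)` should return the same number as `len(tasks)`.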
Upload the JSONL file to the OpenAI API:
# Open the file in a context manager so the handle is closed after the upload
with open(file_name, "rb") as f:
    batch_file = client.files.create(
        file=f,
        purpose="batch"
    )
print(batch_file)
Start the batch job:
batch_job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/responses",
    completion_window="24h"
)
Monitor the batch job until completion:
while True:
    batch_job = client.batches.retrieve(batch_job.id)
    if batch_job.status == "completed":
        print(f"job {batch_job.id} is done")
        break
    if batch_job.status in ["failed", "expired", "cancelled"]:
        raise RuntimeError(f"Batch job ended with status: {batch_job.status}")
    time.sleep(10)
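Because a batch can take hours, polling every 10 seconds makes many unnecessary requests. One possible variation is to back off between checks and give up after a deadline. The sketch below is an illustration, not part of the original tutorial: `wait_for_batch`, its parameters, and the default delays are all assumptions, and the function takes a `retrieve` callable so that it does not depend on the client directly.

```python
import time
from typing import Any, Callable

# Statuses that the loop above treats as terminal failures
TERMINAL_FAILURES = {"failed", "expired", "cancelled"}


def wait_for_batch(
    retrieve: Callable[[], Any],
    initial_delay: float = 10.0,
    max_delay: float = 300.0,
    timeout: float = 24 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll retrieve() until the job completes, doubling the delay up to max_delay."""
    delay = initial_delay
    waited = 0.0
    while True:
        job = retrieve()
        if job.status == "completed":
            return job
        if job.status in TERMINAL_FAILURES:
            raise RuntimeError(f"Batch job ended with status: {job.status}")
        if waited >= timeout:
            raise TimeoutError(f"Batch job still {job.status} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)
```

Under these assumptions, the loop above could be replaced with `job_id = batch_job.id` followed by `batch_job = wait_for_batch(lambda: client.batches.retrieve(job_id))`.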
Retrieve and process the structured results:
results_by_id = {}
if batch_job.output_file_id:
    result_bytes = client.files.content(batch_job.output_file_id).content
    result_entries = result_bytes.decode("utf-8").strip().splitlines()
    for entry in result_entries:
        res = json.loads(entry)
        custom_id = res["custom_id"]
        if res.get("error"):
            results_by_id[custom_id] = None
            continue
        body = res["response"]["body"]
        text = body["output"][0]["content"][0]["text"]
        data = json.loads(text)
        results_by_id[custom_id] = f'{data["city_or_town"]}, {data["country"]}'
df["Place of Birth"] = df["id"].astype(str).map(results_by_id)
df
Output:
| | id | prompt | Place of Birth |
|---|---|---|---|
| 0 | 1 | Fela Kuti | Abeokuta, Nigeria |
| 1 | 2 | Marie Curie | Warsaw, Poland |
| 2 | 3 | Albert Einstein | Ulm, Germany |
| 3 | 4 | Nelson Mandela | Mvezo, South Africa |
| 4 | 5 | Mahatma Gandhi | Porbandar, India |
| 5 | 6 | Frida Kahlo | Coyoacán, Mexico |
| 6 | 7 | Winston Churchill | Woodstock, England |
| 7 | 8 | Che Guevara | Rosario, Argentina |
| 8 | 9 | Bruce Lee | San Francisco, United States |
| 9 | 10 | Serena Williams | Saginaw, United States |
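The parsing step indexes `body["output"][0]["content"][0]["text"]` directly, which raises an exception if the first output item is not a text message (for example, if a model emits another item type first, or returns a refusal). A more defensive sketch is shown below. The helper names are hypothetical, and the item and part type strings (`"message"`, `"output_text"`) are assumptions about the response body shape that should be checked against a real result line:

```python
import json
from typing import Optional


def extract_output_text(body: dict) -> Optional[str]:
    """Return the first output_text part found in a response body, or None."""
    for item in body.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message output items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                return part.get("text")
    return None


def parse_birthplace(entry: dict) -> Optional[str]:
    """Turn one parsed result line into 'city, country', or None on any failure."""
    if entry.get("error"):
        return None
    response = entry.get("response") or {}
    text = extract_output_text(response.get("body") or {})
    if text is None:
        return None
    data = json.loads(text)
    return f'{data["city_or_town"]}, {data["country"]}'
```

With these helpers, the body of the result loop could reduce to `results_by_id[res["custom_id"]] = parse_birthplace(res)`, leaving rows without a usable answer as missing values in the DataFrame.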
