Batch Mode in the Gemini API: Process more for less

The new batch mode in the Gemini API is designed for high-throughput, non-latency-critical AI workloads. It simplifies large jobs by handling scheduling and processing for you, making tasks like data analysis, bulk content creation, and model evaluation more cost-effective and scalable.

Gemini models are now available in Batch Mode

Today, we’re excited to introduce Batch Mode in the Gemini API, a new asynchronous endpoint designed specifically for high-throughput, non-latency-critical workloads. Batch Mode allows you to submit large jobs, offload the scheduling and processing, and retrieve your results within 24 hours, all at a 50% discount compared to our synchronous APIs.


Process more for less

Batch Mode is the perfect tool for any task where you have your data ready upfront and don’t need an immediate response. By separating these large jobs from your real-time traffic, you unlock three key benefits:

  • Cost savings: Batch jobs are priced at 50% less than the standard rate for a given model.
  • Higher throughput: Batch Mode has higher rate limits than our synchronous APIs.
  • Easy API calls: No need to manage complex client-side queuing or retry logic; results are returned within a 24-hour window.


A simple workflow for large jobs

We’ve designed the API to be simple and intuitive. You package all your requests into a single file, submit it, and retrieve your results once the job is complete; a short sketch of assembling that file follows the list below. Here are some ways developers are leveraging Batch Mode today:

  • Bulk content generation and processing: Specializing in deep video understanding, Reforged Labs uses Gemini 2.5 Pro to analyze and label vast quantities of video ads monthly. Implementing Batch Mode has revolutionized their operations by significantly cutting costs, accelerating client deliverables, and enabling the massive scalability needed for meaningful market insights.
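To make the request-file format concrete, here is a minimal sketch of assembling it with Python's standard json module. The batch_requests.json filename and prompt strings mirror the example that follows; the key values are arbitrary IDs you choose so you can match each result back to its request:

import json

# Hypothetical prompts, keyed by an arbitrary request ID of your choosing.
prompts = {
    "request_1": "Explain how AI works in a few words",
    "request_2": "Explain how quantum computing works in a few words",
}

# Write one JSON object per line (JSONL), the format Batch Mode expects.
with open("batch_requests.json", "w") as f:
    for key, text in prompts.items():
        request_line = {
            "key": key,
            "request": {"contents": [{"parts": [{"text": text}]}]},
        }
        f.write(json.dumps(request_line) + "\n")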

Get started in just a few lines of code

You can start using Batch Mode today with the Google GenAI Python SDK:

from google import genai
import time

client = genai.Client()

# batch_requests.json is a JSONL file containing one request per line:
# {"key": "request_1", "request": {"contents": [{"parts": [{"text": "Explain how AI works in a few words"}]}]}}
# {"key": "request_2", "request": {"contents": [{"parts": [{"text": "Explain how quantum computing works in a few words"}]}]}}
uploaded_batch_requests = client.files.upload(file="batch_requests.json")

batch_job = client.batches.create(
    model="gemini-2.5-flash",
    src=uploaded_batch_requests.name,
    config={
        'display_name': "batch_job-1",
    },
)
print(f"Created batch job: {batch_job.name}")

# Poll until the job completes (results are returned within 24 hours).
while batch_job.state.name not in ('JOB_STATE_SUCCEEDED',
                                   'JOB_STATE_FAILED',
                                   'JOB_STATE_CANCELLED'):
    time.sleep(60)
    batch_job = client.batches.get(name=batch_job.name)

if batch_job.state.name == 'JOB_STATE_SUCCEEDED':
    result_file_name = batch_job.dest.file_name
    file_content_bytes = client.files.download(file=result_file_name)
    file_content = file_content_bytes.decode('utf-8')
    for line in file_content.splitlines():
        print(line)
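Each line of the result file is a JSON object pairing your original key with the model's output. As a minimal sketch, assuming the payload under "response" follows the standard GenerateContentResponse shape, you could extract the generated text like this:

import json

# A sketch of parsing the downloaded results; assumes each line carries the
# request "key" and a GenerateContentResponse-shaped object under "response".
for line in file_content.splitlines():
    result = json.loads(line)
    text = result["response"]["candidates"][0]["content"]["parts"][0]["text"]
    print(f"{result['key']}: {text}")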


To learn more, check out the official documentation and pricing pages.

We’re rolling out Batch Mode for the Gemini API to all users today and tomorrow. This is just the start for batch processing, and we’re actively working on expanding its capabilities. Stay tuned for more powerful and flexible options!
