Gemini x Pipecat Virtual Hackathon: Build Adaptive Agents with Real-Time Intelligence

Saturday, October 11th, 2025 • from 9PM (PDT)

Sunday, October 19th, 2025 • until 5PM (PDT)

Online & Global

This event has already taken place.

Attendees include engineers and leaders from Meta, Google, and Amazon, specializing in AI/ML, Python, and JavaScript, alongside VentureBeat’s 2025 Top Woman in AI Research.

Name: Gemini x Pipecat Virtual Hackathon: Build Adaptive Agents with Real-Time Intelligence
Start: 2025-10-11T21:00:00-07:00
End: 2025-10-19T17:00:00-07:00
Location: Online & Global

Overview

Can’t make it to the YC office? Join the virtual track of the Gemini × Pipecat Hackathon — a global builder sprint running from Saturday, Oct 11 through Sunday, Oct 19.
Over nine days, developers around the world will prototype voice, video, and multimodal agents powered by Gemini 2.5 models and Pipecat’s real-time orchestration stack, alongside co-sponsors Boundary, Coval, Daily, Langfuse, and Tavus.

This is not a startup pitch competition. It’s a learning-and-showing hackathon: working code, open demos, and shared discoveries. Every submission becomes part of an open gallery so others can learn from your approach.

Theme & Challenge

We want you to:

Build something you think is interesting, using Gemini models and Pipecat.
Write up what you’ve done so other people can learn from it.
Give feedback about any of the tools you’ve used, for the benefit of the teams and open source communities that support the tools.

Your project must use both Gemini and Pipecat in some way. Everything else is up to you.

Prizes

Three prizes will be awarded at the end of the week:

$100,000 of Gemini Credits – awarded by Google
$100,000 of Pipecat Cloud Credits – awarded by Daily
$50,000 of Gemini Credits + $50,000 of Pipecat Cloud Credits – awarded to the best Gemini Live API + Pipecat project

What to Build

Voice or video agents with Gemini Live API
Multimodal systems combining speech, vision, and actions
End-to-end orchestration pipelines with Langfuse for tracing or analytics
Hardware or embedded agents (ESP32, Raspberry Pi, etc.)
Novel extensions of Boundary, Coval, Daily, or Tavus stacks
Any adaptive application that showcases runtime reasoning, memory, or feedback loops

How It Works

Register: Apply via this page. Once accepted, enter the Hackathon Portal and get to work.
Build: Hack anytime between Oct 11 – 19. Solo or in teams up to 5.
Submit: Projects due Sunday Oct 19 at 5:00 PM PT.
Judging & Showcase: Top demos featured online + select teams invited to present to DeepMind engineers.

All projects are public and open-source — this is a community learning sprint.

Submission Requirements

Your public GitHub repo must include a README.md with:

What is this? One-sentence summary.
Demo video of roughly 60-120 seconds (show the product, not slides).
How you used Gemini + Pipecat.
Other tools or integrations (Boundary, Coval, Daily, Langfuse, Tavus, etc.).
What you built during the hackathon vs. prior work.
Feedback on the tools you used.
(Optional but encouraged) Live demo link.
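
As a rough aid for the checklist above, the sketch below scans a README for each required item before you submit. The keyword patterns and the sample README are assumptions made for illustration; the hackathon does not prescribe heading names, so adjust the patterns to match your own wording.

```python
import re
from typing import List

# Hypothetical keyword patterns, one per required README item.
# These are illustrative guesses, not official heading names.
REQUIRED_ITEMS = {
    "summary": r"what is this",
    "demo video": r"demo video",
    "gemini + pipecat usage": r"gemini.*pipecat|pipecat.*gemini",
    "other tools": r"other tools|integrations",
    "new vs. prior work": r"prior work",
    "feedback": r"feedback",
}


def missing_items(readme_text: str) -> List[str]:
    """Return the names of required items whose pattern never appears."""
    text = readme_text.lower()
    return [
        name
        for name, pattern in REQUIRED_ITEMS.items()
        if not re.search(pattern, text, re.DOTALL)
    ]


# A made-up README that covers only four of the six required items.
sample = """# My Bot
## What is this?
A voice agent that books meeting rooms.
## Demo video
(link goes here)
## How we used Gemini + Pipecat
Gemini handles reasoning; Pipecat runs the audio pipeline.
## Feedback
Great tools.
"""

print(missing_items(sample))
```

A keyword match only shows that a section is mentioned, not that it is complete, so treat an empty result as a prompt for a final read-through rather than a guarantee.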

Getting Started

General advice
The Gemini models are really, really good – fast, multimodal, long context, good at both instruction following and tool calling.

If you’re building a conversational agent for production, use the three-model approach, with Gemini 2.5 Flash operating between a speech-to-text model and a text-to-speech model. This is the most reliable and powerful architecture. In Pipecat, this is the GoogleLLMService. See this bot file.
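
The shape of the three-model approach can be sketched in plain Python. This is a conceptual illustration only: the stage functions (fake_stt, fake_llm, fake_tts) and the CascadePipeline class are hypothetical stand-ins invented for this example, not Pipecat's or Gemini's actual API. It shows audio going in, text passing through the LLM, and audio coming back out, with every intermediate value recorded so it can be inspected.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


def fake_stt(audio: bytes) -> str:
    """Stand-in for speech-to-text: the 'audio' here is just UTF-8 text."""
    return audio.decode("utf-8")


def fake_llm(transcript: str) -> str:
    """Stand-in for the LLM in the middle (Gemini 2.5 Flash in a real setup)."""
    return f"You said: {transcript}"


def fake_tts(reply: str) -> bytes:
    """Stand-in for text-to-speech: returns the reply encoded as bytes."""
    return reply.encode("utf-8")


@dataclass
class CascadePipeline:
    """Runs stages in order and records each stage's output."""

    stages: List[Callable[[Any], Any]]
    trace: List[str] = field(default_factory=list)

    def run(self, payload: Any) -> Any:
        for stage in self.stages:
            payload = stage(payload)
            # Each intermediate result is visible, which helps when debugging.
            self.trace.append(f"{stage.__name__}: {payload!r}")
        return payload


pipeline = CascadePipeline(stages=[fake_stt, fake_llm, fake_tts])
audio_out = pipeline.run(b"hello")

for line in pipeline.trace:
    print(line)
```

Because the transcript and the LLM reply both exist as text between stages, each one can be logged or traced independently, which is one plausible reason a cascade like this is easier to debug than a single speech-to-speech model.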

If you’re experimenting, or want to see the direction that models and APIs are evolving towards, you can use the Gemini Live API. This is a “speech-to-speech” API that leverages Gemini’s native audio-to-audio capabilities. However, the Live API is harder to debug, audio is faster but transcription is slower, predictable inference results are harder to achieve, and there are still “pre-production” bugs in the API and model performance. In Pipecat, this is the GeminiLiveLLMService. See this bot file.

Image generation with Nano Banana (gemini-2.5-flash-image) is super fun! See this bot file.

For a deep dive into building voice agents, you can refer to the Voice AI & Voice Agents Illustrated Primer.

For general discussion about multimodal, conversational AI, join the Pipecat Discord.

We’ve created several starter kits for this hackathon that cover various use cases.