WinterStream AI · MLH Best Use of Google Gemini @ UTRA Hacks 2026

WinterStream AI
timeline: 2026 · team: Yang Yang Zhang, Jacky Li, Pavlos Constas-Malevanets, Lawrence Ding, Hayson Cheung, Eric Xie
tech stack: Next.js, React, TypeScript, FastAPI, Python, Google Gemini, ElevenLabs, YouTube IFrame API, WebSockets

Overview

Watching the Winter Olympics is incredible — if you can see it. For blind and visually impaired users, or even just newcomers trying to understand what's happening on screen, traditional livestreams fall flat. The action is entirely visual, the commentary assumes you're watching, and there's no easy way to just... ask a question.

WinterStream AI fixes that. It's an accessibility-first, audio-centric Olympics companion that watches the stream with you, narrates what's happening, and answers your questions out loud — in real time.

Inspiration

With the 2026 Winter Olympics approaching, we wanted to rethink how people experience live sports. A lot of viewers don't fully understand the rules or context of what they're watching — and for people with vision impairments, following a live event can be even harder. There's no easy way to ask "wait, what just happened?" and get a real answer.

So we built the thing that answers that question.

But honestly? We almost didn't build it at all.

Most of our weekend went to the robotics challenge. We had a strong start: robot fully built, obstacle course and target-shooting tasks both within reach. We were testing late into the night when our final wheel axle snapped. By around 2 AM, after going through 67 axles, the competition had completely run out of replacements. They'd been breaking for everyone. No parts left, no way to finish.

That was a brutal moment. But it was also a real one — hardware fails, constraints appear out of nowhere, and sometimes a single overlooked weakness brings an entire system down. So we pivoted. We'd been developing WinterStream AI in parallel for the MLH tracks, and when the hardware path closed, we went all-in on software.

We refined the product, focused on impact and usability, and delivered something we genuinely believed could be useful. Turns out that was the right call.

How It Works

WinterStream AI embeds any YouTube video or livestream and automatically fetches its transcript for context. Users can ask questions about what's happening — by typing or speaking — and get answers back as natural spoken audio via ElevenLabs text-to-speech. Ask "What just happened?" or "Why was that a penalty?" and get a clear, contextual answer immediately.
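The writeup doesn't say how the transcript is pulled; a minimal sketch of that step, assuming the youtube-transcript-api Python package (the fetch_transcript name is ours, not the project's):

    from youtube_transcript_api import YouTubeTranscriptApi

    def fetch_transcript(video_id: str) -> str:
        # Flatten the timed caption segments into one string of context
        # that can be handed to the model alongside the user's question.
        segments = YouTubeTranscriptApi.get_transcript(video_id)
        return " ".join(seg["text"] for seg in segments)

For a livestream the same idea applies, just refreshed periodically so the context keeps up with the broadcast.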

On the backend, FastAPI mediates between the frontend and the AI layer, with WebSockets handling real-time messaging so the back-and-forth feels fluid and conversational. Google Gemini powers the question-answering, taking both the video transcript and user query as context to generate accurate, descriptive responses — carefully worded to avoid visual language like "look at this" or "you can see here."
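A rough sketch of that loop, assuming the google-generativeai SDK and ElevenLabs' REST text-to-speech endpoint. The model name, voice ID, route path, system prompt, and the fetch_transcript helper from the sketch above are all illustrative, not the project's actual code:

    import os
    import requests
    import google.generativeai as genai
    from fastapi import FastAPI, WebSocket

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    app = FastAPI()

    SYSTEM_PROMPT = (
        "You are a live audio commentator for blind and low-vision viewers. "
        "Answer using the transcript below, and never use visual phrasing "
        "like 'look at this' or 'you can see here'.\n\nTranscript:\n{transcript}"
    )

    def synthesize(text: str) -> bytes:
        # Convert the model's answer to speech with ElevenLabs' REST API.
        resp = requests.post(
            "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID",  # placeholder voice ID
            headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
            json={"text": text},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.content  # MP3 audio bytes

    @app.websocket("/ws/{video_id}")
    async def qa_socket(websocket: WebSocket, video_id: str):
        await websocket.accept()
        transcript = fetch_transcript(video_id)  # sketch above
        model = genai.GenerativeModel(
            "gemini-1.5-flash",  # illustrative model name
            system_instruction=SYSTEM_PROMPT.format(transcript=transcript),
        )
        while True:
            question = await websocket.receive_text()
            answer = model.generate_content(question)
            await websocket.send_text(answer.text)               # caption / screen reader
            await websocket.send_bytes(synthesize(answer.text))  # spoken audio

In a production version the blocking calls (requests, generate_content) would want async equivalents so the socket stays responsive, but the shape of the exchange is the same: question in, text answer and synthesized audio out over the same connection.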

The UI is built to be screen-reader friendly from the ground up, with minimal reliance on visual cues. The whole experience is designed to work just as well with your eyes closed.

Results

Won MLH Best Use of Google Gemini at UTRA Hacks 2026.

Failure isn't the opposite of success. It's often part of the process that leads you there. We broke 67 axles, ran out of parts at 2 AM, and still walked away with a win — just not the one we originally planned for. Really proud of how the team responded when things went sideways.