Multimodal AI
Models that see, hear, and read at once.
Multimodal AI describes models that take in and reason across more than one kind of data (text, images, audio, and video) in a single system, aligning those modalities so the model can answer questions about an image, transcribe and act on speech, or ground a response in a chart.
7 episodes
- Gemini 3 & Robot Dogs: Inside Google DeepMind's AI Experiments | Paige Bailey
- The 2025 AI Shift: From Chat to Task Completion & Reliable Action | Galileo Founders
- Low-Code AI: From Requirements to Apps in Minutes | OutSystems' Rodrigo Coutinho
- The Making of Gemini 2.0: DeepMind's Approach to AI Development and Deployment | Logan Kilpatrick
- Practical Lessons for GenAI Evals | Chip Huyen & Vivienne Zhang
- GenAI Predictions for 2025 | Databricks & Cohere
- Got Agents? Agentic Workflows & Architecture | Weaviate, Unstructured & CrewAI
Guests on this topic
- Paige Bailey
- Vikram Chatterji
- Atindriyo Sanyal
- Rodrigo Coutinho
- Logan Kilpatrick
- Chip Huyen
- Vivienne Zhang
- Sara Hooker
- Craig Wiley
- Brian Raymond
- Bob van Luijt
- João Moura