Multimodal RAG: Chat with Videos

Course Syllabus

What You'll Learn

About This Course

This course, developed in collaboration with Intel, guides you through building an interactive system for querying and understanding video content with multimodal AI. You will learn to implement a multimodal RAG system that uses multimodal embedding models to embed images and captions in a semantic space, and then use that setup to retrieve video content with text prompts.
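To make the retrieval idea concrete, here is a minimal sketch of text-prompt retrieval over embedded frame and caption pairs. It is an illustration under stated assumptions, not the course's implementation: the three-dimensional vectors are hand-written stand-ins for real multimodal embeddings, the frame names and captions are invented, and `cosine` and `retrieve` are hypothetical helper names. In a real system, a multimodal embedding model would produce both the index vectors and the query vector in the same space.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy index: each entry stands for one video frame plus its caption.
# ASSUMPTION: the vectors are made up; a real system would obtain them
# from a multimodal embedding model applied to the frame and caption.
index = [
    {"frame": "frame_0001.jpg", "caption": "a chef chops onions",
     "vector": [0.9, 0.1, 0.0]},
    {"frame": "frame_0450.jpg", "caption": "a pan sizzles on the stove",
     "vector": [0.2, 0.9, 0.1]},
    {"frame": "frame_0900.jpg", "caption": "the finished dish is plated",
     "vector": [0.1, 0.2, 0.9]},
]

def retrieve(query_vector, index, top_k=1):
    """Return the top_k index entries most similar to the query vector."""
    ranked = sorted(
        index,
        key=lambda entry: cosine(query_vector, entry["vector"]),
        reverse=True,
    )
    return ranked[:top_k]

# ASSUMPTION: hand-written stand-in for the embedding of a text prompt
# such as "when is the food served?".
query_vector = [0.0, 0.3, 0.8]

best = retrieve(query_vector, index, top_k=1)[0]
print(best["frame"], "-", best["caption"])
```

The retrieved frame and caption would then be passed, together with the user's question, to a vision-language model to generate the answer. That generation step is not shown here.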

Key Technologies and Concepts

Hands-on Project

Throughout the course, you’ll build a complete multimodal RAG system that:

Course Outline

Who Should Join?

This course is for anyone with intermediate Python programming knowledge, familiarity with machine learning concepts and deep learning frameworks, and a basic understanding of natural language processing and computer vision.