Multimodal Understanding and Generation
3.0 credits
This course provides a deep dive into modern multimodal AI, focusing on models that integrate vision and language data. Topics include visual question answering, generative media (images and video), neuro-symbolic AI, and embodied AI agents. Through weekly paper reviews and a hands-on independent research project, students will gain the technical skills to understand and advance the state of the art in multimodal deep learning.
Required Course Background: at least one upper-level or graduate course in vision, NLP, or machine learning.