Multimodal Understanding and Generation

3.0 credits

Average Course Rating: not available

This course provides a deep dive into modern multimodal AI, focusing on models that integrate vision and language data. Topics include visual question answering, generative media (images/video), neuro-symbolic AI, and embodied AI agents. Through weekly paper reviews and a hands-on independent research project, students will gain the technical skills to understand and advance the state of the art in multimodal deep learning.

Required Course Background: at least one upper-level or graduate course in vision, NLP, or machine learning.

No Course Evaluations found

Johns Hopkins University | EN.601.762

Lecture Sections

(01) J. Cho, 13:30 - 14:45, no location info

(02) J. Cho, 13:30 - 14:45, no location info