Semester.ly

Johns Hopkins University | EN.601.762

Multimodal Understanding and Generation

3.0 credits

This course provides a deep dive into modern multimodal AI, focusing on models that integrate vision and language data. Topics include visual question answering, generative media (images and video), neuro-symbolic AI, and embodied AI agents. Through weekly paper reviews and a hands-on independent research project, students will gain the technical skills to understand and advance the state of the art in multimodal deep learning.

Required Course Background: at least one upper-level or graduate course in vision, NLP, or machine learning.

No Course Evaluations found

Lecture Sections

(01)

No location info
J. Cho
13:30 - 14:45

(02)

No location info
J. Cho
13:30 - 14:45