Generative AI for Speech and Audio

3.0 credits

This course explores Generative AI for Speech and Audio, focusing on both the scientific foundations and cutting-edge applications. Students will study the principles of speech synthesis, text-to-speech (TTS), voice conversion (VC), singing voice synthesis, and expressive generation of emotion, prosody, and style. The course will also cover critical issues of voice deepfakes, speech privacy, and voice security. Special emphasis will be placed on cross-disciplinary applications, including medical and healthcare domains (e.g., voice restoration, speech therapy, and emotion-aware conversational agents). Through lectures, readings, and hands-on assignments, students will gain experience with modern deep learning frameworks for generative speech and audio, while critically analysing challenges such as evaluation, robustness, and ethical implications. The course is designed for advanced undergraduates and graduate students with prior knowledge of machine learning and will prepare them to engage in research or industry innovation in speech technology, AI safety, and healthcare applications.

Johns Hopkins University | EN.520.684
