AI Safety, Alignment, & Governance

3.0 credits

This course focuses on the alignment and governance challenges posed by advanced frontier and general-purpose AI models: why these models may behave in ways that pose significant risks to human welfare, and what technical and governance approaches might mitigate those risks. We’ll begin by studying general results on alignment and governance in human normative systems such as markets, politics, norms, and laws. We’ll pay special attention to risks arising from agentic AI. We’ll then look at current technical and position papers on various topics in AI safety and alignment. Topics may include RLHF, constitutional AI, red-teaming, safety evaluation methods, red lines, jailbreaking, prompt injection, over-optimization, and open-source debates. We’ll conclude with a discussion of regulatory frameworks such as regulatory markets, registration of frontier models, international governance organizations, registration of AI agents, and legal personhood for AI agents. This is a paper-reading class.

Johns Hopkins University | EN.601.469

Lecture Sections

(01)

No location info

G. Hadfield

12:00 - 13:15