Semester.ly

Johns Hopkins University | EN.550.436

Data Mining

4.0 credits

Average Course Rating

(3.84)

Data mining is a relatively new term used in the academic and business worlds, often associated with the development and quantitative analysis of very large databases. Its definition covers a wide spectrum of analytic and information technology topics, such as machine learning, artificial intelligence, statistical modeling, and efficient database development. This course will review these broad topics and cover specific analytic and modeling techniques. Students will learn the foundations of data visualization, classification, regression, clustering, and dimensionality reduction. Although some of the mathematics underlying these techniques will be discussed, our focus will be on the application of the techniques to real data and the interpretation of results. Because use of the computer is extremely important when “mining” large amounts of data, we will make substantial use of software tools to learn the techniques and analyze datasets. In particular, students will program in Python and use Jupyter Notebooks during lectures, for the homework, and for the exams.

Recommended Course Background: EN.550.413, EN.550.420, AS.171.205, EN.550.112

Fall 2012

(3.74)

Fall 2013

(4.12)

Fall 2014

(3.67)

Fall 2012

Professor: Bruno Jedynak

(3.74)

The course offers a good overview of data mining with some real-life applications. Students enjoyed the assignments and said they helped them understand the material. However, some found the lectures unclear and wished there was a textbook as an additional resource. Students should be familiar with R programming for this course. The course does not go into advanced methods, so students looking for an

Fall 2013

Professor: Bruno Jedynak

(4.12)

Students believed that the best aspects of this course included the very useful, applicable things they learned about machine learning. They thought it was good practice with the R language, and that there were plenty of reference materials. Students felt that the lecture notes did not have the information they needed, and were often confusing, vague, or too theoretical. They suggested making the materials more applicable to real-world scenarios and focusing less on theory. Students also wanted well-developed notes online that they could refer to in the future. Prospective students should have a strong background in statistics, as well as familiarity with R programming.

Fall 2014

Professor: Bruno Jedynak

(3.67)

Students thought the best aspect of this class was the practicality of the course material. They believed that the weakest element of the course was the lectures, which were sometimes hard to follow and did not always match the homework assignments. Students thought the course could be improved with better-organized lecture notes. Students believed that people considering this class should know that experience programming in R would be useful, as would some knowledge of statistics.