Large Scale Computing on the Cloud

2.0 credits

The Internet of Things (IoT) is connecting components in nearly every aspect of business and daily life. As a result, huge amounts of data are being generated. The term “big data” refers to data at a scale that cannot be stored on a single computer. Analyzing such large-scale data usually requires massively parallel software running on tens, hundreds, or even thousands of servers. Enterprise technology managers are often called upon to organize large-scale data repositories, to manage and schedule resources across technology components, and to support decision making based on information that resides in distributed data sources. This course introduces students to the fundamental concepts of distributed data systems and to mining algorithms for massive datasets. It equips students with advanced techniques for extracting value from the large-scale data generated and collected in everyday business life. The course takes a hands-on, learning-by-doing approach, with practice on the AWS platform. Topics include: the MapReduce model, distributed file systems (HDFS), advanced MapReduce (Spark), distributed data warehouses and query languages (Hive), distributed scripting languages (Apache Pig), the frequent itemset mining problem, text mining, and recommendation engines. The focus is on creating awareness of these technologies, building some familiarity with them through assignments, and enabling strategic thinking about their use in business.

Johns Hopkins University | BU.330.740

Lecture Sections

(T1) J. Abed, 08:15 - 11:15, no location info

(T2) J. Abed, 14:30 - 17:30, no location info