The field of computer science is experiencing a transition from computation-intensive to data-intensive problems, in which data is produced in massive amounts by large sensor networks, simulations, and social networks. Efficiently extracting, interpreting, and learning from very large datasets poses new challenges that affect various components of the computing system, including computer architecture, operating systems, data storage, and database management systems. As a result, a plethora of big data computing systems has recently started to replace traditional systems. This course gives an introduction to emerging big data computing systems, with special consideration given to the Hadoop ecosystem. Some of the techniques used in managing big data have their origins in research that has been going on for decades in the areas of database management systems and parallel and distributed computing. The fundamental concepts on which the emerging big data management systems are based are discussed first. Once this foundation is established, theoretical and practical concepts used to work with big data sets are studied. Tentative topics include data stream management, distributed file systems, the MapReduce paradigm, NoSQL, Pig, Hive, HBase, and Sqoop. The course will include hands-on projects with the Hadoop ecosystem.
- Learns the definition, sources, and challenges of big data.
- Understands the similarities and differences between the emerging big data computing platforms and the traditional computing systems.
- Understands the key issues in big data management including distributed and parallel processing, data modeling, query languages, and transaction processing.
- Understands the principles of distributed file systems.
- Understands and practices developing applications using the MapReduce programming paradigm.
- Has a good knowledge of the Hadoop ecosystem as one of the platforms most commonly used today to work with big data.
- Is introduced to some of the Hadoop ecosystem components, including Hive, Pig, Sqoop, and HBase.
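
As an informal illustration of the MapReduce programming paradigm named in the outcomes above, the following is a minimal single-process sketch of a word count in Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this example. It only shows the three conceptual steps: mapping records to key-value pairs, grouping the pairs by key, and reducing each group to a result.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence inside a single process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: each string stands in for one input record.
    docs = ["big data systems", "big data big ideas"]
    print(word_count(docs))
```

In a real cluster, the map and reduce steps would run in parallel on many machines and the grouping would be handled by the framework; here everything runs sequentially to keep the idea visible.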