This course covers the tools available in the big data ecosystem for extraction, transformation, and loading (ETL), data pipelines, and data storage. It equips participants with more sophisticated ways to handle data that has outgrown custom-made Python and SQL scripts. Through videos and hands-on exercises, participants will be exposed to different methods for managing and streaming data in preparation for business intelligence reporting, analytics, and modeling.
What You Will Learn
Upon completion of this course, learners are expected to:
- understand the popular ETL frameworks and how they are used;
- learn how Apache Spark, Airflow, and Kafka are used to create real-time big data pipelines; and
- understand the advantages and disadvantages of using different types of storage and database solutions.
You will need a computer or laptop with Microsoft Excel installed. The minimum requirements are:
- For Windows: Core i3 processor or better, 4 GB RAM or more, MS Excel 2007 or newer
- For MacBook: MS Excel 2013 or newer is recommended (some functions require this version on the Mac). If your version of MS Excel is 2011, download and install StatPlus.