Big Data
Computer Science & Information Systems
Big Data is a popular buzzword that refers to storing and processing large amounts of data. This course covers the following Big Data topics:
- Big Data Definition
- Big Data Characteristics and Challenges
- Hadoop
- Hadoop Distributed File System (HDFS)
- MapReduce Programming
- Apache Spark
- Resilient Distributed Datasets (RDDs)
- Pair Resilient Distributed Datasets (PairRDDs)
- Spark SQL
- Pandas on Spark
Below are the main datasets used in this course, each with its link.
Dataset | Link
---|---
Airports.csv | Link
Bible.txt | Link
Forest Fire | Link
JY157487.1 | Link
RealEstate | Link
Transactions (sample) | Link
UK Makerspace | Link
UK Postcode | Link
Give me Loan | Link
Setup instructions for your computer are also available here.