1 day – 6 September 2023
09:45 | Join in |
10:00 | Part 1 – Data Analysis and Data Preparation for Machine Learning |
12:00 | Lunch break |
13:00 | Part 2 – Data Analysis and Data Preparation for Machine Learning |
16:00 | End of course |
Participants will learn how to
- Get data into a suitable form
- Visualize data
- Clean data
- Transform data
- Analyze data
- Handle data that does not fit in memory
Program
- Overview
Participants learn why data needs to be pre-processed before being passed to ML methods. They also learn what the typical challenges are in data wrangling.
- Pandas
Participants get to know this powerful Python library: how to load data into a data frame, get a first feel for its contents, and transform it into the form best suited to the task at hand.
- NumPy
Much of Python's ML ecosystem is built on this library for numerical operations. Participants will therefore get to know the most important parts of its API and what can be achieved with it.
- Matplotlib
Humans are visual beings, which is why we prefer looking at graphs rather than endless tables of data. Matplotlib is a widely used Python library for creating all kinds of graphs, which makes data much easier to understand. Participants will learn how to create the most common types of graphs with Matplotlib.
- Dask
In ML problems, we often reach a point where our data does not fit into memory. Even when it does fit, we would like some operations to run faster. Dask addresses both problems by dividing the data into smaller, more manageable chunks and running computations on those chunks in parallel. This makes it possible to handle data that is larger than memory, and it can speed up computations because several chunks are processed at the same time. Participants will get to know this tool and see how closely it resembles the libraries covered earlier.
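The chunk-and-combine idea described for Dask can be illustrated with a minimal sketch that uses only the Python standard library. This is not Dask's API and not course material: the function names (`iter_chunks`, `partial_stats`, `chunked_mean`) and the default chunk size are hypothetical, chosen only to show how a result can be computed from small per-chunk summaries instead of the full dataset.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def iter_chunks(values, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    iterator = iter(values)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def partial_stats(chunk):
    """Reduce one chunk to a small summary: (sum, count)."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=1000, workers=4):
    """Compute a mean by combining per-chunk summaries.

    Caveat: pool.map submits every chunk up front, so this shows the
    split/combine pattern only. It does not bound memory use, and
    threads do not speed up pure-Python arithmetic. A real scheduler
    such as Dask handles both concerns.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(partial_stats, iter_chunks(values, chunk_size)))
    total = sum(s for s, _ in results)
    count = sum(n for _, n in results)
    return total / count if count else float("nan")


if __name__ == "__main__":
    print(chunked_mean(range(1, 101), chunk_size=7))
```

The point of the design is that each chunk is reduced to a tiny summary (a sum and a count), and only those summaries are combined at the end. The same pattern underlies many larger-than-memory computations, although not every statistic decomposes this easily.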