What is Exploratory Data Analysis in machine learning?
Updated: Nov 3, 2022
Exploratory Data Analysis (EDA) is the process of summarizing data, identifying patterns and relationships, and detecting outliers. In machine learning it is most often used when the dataset is large or complex, because it helps you make sense of the data.
Many techniques exist, but the most common are plotting graphs and statistical methods such as calculating summary statistics.
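As a rough illustration of the statistical side, here is a minimal sketch that computes a few summary statistics and flags outliers. The sample values are made up for this example, and the 1.5 x IQR rule used here is only one common convention for spotting outliers, not the only one.

```python
import statistics

# Hypothetical sample: daily order counts with one unusually large value.
values = [12, 15, 14, 10, 13, 16, 11, 14, 95]

# Basic summary statistics.
mean = statistics.mean(values)
median = statistics.median(values)
stdev = statistics.stdev(values)

# Quartiles, then the interquartile range (IQR).
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: anything beyond 1.5 * IQR from the quartiles is an outlier.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
print(f"outliers={outliers}")
```

Notice that the mean is pulled well above the median by the single large value, which is exactly the kind of thing summary statistics help you notice early.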
Steps involved in Exploratory Data Analysis
Let us look at the steps involved in EDA.
Descriptive analysis: This step tells us what actually happened.
Diagnostic analysis: This step tells us why it happened.
Predictive analysis: This step tells us what will happen next.
Prescriptive analysis: This step tells us what we should do next.
Exploratory Data Analysis in machine learning is used to understand the data thoroughly and discover its characteristics, typically by employing visual techniques. This helps you find interesting patterns in it.
The general EDA process is as follows:
Load .csv files: A CSV (comma-separated values) file is a plain-text file that stores tabular data, with one record per line and fields separated by commas.
Dataset Information: You first need to understand your dataset in order to perform EDA. This includes understanding the data types, columns, and other relevant information.
Data Cleaning: To perform EDA, your data must first be cleaned. This includes transforming raw data into a suitable format.
Summary Statistics: Here the sample data is summarized with summary statistics such as the mean, median, and standard deviation.
Dealing with Missing Values: Missing values are values that were not recorded for a variable in the given dataset. There are various methods for dealing with them, such as removing the affected rows or filling in the gaps.
Correlation: This measure tells us how one variable is related to another and to what extent.
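The steps above can be sketched with pandas roughly as follows. This is only an illustration under a few assumptions: the CSV content, the column names (age, income, city), and the choice to fill missing numbers with the column median are all made up for this example, and a real dataset may call for different cleaning choices.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
# With a real file you would pass its path to pd.read_csv instead.
csv_text = """age,income,city
25,30000,Pune
32,45000,Delhi
,52000,Pune
41,,Mumbai
32,45000,Delhi
58,90000,Delhi
"""

# 1. Load the CSV into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# 2. Dataset information: size, column types, and non-null counts.
print(df.shape)
df.info()

# 3. Data cleaning: here we only drop exact duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

# 4. Summary statistics for the numeric columns.
summary = df.describe()
print(summary)

# 5. Missing values: count them, then fill numeric gaps with the
#    column median (an assumed strategy; dropping rows is another option).
missing_before = df.isna().sum()
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# 6. Correlation between the numeric columns (Pearson by default).
corr = df[numeric_cols].corr()
print(corr)
```

The order of the steps is not fixed. For example, you may prefer to look at the summary statistics before deciding how to fill in missing values.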
Several types of graphs are also useful for performing EDA. We will discuss them in future posts.