How to compute the statistics of a DataFrame
It is often useful to look at the statistics of each column in order to run a quick analysis of our data.
To do so, we can use the pandas .describe() method.
For each numeric column, you would end up with the following indicators:
- The count (number of non-null values)
- The mean
- The standard deviation
- The minimum
- The 25th percentile
- The 50th percentile (the median)
- The 75th percentile
- The maximum
Together, these give you a good overview of what your data looks like.
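Here is a minimal sketch on a small hypothetical DataFrame with two numeric columns (the names price and quantity are made up for illustration). Calling .describe() returns a new DataFrame: its rows are the indicators listed above and its columns are the original numeric columns.

```python
import pandas as pd

# Hypothetical example data: two numeric columns
df = pd.DataFrame({
    "price": [10.0, 12.5, 9.0, 15.0, 11.0],
    "quantity": [3, 5, 2, 8, 4],
})

# One row per indicator (count, mean, std, min, 25%, 50%, 75%, max),
# one column per numeric column of df
summary = df.describe()
print(summary)
```

Because the result is an ordinary DataFrame, you can pick out a single value with .loc. For example, summary.loc["50%", "price"] gives the median price.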
Categorical Variables
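By default, .describe() only computes statistics for numeric columns. If you also have categorical variables, you can pass the include="all" parameter to include columns of every type. For non-numeric columns, it reports the count, the number of distinct values (unique), the most frequent value (top) and how often that value occurs (freq).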
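The following sketch contrasts the default call with include="all". It uses hypothetical data that mixes a text column (city) and a numeric column (temperature).

```python
import pandas as pd

# Hypothetical data: one categorical (text) column and one numeric column
df = pd.DataFrame({
    "city": ["Paris", "Lyon", "Paris", "Nice"],
    "temperature": [18.5, 21.0, 17.0, 24.5],
})

# Default: only the numeric column "temperature" is summarized
numeric_only = df.describe()

# include="all": "city" is summarized too, with count/unique/top/freq
everything = df.describe(include="all")
print(everything)
```

In the combined result, cells that do not apply to a column's type are filled with NaN. Examples are mean for city and unique for temperature.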
Timestamps
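.describe() can also summarize timestamp (datetime) columns. In pandas 1.1 to 1.5, you have to request this with the datetime_is_numeric=True parameter. Otherwise, timestamps are treated like categories (count, unique, top, freq, first, last) instead of getting numeric statistics such as the mean, minimum, percentiles and maximum. Since pandas 2.0, datetime columns are summarized numerically by default and the parameter is deprecated.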
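Below is a minimal sketch for a timestamp column, with hypothetical column names (ts and value). The version check is an assumption based on the behavior described above. On pandas 1.x it passes datetime_is_numeric=True, which requires pandas 1.1 or later. On pandas 2.0 and later it relies on the default, because the parameter is deprecated there.

```python
import pandas as pd

# Hypothetical data: a timestamp column next to a numeric column
df = pd.DataFrame({
    "ts": pd.to_datetime(["2023-01-01", "2023-01-03", "2023-01-05"]),
    "value": [1.0, 2.0, 3.0],
})

# Assumption: pandas 1.1-1.5 needs the flag, pandas >= 2.0 summarizes
# datetime columns numerically by default
major_version = int(pd.__version__.split(".")[0])
if major_version >= 2:
    summary = df.describe()
else:
    summary = df.describe(datetime_is_numeric=True)

print(summary)
```

The ts column then gets rows such as mean, min and max expressed as timestamps, so you can read off, for instance, the earliest and latest dates directly.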