Distribution of data
The distribution of a data set is the shape of the graph when all of its values are plotted on a frequency graph, which shows how often each value occurs. Usually we cannot collect all the data for our variable of interest, so we take a sample and use it to draw conclusions about the whole data set. To be confident that the sample gives an accurate reflection of the whole data set, we need to understand the limitations of sampling.
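As a rough illustration of the idea of a sample standing in for a whole data set, the sketch below builds a made-up data set and compares its frequency counts with those of a small random sample. The die-roll data, the sample size of 50 and all variable names are assumptions chosen for illustration only.

```python
import random
import statistics
from collections import Counter

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical "whole data set": 10,000 rolls of a fair six-sided die.
population = [random.randint(1, 6) for _ in range(10_000)]

# Frequency distribution: how often each value occurs.
population_counts = Counter(population)

# A small random sample, drawn without replacement.
sample = random.sample(population, 50)
sample_counts = Counter(sample)

print("value  whole-data-set count  sample count")
for value in range(1, 7):
    print(value, population_counts[value], sample_counts[value])

print("mean of whole data set:", statistics.mean(population))
print("mean of sample:", statistics.mean(sample))
```

The sample counts will usually be less even than the counts for the whole data set, and the sample mean will usually differ a little from the overall mean. That gap is one of the limitations of sampling.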
The following video from Crash Course Statistics reviews a common distribution, the Normal distribution, as well as skewed and bimodal distributions.
The next video reviews the Normal distribution and links it to the Central Limit Theorem. This is important for understanding confidence intervals and hypothesis testing.
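The Central Limit Theorem can also be explored with a short simulation. The sketch below is a minimal illustration, not a formal demonstration: it assumes a right-skewed exponential population with mean 1, a sample size of 40 and 2,000 repeated samples, all chosen arbitrarily. The helper name sample_mean is hypothetical.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed exponential distribution (mean 1)."""
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

sample_size = 40  # assumed sample size
means = [sample_mean(sample_size) for _ in range(2_000)]

mean_of_means = statistics.mean(means)
sd_of_means = statistics.stdev(means)

# The exponential distribution with rate 1 has mean 1 and standard deviation 1,
# so theory suggests the sample means centre near 1 with spread near 1/sqrt(n).
print("mean of sample means:", round(mean_of_means, 3))
print("sd of sample means:", round(sd_of_means, 3))
print("theoretical sd, 1/sqrt(n):", round(1 / sample_size ** 0.5, 3))
```

Although the individual values are strongly skewed, the sample means cluster roughly symmetrically around the population mean. This is the behaviour that confidence intervals and hypothesis tests rely on.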
Further information
- To learn the key ideas of statistical data analysis and data science, and how to use powerful analytic and visualisation tools, see the FutureLearn MOOC Data to Insight, developed by the University of Auckland Statistics department.