The Mean and Standard Deviation Lecture 2
The Mean and Standard Deviation
-
Mean – the average for a data set
-
Unlike the median, the mean uses the information in every data point
-
Calculate the mean by $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
-
Notation
-
$X_i$ is a data point, or an observation
-
n is the total number of observations
-
i is an index number
-
Σ is the summation symbol
-
The mean is a measure of central tendency; however, it is sensitive to outliers
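As a minimal sketch of the summation formula, the Python below computes the mean with an explicit loop over the index $i$ and compares it with the median, using the example values from these notes. The helper name `mean_by_summation` is illustrative only. Dropping the outlier 32 moves the mean from 6.0 to 9.25, while the median only moves from 5 to 5.5.

```python
from statistics import median
from typing import Sequence


def mean_by_summation(xs: Sequence[float]) -> float:
    """Mean from its definition: (1/n) * sum of X_i for i = 1..n."""
    n = len(xs)          # total number of observations
    total = 0.0
    for i in range(n):   # i is the index; xs[i] plays the role of X_i
        total += xs[i]
    return total / n


data = [10, 32, 5, 6, 7, 5, 4, 5]
without_outlier = [x for x in data if x != 32]

print(mean_by_summation(data), median(data))                        # 9.25 5.5
print(mean_by_summation(without_outlier), median(without_outlier))  # 6.0 5
```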
-
Mode – the data point that occurs most frequently
-
If the probability distribution is symmetric (and unimodal), then the mean = median = mode
-
If the probability distribution is skewed, then the mean, median, and mode generally differ from one another
-
Example
-
Unordered: 10 32 5 6 7 5 4 5
-
Ordered: 4 5 5 5 6 7 10 32
-
The sum of the numbers is 74
-
Statistics
-
The mean is 74 / 8 = 9.25
-
The mode is 5
-
The median is (5 + 6)/2 = 5.5
-
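Thus, the distribution is positively (right) skewed: the outlier 32 pulls the mean (9.25) well above the median (5.5) and the mode (5)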
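The example can be checked with Python's standard `statistics` module. This is a sketch; the skew label uses the ordering mean > median > mode only as a rough rule of thumb, not as a formal skewness test.

```python
import statistics

data = [10, 32, 5, 6, 7, 5, 4, 5]

mean = statistics.mean(data)      # 74 / 8
median = statistics.median(data)  # average of the 4th and 5th ordered values
mode = statistics.mode(data)      # most frequent value

# Rule of thumb: mean > median > mode suggests a long right tail (positive skew)
skew_hint = "positive" if mean > median > mode else "unclear"
print(mean, median, mode, skew_hint)  # 9.25 5.5 5 positive
```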
-
Standard Deviation – how spread out the distribution is
-
Uses all the data points
-
The variance is estimated by $s^2 = \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
-
The hat means it is estimated
-
n – 1 is called the degrees of freedom
-
Because the variance is calculated around the estimated mean $\bar{X}$ rather than the true mean, we lose one piece of information (one degree of freedom)
-
This is the sample variance
-
Population – all of the data relevant to your analysis
-
Collecting data on the whole population may be too costly, too large, or otherwise impractical
-
Sample – a subset randomly selected from the population
-
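The population variance is $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$, where $n$ is now the size of the whole population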
-
Notice – there is no hat; with every data point available, the population variance is calculated directly and does not have to be estimated!
-
It is easy to calculate the sample variance from the population variance and vice versa:
-
$s^2 = \frac{n}{n-1}\,\sigma^2$ and $\sigma^2 = \frac{n-1}{n}\,s^2$
-
It is rare to have data on the whole population, so in practice a sample is almost always used
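A short sketch of the two variances, assuming Python's standard `statistics` module: `pvariance` divides by $n$ and `variance` divides by $n - 1$. The values 5, 6, 3, 5, 4 are the observations from the worked variance example in these notes; the last step converts one variance into the other with the factor $n/(n-1)$.

```python
import statistics

sample = [5, 6, 3, 5, 4]
n = len(sample)

pop_var = statistics.pvariance(sample)    # divides the sum of squares by n
sample_var = statistics.variance(sample)  # divides the sum of squares by n - 1

# Converting between the two: s^2 = n / (n - 1) * sigma^2
converted = pop_var * n / (n - 1)
print(pop_var, sample_var, converted)
```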
-
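The population variance can also be written as $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2$
-
It is easy to derive
-
The trick to the derivation
-
Σ is a linear operator, so a sum of several terms can be split into separate sums
-
$\bar{X}$ and 2 are constants and can be factored out of the sums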
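Applying the trick step by step gives the shortcut formula. The only extra fact used is $\sum_{i=1}^{n} X_i = n\bar{X}$, which follows from the definition of the mean. A sketch of the derivation:

```latex
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} \left( X_i^2 - 2\bar{X} X_i + \bar{X}^2 \right) \\
  &= \sum_{i=1}^{n} X_i^2 - 2\bar{X} \sum_{i=1}^{n} X_i + n\bar{X}^2 \\
  &= \sum_{i=1}^{n} X_i^2 - 2\bar{X}\,(n\bar{X}) + n\bar{X}^2 \\
  &= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2
\end{aligned}
\qquad\Longrightarrow\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2
```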
-
Calculate the variance for the sample 5, 6, 3, 5, 4, whose mean is $\bar{X} = 23/5 = 4.6$
| Observation $X_i$ | $X_i - \bar{X}$ | $(X_i - \bar{X})^2$ |
|---|---|---|
| 5 | 5 – 4.6 = 0.4 | 0.16 |
| 6 | 6 – 4.6 = 1.4 | 1.96 |
| 3 | 3 – 4.6 = -1.6 | 2.56 |
| 5 | 5 – 4.6 = 0.4 | 0.16 |
| 4 | 4 – 4.6 = -0.6 | 0.36 |
| Sum | | 5.2 |
-
Sample variance: $s^2 = \frac{5.2}{5 - 1} = 1.3$
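The table can be reproduced column by column in Python as a minimal sketch. Variable names such as `deviations` and `squared` are illustrative, and tiny floating-point differences from the hand calculation are expected.

```python
import math

observations = [5, 6, 3, 5, 4]
n = len(observations)
x_bar = sum(observations) / n  # 23 / 5 = 4.6

deviations = [x - x_bar for x in observations]  # X_i - X-bar column
squared = [d ** 2 for d in deviations]          # (X_i - X-bar)^2 column
sum_sq = sum(squared)                           # column total, about 5.2

sample_var = sum_sq / (n - 1)      # s^2 = 5.2 / 4, about 1.3
sample_sd = math.sqrt(sample_var)  # s, about 1.140

for x, d, sq in zip(observations, deviations, squared):
    print(f"{x}  {d:+.1f}  {sq:.2f}")
print(f"sum = {sum_sq:.1f}, s^2 = {sample_var:.2f}, s = {sample_sd:.3f}")
```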
Variance has one problem: if the data are in dollars, then the variance is in dollars squared
-
Take the square root to get the standard deviation (SD): $s = \sqrt{s^2}$
-
Standard deviation has the same units as the mean and data
Probability Distributions
-
Statistics has many probability distributions
-
At least 20 distributions are in common use
-
The most common is the Normal or Gaussian Distribution
-
“Bell-shaped curve”
-
The mean and standard deviation can completely describe this distribution
-
Normal distribution – as the sample size increases toward infinity, many other distributions (for example, the binomial and Student's t) approach the normal distribution
-
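Intervals around the mean (the 68-95-99.7 rule, assuming normally distributed data)
-
From the last example, $\bar{X} = 4.6$ and $s = \sqrt{5.2/4} \approx 1.140$
-
About 68% of the data lies within one SD of the mean
-
[4.6 – 1.140(1), 4.6 + 1.140(1)] = [3.46, 5.74]
-
About 95% of the data lies within two SDs of the mean
-
[4.6 – 1.140(2), 4.6 + 1.140(2)] = [2.32, 6.88]
-
About 99.7% of the data lies within three SDs of the mean
-
[4.6 – 1.140(3), 4.6 + 1.140(3)] = [1.18, 8.02]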
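The 68, 95, and 99.7 percent figures come from the standard normal distribution. A minimal sketch using `statistics.NormalDist` (Python 3.8 or later) computes those shares and rebuilds the three intervals from $\bar{X} = 4.6$ and $s = \sqrt{5.2/4} \approx 1.140$; like the rule itself, it assumes the data are normally distributed.

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)

# Share of a normal distribution lying within k standard deviations of the mean
coverage = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

x_bar, s = 4.6, 1.140  # values from the worked example
intervals = {k: (round(x_bar - k * s, 2), round(x_bar + k * s, 2)) for k in (1, 2, 3)}

for k in (1, 2, 3):
    print(f"±{k} SD: {coverage[k]:.4f} of the data, interval {intervals[k]}")
```

Note that two SDs cover about 95.45%; the exact 95% multiplier is about 1.96.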
Data Transformations
-
If you have a positively skewed distribution, use a transformation to make the distribution “more symmetric.”
-
An example of a positively skewed distribution
-
Use the natural logarithm (ln)
-
The logarithm compresses large values much more than small ones, pulling in the long right tail
| Data | Natural logarithm | Note |
|---|---|---|
| … | … | |
| 45 | ln 45 = 3.8066 | |
| … | … | |
| 50 | ln 50 = 3.912 | This is the mean |
| … | … | |
| 100 | ln 100 = 4.605 | An outlier |
-
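Note – the mean of the log-transformed data is not the log of the mean of the original data, so exponentiating the mean of the logs does not recover the original mean (it gives the geometric mean, which is smaller unless all values are equal)
-
ln and exp are inverses of each other: $\exp(\ln x) = x$ for $x > 0$
-
The natural logarithm will not make a negatively skewed distribution symmetric; it stretches the left tail even further (and ln is defined only for positive values)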
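A small sketch of these points using Python's `math` and `statistics` modules. The first part uses the 45, 50, and 100 values from the table: on the original scale 100 is ten times as far from 50 as 45 is, while on the log scale the ratio shrinks. The seven-value sample is hypothetical, made up only to show that exponentiating the mean of the logs gives the geometric mean rather than the arithmetic mean.

```python
import math
import statistics

# Distances from 50 (the mean in the table) on the original and the log scale
print(100 - 50, 50 - 45)                               # 50 5
print(round(math.log(100) - math.log(50), 3),
      round(math.log(50) - math.log(45), 3))           # 0.693 0.105

# Hypothetical positively skewed sample (not from the lecture)
sample = [40, 45, 48, 50, 52, 55, 100]
logs = [math.log(x) for x in sample]  # math.log is the natural logarithm

arithmetic_mean = statistics.mean(sample)
back_transformed = math.exp(statistics.mean(logs))  # exp undoes ln

# The back-transformed value is the geometric mean, which is below the mean
print(round(arithmetic_mean, 2), round(back_transformed, 2))
```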
Measurement Errors
-
Measurement Errors – errors in measuring the data
-
Within-subject (intra-subject) – if you take another measurement on the same person, you get a different value
-
We can measure this variability
-
Coefficient of variation (CV): $CV = \frac{s}{\bar{X}} \times 100\%$
-
Use the CV to check the variability of repeated measurements on one person
-
Between-subject (inter-subject) – variation in the measurements across the different subjects in the sample
-
Example
-
One person’s heart rate is 60 beats per minute, with CV = 3%
-
Another person’s heart rate is 80 beats per minute, with CV = 10%
-
Each person’s heart rate, and the variability of its measurement, is different
-
Each sample has both intra-subject and inter-subject measurement errors
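As a minimal sketch, assuming the usual definition CV = SD / mean × 100%, the helpers below compute a CV from repeated readings and convert a CV back into the SD it implies. The function names and the three readings are hypothetical; only the 60 bpm with 3% and 80 bpm with 10% figures come from the example.

```python
import statistics


def coefficient_of_variation(measurements):
    """Hypothetical helper: CV in percent, sample SD divided by the mean, times 100."""
    return statistics.stdev(measurements) / statistics.mean(measurements) * 100


def sd_from_cv(mean, cv_percent):
    """Hypothetical helper: invert the CV formula to get the implied SD."""
    return mean * cv_percent / 100


# Hypothetical repeated heart-rate readings (beats per minute) on one person
readings = [60, 62, 58]
print(round(coefficient_of_variation(readings), 2))  # 3.33

# SDs implied by the two people in the example (beats per minute)
print(sd_from_cv(60, 3), sd_from_cv(80, 10))  # 1.8 8.0
```

Even though the second person's mean is only a third higher, their implied measurement SD is more than four times as large.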