|
-
Populations and samples
-
Depends on data or survey
-
Example
-
Population – survey CEOs of the world’s top 500 corporations
-
Parameters
-
Mean, $\mu$
-
Standard deviation, $\sigma$
-
Sample – used when the population has too many individuals to survey
-
Choose a sample from the population
-
Conditions
-
Every individual in a population has a known non-zero chance of being sampled
-
Equal chance for everyone
-
Selection has to be independent; choosing one individual does not influence the choice of another
-
Have to be careful when defining a population
-
Book – each member of population has a number
-
Use a random number table to randomly select individuals
-
Excel – the function is =rand()
-
Uniformly distributed on (0, 1)
-
$X \sim \mathrm{UNIF}(0, 1)$
-
Select numbers between 0 and 1,000
-
=round(1000*rand(), 0)
-
The round function rounds a number to a given number of digits; 0 digits rounds to the nearest integer
-
Each time you change something in Excel, Excel recalculates the random numbers
-
Use Copy and Paste Special (Values) to freeze the random numbers and stop them from changing
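The Excel steps above can be sketched in Python. This is a minimal illustration rather than part of the original notes: the function names `draw_uniform_integers` and `simple_random_sample` are hypothetical, and a fixed seed stands in for Paste Special as one way to keep the numbers from changing between runs.

```python
import random


def draw_uniform_integers(count, upper=1000, seed=None):
    """Mimic =round(1000*rand(), 0): scale a UNIF(0, 1) draw, then round it."""
    rng = random.Random(seed)  # a fixed seed reproduces the same numbers
    return [round(upper * rng.random()) for _ in range(count)]


def simple_random_sample(population_size, sample_size, seed=None):
    """Pick distinct member numbers 1..population_size, each equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


numbers = draw_uniform_integers(5, seed=1)
chosen = simple_random_sample(population_size=500, sample_size=20, seed=1)
print(numbers)
print(chosen)
```

Because rounding maps only a half-width interval to 0 and to 1,000, those two endpoints come up about half as often as the other integers. If exact uniformity over the integers matters, `random.randint(0, 1000)` avoids this.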
-
Trick – Generate random numbers with any distribution
-
Example – generate normally distributed random numbers
-
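Probability Density Function (PDF) – a function that describes the relative likelihood of each value of a continuous random variable; for a discrete random variable, the analogous function (the probability mass function) gives the probability that each value will occur.
-
Denoted as p(x) or f(x)
-
Cumulative Distribution Function (CDF) – the integral of the probability density function, giving the probability that the variable is less than or equal to x
-
Denoted by a capital letter, such as P(x) or F(x)
-
If you sum (or integrate) over all probabilities, the total has to equal one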
-
A PDF and its CDF are shown below
-
Draw a random number from UNIF(0, 1) and treat it as a cumulative probability between 0 and 1
-
Apply the inverse of the CDF, P(x), to that random number to get the value x
-
To generate a normally distributed random variable with a given mean and standard deviation, the Excel function is
-
=norminv(rand(), mean, standard deviation)
-
Example
-
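Generate random numbers from the distribution $X_i \sim N(10, 25)$
-
The notation is $X_i \sim N(\mu, \sigma^2)$, so the second argument is the variance
-
The Excel function is =norminv(rand(), 10, 5), because the standard deviation is $\sqrt{25} = 5$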
-
This method can generate random numbers from any distribution whose CDF can be inverted
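The inverse-CDF trick can be sketched in Python as follows. This is an illustration under stated assumptions: `statistics.NormalDist.inv_cdf` from the standard library plays the role of Excel's norminv, and the helper name `normal_by_inverse_cdf` is hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def normal_by_inverse_cdf(count, mu, sigma, seed=None):
    """Draw u ~ UNIF(0, 1), then map it through the inverse normal CDF."""
    rng = random.Random(seed)
    dist = NormalDist(mu=mu, sigma=sigma)
    draws = []
    while len(draws) < count:
        u = rng.random()   # uniform on [0, 1)
        if u > 0.0:        # the inverse CDF is undefined at exactly 0
            draws.append(dist.inv_cdf(u))
    return draws


# N(10, 25) means variance 25, so the standard deviation passed in is 5.
values = normal_by_inverse_cdf(10_000, mu=10, sigma=5, seed=42)
print(round(mean(values), 2), round(stdev(values), 2))
```

With many draws, the sample mean and standard deviation should land close to 10 and 5.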
-
Stratified Random Sampling
-
You divide the population into groups (strata), first by gender (male or female)
-
Then you divide by age, creating four categories
-
0 – 30 years
-
31 – 40 years
-
41 – 60 years
-
> 60 years
-
You have a total of eight compartments
-
You randomly select individuals and fill the compartments equally
-
Each compartment has 10 individuals
-
Unfortunately, males/females and age categories may not be distributed evenly
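A minimal Python sketch of filling the eight compartments, assuming non-overlapping age bands (0 to 30, 31 to 40, 41 to 60, over 60). The population is synthetic and invented purely for illustration, and the names `age_band` and `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict


def age_band(age):
    """Map an age to one of four non-overlapping bands."""
    if age <= 30:
        return "0-30"
    if age <= 40:
        return "31-40"
    if age <= 60:
        return "41-60"
    return ">60"


def stratified_sample(people, per_stratum=10, seed=None):
    """Randomly pick `per_stratum` people from every (gender, age band) group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in people:
        strata[(person["gender"], age_band(person["age"]))].append(person)
    sample = {}
    for key, members in strata.items():
        if len(members) < per_stratum:
            raise ValueError(f"stratum {key} has only {len(members)} members")
        sample[key] = rng.sample(members, per_stratum)
    return sample


# Synthetic population, invented purely for illustration.
rng = random.Random(0)
population = [
    {"id": i, "gender": rng.choice(["male", "female"]), "age": rng.randint(0, 90)}
    for i in range(2000)
]
compartments = stratified_sample(population, per_stratum=10, seed=0)
for key in sorted(compartments):
    print(key, len(compartments[key]))
```

Sampling the same number from every compartment ignores how large each group is in the population, which is the unevenness the notes warn about.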
-
Unbiasedness – on average, the mean of a sample will equal its true parameter value
-
The notation is $E(\bar{x}) = \mu$
-
E stands for expected value
-
Precise – the study is repeatable: if we took another sample, we would get similar results
-
Nonrandom samples – make our parameter estimates biased
-
Some people in the population will never be selected; they may be transient
-
Some people may not fill out the surveys
-
Some people may lie on surveys
-
Block Randomization
-
Use Table F and choose a block size of 2, 4, 6, 8, or 10
-
Example – testing effectiveness of a new drug
-
We have 8 patients, and choose block size 8
-
Four patients get the new drug, while four patients get the placebo
-
Our study has 8 patients who have a unique number between 1 and 8
-
Patients could be a biased sample; however, we are testing the drug’s effectiveness
-
Then the 8 patients get the following treatments
| Treatment | 2 | 3 | 8 | 5 |
| Placebo | 1 | 4 | 6 | 7 |
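The block assignment can also be generated programmatically. The sketch below is an assumption-laden illustration: it shuffles each block with Python's `random` module instead of reading Table F, and the function name `block_randomize` is hypothetical.

```python
import random


def block_randomize(patient_ids, block_size, seed=None):
    """Within each block, give half the patients the treatment and half the placebo."""
    if block_size % 2 != 0:
        raise ValueError("block size must be even")
    if len(patient_ids) % block_size != 0:
        raise ValueError("number of patients must be a multiple of the block size")
    rng = random.Random(seed)
    assignment = {}
    for start in range(0, len(patient_ids), block_size):
        block = list(patient_ids[start:start + block_size])
        rng.shuffle(block)  # random order within the block
        half = block_size // 2
        for pid in block[:half]:
            assignment[pid] = "treatment"
        for pid in block[half:]:
            assignment[pid] = "placebo"
    return assignment


patients = list(range(1, 9))  # 8 patients numbered 1 to 8
assignment = block_randomize(patients, block_size=8, seed=3)
treatment = sorted(p for p, arm in assignment.items() if arm == "treatment")
placebo = sorted(p for p, arm in assignment.items() if arm == "placebo")
print("Treatment:", treatment)
print("Placebo:", placebo)
```

Blocking guarantees equal group sizes within every block, which plain coin-flip assignment does not.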
-
Standard Error
-
Each time we take a sample, we get a different mean
-
Example
-
Sample 1: $\bar{x} = 29.3$
-
Sample 2: $\bar{x} = 33.3$
-
Sample 100: $\bar{x} = 27.7$
-
We do not want to keep taking samples to find the variability in the mean
-
The standard error (SE) measures the variability of the sample mean under repeated sampling
-
The formula is $SE = \sigma / \sqrt{n}$, where $n$ is the sample size
-
As the sample size increases, the standard error decreases
-
With an infinite sample size, the standard error is zero and we would know the true mean exactly
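A small simulation can illustrate the idea, assuming the usual formula $SE = \sigma / \sqrt{n}$ for the standard error of the mean. The population values (mean 30, standard deviation 12, sample size 25) are hypothetical and chosen only for the demonstration.

```python
import random
from math import sqrt
from statistics import mean, stdev

# Hypothetical population parameters, chosen only for illustration.
MU, SIGMA, N, REPEATS = 30.0, 12.0, 25, 2000

rng = random.Random(0)

# Take many samples of size N and record each sample mean.
sample_means = [
    mean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPEATS)
]

empirical_se = stdev(sample_means)   # spread of the sample means
theoretical_se = SIGMA / sqrt(N)     # sigma / sqrt(n)

print(f"average of sample means: {mean(sample_means):.2f}")
print(f"empirical SE:   {empirical_se:.2f}")
print(f"theoretical SE: {theoretical_se:.2f}")
```

The spread of the simulated sample means should be close to the formula's value, which is why one sample is enough to estimate the variability of the mean.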
-
Binomial Distribution
-
We have two states:
-
P is probability that Event A happens
-
1 – P is probability that Event A does not happen
-
The states or events are mutually exclusive
-
We sampled 80 people and 43 went to college
-
The mean for people going to college (the event)
-
P = 43 / 80 = 0.5375
-
The probability for people who did not go to college
-
1 – P = (80 – 43) / 80 = 1 – 0.5375 = 0.4625
-
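The variance is $P(1 - P) = 0.5375 \times 0.4625 \approx 0.2486$
-
The standard error is $\sqrt{P(1 - P)/n} = \sqrt{0.2486 / 80} \approx 0.0557$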
-
Probabilities of events can also be expressed as percentages.
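The college example can be checked with a few lines of Python, assuming the usual formulas for a proportion: variance $P(1 - P)$ and standard error $\sqrt{P(1 - P)/n}$. The variable names are illustrative.

```python
from math import sqrt

went_to_college = 43   # number of people for whom the event happened
n = 80                 # sample size

p = went_to_college / n        # proportion who went to college
q = 1 - p                      # proportion who did not
variance = p * q               # variance of a single yes/no outcome
standard_error = sqrt(variance / n)

print(f"P = {p:.4f}")                    # P = 0.5375
print(f"1 - P = {q:.4f}")                # 1 - P = 0.4625
print(f"variance = {variance:.4f}")      # variance = 0.2486
print(f"SE = {standard_error:.4f}")      # SE = 0.0557
print(f"P as a percent = {100 * p:.2f}%")
```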
|