Chapter 4
Gathering data

Learn ….
How to gather “good” data
About Experiments and
Observational Studies
Agresti/Franklin Statistics, 1 of 56
Section 4.1
Should We Experiment or Should We Merely Observe?
Population, Sample and Variables

Population: all the subjects of interest

Sample: the subset of the population on which data are collected

Response variable: measures the outcome of interest

Explanatory variable: the variable that explains the response variable
Types of Studies


Experiments
Observational Studies
Experiment


A researcher conducts an experiment
by assigning subjects to certain
experimental conditions and then
observing outcomes on the response
variable
The experimental conditions, which
correspond to assigned values of the
explanatory variable, are called
treatments
Observational Study

In an observational study, the
researcher observes values of the
response variable and explanatory
variables for the sampled subjects,
without anything being done to the
subjects (such as imposing a
treatment)
Example: Does Drug Testing
Reduce Students’ Drug Use?

Headline: “Student Drug Testing Not
Effective in Reducing Drug Use”

Facts about the study:
• 76,000 students nationwide
• Schools selected for the study included schools that tested for drugs and schools that did not test for drugs
• Each student filled out a questionnaire asking about his/her drug use
Example: Does Drug Testing
Reduce Students’ Drug Use?

Conclusion: Drug use was similar in
schools that tested for drugs and
schools that did not test for drugs
Example: Does Drug Testing
Reduce Students’ Drug Use?

What were the response and
explanatory variables?
Example: Does Drug Testing
Reduce Students’ Drug Use?

Was this an observational study or
an experiment?
Advantages of Experiments over
Observational Studies

We can study the effect of an
explanatory variable on a response
variable more accurately with an
experiment than with an
observational study

An experiment reduces the potential
for lurking variables to affect the
result
Experiments vs Observational
Studies

When the goal of a study is to
establish cause and effect, an
experiment is needed

There are many situations (time constraints, ethical issues, ...) in which an experiment is not practical
Good Practices for Using
Data

Beware of anecdotal data

Rely on data collected in reputable
research studies
Example of a Dataset

General Social Survey (GSS):
• Observational database
• Tracks opinions and behaviors of the American public
• A good example of a sample survey
• Gathers information by interviewing a sample of subjects from the U.S. adult population
• Provides a snapshot of the population
Section 4.2
What Are Good Ways and Poor
Ways to Sample?
Setting Up a Sample Survey

Step 1: Identify the Population

Step 2: Compile a list of subjects in the
population from which the sample will be
taken. This is called the sampling frame.

Step 3: Specify a method for selecting
subjects from the sampling frame. This is
called the sampling design.
Random Sampling

Best way of obtaining a
representative sample

The sampling frame should give each
subject an equal chance of being
selected to be in the sample
Simple Random Sampling

A simple random sample of ‘n’
subjects from a population is one in
which each possible sample of that
size has the same chance of being
selected
Example: Sampling Club Officers
for a New Orleans Trip


The five offices: President, Vice-President, Secretary, Treasurer and Activity Coordinator

The possible samples of two officers are:
(P,V) (P,S) (P,T) (P,A) (V,S)
(V,T) (V,A) (S,T) (S,A) (T,A)
The possible samples are:
(P,V) (P,S) (P,T) (P,A) (V,S)
(V,T) (V,A) (S,T) (S,A) (T,A)
What are the chances the President and Activity Coordinator are selected?
a. 1 in 5
b. 1 in 10
c. 1 in 2
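The list of possible samples can be checked by enumeration. The following is a minimal Python sketch, assuming the one-letter office codes used on the slide and a sample size of two; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# One-letter codes for the five offices, as used on the slide.
officers = ["P", "V", "S", "T", "A"]

# Every possible sample of size 2; order within a pair does not matter.
samples = list(combinations(officers, 2))

# Under simple random sampling, each possible sample is equally likely,
# so the chance of one specific pair is 1 divided by the number of samples.
chance_president_and_activity = Fraction(samples.count(("P", "A")), len(samples))

print(len(samples))                   # 10
print(chance_president_and_activity)  # 1/10
```

With ten equally likely samples, the pair (P,A) has a 1 in 10 chance, which corresponds to option b.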
Selecting a Simple Random
Sample

Use a Random Number Table

Use a Random Number Generator
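As an illustration of the random number generator approach, here is a minimal Python sketch using the standard library's random module. The sampling frame of 60 numbered subjects, the sample size of 5 and the helper name simple_random_sample are assumptions made for the example, not part of the slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct subjects from the sampling frame, without replacement."""
    rng = random.Random(seed)  # seed is optional; fixing it makes the draw repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: subjects numbered 1 to 60.
frame = list(range(1, 61))

sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```

Because random.sample draws without replacement and treats every subset of the requested size as equally likely, it matches the definition of a simple random sample.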
Methods of Collecting Data in
Sample Surveys

Personal Interview

Telephone Interview

Self-administered Questionnaire
How Accurate Are Results from
Surveys with Random Sampling?

Sample surveys are commonly used
to estimate population percentages

These estimates include a margin of
error
Example: Margin of Error

A survey result states: “The margin of error is plus or minus 3 percentage points”

This means: “It is very likely that the reported sample percentage is no more than 3 percentage points lower or 3 percentage points higher than the population percentage”

Margin of error is approximately:
$\frac{1}{\sqrt{n}} \times 100\%$
where n is the sample size
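The approximation above is easy to evaluate for a few sample sizes. This is a small Python sketch; the function name approx_margin_of_error and the chosen sample sizes are illustrative assumptions.

```python
import math

def approx_margin_of_error(n):
    """Approximate margin of error, in percentage points, for a random sample of size n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 100 / math.sqrt(n)

for n in (100, 400, 1111, 2500):
    print(n, round(approx_margin_of_error(n), 1))
# 100 10.0
# 400 5.0
# 1111 3.0
# 2500 2.0
```

Under this approximation, a margin of error of about 3 percentage points corresponds to a sample of roughly 1,100 subjects, and quadrupling the sample size only halves the margin of error.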
Be Wary of Sources of Potential
Bias in Sample Surveys

A variety of problems can cause
responses from a sample to tend to
favor some parts of the population
over others
Types of Bias in Sample Surveys

Sampling Bias: occurs from using nonrandom
samples or having undercoverage

Nonresponse bias: occurs when some
sampled subjects cannot be reached or refuse
to participate or fail to answer some questions

Response bias: occurs when the subject gives
an incorrect response or the question is
misleading
Poor Ways to Sample

Convenience Sample: a sample that is easy to obtain
• Unlikely to be representative of the population
• Severe biases may result due to time and location of the interview and judgment of the interviewer about whom to interview
Poor Ways to Sample

Volunteer Sample: most common
form of convenience sample
• Subjects volunteer for the sample
• Volunteers are not representative of the
entire population
Warning:
A Large Sample Does Not
Guarantee An Unbiased
Sample
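A small simulation can make this warning concrete. The sketch below is purely illustrative: the population of 100,000, the true share of 40% and the response rates of 30% and 10% are invented assumptions, not figures from the slides.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 people; exactly 40% would answer "yes" (coded 1).
population = [1] * 40_000 + [0] * 60_000
true_share = sum(population) / len(population)

# Volunteer sample (assumed mechanism): people who would answer "yes" are three
# times as likely to respond (30% versus 10%), so the sample is large but biased.
volunteers = [x for x in population if rng.random() < (0.30 if x == 1 else 0.10)]
volunteer_estimate = sum(volunteers) / len(volunteers)

# Simple random sample of only 500 people from the same population.
random_sample = rng.sample(population, 500)
random_estimate = sum(random_sample) / len(random_sample)

print(f"true share:           {true_share:.3f}")
print(f"volunteer (n={len(volunteers)}): {volunteer_estimate:.3f}")
print(f"random (n=500):       {random_estimate:.3f}")
```

Under these assumptions the volunteer sample contains many thousands of responses yet lands near 67%, far from the true 40%, while the much smaller random sample stays close to the true share.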
Section 4.3
What Are Good Ways and Poor Ways
to Experiment?
An Experiment

Assign each subject (called an experimental unit) to an experimental condition, called a treatment

Observe the outcome on the response
variable

Investigate the association – how the
treatment affects the response
Elements of a Good
Experiment

Primary treatment of interest

Secondary treatment for comparison

Comparing the primary treatment results to the secondary treatment results helps to analyze the effectiveness of the primary treatment
Control Group

Subjects assigned to the secondary
treatment are called the control group

The secondary treatment could be a
placebo or it could be an actual
treatment
Randomization in an
Experiment

It is important to randomly assign subjects to
the primary treatment and to the secondary
(control) treatment

Goals of randomization:
• Prevent bias
• Balance the groups on variables that you know affect the response
• Balance the groups on lurking variables that may be unknown to you
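Random assignment to the primary treatment and the control treatment can be sketched in a few lines of Python. The subject labels, the equal split and the helper name randomly_assign are assumptions made for the example.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subject labels.
subjects = [f"subject_{i}" for i in range(1, 11)]

groups = randomly_assign(subjects, seed=42)
print(groups["treatment"])
print(groups["control"])
```

Because chance alone decides who lands in each group, neither the researcher's judgment nor the subjects' characteristics can systematically favor one group.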
Blinding the Study

Subjects should not know which
group they have been assigned to –
the primary treatment group or the
control group

Data collectors and experimenters
should also be blind to treatment
information
Example: A Study to Assess
Antidepressants for Quitting Smoking

Design:
• 429 men and women
• Subjects had smoked 15 cigarettes or more per day for the previous year
• Subjects were highly motivated to quit
Example: A Study to Assess
Antidepressants for Quitting Smoking

Subjects were randomly assigned to
one of two groups:
• One group took an antidepressant daily
• Second group did not take the
antidepressant (this group is called the
placebo group)
Example: A Study to Assess
Antidepressants for Quitting Smoking

The study ran for one year

At the end of the year, the study
observed whether each subject had
successfully abstained from smoking
or had relapsed
Example: A Study to Assess
Antidepressants for Quitting Smoking

Results after 1 year:
• Antidepressant Group: 55.1% not smoking
• Placebo Group: 42.3% not smoking

Results after 18 months:
• Antidepressant Group: 47.7% not smoking
• Placebo Group: 37.7% not smoking

Results after 2 years:
• Antidepressant Group: 41.6% not smoking
• Placebo Group: 40.0% not smoking
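The gap between the two groups at each follow-up is simple arithmetic on the reported percentages. A short Python sketch, where the dictionary layout and names are illustrative:

```python
# Percent not smoking at each follow-up, as reported on the slide:
# (antidepressant group, placebo group).
results = {
    "1 year": (55.1, 42.3),
    "18 months": (47.7, 37.7),
    "2 years": (41.6, 40.0),
}

# Difference in percentage points, rounded to one decimal place.
differences = {
    period: round(antidepressant - placebo, 1)
    for period, (antidepressant, placebo) in results.items()
}

for period, diff in differences.items():
    print(f"{period}: {diff} percentage points")
# 1 year: 12.8 percentage points
# 18 months: 10.0 percentage points
# 2 years: 1.6 percentage points
```

The gap shrinks over time. Whether any of these gaps exceeds ordinary variation cannot be judged from the percentages alone, since that also depends on the group sizes, which the slides do not give.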
Example: A Study to Assess
Antidepressants for Quitting Smoking

Question to Think About: Are the
differences between the two groups
statistically significant or are these
differences due to ordinary variation?
Section 4.4
What Are Other Ways to Conduct
Experimental and Observational
Studies?
Multifactor Experiments

Multifactor experiments have more than one categorical explanatory variable (each called a factor).
Example: Do Antidepressants and/or
Nicotine Patches Help Smokers Quit?
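In a multifactor experiment, the treatments are the combinations of the factor levels. The Python sketch below lists them for two factors; the level names (placebo pill, placebo patch) are assumptions for illustration, since the slide gives only the title of the example.

```python
from itertools import product

# Two factors, each with two levels (level names are assumed for illustration).
factors = {
    "pill": ["antidepressant", "placebo pill"],
    "patch": ["nicotine patch", "placebo patch"],
}

# Each treatment is one combination of a level from every factor.
treatments = list(product(*factors.values()))

for treatment in treatments:
    print(treatment)
```

Two factors with two levels each give 2 x 2 = 4 treatments, so the effect of each factor can be studied alone and in combination with the other.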
Matched-Pairs Design

Each subject serves as a block

Both treatments are observed for
each subject
Example: A Study to Compare an Oral
Drug with a Placebo for Treating Migraine
Headaches
Subject   Drug        Placebo
1         Relief      No Relief    (first matched pair)
2         Relief      Relief
3         No Relief   No Relief
Blocks and Block Designs

Block: collection of experimental
units that have the same (or similar)
values on a key variable

Block Design: identifies blocks before the start of the experiment and assigns subjects to treatments within those blocks
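A block design can be sketched as randomization carried out separately inside each block. In this Python sketch the subjects, the blocking variable (sex) and the helper name block_randomize are hypothetical.

```python
import random
from collections import defaultdict

def block_randomize(subjects, block_of, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)  # group subjects by block value

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within the block
        for i, subject in enumerate(members):
            # Cycle through the treatments so each block is split evenly.
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects as (label, sex) pairs; sex is the blocking variable.
subjects = [("s1", "F"), ("s2", "F"), ("s3", "F"), ("s4", "F"),
            ("s5", "M"), ("s6", "M"), ("s7", "M"), ("s8", "M")]

assignment = block_randomize(subjects, lambda s: s[1], ["drug", "placebo"], seed=3)
for subject, treatment in assignment.items():
    print(subject, treatment)
```

Each block ends up with the same number of subjects on each treatment, so the blocking variable cannot be confused with the treatment effect.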
Experiments vs Observational
Studies



An experiment can establish cause and effect

An observational study can yield useful information when an experiment is not practical

An observational study is a practical way of answering questions that do not involve trying to establish causality
Observational Studies

A well-designed and informative
observational study can give the
researcher very useful data.

Sample surveys that select subjects
randomly are good examples of
observational studies.
Random Sampling Schemes

Simple Random Sample: every possible
sample has the same chance of
selection
Random Sampling Schemes

Cluster Random Sample:
• Divide the population into a large number of clusters
• Select a simple random sample of the clusters
• Use the subjects in those clusters as the sample
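The three steps of cluster sampling can be sketched in Python as follows. The clusters (city blocks with a few households each) and the helper name cluster_sample are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random and keep every subject in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # simple random sample of cluster names
    subjects = [s for name in chosen for s in clusters[name]]
    return chosen, subjects

# Hypothetical population already divided into clusters (for example, city blocks).
clusters = {
    "block_A": ["a1", "a2", "a3"],
    "block_B": ["b1", "b2"],
    "block_C": ["c1", "c2", "c3", "c4"],
    "block_D": ["d1", "d2"],
}

chosen, subjects = cluster_sample(clusters, n_clusters=2, seed=7)
print(chosen)
print(subjects)
```

Only a list of clusters is needed to draw the sample, not a list of every subject, which is the practical appeal of this scheme.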
Random Sampling Schemes

Stratified Random Sample:
• Divide the population into separate groups, called strata
• Select a simple random sample from each stratum
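Stratified sampling can be sketched in Python as one simple random sample per stratum. The strata (class years), their sizes and the helper name stratified_sample are assumptions made for the example.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: students grouped by class year.
strata = {
    "freshman": [f"fr_{i}" for i in range(1, 41)],
    "sophomore": [f"so_{i}" for i in range(1, 31)],
    "junior": [f"ju_{i}" for i in range(1, 26)],
    "senior": [f"se_{i}" for i in range(1, 21)],
}

sample = stratified_sample(strata, n_per_stratum=5, seed=11)
for name, members in sample.items():
    print(name, members)
```

Unlike cluster sampling, every stratum is represented in the sample, which guarantees enough subjects from each group to compare them.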
Observational Studies

Well-designed observational studies
use random sampling schemes
Retrospective and Prospective
Studies

Retrospective study: looks into the
past

Prospective study: follows its
subjects into the future
Case-Control Study

A case-control study is an
observational study in which subjects
who have a response outcome of
interest (the cases) and subjects who
have the other response outcome (the
controls) are compared on an
explanatory variable
Example: Case-Control Study

Response outcome of interest: Lung
cancer
• The cases have lung cancer
• The controls do not have lung cancer

The two groups were compared on the
explanatory variable:
• Whether the subject had been a smoker