STOR 155 Introduction to Data Models and Inference
Aug 20, 2024
From Deterministic to Probabilistic Scenarios
- Modeling problems that have a definite solution versus problems that involve uncertainty and probability.
- Question 1: Chris and Amy bought a total of 10 books. Chris bought 4 books. How many books did Amy buy?
  - Set a variable x to formulate the problem.
  - Solve for x.
- Question 2: Minji has a perfectly symmetrical die. Roll the die, and let X be the number on top.
  - What is your guess for X?
  - Are you 100% certain about your guess?
  - If we’re not 100% certain, does that mean we know nothing about X? What can we still say about X?
- To describe a random variable, we can talk about possible outcomes and probabilities assigned to them.
- What are some other examples of random variables?
- Is it common to find randomness in data from everyday life?
- The dice example used a discrete set of numbers. Think of a random variable that uses a continuous set of numbers.
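
As a minimal sketch of the contrast above, the Python snippet below first solves Question 1 exactly and then describes the die roll X by listing its possible outcomes and the probability assigned to each. It assumes a standard fair six-sided die; the variable names are illustrative and not part of the course materials.

```python
from fractions import Fraction

# Question 1 (deterministic): x + 4 = 10 has exactly one solution.
total_books, chris_books = 10, 4
amy_books = total_books - chris_books

# Question 2 (probabilistic): assuming a fair six-sided die,
# every face is equally likely.
die_distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities over all possible outcomes must sum to 1.
total_probability = sum(die_distribution.values())

# The long-run average of X (its expected value).
expected_value = sum(face * p for face, p in die_distribution.items())

print(amy_books)              # 6
print(total_probability)      # 1
print(float(expected_value))  # 3.5
```

No single guess for X is certain, yet the table of outcomes and probabilities still tells us a great deal about it. For a continuous random variable, such as a waiting time, probabilities are assigned to intervals of values rather than to individual values.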
What Do We Learn from STOR 155?
- Four steps of scientific inquiry:
  - Identify a question or problem of interest. Move up as a researcher!
  - Collect relevant data. Methods for data collection: sampling strategies, observational studies, experiments, and ways to collect reliable data.
  - Analyze the data. Calculating summary statistics, regression and correlation, hypothesis testing, confidence intervals.
  - Form a conclusion. Given a confidence interval or the result of a hypothesis test, what can we say about our data?
- Statistics as the language of Science!
Case Study: Using Stents to Prevent Strokes
- Objective: Evaluate the effectiveness of stents in treating patients at risk of stroke.
- Research Question: Does the use of stents reduce the risk of stroke?
- Study Details: The researchers conducted an experiment with 451 at-risk patients. They randomly assigned 224 patients to the control group and 227 to the treatment group. The table below shows the distribution of patients who had a stroke at the 365-day follow-up.

| Group | Stroke: Yes | Stroke: No | Total |
|---|---|---|---|
| Treatment | 28 | 199 | 227 |
| Control | 45 | 179 | 224 |
| Total | 73 | 378 | 451 |

- Proportion with stroke in treatment group: 28/227, approximately 12%
- Proportion with stroke in control group: 45/224, approximately 20%
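
The two stroke proportions can be checked directly from the counts in the study table. The following is a small sketch in Python; the dictionary layout and the helper name `stroke_proportion` are illustrative choices, not part of the study.

```python
# Counts from the 365-day follow-up table in the study details.
counts = {
    "treatment": {"stroke": 28, "no_stroke": 199},
    "control": {"stroke": 45, "no_stroke": 179},
}

def stroke_proportion(group):
    """Return the fraction of patients in `group` who had a stroke."""
    row = counts[group]
    return row["stroke"] / (row["stroke"] + row["no_stroke"])

p_treatment = stroke_proportion("treatment")  # 28 / 227
p_control = stroke_proportion("control")      # 45 / 224

print(f"treatment:  {p_treatment:.1%}")              # treatment:  12.3%
print(f"control:    {p_control:.1%}")                # control:    20.1%
print(f"difference: {p_control - p_treatment:.1%}")  # difference: 7.8%
```

The observed stroke rate is about 8 percentage points lower in the treatment group than in the control group.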
Understanding the Results
- Do the data show a “real” difference between the groups?
  - Suppose we have a fair coin and flip it 100 times. Let X represent the number of heads observed.
  - What is the expected number of heads and tails? Do we actually observe that in reality?
  - While the chance a coin lands heads in any given flip is 50%, we probably won’t observe exactly 50 heads. This type of fluctuation is part of almost any type of data-generating process.
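
To put a number on "we probably won't observe exactly 50 heads", here is a minimal sketch that treats X as a binomial count with 100 independent fair flips. This is the standard model for the coin example; the function name `prob_heads` and the 40-to-60 range are illustrative assumptions.

```python
from math import comb

n_flips = 100

def prob_heads(k, n=n_flips):
    """P(X = k) when X counts heads in n independent fair coin flips."""
    return comb(n, k) / 2**n

# Expected number of heads: n times the chance of heads on one flip.
expected_heads = n_flips * 0.5

# Chance of landing exactly on the expected value.
p_exactly_50 = prob_heads(50)

# Chance of landing within 10 heads of the expected value.
p_40_to_60 = sum(prob_heads(k) for k in range(40, 61))

print(expected_heads)         # 50.0
print(f"{p_exactly_50:.4f}")  # 0.0796
print(f"{p_40_to_60:.4f}")
```

Under this model, exactly 50 heads occurs in only about 8% of experiments, even though 50 is the expected count. Most outcomes land near 50 but not on it.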
Generalizing the Results
- Are the results of this study generalizable to all at-risk patients?
- This set of patients could have specific characteristics, so it may not represent all stroke at-risk patients.
- A Soup Example
- Is an 80% non-random sample “better” than a 5% random sample in measurable terms? 90%? 95%? 99%?
- Which should we trust more: a 1% survey with a 60% response rate or a non-probabilistic dataset covering 80% of the population?
- Think about tasting soup and wanting to know how salty it is.
- Stir it well, and then a few bites are sufficient regardless of the size of the container!
- Stirring corresponds to a randomization process in statistics.
- This example is from a lecture by Meng: See this YouTube video.
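
One way to see why the size of the container barely matters is the standard error of a sample proportion under simple random sampling without replacement. The sketch below uses the textbook formula with a finite population correction. The true proportion of 0.3, the sample size of 1,000, and the population sizes are hypothetical values chosen only for illustration.

```python
from math import sqrt

def standard_error(p, n, N):
    """Standard error of a sample proportion from a simple random sample
    of size n drawn without replacement from a population of size N."""
    fpc = (N - n) / (N - 1)  # finite population correction
    return sqrt(p * (1 - p) / n * fpc)

p = 0.3    # hypothetical true proportion (illustrative assumption)
n = 1_000  # fixed number of "bites", the same for every container

# Same sample size, very different population ("container") sizes.
for N in (10_000, 1_000_000, 100_000_000):
    print(f"N = {N:>11,}  SE = {standard_error(p, n, N):.4f}")
```

With a well-stirred (random) sample, the standard error stays close to 0.014 whether the population has ten thousand or a hundred million members. The formula applies only to random samples, so it says nothing about the accuracy of a large non-random sample.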