Minji Kim
Studying Statistics @ UNC Chapel Hill

STOR 155 Introduction to Data Models and Inference


From Deterministic to Probabilistic Scenarios

  • Modeling problems that have a definite solution versus problems that involve uncertainty and probability.
  • Question 1: Chris and Amy bought a total of 10 books. Chris bought 4 books. How many books did Amy buy?
    1. Set a variable x to formulate the problem.
    2. Solve for x.
  • Question 2: Minji has a perfectly symmetrical die. Roll the die, and let X be the number on top.
    1. What is your guess for X?
    2. Are you 100% certain about your guess in (1)?
    3. If we’re not 100% certain, does that mean we know nothing about X? What can we still say about X?
  • To describe a random variable, we can list its possible outcomes and the probability assigned to each one.
  • What are some other examples of random variables?
    • Is it common to find randomness in data from everyday life?
    • The die example uses a discrete set of values. Think of a random variable that can take any value in a continuous range.
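To contrast the two questions, here is a minimal Python sketch. It assumes a standard six-sided die, and the variable names are illustrative only. The book problem has exactly one answer, while the die can only be described by listing each outcome with its probability (plus summaries such as the long-run average).

```python
from fractions import Fraction

# Deterministic: Chris (4 books) + Amy (x books) = 10 books, so x = 10 - 4.
total_books = 10
chris_books = 4
amy_books = total_books - chris_books  # exactly one possible answer

# Probabilistic: assume a fair six-sided die with six equally likely faces.
# We cannot predict X, but we can list every outcome with its probability.
die_distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all outcomes must add up to 1.
assert sum(die_distribution.values()) == 1

# Long-run average of X (expected value): (1 + 2 + ... + 6) / 6 = 3.5
expected_value = sum(face * p for face, p in die_distribution.items())

print("Amy bought", amy_books, "books")   # Amy bought 6 books
print("P(X = 3) =", die_distribution[3])  # P(X = 3) = 1/6
print("E[X] =", float(expected_value))    # E[X] = 3.5
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 does not depend on floating-point rounding.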

What Do We Learn from STOR 155?

  • Four steps of scientific inquiry:
    1. Identify a question or problem of interest.
      Step up as a researcher!
    2. Collect relevant data.
      Methods for data collection: sampling strategies, observational studies, experiments, and ways to collect reliable data.
    3. Analyze the data.
      Methods for data analysis: summary statistics, regression and correlation, hypothesis testing, and confidence intervals.
    4. Form a conclusion.
      Given a confidence interval or the result of a hypothesis test, what can we conclude about the question of interest?
  • Statistics is the language of science!

Case Study: Using Stents to Prevent Strokes

  • Objective: Evaluate the effectiveness of stents in treating patients at risk of stroke.
  • Research Question: Does the use of stents reduce the risk of stroke?
  • Study Details: The researchers conducted an experiment with 451 at-risk patients, randomly assigning 224 patients to the control group and 227 to the treatment group. The table below shows, for each group, how many patients had a stroke at the 365-day follow-up.

    Group        Stroke: Yes   Stroke: No   Total
    Treatment             28          199     227
    Control               45          179     224
    Total                 73          378     451
  • Proportion with stroke in the treatment group: 28/227 ≈ 12%
  • Proportion with stroke in the control group: 45/224 ≈ 20%
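The two proportions can be checked with a few lines of Python. This is a minimal sketch that only reuses the counts from the table; the dictionary layout and the helper name `stroke_proportion` are illustrative choices, not part of the course material.

```python
# Counts from the stent study table (365-day follow-up).
counts = {
    "treatment": {"stroke": 28, "no_stroke": 199},
    "control": {"stroke": 45, "no_stroke": 179},
}

def stroke_proportion(group):
    """Fraction of patients in `group` who had a stroke."""
    c = counts[group]
    return c["stroke"] / (c["stroke"] + c["no_stroke"])

p_treatment = stroke_proportion("treatment")  # 28 / 227
p_control = stroke_proportion("control")      # 45 / 224

print(f"Treatment: {p_treatment:.3f}")                                   # Treatment: 0.123
print(f"Control:   {p_control:.3f}")                                     # Control:   0.201
print(f"Difference (treatment - control): {p_treatment - p_control:.3f}")  # -0.078
```

The observed difference is about 7.8 percentage points; the next question is whether a gap of that size could plausibly arise from chance alone.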

Understanding the Results

  • Do the data show a “real” difference between the groups?
    • Suppose we have a fair coin and flip it 100 times. Let X represent the number of heads observed.
    • What are the expected numbers of heads and tails? Do we actually observe exactly those numbers in practice?
    • Although each flip has a 50% chance of landing heads, we probably won’t observe exactly 50 heads. This kind of chance fluctuation is part of almost every data-generating process.
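To put a number on "probably won't observe exactly 50 heads", the sketch below uses the standard binomial formula for a fair coin, P(X = k) = C(n, k) · 0.5^n, and then simulates a few 100-flip experiments. The seed value is an arbitrary assumption chosen only to make the run reproducible.

```python
import math
import random

n_flips = 100

# Exact probability of exactly 50 heads in 100 fair flips (binomial formula):
# P(X = 50) = C(100, 50) * 0.5**100
p_exactly_50 = math.comb(n_flips, 50) * 0.5 ** n_flips
print(f"P(X = 50) = {p_exactly_50:.4f}")  # P(X = 50) = 0.0796

# Simulate five repetitions of the 100-flip experiment.
rng = random.Random(155)  # arbitrary fixed seed so the run is reproducible
for trial in range(1, 6):
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    print(f"Trial {trial}: {heads} heads, {n_flips - heads} tails")
```

Even though 50 is the single most likely count, it occurs only about 8% of the time; the simulated trials typically scatter a few heads above or below 50.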

Generalizing the Results

  • Are the results of this study generalizable to all at-risk patients?
    • This set of patients could have specific characteristics, so it may not represent all patients at risk of stroke.
  • A Soup Example
    • Is an 80% non-random sample “better” than a 5% random sample in measurable terms? 90%? 95%? 99%?
    • Which should we trust more: a 1% survey with a 60% response rate or a non-probabilistic dataset covering 80% of the population?
    • Think about tasting soup and wanting to know how salty it is.
      • Stir it well, and a few spoonfuls are enough, regardless of the size of the pot!
      • Stirring corresponds to a randomization process in statistics.
    • This example is from a lecture by Meng (see this YouTube video).
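The soup questions can be explored with a small simulation. This is a hypothetical sketch: the population size, the 50% "yes" rate, and the inclusion probabilities (0.85 for "yes", 0.75 for "no") are made-up assumptions, not numbers from the lecture. The large non-random dataset covers about 80% of the population, but because inclusion depends slightly on the answer, its estimate is biased. The 5% simple random sample plays the role of the well-stirred soup.

```python
import random

rng = random.Random(2024)  # arbitrary fixed seed for reproducibility

# Hypothetical population: 100,000 people, each answering yes (1) or no (0).
N = 100_000
population = [1 if rng.random() < 0.5 else 0 for _ in range(N)]
true_p = sum(population) / N

# Non-random "big data": by assumption, "yes" people are included with
# probability 0.85 and "no" people with probability 0.75.
big_sample = [y for y in population if rng.random() < (0.85 if y == 1 else 0.75)]
coverage = len(big_sample) / N  # roughly 80% of the population
big_estimate = sum(big_sample) / len(big_sample)

# Simple random sample of 5% of the population: the "stirred soup".
srs = rng.sample(population, k=N // 20)
srs_estimate = sum(srs) / len(srs)

print(f"True proportion:         {true_p:.3f}")
print(f"Non-random ({coverage:.0%} coverage): {big_estimate:.3f}")
print(f"5% simple random sample: {srs_estimate:.3f}")
```

Under these assumptions, the non-random estimate is pulled upward to about 0.53 no matter how much of the population it covers, while the much smaller random sample typically lands within a percentage point or two of the true proportion.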