Probability and Statistics for Engineers

Project #1: Probability Distributions for Real-World Populations

This project is the primary assessment mechanism in IE342 for the Probability portion of the course. It is the culmination of our study of Probability Theory over Modules 2-9, which included the following topics:

• sample spaces,

• events,

• additive rules of probability,

• conditional probability,

• Bayes’ Theorem,

• random variables,

• general properties of probability distributions,

• expected value and other mathematical expectation definitions,

• joint probability distributions,

• covariance,

• correlation,

• linear combinations of random variables,

• Chebyshev’s Theorem,

• parameterized discrete probability distributions (specifically the binomial distribution),

• the normal distribution, and

• parameterized asymmetric continuous probability distributions.

After a half-semester focused on these topics, you are now tasked with applying the concepts of Probability Theory to real-world populations.

Project #1 Description

I. Define Random Variables

Pick one or more real-world systems or scenarios in which a population can be modeled with four random variables that have the following characteristics:

• a random variable with a normal distribution

• a continuous random variable with an asymmetric distribution (e.g. exponential or lognormal)

• a random variable with a uniform distribution (discrete or continuous)

• a binomial random variable with n large enough that it can be accurately modeled using a normal approximation

Keep in mind that this is an exercise in modeling, and no model is perfect. Thus, you will need to make assumptions about the nature of the population and how you are quantifying observations to define the random variables. This project does NOT require any real data associated with the population of choice. Instead, assign distribution types and parameters based on your understanding of the system. Data may be used to justify parameter choices, but it is not required. Discuss how probabilistic modeling is useful for the system of choice.

Here are example definitions related to winter weather in Chicago (DO NOT USE THESE DEFINITIONS FOR YOUR PROJECT; THE BLUE FONT IS USED FOR EXAMPLES):

• Normal: daily high temperature during winter in Chicago

• Asymmetric continuous: total daily snowfall, in inches, during winter in Chicago

• Uniform: the day, indexed from the start of winter, with the largest snowfall amount

• Binomial: number of winter days with nonzero snowfall

Within these sample random variable definitions, modeling the day of largest snowfall with a uniform distribution is likely inconsistent with reality in some respects, because larger and smaller snowfall amounts systematically favor certain dates. However, it is okay to assume that a uniform distribution model captures the overall tendencies well.

Some other ideas for real-world systems that would work nicely for the probabilistic modeling in this project include:

• COVID-19 – number of confirmed cases, hospitalizations, deaths, etc. across different states, countries, or worldwide and potentially through time; efficacy of COVID tests, antibody tests, or vaccines; adherence rates for public health guidance, etc.

• Federal, State, and/or Local Elections – voting rates and preferences for various demographic groups or through time.

• Public Health, Poverty, and Social Justice – use U.S. Census or WHO data to study matters related to housing, economic conditions, public health, racial inequalities, etc.

• NASA Exoplanet Exploration – exoplanet radius, star radius, distance to its star, equilibrium temperature, orbit period, habitability for life, number of exoplanets per stellar system, etc.

• Sports – world record marathon times, time between goals in soccer, most common score in basketball, etc.

• Etc. – find something that you are passionate about; there is a great deal of flexibility here, so you should be able to make it work with just about any real-world system or scenario of interest to you.

II. Set Population Parameters

With the four random variables set, choose realistic values for the population parameters. Provide references and justification for all parameter choices.

For the example random variables above, a quick internet search leads to reasonable choices such as:

• mean daily high temperature μ = 35℉ (https://www.currentresults.com/Weather/Illinois/Places/chicago-temperatures-by-month-average.php),

• standard deviation of daily high temperature σ = 10℉ (a reasonable choice based on my experience living in Chicagoland),

• = 0.12 (https://www.currentresults.com/Weather/Illinois/Places/chicago-snowfall-totals-snow-accumulation-averages.php),

• number of binomial trials n = 91 (number of winter days ≈ 365/4),

• etc.

III. Produce Distribution Plots

Define the probability distributions and generate distribution plots for all four random variables. The plot for the binomial random variable should include both the binomial distribution AND the normal curve for the approximation. It is strongly recommended that you use MATLAB for this task, with the tools developed on the HW assignments.
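
As a rough illustration of the binomial-plus-normal overlay, the sketch below tabulates both curves using only the Python standard library (Python stands in here for an "other visualization tool"; the same steps carry over to MATLAB). The value n = 91 comes from the example above, while p = 0.2 is a made-up placeholder. The check that np and n(1 − p) are both at least 5 is a common rule of thumb, not a requirement stated in this assignment.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 91     # number of trials (winter days, from the example above)
p = 0.2    # HYPOTHETICAL success probability, chosen only for illustration


def binomial_pmf(k, n, p):
    """Exact binomial probability P(K = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# The approximating normal curve matches the binomial mean and standard deviation.
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = NormalDist(mu, sigma)

# Common rule of thumb for the approximation (an assumption, see the lead-in).
rule_of_thumb_ok = n * p >= 5 and n * (1 - p) >= 5

ks = list(range(n + 1))
pmf_values = [binomial_pmf(k, n, p) for k in ks]   # heights of the binomial bars
pdf_values = [approx.pdf(k) for k in ks]           # heights of the overlaid curve

largest_gap = max(abs(b - c) for b, c in zip(pmf_values, pdf_values))
print(f"mu = {mu:.2f}, sigma = {sigma:.3f}, rule of thumb met: {rule_of_thumb_ok}")
print(f"largest |pmf - pdf| gap: {largest_gap:.4f}")
```

Plotting `pmf_values` as bars and `pdf_values` as a line against `ks` gives the combined figure; a small `largest_gap` suggests the normal curve tracks the bars closely.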

IV. Define a Joint Probability Distribution

Choose two of the four random variables to produce a joint probability distribution. Consider the dependency between the two random variables. If they can reasonably be assumed to be independent, then provide justification for the assumption, and proceed directly to a joint probability distribution (this is the straightforward approach). Otherwise, build a joint distribution that reasonably models the joint behavior of the two random variables (this is more complicated, but is likely to produce a more accurate model). Again, provide justification for the distribution form applied.
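
Under the independence assumption, the joint density is the product of the two marginal densities. The following is a minimal sketch of that straightforward case, assuming a normal daily high temperature (mean 35, standard deviation 10, as in the example) and an exponential daily snowfall. The snowfall mean of 0.5 inches and all function names are hypothetical placeholders.

```python
from math import exp
from statistics import NormalDist

temperature = NormalDist(mu=35, sigma=10)   # example parameters from Part II
SNOW_MEAN = 0.5                             # HYPOTHETICAL mean daily snowfall (inches)


def snowfall_pdf(s):
    """Exponential density with mean SNOW_MEAN; zero for negative snowfall."""
    return exp(-s / SNOW_MEAN) / SNOW_MEAN if s >= 0 else 0.0


def joint_pdf(t, s):
    """Joint density f(t, s) = f_T(t) * f_S(s); valid only under independence."""
    return temperature.pdf(t) * snowfall_pdf(s)


print(joint_pdf(30.0, 1.0))
```

If the two variables are dependent, this product form no longer applies, and a conditional density would have to replace one of the marginals.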

V. Calculate Meaningful Probabilities and Expected Values

Use the probability distribution definitions to calculate at least two meaningful probabilities for each random variable AND two meaningful joint probabilities. Thus, at least 10 probability calculations are required. Use your plots to help visualize the probability values (i.e. show the areas under the curves).

For example, using the Chicago weather random variable definitions from above, it would be interesting to know the following:

• P(26 < daily high temperature < 35), as that relates to the likelihood of having icy road conditions

• P(daily snowfall > 3), as at least 3 inches of snow are needed for sledding

• P(daily high temperature < 25, daily snowfall > 5), as that gives the likelihood of a cold, snowy day where the snow may stay awhile

• P(number of snow days > 20), as 20 snow days was regular in the 1980s, and winter 2021 had that many
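
The four example probabilities could be evaluated along the following lines. This is only a sketch: the temperature parameters (35, 10) and n = 91 come from the example above, while the exponential snowfall mean (0.5 inches), the snow-day probability (0.2), and the independence of temperature and snowfall are hypothetical assumptions made so that the code runs.

```python
from math import comb, exp, sqrt
from statistics import NormalDist

temperature = NormalDist(mu=35, sigma=10)   # example parameters from Part II
SNOW_MEAN = 0.5    # HYPOTHETICAL mean daily snowfall (inches), exponential model
N_DAYS = 91        # winter days, from Part II
P_SNOW = 0.2       # HYPOTHETICAL probability that a given day has snowfall

# P(26 < temperature < 35): difference of two normal CDF values.
p_icy = temperature.cdf(35) - temperature.cdf(26)

# P(snowfall > 3): exponential survival function exp(-x / mean).
p_sledding = exp(-3 / SNOW_MEAN)

# P(temperature < 25, snowfall > 5): product of marginals, ASSUMING independence.
p_cold_snowy = temperature.cdf(25) * exp(-5 / SNOW_MEAN)

# P(snow days > 20): exact binomial sum over k = 21..n, then the normal
# approximation with a continuity correction, P(K >= 21) ~ P(Z > 20.5).
p_exact = sum(
    comb(N_DAYS, k) * P_SNOW**k * (1 - P_SNOW) ** (N_DAYS - k)
    for k in range(21, N_DAYS + 1)
)
approx = NormalDist(N_DAYS * P_SNOW, sqrt(N_DAYS * P_SNOW * (1 - P_SNOW)))
p_approx = 1 - approx.cdf(20.5)

print(f"P(26 < temp < 35)          = {p_icy:.4f}")
print(f"P(snow > 3)                = {p_sledding:.4f}")
print(f"P(temp < 25, snow > 5)     = {p_cold_snowy:.6f}")
print(f"P(snow days > 20), exact   = {p_exact:.4f}")
print(f"P(snow days > 20), approx. = {p_approx:.4f}")
```

Comparing the exact and approximate values for the binomial case is one way to judge whether the normal approximation is adequate for the chosen n and p.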

Additionally, calculate and report the mean and variance for each random variable.
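
For the distribution families used in this project, the mean and variance follow from standard closed-form expressions. The sketch below collects them for the Chicago example. Except for the normal parameters and n = 91, every numeric value is a hypothetical placeholder, and the uniform case assumes a discrete uniform distribution on days 1 through 91.

```python
MU, SIGMA = 35.0, 10.0   # normal parameters from the example
SNOW_MEAN = 0.5          # HYPOTHETICAL exponential mean (inches)
N_DAYS = 91              # winter days, from the example
P_SNOW = 0.2             # HYPOTHETICAL binomial success probability

# Each entry maps a distribution to its (mean, variance).
moments = {
    "normal": (MU, SIGMA**2),
    "exponential": (SNOW_MEAN, SNOW_MEAN**2),
    "discrete uniform on 1..n": ((N_DAYS + 1) / 2, (N_DAYS**2 - 1) / 12),
    "binomial": (N_DAYS * P_SNOW, N_DAYS * P_SNOW * (1 - P_SNOW)),
}

for name, (mean, variance) in moments.items():
    print(f"{name:26s} mean = {mean:8.3f}  variance = {variance:8.3f}")
```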

Lastly, define a fifth random variable as a meaningful linear combination of two (or more) of the original random variables. A convenient choice would be the two random variables from part IV. Then, calculate and report the mean and variance of the new random variable.

For example, using the Chicago weather random variable definitions, define U = 32 − (daily high temperature) + 5 × (daily snowfall) as a measure of the intensity of winter weather, where colder and snowier days produce larger values of U, with an inch of snowfall having the same impact as a 5 degree Fahrenheit drop in temperature.
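
The mean and variance of such a combination follow from the rules for linear combinations of random variables: E(aX + bY + c) = aE(X) + bE(Y) + c and Var(aX + bY + c) = a²Var(X) + b²Var(Y) + 2ab·Cov(X, Y). Below is a minimal sketch for the example U, where the function name is illustrative, the snowfall mean of 0.5 inches is a hypothetical placeholder, and the covariance defaults to zero (that is, temperature and snowfall are assumed uncorrelated).

```python
def linear_combination_moments(a, mean_x, var_x, b, mean_y, var_y, c=0.0, cov_xy=0.0):
    """Return (mean, variance) of a*X + b*Y + c."""
    mean = a * mean_x + b * mean_y + c
    variance = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy
    return mean, variance


# Temperature: normal with mean 35 and standard deviation 10 (from the example).
# Snowfall: exponential with HYPOTHETICAL mean 0.5, so its variance is 0.5**2.
mean_u, var_u = linear_combination_moments(
    a=-1, mean_x=35.0, var_x=10.0**2,
    b=5, mean_y=0.5, var_y=0.5**2,
    c=32.0,
)
print(f"E(U) = {mean_u}, Var(U) = {var_u}")
```

If the two variables are modeled as dependent in part IV, passing a nonzero `cov_xy` changes the variance but not the mean.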

VI. Discussion of Results and Conclusions

Summarize your calculations and discuss the overall validity of the models. Do the probabilities and expected values match what would be expected for the real-world system? Where do the models work well and where is their accuracy limited, for each random variable? Are the potential issues related to the parameter values, the assumed form of the distribution, or both? Suggest potential improvements to the models. Lastly, discuss how to design a statistical experiment that could be implemented to test the population parameters. How can you ensure a random sample?
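
One simple way to support a random sample is to draw units from a complete sampling frame with a seeded pseudorandom generator, so that every unit has the same chance of selection and the draw can be reproduced. The sketch below is only an illustration: the frame of 30 winters of 91 days each, the sample size of 50, and the seed are all hypothetical.

```python
import random

# HYPOTHETICAL sampling frame: every (winter, day) pair across 30 winters.
frame = [(winter, day) for winter in range(1, 31) for day in range(1, 92)]

rng = random.Random(342)            # fixed seed so the draw can be reproduced
sample = rng.sample(frame, k=50)    # simple random sample without replacement

print(len(frame), len(sample), sample[:3])
```

Other designs (for example, stratifying by month) may suit a given system better; the point here is only that selection is left to the generator rather than to the analyst.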

Deliverables

I. Report (submitted as a group if working with collaborators): written presentation of your work, with supporting figures. The report should include six sections, corresponding to I-VI above, with full descriptions of each distribution definition, plot, and calculated value. Use an easy-to-follow format with proper grammar. Make sure figures are properly labeled, captioned, and referenced in the text. Practice conciseness by finding an optimal balance between rigor and brevity. Submit your MATLAB (or other visualization tool) code with your report.

II. Video Highlight (submitted individually): 2-3 minute video presentation of ONE probabilistic model and ONE associated probability calculation. Approach this as a summary highlight of ONE aspect of your work presented to your boss or a client in a limited timeframe. Your highlight should do the following:

• Introduce the real-world system being studied in your overall work and how probabilistic modeling is useful for that system.

• Define the ONE random variable you have chosen to highlight and justify the distribution type chosen.

• Present the distribution using a visualization (e.g. a distribution plot) and relevant parameters for the chosen random variable, with justification for each.

• Explain all steps of the ONE probability calculation you have chosen to highlight.

• Discuss the significance of the probability value calculated and the relevance of the probabilistic model chosen for the random variable and the real-world system.

Prepare a short (1-3 slides) PowerPoint presentation to organize the summary highlight discussion. Then, record yourself presenting the slides as an individual. Record the video using Panopto, Zoom, or any other method used for Quizzes all semester. If working with a partner or group on this project, each member must choose a different random variable to highlight, so some coordination is required here. Submit both the presentation slides and a link to the video recording.

Project #1 GRADING RUBRIC

All criteria are graded using this scale:

• Excellent [5 points]: Criterion is fully satisfied and/or demonstrates top-level critical thinking

• Very Good [4 points]: Criterion is mostly satisfied and/or demonstrates a high level of critical thinking

• Acceptable [3 points]: Criterion is partially satisfied and/or demonstrates some critical thinking

• Minimal [2 points]: Criterion is not satisfied and/or demonstrates a low level of critical thinking

• Missing [0 points]: Criterion is not addressed or is missing completely

Criteria Score

1. Report – Define RVs: introduction to real-world system or scenario, four RVs clearly defined

2. Report – Population Parameters: realistic values, justification and/or references provided for each

3. Report – Distribution Plots: mathematical definitions for all RV distributions are presented

4. Report – Distribution Plots: visualize all RV distributions, corresponding descriptions highlight key features

5. Report – Joint Distribution: independence discussed with full justification

6. Report – Joint Distribution: mathematical definition for joint distribution with derivation presented

7. Report – Calculated Values: two probabilities for each RV, two joint probabilities, with discussions

8. Report – Calculated Values: means and variances for each RV, with discussions

9. Report – Calculated Values: linear combination of RVs defined, mean and variance reported and discussed

10. Report – Conclusions: validity of models, where they fall short, potential improvements

11. Report – Conclusions: discussion of statistical experiment design to ensure random sampling

12. Report – Organization & Format: easy to follow, six sections, proper grammar, neat style, labeled plots, etc.

13. Report – Submission: report uses clearly readable file format; visualization code (e.g. MATLAB) is included*

14. Video Highlight – System Introduction: benefits of probabilistic modeling for real-world system discussed

15. Video Highlight – Random Variable Definition: clear definition; distribution type justified

16. Video Highlight – Distribution & Parameters: distribution plot shown and discussed; parameters justified

17. Video Highlight – Probability Calculation: all steps justified and explained clearly for ONE probability value

18. Video Highlight – Significance: probability value and probabilistic model relevance for real-world system

19. Video Highlight – Organization & Format: easy to follow slides and verbal explanations, within 2-3 minutes

20. Video Highlight – Submission: accessible video link and/or valid video format; slides included

TOTAL (out of 100):