This is a large sample (n = 500) of the 12,000+ employees in the Dept. of Public Safety. The employee data is publicly available, and we have placed a copy in your M8 Data folder.
File name: DPS_500.xlsx [download file]
There are a number of variables but some are noteworthy in that they can act as explanatory variables:
Possible Research Questions for Texas Employee data:
nc data set
nc_births.csv
A dataset on Delta variant COVID-19 cases in the UK. This dataset gives a great example of Simpson’s Paradox. When results are aggregated without regard to age group, the death rate for vaccinated individuals is higher, but the vaccinated population is at much higher risk. Once we compare populations with more comparable risk (broken out by age group), the vaccinated group tends to have the lower death rate within each age group. The aggregate figure is misleading because many of the higher-risk patients had been vaccinated.
Challenge: Replicate this analysis using R chunks.
simpsons_paradox_covid.csv
Note: this is an interesting dataset, but using it for your PDP will be challenging.
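As a starting point for the challenge, the Simpson’s Paradox pattern described above can be sketched in base R. The counts below are hypothetical and chosen only to make the reversal visible; they are not the values in simpsons_paradox_covid.csv, and the column names (age_group, vaccinated, cases, deaths) are assumptions rather than the file's actual headers.

```r
# Hypothetical counts for illustration only (NOT the real UK data).
toy <- data.frame(
  age_group  = c("under 50", "under 50", "50+", "50+"),
  vaccinated = c("no", "yes", "no", "yes"),
  cases      = c(10000, 2000, 1000, 8000),
  deaths     = c(20, 2, 50, 160)
)

# Death rate within each age group and vaccination status
toy$death_rate <- toy$deaths / toy$cases
print(toy)

# Aggregate death rate by vaccination status, ignoring age group
agg <- aggregate(cbind(cases, deaths) ~ vaccinated, data = toy, FUN = sum)
agg$death_rate <- agg$deaths / agg$cases
print(agg)
```

In this toy table the vaccinated rate is lower inside each age group, yet higher once the age groups are pooled, because most vaccinated cases fall in the higher-risk 50+ group. Replicating the real analysis would mean reading the .csv and applying the same two steps to its actual columns.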
usdata package
county is a data frame with 3142 observations (the counties) and 14 categorical and quantitative variables.
county_complete has all 188 variables [https://openintrostat.github.io/usdata/reference/county_complete.html]
county and county_complete are primarily quantitative and demographic and cover topics such as race, education level, income, uninsured rates, rate of computer use, smartphone use, and broadband.
openintro package
email is a data frame with 3921 observations on 21 variables; email_test has 1252 observations of the same 21 variables.
openintro package
loans_full_schema is the full data set, with 10,000 observations on 55 variables, both categorical and quantitative.
Categorical variables include homeownership, application_type, loan_purpose, state, emp_title.
Quantitative variables include loan_amount, interest_rate, balance, total_credit_limit, annual_income.
loan50 data set: this is a sample from the larger Lending Club loan data set above.
An experiment is designed to study the effectiveness of stents in treating patients at risk of stroke (Chimowitz et al. 2011). Stents are small mesh tubes that are placed inside narrow or weak arteries to assist in patient recovery after cardiac events and reduce the risk of an additional heart attack or death. Many doctors have hoped that there would be similar benefits for patients at risk of stroke. We start by writing the principal question the researchers hope to answer: Does the use of stents reduce the risk of stroke?
The researchers who asked this question conducted an experiment with 451 at-risk patients. Each volunteer patient was randomly assigned to one of two groups:
stent30 and stent365 are found in the openintro package:
• http://openintrostat.github.io/openintro/reference/stent30.html
• stent30: results after 30 days
• stent365: results after 365 days
• Each is a data frame with 451 observations on the following 2 variables.
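One way to begin answering the stent research question is to compare outcome proportions between the two groups. The sketch below assumes the two variables in stent30 are named group and outcome, and that the openintro package is installed. The helper outcome_rates() is a hypothetical name introduced here for illustration and is not part of the package.

```r
# outcome_rates() is a hypothetical helper, not part of openintro.
# It assumes `df` has columns named `group` and `outcome`.
outcome_rates <- function(df) {
  counts <- table(df$group, df$outcome)  # counts per group/outcome pair
  prop.table(counts, margin = 1)         # convert each row to proportions
}

# Run on stent30 only if the openintro package is installed.
if (requireNamespace("openintro", quietly = TRUE)) {
  print(outcome_rates(openintro::stent30))
}
```

Each row of the result sums to 1, so the two groups can be compared directly. A formal comparison would still need an inference step, which this sketch does not attempt.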
This is a list of the data set names, with links to more information and to download each data set in .csv format.

This work was created by Dawn Wright.
It is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
V2.0: 6/29/25 Last Compiled 2025-06-29