This is a large sample (n = 500) of the 12,000+ employees in the Texas Dept. of Public Safety, drawn from publicly available employee data.
File name: DPS_500.xlsx
There are a number of variables but some are noteworthy in that they can act as explanatory variables:
Possible Research Questions for Texas Employee data:
nc data set
nc_births.csv
A dataset on Delta Variant Covid-19 cases in the UK. This dataset gives a great example of Simpson's Paradox. When results are aggregated without regard to age group, the death rate for vaccinated individuals is higher, but the vaccinated population is also at much higher risk. Once we look at populations with more comparable risks (broken out by age group), we see that the vaccinated group tends to have the lower death rate within each age group, and that many of the higher-risk patients had been vaccinated.
Challenge: Replicate this analysis using R chunks.
simpsons_paradox_covid.csv
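The reversal described above can be seen on a small scale before working with the real file. The sketch below uses base R and a toy table of counts that were invented for illustration only; they are not the UK figures. The column names `age_group` and `vaccine_status` are assumptions and may not match the columns in the .csv.

```r
# Hypothetical counts, invented for illustration only; NOT the UK data.
# Column names are assumptions, not taken from simpsons_paradox_covid.csv.
toy <- data.frame(
  age_group      = c("under 50", "under 50", "50 +", "50 +"),
  vaccine_status = c("unvaccinated", "vaccinated", "unvaccinated", "vaccinated"),
  cases          = c(1000, 200, 100, 1000),
  deaths         = c(10, 1, 10, 50)
)

# Death rate within each age group and vaccine status
toy$death_rate <- toy$deaths / toy$cases

# Aggregate over age groups: total cases and deaths by vaccine status
overall <- aggregate(cbind(cases, deaths) ~ vaccine_status, data = toy, FUN = sum)
overall$death_rate <- overall$deaths / overall$cases

print(toy)      # stratified rates: vaccinated is lower in each age group
print(overall)  # aggregated rates: vaccinated is higher overall
```

In this toy table most vaccinated cases fall in the older, higher-risk group, which is what pulls the aggregated vaccinated rate above the unvaccinated one even though it is lower within each age group.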
usdata package
county is a data frame with 3142 observations (the counties) on 14 categorical and quantitative variables. county_complete has all 188 variables [https://openintrostat.github.io/usdata/reference/county_complete.html].
county and county_complete are primarily quantitative and demographic and cover topics such as race, education level, income, uninsured rates, rate of computer use, smartphone use, and broadband.
openintro package
email is a data frame with 3921 observations on 21 variables; email_sent has 1252 observations of the same 21 variables.
openintro package
loans_full_schema - full data set with 10,000 observations on 55 variables, both categorical and quantitative.
Categorical variables include homeownership, application_type, loan_purpose, state, and emp_title.
Quantitative variables include loan_amount, interest_rate, balance, total_credit_limit, and annual_income.
loan50 data set - This is a sample from the larger Lending Club loan data set above.
An experiment is designed to study the effectiveness of stents in treating patients at risk of stroke (Chimowitz et al. 2011). Stents are small mesh tubes that are placed inside narrow or weak arteries to assist in patient recovery after cardiac events and reduce the risk of an additional heart attack or death. Many doctors have hoped that there would be similar benefits for patients at risk of stroke. We start by writing the principal question the researchers hope to answer: Does the use of stents reduce the risk of stroke?
The researchers who asked this question conducted an experiment with 451 at-risk patients. Each volunteer patient was randomly assigned to one of two groups:
stent30 and stent365 are found in the openintro package (http://openintrostat.github.io/openintro/reference/stent30.html).
• stent30: results 30 days after stroke
• stent365: results 365 days after stroke
• Each is a data frame with 451 observations on 2 variables.
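A first look at the principal question could compare the proportion of strokes in each group. The sketch below is a minimal base R version, assuming the two variables are named `group` and `outcome`; the helper `outcome_rates()` is hypothetical, and the toy rows are invented for illustration rather than taken from the study.

```r
# Proportion of each outcome within each group (each row sums to 1).
# The column names `group` and `outcome` are assumptions about the data layout.
outcome_rates <- function(df) {
  prop.table(table(df$group, df$outcome), margin = 1)
}

# Hypothetical toy data, invented for illustration; NOT the study results.
toy_stent <- data.frame(
  group   = rep(c("treatment", "control"), each = 10),
  outcome = c(rep("stroke", 3), rep("no event", 7),
              rep("stroke", 2), rep("no event", 8))
)

rates <- outcome_rates(toy_stent)
print(rates)

# Difference in stroke proportions (treatment minus control)
rate_diff <- rates["treatment", "stroke"] - rates["control", "stroke"]
print(rate_diff)
```

If the openintro package is installed and its columns match these assumed names, the same helper could be applied as `outcome_rates(openintro::stent30)`.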
This is a list of the data set names, with links to more information and to download each data set in .csv format.
This work was created by Dawn Wright.
It is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Last Compiled 2022-01-01