Title: | Data to Accompany Smithson & Merkle, 2013 |
---|---|
Description: | Contains data files to accompany Smithson & Merkle (2013), Generalized Linear Models for Categorical and Continuous Limited Dependent Variables. |
Authors: | Ed Merkle and Michael Smithson |
Maintainer: | Ed Merkle <[email protected]> |
License: | GPL-2 |
Version: | 1.2 |
Built: | 2025-02-14 03:40:00 UTC |
Source: | https://github.com/cran/smdata |
Gaze patterns of four babies in a group.
data("babies")
A data frame with 1180 observations on the following 6 variables.
row
a numeric vector
time
a numeric vector indexing the target baby
id
a numeric vector indexing the observations
gaze
a factor indicating whether a baby was looked at, with levels no and yes
babies
a factor indexing which baby was chosen to be looked at, with levels baby1, baby2, baby3, and baby4
lookedat
a numeric vector registering whether gaze was initiated by the target baby, with 0 indicating “no” and 1 indicating “yes”
These are hypothetical data.
data("babies", package="smdata")
Replication of the car salesperson problem in See, Fox, and Rottenstreich (2006)
data("carsales")
A data frame with 155 observations on the following 4 variables.
initial
a numeric vector taking the value 0 for the Car condition and 1 for the Salesperson condition
prob
a numeric vector recording the respondent's probability estimate that the car was purchased from Carlos
NFCC
a numeric vector recording respondents' scores on the Need for Certainty and Closure scale
ctrNFCC
a numeric vector that is NFCC standardized to have a mean of 0 and standard deviation of 1
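The standardization described for ctrNFCC can be sketched as follows. This is a minimal illustration on made-up scores, assuming the sample standard deviation was used; the vector `nfcc` and its values are hypothetical and are not taken from the dataset.

```r
# Hypothetical NFCC scores (not from carsales); the real values are in carsales$NFCC.
nfcc <- c(2.5, 3.0, 3.5, 4.0, 4.5)

# Subtract the mean and divide by the sample standard deviation.
ctr_nfcc <- (nfcc - mean(nfcc)) / sd(nfcc)

# scale() performs the same computation; it returns a one-column matrix,
# so as.numeric() is used to drop the matrix structure.
ctr_nfcc_alt <- as.numeric(scale(nfcc))
```

Both versions yield a vector with mean 0 and standard deviation 1.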
Data provided by Gurr, M. (2009).
Gurr, M. (2009). Partition dependence: Investigating the principle of insufficient reason, uncertainty and dispositional predictors. (Unpublished Honours thesis: The Australian National University, Canberra, Australia)
See, K. E., Fox, C. R., & Rottenstreich, Y. S. (2006). Between ignorance and truth: Partition dependence and learning in judgment under uncertainty. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 1385-1402.
data("carsales", package="smdata")
Data from the 1991-1994 Drug Abuse Treatment Outcome Study on cocaine usage patterns.
data("cocaine")
A data frame with 7592 observations on the following 2 variables.
sex
a factor with levels female and male
mode
a factor recording self-reported method of cocaine ingestion, with levels crack, freebase, inhale, and inject
The data are extracted from the 1991-1994 Drug Abuse Treatment Outcome Study (2010) data (DATOS).
United States Department of Health and Human Services and National Institute of Health and National Institute on Drug Abuse (2010). Drug Abuse Treatment Outcome Study, 1991-1994. Ann Arbor, MI: Inter-university Consortium for Political and Social Research.
data("cocaine", package="smdata")
Data from the 1991-1994 Drug Abuse Treatment Outcome Study on cocaine usage patterns.
data("cocaineplus")
A data frame with 7592 observations on the following 8 variables.
sexsrt
a factor with levels FEMALE and MALE
age
a numeric vector
mstatstr
a factor with levels BLANK, DIVORCED, LIVINGASMARRIED, MARRIED, NEVERMARRIED, SEPARATED, and WIDOWED
modestr
a factor with levels crack, freebase, inhale, and inject
racestr
a factor with levels AfroAmerican, Caucasian, Hispanic, and Other
sex
a numeric vector that takes the value 1 if male and 0 if female
mode
a numeric vector that takes the value 1 if the cocaine usage method is crack, 2 if freebase, 3 if inhale, and 4 if inject
race
a numeric vector that takes the value 1 if AfroAmerican, 2 if Caucasian, 3 if Hispanic, and 4 if Other
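The correspondence between the numeric codes and the labelled factors can be illustrated as below. This is a sketch on hypothetical values, not the code used to build the dataset; only the code-to-label mappings come from the variable descriptions above.

```r
# Hypothetical numeric codes, following the coding described for `mode`.
mode <- c(1, 4, 2, 3, 1)

# Attach the documented labels to codes 1 through 4.
modestr <- factor(mode, levels = 1:4,
                  labels = c("crack", "freebase", "inhale", "inject"))

# Going the other way: derive the 0/1 `sex` indicator from the labelled factor.
sexsrt <- factor(c("MALE", "FEMALE", "MALE"))
sex <- as.integer(sexsrt == "MALE")
```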
The data were extracted from the 1991-1994 Drug Abuse Treatment Outcome Study (2010) data (DATOS).
United States Department of Health and Human Services and National Institute of Health and National Institute on Drug Abuse (2010). Drug Abuse Treatment Outcome Study, 1991-1994. Ann Arbor, MI: Inter-university Consortium for Political and Social Research.
data("cocaineplus", package="smdata")
Depression, Anxiety, and Stress Scale Data.
data("dass")
A data frame with 166 observations on the following 3 variables.
depress
a numeric vector measuring depression, scored from 0 to 20
anxiety
a numeric vector measuring anxiety, scored from 0 to 20
stress
a numeric vector measuring stress, scored from 0 to 20
Data from a pilot study by Michael Smithson.
Lovibond, P. F., & Lovibond, S. H. (1995). The structure of negative emotional states: Comparison of the Depression Anxiety Stress Scales with the Beck Depression and Anxiety Inventories. Behaviour Research and Therapy, 33, 335-343.
data("dass", package="smdata")
Reading scores and nonverbal IQ scores for gender- and age-matched dyslexic and non-dyslexic readers.
data("dyslexic3")
A data frame with 44 observations on the following 3 variables.
score
a numeric vector recording children's scores on a reading accuracy test
dys
a numeric vector taking the value 1 if dyslexic and 0 if not
ziq
a numeric vector recording children's nonverbal IQ scores, standardized to have a mean of 0 and standard deviation of 1
The reading accuracy scores have a maximum of 1, indicating a perfect score on the test. In the Example 6.2 analysis, scores of 1 are recoded to .99, whereas in the 1-inflated model in Ch. 6 and the censored regression model in Ch. 7 they keep the value 1.
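The recoding of perfect scores mentioned above can be sketched as follows. The scores here are hypothetical stand-ins for dyslexic3$score, and the object names are illustrative only.

```r
# Hypothetical reading accuracy scores; 1 marks a perfect score.
score <- c(0.46, 0.78, 1, 0.93, 1)

# Move perfect scores to .99 so that every value lies strictly inside (0, 1),
# as a beta regression requires.
score_beta <- ifelse(score == 1, 0.99, score)

# For the 1-inflated and censored models the original values are kept;
# an indicator of which cases sit at the upper bound may be useful there.
at_ceiling <- as.integer(score == 1)
```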
Data from Pammer and Kevan (2007), first analyzed in Smithson and Verkuilen (2006).
Pammer, K., & Kevan, A. (2007). The contribution of visual sensitivity, phonological processing, and nonverbal IQ to children's reading. Scientific Studies of Reading, 11, 33-53.
Smithson, M. J., & Verkuilen, J. (2006). A better lemon squeezer? Maximum likelihood regression with beta-distributed dependent variables. Psychological Methods, 11, 54-71.
data("dyslexic3", package="smdata")
Data from the U.S. General Social Surveys on marital status (ordinal; see details) and email usage.
data("email")
A data frame with 3967 observations on the following 3 variables.
marital
Marital status, an ordered factor with levels never.married < married < divorced.
email.hrs
Reported weekly hours spent emailing.
z.email
Standardized version of email.hrs.
In creation of this dataset, an additional GSS item (DIVORCE) was used to ensure that married people in the sample had not been previously divorced or widowed. Thus, the marital status variable in this dataset is truly ordinal, as individuals can only progress through the statuses in one order.
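An ordered factor with the level ordering described above can be constructed and compared as in the following sketch. The three responses are hypothetical; only the level names and their order come from this help page.

```r
# Hypothetical responses using the documented level names.
marital <- factor(c("married", "never.married", "divorced"),
                  levels = c("never.married", "married", "divorced"),
                  ordered = TRUE)

# Ordered factors support comparison operators that respect the level order.
before_divorce <- marital < "divorced"

# The underlying integer codes follow the same order (1 = never.married).
codes <- as.integer(marital)
```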
The Survey Documentation and Analysis system hosted at UC, Berkeley: http://sda.berkeley.edu/GSS/.
Smith, T. W., Marsden, P. V., Hout, M., & Kim, J. (2011). General Social Surveys, 1972 - 2010. Principal Investigator, Tom W. Smith; Co-Principal-Investigators, Peter V. Marsden and Michael Hout, NORC ed. Chicago: National Opinion Research Center, producer, 2005; Storrs, CT: The Roper Center for Public Opinion Research, University of Connecticut, distributor. 1 data file (55,087 logical records) and 1 codebook (3,610 pp).
data("email", package="smdata")
Euthanasia scale and Christian identification scale data.
data("euthan")
A data frame with 351 observations on the following 3 variables.
mident
a numeric vector measuring the degree to which respondents identify themselves as Christian, on a scale from 0 to 1
teuth
a numeric vector measuring the degree to which respondents favor euthanasia, on a scale from 0 to 1
status
a numeric vector taking the value 0 if the observation is censored and 1 if not
Data obtained from Mavor's (2004) study.
Mavor, K. (2004). Religious orientation, social identity and attitudes to homosexuality. Unpublished doctoral dissertation, School of Psychology, The Australian National University, Canberra, A.C.T., Australia.
data("euthan", package="smdata")
Grades achieved by second-year psychology students at The Australian National University in an introductory research methods course and the percentage marks they received in the laboratory component of that course.
data("exam")
A data frame with 154 observations on the following 3 variables.
Labs
a numeric vector recording the percentage mark for the laboratory component of the course
Final
a numeric vector recording the percentage mark for the final exam
cens
a numeric vector taking the value 100 to indicate the value of censored observations
Data obtained from Michael Smithson.
data("exam", package="smdata")
Choice and confidence data from a study of financial knowledge involving U.S. undergraduates.
data("finance")
A data frame with 4230 observations on the following 11 variables.
sub
Participant number.
jmeth
Experimental condition, with levels 1cd, 2ci, and 3ei (see details).
item
Item number.
easyfoil
Equals 1 if the foil (incorrect alternative) was easy, 0 if the foil was hard (see details).
targtop
Equals 1 if the correct alternative was the first one displayed (on top), 0 otherwise.
cho
Participant's choice (equals 1 for the first alternative, 0 for the second alternative).
corr
Participant's accuracy (essentially targtop==cho).
iproba
For conditions 2ci and 3ei, the participant's confidence in the first alternative.
iprobb
For conditions 2ci and 3ei, the participant's confidence in the second alternative.
probc
The participant's confidence in his/her choice (see details).
nchorev
The number of choice revisions that the participant made.
The data come from Study 2 of Sieck, Merkle, and Van Zandt (2007). Experimental participants completed a 30-item, 2-alternative test of financial knowledge. For each item, the participant first chose an alternative and then made a confidence judgment.
The confidence elicitation method varied across three between-subjects conditions. For condition 1cd, participants reported confidence in their chosen alternative on a scale from 50% to 100%. For conditions 2ci and 3ei, participants reported independent confidence judgments for each alternative on scales from 0% to 100%. These independent confidence judgments are contained in iproba and iprobb. In these conditions, probc is obtained by normalizing confidence in the chosen alternative by the sum of independent judgments. In addition to reporting independent confidence judgments in condition 3ei, participants wrote an explanation in response to the question "Why is this option true?" prior to reporting each confidence judgment.
For each item, the incorrect alternative was manipulated to sometimes be easy (easyfoil==1) and sometimes be difficult (easyfoil==0). Foil difficulty was defined by the accuracy of an independent group of students on a four-alternative version of the financial knowledge test; see Sieck et al. for more detail.
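The normalization behind probc and the definition of corr can be sketched on hypothetical values as follows. The numbers are made up, and the scale on which the package stores probc (proportion versus percentage) is not assumed here; the sketch simply returns a proportion.

```r
# Hypothetical independent confidence judgments (percent) for three trials.
iproba  <- c(80, 30, 50)   # confidence in the first alternative
iprobb  <- c(40, 60, 50)   # confidence in the second alternative
cho     <- c(1, 0, 1)      # 1 = first alternative chosen, 0 = second
targtop <- c(1, 1, 0)      # 1 = correct alternative was displayed first

# Confidence in whichever alternative was chosen.
chosen_conf <- ifelse(cho == 1, iproba, iprobb)

# Normalize by the sum of the two independent judgments.
# Note: a trial with iproba + iprobb == 0 would give NaN.
probc <- chosen_conf / (iproba + iprobb)

# Accuracy: the choice is correct when it matches the target position.
corr <- as.integer(targtop == cho)
```

With these inputs the normalized confidences are 2/3, 2/3, and 1/2, and only the first trial is scored as correct.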
Provided by Ed Merkle.
Sieck, W.R., Merkle, E.C., & Van Zandt, T. (2007). Option fixation: A cognitive contributor to overconfidence. Organizational Behavior and Human Decision Processes, 103, 68-83.
data("finance", package="smdata")
Summary eyetracking data from a study examining the impact of text saliency on eye movements.
data("fixations")
A data frame with 48 observations on the following 6 variables.
id
Participant ID label.
condition
Condition, signifying whether a channel had a red title (see details).
countleft
Count of fixations in the middle, left channel.
countright
Count of fixations in the middle, right channel.
gazetime
Total gaze time on the webpage.
rt.cond
Equals red if the middle, right channel title was red; black otherwise.
The data are taken from Owens, Shrestha, & Chaparro (2009). A webpage was divided into 9 channels (sections), and the title colors of the "middle, left" and "middle, right" channels were manipulated.
The variable condition takes the value Control if all title colors were black; Left if the "middle, left" channel title was red; and Right if the "middle, right" channel title was red.
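The relationship between condition and rt.cond described above amounts to a simple recode. The sketch below uses hypothetical condition values and is not the code used to construct the dataset.

```r
# Hypothetical condition labels for four participants.
condition <- c("Control", "Left", "Right", "Right")

# rt.cond is "red" only when the middle, right channel title was red,
# which is the Right condition; otherwise it is "black".
rt.cond <- ifelse(condition == "Right", "red", "black")
```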
Provided by Justin W. Owens.
Owens, J.W., Shrestha, S., & Chaparro, B.S. (2009). Effects of text saliency on eye movements while browsing a web portal. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting (Vol. 53, pp. 1257-1261).
data("fixations", package="smdata")
Lab percentage mark, letter grade, lower and upper grade thresholds, a censored variable value, and the final percentage course mark.
data("grades")
A data frame with 165 observations on the following 6 variables.
lab
a numeric vector recording the percentage mark for the laboratory component of the course
gradecat
a factor denoting the letter grade for the course, with levels CR, D, HD, N, and P
lower
a numeric vector denoting the lower threshold for the corresponding letter grade
upper
a numeric vector denoting the upper threshold for the corresponding letter grade
cens
a numeric vector listing the censoring value of a mark, 3
finmark
a numeric vector recording the final percentage mark for the course
Data obtained from Michael Smithson.
data("grades", package="smdata")
Judged probabilities of guilt in a criminal trial scenario (Study 1).
data("guilt1")
A data frame with 104 observations on the following 7 variables.
observ
a numeric vector indexing cases
crguilt
a numeric vector recording the judged probability of guilt in a criminal trial scenario
cigult
a numeric vector recording the judged probability of guilt in a civil trial scenario
crvd1
a numeric vector taking the value 1 if the respondent returned a “guilty” verdict in the criminal trial and 0 otherwise
crvd2
a numeric vector taking the value 1 if the respondent returned a “not guilty” verdict in the criminal trial and 0 otherwise
civd1
a numeric vector taking the value 1 if the respondent returned a “guilty” verdict in the civil trial and 0 otherwise
civd2
a numeric vector taking the value 1 if the respondent returned a “not guilty” verdict in the civil trial and 0 otherwise
Data from Study 1 of Smithson, Deady and Gracik (2007).
Smithson, M., Gracik, L., & Deady, S. (2007). Guilty, not guilty, or ...? Multiple verdict options in jury verdict choices. Journal of Behavioral Decision Making, 20, 481-498.
data("guilt1", package="smdata")
Judged probabilities of guilt in a criminal trial scenario (Study 3).
data("guilt3")
A data frame with 96 observations on the following 3 variables.
pguilt
a numeric vector recording the judged probability of guilt in a criminal trial scenario
v1
a numeric vector taking the value 1 if the respondent returned a “guilty” verdict in the criminal trial and 0 otherwise
v2
a numeric vector taking the value 1 if the respondent returned a “not guilty” verdict in the criminal trial and 0 otherwise
Data from Study 3 of Smithson, Deady and Gracik (2007).
Smithson, M., Gracik, L., & Deady, S. (2007). Guilty, not guilty, or ...? Multiple verdict options in jury verdict choices. Journal of Behavioral Decision Making, 20, 481-498.
data("guilt3", package="smdata")
Lower and upper probability estimates provided by the Budescu et al. (2009) respondents in their interpretations of the phrase “very likely” in an IPCC report statement, along with dummy variables indicating the experimental condition.
data("intervalbeta")
A data frame with 220 observations on the following 5 variables.
t
a numeric vector taking the value 1 if the respondent is in the Translation condition, and 0 otherwise
n
a numeric vector taking the value 1 if the respondent is in the Narrow condition, and 0 otherwise
w
a numeric vector taking the value 1 if the respondent is in the Wide condition, and 0 otherwise
y1
a numeric vector recording the respondent's lower probability estimate
y2
a numeric vector recording the respondent's upper probability estimate
Data provided by D. V. Budescu from the Budescu et al. (2009) study.
Budescu, D.V., Broomell, S., & Por, H.-H. (2009). Improving the communication of uncertainty in the reports of the Intergovernmental Panel on Climate Change. Psychological Science, 20, 299-308.
data("intervalbeta", package="smdata")
Frequency with which respondents correctly identified 0, 1, 2, 3, or 4 letters (in correct versus incorrect order) of a word or non-word based on a cue.
data("phono")
A data frame with 16 observations on the following 3 variables.
treeid
a numeric vector, a tree identification code needed by the R package for estimating MPT models
resp
a factor denoting whether a respondent correctly identified 0, 1, 2, 3, or 4 letters, with levels 0L, 1L, 2L, 3L, 4LCO, and 4LIO; CO denotes that the 4 letters were in the correct order and IO that they were not
fr
a numeric vector recording the frequency of each response type
These data are extracted from Maris (2002), Figure 7, p. 1421.
Maris, E. (2002). The role of orthographic and phonological codes in the word and the pseudoword superiority effect: An analysis by means of multinomial processing tree models. Journal of Experimental Psychology: Human Perception and Performance, 28, 1409-1431.
data("phono", package="smdata")
Response times for a task that timed out at 1200 ms, along with a prime indicator (respondents were primed to use either intuition or deliberation in the task).
data("rtime")
A data frame with 300 observations on the following 3 variables.
RT
a numeric vector, response time in milliseconds
prime
a numeric vector taking the value 0 if primed to use intuition or 1 if primed to use deliberation
status
a numeric vector taking the value 0 if the observation is censored and 1 if not
These are hypothetical data.
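The link between the 1200 ms time-out and the status indicator can be sketched as below. This assumes that timed-out trials are recorded at the time-out value, which this help page does not state explicitly; `latent_rt` and its values are hypothetical.

```r
# Time-out described for the task.
timeout_ms <- 1200

# Hypothetical uncensored response times (ms) that would have been observed
# had there been no time-out.
latent_rt <- c(640, 1510, 980, 1200, 2040)

# Assumption: observed RTs are capped at the time-out.
RT <- pmin(latent_rt, timeout_ms)

# status is 1 for fully observed responses and 0 for censored ones.
status <- as.integer(latent_rt < timeout_ms)
```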
data("rtime", package="smdata")
Data from the U.S. National Survey on Drug Use and Health on the frequency with which individuals skip school and other covariates.
data("skipping")
A data frame with 252 observations on the following 6 variables.
income
Reported household income, where 1 means < $20k; 2 means >= $20k and < $50k; 3 means >= $50k and < $75k; 4 means >= $75k.
irsex
Gender; 1 is male and 2 is female.
educatn2
Grade in school (see details).
schdskip
Reported number of school days skipped out of the past 30.
wrkhrsw2
Reported number of hours worked in the past week.
anyskip
A binary version of schdskip, signifying whether the respondent skipped any days of school out of the past 30.
Variable names match those from the National Survey on Drug Use and Health, so more details can be obtained from the survey codebook. Missing data codes have been changed to NA. Additionally, educatn2 has been recoded to generally match the actual grade in which the respondent is enrolled. The only exceptions are that 14 means the second or third year in college, and 15 means the fourth or higher year in college.
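The binary anyskip variable can be derived from schdskip as in this sketch. The values are hypothetical, and the 0/1 integer coding is an assumption about how the binary version is stored.

```r
# Hypothetical numbers of school days skipped in the past 30 days.
schdskip <- c(0, 3, NA, 1, 0)

# 1 if any day was skipped, 0 if none; missing values stay NA.
anyskip <- as.integer(schdskip > 0)
```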
Obtained from the Inter-University Consortium for Political and Social Research, University of Michigan, http://www.icpsr.umich.edu.
United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Center for Behavioral Health Statistics and Quality. National Survey on Drug Use and Health, 2010. ICPSR32722-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011-12-05. doi:10.3886/ICPSR32722.v1
data("skipping", package="smdata")
Choice of transportation mode by gender.
data("trchoice")
A data frame with 10 observations on the following 4 variables.
treeid
a numeric vector needed for identifying a tree in the MPT algorithm
sex
a numeric vector taking the value 1 if male and 0 if female
resp
a factor denoting the transport mode choice, where D denotes driving one's own vehicle, F denotes getting a ride with a friend, O denotes other, P denotes using public transport, and W denotes walking
fr
a numeric vector recording the frequency with which each transport mode is chosen
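Frequency data laid out this way (one row per combination of sex and response) can be turned into a contingency table with xtabs(). The data frame below is a hypothetical stand-in with made-up frequencies and an assumed treeid assignment; it only mirrors the column layout described above.

```r
# Hypothetical frequency data with the same columns as trchoice.
trchoice_demo <- data.frame(
  treeid = rep(1:2, each = 5),               # assumed: one tree per sex
  sex    = rep(c(0, 1), each = 5),           # 0 = female, 1 = male
  resp   = factor(rep(c("D", "F", "O", "P", "W"), times = 2)),
  fr     = c(10, 5, 2, 8, 3, 12, 4, 1, 6, 2) # made-up counts
)

# Cross-tabulate the frequencies: rows are sex, columns are transport mode.
tab <- xtabs(fr ~ sex + resp, data = trchoice_demo)

# Row-wise proportions give the choice distribution within each sex.
props <- prop.table(tab, margin = 1)
```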
The data are extracted from the 1991-1994 Drug Abuse Treatment Outcome Study (2010) data (DATOS).
United States Department of Health and Human Services and National Institute of Health and National Institute on Drug Abuse (2010). Drug Abuse Treatment Outcome Study, 1991-1994. Ann Arbor, MI: Inter-university Consortium for Political and Social Research.
data("trchoice", package="smdata")
Experimental data in which participants were presented with statistical information about two treatments for chest pain, then asked about their preference for a treatment.
data("treatment")
A data frame with 235 observations on the following 4 variables.
cond
Condition, referring to the way that statistical information was presented (see details).
choice
Treatment preference on an ordinal, 6-level scale from "definitely angioplasty" to "definitely bypass".
effectiveness
Participant ratings of the importance of treatment effectiveness on treatment choice (1 is extremely unimportant; 6 is extremely important).
invasiveness
Participant ratings of the importance of treatment invasiveness on treatment choice (1 is extremely unimportant; 6 is extremely important).
The data were taken from Hulsey (2010). Study participants were asked to make a hypothetical decision between two treatments for chest pain: bypass surgery or balloon angioplasty. Bypass is generally more effective, but it is also more invasive and has a longer recovery time.
Conditions were defined by the way participants received statistical information concerning the two treatments. In condition pictograph, participants viewed visual information via a pictograph. In condition statistics, participants viewed numerical information.
Provided by Lukas Hulsey.
Hulsey, L. (2010). Testimonials and statistics in patient medical decision aids. Unpublished master's thesis, Wichita State University.
data("treatment", package="smdata")
Choice of transportation mode by gender, in long format so that each choice occupies 5 rows.
data("trlong")
A data frame with 31680 observations on the following 6 variables.
obs
a numeric vector
case
a numeric vector
sex
a numeric vector taking the value 1 if male and 0 if female
resp
a factor indicating the transport mode, where B denotes taking the bus, D denotes driving one's own vehicle, F denotes getting a ride with a friend, O denotes other, and W denotes walking
chosen
a numeric vector taking the value 1 if the transport mode was chosen and 0 if not
pubpriv
a numeric vector that takes a value of 1 if the transportation mode is private and 0 if it is public
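The long layout, in which each choice occupies 5 rows (one per transport mode), can be sketched as follows; with 31680 rows this implies 6336 choices. The two cases below are hypothetical, and pubpriv is left out because the help page does not say which modes count as private.

```r
# The five transport modes listed for resp.
modes <- c("B", "D", "F", "O", "W")

# Hypothetical one-row-per-person choices.
choices <- data.frame(case = 1:2, sex = c(1, 0), resp = c("D", "W"))

# Repeat every case once per mode, so each choice occupies 5 rows.
long <- data.frame(
  case = rep(choices$case, each = length(modes)),
  sex  = rep(choices$sex,  each = length(modes)),
  resp = rep(modes, times = nrow(choices))
)

# chosen flags the single row per case whose mode matches the actual choice.
long$chosen <- as.integer(long$resp == rep(choices$resp, each = length(modes)))

# obs simply indexes the rows of the long data frame.
long$obs <- seq_len(nrow(long))
```

This long format is the layout typically expected by conditional logit style models, where each alternative is a row and chosen is the response.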
The data are extracted from the 1991-1994 Drug Abuse Treatment Outcome Study (2010) data (DATOS).
United States Department of Health and Human Services and National Institute of Health and National Institute on Drug Abuse (2010). Drug Abuse Treatment Outcome Study, 1991-1994. Ann Arbor, MI: Inter-university Consortium for Political and Social Research.
data("trlong", package="smdata")
Data from the U.S. National Survey on Drug Use and Health on the frequency with which individuals miss work due to mental health issues and other covariates.
data("workdays")
A data frame with 777 observations on the following 8 variables.
cigtry
Reported age that the respondent first smoked a cigarette.
impydays
Reported days in the past year the respondent was unable to work due to mental health (see details).
age2
Respondent age (see details).
service
Has the respondent been in the U.S. Armed Forces? (1=yes, 0=no)
health
Rating of overall health, where 1 is excellent and 5 is poor.
movespy2
Number of times the respondent moved in the past 12 months.
schenrl
Whether the respondent is enrolled in any school (1=yes, 0=no).
coutyp2
Type of county in which the respondent resides: large metro (large), small metro (small), nonmetro (nonmetro).
Variable names match those from the National Survey on Drug Use and Health, so more details can be obtained from the survey codebook. Missing data codes have been changed to NA. Additionally, age2 is coded so that 7 means 18 years of age, 8 means 19 years of age, ..., 11 means 22 or 23 years of age, 12 means 24 or 25 years, 13 means 26-29, 14 means 30-34, 15 means 35-49, 16 means 50-64, and 17 means 65 and over.
The variable impydays contains responses to the question "About how many days out of 365 in the past 12 months were you totally unable to work or carry out your normal activities because of your emotions, nerves, or mental health?"
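The age2 codes can be translated into readable age groups with a named lookup vector, as in this sketch. Only the codes spelled out in the details above are mapped; codes 9 and 10 fall in the elided part of the list and are deliberately left unmapped here. The label strings and the example values are hypothetical.

```r
# Lookup table for the age2 codes that are stated explicitly above.
age2_labels <- c("7" = "18", "8" = "19", "11" = "22-23", "12" = "24-25",
                 "13" = "26-29", "14" = "30-34", "15" = "35-49",
                 "16" = "50-64", "17" = "65+")

# Hypothetical age2 values, including a missing value and an unmapped code.
age2 <- c(7, 12, 17, NA, 9)

# Index the lookup table by code; missing and unmapped codes become NA.
age_group <- unname(age2_labels[as.character(age2)])
```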
Obtained from the Inter-University Consortium for Political and Social Research, University of Michigan, http://www.icpsr.umich.edu.
United States Department of Health and Human Services. Substance Abuse and Mental Health Services Administration. Center for Behavioral Health Statistics and Quality. National Survey on Drug Use and Health, 2010. ICPSR32722-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011-12-05. doi:10.3886/ICPSR32722.v1
data("workdays", package="smdata")