Interactive demos
Ideas you can poke at
The models behind my research are easier to understand by moving them than by reading about them. These small demos run entirely in your browser: play a detection game and measure your own decision criterion, then explore signal detection, error management, Bayesian updating, ordinal scales, partial pooling, and the sampling designs that make or break a study. Built for students, free to reuse in teaching.
- The detection game
- Sensitivity & bias
- Ordinal scales
- Error management
- Bayesian updating
- Partial pooling
- Sampling consent
Game · two minutes
Measure your own criterion
Each display below contains either 20 dots or 25 dots, shown for less than half a second. Your job is to call it: high or low. There are 20 displays and no feedback along the way. At the end you get your own sensitivity (d′) and criterion (c), the same two numbers I estimate for participants in my studies.
Fix your eyes on the cross. The dots flash briefly, then the buttons unlock: click, or press H and L. Go with your gut when unsure: that is precisely the behaviour the criterion measures. No data leave your browser.
A liberal criterion here means you said “high” whenever the display felt ambiguous. In my research the same logic describes who reads interest into ambiguous social cues. Twenty trials give a noisy estimate, which is itself a useful lesson: single-person SDT estimates are unstable, and that is why my tutorials use multilevel models.
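For readers who want to see the arithmetic, here is a minimal sketch of how d′ and c can be computed from one round of the game. It assumes the standard equal-variance Gaussian model, treats "high" as the signal response, and applies a log-linear correction so that perfect scores stay finite. The function name, the correction, and the example counts are illustrative assumptions, not a description of the demo's own code.

```python
from statistics import NormalDist


def sdt_estimates(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) under the equal-variance Gaussian model."""
    z = NormalDist().inv_cdf
    # Log-linear correction (an assumption): add 0.5 to each count so that
    # hit or false-alarm rates of exactly 0 or 1 do not give infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical round: 10 "high" displays and 10 "low" displays.
d_prime, criterion = sdt_estimates(hits=8, misses=2,
                                   false_alarms=4, correct_rejections=6)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

A negative c in this sketch corresponds to the liberal pattern described above: saying "high" when unsure.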
Explore · signal detection theory
Sensitivity and bias, untangled
Two different things determine detection judgments: how well the signal separates from the noise (d′) and where you place your decision line (c). Raw accuracy mixes them together; signal detection theory pulls them apart. Move the sliders and watch the four outcomes and the point on the ROC curve.
Sensitivity (d′): how far the signal distribution sits from the noise.
Criterion (c): negative = liberal (says “signal” easily); positive = conservative.
Shaded tails right of the line: false alarms (grey-blue) and hits (orange).
ROC: every criterion is one point on the curve set by d′.
Try this: set d′ to 1 and c to −0.8, note the hit rate, then set d′ to 2.5 and c to −0.05. Under the equal-variance model the hit rate is Φ(d′/2 − c), so both settings give a hit rate of about 90%. Two very different observers can produce the same hit rate, one through sensitivity and one through a lenient threshold. That distinction is invisible if you only report accuracy, and it is the reason my work models the two parameters separately.
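The same comparison can be checked numerically. The sketch below assumes the equal-variance Gaussian model with noise centred at −d′/2 and signal at +d′/2, so the hit rate is Φ(d′/2 − c) and the false-alarm rate is Φ(−d′/2 − c). It takes a lenient observer with d′ = 1, then solves for the criterion at which a sharper observer with d′ = 2.5 reaches the same hit rate. The function names are illustrative.

```python
from statistics import NormalDist

PHI = NormalDist().cdf
Z = NormalDist().inv_cdf


def rates(d_prime, c):
    """Hit and false-alarm rates with noise at -d'/2 and signal at +d'/2."""
    hit_rate = PHI(d_prime / 2 - c)
    fa_rate = PHI(-d_prime / 2 - c)
    return hit_rate, fa_rate


def criterion_for_hit_rate(d_prime, hit_rate):
    """Criterion that yields the requested hit rate at a given d'."""
    return d_prime / 2 - Z(hit_rate)


lenient_hit, lenient_fa = rates(d_prime=1.0, c=-0.8)
matched_c = criterion_for_hit_rate(d_prime=2.5, hit_rate=lenient_hit)
sharp_hit, sharp_fa = rates(d_prime=2.5, c=matched_c)

print(f"lenient observer: hits {lenient_hit:.3f}, false alarms {lenient_fa:.3f}")
print(f"sharp observer:   hits {sharp_hit:.3f}, false alarms {sharp_fa:.3f}")
print(f"criterion needed at d' = 2.5: {matched_c:.2f}")
```

The hit rates match while the false-alarm rates differ sharply, which is the information a single accuracy figure throws away.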
Explore · ordinal data
What a 1-to-5 rating hides
Suppose observers rate perceived interest on a five-point scale. Ordinal signal detection theory says they carve a continuous impression into categories using four thresholds. The same underlying separation (d′) can produce very different-looking ratings depending on where those thresholds sit. Move the scale around and watch the rating distributions and the apparent effect size change while d′ stays put.
Negative = a generous rater (high ratings come easily). Positive = stingy.
Small = endpoints get used. Large = answers cluster in the middle.
Four thresholds turn one continuous impression into ratings 1 to 5.
How often each rating gets used, by trial type.
The trap: treating ratings as plain numbers makes the effect size a property of the response scale, not just of perception. Two studies can find different “effects” simply because their participants used the scale differently. Ordinal models estimate the thresholds together with d′, so the conclusion does not depend on scale habits. This is the topic of my current methods work on hierarchical ordinal models.
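A small sketch makes the trap concrete. It assumes a unit-variance Gaussian impression centred at −d′/2 on noise trials and +d′/2 on signal trials, and four equally spaced thresholds described by a hypothetical `shift` and `spacing` (stand-ins for the two sliders, not the demo's actual parameter names). It then treats the ratings as plain numbers and reports the difference in mean rating.

```python
from statistics import NormalDist

PHI = NormalDist().cdf


def rating_probs(mean, thresholds):
    """Probability of each rating 1..5 when the impression is Normal(mean, 1)."""
    cuts = [PHI(t - mean) for t in thresholds]
    edges = [0.0] + cuts + [1.0]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def mean_rating(probs):
    """Mean rating when categories are (naively) scored 1 to 5."""
    return sum(rating * p for rating, p in enumerate(probs, start=1))


def rating_gap(d_prime, shift, spacing):
    """Signal-minus-noise difference in mean rating for one threshold layout."""
    thresholds = [shift + spacing * k for k in (-1.5, -0.5, 0.5, 1.5)]
    noise = rating_probs(-d_prime / 2, thresholds)
    signal = rating_probs(d_prime / 2, thresholds)
    return mean_rating(signal) - mean_rating(noise)


# Same d', two different threshold spacings.
for spacing in (0.5, 1.5):
    gap = rating_gap(d_prime=1.0, shift=0.0, spacing=spacing)
    print(f"spacing {spacing}: mean rating difference {gap:.2f}")
```

With d′ held fixed, the widely spaced thresholds give a visibly smaller difference in mean ratings than the narrow ones, even though perception is identical in both cases.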
Explore · error management
When over-perceiving is the rational move
A detector cannot avoid both kinds of error; it can only trade them. If a miss costs more than a false alarm, the cost-minimizing criterion shifts liberal: the best possible detector makes more false alarms, on purpose. This is the signal-detection core of error management theory. Set the stakes and the base rate, and watch where the optimal line lands.
Base rate: how often the signal is actually present.
The dashed line marks the criterion with the lowest expected cost.
Notice the tug-of-war: cost asymmetries pull the line one way, base rates pull it back. In the smoke-alarm preset a miss is catastrophic, yet fires are rare, so the optimum barely moves. Error management explanations of biases like sexual overperception live or die on exactly this arithmetic, which is why my work measures the criterion directly instead of inferring it from accuracy.
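The arithmetic can be sketched in a few lines. Assuming the equal-variance Gaussian model, zero cost for correct responses, and a detector that minimizes expected cost, the optimal likelihood-ratio threshold is β = [(1 − p) × cost of a false alarm] / [p × cost of a miss], and the matching criterion is c = ln(β) / d′. The function names and all example numbers below are illustrative, including the smoke-alarm values, which are not taken from the demo's preset.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def optimal_criterion(d_prime, base_rate, cost_miss, cost_false_alarm):
    """Cost-minimizing criterion c (negative = liberal) for a Gaussian detector."""
    beta = ((1 - base_rate) * cost_false_alarm) / (base_rate * cost_miss)
    return math.log(beta) / d_prime


def expected_cost(c, d_prime, base_rate, cost_miss, cost_false_alarm):
    """Expected cost per trial at criterion c; correct responses cost nothing."""
    p_miss = PHI(c - d_prime / 2)       # signal (mean +d'/2) falls below c
    p_fa = 1 - PHI(c + d_prime / 2)     # noise (mean -d'/2) falls above c
    return (base_rate * cost_miss * p_miss
            + (1 - base_rate) * cost_false_alarm * p_fa)


# Costly misses at a 50% base rate: the optimum moves liberal.
print(optimal_criterion(d_prime=1.5, base_rate=0.5,
                        cost_miss=10, cost_false_alarm=1))

# Hypothetical smoke alarm: misses cost 100x more, but fires occur 1% of the time.
print(optimal_criterion(d_prime=1.5, base_rate=0.01,
                        cost_miss=100, cost_false_alarm=1))
```

In the second call the cost ratio and the base rate almost cancel, so β is close to 1 and the optimal criterion sits close to neutral.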
Explore · Bayesian updating
How cues add up
Treat “are they interested?” as a hypothesis and each behaviour as evidence. Every cue multiplies the odds by its likelihood ratio: how much more probable that behaviour is if they are interested than if they are not. (Multiplying cue by cue assumes the cues are independent of one another given the hypothesis.) Choose a prior, click cues on and off, and watch the posterior move.
Prior: what you believed before observing anything.
The likelihood ratios here are made up for illustration. In real interactions most single cues are weak, context moves them around, and people differ enormously in the priors they walk in with. The gap between the evidence actually available and the conclusions people draw from it is what my research tries to measure.
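The update rule itself is short enough to write out. This sketch converts the prior to odds, multiplies by each cue's likelihood ratio, and converts back to a probability. It assumes the cues are conditionally independent given the hypothesis; the cue names and likelihood ratios are invented for illustration and are not the values used in the demo.

```python
def update(prior, likelihood_ratios):
    """Posterior probability after multiplying prior odds by each likelihood ratio."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)


# Made-up cues: values above 1 favour "interested", values below 1 count against.
cues = {
    "sustained eye contact": 2.0,
    "laughs at jokes": 1.5,
    "keeps checking phone": 0.5,
}

posterior = update(prior=0.2, likelihood_ratios=cues.values())
print(f"posterior = {posterior:.3f}")
```

Because the update is multiplicative on the odds scale, a few weak cues move a low prior only modestly, and one contrary cue can undo most of the gain.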
Explore · estimation
Why I model everyone at once
Estimate d′ separately for each participant and small samples produce wild numbers: with 20 trials, someone can look like a super-detector by luck alone. Multilevel models treat people as draws from a population and pull noisy estimates toward the group mean, more strongly when the data are thin. Statisticians call this partial pooling, or shrinkage. Draw a sample of twelve participants and compare the two approaches against the truth.
Fewer trials, noisier per-person estimates, stronger shrinkage.
How much people genuinely differ in d′.
Average error is the root-mean-square distance between estimate and truth. Partial pooling wins most clearly when trials are few and people are similar; crank the trials up and the two approaches converge, because strong data need little help. This trade-off is the engine of my multilevel SDT tutorial: the same data, modeled jointly, give person-level estimates you can actually trust.
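A stripped-down simulation shows the same pattern. It is a sketch under strong simplifying assumptions: each person's separate estimate is the true d′ plus Gaussian noise whose standard deviation is roughly 2.5 / √trials, and the shrinkage weight uses the true between-person and noise variances instead of estimating them, which a real multilevel model would do. The function name, the constant 2.5, and the default group mean are illustrative choices.

```python
import math
import random


def simulate(n_people, n_trials, between_sd, group_mean=1.5, seed=1):
    """Return (rmse_separate, rmse_pooled) for one simulated sample."""
    rng = random.Random(seed)
    # Rough assumption: sampling noise in d' shrinks as 1 / sqrt(trials).
    noise_sd = 2.5 / math.sqrt(n_trials)
    truth = [rng.gauss(group_mean, between_sd) for _ in range(n_people)]
    separate = [t + rng.gauss(0, noise_sd) for t in truth]

    # Shrinkage weight: share of observed variance that reflects real differences.
    weight = between_sd ** 2 / (between_sd ** 2 + noise_sd ** 2)
    grand_mean = sum(separate) / n_people
    pooled = [grand_mean + weight * (x - grand_mean) for x in separate]

    def rmse(estimates):
        squared = [(e - t) ** 2 for e, t in zip(estimates, truth)]
        return math.sqrt(sum(squared) / n_people)

    return rmse(separate), rmse(pooled)


# Twelve participants, few trials versus many trials.
for trials in (20, 400):
    separate_err, pooled_err = simulate(n_people=12, n_trials=trials, between_sd=0.5)
    print(f"{trials} trials: separate {separate_err:.2f}, pooled {pooled_err:.2f}")
```

With only twelve simulated people a single draw can go either way, so rerun it with different seeds or a larger sample to see the average advantage of pooling.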
Explore · sampling design
The situations you never sample
Claims about how clearly consent is communicated usually rest on surveys about encounters that happened. But situations where someone refused, withdrew, or no signal was given at all rarely enter the sampling frame, and they are exactly where communication matters most. Below is a population of 100 situations. Choose what the study gets to sample, and watch the estimate drift from the truth.
Situations where nothing proceeded: an explicit no, or no signal at all.
Encounters that began and were stopped.
The rest proceeded on ambiguous signals.
A simplified illustration of the sampling argument in my paper “Estimating consent clarity requires sampling absence, refusal, and withdrawal” (Archives of Sexual Behavior, 2026). The numbers here are illustrative, not estimates from data; the paper develops the design problem properly.
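The sampling argument reduces to a weighted average, sketched below. Every number is invented for illustration: the three situation types, their counts in a population of 100, and the count of clearly communicated cases in each are hypothetical and come neither from the demo nor from the paper.

```python
# Hypothetical population of 100 situations (all counts are made up).
population = {
    "proceeded": {"n": 60, "clear": 48},
    "began_then_stopped": {"n": 15, "clear": 6},
    "refused_or_no_signal": {"n": 25, "clear": 8},
}


def estimated_clarity(sampled_types):
    """Share of clearly communicated situations among the types a study samples."""
    total = sum(population[kind]["n"] for kind in sampled_types)
    clear = sum(population[kind]["clear"] for kind in sampled_types)
    return clear / total


truth = estimated_clarity(population.keys())
survey_only = estimated_clarity(["proceeded"])
print(f"all situations sampled:   {truth:.2f}")
print(f"only completed encounters: {survey_only:.2f}")
```

Restricting the frame to encounters that proceeded overstates clarity in this toy population because the excluded types are the ones where clear communication is rarer.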
Use these in your teaching
All demos are plain HTML and JavaScript with no tracking and no server. You are welcome to link to them, embed them, or adapt the code for courses. For the full statistical treatment, see the tutorial and app on the resources page.