Interactive Demos — Iliana Samara

Game · two minutes

Measure your own criterion

Each display below contains either 20 dots or 25 dots, shown for less than half a second. Your job is to call it: high or low. There are 20 displays and no feedback along the way. At the end you get your own sensitivity (d′) and criterion (c), the same two numbers I estimate for participants in my studies.

Fix your eyes on the cross. The dots flash briefly, then the buttons unlock: click one, or press H or L. Go with your gut when unsure: that is precisely the behaviour the criterion measures. No data leave your browser.

A liberal criterion here means you said “high” whenever the display felt ambiguous. In my research the same logic describes who reads interest into ambiguous social cues. Twenty trials give a noisy estimate, which is itself a useful lesson: single-person SDT estimates are unstable, and that is why my tutorials use multilevel models.
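For readers who want to see how the two numbers come out of the four outcome counts, here is a minimal sketch. It assumes the standard equal-variance model, treats the 25-dot displays as "signal", and uses a log-linear correction (add 0.5 to each count) so that perfect or empty cells do not produce infinite z-scores. The function name and the example counts are hypothetical; the game itself may use a different correction.

```python
from statistics import NormalDist


def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, c) from the four outcome counts.

    Assumption: a log-linear correction (0.5 added to each count) keeps the
    rates strictly between 0 and 1, where the z-transform is defined.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)        # separation of the distributions
    c = -0.5 * (z(hit_rate) + z(fa_rate))     # negative = liberal
    return d_prime, c


# Hypothetical session: ten 25-dot displays and ten 20-dot displays.
d_prime, c = dprime_and_criterion(hits=8, misses=2,
                                  false_alarms=4, correct_rejections=6)
print(f"d' = {d_prime:.2f}, c = {c:.2f}")
```

With only ten trials of each type, moving a single response from one cell to another shifts both estimates noticeably, which is the instability the paragraph above describes.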

Explore · signal detection theory

Sensitivity and bias, untangled

Two different things determine detection judgments: how well the signal separates from the noise (d′) and where you place your decision line (c). Raw accuracy mixes them together; signal detection theory pulls them apart. Move the sliders and watch the four outcomes and the point on the ROC curve.

Sensitivity d′ 1.50

How far the signal distribution sits from the noise.

Criterion c 0.00

Negative = liberal (says “signal” easily). Positive = conservative.

Readouts: hits, misses, false alarms, correct rejections.

Shaded tails right of the line: false alarms (grey-blue) and hits (orange).

ROC: every criterion is one point on the curve set by d′.
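The shaded tails and the ROC point can be reproduced in a few lines. This sketch assumes the equal-variance Gaussian model with the noise distribution centred at −d′/2, the signal distribution at +d′/2, and c measured from the midpoint between them; the function names are mine, not taken from the demo's source.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def outcome_rates(d_prime, c):
    """Rates of the four outcomes for sensitivity d_prime and criterion c."""
    hit = PHI(d_prime / 2 - c)            # signal area right of the line
    false_alarm = PHI(-d_prime / 2 - c)   # noise area right of the line
    return {
        "hit": hit,
        "miss": 1 - hit,
        "false_alarm": false_alarm,
        "correct_rejection": 1 - false_alarm,
    }


def roc_curve(d_prime, criteria):
    """One (false-alarm rate, hit rate) point per criterion value."""
    points = []
    for c in criteria:
        rates = outcome_rates(d_prime, c)
        points.append((rates["false_alarm"], rates["hit"]))
    return points


# Slider defaults from the demo: d' = 1.50, c = 0.00.
rates = outcome_rates(d_prime=1.5, c=0.0)
print({name: round(value, 3) for name, value in rates.items()})

# Sweep the criterion from conservative (2) to liberal (-2).
curve = roc_curve(1.5, [k / 4 for k in range(8, -9, -1)])
```

Sliding c moves the point along one curve; only a change in d′ moves the observer to a different curve.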

Try this: set d′ to 1 and c to −0.8, note the hit rate, then set d′ to 2.5 and c to −0.05. Two very different observers can produce the same hit rate (about 90% here), one through sensitivity and one through a lenient threshold. That distinction is invisible if you only report accuracy, and it is the reason my work models the two parameters separately.

Explore · ordinal data

What a 1-to-5 rating hides

Suppose observers rate perceived interest on a five-point scale. Ordinal signal detection theory says they carve a continuous impression into categories using four thresholds. The same underlying separation (d′) can produce very different-looking ratings depending on where those thresholds sit. Move the scale around and watch the rating distributions and the apparent effect size change while d′ stays put.

Latent separation d′ 1.50

Scale shift 0.00

Negative = a generous rater (high ratings come easily). Positive = stingy.

Scale spread 0.70

Small = endpoints get used. Large = answers cluster in the middle.

Readouts: mean rating (noise), mean rating (signal), metric effect size.

Four thresholds turn one continuous impression into ratings 1 to 5.

How often each rating gets used, by trial type.

The trap: treating ratings as plain numbers makes the effect size a property of the response scale, not just of perception. Two studies can find different “effects” simply because their participants used the scale differently. Ordinal models estimate the thresholds together with d′, so the conclusion does not depend on scale habits. This is the topic of my current methods work on hierarchical ordinal models.
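The trap can be checked numerically. The sketch below assumes unit-variance normal impressions centred at −d′/2 (noise) and +d′/2 (signal), and four equally spaced thresholds centred on the scale shift and separated by the scale spread. That threshold layout is my guess at what the sliders do, not the demo's actual code, and "metric effect size" is taken to be Cohen's d computed on the ratings as plain numbers.

```python
from statistics import NormalDist


def rating_probabilities(mean, thresholds):
    """Probability of each rating 1..5 for a unit-variance normal impression."""
    cdf = NormalDist(mu=mean, sigma=1.0).cdf
    cumulative = [0.0] + [cdf(t) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]


def metric_effect_size(d_prime, shift, spread):
    """Cohen's d on the 1-5 ratings, treating them as plain numbers."""
    # Assumed layout: four equally spaced cut points centred on `shift`.
    thresholds = [shift + spread * k for k in (-1.5, -0.5, 0.5, 1.5)]
    stats = []
    for mean in (-d_prime / 2, d_prime / 2):  # noise trials, then signal trials
        probs = rating_probabilities(mean, thresholds)
        m = sum(r * p for r, p in enumerate(probs, start=1))
        v = sum((r - m) ** 2 * p for r, p in enumerate(probs, start=1))
        stats.append((m, v))
    (m_noise, v_noise), (m_signal, v_signal) = stats
    pooled_sd = ((v_noise + v_signal) / 2) ** 0.5
    return (m_signal - m_noise) / pooled_sd


# Same latent separation, two different scale habits.
for spread in (0.7, 2.0):
    d_metric = metric_effect_size(d_prime=1.5, shift=0.0, spread=spread)
    print(f"spread {spread}: metric effect size {d_metric:.2f}")
```

The latent d′ is 1.5 in both runs, yet the effect size computed on the ratings changes with the spread of the thresholds.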

Explore · error management

When over-perceiving is the rational move

A detector cannot avoid both kinds of error; it can only trade them. If a miss costs more than a false alarm, the cost-minimizing criterion shifts liberal: the best possible detector makes more false alarms, on purpose. This is the signal-detection core of error management theory. Set the stakes and the base rate, and watch where the optimal line lands.

Base rate of signal 30%

How often the signal is actually present.

Cost of a miss 6

Cost of a false alarm 2

Sensitivity d′ 1.50

Readouts: optimal c*, hits at c*, false alarms at c*.

The dashed line marks the criterion with the lowest expected cost.

Notice the tug-of-war: cost asymmetries pull the line one way, base rates pull it back. In the smoke-alarm preset a miss is catastrophic, yet fires are rare, so the optimum barely moves. Error management explanations of biases like sexual overperception live or die on exactly this arithmetic, which is why my work measures the criterion directly instead of inferring it from accuracy.
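The arithmetic is short enough to write out. Under the equal-variance model, and assuming correct responses cost nothing, expected cost is lowest where the likelihood ratio equals β* = [(1 − p) · cost of a false alarm] / [p · cost of a miss], which gives c* = ln(β*) / d′. The function names and the smoke-alarm numbers below are illustrative assumptions, not the preset's actual values.

```python
from math import log
from statistics import NormalDist

PHI = NormalDist().cdf


def optimal_criterion(base_rate, cost_miss, cost_false_alarm, d_prime):
    """Criterion c* that minimises expected cost (correct responses cost 0)."""
    beta = ((1 - base_rate) * cost_false_alarm) / (base_rate * cost_miss)
    return log(beta) / d_prime


def expected_cost(c, base_rate, cost_miss, cost_false_alarm, d_prime):
    """Expected cost per trial of placing the criterion at c."""
    miss_rate = 1 - PHI(d_prime / 2 - c)
    fa_rate = PHI(-d_prime / 2 - c)
    return (base_rate * cost_miss * miss_rate
            + (1 - base_rate) * cost_false_alarm * fa_rate)


# Slider defaults from the demo: base rate 30%, miss 6, false alarm 2, d' 1.5.
c_star = optimal_criterion(0.30, 6, 2, 1.5)
print(f"c* = {c_star:.3f}")

# Hypothetical smoke-alarm numbers: misses 100 times costlier, fires at 1%.
c_smoke = optimal_criterion(0.01, 100, 1, 1.5)
print(f"smoke alarm c* = {c_smoke:.3f}")
```

In the second case the cost ratio (100 to 1) and the prior odds (99 to 1 against) almost cancel, so the optimal line stays near zero despite the catastrophic miss.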

Explore · Bayesian updating

How cues add up

Treat “are they interested?” as a hypothesis and each behaviour as evidence. Every cue multiplies the odds by its likelihood ratio: how much more probable that behaviour is if they are interested than if they are not. Choose a prior, click cues on and off, and watch the posterior move.

Prior probability of interest 20%

What you believed before observing anything.

prior 20% (thin line) · posterior 20%

The likelihood ratios here are made up for illustration. In real interactions most single cues are weak, context moves them around, and people differ enormously in the priors they walk in with. The gap between the evidence actually available and the conclusions people draw from it is what my research tries to measure.
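The update rule itself fits in a few lines. This sketch assumes the cues are conditionally independent given the hypothesis, which is what justifies multiplying their likelihood ratios; the function name and the ratios in the example are invented for illustration.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with a sequence of likelihood ratios.

    Assumption: cues are conditionally independent, so their likelihood
    ratios multiply.
    """
    odds = prior / (1 - prior)      # probability -> odds
    for ratio in likelihood_ratios:
        odds *= ratio               # each cue multiplies the odds
    return odds / (1 + odds)        # odds -> probability


# Prior of 20%, then two hypothetical cues with likelihood ratios 2.0 and 1.5.
posterior = posterior_probability(0.20, [2.0, 1.5])
print(f"posterior = {posterior:.3f}")
```

A ratio of 1 leaves the odds untouched, and a ratio below 1 counts as evidence against interest.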

Explore · estimation

Why I model everyone at once

Estimate d′ separately for each participant and small samples produce wild numbers: with 20 trials, someone can look like a super-detector by luck alone. Multilevel models treat people as draws from a population and pull noisy estimates toward the group mean, more strongly when the data are thin. Statisticians call this partial pooling, or shrinkage. Draw a sample of twelve participants and compare the two approaches against the truth.

Trials per person 40

Fewer trials, noisier per-person estimates, stronger shrinkage.

True between-person spread 0.50

How much people genuinely differ in d′.

Readouts: error with no pooling, error with partial pooling.

Average error is the root-mean-square distance between estimate and truth. Partial pooling wins most clearly when trials are few and people are similar; crank the trials up and the two approaches converge, because strong data need little help. This trade-off is the engine of my multilevel SDT tutorial: the same data, modeled jointly, give person-level estimates you can actually trust.
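A stripped-down version of the shrinkage step looks like this. It is a sketch, not the multilevel model itself: the between-person spread is treated as known and the group mean is a plain average, whereas a real multilevel model estimates both jointly with the person-level values. The sampling variance of d′ uses the usual delta-method approximation, and all function names are my own.

```python
from statistics import NormalDist, fmean

_N = NormalDist()


def dprime_sampling_variance(hit_rate, fa_rate, n_signal, n_noise):
    """Delta-method approximation to the sampling variance of d'."""
    z_hit, z_fa = _N.inv_cdf(hit_rate), _N.inv_cdf(fa_rate)
    return (hit_rate * (1 - hit_rate) / (n_signal * _N.pdf(z_hit) ** 2)
            + fa_rate * (1 - fa_rate) / (n_noise * _N.pdf(z_fa) ** 2))


def partially_pool(estimates, sampling_variances, between_person_sd):
    """Pull each estimate toward the group mean; noisier ones move further."""
    group_mean = fmean(estimates)          # simplification: unweighted mean
    tau2 = between_person_sd ** 2          # assumed known, not estimated
    pooled = []
    for estimate, se2 in zip(estimates, sampling_variances):
        weight = tau2 / (tau2 + se2)       # 1 = trust the person, 0 = the group
        pooled.append(group_mean + weight * (estimate - group_mean))
    return pooled


# Three hypothetical participants with equally noisy estimates.
raw = [0.5, 1.5, 3.5]
shrunk = partially_pool(raw, [0.25, 0.25, 0.25], between_person_sd=0.5)
print([round(x, 2) for x in shrunk])

# Fewer trials mean a larger sampling variance, hence stronger shrinkage.
var_few = dprime_sampling_variance(0.8, 0.3, n_signal=10, n_noise=10)
var_many = dprime_sampling_variance(0.8, 0.3, n_signal=100, n_noise=100)
```

As the number of trials grows, the sampling variance falls, the weight approaches 1, and the pooled estimate converges on the raw one.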

Explore · sampling design

The situations you never sample

Claims about how clearly consent is communicated usually rest on surveys about encounters that happened. But situations where someone refused, withdrew, or no signal was given at all rarely enter the sampling frame, and they are exactly where communication matters most. Below is a population of 100 situations. Choose what the study gets to sample, and watch the estimate drift from the truth.

Refusals & absences per 100 30

Situations where nothing proceeded: an explicit no, or no signal at all.

Withdrawals per 100 8

Encounters that began and were stopped.

Completed encounters with clear agreement 65%

The rest proceeded on ambiguous signals.

Readouts: study estimate, true rate.

estimated share of situations with clearly communicated agreement · truth (thin line)

A simplified illustration of the sampling argument in my paper Estimating consent clarity requires sampling absence, refusal, and withdrawal (Archives of Sexual Behavior, 2026). The numbers here are illustrative, not estimates from data; the paper develops the design problem properly.
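The drift between estimate and truth is plain arithmetic. This sketch rests on two assumptions of mine that the demo may define differently: the study samples completed encounters (optionally withdrawals as well), and the "true rate" is the share of all 100 situations that ended in clearly communicated agreement. The function and parameter names are hypothetical.

```python
def consent_clarity(refusals_and_absences, withdrawals,
                    clear_share_of_completed, total=100,
                    sample_withdrawals=False):
    """Return (study_estimate, true_rate) as proportions.

    Assumptions: only completed encounters can show clear agreement, and
    the truth is measured against every situation in the population.
    """
    completed = total - refusals_and_absences - withdrawals
    clear = clear_share_of_completed * completed
    sampled = completed + (withdrawals if sample_withdrawals else 0)
    study_estimate = clear / sampled   # what the survey would report
    true_rate = clear / total          # share of all situations
    return study_estimate, true_rate


# Slider defaults from the demo: 30 refusals and absences, 8 withdrawals, 65% clear.
estimate, truth = consent_clarity(30, 8, 0.65)
print(f"study estimate {estimate:.1%}, true rate {truth:.1%}")
```

Adding the unsampled situations back into the denominator is the whole correction; nothing about the completed encounters themselves changes.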

Use these in your teaching