https://www.sciencedirect.com/science/article/abs/pii/S0010027721003358
Here’s a research article that I had lying around for several years (pre-LLM). I dusted it off and asked Grok to summarize it. Here is what Grok says:
Summary
This 2021 paper by Hugo Mercier and Nicolas Claidière, published in Cognition, investigates whether discussion in large groups enhances or undermines the “wisdom of crowds” phenomenon, where aggregating independent opinions (e.g., via averaging or majority voting) often yields surprisingly accurate results. The authors review theoretical arguments: discussion might reduce diversity and independence (potentially harming aggregation), but it could also improve individual accuracy (as seen in small-group studies). To test this in larger groups, they conducted an experiment during the 2017 European Researchers’ Night across 11 French cities, involving 1,958 participants in 33 groups (sizes 22–212, mean 59).
Participants tackled one of three problem types—demonstrative (logical/mathematical, e.g., “bat and ball” puzzle), factual (numerical estimates, e.g., goals in the 2010 World Cup), or ethical (value judgments, e.g., compensation for losing a finger)—first individually (5 minutes of silent thinking, answers recorded every minute), then via discussion with neighbors (10 minutes, answers recorded every minute). Performance was measured against objective benchmarks where possible (demonstrative and factual problems). Ethical problems served as controls, with results relegated to supplementary materials due to interpretive challenges.
Key findings: Discussion improved individual accuracy across problems with correct answers, and it either enhanced or did not harm aggregated group answers. The paper concludes that, for solvable problems, discussion in large groups tends to boost rather than detract from collective wisdom.
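The two aggregation rules the paper compares (majority voting for discrete answers, averaging for numerical estimates) can be sketched in a few lines. This is just an illustrative snippet, not the authors' analysis code; the function names and sample data are mine.

```python
# Illustrative sketch of the two crowd-aggregation rules discussed in the paper:
# majority vote for discrete answers, mean for numerical estimates.
from collections import Counter
from statistics import mean

def majority_vote(answers):
    """Return the most common answer in a group (ties broken arbitrarily)."""
    return Counter(answers).most_common(1)[0][0]

def average_estimate(estimates):
    """Return the group's mean numerical estimate."""
    return mean(estimates)

# Hypothetical pre-discussion crowd on the bat-and-ball puzzle:
# most people give the intuitive wrong answer, so the majority is wrong.
votes = ["10 cents"] * 7 + ["5 cents"] * 3
print(majority_vote(votes))                    # "10 cents" (wrong answer wins)
print(average_estimate([120, 150, 90, 180]))   # 135.0
```

The paper's point is that discussion can flip the first case (one correct individual convinces the majority) while leaving the second no worse off.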
Breakdown of Most Important Conclusions
The authors tested specific hypotheses (H1a–c, H2) and a research question (RQ1). Here’s a table summarizing the key conclusions, supported by evidence from the study:
| Conclusion | Problem Type | Key Evidence | Implications |
|---|---|---|---|
| Discussion improves individual performance more than solitary thinking (H1a, H1b confirmed; H1c partially confirmed) | Demonstrative and Factual | - Demonstrative problems: correct answers rose from ~12–41% (post-silence) to 84–91% (post-discussion).<br>- Factual problems: mean error decreased (e.g., World Cup error roughly halved, from 96 to 52; Elevators from 55 to 48).<br>- Ethical problems: no systematic improvement (results in ESM; no strong conclusions drawn). | Extends small-group benefits to large, diverse crowds; participants converged toward correct answers via arguments, not just majority pressure. |
| Discussion improves aggregated answers via majority voting (H2 confirmed) | Demonstrative | - Post-silence: majority correct in only 2/13 groups.<br>- Post-discussion: majority correct in all 13 groups (significant, p = 0.003). | “Truth wins” in large groups: one correct individual can convince many through discussion. |
| Discussion improves or has no effect on aggregated answers via averaging (RQ1 answered positively/neutrally) | Factual | - Elevators: all 6 groups reduced mean error post-discussion (p = 0.03).<br>- World Cup: no consistent change in mean error (2 groups improved, 2 worsened), due to variance reduction without a directional shift toward the correct answer. | Loss of diversity is offset by accuracy gains; no detriment to the wisdom of crowds. Overall, discussion enhanced aggregates in 3/4 benchmark cases and was neutral in 1/4. |
| No detrimental effects overall | All (with benchmarks) | - Individual answers always improved.<br>- Aggregates improved or stayed the same; they never consistently worsened.<br>- Participants moved toward correct answers more than toward initial group averages. | Challenges theories predicting that discussion harms diversity/independence (e.g., Lorenz et al., 2011); supports argumentation theory (Mercier & Sperber, 2017). |
The study’s limitations include modest group sizes relative to massive crowds, a homogeneous set of problems, and constrained discussion (participants could talk only with their neighbors). Ethical problems were inconclusive, and the results are strongest for problems with objective truths.
What Should Be Done in Light of This Research
Based on the findings, I think organizations and decision-makers should prioritize incorporating discussion into large-group processes for problems with verifiable answers, like policy evaluations, forecasting, or problem-solving in business/scientific teams. For instance, citizens’ assemblies or corporate brainstorming could allocate time for structured neighbor-based discussions before voting or averaging, as this setup proved effective here. It could be scaled via online platforms (e.g., forums or breakout rooms) to mimic the neighbor constraint while allowing flexibility.
Further research should test even larger crowds (e.g., thousands) and varied network structures—maybe optimizing connections to spread accurate info faster, as hinted by related studies. Also, explore non-objective problems (beyond ethics) like creative ideation, where diversity loss might hurt more. Practically, avoid over-relying on silent aggregation in high-stakes scenarios; the gains in accuracy seem worth the time investment. If anything, this suggests ancient Athens might have benefited from allowing some discussion before votes, at least for factual matters.