{{Cover|4.2 Finding Patterns in Random Noise}}

Humans are so good at identifying patterns that we often see them even when what we are looking at is really just noise in disguise. When we think we have seen a pattern, how do we correctly quantify our confidence that it is real? We describe common pitfalls that lead to overconfidence in an apparent pattern, some of which even prey on the inattentive scientist!

{{Navbox}}

== The Lesson in Context ==
{{ContextRelation|<math>p</math>-hacking is one source of pathological science, where authors fail to disclose the measurements in which the supposed signal is not observed, thereby falsely inflating the statistical significance of the reported signal.}}
}}

== Takeaways ==

# Recognize and explain the flaw in scenarios in which scientists and other people mistake noise for signal.
# Resist the opposing temptations of both the Gambler's Fallacy (the expectation that a run of similar events will soon break and quickly balance out, because of the assumption that small samples resemble large samples) and the Hot-hand Fallacy (the expectation that a run will continue, because runs suggest non-randomness); a coin-flip simulation below illustrates why both fail.
# '''(Data Science)''' Describe the difference between the effect size (strength of pattern) and credence level (probability that the pattern is real), and identify the role each plays in decision making.
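A quick numerical contrast helps keep the two quantities apart: a pattern can be strong but uncertain, or weak but almost certainly real. The sketch below is illustrative Python of our own (the study sizes and effect magnitudes are invented for the example, not taken from the lesson).

<syntaxhighlight lang="python">
# Illustrative sketch (invented numbers): effect size vs. credence level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Strong pattern, little evidence: a large true effect measured only 5 times.
small_study = rng.normal(loc=1.0, scale=2.0, size=5)
# Weak pattern, overwhelming evidence: a tiny true effect measured 100,000 times.
big_study = rng.normal(loc=0.02, scale=2.0, size=100_000)

for name, data in [("small study", small_study), ("big study", big_study)]:
    effect = data.mean()                      # effect size: strength of the pattern
    p = stats.ttest_1samp(data, 0.0).pvalue   # credence: how surprising under pure noise
    print(f"{name}: effect size ~ {effect:.3f}, p-value ~ {p:.3g}")
</syntaxhighlight>

Typically the small study shows a much larger effect with an unconvincing <math>p</math>-value, while the big study shows a tiny effect that is nevertheless almost certainly not noise; sound decisions usually need both numbers.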
{{BoxCaution|People underestimate how often randomness produces apparent patterns, and so perceive spurious signals far more often than they realize. Purely coincidental events are much more common than most people expect.}}
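Both points are easy to check numerically. The following sketch is illustrative Python of our own (not part of the lesson): it simulates fair coin flips to show that long runs appear far more often than intuition suggests, and that a run tells you nothing about the next flip.

<syntaxhighlight lang="python">
# Illustrative sketch: apparent patterns in fair coin flips.
import random

random.seed(0)  # reproducible illustration

def longest_run(flips):
    """Length of the longest run of identical outcomes."""
    best = cur = 1
    for prev, nxt in zip(flips, flips[1:]):
        cur = cur + 1 if nxt == prev else 1
        best = max(best, cur)
    return best

TRIALS, N = 10_000, 100
sequences = [[random.random() < 0.5 for _ in range(N)] for _ in range(TRIALS)]

# (a) Apparent patterns are common: roughly half of all 100-flip sequences
# contain a run of 7 or more identical outcomes.
share = sum(longest_run(seq) >= 7 for seq in sequences) / TRIALS
print(f"Share of sequences with a run of 7+: {share:.2f}")

# (b) The coin is memoryless: pooled over all occurrences, the flip right
# after five consecutive heads is still heads about half the time.
after_streak = [seq[i]
                for seq in sequences
                for i in range(5, N)
                if all(seq[i - 5:i])]
print(f"P(heads | previous 5 flips were heads): {sum(after_streak) / len(after_streak):.2f}")
</syntaxhighlight>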
<br />
* Running a test or similar tests too many times, reporting only statistically significant results.
* The effect also occurs in everyday life, e.g. when one surveys a great many phenomena and takes note only of the most surprising-looking patterns, without accounting for the far larger number of unsurprising observations.}}
{{Definition|''<math>p</math>-hacking''|A subset of the Look Elsewhere Effect that occurs when people conduct multiple statistical tests and only report those with <math>p</math>-values under .05 (the traditional threshold for publication and statistical significance, which indicates a tolerance of 5% false positives).}}
{{BoxCaution|A <math>p</math>-value cutoff of .05 thus indicates that, on average, 1 out of 20 analyses of pure noise will discover a spurious signal. <math>p</math>-hacking is statistically problematic, but it is more often a result of misunderstanding than of deliberate fraudulence.}}
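The arithmetic above, and how quickly multiple testing inflates it, can be demonstrated with a short simulation (illustrative Python using NumPy and SciPy; the one-sample t-test setup is our assumption, not something the lesson specifies):

<syntaxhighlight lang="python">
# Illustrative sketch: t-tests on pure noise, so every "significant"
# result is by construction a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N_TESTS, N_SAMPLES, ALPHA = 10_000, 30, 0.05

# A single test at alpha = .05 flags pure noise about 1 time in 20.
false_positives = sum(
    stats.ttest_1samp(rng.standard_normal(N_SAMPLES), 0.0).pvalue < ALPHA
    for _ in range(N_TESTS))
print(f"False-positive rate of one test: {false_positives / N_TESTS:.3f}")

# The Look Elsewhere Effect / p-hacking: run 20 tests on noise and keep only
# the best one. At least one "significant" result is now more likely than
# not, since 1 - 0.95**20 is about 0.64.
N_BATCHES = 2_000
hits = sum(
    min(stats.ttest_1samp(rng.standard_normal(N_SAMPLES), 0.0).pvalue
        for _ in range(20)) < ALPHA
    for _ in range(N_BATCHES))
print(f"P(at least one 'significant' in 20 tests): {hits / N_BATCHES:.2f}")
</syntaxhighlight>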
{{BoxTip|title=Common Techniques for <math>p</math>-hacking|
* Running different statistical analyses on the same dataset and only reporting the statistically significant ones.