{{ContextLesson|4.2 Finding Patterns in Random Noise}}
{{ContextRelation|In addition to the signal-to-noise ratio, there are other statistical tools (e.g. <math>p</math>-value) to quantify the strength of the signal amidst all the noise.}}
{{ContextLesson|5.1 False Positives and Negatives}}
{{ContextRelation|"Positive" and "negative" refer to whether we identify what we detect as a signal or not. The decision of any "threshold" of strength for a signal to be counted as positive inevitably involves human values judgment in a trade-off between the rates of false positives and false negatives.}}
Revision as of 14:19, 15 August 2023
The challenges of finding the information we want amidst messy data.
The presence of noise, which sometimes disguises itself as signal, is inevitable in any measurement. The identification of a signal always comes with a roughly quantifiable level of confidence.
In addition to the signal-to-noise ratio, there are other statistical tools (e.g. <math>p</math>-value) to quantify the strength of the signal amidst all the noise.
"Positive" and "negative" refer to whether we identify what we detect as a signal or not. The decision of any "threshold" of strength for a signal to be counted as positive inevitably involves human values judgment in a trade-off between the rates of false positives and false negatives.
Some signals in nature seem hopelessly weak, such as the tiny fluctuations in the distance between two mirrors caused by gravitational waves from faraway black holes. Scientists nevertheless spend decades developing new instruments to increase the strength of the signal, as well as new analysis techniques to filter out the noise.
The detection of a "statistically significant" difference between conditions in an RCT is the identification of a signal. The random variations that exist between experimental subjects are a source of noise.
Takeaways
After this lesson, students should
Be able to explain what scientists mean by "signal," "noise," and "signal-to-noise ratio."
Be able to identify examples of "signal" and "noise," recognizing that these examples are context-dependent.
Be able to roughly compare measurement techniques in terms of their resultant signal-to-noise ratios.
Be able to describe examples of techniques and tools to suppress noise and/or amplify signal.
Signal
Aspects of observations or stimuli that provide useful information about the target of interest, as opposed to noise.
Please hold off on introducing the concept of false positive/negative or thresholds in detections, as students have previously been overwhelmed and confused. We will properly discuss them in 5.1 False Positives and Negatives.
Noise
The aspects of observations that get confused with signal but do not provide the same useful information about the target of interest. Noise is frequently, but not always, the result of random measurement fluctuations.
Some students mistakenly think that noise is anything that prevents you from detecting the signal, for instance, a law banning the use of ultrasound to detect the sex of the foetus. In fact, noise is something that an instrument detects in the same way it would detect a signal, except that it is not caused by the source of the signal and could be confused with the signal.
There is always random background noise, but noise doesn't have to be random.
Noise does not have to be sound.
Signal-to-noise Ratio
The strength of the signal relative to the strength of the noise in a given context. Obtaining meaningful information from the world requires distinguishing signal from noise. Therefore, human cognition (both scientific and otherwise) relies on techniques and tools to suppress noise and/or amplify signal (i.e., increase the signal-to-noise ratio). It is possible to design filters to increase the signal-to-noise ratio if you know where the noise is going to appear.
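As a rough numerical illustration of the definition above, the following Python sketch treats the signal-to-noise ratio as the signal's size divided by the spread (standard deviation) of the noise. This is only one common convention, and the lesson does not prescribe a formula. The values `signal_amplitude` and `noise_sigma` are hypothetical, and averaging repeated readings is used here as just one example of a noise-suppression technique.

```python
import random
import statistics

def snr(signal_amplitude, noise_samples):
    """Signal size divided by the spread (standard deviation) of the noise."""
    return signal_amplitude / statistics.pstdev(noise_samples)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
signal_amplitude = 1.0   # hypothetical true signal strength
noise_sigma = 2.0        # hypothetical noise spread (larger than the signal)

# Noise present in single readings.
single_noise = [rng.gauss(0.0, noise_sigma) for _ in range(2000)]

# Noise that remains after averaging 25 readings at a time.
averaged_noise = [
    statistics.fmean(rng.gauss(0.0, noise_sigma) for _ in range(25))
    for _ in range(2000)
]

snr_single = snr(signal_amplitude, single_noise)
snr_averaged = snr(signal_amplitude, averaged_noise)
print(f"single reading SNR:  {snr_single:.2f}")
print(f"averaged (n=25) SNR: {snr_averaged:.2f}")
```

Under these assumptions the averaged readings have a ratio roughly five times higher than single readings, because averaging 25 independent readings shrinks random noise by about a factor of the square root of 25 while leaving the signal unchanged.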
Bajau People
As a member of the Bajau people of Southeast Asia, you are diving to collect shellfish for food. While the shellfish themselves are the signal, there are several sources of noise: rocks and other creatures resembling shellfish, and shifting sunlight patterns on the seafloor. The signal-to-noise ratio may be low if the water is murky (higher noise), the shellfish are camouflaged (lower signal), or the light is dim (lower signal). (BBC Article)
Identifying Fish
Detecting fish jumps (signal) on a lake on a day when the wind is causing waves (noise). Some splashing waves may be misidentified as fish jumps.
Radio Static
Getting the words of a radio personality through static.
Loud Party
Hearing your conversational partner at a party where lots of conversations are happening.
Randomized Controlled Trials
Figuring out if there's a meaningful difference between the control condition and experimental condition in an RCT. Random fluctuations in the chosen experimental sample may cause a spurious difference between the two groups; this is a source of noise.
Finding the facts on a topic where there's a lot of disinformation floating around.
Palate Cleansing
Palate cleansing with water or crackers between tasting different wines. The subtle differences between wines are the signal, while lingering flavours and scents from the previous wine are the noise.
COVID Symptom Screening
The signal is the actual COVID infection, and the noise is all the other illnesses/allergies/etc causing similar symptoms.
Smoke Detectors
Smoke detectors detect the presence of smoke from a fire (signal) by measuring the opacity of air. Steam is a possible source of noise.
How can we definitely tell if a single stimulus is signal or noise?
By improving the sensitivity of the instrument.
The stimulus is definitely a signal if it is stronger than most of the previous stimuli.
It is impossible to tell for sure if a single stimulus is a signal or noise.
Explanation
Noise can masquerade as signal, and random fluctuations can sometimes produce a single strong stimulus. For a given stimulus, we can only estimate the likelihood that it is signal rather than noise. We then have to decide what confidence level we need in order to classify stimuli appropriately. Any single supposed signal might be a rare (or not-so-rare) spike in noise.
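The point that random fluctuations occasionally produce a strong stimulus can be checked with a small simulation. The sketch below is only an illustration under assumed conditions: the noise is modelled as a standard normal distribution, and the strength levels 1.0, 2.0, and 3.0 are arbitrary choices, not values from the lesson.

```python
import random

def count_noise_spikes(n_readings, strength, rng):
    """Count readings of pure noise that come out stronger than `strength`."""
    return sum(1 for _ in range(n_readings) if rng.gauss(0.0, 1.0) > strength)

rng = random.Random(1)   # fixed seed so the sketch is repeatable
n_readings = 100_000     # every reading contains noise only, no signal

spikes = {}
for strength in (1.0, 2.0, 3.0):
    spikes[strength] = count_noise_spikes(n_readings, strength, rng)
    print(f"stronger than {strength}: {spikes[strength]} of {n_readings} noise-only readings")
```

Strong readings become rarer as the strength level rises, but in a sample this large some still occur even at the highest level, which is why a single strong stimulus cannot be classified as signal with certainty.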
Scenario Analysis
For each of several scenarios, have the students answer the following questions about signal and noise.
What is the sense/instrument that you are using?
What does the sense/instrument actually measure?
What is the signal from the sense/instrument you are expecting?
What sources of noise do you anticipate in this measurement? List two or more if you can.
(Optional) How would you reduce these sources of noise?
Example Scenario
Members of the Bajau people of Southeast Asia collecting shellfish for food.
What is the sense/instrument that you are using?
Their eyes.
What does the sense/instrument actually measure?
Visible light reflecting off nearby surfaces.
What is the signal from the sense/instrument you are expecting?
A shell shape/pattern.
What sources of noise do you anticipate in this measurement? List two or more if you can.
Murky water, low light, creatures and rocks that look like shellfish.
(Optional) How would you reduce these sources of noise?
Clean the water, dive during the day, etc.
Scenarios
Catching gossip about you from across the room at a party. What about understanding what the person you're talking to is saying?
Detecting a metal knife in the luggage of someone boarding an airplane.
Detecting an ongoing earthquake in Berkeley.
Determining if your arch nemesis put cyanide in your almond milk.
Determining whether there are birds around you on your weekly birding expedition, then determining whether owls are in the mix.
Is that a creepy crawly on your neck right now?
Identifying a budding new wave of COVID in the US. (Suppose you're a health official provided with daily updates of the following data from hospitals across the country: rates of people coming into the ER with fever, coughs, broken bones, wounds, diarrhea, and cardiac arrest...)
Guess the Message Game
In this game, the students will each write a message and corrupt it to varying degrees. Each student will have a partner with whom they share the corrupted messages. Each student will then try to decode their partner's messages, working from most to least corrupted. Full instructions are available in the handout.
The 140-page handout for this game is designed to work with up to 35 students (each student gets four pages). The last of the four pages has a different randomized pair of letter and number grids for each student. Hence, the handout is slightly different for each student. If you have more than 35 students, you'll have to print more copies.
Students should not share the messages nor the decoding process until the game is done.
Instructions
2 Minutes
Hand each student a copy of the worksheet. It contains the full instructions.
5 Minutes
Explain the game as per the instructions linked above. Make sure the students know not to share their uncorrupted messages with each other until the game is complete.
What was the highest corruption level at which you could understand the message?
What are the factors affecting the signal-to-noise ratio?
The more letters that are corrupted, the greater the noise and the lower the signal-to-noise ratio. The strength of the original message also matters. If the original message is short, this lowers the signal-to-noise ratio. Furthermore, if the original message is very obscure (one that another student is unlikely to recognize), then the signal is also less clear.
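For instructors who want to preview how corruption level affects readability, the sketch below imitates the idea in Python. It is not the handout's procedure, which relies on the randomized letter and number grids. As a stand-in, it simply replaces a chosen fraction of letters with random ones. The function name `corrupt`, the sample message, and the corruption fractions are all hypothetical.

```python
import random
import string

def corrupt(message, fraction, rng):
    """Replace about `fraction` of the letters in `message` with random letters.

    Spaces and punctuation are left alone. A replacement can happen to match
    the original letter, so the visible damage may be slightly smaller.
    """
    letter_positions = [i for i, ch in enumerate(message) if ch.isalpha()]
    n_corrupt = round(fraction * len(letter_positions))
    chars = list(message)
    for i in rng.sample(letter_positions, n_corrupt):
        chars[i] = rng.choice(string.ascii_uppercase)
    return "".join(chars)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
message = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"  # hypothetical message
for fraction in (0.1, 0.3, 0.5, 0.7):
    print(f"{fraction:.0%} corrupted: {corrupt(message, fraction, rng)}")
```

Trying a shorter or more obscure message in place of the sample one gives a feel for the other factors mentioned above: with fewer letters or less familiar wording, the same corruption fraction leaves less recognizable signal.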
Takeaway
The ratio of signal to noise determines how easy it is to distinguish between true signal and the noise that "pretends" to be signal.
We could have a poor ratio because the signal is very low or because the noise is very high.
Example: Why is it hard to have a conversation at a cocktail party?
Signal: Increase the sound of your voice.
Noise: Go outside to get away from the background conversations.