Overview of classification bias
Classification bias occurs when variables that affect outcomes are inadequately recorded at the beginning of a study. Misclassifying the statuses of participants within the trial can lead to an inaccurate association between the studied risk factor/exposure and relevant outcomes.
How to detect classification bias
In retrospective observational studies, researchers look back in time at groups of participants who had a risk factor or exposure and examine its association with some outcome. Detecting classification bias means asking the following question: could the outcome(s) be biased by how participants were classified in the study according to their risk factor/exposure?
Sometimes participants are asked to recall their relationship to the risk factor/exposure or outcome. Information gathered from memory can be highly unreliable, especially when participants are asked for specifics such as “how many times over the past year did you eat red meat?”
If the participants over- or underestimate their consumption of red meat (the exposure), the results of the study will be biased. In such a case, recall bias is introduced into the study because the participants’ recollection is inaccurate.
The two types of classification bias
There are two main types of classification bias: non-differential and differential misclassification of interventions. Let's examine each.
Non-differential misclassification of interventions
Non-differential misclassification of interventions occurs when the status of the intervention is randomly misclassified and is unrelated to the outcome. Both groups are equally affected by non-differential misclassification, and the actions of the participants do not directly cause this bias.
When interventions are non-differentially misclassified, the estimated effect is usually biased toward the null (no effect), making it more likely that no difference will be found between groups with respect to the studied outcomes.
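The pull toward the null can be seen in a small worked calculation. The sketch below uses hypothetical counts (500 cases and 500 controls with a true odds ratio of 2.25) and assumed error rates (80% sensitivity and 90% specificity of exposure classification, identical in both groups). None of these numbers come from a real study, and the function names are illustrative only.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds ratio for a 2x2 table of exposure by outcome."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


def observed_counts(exposed, unexposed, sensitivity, specificity):
    """Expected counts after imperfect exposure classification.

    sensitivity: probability that a truly exposed person is recorded as exposed
    specificity: probability that a truly unexposed person is recorded as unexposed
    """
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed


# Hypothetical true exposure counts (assumed for illustration)
true_cases = (300, 200)     # exposed, unexposed among those with the outcome
true_controls = (200, 300)  # exposed, unexposed among those without it

true_or = odds_ratio(*true_cases, *true_controls)

# Non-differential: the SAME error rates apply to both groups
sens, spec = 0.8, 0.9
obs_cases = observed_counts(*true_cases, sens, spec)
obs_controls = observed_counts(*true_controls, sens, spec)

observed_or = odds_ratio(*obs_cases, *obs_controls)

print(f"True odds ratio:     {true_or:.2f}")
print(f"Observed odds ratio: {observed_or:.2f}")
```

Under these assumptions the observed odds ratio falls between 1 and the true value: the association is weakened but not reversed.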
Differential misclassification of interventions
Differential misclassification of interventions occurs when the status of the intervention is misclassified due to a non-random pattern that differs between groups of participants. Unlike non-differential misclassification, where errors are distributed equally between groups, differential misclassification leaves one group with a greater proportion of errors.
It is the actions of the participant or the researchers (if they incorrectly and disproportionately place participants into groups) that directly cause this misclassification. Most commonly, differential misclassification occurs in retrospective studies through recall bias, as described above. In this case, outcomes can be biased toward or away from the null depending on the proportions of participants that were misclassified.
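To illustrate how the direction of the bias depends on which group is misclassified, the following sketch applies under-reporting of exposure to only one group at a time. The counts and the 70% reporting rate are hypothetical assumptions chosen for illustration, not data from any study.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds ratio for a 2x2 table of exposure by outcome."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


def under_report(exposed, unexposed, reporting_rate):
    """Expected counts when only a fraction of truly exposed people
    report their exposure; the rest are recorded as unexposed."""
    obs_exposed = exposed * reporting_rate
    obs_unexposed = unexposed + exposed * (1 - reporting_rate)
    return obs_exposed, obs_unexposed


# Hypothetical true exposure counts (assumed for illustration)
true_cases = (300, 200)     # exposed, unexposed among those with the outcome
true_controls = (200, 300)  # exposed, unexposed among those without it
true_or = odds_ratio(*true_cases, *true_controls)

# Scenario A: only controls under-report exposure -> away from the null
or_away = odds_ratio(*true_cases, *under_report(*true_controls, 0.7))

# Scenario B: only cases under-report exposure -> toward the null
or_toward = odds_ratio(*under_report(*true_cases, 0.7), *true_controls)

print(f"True odds ratio:                 {true_or:.2f}")
print(f"Controls under-report (away):    {or_away:.2f}")
print(f"Cases under-report (toward):     {or_toward:.2f}")
```

With the same underlying truth, the estimate is inflated in one scenario and shrunk in the other, which is why the direction of differential bias cannot be predicted without knowing which group is misclassified and by how much.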
Examples of classification bias
Consider the following hypothetical example that illustrates bias in classification of interventions and the two subtypes of misclassification bias:
Researchers examined whether routinely eating a high-sugar diet over the past 10 years was associated with obesity. A total of 1,000 participants, 500 with obesity and 500 without obesity, were studied. Participants with obesity may have remembered eating high-sugar foods more regularly than those without obesity. Perhaps they were more conscious of which foods were higher or lower in sugar because they were more aware of the social perception that obesity is linked to eating a high-sugar diet.
In contrast, the participants without obesity were less likely to remember eating a high-sugar diet regularly: their BMI was normal, so they were less concerned about the sugar content of the food they consumed. Hence, the normal-weight participants may have underestimated the amount of high-sugar foods they consumed over the past 10 years. In this circumstance, differential misclassification occurred because the participants without obesity incorrectly recalled the amount of sugar they consumed, and the classification of whether a participant ate a high-sugar diet depended on the outcome (obesity). In other words, those with obesity recalled the amount of sugar they consumed more accurately because they had obesity.
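The sugar example can be made concrete with invented numbers. The sketch below assumes, purely for illustration, that half of each group truly ate a high-sugar diet (so there is no real association) and that participants without obesity report only 80% of their true high-sugar diets while participants with obesity recall accurately. These figures are hypothetical and are not part of the example above.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds ratio for a 2x2 table of exposure by outcome."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


# Hypothetical truth: 250 of 500 in each group ate a high-sugar diet
with_obesity = (250, 250)     # high-sugar, not high-sugar
without_obesity = (250, 250)  # high-sugar, not high-sugar

true_or = odds_ratio(*with_obesity, *without_obesity)

# Assumed recall: participants without obesity report only 80% of
# their true high-sugar diets; participants with obesity recall accurately
recall_rate = 0.8
reported_high = without_obesity[0] * recall_rate
reported_not_high = without_obesity[1] + without_obesity[0] * (1 - recall_rate)

observed_or = odds_ratio(*with_obesity, reported_high, reported_not_high)

print(f"True odds ratio:     {true_or:.2f}")
print(f"Observed odds ratio: {observed_or:.2f}")
```

Under these assumptions, recall that depends on the outcome produces an apparent association between sugar and obesity even though none exists in the underlying data.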
In the same study, the researchers examined whether consumption of anxiolytic medications over the previous 10 years was related to obesity. Neither the participants with obesity nor those without perceived that anxiolytics were related to obesity.
Both groups found it difficult to remember the amount of anxiolytics they consumed and incorrectly estimated these numbers. Thus, both groups had a recall bias. In this instance, the lack of an accurate estimation is random. Both groups recalled information with some errors and thus the misclassification is non-differential because it is not dependent upon the exposure or outcome.
Final thoughts
This type of bias has several layers and can be challenging to comprehend. The critical takeaway is that research participants may poorly recall the exposures or outcomes investigated in a study. If recall differs systematically between study groups in a way that is related to the exposure or outcome, so that participants are misclassified, the resulting bias can push the estimated effect away from its true value.