Why Evidence Needs Grading
Not all studies carry the same weight. A finding from a single small test-tube experiment tells you far less than the same finding repeated across several large trials in people. An evidence grade is shorthand for one question: *how confident can we be that this effect is real?*
The Evidence Hierarchy
Research designs form a rough ladder, from weakest to strongest:
- Test-tube (in-vitro) and animal studies — useful for generating ideas, but results often don't carry over to humans.
- Observational studies — track people over time and reveal associations, but can't establish cause and effect (see [Observational Studies vs RCTs](/learn/observational-vs-rct)).
- Randomized controlled trials (RCTs) — randomly assign people to an intervention or a comparison, which is what allows a true cause-and-effect test (see [What Is an RCT?](/learn/what-is-an-rct)).
- Systematic reviews and meta-analyses — pool many studies using transparent methods and sit at the top when the underlying trials are sound (see [What Is a Meta-Analysis?](/learn/what-is-a-meta-analysis)).
Formal Grading Systems
Researchers use structured systems so that grading isn't just opinion. The most widely used is GRADE (Grading of Recommendations Assessment, Development and Evaluation), which rates the *certainty* of evidence as high, moderate, low, or very low, and separately rates the strength of any resulting recommendation [1]. Other bodies use letter grades (A, B, C, D) for their recommendations. The common thread: a grade reflects not just whether studies are positive, but how trustworthy they are.
What Pushes a Grade Down
Even evidence from randomized trials loses credibility when it suffers from:
- High risk of bias — poor blinding, missing data, or funding arrangements that shape the outcome.
- Small samples or short duration — easy to produce a fluke result.
- Inconsistency — studies pointing in different directions.
- Indirectness — testing a different dose, form, or population than the one you care about.
The National Center for Complementary and Integrative Health (NCCIH), part of the U.S. National Institutes of Health, publishes plain-language guides on reading research with these pitfalls in mind [2].
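For readers who think in code, the downgrading logic above can be sketched in a few lines of Python. This is a simplified illustration, not the official GRADE procedure: it assumes the usual GRADE convention that randomized trials start at high certainty and observational studies at low, and it drops exactly one level per serious concern. Real assessments are judgment calls, can drop two levels for a very serious concern, and weigh factors not modeled here. The names `Certainty`, `grade_certainty`, and the concern identifiers are hypothetical.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """Certainty levels, ordered from weakest to strongest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


# Assumption: starting levels follow the usual GRADE convention.
STARTING_LEVEL = {"rct": Certainty.HIGH, "observational": Certainty.LOW}

# Hypothetical identifiers for the four concerns listed above.
CONCERNS = {"risk_of_bias", "small_or_short", "inconsistency", "indirectness"}


def grade_certainty(design: str, serious_concerns: set[str]) -> Certainty:
    """Drop one level per serious concern, never going below VERY_LOW."""
    unknown = serious_concerns - CONCERNS
    if unknown:
        raise ValueError(f"unknown concerns: {sorted(unknown)}")
    level = STARTING_LEVEL[design] - len(serious_concerns)
    return Certainty(max(level, Certainty.VERY_LOW))


print(grade_certainty("rct", {"inconsistency"}).name)  # MODERATE
print(
    grade_certainty("rct", {"risk_of_bias", "small_or_short", "indirectness"}).name
)  # VERY_LOW
```

Under these assumptions, a set of randomized trials with one serious concern lands at moderate certainty, and one with three serious concerns lands at very low.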
How This Site Labels Evidence
To keep things readable, we summarize the body of evidence behind each ingredient with five labels:
| Label | Roughly means |
|---|---|
| **Strong** | Consistent support from multiple high-quality human trials or systematic reviews |
| **Moderate** | Good human evidence, with some gaps or mixed results |
| **Emerging** | Early human studies look promising but aren't yet confirmed |
| **Preliminary** | Limited, small, or mostly non-human evidence |
| **Insufficient** | Too little reliable evidence to judge |
These labels are a starting point for your own reading — not a substitute for talking with a qualified health professional.