Supplement Science

Evidence Grades Explained: How the Strength of Evidence Is Rated

This content is for informational purposes only and does not constitute medical advice. Statements about dietary supplements have not been evaluated by the FDA and are not intended to diagnose, treat, cure, or prevent any disease. Individual results may vary. Consult your healthcare provider before starting any supplement.

An evidence grade summarizes how much confidence we can place in a finding, based on study design, the consistency of results, and the size of the studies. Stronger grades come from large, well-conducted randomized trials and systematic reviews; weaker grades come from small, short, or laboratory studies. On this site we label evidence as Strong, Moderate, Emerging, Preliminary, or Insufficient.

Key Takeaways

  • An evidence grade answers one question: how confident can we be that an effect is real?
  • Study designs form a ladder — test-tube and animal work at the bottom, randomized trials and systematic reviews near the top.
  • GRADE, the most widely used system, rates the certainty of evidence as high, moderate, low, or very low.
  • Bias, small samples, short duration, and inconsistent results all push a grade down, even for randomized trials.
  • This site summarizes evidence as Strong, Moderate, Emerging, Preliminary, or Insufficient.

Why Evidence Needs Grading

Not all studies carry the same weight. A finding from a single small test-tube experiment tells you far less than the same finding repeated across several large trials in people. An evidence grade is shorthand for one question: *how confident can we be that this effect is real?*

The Evidence Hierarchy

Research designs form a rough ladder, from weakest to strongest:

  • Test-tube (in-vitro) and animal studies — useful for generating ideas, but results often don't carry over to humans.
  • Observational studies — track people over time and reveal associations, but can't establish cause and effect (see [Observational Studies vs RCTs](/learn/observational-vs-rct)).
  • Randomized controlled trials (RCTs) — randomly assign people to an intervention or a comparison, which is what allows a true cause-and-effect test (see [What Is an RCT?](/learn/what-is-an-rct)).
  • Systematic reviews and meta-analyses — pool many studies using transparent methods and sit at the top when the underlying trials are sound (see [What Is a Meta-Analysis?](/learn/what-is-a-meta-analysis)).

Formal Grading Systems

Researchers use structured systems so grading isn't just opinion. The most widely used is GRADE, which rates the *certainty* of evidence as high, moderate, low, or very low, and separately rates how strong any resulting recommendation is [1]. Other bodies use letter grades (A, B, C, D) for their recommendations. The common thread: the grade reflects not just whether studies are positive, but how trustworthy they are.

What Pushes a Grade Down

Even randomized trials lose credibility when they have:

  • High risk of bias — poor blinding, missing data, or funding arrangements that shape the outcome.
  • Small samples or short duration, which make a fluke result more likely.
  • Inconsistency — studies pointing in different directions.
  • Indirectness — testing a different dose, form, or population than the one you care about.

The National Center for Complementary and Integrative Health (NCCIH), part of the U.S. National Institutes of Health, publishes plain-language guides on reading research with these pitfalls in mind [2].
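
The way these pitfalls lower a grade can be sketched in a few lines of Python. This is a simplified illustration, not GRADE's official tool: it assumes randomized trials start at high certainty and observational evidence starts at low, and that each serious concern drops the rating one level (two levels for a very serious one). The function name `grade_certainty` and the concern labels are hypothetical. Real GRADE assessments also consider publication bias, can rate observational evidence up, and rely on judgment rather than simple subtraction.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """GRADE certainty levels, ordered so that lower numbers mean less certainty."""
    VERY_LOW = 0
    LOW = 1
    MODERATE = 2
    HIGH = 3


# Hypothetical labels for the pitfalls listed in the text (not official GRADE field names).
CONCERNS = ("risk_of_bias", "imprecision", "inconsistency", "indirectness")


def grade_certainty(randomized: bool, concerns: dict[str, int]) -> Certainty:
    """Start high for randomized trials and low for observational studies,
    then drop one level per serious concern (two per very serious concern)."""
    start = Certainty.HIGH if randomized else Certainty.LOW
    penalty = 0
    for name, severity in concerns.items():
        if name not in CONCERNS:
            raise ValueError(f"unknown concern: {name}")
        if severity not in (0, 1, 2):
            raise ValueError("severity must be 0 (none), 1 (serious) or 2 (very serious)")
        penalty += severity
    # Never go below the lowest level.
    return Certainty(max(Certainty.VERY_LOW, start - penalty))


# A randomized trial with serious risk of bias and a small sample:
print(grade_certainty(True, {"risk_of_bias": 1, "imprecision": 1}).name)  # LOW
```

The example shows why "it was an RCT" is not the end of the story: two serious problems take a randomized trial from high certainty down to low, the same starting point as an observational study.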

How This Site Labels Evidence

To keep things readable, we summarize the body of evidence behind each ingredient with five labels:

| Label | Roughly means |
|---|---|
| **Strong** | Consistent support from multiple high-quality human trials or systematic reviews |
| **Moderate** | Good human evidence, with some gaps or mixed results |
| **Emerging** | Early human studies look promising but aren't yet confirmed |
| **Preliminary** | Limited, small, or mostly non-human evidence |
| **Insufficient** | Too little reliable evidence to judge |

These labels are a starting point for your own reading — not a substitute for talking with a qualified health professional.

Frequently Asked Questions

What is the strongest type of evidence?

Generally, a systematic review or meta-analysis that pools several well-conducted randomized controlled trials in humans. A single large, well-run trial is also strong. Test-tube and animal studies are the weakest basis for conclusions about people, even when the results look exciting.

Does one positive study mean a supplement works?

Rarely. Single studies can be small, short, or chance findings, and they sometimes fail to hold up when repeated. Confidence grows when independent research groups reach the same conclusion using solid methods.
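
To see why a single small study can mislead, here is a rough simulation. It assumes a supplement with a true effect of exactly zero and normally distributed outcomes; the 0.5 standard-deviation cutoff for a "noticeable" effect, the arm sizes, and the function name `spurious_effect_rate` are arbitrary choices for illustration, not figures from any real study.

```python
import random
import statistics


def spurious_effect_rate(n_per_arm: int, trials: int = 2000,
                         threshold: float = 0.5, seed: int = 0) -> float:
    """Fraction of simulated trials whose observed difference in means exceeds
    `threshold` (in standard-deviation units) even though the true effect is zero."""
    rng = random.Random(seed)  # seeded so the results are reproducible
    hits = 0
    for _ in range(trials):
        # Both arms come from the same distribution: the supplement does nothing.
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        if statistics.fmean(treated) - statistics.fmean(control) > threshold:
            hits += 1
    return hits / trials


for n in (10, 50, 200):
    print(f"{n} people per arm: {spurious_effect_rate(n):.3f} of trials look 'positive'")
```

Under these assumptions, trials with 10 people per arm should cross the cutoff by chance roughly one time in eight, while trials with 50 per arm do so well under 1% of the time and trials with 200 per arm essentially never do. That is why confidence grows only when larger, independent studies agree.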

Why do some ingredients only earn an 'Emerging' or 'Preliminary' grade?

Because the human evidence is still thin — perhaps a few small studies, or mostly laboratory and animal work. An Emerging or Preliminary label is an honest signal that the science isn't settled, not a forecast of future benefit.

Can an evidence grade change over time?

Yes. Grades reflect the research available today. As new, higher-quality studies are published, the strength of evidence for an ingredient can move up or down, which is why we date and periodically revisit our pages.

References

  1. GRADE Working Group (2024). GRADE: Grading of Recommendations Assessment, Development and Evaluation. GRADE Working Group.
  2. National Center for Complementary and Integrative Health (NCCIH) (2024). Know the Science. NIH National Center for Complementary and Integrative Health.