
AI calorie counter accuracy: 30 plates, three apps, USDA ground truth

We photographed 30 standard plated meals, scored each one with Forky AI, Cal AI, and Snap Calorie, and compared the results against USDA-derived ground-truth values. Per-component vision wins on plates with toppings; single-photo estimators tie on simple foods. The raw spreadsheet is published below.

By Elie de Rougemont, Founder of Forky AI · 11 min read · published

There is no public benchmark for how accurate AI calorie counters actually are. Every vendor reports their own number on their own dataset, and the numbers are almost certainly cherry-picked. We built our own dataset, hit three apps with the same photos under the same conditions, and published the spreadsheet. The headline: single-photo vision drifts ±20–25% on plated meals; per-component decomposition lands ±10–15%; on simple foods (a single apple, a glass of milk) every app is equivalent.

Why this benchmark exists

Every AI calorie counter on the market makes accuracy claims that sound the same and prove nothing. "±20% on standard meals." "Within 10% of USDA values." "Industry-leading vision accuracy." None of these come with a methodology, a test set, or a re-runnable harness. As the team shipping one of those apps, we wanted to stop bluffing and write down what we actually see.

So we built a 30-plate test set, photographed each one twice (overhead and 45°) on the same iPhone 15 Pro under the same kitchen light, fed each photo to Forky AI, Cal AI, and Snap Calorie inside their consumer apps, and compared the returned calorie + macro values to a USDA-derived ground truth that we computed by weighing every component on a 0.1g kitchen scale before plating.
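The ground-truth side of that comparison is plain per-100g arithmetic: scale each weighed component's USDA values by its weight, then sum over the plate. Below is a minimal sketch of that calculation. The Component record, its field names, and the sample figures are our own illustration, not the schema or data of the published spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One weighed plate component (hypothetical record layout, not the CSV schema)."""
    name: str
    grams: float             # weight from the 0.1 g kitchen scale
    kcal_per_100g: float     # USDA per-100g lookup
    protein_per_100g: float
    carbs_per_100g: float
    fat_per_100g: float


def plate_totals(components: list[Component]) -> dict[str, float]:
    """Scale each component's per-100g values by its weight, then sum over the plate."""
    totals = {"kcal": 0.0, "protein": 0.0, "carbs": 0.0, "fat": 0.0}
    for c in components:
        factor = c.grams / 100.0
        totals["kcal"] += c.kcal_per_100g * factor
        totals["protein"] += c.protein_per_100g * factor
        totals["carbs"] += c.carbs_per_100g * factor
        totals["fat"] += c.fat_per_100g * factor
    return totals


# Illustrative per-100g figures, not values taken from the benchmark spreadsheet.
plate = [
    Component("cooked rice", 150.0, 130.0, 2.7, 28.0, 0.3),
    Component("olive oil drizzle", 10.0, 884.0, 0.0, 0.0, 100.0),
]
print(plate_totals(plate))
```

The 10 g oil drizzle adds about 88 kcal on its own, which is the kind of component a single-photo estimate can miss.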

Method, in 10 bullets

Headline numbers

Across 30 plates, median absolute percentage error on calories was:

App          | Median APE | P95 APE | Plates ±20%
Forky AI     | 12.8%      | 27.4%   | 26 / 30
Cal AI       | 22.1%      | 48.0%   | 18 / 30
Snap Calorie | 24.3%      | 52.6%   | 16 / 30

"APE" = absolute percentage error vs USDA ground truth. "Plates ±20%" = number of plates where the app's calorie estimate landed within ±20% of ground truth — the threshold most AI macro trackeres use for "good enough" daily tracking.

What the per-plate data actually says

Simple plates: everyone wins

On the 8 plates we labelled "simple" — a single piece of fruit, a glass of milk, a plain omelette, a bowl of plain oatmeal — all three apps landed within ±10% of ground truth. The models recognise the food, recall its per-100g macros from training data, and estimate the portion within reason. There is no meaningful product differentiation here.

Composed plates with hidden components: per-component wins

On the 16 plates we labelled "composed" — pasta with cream sauce, grain bowls with multiple toppings, sandwiches with fillings — the gap opens dramatically. Forky's median APE was 11.4%; Cal AI's was 27.8%; Snap Calorie's was 30.2%. The reason is the same in almost every error: single-photo estimators silently drop a component. Cream sauce on pasta. Drizzled oil on a salad. Cheese melted into a bowl. The model acknowledges the component if you ask, but it does not price it into its total.

"Forky said 870 cal for the pasta carbonara, Cal AI said 520, ground truth was 905. Cal AI had pancetta and cheese in its written description and didn't add them to the calorie number." — plate 14, test log

Dishes outside training distribution: everyone struggles

On the 6 plates we labelled "uncommon" (a Korean banchan platter with five small dishes, a Lebanese mezze plate, a French-style cheese-and-charcuterie board), all three apps drifted to roughly ±30% error or worse. Forky still had the lowest median APE (28.4%), but no app landed within ±20% on more than two of these six plates. Vision models inherit the distribution of their training data; food that wasn't well represented at training time gets less reliable extraction regardless of the prompting strategy.

Macro-by-macro breakdown

Calories aggregate the energy from protein, carbs, and fat (plus alcohol, where present), so a macro-by-macro view shows where each app loses accuracy:

Median APE | Forky AI | Cal AI | Snap Calorie
Calories   | 12.8%    | 22.1%  | 24.3%
Protein    | 14.6%    | 26.9%  | 29.1%
Carbs      | 15.2%    | 23.7%  | 25.1%
Fat        | 18.0%    | 34.2%  | 37.8%

Fat is the macro every app loses on the most, and the gap between Forky and the other two widens. Fat is concentrated in toppings, sauces, cheese, oil drizzles — the exact components a per-component prompt is designed to catch and a single-photo estimator drops. Protein accuracy is also a function of weight-estimation quality on the protein component (chicken, salmon, tofu), which Forky's per-component grams field surfaces explicitly.
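Fat misses hurt the calorie total more than other macros because fat carries more energy per gram. A minimal sketch using the general Atwater factors (4 kcal/g for protein and carbs, 9 kcal/g for fat) makes this concrete. These are generic factors, not the food-specific values behind USDA kcal figures, so treat the output as an approximation. The function names are hypothetical.

```python
# General Atwater factors in kcal per gram. Food-composition tables often use
# food-specific factors, so results here are approximations.
ATWATER_KCAL_PER_G = {"protein": 4.0, "carbs": 4.0, "fat": 9.0}


def kcal_from_macros(protein_g: float, carbs_g: float, fat_g: float) -> float:
    """Approximate energy of a plate from its macro grams (alcohol ignored)."""
    return (
        protein_g * ATWATER_KCAL_PER_G["protein"]
        + carbs_g * ATWATER_KCAL_PER_G["carbs"]
        + fat_g * ATWATER_KCAL_PER_G["fat"]
    )


def kcal_cost_of_miss(macro: str, grams_missed: float) -> float:
    """Calories lost from the total when an estimator drops this many grams of a macro."""
    return grams_missed * ATWATER_KCAL_PER_G[macro]


# The same 10 g miss costs more than twice as much when it is fat.
for macro in ("protein", "carbs", "fat"):
    print(macro, kcal_cost_of_miss(macro, 10.0))
```

A dropped 10 g oil drizzle costs about 90 kcal, while a 10 g miss on rice carbs costs about 40.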

Run-to-run variance: how stable are these numbers?

Re-running the same photo file 7 days later, all three apps showed run-to-run variance — the models are stochastic and the API versions roll forward. Median delta between runs:

Practically, this means none of these apps is a deterministic measurement device — they're statistical estimators, and the day-to-day noise is real. For a single meal, treat the number as ±10–25% depending on app and complexity; for a weekly macro average across 21 logged meals, the noise washes down to single-digit percentages because the errors are approximately mean-zero.
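The "washes down" claim is the usual averaging effect: if per-meal errors are independent and mean-zero, the error of their mean shrinks by a factor of about √n, so about 4.6 for 21 meals. The Monte Carlo sketch below checks that intuition. The 20% per-meal spread, the Gaussian noise model, and equal-sized meals are all illustrative assumptions, not measurements from this benchmark. Note that a consistent bias in one direction (for example, always under-counting sauces) would not average away like this.

```python
import random
import statistics


def typical_weekly_error(per_meal_sd: float, meals: int, trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the median absolute error (%) of an averaged estimate.

    Assumes each meal's error is independent, mean-zero Gaussian noise with
    standard deviation per_meal_sd (in percent), and that all meals are the
    same size, so the averaged percentage error is the mean of per-meal errors.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        meal_errors = [rng.gauss(0.0, per_meal_sd) for _ in range(meals)]
        outcomes.append(abs(statistics.fmean(meal_errors)))
    return statistics.median(outcomes)


# Illustrative 20% per-meal noise: one meal vs a week of 21 meals.
single = typical_weekly_error(20.0, meals=1)
weekly = typical_weekly_error(20.0, meals=21)
print(f"single meal: {single:.1f}%")
print(f"21 meals:    {weekly:.1f}%")
```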

What this benchmark doesn't measure

Three big caveats, in service of intellectual honesty:

So which app should you use?

If you're tracking macros for general health or recomposition and your plates are mostly composed home cooking, the accuracy delta in this benchmark is large enough to matter — 12.8% vs 22–24% median APE is the difference between trusting your weekly average and not. If you eat mostly simple foods (a yogurt here, an apple there, a barcode-scannable packaged bar), every app converges. Pick the one whose UX you like.

This is also why we ship Forky's full per-component breakdown by default. You can see the chicken-salmon-edamame rows, you can correct any one of them in two taps, and the totals recompute. A wrong number you can fix is better than a right-ish number you can't.
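The "fix one row, totals recompute" behaviour follows from deriving totals from the rows rather than storing them separately. Here is a minimal sketch of that design. The Row shape, the helper names, and the sample values are hypothetical illustrations, not Forky's actual data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Row:
    """One line of a per-component breakdown (hypothetical shape, not Forky's data model)."""
    name: str
    grams: float
    kcal_per_100g: float

    @property
    def kcal(self) -> float:
        return self.grams * self.kcal_per_100g / 100.0


def correct_grams(rows: list[Row], name: str, grams: float) -> list[Row]:
    """Return a new breakdown with one component's weight replaced; other rows are untouched."""
    return [replace(row, grams=grams) if row.name == name else row for row in rows]


def total_kcal(rows: list[Row]) -> float:
    """Totals are derived from the rows, so any correction flows through automatically."""
    return sum(row.kcal for row in rows)


# Illustrative values only.
estimated = [Row("chicken", 120.0, 165.0), Row("edamame", 80.0, 121.0)]
corrected = correct_grams(estimated, "chicken", 150.0)
print(total_kcal(estimated), total_kcal(corrected))
```

Because the rows are immutable, a correction produces a new breakdown, which also leaves the original estimate available for comparison.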

The spreadsheet, in full

Plate-by-plate raw data, including ground-truth gram weights, USDA per-100g lookups, and each app's returned calories/protein/carbs/fat per plate, is available as a CSV. Email [email protected] and we'll send a copy. If you re-run the benchmark on your own dataset, send results back — we'll publish them here with attribution.
