Chart Audit
Documentation Quality · Async jobs required
Independent re-code of the chart, compared line-by-line against submitted codes.
About
Audits coding accuracy by independently re-coding the chart and diffing the result against the codes that were actually submitted. Each discrepancy is classified — confirmed, overcoded, undercoded, specificity mismatch, or combination error — with a rationale, supporting evidence, and a financial-impact rating.
The output doubles as a compliance scorecard: overall accuracy plus counts per finding type, suitable for auditor worklists and provider education feedback loops.
How it works
1. Document upload + submitted_codes JSON → OCR (skipped for the /text variant)
2. Pass 1: independent code extraction with MEAT evidence and confidence
3. Pass 2: audit comparison against submitted codes + compliance scoring
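The comparison in Pass 2 can be pictured as a diff between two code lists. The sketch below is illustrative only and is not the service's actual logic: the function names, the `Finding` shape, and the rule that two different codes sharing an ICD-10-CM three-character category count as a specificity mismatch are all assumptions, and combination errors are left out because detecting them needs coding-guideline knowledge that a plain diff does not have.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Finding:
    # Hypothetical shape; the real audit_findings[] entries carry more fields.
    finding_type: str
    submitted_code: Optional[str]
    recommended_code: Optional[str]


def category(code: str) -> str:
    """Return the three-character ICD-10-CM category (the part before the dot)."""
    return code.split(".")[0][:3]


def compare_codes(submitted: List[str], independent: List[str]) -> List[Finding]:
    """Classify each submitted code against an independently derived code set."""
    findings: List[Finding] = []
    # Independent codes that no submitted code matches exactly.
    unmatched = [c for c in independent if c not in submitted]

    for code in submitted:
        if code in independent:
            findings.append(Finding("confirmed", code, code))
            continue
        # Assumption: same category but a different code means the chart
        # supports a more (or less) specific code than the one submitted.
        match = next((c for c in unmatched if category(c) == category(code)), None)
        if match is not None:
            unmatched.remove(match)
            findings.append(Finding("specificity", code, match))
        else:
            # Submitted but not supported by the independent pass.
            findings.append(Finding("overcoded", code, None))

    # Supported by the chart but never submitted.
    findings.extend(Finding("undercoded", None, c) for c in unmatched)
    return findings


example = compare_codes(
    submitted=["E11.9", "I10", "J44.9"],
    independent=["E11.22", "I10", "N18.3"],
)
for finding in example:
    print(finding)
```

In this example `E11.9` pairs with `E11.22` as a specificity finding, `I10` is confirmed, `J44.9` has no support and is flagged as overcoded, and `N18.3` surfaces as undercoded.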
Intended use
- Coding compliance and audit apps (overcoding/undercoding detection)
- Pre-bill second-pass review before claim submission
- Provider education: showing the evidence behind each disputed code
Key outputs
- audit_findings[]: finding_type, submitted vs recommended code, rationale, evidence with bboxes, financial_impact
- compliance_score: overall_accuracy plus per-type counts (confirmed/overcoded/undercoded/specificity/combination)
- independent_codes[]: the auditor's own code set with MEAT evidence and confidence
Model comparison
F1 on Gwen's healthcare benchmark for this task, per model: the Gwen pipeline versus the prompt-optimized model alone, and the uplift the pipeline adds (pipeline F1 minus model-only F1).
| # | Model | Gwen pipeline | Model only | Uplift |
|---|---|---|---|---|
| 1 | GPT-5.5 (Best) | 0.952 | 0.877 | +0.075 |
| 2 | Claude Opus 4.8 | 0.951 | 0.913 | +0.038 |
| 3 | Gemini 3.5 Flash | 0.948 | 0.884 | +0.064 |
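The Uplift column is consistent with a simple difference between the two F1 columns. The short check below recomputes it from the table values; the variable names are illustrative.

```python
# (model, pipeline F1, model-only F1) copied from the table above.
rows = [
    ("GPT-5.5", 0.952, 0.877),
    ("Claude Opus 4.8", 0.951, 0.913),
    ("Gemini 3.5 Flash", 0.948, 0.884),
]

# Uplift = pipeline F1 minus model-only F1, rounded to the table's precision.
uplifts = {model: round(pipeline - model_only, 3) for model, pipeline, model_only in rows}

for model, uplift in uplifts.items():
    print(f"{model}: {uplift:+.3f}")
```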
Endpoints
Try each endpoint with your signed-in session — usage counts toward your monthly budget.
Use synthetic data only. Do not submit real patient records or PHI when testing endpoints.
Limitations & caveats
- submitted_codes is required: this service audits an existing code set and is not a primary coder (use the ICD/HCC services for that)
- Findings are documentation-grounded recommendations, not payer determinations
- A run takes 1–3 minutes; the async /jobs flow is mandatory for document uploads
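Because a run takes minutes rather than seconds, a client typically submits the job and then polls until it finishes. The helper below is a generic polling sketch, not the service's client: `fetch_status` stands in for whatever call retrieves the job, and the `status` field with its `succeeded` and `failed` values is hypothetical. The default timeout of 300 seconds is chosen only to leave headroom over the stated 1–3 minute run time.

```python
import time
from typing import Callable, Dict

# Hypothetical terminal states; substitute the values the jobs API really returns.
TERMINAL_STATES = ("succeeded", "failed")


def wait_for_job(
    fetch_status: Callable[[], Dict],
    *,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll fetch_status() until the job reaches a terminal state or times out."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout_s}s")
        sleep(interval_s)


# Usage with a stand-in for the real status call (no network involved).
fake_responses = iter([{"status": "queued"}, {"status": "running"}, {"status": "succeeded"}])
final = wait_for_job(lambda: next(fake_responses), sleep=lambda _seconds: None)
print(final["status"])
```

Injecting `sleep` and `clock` keeps the helper testable without real waiting; in production code the defaults apply.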