Chart Audit

Documentation Quality · Async jobs required

Independent re-code of the chart, compared line-by-line against submitted codes.

Chart audit · Compliance · Overcoding · Undercoding

About

Audits coding accuracy by independently re-coding the chart and diffing the result against the codes that were actually submitted. Each discrepancy is classified — confirmed, overcoded, undercoded, specificity mismatch, or combination error — with a rationale, supporting evidence, and a financial-impact rating.

The output doubles as a compliance scorecard: overall accuracy plus counts per finding type, suitable for auditor worklists and provider education feedback loops.

How it works

  1. Document upload + submitted_codes JSON → OCR (skipped for the /text variant)
  2. Pass 1: independent code extraction with MEAT evidence and confidence
  3. Pass 2: audit comparison against submitted codes + compliance scoring
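
At its simplest, the Pass 2 comparison can be pictured as a set diff between the independently extracted codes and the submitted ones. The sketch below is an illustration under stated assumptions, not the service's actual logic: the `classify` function is hypothetical, the codes are illustrative, and treating two different codes that share a three-character category prefix as a specificity mismatch is an assumption. Combination errors are left out because they cannot be detected from code strings alone.

```python
def classify(submitted: set[str], independent: set[str]) -> dict[str, list]:
    """Hypothetical first-cut comparison of a submitted and an independent code set."""
    findings = {"confirmed": sorted(submitted & independent),
                "overcoded": [], "undercoded": [], "specificity": []}
    only_submitted = submitted - independent      # billed but not found in the chart
    only_independent = independent - submitted    # found in the chart but not billed
    paired: set[str] = set()

    for code in sorted(only_submitted):
        # Assumption: same 3-character category means the same condition
        # was coded at a different level of specificity.
        partner = next((c for c in sorted(only_independent)
                        if c not in paired and c[:3] == code[:3]), None)
        if partner is None:
            findings["overcoded"].append(code)
        else:
            paired.add(partner)
            findings["specificity"].append((code, partner))

    findings["undercoded"] = sorted(only_independent - paired)
    return findings


result = classify({"E11.9", "I10", "J44.9"}, {"E11.22", "I10", "N18.3"})
print(result)
```

In this example I10 is confirmed, E11.9 is paired with E11.22 as a specificity mismatch, J44.9 is flagged as overcoded, and N18.3 as undercoded.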

Intended use

  • Coding compliance and audit apps (overcoding/undercoding detection)
  • Pre-bill second-pass review before claim submission
  • Provider education: showing the evidence behind each disputed code

Key outputs

  • audit_findings[] — finding_type, submitted vs recommended code, rationale, evidence with bboxes, financial_impact
  • compliance_score — overall_accuracy plus per-type counts (confirmed/overcoded/undercoded/specificity/combination)
  • independent_codes[] — the auditor's own code set with MEAT evidence and confidence
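
As a rough picture of how compliance_score relates to audit_findings[], the sketch below tallies findings by type. It makes two assumptions not confirmed by this page: that finding_type takes the literal values shown in `FINDING_TYPES`, and that overall_accuracy is the share of confirmed findings among all findings. The service may compute it differently.

```python
from collections import Counter

# Assumed literal values for finding_type, taken from the per-type counts above.
FINDING_TYPES = ("confirmed", "overcoded", "undercoded", "specificity", "combination")


def compliance_score(audit_findings: list[dict]) -> dict:
    """Hypothetical roll-up of audit findings into a scorecard."""
    counts = Counter(f["finding_type"] for f in audit_findings)
    total = len(audit_findings)
    score = {t: counts.get(t, 0) for t in FINDING_TYPES}
    # Assumption: accuracy = confirmed findings / all findings.
    score["overall_accuracy"] = round(counts["confirmed"] / total, 3) if total else None
    return score


findings = [{"finding_type": "confirmed"}] * 3 + [{"finding_type": "overcoded"}]
print(compliance_score(findings))
```

With three confirmed findings and one overcoded finding, this yields an overall_accuracy of 0.75 under the assumed definition.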

Model comparison

F1 on Gwen's healthcare benchmark for this task, per model: the Gwen pipeline vs. the prompt-optimized model alone, with the uplift the pipeline adds (pipeline F1 minus model-only F1).

#  Model             Gwen pipeline  Model only  Uplift
1  GPT-5.5 (Best)    0.952          0.877       +0.075
2  Claude Opus 4.8   0.951          0.913       +0.038
3  Gemini 3.5 Flash  0.948          0.884       +0.064
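
The uplift figures are consistent with a plain difference between the two F1 columns. The short check below recomputes them from the reported scores; the `uplift` helper and `ROWS` name are illustrative only.

```python
# (model, pipeline F1, model-only F1) as reported in the comparison above.
ROWS = [
    ("GPT-5.5", 0.952, 0.877),
    ("Claude Opus 4.8", 0.951, 0.913),
    ("Gemini 3.5 Flash", 0.948, 0.884),
]


def uplift(pipeline_f1: float, model_f1: float) -> float:
    """Uplift = pipeline F1 minus model-only F1, rounded to 3 decimals."""
    return round(pipeline_f1 - model_f1, 3)


for name, pipeline_f1, model_f1 in ROWS:
    print(f"{name}: +{uplift(pipeline_f1, model_f1):.3f}")
```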

Endpoints

Try each endpoint with your signed-in session — usage counts toward your monthly budget.

Use synthetic data only. Do not submit real patient records or PHI when testing endpoints.

Limitations & caveats

  • submitted_codes is required: this service audits an existing code set and is not a primary coder (use the ICD/HCC services for that)
  • Findings are documentation-grounded recommendations, not payer determinations
  • Runs 1–3 minutes; the async /jobs flow is mandatory for document uploads
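
Because runs take minutes, a client typically submits the document and then polls for the result. The sketch below shows a generic polling loop only. The job payload shape (a `status` field with `completed` and `failed` as terminal values) is an assumption, and `get_status` stands in for whatever call fetches the job from the /jobs flow; no request paths or response fields here are taken from the actual API.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], dict],
                 interval_s: float = 5.0,
                 timeout_s: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll a job until it reaches an (assumed) terminal status or times out."""
    deadline = clock() + timeout_s
    while True:
        job = get_status()
        # Assumption: terminal states are reported as "completed" or "failed".
        if job.get("status") in ("completed", "failed"):
            return job
        if clock() >= deadline:
            raise TimeoutError("audit job did not finish before the timeout")
        sleep(interval_s)


# Usage with a stand-in status source instead of a real HTTP call.
states = iter([{"status": "pending"}, {"status": "running"}, {"status": "completed"}])
job = wait_for_job(lambda: next(states), sleep=lambda s: None)
print(job["status"])
```

The default timeout of 300 seconds leaves headroom over the stated 1–3 minute run time; `sleep` and `clock` are injectable so the loop can be tested without waiting.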