Gwen for developers
Production-grade scaffolding for healthcare AI.
Hardened, benchmarked, and HIPAA-aware. Built for engineers who need to ship on a payer ops floor, on a provider ops floor, and across the friction in between.
The healthcare AI ecosystem
Discover what's good. Build on what ships.
One place for healthcare developers to find evaluated capabilities, build on top of what others ship, and avoid rebuilding what already exists.
Cognitive services
16 healthcare endpoints, benchmarked per task
HCC, ICD, CDI, chart audit, prior auth, claims, appeals, OCR, PII — every service ships with a model card and per-task benchmark scores, so you know what you're building on.
- 16 services
- Model cards
- Per-task benchmarks
Reference data
Versioned coding and policy datasets
5,300+ records across 5 subject areas — coding vintages, adjustment reason codes, policy references. Sourced, versioned, downloadable.
- 5,300+ records
- 5 subject areas
- CSV
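Because the reference data is versioned and downloadable as CSV, a lookup can be pinned to one vintage instead of silently mixing years. The sketch below is a minimal illustration only: the column names (`code`, `version`, `description`) and the sample rows are made up, and the real export schema may differ.

```python
import csv
import io

# Made-up sample standing in for a downloaded CSV export.
# Column names and rows are assumptions for illustration only.
SAMPLE_CSV = """code,version,description
A1,2024,Example reason A (2024 wording)
A1,2025,Example reason A (2025 wording)
B2,2025,Example reason B
"""


def load_vintage(csv_text: str, version: str) -> dict[str, str]:
    """Return a code -> description map for a single dataset version."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["code"]: row["description"]
        for row in reader
        if row["version"] == version
    }


codes_2025 = load_vintage(SAMPLE_CSV, "2025")
print(codes_2025["A1"])
```

Filtering on the version column at load time keeps every downstream lookup tied to a single, named vintage.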
Skills
Instruction modules your builds already understand
94 healthcare skills — each one packages the hardened SOP, the domain rules, and the output expectations. Compose them in the Studio; no re-explaining each session.
- 94 modules
- Payer / Provider / Platform
- Composable
The problem
Frontier LLMs alone do not ship on a healthcare ops floor.
Two gaps slow every initiative, and the result is long pilots, brittle prototypes, and AI that never reaches production-grade trust. Gwen closes both.
01 / translation gap
Engineering can't reach SME depth.
- IT does not know, at SME depth, what actually needs to be built.
- SMEs (claims, payment integrity, UM, quality) cannot hand work over in engineering-ready form. Specs travel through decks, screenshots, and tribal knowledge.
- Edge-case logic — CARC/RARC handling, NCCI quirks, payer companion guides — lives in a senior reviewer's head, not in code.
02 / feedback gap
Models drift the moment policy moves.
- Even after something ships, IT has no good way to surface edge cases or keep the AI improving.
- No golden-dataset rigor. No cost / accuracy / latency tracking per workflow. No way to compare model choices side-by-side.
- When coding vintages and payer policies change, the AI quietly drifts out of date.
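To make "golden-dataset rigor" concrete, here is a minimal sketch of a side-by-side comparison that records accuracy, latency, and cost for each candidate model over the same labeled cases. Everything in it is hypothetical: the `evaluate` helper, the toy cases, and the stand-in models are illustrations, not Gwen's benchmark harness.

```python
from dataclasses import dataclass
from time import perf_counter
from typing import Callable


@dataclass
class Result:
    accuracy: float       # fraction of golden cases answered correctly
    avg_latency_s: float  # mean wall-clock seconds per case
    total_cost: float     # summed per-call cost, in the caller's unit


def evaluate(
    model: Callable[[str], str],
    golden: list[tuple[str, str]],
    cost_per_call: float,
) -> Result:
    """Run one model over every (input, expected) pair and score it."""
    hits = 0
    elapsed = 0.0
    for text, expected in golden:
        start = perf_counter()
        predicted = model(text)
        elapsed += perf_counter() - start
        hits += predicted == expected
    n = len(golden)
    return Result(hits / n, elapsed / n, cost_per_call * n)


# Toy golden set and stand-in "models"; real cases would be SME-reviewed.
golden = [("case one", "A"), ("case two", "B"), ("case three", "A")]
always_a = lambda text: "A"
lookup = {"case one": "A", "case two": "B", "case three": "A"}.get

report = {
    "always_a": evaluate(always_a, golden, cost_per_call=0.01),
    "lookup": evaluate(lookup, golden, cost_per_call=0.02),
}
for name, r in report.items():
    print(f"{name}: accuracy={r.accuracy:.2f} cost={r.total_cost:.2f}")
```

Re-running the same harness whenever a coding vintage or payer policy changes is one way to surface drift before it reaches the ops floor.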
// net effect: long pilots, brittle prototypes, ai that never reaches production-grade trust.
The Gwen answer
Battle-tested endpoints, composed into working software.
Healthcare SMEs, AI, and engineering co-author every layer. Endpoints become skills. Skills become apps. Apps publish back to the marketplace — so the next team starts where you finished.
Endpoints — the atomic unit
Battle-tested cognitive services, SME-authored and benchmarked. Call them directly or let generated apps consume them.
Skills — the playbooks
Versioned instruction modules — SOP, domain rules, evidence requirements — that tell any build how to use the endpoints correctly.
Datasets — the ground truth
Versioned reference data behind every decision: coding vintages, reason-code ontologies, policy references.
Studio — the workbench
Text-prompted builder. Describe a workflow; Gwen composes the skills and endpoints into a runnable, full-stack app.
Agents — close the loop
Desktop and browser agents carry outputs into EHRs, payer portals, and queues — execution, not just recommendation.
External API — your stack, our services
Key-authenticated programmatic access to every cognitive service, with rate limits and per-key cost controls.
Build your own
What isn't in the catalog, you build once.
Describe a pipeline in the Studio and Gwen composes the existing endpoints, skills, and datasets into a running full-stack app. Publish it back to the marketplace — public or private to your team — and nobody rebuilds it again.
- Text-prompted full-stack builds with live preview
- 22 verified apps to fork instead of starting from a blank page
- Publish to the marketplace: private to your team or public
# every cognitive service, one key
curl https://gwen.penguinai.co/api/external/v1/icd/code \
  -H "Authorization: Bearer hvb_live_…" \
  -H "Content-Type: application/json" \
  -d '{"text": "…clinical documentation…"}'
# services: icd · hcc · cdi · prior-auth · claims · appeals · ocr · pii …
# keys: contact your Gwen admin to provision a key
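The same call can be sketched in Python using only the standard library. The URL, headers, and body mirror the curl example above. The `build_request` helper and the `GWEN_API_KEY` environment variable name are hypothetical, and the response schema is not documented here, so the sketch stops at building the request.

```python
import json
import os
import urllib.request

BASE_URL = "https://gwen.penguinai.co/api/external/v1"  # from the curl example


def build_request(service_path: str, text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl call."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{service_path}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# GWEN_API_KEY is a hypothetical environment variable name.
api_key = os.environ.get("GWEN_API_KEY", "placeholder-key")
req = build_request("icd/code", "…clinical documentation…", api_key)
print(req.get_method(), req.full_url)

# To send it, assuming the service returns JSON:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     result = json.loads(resp.read())
```

Reading the key from the environment keeps it out of source control, which matters when each key carries its own rate limit and cost ceiling.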
Trust + deployment
Built to be audited.
How Gwen runs, and how your compliance team verifies every step.
- Per-task benchmark scores on every cognitive service
- Evidence spans back to the source document
- Structured audit logging with request tracing
- Per-key rate limits and cost ceilings on the external API
- Human checkpoints where confidence is low
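Two of these controls, evidence spans and low-confidence checkpoints, are easy to picture in code. The sketch below is an assumption-laden illustration: the `Suggestion` shape, the `0.85` threshold, and the routing labels are hypothetical and do not describe Gwen's actual response format.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    code: str                       # suggested code
    confidence: float               # model confidence in [0, 1]
    evidence_span: tuple[int, int]  # character offsets into the source document


REVIEW_THRESHOLD = 0.85  # assumed cut-off; would be tuned per workflow


def route(suggestion: Suggestion, threshold: float = REVIEW_THRESHOLD) -> str:
    """Send low-confidence suggestions to a human reviewer."""
    return "auto_accept" if suggestion.confidence >= threshold else "human_review"


def evidence_text(document: str, suggestion: Suggestion) -> str:
    """Return the exact source text the suggestion points back to."""
    start, end = suggestion.evidence_span
    return document[start:end]


document = "Patient has type 2 diabetes mellitus without complications."
phrase = "type 2 diabetes mellitus without complications"
start = document.find(phrase)
span = (start, start + len(phrase))

high = Suggestion("E11.9", 0.97, span)
low = Suggestion("E11.9", 0.60, span)

print(route(high), route(low))
print(evidence_text(document, high))
```

Storing offsets rather than a copied string lets an auditor check the suggestion against the unmodified source document.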
- 22 apps in the marketplace
- 16 cognitive services, benchmarked
- 94 skills in the library
- 5,300+ reference data records
Final
Start where healthcare already finished.
Pick a service, wire your data, ship a workflow — and publish what you build so nobody builds it twice.