Gwen for developers

Production-grade scaffolding for healthcare AI.

Hardened, benchmarked, and HIPAA-aware. Built for engineers who need to ship on payer and provider ops floors, and across the friction between them.

Cognitive services · Datasets · Skills · Studio · Agents · API

BAA available · HIPAA-eligible · Benchmarked services · Replayable audit trail

The healthcare AI ecosystem

Discover what's good. Build on what ships.

One place for healthcare developers to find evaluated capabilities, build on top of what others ship, and avoid rebuilding what already exists.

Cognitive services

16 healthcare endpoints, benchmarked per task

Services cover HCC, ICD, CDI, chart audit, prior auth, claims, appeals, OCR, and PII. Every service ships with a model card and per-task benchmark scores, so you know what you're building on.

16 services
Model cards
Per-task benchmarks

Reference data

Versioned coding and policy datasets

5,300+ records spanning 5 subject areas, including coding vintages, adjustment reason codes, and policy references. Sourced, versioned, and downloadable.

5,300+ records
5 subject areas
CSV

Skills

Instruction modules your builds already understand

94 healthcare skills, each packaging a hardened SOP, the domain rules, and the output expectations. Compose them in the Studio instead of re-explaining the rules every session.

94 modules
Payer / Provider / Platform
Composable

The problem

Frontier LLMs alone do not ship on a healthcare ops floor.

Two gaps slow every initiative, and they produce the same symptoms: long pilots, brittle prototypes, and AI that never reaches production-grade trust. Gwen closes both gaps.

01 / translation gap

Engineering can't reach SME depth.

IT does not know, at SME depth, what actually needs to be built.
SMEs (claims, payment integrity, UM, quality) cannot hand work over in engineering-ready form. Specs travel through decks, screenshots, and tribal knowledge.
Edge-case logic — CARC/RARC handling, NCCI quirks, payer companion guides — lives in a senior reviewer's head, not in code.

02 / feedback gap

Models drift the moment policy moves.

Even after something ships, IT has no good way to surface edge cases or keep the AI improving.
No golden-dataset rigor. No cost / accuracy / latency tracking per workflow. No way to compare model choices side-by-side.
When coding vintages and payer policies change, the AI quietly drifts out of date.
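To make the feedback gap concrete, here is a minimal sketch of golden-dataset rigor: score two model choices against the same SME-labeled cases and read accuracy, mean latency, and total cost side by side. The `GoldenCase` and `RunResult` types, the `score` and `compare` helpers, and the model names are hypothetical illustrations, not Gwen APIs, and this is not how Gwen's own benchmarks are computed.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GoldenCase:
    """One SME-labeled example: input text and the expected code (hypothetical schema)."""
    case_id: str
    text: str
    expected: str


@dataclass(frozen=True)
class RunResult:
    """What one model returned for one case, and what that call cost."""
    case_id: str
    predicted: str
    latency_ms: float
    cost_usd: float


def score(golden: list[GoldenCase], runs: list[RunResult]) -> dict[str, float]:
    """Accuracy over the whole golden set (a skipped case counts as a miss), mean latency, total cost."""
    expected = {c.case_id: c.expected for c in golden}
    correct = sum(1 for r in runs if expected.get(r.case_id) == r.predicted)
    return {
        "accuracy": correct / len(golden),
        "mean_latency_ms": mean(r.latency_ms for r in runs),
        "total_cost_usd": sum(r.cost_usd for r in runs),
    }


def compare(golden: list[GoldenCase],
            runs_by_model: dict[str, list[RunResult]]) -> dict[str, dict[str, float]]:
    """Score every model choice on the same golden set so results line up side by side."""
    return {model: score(golden, runs) for model, runs in runs_by_model.items()}
```

Rerunning the same golden set whenever a coding vintage or payer policy changes is what turns silent drift into a visible drop in the accuracy column.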

// net effect: long pilots, brittle prototypes, AI that never reaches production-grade trust.

The Gwen answer

Battle-tested endpoints, composed into working software.

Healthcare SMEs, AI, and engineering co-author every layer. Endpoints become skills. Skills become apps. Apps publish back to the marketplace — so the next team starts where you finished.

Endpoints — the atomic unit

Battle-tested cognitive services, SME-authored and benchmarked. Call them directly or let generated apps consume them.

Skills — the playbooks

Versioned instruction modules — SOP, domain rules, evidence requirements — that tell any build how to use the endpoints correctly.

Datasets — the ground truth

Versioned reference data behind every decision: coding vintages, reason-code ontologies, policy references.
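Because the reference data is versioned, a build can pin the exact vintage it was validated against rather than silently reading whatever is newest. This sketch loads a CSV export and keeps only rows from one pinned vintage. The column names (`code`, `description`, `vintage`), the vintage labels, and the `load_vintage` helper are assumptions for illustration; the sample is a truncated, invented extract, so check the real export's headers first.

```python
import csv
import io

# Invented, truncated sample; assumed columns: code, description, vintage.
SAMPLE = (
    "code,description,vintage\n"
    "E11.9,Type 2 diabetes mellitus without complications,2024\n"
    "I10,Essential (primary) hypertension,2024\n"
    "E11.9,Type 2 diabetes mellitus without complications,2025\n"
)


def load_vintage(csv_text: str, vintage: str) -> dict[str, str]:
    """Return {code: description} for rows whose 'vintage' column equals the pinned vintage."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["code"]: row["description"] for row in reader if row["vintage"] == vintage}


if __name__ == "__main__":
    # Pin the vintage explicitly so an upstream refresh never changes results unannounced.
    print(load_vintage(SAMPLE, "2024"))
```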

Studio — the workbench

Text-prompted builder. Describe a workflow; Gwen composes the skills and endpoints into a runnable, full-stack app.

Agents — close the loop

Desktop and browser agents carry outputs into EHRs, payer portals, and queues — execution, not just recommendation.

External API — your stack, our services

Key-authenticated programmatic access to every cognitive service, with rate limits and per-key cost controls.
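With per-key rate limits in place, a well-behaved client should slow down when it is throttled instead of retrying immediately. The sketch below retries with capped exponential backoff. It assumes, without confirmation from the Gwen docs, that throttling surfaces as HTTP 429 and that the caller's transport turns that into the hypothetical `RateLimited` exception; `backoff_delays` and `call_with_backoff` are illustrative names, and `sleep` is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical: raised by your transport when the API answers HTTP 429 (assumed signal)."""


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Capped exponential schedule: base, 2*base, 4*base, ..., never above cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def call_with_backoff(
    call: Callable[[], T],
    retries: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Invoke call(); on RateLimited, wait per the schedule and try again (retries + 1 attempts total)."""
    for delay in backoff_delays(retries):
        try:
            return call()
        except RateLimited:
            sleep(delay)
    # Final attempt: if it is still throttled, RateLimited propagates to the caller.
    return call()
```

Production clients usually add random jitter to each delay so many workers sharing one key don't retry in lockstep.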

Build your own

What isn't in the catalog, you build once.

Describe a pipeline in the Studio and Gwen composes the existing endpoints, skills, and datasets into a running full-stack app. Publish it back to the marketplace — public or private to your team — and nobody rebuilds it again.

Text-prompted full-stack builds with live preview
22 verified apps to fork instead of starting blank
Publish to the marketplace: private to your team or public

# every cognitive service, one key
curl https://gwen.penguinai.co/api/external/v1/icd/code \
  -H "Authorization: Bearer hvb_live_…" \
  -H "Content-Type: application/json" \
  -d '{"text": "…clinical documentation…"}'

# services: icd · hcc · cdi · prior-auth · claims · appeals · ocr · pii …
# keys: contact your Gwen admin to provision a key
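The same ICD call can be made from Python using only the standard library. This sketch mirrors the curl example (same URL, Bearer key, and JSON body) and hands the request to `urllib.request.urlopen`. The response format isn't documented here, so the sketch returns the decoded JSON as-is rather than assuming field names; `build_icd_request` and `call_icd` are hypothetical helper names, and the key stays a placeholder you supply.

```python
import json
import urllib.request

BASE_URL = "https://gwen.penguinai.co/api/external/v1"


def build_icd_request(text: str, api_key: str) -> urllib.request.Request:
    """Mirror the curl example: POST {"text": ...} as JSON with a Bearer key."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/icd/code",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_icd(text: str, api_key: str, timeout: float = 30.0) -> object:
    """Send the request and return the decoded JSON response (its shape is not assumed)."""
    with urllib.request.urlopen(build_icd_request(text, api_key), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to log or inspect exactly what leaves your stack before any PHI is transmitted.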

Trust + deployment

Built to be audited.

How Gwen runs, and how your compliance team verifies every step.

Per-task benchmark scores on every cognitive service
Evidence spans back to the source document
Structured audit logging with request tracing
Per-key rate limits and cost ceilings on the external API
Human checkpoints where confidence is low
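Evidence spans and human checkpoints fit together: each result carries the character offsets of the text that supports it, and anything below a confidence threshold goes to a reviewer along with that evidence. This sketch shows the routing pattern only. The `CodedResult` fields, the 0.85 threshold, and the queue names are hypothetical; Gwen's actual response schema and checkpoint rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedResult:
    """Hypothetical shape for one coding suggestion and its supporting evidence."""
    code: str
    confidence: float  # 0.0 to 1.0
    span_start: int    # character offset into the source document
    span_end: int      # exclusive end offset


def evidence_text(source: str, result: CodedResult) -> str:
    """Slice the exact source text that supports the suggestion."""
    return source[result.span_start:result.span_end]


def route(source: str, results: list[CodedResult],
          threshold: float = 0.85) -> dict[str, list[tuple[str, str]]]:
    """Split results into auto-accept and human-review queues, pairing each code with its evidence."""
    queues: dict[str, list[tuple[str, str]]] = {"auto_accept": [], "human_review": []}
    for r in results:
        queue = "auto_accept" if r.confidence >= threshold else "human_review"
        queues[queue].append((r.code, evidence_text(source, r)))
    return queues
```

Because reviewers see the cited span rather than just a code, they can confirm or reject quickly, and their decisions can feed back into the golden dataset.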

Apps in the marketplace

16 cognitive services, benchmarked

94 skills in the library

5,300+ reference data records

Start where healthcare already finished.

Pick a service, wire your data, ship a workflow — and publish what you build so nobody builds it twice.