Gwen for developers

Production-grade scaffolding for healthcare AI.

Hardened, benchmarked, and HIPAA-aware. Built for engineers who need to ship on a payer ops floor, on a provider ops floor, and across the friction in between.

  • BAA available
  • HIPAA-eligible
  • Benchmarked services
  • Replayable audit trail

The healthcare AI ecosystem

Discover what's good. Build on what ships.

One place for healthcare developers to find evaluated capabilities, build on top of what others ship, and avoid rebuilding what already exists.

Cognitive services

16 healthcare endpoints, benchmarked per task

HCC, ICD, CDI, chart audit, prior auth, claims, appeals, OCR, PII — every service ships with a model card and per-task benchmark scores, so you know what you're building on.

  • 16 services
  • Model cards
  • Per-task benchmarks

Reference data

Versioned coding and policy datasets

5,300+ records across 5 subject areas — coding vintages, adjustment reason codes, policy references. Sourced, versioned, downloadable.

  • 5,300+ records
  • 5 subject areas
  • CSV
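
As a rough illustration of working with a versioned CSV download, the sketch below filters rows to a single dataset version. The column names (`code`, `description`, `version`) and the sample rows are placeholders invented for this example; the real Gwen dataset schema is not shown on this page and may differ.

```python
import csv
import io

# Placeholder rows. The columns and values are illustrative assumptions,
# not the actual schema or contents of a Gwen reference dataset.
SAMPLE_CSV = """code,description,version
X1,placeholder description one,2024.1
X1,placeholder description one (revised),2025.1
X2,placeholder description two,2025.1
"""


def load_by_version(csv_text: str, version: str) -> dict[str, str]:
    """Return {code: description} for the rows that match one dataset version."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["code"]: row["description"]
        for row in reader
        if row["version"] == version
    }
```

Pinning lookups to an explicit version string, rather than "latest", is one way to keep a decision reproducible after the reference data changes.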

Skills

Instruction modules your builds already understand

94 healthcare skills — each one packages the hardened SOP, the domain rules, and the output expectations. Compose them in the Studio; no re-explaining each session.

  • 94 modules
  • Payer / Provider / Platform
  • Composable

The problem

Frontier LLMs alone do not ship on a healthcare ops floor.

Two gaps slow every initiative, and together they produce long pilots, brittle prototypes, and AI that never reaches production-grade trust. Gwen closes both.

01 / translation gap

Engineering can't reach SME depth.

  • IT does not know, at SME depth, what actually needs to be built.
  • SMEs (claims, payment integrity, UM, quality) cannot hand work over in engineering-ready form. Specs travel through decks, screenshots, and tribal knowledge.
  • Edge-case logic — CARC/RARC handling, NCCI quirks, payer companion guides — lives in a senior reviewer's head, not in code.

02 / feedback gap

Models drift the moment policy moves.

  • Even after something ships, IT has no good way to surface edge cases or keep the AI improving.
  • No golden-dataset rigor. No cost / accuracy / latency tracking per workflow. No way to compare model choices side-by-side.
  • When coding vintages and payer policies change, the AI quietly drifts out of date.

// net effect: long pilots, brittle prototypes, AI that never reaches production-grade trust.
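
To make "golden-dataset rigor" concrete, here is a minimal sketch of what tracking accuracy, latency, and cost for one workflow could look like. It is not Gwen's evaluation harness. `predict` stands in for any model or endpoint call, and the convention that it returns `(output, cost)` is an assumption made for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalResult:
    accuracy: float        # fraction of golden cases matched exactly
    mean_latency_s: float  # average wall-clock seconds per case
    total_cost: float      # sum of the per-call costs reported by predict


def evaluate(
    predict: Callable[[str], tuple[str, float]],
    golden: list[tuple[str, str]],
) -> EvalResult:
    """Run predict over (input, expected) pairs and aggregate three metrics.

    predict is a hypothetical callable returning (output, cost_of_call).
    """
    if not golden:
        raise ValueError("golden set is empty")
    correct = 0
    cost = 0.0
    elapsed = 0.0
    for text, expected in golden:
        start = time.perf_counter()
        output, call_cost = predict(text)
        elapsed += time.perf_counter() - start
        correct += int(output == expected)
        cost += call_cost
    n = len(golden)
    return EvalResult(correct / n, elapsed / n, cost)
```

Running `evaluate` with two different `predict` callables over the same golden set gives a side-by-side comparison of model choices.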

The Gwen answer

Battle-tested endpoints, composed into working software.

Healthcare SMEs, AI, and engineering co-author every layer. Endpoints become skills. Skills become apps. Apps publish back to the marketplace — so the next team starts where you finished.

Endpoints: the atomic unit

Battle-tested cognitive services, SME-authored and benchmarked. Call them directly or let generated apps consume them.

Skills: the playbooks

Versioned instruction modules — SOP, domain rules, evidence requirements — that tell any build how to use the endpoints correctly.

Datasets: the ground truth

Versioned reference data behind every decision: coding vintages, reason-code ontologies, policy references.

Studio: the workbench

Text-prompted builder. Describe a workflow; Gwen composes the skills and endpoints into a runnable, full-stack app.

Agents: close the loop

Desktop and browser agents carry outputs into EHRs, payer portals, and queues — execution, not just recommendation.

External API: your stack, our services

Key-authenticated programmatic access to every cognitive service, with rate limits and per-key cost controls.
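
A client calling a rate-limited API usually needs a retry policy. The sketch below computes a wait time using exponential backoff with full jitter, and honors a numeric `Retry-After` value when one is present. How Gwen actually signals a rate limit is not documented here; an HTTP 429 response with an optional `Retry-After` header is an assumption based on common HTTP practice.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 0.5,
    cap: float = 30.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    retry_after is the raw Retry-After header value, if the server sent one.
    Whether the Gwen API sends this header is an assumption.
    """
    if retry_after is not None:
        try:
            # Honor a numeric server hint, clamped to [0, cap].
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # non-numeric value (e.g. an HTTP date): fall back to backoff
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)  # full jitter spreads out retries
```

The `cap` keeps a client from sleeping indefinitely, and the jitter keeps many clients that share one key from retrying in lockstep.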

Build your own

What isn't in the catalog, you build once.

Describe a pipeline in the Studio and Gwen composes the existing endpoints, skills, and datasets into a running full-stack app. Publish it back to the marketplace — public or private to your team — and nobody rebuilds it again.

  • Text-prompted full-stack builds with live preview
  • 22 verified apps to fork instead of starting blank
  • Publish to the marketplace: private to your team or public

# every cognitive service, one key
curl https://gwen.penguinai.co/api/external/v1/icd/code \
  -H "Authorization: Bearer hvb_live_…" \
  -H "Content-Type: application/json" \
  -d '{"text": "…clinical documentation…"}'

# services: icd · hcc · cdi · prior-auth · claims · appeals · ocr · pii …
# keys: contact your Gwen admin to provision a key
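
The same call can be sketched in Python with only the standard library. The base URL, the `icd/code` path, the bearer header, and the `{"text": ...}` body come from the curl example above. The function names are hypothetical, paths for the other services are not confirmed by this page, and the response schema is not documented here, so the sketch returns the decoded JSON as-is.

```python
import json
import urllib.request

# Taken from the curl example; only the icd/code path is confirmed there.
BASE_URL = "https://gwen.penguinai.co/api/external/v1"


def build_request(service_path: str, text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST that mirrors the curl call."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{service_path.strip('/')}",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the JSON body (schema not documented here)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating `build_request` from `send` lets the request be inspected or logged without making a network call, for example `send(build_request("icd/code", note_text, api_key))` once a key has been provisioned.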

Trust + deployment

Built to be audited.

How Gwen runs, and how your compliance team verifies every step.

  • Per-task benchmark scores on every cognitive service
  • Evidence spans back to the source document
  • Structured audit logging with request tracing
  • Per-key rate limits and cost ceilings on the external API
  • Human checkpoints where confidence is low
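
One way to read "human checkpoints where confidence is low" together with "evidence spans" is a routing step like the one sketched below. The `Finding` fields and the 0.85 threshold are hypothetical. The actual response shape and the point at which Gwen escalates to a reviewer are not specified on this page.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    code: str                        # a suggested code or label
    confidence: float                # assumed to be in the range 0.0 to 1.0
    evidence: list[tuple[int, int]]  # (start, end) offsets into the source text


def route(
    findings: list[Finding], threshold: float = 0.85
) -> tuple[list[Finding], list[Finding]]:
    """Split findings into (auto_accept, needs_review).

    A finding goes to human review if its confidence is below the threshold
    or if it carries no evidence span pointing back to the source document.
    """
    auto: list[Finding] = []
    review: list[Finding] = []
    for finding in findings:
        needs_human = finding.confidence < threshold or not finding.evidence
        (review if needs_human else auto).append(finding)
    return auto, review
```

Treating a missing evidence span as a reason for review, regardless of confidence, keeps every auto-accepted result traceable to the source text.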

  • 22 apps in the marketplace
  • 16 cognitive services, benchmarked
  • 94 skills in the library
  • 5,300+ reference data records

Final

Start where healthcare already finished.

Pick a service, wire your data, ship a workflow — and publish what you build so nobody builds it twice.