Identity Resilience
Country pluggability, per-provider circuit breakers, and live provider health for the Identity Resolution Service.
The Identity Resolution Service is built to be the most reliable identity layer in the markets PlatformXe enters. Phase 6F.5 ships three pieces that make that real:
- Country pluggability: adding a new country means dropping in one plugin file. Identifier patterns, provider chain, and capabilities are all per-country.
- Per-(country, provider) circuit breakers: a flaky provider is quarantined without dragging down the rest of the chain or other countries.
- Provider health endpoint: a single rollup that ops dashboards and external status pages can poll.
This isn't theoretical hardening. The same circuit-breaker primitive backs the email service in production today; we lifted it onto identity providers exactly because we know the pattern works.
Country plugins
Every country PlatformXe supports is a single file in src/lib/services/identity/countries/<iso>.ts (internal). Each plugin declares:
- Identifier types with their patterns, lengths, and optional normalisers (e.g. BVN: /^\d{11}$/ with whitespace stripping).
- Provider chain in priority order: the resolver tries them top-down until one returns a healthy result.
- Capabilities: which KYC operations the country supports (bvn, nin, liveness, faceMatch, accountVerify, plus the secondary identifier types).
- Bank codes for account-name verification (NIBSS sort codes in NG; equivalents in other countries).
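To make the identifier-type entry concrete, here is a minimal sketch of how a pattern, a length, and an optional normaliser could combine into one validation step. The `IdentifierSpec` shape, the `normalise` field name, and the `validateIdentifier` helper are assumptions for illustration, not the service's actual types.

```typescript
// Hypothetical shape; the real plugin types are internal and may differ.
interface IdentifierSpec {
  pattern: RegExp;
  length?: number;
  normalise?: (raw: string) => string;
}

// BVN as described above: 11 digits, whitespace stripped before matching.
const BVN: IdentifierSpec = {
  pattern: /^\d{11}$/,
  length: 11,
  normalise: (raw) => raw.replace(/\s+/g, ""),
};

// Returns the normalised identifier, or null when it does not match the spec.
function validateIdentifier(spec: IdentifierSpec, raw: string): string | null {
  const value = spec.normalise ? spec.normalise(raw) : raw;
  if (spec.length !== undefined && value.length !== spec.length) return null;
  return spec.pattern.test(value) ? value : null;
}
```

Normalising before matching keeps the pattern strict while still accepting input that users commonly type with spaces.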
Currently supported
| Code | Country | Identifier types | Capabilities |
|---|---|---|---|
| NG | Nigeria | BVN, NIN, TIN, PASSPORT, PHONE, VOTER_ID, DRIVERS_LICENSE | BVN, NIN, liveness, face-match, account |
| KE | Kenya | NIN (Huduma), TIN (KRA PIN), PASSPORT, PHONE, DRIVERS_LICENSE | NIN, liveness, face-match |
| GH | Ghana | NIN (Ghana Card), TIN, PASSPORT, PHONE, VOTER_ID, DRIVERS_LICENSE | NIN, liveness, face-match, account |
| ZA | South Africa | NIN (SAID), TIN (SARS), PASSPORT, PHONE, DRIVERS_LICENSE | NIN, liveness, face-match, account |
The canonical NIN key maps to each country's primary national identifier, so cross-country code (lookupByNIN, etc.) keeps working without per-country branching. BVN stays Nigeria-specific.
Adding a country
A new country follows this shape:
// src/lib/services/identity/countries/ke.ts
import { registerCountryPlugin } from './registry';

registerCountryPlugin({
  code: 'KE',
  name: 'Kenya',
  identifierTypes: {
    NATIONAL_ID: { pattern: /^\d{8}$/, length: 8, kind: 'numeric', label: 'National ID' },
    KRA_PIN: { pattern: /^[AP]\d{9}[A-Z]$/i, kind: 'alphanumeric', label: 'KRA PIN' },
    PHONE: { pattern: /^\+?254\d{9}$|^0\d{9}$/, kind: 'phone', label: 'Phone Number' },
    PASSPORT: { pattern: /^[A-Z0-9]{6,12}$/i, kind: 'alphanumeric', label: 'Passport' },
  },
  providerChain: ['smile-id', 'iprs', 'mock'],
  capabilities: { liveness: true, faceMatch: true, accountVerify: true },
});
Then a single import in countries/index.ts activates it. No changes to the resolver, route handlers, or audit trail.
Circuit breakers
Each (country, provider) pair gets its own circuit breaker: a hiccup in Nigeria's primary provider never touches Kenya's, and one provider going down inside a country fails over to the next without cascading.
| Default | Value |
|---|---|
| Failure threshold | 3 consecutive failures → OPEN |
| Recovery window | 5 minutes (OPEN → HALF_OPEN for one probe) |
| Probe success | HALF_OPEN → CLOSED |
| Probe failure | HALF_OPEN → OPEN (window resets) |
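The transitions in the table can be sketched as a small state machine. This is an illustrative sketch under the defaults listed above, not the production primitive: the class name, method names, and the `breakerFor` keying helper are all assumptions.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly recoveryMs: number;
  private readonly now: () => number;

  // Defaults mirror the table: 3 consecutive failures, 5-minute recovery window.
  constructor(failureThreshold = 3, recoveryMs = 5 * 60 * 1000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.recoveryMs = recoveryMs;
    this.now = now;
  }

  // True when a call may be dispatched upstream.
  allowRequest(): boolean {
    if (this.state === "OPEN" && this.openedAt !== null &&
        this.now() - this.openedAt >= this.recoveryMs) {
      this.state = "HALF_OPEN"; // let exactly one probe through
      return true;
    }
    return this.state === "CLOSED";
  }

  recordSuccess(): void {
    this.state = "CLOSED"; // covers the HALF_OPEN -> CLOSED probe success
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.state === "HALF_OPEN" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now(); // a failed probe restarts the recovery window
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}

// One breaker per (country, provider) pair, keyed by a composite string (assumed scheme).
const breakers = new Map<string, CircuitBreaker>();

function breakerFor(country: string, provider: string): CircuitBreaker {
  const key = `${country}:${provider}`;
  let breaker = breakers.get(key);
  if (!breaker) {
    breaker = new CircuitBreaker();
    breakers.set(key, breaker);
  }
  return breaker;
}
```

Injecting the clock through the constructor keeps the recovery window testable without waiting five minutes.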
What "open" looks like to callers
When the breaker is OPEN, lookups don't hit the upstream — they return { found: false } so the chain advances cleanly. The caller never sees a 5xx; they see the same shape they'd see from any provider that didn't have data.
isHealthy() on a wrapped provider returns false whenever the breaker is OPEN, even if the upstream's own health check claims healthy. The breaker is the source of truth for "should we send traffic here right now."
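A wrapped provider along these lines might look like the sketch below. The `Provider` and `BreakerLike` interfaces and the `withBreaker` helper are hypothetical names; treating a thrown upstream error as `{ found: false }` is also an assumption, since the text above only specifies the OPEN case.

```typescript
interface LookupResult {
  found: boolean;
  data?: unknown;
}

interface Provider {
  lookup(id: string): Promise<LookupResult>;
  isHealthy(): Promise<boolean>;
}

// Minimal breaker surface the wrapper needs (assumed interface).
interface BreakerLike {
  allowRequest(): boolean;
  recordSuccess(): void;
  recordFailure(): void;
  isOpen(): boolean;
}

function withBreaker(inner: Provider, breaker: BreakerLike): Provider {
  return {
    async lookup(id: string): Promise<LookupResult> {
      // Breaker OPEN: skip the upstream and let the chain advance.
      if (!breaker.allowRequest()) return { found: false };
      try {
        const result = await inner.lookup(id);
        breaker.recordSuccess();
        return result;
      } catch {
        // Assumption: upstream errors count as failures and surface as "not found".
        breaker.recordFailure();
        return { found: false };
      }
    },
    async isHealthy(): Promise<boolean> {
      // The breaker overrides whatever the upstream health check claims.
      if (breaker.isOpen()) return false;
      return inner.isHealthy();
    },
  };
}
```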
Provider health endpoint
GET /api/v1/identity/providers/health
| Property | Value |
|---|---|
| Scope | identity:read |
| Plan gate | Detection Pack addon |
| Rate limit | 60 / hour per API key (intentionally low — designed for dashboard polling, not request-time decisions) |
Response
{
  "success": true,
  "data": {
    "overall": {
      "status": "healthy",
      "totalCountries": 1,
      "totalProviders": 3,
      "openProviders": 0
    },
    "countries": [
      {
        "country": "NG",
        "countryName": "Nigeria",
        "status": "healthy",
        "providers": [
          {
            "provider": "smile-id",
            "state": "CLOSED",
            "consecutiveFailures": 0,
            "openedAt": null,
            "latency": { "count": 218, "p50": 312.4, "p95": 821.9, "p99": 1240.0 }
          },
          {
            "provider": "nimc",
            "state": "CLOSED",
            "consecutiveFailures": 0,
            "openedAt": null,
            "latency": { "count": 47, "p50": 480.0, "p95": 1100.0, "p99": 1380.0 }
          },
          {
            "provider": "dojah",
            "state": "CLOSED",
            "consecutiveFailures": 0,
            "openedAt": null,
            "latency": { "count": 0, "p50": null, "p95": null, "p99": null }
          }
        ]
      }
    ],
    "generatedAt": "2026-05-03T19:00:00Z"
  }
}
Status semantics
| Status | Country meaning | Overall meaning |
|---|---|---|
| healthy | Every breaker CLOSED with zero recorded failures. | Every country is healthy. |
| degraded | At least one provider HALF_OPEN, OPEN, or CLOSED with prior failures. | Any country degraded. |
| down | Every provider in the country is OPEN. | Every country is down. |
A country with no traffic yet is healthy (no signal to read). Unknown countries (breaker state present without a registered plugin) are surfaced for observability rather than hidden — useful when a plugin is removed mid-deploy.
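The rollup rules in the table can be expressed as two small functions. This is a sketch of one reading of the table, with assumed names (`countryStatus`, `overallStatus`); it treats "prior failures" as a non-zero `consecutiveFailures` and reports a mix of healthy and down countries as degraded, neither of which the table states explicitly.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";
type Status = "healthy" | "degraded" | "down";

interface ProviderSnapshot {
  state: BreakerState;
  consecutiveFailures: number;
}

function countryStatus(providers: ProviderSnapshot[]): Status {
  // No traffic yet means no signal to read, which counts as healthy.
  if (providers.length === 0) return "healthy";
  if (providers.every((p) => p.state === "OPEN")) return "down";
  const allClean = providers.every(
    (p) => p.state === "CLOSED" && p.consecutiveFailures === 0,
  );
  return allClean ? "healthy" : "degraded";
}

function overallStatus(countries: Status[]): Status {
  if (countries.length > 0 && countries.every((s) => s === "down")) return "down";
  // Assumption: anything short of "all healthy" or "all down" is degraded.
  return countries.every((s) => s === "healthy") ? "healthy" : "degraded";
}
```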
Per CLAUDE.md provider policy
The response carries internal provider labels only — never the underlying vendor name. The label format is the slug used inside the country plugin's providerChain (e.g. smile-id, nimc). Tenants aiming for a customer-facing status page should map these to human-friendly aliases.
Provider latency
Each (country, provider) pair maintains a rolling window of the last 200 latency samples (in-memory, per-process). The window is fed automatically — every dispatched lookup contributes one sample. Calls that the breaker short-circuits are NOT recorded, so the window only reflects real upstream behaviour.
The provider health endpoint surfaces count, p50, p95, and p99 per provider. Empty windows (no samples yet) return null for every percentile so dashboards can distinguish "fast" from "no signal."
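As a rough illustration of the rolling window, the sketch below keeps the most recent samples and reports percentiles on demand. The `LatencyWindow` class is hypothetical, and the nearest-rank percentile method is an assumption; the text above does not say how the service computes percentiles.

```typescript
interface LatencySnapshot {
  count: number;
  p50: number | null;
  p95: number | null;
  p99: number | null;
}

class LatencyWindow {
  private samples: number[] = [];
  private readonly capacity: number;

  // Default capacity matches the 200-sample window described above.
  constructor(capacity = 200) {
    this.capacity = capacity;
  }

  // Called once per dispatched lookup; short-circuited calls never reach here.
  record(ms: number): void {
    this.samples.push(ms);
    if (this.samples.length > this.capacity) this.samples.shift(); // drop the oldest
  }

  snapshot(): LatencySnapshot {
    const count = this.samples.length;
    // Empty window: null percentiles so dashboards can tell "no signal" from "fast".
    if (count === 0) return { count, p50: null, p95: null, p99: null };
    const sorted = [...this.samples].sort((a, b) => a - b);
    // Nearest-rank percentile (assumed method): the value at rank ceil(p * n / 100).
    const pct = (p: number): number =>
      sorted[Math.ceil((p * count) / 100) - 1] as number;
    return { count, p50: pct(50), p95: pct(95), p99: pct(99) };
  }
}
```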
The hedging primitive consumes p95 to size hedge delays — see the Request hedging section.
Dead-letter queue
When the entire chain returns not found (or every provider's breaker is open), the verification still surfaces provider_down to the caller. Behind the scenes the orchestrator persists the request to identity_verification_dlq so operators can replay it later — once procurement adds another provider, a regional outage clears, or an upstream finally cooperates.
Writes are best-effort: a DLQ insert failure is logged, never raised to the caller. The user-visible result remains authoritative regardless.
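The best-effort write can be pictured as a try/catch around the insert that logs and moves on. The `verifyWithDlq` helper, the `DlqRow` shape, and the injected `insert` and `log` callbacks below are assumptions for illustration; the real orchestrator and the `identity_verification_dlq` schema are internal.

```typescript
// Hypothetical row shape; the real table has its own columns.
interface DlqRow {
  kind: string;
  countryCode: string;
  originalRequest: unknown;
}

type DlqInsert = (row: DlqRow) => Promise<void>;

async function verifyWithDlq<T extends { status: string }>(
  run: () => Promise<T>,
  row: DlqRow,
  insert: DlqInsert,
  log: (msg: string, err: unknown) => void = console.error,
): Promise<T> {
  const result = await run();
  if (result.status === "provider_down") {
    try {
      await insert(row);
    } catch (err) {
      // Best-effort: a failed DLQ write is logged and never reaches the caller.
      log("DLQ insert failed", err);
    }
  }
  // The user-visible result is returned unchanged either way.
  return result;
}
```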
Endpoints
| Method | Path | Scope | Description |
|---|---|---|---|
| GET | /api/v1/identity/dlq | identity:read | List DLQ rows with optional filters (unreplayedOnly, kind, subjectId, countryCode). |
| GET | /api/v1/identity/dlq/stats | identity:read | Counts of unreplayed rows, broken down by kind. |
| GET | /api/v1/identity/dlq/:id | identity:read | Single row including the captured originalRequest. |
| POST | /api/v1/identity/dlq/:id/replay | identity:verify | Re-runs the verification through the live provider chain. Idempotent. |
Replay outcomes
| HTTP | Status | Behaviour |
|---|---|---|
| 200 | verified | Chain succeeded. New identity_verifications row persisted, DLQ row marked replayed and linked. |
| 200 | failed | Chain succeeded but verification still failed (e.g. low match score). DLQ row marked replayed for the per-kind services that own a writer. |
| 503 | provider_down | Chain still degraded. DLQ row left unreplayed so the next sweep can retry. |
| 400 | invalid | Captured payload can't be replayed (missing identifier, etc.). |
| 404 | | DLQ row not found (or wrong organisation). |
Already-replayed rows short-circuit to the existing verification id without re-billing.
Request hedging
Hedging is opt-in per country plugin. When enabled it fires the next provider after a short delay (sized from observed p95) and races the results — first useful answer wins, laggards are abandoned.
The cost is roughly one extra provider call per request when the primary is in its tail; the benefit is p99 latency cut roughly in half. The tradeoff is the right one for high-value verifications where a 1.5-second tail loses a customer.
For the v1 Nigerian chain hedging is OFF by default — the existing breaker + cache cover most failure modes. Country plugins can flip it on without resolver-level changes; the helper lives in internal/hedging.ts and reads p95 directly from the latency module.
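To show the racing behaviour, here is a minimal two-provider sketch. It is not the helper in internal/hedging.ts: `hedgedLookup` and its signature are hypothetical, the caller passes the delay directly rather than reading p95 from the latency module, and launching the hedge early when the primary misses before the timer fires is an added assumption.

```typescript
interface LookupResult {
  found: boolean;
  data?: unknown;
}

type Lookup = () => Promise<LookupResult>;

function hedgedLookup(
  primary: Lookup,
  secondary: Lookup,
  hedgeDelayMs: number,
): Promise<LookupResult> {
  return new Promise((resolve) => {
    let pending = 0; // calls currently in flight
    let hedged = false;
    let settled = false;
    let timer: ReturnType<typeof setTimeout> | undefined;

    const finish = (result: LookupResult): void => {
      if (settled) return; // laggards are ignored once a winner exists
      settled = true;
      if (timer !== undefined) clearTimeout(timer);
      resolve(result);
    };

    const track = (call: Lookup): void => {
      pending += 1;
      call()
        .catch((): LookupResult => ({ found: false })) // errors count as a miss
        .then((result) => {
          pending -= 1;
          if (result.found) {
            finish(result); // first useful answer wins
          } else if (!hedged) {
            launchHedge(); // primary missed early: no point waiting for the timer
          } else if (pending === 0) {
            finish({ found: false }); // both providers missed
          }
        });
    };

    const launchHedge = (): void => {
      if (hedged || settled) return;
      hedged = true;
      if (timer !== undefined) clearTimeout(timer);
      track(secondary);
    };

    timer = setTimeout(launchHedge, hedgeDelayMs);
    track(primary);
  });
}
```

In this sketch the secondary call is only made when the primary is slow or misses, which is what keeps the extra cost bounded to requests in the primary's tail.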
Errors
| HTTP | Code | Cause |
|---|---|---|
| 400 | BAD_REQUEST | Invalid query params (replay payload missing the captured identifier, etc.). |
| 401 | UNAUTHORIZED | Missing or invalid API key. |
| 402 | DETECTION_PACK_REQUIRED | Detection Pack addon not enabled. |
| 403 | FORBIDDEN | API key has no organisation context, or wrong scope. |
| 404 | NOT_FOUND | DLQ row not found (or wrong organisation). |
| 500 | INTERNAL_ERROR | Unexpected service failure (very rare, because the endpoint reads in-process state). |
| 503 | SERVICE_UNAVAILABLE | Replay attempted but every provider in the chain is still down. |