OCR Overview

Text extraction from documents and identity cards with Nigerian document support, graded name matching, and optional regional context.

The PlatformXe OCR service extracts text from uploaded documents and identity cards using Azure Computer Vision (Read API). It parses Nigerian ID layouts and, where applicable, MRZ blocks, then compares the extracted names against the profile name you supply. Matching is culturally aware and reports reliability tiers rather than a single pass/fail score.

How it works

Your application sends a document image (documentUrl or documentBase64) to POST /api/v1/ocr/verify-identity with profileName fields.
PlatformXe runs OCR, detects document type, and parses structured fields (names, ID number, dates, etc.).
The name matcher compares profile vs extracted names using normalisation, fuzzy similarity, a variation dictionary (static + database + cache), and soft Nigerian morphology priors.
Optionally, profileName.matchingContext can include state of origin (2-letter code) and related fields — a soft signal only (e.g. Delta vs south-east Igbo naming patterns). It can nudge borderline matches from MEDIUM toward HIGH when it aligns with inferred name patterns; it never replaces strong OCR or exact string agreement.
You receive isVerified (strict: FULL reliability + OCR ≥ 0.85), nameReliabilityLevel, requiresSecondaryValidation, and advisories.
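As a rough illustration of the first step, the sketch below assembles (but does not send) a verify-identity request. The host, the helper name, the `firstName` / `lastName` keys inside `profileName`, the exact shape of `matchingContext`, the rule that exactly one document field is sent, and the sample state code are all assumptions made for the example. Only the path, `documentUrl`, `documentBase64`, `profileName`, `matchingContext`, `stateOfOriginCode`, `expectedDocumentType`, and `x-api-key` (sent here as an HTTP header, which is the conventional reading) appear on this page. The Verify Identity reference is the authority on the real schema.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; substitute your PlatformXe base URL


def build_verify_identity_request(api_key, profile_name, document_url=None,
                                  document_base64=None, expected_document_type=None,
                                  state_of_origin_code=None):
    """Build (but do not send) a POST request for /api/v1/ocr/verify-identity."""
    # Assumption: exactly one of the two document fields is supplied.
    if (document_url is None) == (document_base64 is None):
        raise ValueError("provide exactly one of document_url or document_base64")

    profile = dict(profile_name)  # copy so the caller's dict is not mutated
    if state_of_origin_code is not None:
        # Optional soft signal; the nesting under matchingContext is assumed.
        profile["matchingContext"] = {"stateOfOriginCode": state_of_origin_code}

    body = {"profileName": profile}
    if document_url is not None:
        body["documentUrl"] = document_url
    else:
        body["documentBase64"] = document_base64
    if expected_document_type is not None:
        body["expectedDocumentType"] = expected_document_type

    return urllib.request.Request(
        url=BASE_URL + "/api/v1/ocr/verify-identity",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


request = build_verify_identity_request(
    api_key="YOUR_API_KEY",
    profile_name={"firstName": "Chukwuemeka", "lastName": "Okafor"},  # hypothetical keys
    document_url="https://example.com/uploads/nin-slip.jpg",
    expected_document_type="NIN",
    state_of_origin_code="DE",  # illustrative 2-letter code
)
```

Passing the resulting object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without network access.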

Supported document types

Detection is driven by internal configuration (regular expressions and other patterns applied to the OCR text). Common types include:

Document	Typical code	Notes
NIN slip / card	`NIN`	National Identification Number
BVN print	`BVN`	Bank Verification Number
International passport	`PASSPORT`	MRZ supported
Driver's license	`DRIVERS_LICENSE`	FRSC
Voter's card	`VOTERS_CARD`	INEC PVC
TIN certificate	`TIN`	FIRS
CAC certificate	`CAC_RC_NUMBER`	Corporate registration

Pass expectedDocumentType as a hint when you know the document class; otherwise the service auto-detects.
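A client might guard the hint before sending it, as in the minimal sketch below. The codes are copied from the table above; the helper name and the normalisation rules (trimming, upper-casing, replacing spaces with underscores) are assumptions for illustration, and the service may recognise more types than are listed here.

```python
# Codes copied from the table above; the service may recognise more types.
KNOWN_DOCUMENT_TYPES = {
    "NIN": "NIN slip / card",
    "BVN": "BVN print",
    "PASSPORT": "International passport",
    "DRIVERS_LICENSE": "Driver's license",
    "VOTERS_CARD": "Voter's card",
    "TIN": "TIN certificate",
    "CAC_RC_NUMBER": "CAC certificate",
}


def normalise_document_hint(hint):
    """Return a known code for expectedDocumentType, or None to rely on auto-detection."""
    if not hint:
        return None
    code = hint.strip().upper().replace(" ", "_")
    return code if code in KNOWN_DOCUMENT_TYPES else None
```

Returning `None` for anything unrecognised means the field is simply omitted and the service falls back to auto-detection.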

Key features

Azure Computer Vision — production OCR with polling until analysis completes
MRZ parsing — machine-readable zone for passports and compatible IDs
Graded name matching — FULL / HIGH / MEDIUM / LOW / NONE with reliabilityScore and advisories
Variation dictionary — platform dictionary with Redis → PostgreSQL → static hydration (name_variation_entries)
Soft affinity priors — morphology hints for ambiguous particles (never a hard ethnicity label)
Optional state/LGA context — stateOfOriginCode on the profile for soft regional alignment (e.g. Delta Igbo vs other clusters)
Confidence — extraction must meet a minimum OCR confidence (typically 0.6) before matching proceeds; isVerified additionally requires OCR confidence ≥ 0.85 for an automatic pass
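The Redis → PostgreSQL → static order listed for the variation dictionary can be read as a layered lookup. The sketch below is one interpretation of that ordering, not the service's implementation: it assumes the cache is consulted first, then the database, then the static list, with hits written back to the cache. Plain dictionaries stand in for each tier, and the class name, method name, and sample entry are hypothetical.

```python
class TieredVariationLookup:
    """Look up name variations across three tiers, hydrating the cache on a hit."""

    def __init__(self, cache, database, static):
        self.cache = cache        # stand-in for Redis
        self.database = database  # stand-in for the name_variation_entries table
        self.static = static      # stand-in for the bundled static dictionary

    def variations(self, name):
        key = name.strip().lower()
        if key in self.cache:
            return self.cache[key]
        for tier in (self.database, self.static):
            if key in tier:
                self.cache[key] = tier[key]  # hydrate the cache for next time
                return tier[key]
        return frozenset()  # no known variations in any tier


lookup = TieredVariationLookup(
    cache={},
    database={},
    static={"chukwuemeka": frozenset({"emeka"})},  # illustrative sample entry
)
found = lookup.variations(" Chukwuemeka ")
```

After the first call the entry is present in the cache tier, so a repeated lookup would not reach the slower tiers.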

Thresholds (summary)

Concept	Typical value	Meaning
Minimum OCR to match	0.6	Below this, matching is aborted and the result is treated as a low-confidence or failed read
Automatic `isVerified`	`FULL` reliability + OCR ≥ 0.85	The only condition under which verification passes automatically
Fuzzy token similarity	0.85	Levenshtein ratio threshold used in many pairwise token comparisons
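To make the fuzzy threshold concrete, here is a small sketch of a Levenshtein ratio check. The normalisation used, 1 − distance / length of the longer token, is one common definition and is an assumption here; this page does not state the exact formula the service applies, and the function names are hypothetical.

```python
FUZZY_THRESHOLD = 0.85  # value taken from the thresholds table


def levenshtein(a, b):
    """Edit distance computed with the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_ratio(a, b):
    """Assumed normalisation: 1 - distance / length of the longer token."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def tokens_match(a, b):
    """Case-insensitive comparison against the fuzzy threshold."""
    return similarity_ratio(a.lower(), b.lower()) >= FUZZY_THRESHOLD
```

Under this normalisation a single-character difference clears 0.85 only for tokens of seven or more characters, which is one reason short names tend to need the variation dictionary rather than fuzzy similarity alone.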

Use matchResult.reliabilityLevel, requiresSecondaryValidation, and verificationAdvisories in product flows. Relying only on isVerified will reject many legitimate HIGH / MEDIUM outcomes that are appropriate after a secondary check (BVN, NIN OTP, manual review).
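One way to act on those fields is a three-way routing decision, sketched below. The action names and the policy itself are illustrative assumptions, not a PlatformXe recommendation, and the function takes already-extracted values because the nesting of these fields in the response is not described on this page.

```python
# Levels that may still be acceptable after a secondary check (BVN, NIN OTP, manual review).
# FULL is included for the case where reliability is FULL but OCR confidence is below 0.85.
SECONDARY_LEVELS = {"FULL", "HIGH", "MEDIUM"}


def route_verification(is_verified, reliability_level, requires_secondary_validation):
    """Map a verify-identity outcome to a hypothetical product action."""
    if is_verified:
        return "approve"
    if requires_secondary_validation or reliability_level in SECONDARY_LEVELS:
        return "secondary_check"
    return "reject"  # LOW or NONE with no secondary path requested
```

For example, a `HIGH` match with `requiresSecondaryValidation` set would be sent to a secondary check instead of being rejected outright. The advisories returned with the result can be surfaced to reviewers at that stage.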

API scope

OCR verify-identity uses the service API key model (x-api-key). Scope requirements follow your platform configuration for B2B keys.

Endpoints

Method	Path	Description
POST	`/api/v1/ocr/verify-identity`	OCR + identity name matching (this service)

Processor tuning (confidence, languages) may be exposed under separate processor routes where enabled — see OCR processor if available for your tenant.

Next steps

Verify Identity — full request/response reference
ID Verification overview — product flow and cross-reference with Identity Resolution