OCR Overview
Text extraction from documents and identity cards with Nigerian document support, graded name matching, and optional regional context.
The PlatformXe OCR service extracts text from uploaded documents and identity cards using Azure Computer Vision (Read API). It parses Nigerian ID layouts, MRZ blocks where applicable, and compares extracted names to your profile using culturally aware matching with reliability tiers — not only a single pass/fail score.
How it works
- Your application sends a document image (`documentUrl` or `documentBase64`) to `POST /api/v1/ocr/verify-identity` with `profileName` fields.
- PlatformXe runs OCR, detects document type, and parses structured fields (names, ID number, dates, etc.).
- The name matcher compares profile vs extracted names using normalisation, fuzzy similarity, a variation dictionary (static + database + cache), and soft Nigerian morphology priors.
- Optionally, `profileName.matchingContext` can include state of origin (2-letter code) and related fields. This is a soft signal only (e.g. Delta vs south-east Igbo naming patterns): it can nudge borderline matches from MEDIUM toward HIGH when it aligns with inferred name patterns, but it never replaces strong OCR or exact string agreement.
- You receive `isVerified` (strict: FULL reliability + OCR ≥ 0.85), `nameReliabilityLevel`, `requiresSecondaryValidation`, and advisories.
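As a rough illustration of the first step, the sketch below assembles a request body in Python. Only `documentUrl`, `documentBase64`, `profileName`, `matchingContext` and `stateOfOriginCode` come from this page. The `firstName` / `lastName` keys, the nesting of `stateOfOriginCode` under `matchingContext`, the "exactly one document source" rule and the helper name are assumptions made for the example, so check the Verify Identity reference for the actual schema.

```python
import base64
from typing import Optional


def build_verify_identity_payload(
    first_name: str,
    last_name: str,
    *,
    document_url: Optional[str] = None,
    document_bytes: Optional[bytes] = None,
    state_of_origin_code: Optional[str] = None,
) -> dict:
    """Build a body for POST /api/v1/ocr/verify-identity (hypothetical helper)."""
    # Assumption: the caller supplies either a URL or raw bytes, not both.
    if (document_url is None) == (document_bytes is None):
        raise ValueError("Provide exactly one of document_url or document_bytes")

    # Assumption: firstName / lastName are the key names inside profileName.
    profile_name: dict = {"firstName": first_name, "lastName": last_name}
    if state_of_origin_code is not None:
        # Assumption: stateOfOriginCode lives under profileName.matchingContext.
        profile_name["matchingContext"] = {"stateOfOriginCode": state_of_origin_code}

    payload: dict = {"profileName": profile_name}
    if document_url is not None:
        payload["documentUrl"] = document_url
    else:
        # Raw image bytes are sent as a base64 string.
        payload["documentBase64"] = base64.b64encode(document_bytes).decode("ascii")
    return payload
```

Passing `state_of_origin_code` is optional; when it is omitted the payload carries no `matchingContext` at all, which mirrors the "soft signal only" role described above.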
Supported document types
Detection is driven by internal configs (regex / patterns on OCR text). Common types include:
| Document | Typical code | Notes |
|---|---|---|
| NIN slip / card | NIN | National Identification Number |
| BVN print | BVN | Bank Verification Number |
| International passport | PASSPORT | MRZ supported |
| Driver's license | DRIVERS_LICENSE | FRSC |
| Voter's card | VOTERS_CARD | INEC PVC |
| TIN certificate | TIN | FIRS |
| CAC certificate | CAC_RC_NUMBER | Corporate registration |
Pass expectedDocumentType as a hint when you know the document class; otherwise the service auto-detects.
Key features
- Azure Computer Vision: production OCR with polling until analysis completes
- MRZ parsing: machine-readable zone for passports and compatible IDs
- Graded name matching: `FULL` / `HIGH` / `MEDIUM` / `LOW` / `NONE` with `reliabilityScore` and advisories
- Variation dictionary: platform dictionary with Redis → PostgreSQL → static hydration (`name_variation_entries`)
- Soft affinity priors: morphology hints for ambiguous particles (never a hard ethnicity label)
- Optional state/LGA context: optional `stateOfOriginCode` on the profile for soft regional alignment (e.g. Delta Igbo vs other clusters)
- Confidence: extraction must meet a minimum OCR confidence to proceed with matching; `isVerified` additionally requires high OCR for automatic pass
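The variation dictionary's Redis → PostgreSQL → static layering can be pictured as a read-through lookup. The sketch below is not the platform's implementation: plain Python mappings stand in for Redis, PostgreSQL and the static table, and the lookup order, key normalisation and cache write-back are all assumptions made for illustration.

```python
from typing import FrozenSet, Mapping, MutableMapping

Variations = FrozenSet[str]


class TieredVariationDictionary:
    """Read-through lookup: cache first, then database, then static table."""

    def __init__(
        self,
        cache: MutableMapping[str, Variations],   # stand-in for Redis
        database: Mapping[str, Variations],       # stand-in for PostgreSQL
        static: Mapping[str, Variations],         # stand-in for bundled entries
    ) -> None:
        self.cache = cache
        self.database = database
        self.static = static

    def lookup(self, name: str) -> Variations:
        key = name.strip().lower()  # assumed normalisation
        if key in self.cache:
            return self.cache[key]
        found = self.database.get(key)
        if found is None:
            # Fall back to the static table; unknown names get an empty set.
            found = self.static.get(key, frozenset())
        self.cache[key] = found  # hydrate the cache for the next lookup
        return found
```

With a static table such as `{"chukwuemeka": frozenset({"emeka"})}` (an illustrative entry, not taken from `name_variation_entries`), the first lookup falls through to the static tier and later lookups are served from the cache.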
Thresholds (summary)
| Concept | Typical value | Meaning |
|---|---|---|
| Minimum OCR to match | 0.6 | Below this, matching is aborted and the result is reported as a low-confidence or failed read |
| Automatic isVerified | FULL reliability + OCR ≥ 0.85 | Sole automatic verification bar |
| Fuzzy token similarity | 0.85 | Levenshtein ratio for many pairwise comparisons |
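To make the 0.85 fuzzy threshold concrete, here is a minimal sketch of a Levenshtein ratio between two name tokens. The normalisation used (1 minus edit distance divided by the longer length, after case folding) is an assumption; the service may normalise differently, and both function names are illustrative.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), row by row."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer token."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest
```

Under this definition "Adebayo" vs "Adebayor" scores 0.875 and clears a 0.85 bar, while "Okafor" vs "Okafo" scores about 0.83 and does not, which shows how sensitive short tokens are to a single-character difference.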
Use matchResult.reliabilityLevel, requiresSecondaryValidation, and verificationAdvisories in product flows. Relying only on isVerified will reject many legitimate HIGH / MEDIUM outcomes that are appropriate after a secondary check (BVN, NIN OTP, manual review).
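One way to act on these signals is sketched below. The two numeric thresholds come from the table above; the action labels (`approve`, `secondary_check`, `recapture`, `reject`) and the policy for LOW / NONE are hypothetical product decisions, not behaviour defined by the service.

```python
MIN_OCR_TO_MATCH = 0.6   # below this, matching does not proceed
AUTO_VERIFY_OCR = 0.85   # OCR bar for automatic verification


def is_auto_verified(reliability_level: str, ocr_confidence: float) -> bool:
    """Mirror of the documented rule: FULL reliability + OCR >= 0.85."""
    return reliability_level == "FULL" and ocr_confidence >= AUTO_VERIFY_OCR


def next_step(
    reliability_level: str,
    ocr_confidence: float,
    requires_secondary_validation: bool,
) -> str:
    """Map a match outcome to a product action (labels are hypothetical)."""
    if ocr_confidence < MIN_OCR_TO_MATCH:
        return "recapture"  # unreadable image: ask for a better photo
    if (
        is_auto_verified(reliability_level, ocr_confidence)
        and not requires_secondary_validation
    ):
        return "approve"
    if reliability_level in ("FULL", "HIGH", "MEDIUM"):
        # Plausible match: confirm via BVN, NIN OTP, or manual review.
        return "secondary_check"
    return "reject"  # LOW / NONE (assumed policy)
```

For example, a HIGH match with OCR confidence 0.9 is routed to a secondary check rather than rejected, which is the outcome a flow keyed only on `isVerified` would miss.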
API scope
OCR verify-identity uses the service API key model (x-api-key). Scope requirements follow your platform configuration for B2B keys.
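A minimal sketch of an authenticated call using only the Python standard library follows. It assumes the key is sent as an `x-api-key` HTTP header and that the body is JSON; the host `api.example.com` is a placeholder, and `build_request` is an illustrative helper rather than part of any SDK.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, replace with your own


def build_request(
    payload: dict, api_key: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the verify-identity endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/ocr/verify-identity",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # assumed to be an HTTP header
        },
        method="POST",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(build_request(payload, key), timeout=30) as resp:
#       result = json.load(resp)
```

Building the request separately from sending it keeps the key handling and URL construction easy to unit-test without network access.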
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/ocr/verify-identity | OCR + identity name matching (this service) |
Processor tuning (confidence, languages) may be exposed under separate processor routes where enabled — see OCR processor if available for your tenant.
Next steps
- Verify Identity — full request/response reference
- ID Verification overview — product flow and cross-reference with Identity Resolution