
OCR Overview

Text extraction from documents and identity cards with Nigerian document support, graded name matching, and optional regional context.

The PlatformXe OCR service extracts text from uploaded documents and identity cards using Azure Computer Vision (Read API). It parses Nigerian ID layouts and, where applicable, MRZ blocks, then compares the extracted names to the profile names you supply using culturally aware matching. The result is graded into reliability tiers rather than reduced to a single pass/fail score.

How it works

  1. Your application sends a document image (documentUrl or documentBase64) to POST /api/v1/ocr/verify-identity with profileName fields.
  2. PlatformXe runs OCR, detects document type, and parses structured fields (names, ID number, dates, etc.).
  3. The name matcher compares profile vs extracted names using normalisation, fuzzy similarity, a variation dictionary (static + database + cache), and soft Nigerian morphology priors.
  4. Optionally, profileName.matchingContext can carry the state of origin (2-letter code) and related fields. This is a soft signal only (e.g. Delta vs south-east Igbo naming patterns): it can nudge a borderline match from MEDIUM toward HIGH when it aligns with inferred name patterns, but it never replaces strong OCR or exact string agreement.
  5. You receive isVerified (strict: true only when reliability is FULL and OCR confidence is ≥ 0.85), nameReliabilityLevel, requiresSecondaryValidation, and advisories.
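The request in step 1 can be sketched as follows. This is a minimal illustration, not the official client: the base URL is a placeholder, the firstName and lastName keys under profileName are hypothetical, and placing stateOfOriginCode inside matchingContext and sending x-api-key as an HTTP header are assumptions. The function only builds the request; it does not send it.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute your PlatformXe base URL.
BASE_URL = "https://api.example.com"


def build_verify_request(api_key, document_url, first_name, last_name,
                         state_of_origin_code=None):
    """Build (but do not send) a POST /api/v1/ocr/verify-identity request."""
    profile_name = {
        # Assumption: these sub-field names are illustrative only.
        "firstName": first_name,
        "lastName": last_name,
    }
    if state_of_origin_code is not None:
        # Optional soft signal; the nesting shown here is an assumption.
        profile_name["matchingContext"] = {
            "stateOfOriginCode": state_of_origin_code,
        }

    payload = {"documentUrl": document_url, "profileName": profile_name}
    return urllib.request.Request(
        url=BASE_URL + "/api/v1/ocr/verify-identity",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API key travels in an x-api-key header.
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# "DE" is an illustrative 2-letter code, not a confirmed value.
request = build_verify_request(
    "my-api-key", "https://example.com/id-card.jpg", "Chinedu", "Okafor", "DE"
)
```

To send it, pass the request to urllib.request.urlopen and decode the JSON body of the response. Use documentBase64 instead of documentUrl when the image is not reachable by URL.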

Supported document types

Detection is driven by internal configs (regex / patterns on OCR text). Common types include:

| Document | Typical code | Notes |
|---|---|---|
| NIN slip / card | NIN | National Identification Number |
| BVN print | BVN | Bank Verification Number |
| International passport | PASSPORT | MRZ supported |
| Driver's license | DRIVERS_LICENSE | FRSC |
| Voter's card | VOTERS_CARD | INEC PVC |
| TIN certificate | TIN | FIRS |
| CAC certificate | CAC_RC_NUMBER | Corporate registration |

Pass expectedDocumentType as a hint when you know the document class; otherwise the service auto-detects.
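As a rough sketch of how a client might attach the hint, the helper below adds expectedDocumentType only when the caller supplies one of the codes listed above, and otherwise leaves the payload untouched so the service auto-detects. The helper name and the choice to silently drop unrecognised codes are assumptions of this example, not documented behaviour.

```python
# Codes taken from the table above; the service may recognise others.
KNOWN_DOCUMENT_CODES = frozenset({
    "NIN", "BVN", "PASSPORT", "DRIVERS_LICENSE",
    "VOTERS_CARD", "TIN", "CAC_RC_NUMBER",
})


def with_document_hint(payload, document_code=None):
    """Return a copy of payload, adding expectedDocumentType when usable."""
    result = dict(payload)
    if document_code is None:
        return result  # no hint: rely on auto-detection
    code = document_code.strip().upper()
    if code in KNOWN_DOCUMENT_CODES:
        result["expectedDocumentType"] = code
    # Unrecognised codes are dropped (a client-side choice in this sketch).
    return result


hinted = with_document_hint({"documentUrl": "https://example.com/p.jpg"}, "passport")
```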

Key features

  • Azure Computer Vision: production OCR with polling until analysis completes
  • MRZ parsing: machine-readable zone for passports and compatible IDs
  • Graded name matching: FULL / HIGH / MEDIUM / LOW / NONE with reliabilityScore and advisories
  • Variation dictionary: platform dictionary with Redis → PostgreSQL → static hydration (name_variation_entries)
  • Soft affinity priors: morphology hints for ambiguous particles (never a hard ethnicity label)
  • Optional state/LGA context: stateOfOriginCode on the profile for soft regional alignment (e.g. Delta Igbo vs other clusters)
  • Confidence: extraction must meet a minimum OCR confidence to proceed with matching; isVerified additionally requires high OCR for automatic pass
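One way to read the Redis → PostgreSQL → static hydration chain is as a tiered lookup: check the cache, then the database, then a static fallback, and write the result back to the cache. The sketch below illustrates that reading with plain dictionaries standing in for each tier. The class name, the lookup order, and the sample entries are assumptions for illustration; the service's internal implementation is not documented here.

```python
class TieredVariationLookup:
    """Cache -> database -> static fallback, with cache hydration on a miss."""

    def __init__(self, cache, database, static_entries):
        self.cache = cache                    # stand-in for Redis
        self.database = database              # stand-in for name_variation_entries
        self.static_entries = static_entries  # stand-in for the bundled dictionary

    def variations(self, name):
        key = name.strip().lower()
        if key in self.cache:
            return self.cache[key]
        found = self.database.get(key)
        if found is None:
            found = self.static_entries.get(key, frozenset())
        self.cache[key] = found  # hydrate the cache for the next lookup
        return found


# Illustrative sample data only.
lookup = TieredVariationLookup(
    cache={},
    database={"chukwuemeka": frozenset({"emeka"})},
    static_entries={"oluwaseun": frozenset({"seun"})},
)
```

After one call to lookup.variations("Chukwuemeka"), the entry is served from the cache dictionary on subsequent calls.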

Thresholds (summary)

| Concept | Typical value | Meaning |
|---|---|---|
| Minimum OCR to match | 0.6 | Below this, matching aborts with low-confidence / failed read semantics |
| Automatic isVerified | FULL reliability + OCR ≥ 0.85 | Sole automatic verification bar |
| Fuzzy token similarity | 0.85 | Levenshtein ratio for many pairwise comparisons |
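To make the thresholds concrete, here is a small sketch of how they might be applied. It assumes the Levenshtein ratio is defined as 1 minus the edit distance divided by the longer string's length; the service's exact normalisation is not specified above, so treat the ratio formula and the function names as illustrative.

```python
MIN_OCR_TO_MATCH = 0.6
AUTO_VERIFY_OCR = 0.85
FUZZY_TOKEN_THRESHOLD = 0.85


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_ratio(a, b):
    # Assumption: ratio = 1 - distance / length of the longer string.
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def tokens_match(a, b):
    return similarity_ratio(a.lower(), b.lower()) >= FUZZY_TOKEN_THRESHOLD


def can_attempt_match(ocr_confidence):
    return ocr_confidence >= MIN_OCR_TO_MATCH


def is_auto_verified(reliability_level, ocr_confidence):
    return reliability_level == "FULL" and ocr_confidence >= AUTO_VERIFY_OCR
```

Under this definition a single-character difference passes for an 11-letter name but fails for a 3-letter one, which is one reason short tokens tend to need exact agreement or dictionary support.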

Use matchResult.reliabilityLevel, requiresSecondaryValidation, and verificationAdvisories in product flows. Relying only on isVerified will reject many legitimate HIGH / MEDIUM outcomes that are appropriate after a secondary check (BVN, NIN OTP, manual review).
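A product flow built on these fields might look like the sketch below. The routing policy (accept, secondary check, manual review) and the return labels are hypothetical, and the example assumes isVerified and requiresSecondaryValidation sit at the top level of the response while reliabilityLevel sits under matchResult; check the actual response schema before relying on these paths.

```python
def next_step(response):
    """Map a verify-identity response to a hypothetical product action."""
    if response.get("isVerified") is True:
        return "accept"

    match_result = response.get("matchResult") or {}
    level = match_result.get("reliabilityLevel", "NONE")

    # FULL can still land here when OCR confidence is below the automatic bar.
    if response.get("requiresSecondaryValidation") or level in ("FULL", "HIGH", "MEDIUM"):
        return "secondary_check"  # e.g. BVN, NIN OTP

    return "manual_review"  # LOW / NONE, or a response with no match data


decision = next_step({
    "isVerified": False,
    "requiresSecondaryValidation": True,
    "matchResult": {"reliabilityLevel": "HIGH"},
})
```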

API scope

OCR verify-identity uses the service API key model (x-api-key). Scope requirements follow your platform configuration for B2B keys.

Endpoints

| Method | Path | Description |
|---|---|---|
| POST | /api/v1/ocr/verify-identity | OCR + identity name matching (this service) |

Processor tuning (confidence, languages) may be exposed under separate processor routes where enabled; see OCR processor if it is available for your tenant.

Next steps