The problem it solves
Most failed deliveries aren’t a routing problem. They’re an address problem. “Near the second mosque, blue door, call when you reach” is a valid human address and an uninterpretable machine input. In emerging markets, 30–50% of shipping addresses are unstructured, partial, misspelled, geocoded to the wrong polygon, or point to a building that has three entrances on three different streets. In mature markets, the failure mode is subtler — apartment numbers missing, gate codes absent, customer-not-present because the window was wrong. Every one of these becomes a re-attempt. Every re-attempt is roughly USD 3–10 of direct cost, a dented NPS, and a driver hour burnt. At national-postal scale, bad addresses are the single largest lever on first-attempt delivery rate — larger than routing, larger than driver skill, larger than vehicle type. AI Address Intelligence exists to make a bad address deliverable before the parcel leaves the hub.
What it is
AI Address Intelligence is Shipsy’s address normalization and enrichment pipeline — the signature mechanism that turns free-text, voice-note, and emoji-laden addresses into structured, geocoded, deliverable jobs. It is not a geocoder. It is a layered system: rule-based parsing, LLM-driven extraction, confidence scoring, polygon-library matching, and — critically — agentic voice and WhatsApp correction when confidence is low. Where a traditional geocoder returns “no match” and gives up, AI Address Intelligence calls the customer, corrects the address, and writes the enriched payload back. What’s genuinely new: the pipeline doesn’t sit in a batch job running the night before — it runs at the moment the consignment note is created, at the moment the parcel is picked up, and again at the moment it’s dispatched to a driver. Confidence is re-scored at every stage. By the time the parcel reaches the last mile, 90%+ of addresses are deliverable first attempt.
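A minimal sketch of the layered flow just described — parse into a canonical payload, score confidence on four dimensions, and route low-confidence addresses to agentic correction before dispatch. Every field name, weight, and threshold here is an illustrative assumption, not Shipsy’s actual schema or policy:

```python
# Illustrative sketch only: field names, equal weights, and tier cutoffs
# are assumptions, not the production schema or thresholds.

from dataclasses import dataclass

@dataclass
class Confidence:
    parse_quality: float        # how cleanly the free text was structured
    geocode_precision: float    # rooftop vs. locality-level geocode
    polygon_match: float        # geocode lands in the expected polygon
    historical_success: float   # first-attempt rate for the pincode

    def overall(self) -> float:
        dims = (self.parse_quality, self.geocode_precision,
                self.polygon_match, self.historical_success)
        return sum(dims) / len(dims)   # equal weights, for illustration only

def tier(score: float) -> str:
    """Map a 0-100 score to a downstream action (cutoffs are assumptions)."""
    if score >= 85:
        return "auto_dispatch"         # deliverable as-is
    if score >= 60:
        return "whatsapp_correction"   # chat agent disambiguates
    return "voice_correction"          # voice agent calls the customer

# "Near the second mosque, blue door" parsed into a canonical shape:
parsed = {"house": None, "street": None, "locality": None,
          "landmark": "second mosque, blue door", "pincode": None,
          "city": "Dubai", "country": "AE"}
score = Confidence(parse_quality=55, geocode_precision=40,
                   polygon_match=70, historical_success=65).overall()
print(tier(score))  # → voice_correction: fixed before dispatch, not after failure
```

The point of the tiers is that a weak score triggers a conversation, not a rejection — the cheapest correction happens while the parcel is still in the hub.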
Core capabilities
| Capability | What it does |
|---|---|
| Multi-layer parser | Rule-based tokenizer (country-specific), followed by an LLM-based extractor that handles vernacular, transliteration, and multi-script inputs. Returns a canonical JSON with house, street, locality, landmark, pincode, city, country. |
| Confidence scoring | Every address gets a 0–100 confidence score on four dimensions — parse quality, geocode precision, polygon match, historical success rate for the pincode. Tiered thresholds drive routing downstream. |
| Polygon library | Country-specific polygon libraries (India pincode-level, UAE community-level, Southeast Asia district-level) with operator-validated overrides. A geocode that lands in the wrong polygon is flagged, not used. |
| Agentic voice correction | When confidence falls below threshold, a voice agent calls the customer in the local language, asks targeted clarifying questions (landmark, nearest shop, gate code), and writes the corrected address back. Auto-approves when voice confidence exceeds policy. |
| Agentic WhatsApp correction | For markets and customers who prefer chat, a WhatsApp agent messages the customer with a smart-quote flow — “Is it the building with the green gate next to the pharmacy?” — and disambiguates the drop. |
| Duplicate and variant detection | Merges variants of the same physical address (“Flat 4B, Green Tower” vs “4B Green Twr” vs “Green Tower apt 4”) into a canonical entity. Reduces index bloat and improves historical scoring. |
| Landmark and POI enrichment | Matches free-text landmarks (“opposite SBI ATM”) against a POI database and converts them to geocoded anchors. Critical for markets where street naming is thin. |
| Gap-fill via LLM | When the customer writes a partial address, the LLM fills structured gaps using context — order history, delivery history, city heuristics — and flags every gap for human review above a risk threshold. |
| Batch and real-time modes | Real-time API for consignment note (CN) creation and at-pickup validation. Batch mode for order-book backfill and historical cleansing. Same model, same scoring, different cadence. |
| Driver-in-the-loop feedback | Every failed or corrected delivery writes back into the training ledger — the delivery the driver actually made becomes the ground truth for tomorrow. The model learns the city. |
| Fraud signal integration | Addresses with repeated failed-delivery patterns, suspicious landmark combinations, or high-risk postal clusters feed the fraud detection pipeline — shared with Atlas for incident coupling. |
| Privacy-first design | PII-safe LLM prompts, regional data residency, configurable retention windows. Voice and chat transcripts encrypted at rest and pruned per customer policy. |
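The duplicate-and-variant capability above can be illustrated with a toy canonicalization pass. The abbreviation map, the stop-word list, and the order-insensitive key are assumptions for the sketch — a production matcher would also fuzzy-match near-variants like “4” vs. “4B”:

```python
# Toy sketch of variant detection: normalize abbreviations, drop unit-type
# words, and key each address on its sorted token set. The maps below are
# illustrative assumptions, not Shipsy's actual matching logic.

import re

ABBREVIATIONS = {"twr": "tower", "bldg": "building", "rd": "road"}
UNIT_WORDS = {"flat", "apt", "apartment", "unit"}  # carry no identity

def canonical_key(address: str) -> str:
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens if t not in UNIT_WORDS]
    return " ".join(sorted(tokens))  # order-insensitive canonical key

variants = ["Flat 4B, Green Tower", "4B Green Twr", "Green Tower apt 4B"]
keys = {canonical_key(v) for v in variants}
print(len(keys))  # → 1: all three variants collapse to one canonical entity
```

Collapsing variants matters beyond index hygiene: historical success rates are only meaningful if all deliveries to the same door count against the same entity.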
How it works
The pipeline is event-driven. When a consignment note is created, an address.received event lands on the bus. The Parser runs the rule-based + LLM-based stack. The Geocoder resolves coordinates and matches against the polygon library. The Confidence Engine assigns a tiered score. Below threshold, the Correction Orchestrator dispatches a voice or WhatsApp agent. Above threshold, the address is stamped with a confidence tier and moves downstream to route planning. The Feedback Collector writes every outcome — delivered first attempt, failed, corrected by driver — back into the training ledger.
The execution flow shows how the agentic correction works in practice — what used to be a failed delivery becomes a 90-second phone call before dispatch.
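The event-driven flow can be sketched as a single handler for the address.received event. The event shape, stub implementations, and threshold are assumptions — in production the parser is a rule-based + LLM stack and the correction step is an actual customer conversation:

```python
# Self-contained sketch of the event-driven pipeline. Event fields,
# stub logic, and the threshold are illustrative assumptions, not the
# actual Shipsy bus contract.

CONFIDENCE_THRESHOLD = 60

def parse(raw: str) -> dict:
    # Stand-in for the rule-based + LLM parser: split off a trailing
    # city token just to keep the sketch runnable.
    parts = [p.strip() for p in raw.split(",")]
    return {"free_text": raw, "city": parts[-1] if len(parts) > 1 else None}

def score(parsed: dict) -> int:
    # Stand-in for the four-dimension Confidence Engine.
    return 80 if parsed.get("city") else 40

def correct_via_agent(parsed: dict) -> dict:
    # Stand-in for the voice/WhatsApp Correction Orchestrator: in
    # production this is a customer conversation, not a default value.
    return {**parsed, "city": parsed.get("city") or "Dubai", "corrected": True}

def on_address_received(event: dict) -> dict:
    parsed = parse(event["address_text"])
    s = score(parsed)
    if s < CONFIDENCE_THRESHOLD:
        parsed = correct_via_agent(parsed)   # agentic correction before dispatch
        s = score(parsed)                    # confidence re-scored after correction
    return {"address": parsed, "confidence": s}

result = on_address_received({"type": "address.received",
                              "address_text": "blue door near the mosque"})
print(result["confidence"])  # re-scored upward after agentic correction
```

The re-scoring step is the important part: a corrected address goes back through the same Confidence Engine, so downstream route planning never has to trust an unverified fix.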
Proven outcomes
| Customer type and scale | Outcome |
|---|---|
| A national postal operator serving 200+ countries | 90% first-attempt delivery rate; 25% reduction in manual workload; 12–18%+ cost reduction |
| A national postal operator serving 15+ atoll offices and 172 postal agencies (island nation) | Full transparency and real-time tracking across complex sea routes; higher SLA adherence via AI incident detection; fraud/failed-delivery prevention |
| India’s largest pharmacy chain, 3,000+ delivery riders | 17 incident types auto-detected; address confidence scoring calls customers when input is weak; 3,000+ selfies auto-processed per shift (sister mechanism) |
| A premium Indian B2B express network, 49 cities, 3,500+ pincodes | 90%+ first-attempt delivery rate (FADR), up from ~75%; real-time address validation and correction at CN creation; 16–18% cost savings at network level |
| A global alco-bev leader across 70+ countries | 50% reduction in failed deliveries due to “customer not present”; 28% reduction in excessive-stay payments to LSPs |
Integrations
- Order sources — Oracle ERP, SAP, Salesforce, Shopify, Magento, Veeva, custom OMS via REST and Kafka
- Geocoders — Google Maps, HERE, Mapbox, OpenStreetMap (country-appropriate), with Shipsy polygon overlays
- Voice and messaging — Voice over SIP, Twilio, Exotel (India), WhatsApp Business API, Telegram for select markets
- LLM providers — Gemini Flash, Claude, GPT-family — provider-agnostic with PII-safe prompting
- Driver apps — Shipsy Driver App, third-party apps via REST; driver correction loop writes back to the ledger
- Data platforms — Snowflake, BigQuery, Databricks; every address decision stream-exported
- Sister agents and mechanisms — Astra (planning), Atlas (control tower), Clara (CX), Micro-Cluster Route Optimization
Deployment
Phase 1 — Discovery (weeks 1–2). Address corpus audit — sample 10,000 historical addresses, score confidence, classify failure modes. Country-specific parser rules reviewed, polygon library scoped. Success criterion: baseline FADR and baseline confidence distribution documented.
Phase 2 — Configuration (weeks 2–5). Parser tuning per country, polygon library loaded, LLM prompt library configured, voice and WhatsApp agents wired to local telephony and language packs. Shadow mode runs against the live order book.
Phase 3 — Pilot (weeks 4–7). One city or region goes live. Real-time mode on CN creation. Tier 1 autonomy — voice and WhatsApp corrections require supervisor approval for the first week, then unlock. Baseline FADR is measured weekly.
Phase 4 — Scale (weeks 8–14). Progressive rollout, typically 2–3 regions per week. Tier 3 autonomy unlocks for high-confidence auto-corrections. Retraining cadence set (weekly for the first quarter, monthly thereafter). Governance review with ops leadership.
Most enterprises reach steady-state in 10–14 weeks. The FADR lift typically lands by month three in mature deployments. Success criteria are pre-agreed: first-attempt delivery rate, confidence distribution, voice-agent resolution rate, failed-delivery cost reduction.
Security and compliance
- SOC 2 Type II, ISO 27001, GDPR-ready data handling with PII minimization in LLM prompts
- Regional data residency — EU, India, Middle East, APAC hosting options
- Voice and chat transcripts encrypted at rest, configurable retention (default 30 days, customer-configurable)
- Full audit trail on every address decision — input, parse, geocode, confidence, correction method, approver
- 21 CFR Part 11, GDP (Good Distribution Practice), and GMP-aligned audit trails for pharma cold-chain deployments
- Consent flows for voice and WhatsApp contact — respects local regulations (TRAI, GDPR, TCPA)
- Fraud detection integration with Atlas incident coupling
Case study callouts
A national postal operator serving 200+ countries
Hit a 90% first-attempt delivery rate and cut manual workload by 25% by deploying AI Address Intelligence at the booking layer — the voice and WhatsApp agents correct addresses before the parcel ever leaves the sorting hub. Cost reduction of 12–18%+.
A national postal operator across 15+ atolls and 172 postal agencies
Island-nation geography means sea routes, patchy addresses, and long recovery windows when a delivery fails. AI Address Intelligence combined with AI incident detection delivered full transparency and real-time tracking — and “a real edge,” in the CCO’s words, across first, middle, and last mile.
India’s largest pharmacy chain · 3,000+ delivery riders
Address Intelligence scores every delivery address at dispatch, fills gaps via LLM, and calls the customer when confidence is low — part of a broader AI stack that auto-detects 17 incident types and gives cluster managers a 30-minute early warning before SLA breaches.