The problem it solves
Most failed deliveries aren’t a routing problem. They’re an address problem. “Near the second mosque, blue door, call when you reach” is a valid human address and an uninterpretable machine input. In emerging markets, 30–50% of shipping addresses are unstructured, partial, misspelled, geocoded to the wrong polygon, or point to a building that has three entrances on three different streets. In mature markets, the failure mode is subtler — apartment numbers missing, gate codes absent, customer-not-present because the window was wrong. Every one of these becomes a re-attempt. Every re-attempt is roughly USD 3–10 of direct cost, a dented NPS, and a driver hour burnt. At national-postal scale, bad addresses are the single largest lever on first-attempt delivery rate — larger than routing, larger than driver skill, larger than vehicle type. AI Address Intelligence exists to make a bad address deliverable before the parcel leaves the hub.
What it is
AI Address Intelligence is Shipsy’s address normalization and enrichment pipeline — the signature mechanism that turns free-text, voice-note, and emoji-laden addresses into structured, geocoded, deliverable jobs. It is not a geocoder. It is a layered system: rule-based parsing, LLM-driven extraction, confidence scoring, polygon-library matching, and — critically — agentic voice and WhatsApp correction when confidence is low. Where a traditional geocoder returns “no match” and gives up, AI Address Intelligence calls the customer, corrects the address, and writes the enriched payload back. What’s genuinely new: the pipeline doesn’t sit in a batch job running the night before — it runs at the moment the consignment note is created, at the moment the parcel is picked up, and again at the moment it’s dispatched to a driver. Confidence is re-scored at every stage. By the time the parcel reaches the last mile, 90%+ of addresses are deliverable first attempt.
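A minimal sketch of the layered flow just described — parse into a canonical payload, score confidence on four dimensions, and route low-confidence addresses to agentic correction before dispatch. Every field name, weight, and threshold here is an illustrative assumption, not Shipsy’s actual schema or policy:

```python
# Illustrative sketch only: field names, equal weights, and tier cutoffs
# are assumptions, not the production schema or thresholds.

from dataclasses import dataclass

@dataclass
class Confidence:
    parse_quality: float        # how cleanly the free text was structured
    geocode_precision: float    # rooftop vs. locality-level geocode
    polygon_match: float        # geocode lands in the expected polygon
    historical_success: float   # first-attempt rate for the pincode

    def overall(self) -> float:
        dims = (self.parse_quality, self.geocode_precision,
                self.polygon_match, self.historical_success)
        return sum(dims) / len(dims)   # equal weights, for illustration only

def tier(score: float) -> str:
    """Map a 0-100 score to a downstream action (cutoffs are assumptions)."""
    if score >= 85:
        return "auto_dispatch"         # deliverable as-is
    if score >= 60:
        return "whatsapp_correction"   # chat agent disambiguates
    return "voice_correction"          # voice agent calls the customer

# "Near the second mosque, blue door" parsed into a canonical shape:
parsed = {"house": None, "street": None, "locality": None,
          "landmark": "second mosque, blue door", "pincode": None,
          "city": "Dubai", "country": "AE"}
score = Confidence(parse_quality=55, geocode_precision=40,
                   polygon_match=70, historical_success=65).overall()
print(tier(score))  # → voice_correction: fixed before dispatch, not after failure
```

The point of the tiers is that a weak score triggers a conversation, not a rejection — the cheapest correction happens while the parcel is still in the hub.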
Core capabilities
| Capability | What it does |
|---|---|
| Multi-layer parser | Rule-based tokenizer (country-specific), followed by an LLM-based extractor that handles vernacular, transliteration, and multi-script inputs. Returns a canonical JSON with house, street, locality, landmark, pincode, city, country. |
| Confidence scoring | Every address gets a 0–100 confidence score on four dimensions — parse quality, geocode precision, polygon match, historical success rate for the pincode. Tiered thresholds drive routing downstream. |
| Polygon library | Country-specific polygon libraries (India pincode-level, UAE community-level, Southeast Asia district-level) with operator-validated overrides. A geocode that lands in the wrong polygon is flagged, not used. |
| Agentic voice correction | When confidence falls below threshold, a voice agent calls the customer in the local language, asks targeted clarifying questions (landmark, nearest shop, gate code), and writes the corrected address back. Auto-approves when voice confidence exceeds policy. |
| Agentic WhatsApp correction | For markets and customers who prefer chat, a WhatsApp agent messages the customer with a smart-quote flow — “Is it the building with the green gate next to the pharmacy?” — and disambiguates the drop. |
| Duplicate and variant detection | Merges variants of the same physical address (“Flat 4B, Green Tower” vs “4B Green Twr” vs “Green Tower apt 4”) into a canonical entity. Reduces index bloat and improves historical scoring. |
| Landmark and POI enrichment | Matches free-text landmarks (“opposite SBI ATM”) against a POI database and converts them to geocoded anchors. Critical for markets where street naming is thin. |
| Gap-fill via LLM | When the customer writes a partial address, the LLM fills structured gaps using context — order history, delivery history, city heuristics — and flags every gap for human review above a risk threshold. |
| Batch and real-time modes | Real-time API for consignment note (CN) creation and at-pickup validation. Batch mode for order-book backfill and historical cleansing. Same model, same scoring, different cadence. |
| Driver-in-the-loop feedback | Every failed or corrected delivery writes back into the training ledger — the delivery the driver actually made becomes the ground truth for tomorrow. The model learns the city. |
| Fraud signal integration | Addresses with repeated failed-delivery patterns, suspicious landmark combinations, or high-risk postal clusters feed the fraud detection pipeline — shared with Atlas for incident coupling. |
| Privacy-first design | PII-safe LLM prompts, regional data residency, configurable retention windows. Voice and chat transcripts encrypted at rest and pruned per customer policy. |
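The duplicate-and-variant capability above can be illustrated with a toy canonicalization pass. The abbreviation map, the stop-word list, and the order-insensitive key are assumptions for the sketch — a production matcher would also fuzzy-match near-variants like “4” vs. “4B”:

```python
# Toy sketch of variant detection: normalize abbreviations, drop unit-type
# words, and key each address on its sorted token set. The maps below are
# illustrative assumptions, not Shipsy's actual matching logic.

import re

ABBREVIATIONS = {"twr": "tower", "bldg": "building", "rd": "road"}
UNIT_WORDS = {"flat", "apt", "apartment", "unit"}  # carry no identity

def canonical_key(address: str) -> str:
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens if t not in UNIT_WORDS]
    return " ".join(sorted(tokens))  # order-insensitive canonical key

variants = ["Flat 4B, Green Tower", "4B Green Twr", "Green Tower apt 4B"]
keys = {canonical_key(v) for v in variants}
print(len(keys))  # → 1: all three variants collapse to one canonical entity
```

Collapsing variants matters beyond index hygiene: historical success rates are only meaningful if all deliveries to the same door count against the same entity.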
How it works
The pipeline is event-driven. When a consignment note is created, an address.received event lands on the bus. The Parser runs the rule-based + LLM-based stack. The Geocoder resolves coordinates and matches against the polygon library. The Confidence Engine assigns a tiered score. Below threshold, the Correction Orchestrator dispatches a voice or WhatsApp agent. Above threshold, the address is stamped with a confidence tier and moves downstream to route planning. The Feedback Collector writes every outcome — delivered first attempt, failed, corrected by driver — back into the training ledger.
The execution flow shows how the agentic correction works in practice — what used to be a failed delivery becomes a 90-second phone call before dispatch.
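The event-driven flow can be sketched as a single handler for the address.received event. The event shape, stub implementations, and threshold are assumptions — in production the parser is a rule-based + LLM stack and the correction step is an actual customer conversation:

```python
# Self-contained sketch of the event-driven pipeline. Event fields,
# stub logic, and the threshold are illustrative assumptions, not the
# actual Shipsy bus contract.

CONFIDENCE_THRESHOLD = 60

def parse(raw: str) -> dict:
    # Stand-in for the rule-based + LLM parser: split off a trailing
    # city token just to keep the sketch runnable.
    parts = [p.strip() for p in raw.split(",")]
    return {"free_text": raw, "city": parts[-1] if len(parts) > 1 else None}

def score(parsed: dict) -> int:
    # Stand-in for the four-dimension Confidence Engine.
    return 80 if parsed.get("city") else 40

def correct_via_agent(parsed: dict) -> dict:
    # Stand-in for the voice/WhatsApp Correction Orchestrator: in
    # production this is a customer conversation, not a default value.
    return {**parsed, "city": parsed.get("city") or "Dubai", "corrected": True}

def on_address_received(event: dict) -> dict:
    parsed = parse(event["address_text"])
    s = score(parsed)
    if s < CONFIDENCE_THRESHOLD:
        parsed = correct_via_agent(parsed)   # agentic correction before dispatch
        s = score(parsed)                    # confidence re-scored after correction
    return {"address": parsed, "confidence": s}

result = on_address_received({"type": "address.received",
                              "address_text": "blue door near the mosque"})
print(result["confidence"])  # re-scored upward after agentic correction
```

The re-scoring step is the important part: a corrected address goes back through the same Confidence Engine, so downstream route planning never has to trust an unverified fix.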
Proven outcomes
| Customer type and scale | Outcome |
|---|---|
| A national postal operator serving 200+ countries | 90% first-attempt delivery rate; 25% reduction in manual workload; 12–18%+ cost reduction |
| A national postal operator serving 15+ atoll offices and 172 postal agencies (island nation) | Full transparency and real-time tracking across complex sea routes; higher SLA adherence via AI incident detection; fraud/failed-delivery prevention |
| India’s largest pharmacy chain, 3,000+ delivery riders | 17 incident types auto-detected; address confidence scoring calls customers when input is weak; 3,000+ selfies auto-processed per shift (sister mechanism) |
| A premium Indian B2B express network, 49 cities, 3,500+ pincodes | 90%+ first-attempt delivery rate (FADR), up from ~75%; real-time address validation and correction at CN creation; 16–18% cost savings at network level |
| A global alco-bev leader across 70+ countries | 50% reduction in failed deliveries due to “customer not present”; 28% reduction in excessive-stay payments to LSPs |
Integrations
- Order sources — Oracle ERP, SAP, Salesforce, Shopify, Magento, Veeva, custom OMS via REST and Kafka
- Geocoders — Google Maps, HERE, Mapbox, OpenStreetMap (country-appropriate), with Shipsy polygon overlays
- Voice and messaging — Voice over SIP, Twilio, Exotel (India), WhatsApp Business API, Telegram for select markets
- LLM providers — Gemini Flash, Claude, GPT-family — provider-agnostic with PII-safe prompting
- Driver apps — Shipsy Driver App, third-party apps via REST; driver correction loop writes back to the ledger
- Data platforms — Snowflake, BigQuery, Databricks; every address decision stream-exported
- Sister agents and mechanisms — Astra (planning), Atlas (control tower), Clara (CX), Micro-Cluster Route Optimization
Deployment
Phase 1 — Discovery (weeks 1–2). Address corpus audit — sample 10,000 historical addresses, score confidence, classify failure modes. Country-specific parser rules reviewed, polygon library scoped. Success criterion: baseline FADR and baseline confidence distribution documented.
Phase 2 — Configuration (weeks 2–5). Parser tuning per country, polygon library loaded, LLM prompt library configured, voice and WhatsApp agents wired to local telephony and language packs. Shadow mode runs against the live order book.
Phase 3 — Pilot (weeks 4–7). One city or region goes live. Real-time mode on CN creation. Tier 1 autonomy — voice and WhatsApp corrections require supervisor approval for the first week, then unlock. Baseline FADR is measured weekly.
Phase 4 — Scale (weeks 8–14). Progressive rollout, typically 2–3 regions per week. Tier 3 autonomy unlocks for high-confidence auto-corrections. Retraining cadence set (weekly for the first quarter, monthly thereafter). Governance review with ops leadership.
Most enterprises reach steady-state in 10–14 weeks. The FADR lift typically lands by month three in mature deployments. Success criteria are pre-agreed: first-attempt delivery rate, confidence distribution, voice-agent resolution rate, failed-delivery cost reduction.
Security and compliance
- SOC 2 Type II, ISO 27001, GDPR-ready data handling with PII minimization in LLM prompts
- Regional data residency — EU, India, Middle East, APAC hosting options
- Voice and chat transcripts encrypted at rest, configurable retention (default 30 days, customer-configurable)
- Full audit trail on every address decision — input, parse, geocode, confidence, correction method, approver
- 21 CFR Part 11, GDP (Good Distribution Practice), and GMP-aligned audit trails for pharma cold-chain deployments
- Consent flows for voice and WhatsApp contact — respects local regulations (TRAI, GDPR, TCPA)
- Fraud detection integration with Atlas incident coupling
Case study callouts
A national postal operator serving 200+ countries
Hit a 90% first-attempt delivery rate and cut manual workload by 25% by deploying AI Address Intelligence at the booking layer — the voice and WhatsApp agents correct addresses before the parcel ever leaves the sorting hub. Cost reduction of 12–18%+.
A national postal operator across 15+ atolls and 172 postal agencies
Island-nation geography means sea routes, patchy addresses, and long recovery windows when a delivery fails. AI Address Intelligence combined with AI incident detection delivered full transparency and real-time tracking — and “a real edge,” in the CCO’s words, across first, middle, and last mile.
India’s largest pharmacy chain · 3,000+ delivery riders
Address Intelligence scores every delivery address at dispatch, fills gaps via LLM, and calls the customer when confidence is low — part of a broader AI stack that auto-detects 17 incident types and gives cluster managers a 30-minute early warning before SLA breaches.