A corpus no one else has
1.2M part listings cross-linked by OEM number, vehicle model, engine code and gearbox variant. Merged with workshop-forum vocabulary and 40k pages of scanned Indian OEM manuals.
A small, fast embedding model trained on India's auto-part catalogs, OEM cross-references, and the Hinglish a mechanic actually uses. Because "Swift dicky strut" should find the tailgate gas spring — not nothing at all.
India's auto aftermarket runs on WhatsApp voice notes, dog-eared parts catalogs, and a workshop vocabulary that no keyword search has ever understood.
A buyer asks for a "Scorpio dicky wala strut, 2015, diesel ka". The catalog lists it as Tailgate Lift Support — Mahindra Scorpio mHawk 2.2L · 0609AA0044N. Fuzzy match misses. Keyword match misses. Both leave money on the counter.
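To make the miss concrete, here is a minimal token-overlap check between the buyer's query and the catalog title, with punctuation simplified. It is a sketch of plain keyword matching, not of any particular search engine. The only token the two strings share is the vehicle name. A keyword engine therefore scores every Scorpio listing about the same, and none of the words that actually identify the part overlap.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase, then split on anything that is not a letter or a digit."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}

def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set Jaccard similarity: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

query = "Scorpio dicky wala strut, 2015, diesel ka"
# Catalog title from the example above, punctuation simplified.
listing = "Tailgate Lift Support - Mahindra Scorpio mHawk 2.2L 0609AA0044N"

shared = tokens(query) & tokens(listing)
print(shared)                                              # {'scorpio'}
print(round(jaccard(tokens(query), tokens(listing)), 3))   # 0.067
```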
General-purpose embedding models don't know that a dicky is a boot, that a silencer is an exhaust muffler, or that Alto K10 parts interchange with Maruti 800 on 60% of their SKUs. They were trained on the open web — not on the language of a 4-seat workshop in Kashmere Gate.
PartIndex is trained on that language. A compact bi-encoder, fine-tuned on triplets of query, matching part, and near-miss — with a vehicle-compatibility graph underneath it so every answer knows whether it will actually fit.
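One common way to train a bi-encoder on such triplets is a margin (hinge) loss on cosine similarity: loss = max(0, margin - cos(query, part) + cos(query, near_miss)). The loss reaches zero once the query sits closer to the matching part than to the near-miss by at least the margin. The sketch below assumes that objective and a margin of 0.2. The source does not say which loss PartIndex actually uses, and a real training loop would compute this over batches in an autodiff framework rather than in plain Python.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def triplet_loss(q: list[float], pos: list[float], neg: list[float],
                 margin: float = 0.2) -> float:
    """Hinge on the similarity gap; margin=0.2 is an assumed value."""
    return max(0.0, margin - cosine(q, pos) + cosine(q, neg))

# Matching part clearly closer than the near-miss: no loss.
print(triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))   # 0.0
# Near-miss closer than the matching part: positive loss pushes them apart.
print(triplet_loss([1.0, 0.0], [0.6, 0.8], [0.8, 0.6]))
```

The near-miss is what makes the objective useful here. A random negative (a muffler for a "dicky strut" query) is already far away and contributes no gradient, while a near-miss (a bonnet strut for the same Scorpio) is exactly the confusion the model has to learn to separate.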
Fine-tuned on ~3.4M triplets of (query, matching part, near-miss). 260M parameters. Runs on a single NVIDIA A10 GPU at under 50 ms p95 latency. No LLM round-trip in the hot path.
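The latency budget is achievable because a bi-encoder embeds the catalog ahead of time. At query time only the query passes through the encoder, followed by a similarity scan over stored vectors. A minimal sketch, assuming unit-normalized embeddings. The 3-dimensional vectors and the SKUs other than 0609AA0044N are hypothetical stand-ins for real model output, and a production index would use approximate nearest-neighbour search instead of this linear scan.

```python
import heapq
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Offline step: embed every listing once and store unit vectors.
# Hypothetical toy vectors, not real PartIndex embeddings.
catalog: dict[str, list[float]] = {
    "0609AA0044N": normalize([0.9, 0.1, 0.0]),     # tailgate lift support
    "SKU-CLUTCH-01": normalize([0.1, 0.9, 0.1]),   # hypothetical clutch kit
    "SKU-MUFFLER-07": normalize([0.0, 0.2, 0.9]),  # hypothetical silencer
}

def search(query_vec: list[float], k: int = 2) -> list[str]:
    """Online step: one normalized query vector, dot products, top-k by score."""
    q = normalize(query_vec)
    scores = ((sum(a * b for a, b in zip(q, v)), sku) for sku, v in catalog.items())
    return [sku for _, sku in heapq.nlargest(k, scores)]

print(search([0.8, 0.2, 0.0]))   # ['0609AA0044N', 'SKU-CLUTCH-01']
```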
Every SKU is bound to the vehicles it fits. "Will this clutch work on my Scorpio?" gets a straight yes or no — not a confidence score dressed up as one.
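A fitment check of this kind can be a plain set lookup rather than a model output, which is what makes the answer a yes or no. The sketch below assumes a vehicle is identified by model, engine code and gearbox variant, the fields the corpus is described as cross-linked on. The `FITMENT` table, the `CLUTCH-KIT-X` SKU and the gearbox labels are hypothetical. Running retrieval first and then `compatible_only` keeps the ranking while dropping parts that will not fit.

```python
from typing import NamedTuple

class Vehicle(NamedTuple):
    model: str
    engine_code: str
    gearbox: str

# Hypothetical fitment edges: SKU -> set of vehicle variants it is bound to.
FITMENT: dict[str, set[Vehicle]] = {
    "0609AA0044N": {Vehicle("Scorpio", "mHawk 2.2L", "MT")},
    "CLUTCH-KIT-X": {Vehicle("Scorpio", "mHawk 2.2L", "MT")},
}

def fits(sku: str, vehicle: Vehicle) -> bool:
    """Exact membership test on the fitment table: a boolean, not a score."""
    return vehicle in FITMENT.get(sku, set())

def compatible_only(ranked_skus: list[str], vehicle: Vehicle) -> list[str]:
    """Filter retrieval results to parts that fit, preserving rank order."""
    return [s for s in ranked_skus if fits(s, vehicle)]

my_scorpio = Vehicle("Scorpio", "mHawk 2.2L", "MT")
print(fits("CLUTCH-KIT-X", my_scorpio))                              # True
print(fits("CLUTCH-KIT-X", Vehicle("Scorpio", "mHawk 2.2L", "AT")))  # False
```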
REST + webhook search, a drop-in React widget, a compat-check endpoint, and weekly delta updates. Tenant-scoped API keys, RBAC, and inference PoPs in India and US-East.
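Weekly delta updates imply that a client keeps a local copy of the index and patches it instead of re-downloading the whole catalog. The delta shape here (`upserts` plus `deletes`) is a hypothetical illustration, not a documented PartIndex schema. Building a new dict and swapping it in lets in-flight searches keep reading the old index until the update is complete.

```python
from dataclasses import dataclass, field

@dataclass
class Delta:
    """Hypothetical weekly delta: listings to add or replace, SKUs to retire."""
    upserts: dict[str, list[float]] = field(default_factory=dict)
    deletes: set[str] = field(default_factory=set)

def apply_delta(index: dict[str, list[float]], delta: Delta) -> dict[str, list[float]]:
    """Return a new index with the delta applied; the input is left untouched.

    Deletes are applied after upserts, so a SKU in both sets ends up removed.
    """
    updated = dict(index)
    updated.update(delta.upserts)
    for sku in delta.deletes:
        updated.pop(sku, None)
    return updated

live = {"A": [1.0], "B": [2.0]}
live = apply_delta(live, Delta(upserts={"B": [3.0], "C": [4.0]}, deletes={"A"}))
print(live)   # {'B': [3.0], 'C': [4.0]}
```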
If you run an auto-parts marketplace, a distributor network, a workshop-facing app, or an OEM e-commerce surface, we'd like to hear from you. One line about your use case is enough — the rest is a conversation.