Not Just Code

Making LLMs Efficient for Survey Cleaning: My Journey from Arrays to Choice Maps

Karthik Sai — Mon, 28 Apr 2025 14:20:00 GMT

Problem Statement

I was building an AI pipeline to clean survey responses. The data looked like this:

Sample Question:

{
  "id": 3271,
  "text": "How satisfied are you with our service?",
  "choices": [
    { "id": 1, "label": "Very Satisfied" },
    { "id": 2, "label": "Neutral" },
    { "id": 3, "label": "Dissatisfied" }
  ]
}

Sample Response:

{
  "responseId": 1001,
  "responses": [
    { "questionId": 3271, "responses": "2" }
  ]
}

Simple, right? The user selected 2, meaning "Neutral".
Now, when sending batches of survey responses to an LLM for cleaning and fraud detection, I had one big question in mind: how do I send questions and responses efficiently, without wasting tokens and slowing the model down?

My Thought Process

Initially, I thought: "Come on, just send the full questions array and the responses array. Simple."
So I was packing:

Full questions (with choices array)
Full responses (with choiceIds)

But slowly I realised...

Every batch was sending the same choices again and again. For every user response, the LLM had to read the question's choices, scan the array, and match the choiceId.
Even a small survey was easily eating 2k-3k tokens just for system context!
Then I thought:

"What if instead of sending same data again and again, I somehow make the choice lookup easier for the model?"

The Three Options I Explored

Option 1: Keep Choices as Array (Default)

Each question has a choices: [{ id, label }] array.
The response uses a choiceId.
The LLM scans the array to match it.

Pros: Tiny initial payload.

Cons:

The model effectively has to scan the whole array (think O(n)) to match each ID.
Slower reasoning.
Wastes attention and tokens as the survey grows.

(Imagine scanning 10 choices manually every time. Ugh.)

Option 2: Expand Label Inside Every Response

Instead of sending the choiceId, I replace it with the label itself: "Neutral", "Dissatisfied", etc.
Responses are directly readable by the model.

Pros: Fast LLM understanding.

Cons:

Response size doubles or triples.
Huge token waste.
Not good for batches of 10k+ responses.

(At small scale it's OK, but at big scale: 🪦 RIP tokens!)

Option 3: Prebuilt Choice Map per Question

Build a map like:

{
  "3271": {
    "1": "Very Satisfied",
    "2": "Neutral",
    "3": "Dissatisfied"
  }
}

The response stays as a choiceId ("2").
The LLM resolves it with a direct key lookup in the map (think O(1)) instead of scanning an array.

Pros:

One-time, small cost.
Fastest reasoning.
Smallest token usage in the long term.
Holds up at 100k or 1M responses.

Cons:

Slightly more backend work to generate the map.

(But honestly, once it's done, it's clean and scalable!)
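The preprocessing step for Option 3 can be sketched in a few lines of Python. This is a minimal sketch, assuming questions arrive in the same shape as the sample question above; `build_choice_map` is a hypothetical helper name, not part of any library.

```python
import json


def build_choice_map(questions):
    """Turn each question's choices array into {question_id: {choice_id: label}}."""
    choice_map = {}
    for question in questions:
        # JSON object keys are always strings, so IDs are stringified up front.
        choice_map[str(question["id"])] = {
            str(choice["id"]): choice["label"] for choice in question["choices"]
        }
    return choice_map


questions = [
    {
        "id": 3271,
        "text": "How satisfied are you with our service?",
        "choices": [
            {"id": 1, "label": "Very Satisfied"},
            {"id": 2, "label": "Neutral"},
            {"id": 3, "label": "Dissatisfied"},
        ],
    }
]

choice_map = build_choice_map(questions)
print(json.dumps(choice_map, indent=2))
```

Stringifying the IDs also means a response value like "2" can be used as a key directly, with no type conversion on either side.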

Final Flow

Survey Questions (choices array)
            ↓
Preprocess into Choice Map (one time)
            ↓
Store Choice Map in System Context
            ↓
Send Responses with choiceId only
            ↓
LLM does O(1) lookup from Map
            ↓
Efficient fraud detection and response validation
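Here is a rough sketch of how the last few steps of this flow might look in code. The prompt wording and both helper names are assumptions for illustration. The pre-check that separates unknown choice IDs before calling the model is an optional extra, not something the flow above requires.

```python
import json


def build_system_context(choice_map):
    # Sent once as system context, not repeated with every response.
    return (
        "You clean survey responses. Resolve each choice ID with this map "
        "of questionId -> choiceId -> label:\n"
        + json.dumps(choice_map, separators=(",", ":"))
    )


def split_known_unknown(responses, choice_map):
    """Separate answers whose choice ID exists in the map from ones that don't."""
    known, unknown = [], []
    for response in responses:
        for answer in response["responses"]:
            labels = choice_map.get(str(answer["questionId"]), {})
            row = {"responseId": response["responseId"], **answer}
            (known if answer["responses"] in labels else unknown).append(row)
    return known, unknown


choice_map = {"3271": {"1": "Very Satisfied", "2": "Neutral", "3": "Dissatisfied"}}
responses = [
    {"responseId": 1001, "responses": [{"questionId": 3271, "responses": "2"}]},
    # Hypothetical bad row: choice "9" does not exist for question 3271.
    {"responseId": 1002, "responses": [{"questionId": 3271, "responses": "9"}]},
]

system_context = build_system_context(choice_map)
known, unknown = split_known_unknown(responses, choice_map)
print(system_context)
print("to LLM:", known)
print("flagged before the LLM call:", unknown)
```

Rows with an ID that is missing from the map can be flagged without spending any tokens on them.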

✅ So what are the advantages?

No duplicate choices in every batch.
No ballooning of response size.
No array scanning overhead for LLM.

Key Benefits

Approach	Token Usage	LLM Speed	Scale Readiness
Choices as Array	Medium	Medium	Ok only for small surveys
Expanded Labels	High	Fast	Very costly at scale
Prebuilt Choice Map	Low	Fastest	Best for 100k+ responses
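To get a feel for the comparison on your own data, a crude size check like the one below can help. It is only a sketch: it counts characters of compact JSON as a stand-in for tokens, and the 1,000 answers are made up. Real token counts depend on the model's tokenizer, so treat the numbers as relative, not absolute.

```python
import json


def compact_size(payload):
    # Character count of compact JSON: a crude stand-in for token count.
    return len(json.dumps(payload, separators=(",", ":")))


choices = [
    {"id": 1, "label": "Very Satisfied"},
    {"id": 2, "label": "Neutral"},
    {"id": 3, "label": "Dissatisfied"},
]
choice_map = {str(c["id"]): c["label"] for c in choices}

# 1,000 hypothetical answers cycling through the three choices.
answers_by_id = [{"questionId": 3271, "responses": str(i % 3 + 1)} for i in range(1000)]
answers_by_label = [
    {"questionId": 3271, "responses": choice_map[a["responses"]]} for a in answers_by_id
]

sizes = {
    "choices as array": compact_size(choices) + compact_size(answers_by_id),
    "expanded labels": compact_size(answers_by_label),
    "choice map": compact_size(choice_map) + compact_size(answers_by_id),
}
for name, size in sizes.items():
    print(f"{name}: {size} characters")
```

With a single question the gap between the array and the map is small. It grows with the number of questions and with how often the choices get resent.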

💡 Final Thought

Sometimes small design decisions, like whether to send a list or a map, matter A LOT when you want to scale cleanly.
I learned this by thinking deeply about:

Token cost
LLM cognitive load
Real-world scaling for lakhs of survey responses

TL;DR

This idea is not only for surveys! It can be applied wherever structured choices are involved.
Some real examples:

Auto-grading MCQ exams at scale (education apps).
Screening candidate forms in HRTech startups.
Cleaning healthcare intake forms efficiently.
Processing ecommerce customer feedback forms cheaply.
Analyzing product satisfaction surveys in SaaS platforms.

Main benefits of using Maps in AI pipelines:

✅ Save massive tokens.
✅ Make LLM think faster.
✅ Scale to millions of records easily.
✅ Keep backend and API payloads clean and simple.

Thanks for reading! 🙏
If you're building AI pipelines like this, comment your thoughts and approaches.

Smarter Techniques for Optimizing User Queries in RAG Systems

Karthik Sai — Fri, 11 Apr 2025 10:52:36 GMT

When I first built my Retrieval-Augmented Generation (RAG) pipeline, the flow was pretty standard. Users could upload notes, textbooks, PDFs, or exam material, and I’d chunk it, embed it, and store the embeddings in Pinecone. On the user side, they’d ask a question, I’d embed it too, run a semantic vector search, and send the top-k chunks to the LLM for answering.
Simple, effective, and surprisingly decent.

But it didn’t feel intelligent.

The Problem: Generic Questions, Overloaded Vector Search

Real users don’t ask specific, filtered questions.
They just ask “What’s the Doppler effect?” or “Why do dogs hear better than humans?”

At this point, I was running the query across the entire namespace or file, across all subjects, all topics, all difficulties, and just hoping the top semantic match was good enough. And while vector search is powerful, it’s not immune to:

Noisy matches
Overlapping concepts
Chunks from unrelated topics being semantically similar (especially in educational content)

So, I started thinking:

“I already have metadata on each chunk… can I use that to tighten my vector search?”

Step 1: Metadata Tagging at Chunk Time

I used an LLM (GPT-4o-mini) to extract structured metadata for each chunk during ingestion:

subject → Physics, Biology, Chemistry, etc.
topic and subtopic
difficulty (easy/medium/hard)
keywords (up to 5 terms)

This gave me control. I could now store this metadata alongside each vector in Pinecone and apply structured filters like:

filter: { fileId: ..., subject: "Physics", topic: "Sound" }

This was a big upgrade. Suddenly, vector search became more focused and more accurate. But that was only part of the equation.
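As a rough illustration of Step 1, here is a small Python sketch that validates the extracted tags for one chunk and builds the matching filter. Both helper names, the "file-123" ID, and the sample tag values are hypothetical. The `$eq` operator is Pinecone's metadata filter syntax for an exact match.

```python
VALID_DIFFICULTIES = {"easy", "medium", "hard"}


def make_chunk_metadata(file_id, subject, topic, subtopic, difficulty, keywords):
    """Validate the tags extracted for one chunk and return the record stored with its vector."""
    if difficulty not in VALID_DIFFICULTIES:
        raise ValueError(f"unexpected difficulty: {difficulty!r}")
    return {
        "fileId": file_id,
        "subject": subject,
        "topic": topic,
        "subtopic": subtopic,
        "difficulty": difficulty,
        "keywords": list(keywords)[:5],  # keywords are capped at 5 terms
    }


def build_filter(file_id, subject=None, topic=None):
    """Build an exact-match metadata filter, skipping any field that was not provided."""
    conditions = {"fileId": file_id, "subject": subject, "topic": topic}
    return {key: {"$eq": value} for key, value in conditions.items() if value is not None}


metadata = make_chunk_metadata(
    "file-123", "Physics", "Sound", "Echo", "easy", ["echo", "reflection"]
)
metadata_filter = build_filter("file-123", subject="Physics", topic="Sound")
print(metadata)
print(metadata_filter)
```

With the Pinecone Python client, a dict like `metadata_filter` would typically go in the `filter` argument of `index.query(...)`, next to the query vector and `top_k`.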

Step 2: Reverse Filter Prediction from User Query

Most users don’t select subjects or topics from a dropdown. They just ask the question.
So I thought, why not flip the idea?

What if I use an LLM to guess the subject and topic of the user query and use that to filter the vector search?

So I built a lightweight LLM wrapper that looks at the incoming user query and infers possible metadata like:

{ subject: "Physics", topic: "Sound" }

This worked surprisingly well.
Questions like:

“How does echo differ from reverberation?”
would get tagged as:
{ subject: "Physics", topic: "Sound" } → and boom, only Physics+Sound chunks are searched.
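A wrapper like this could be sketched as follows. Everything here is an assumption for illustration: the prompt wording, the `predict_filters` name, and the `call_llm` parameter, which stands for any function that takes a prompt string and returns the model's reply as text. A fake model is used so the sketch runs without an API key.

```python
import json

ALLOWED_KEYS = ("subject", "topic")


def build_prompt(query):
    return (
        "Guess the subject and topic of this question. "
        'Reply with JSON only, e.g. {"subject": "...", "topic": "..."}.\n'
        f"Question: {query}"
    )


def predict_filters(query, call_llm):
    """Ask the model for a guess and keep only well-formed string fields."""
    raw = call_llm(build_prompt(query))
    try:
        guess = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # unparseable reply: search without predicted filters
    if not isinstance(guess, dict):
        return {}
    return {
        key: guess[key]
        for key in ALLOWED_KEYS
        if isinstance(guess.get(key), str) and guess[key].strip()
    }


def fake_llm(prompt):
    # Stand-in for a real model call.
    return '{"subject": "Physics", "topic": "Sound"}'


predicted = predict_filters("How does echo differ from reverberation?", fake_llm)
print(predicted)
```

Returning an empty dict on a bad reply keeps the search usable: the query simply runs without predicted filters.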

The Roadblock: Hallucinated Filters = 0 Results

Then came the issue.
Sometimes the LLM would guess metadata that didn’t actually exist in my chunked data.
For example, for a question about acoustics, it guessed:

{ subject: "Physics", topic: "Music" }

Problem? I had no vector chunks tagged with topic: "Music", so Pinecone’s strict filtering returned 0 results.
And here’s the kicker:
Vector DBs like Pinecone, Qdrant, and Weaviate apply metadata filters as exact conditions.
They won’t “fuzzy match” a filter value or try alternatives on their own. No match = no results.
So now I had a clever guess that completely stopped my retrieval process.

Step 3: Building a Fallback System

To handle these edge cases, I added graceful filter fallback logic in the app layer:

Try full filter (fileId + subject + topic)
If no match, try relaxed filter (fileId + subject)
If still nothing, fall back to pure vector search (just fileId, or even the entire namespace)

This gave me:

⚡ Speed when filters worked
🎯 Precision when filters were accurate
🔄 Resilience when filters failed

Now even if the LLM guessed something too niche, I didn’t lose the result. I just searched wider.
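The fallback tiers from Step 3 can be sketched like this. It is a minimal sketch, not the exact app-layer code: `search` stands for any function that takes a metadata filter and returns matching chunks, and the in-memory `fake_search` and "file-123" ID are made up so the example runs without a vector DB.

```python
def search_with_fallback(search, file_id, subject=None, topic=None):
    """Try progressively looser filters; return the first non-empty result and the filter used."""
    candidates = [
        {"fileId": file_id, "subject": subject, "topic": topic},  # full filter
        {"fileId": file_id, "subject": subject},                  # relaxed filter
        {"fileId": file_id},                                      # pure vector search
    ]
    tried = []
    for candidate in candidates:
        metadata_filter = {k: v for k, v in candidate.items() if v is not None}
        if metadata_filter in tried:
            continue  # e.g. no topic was guessed, so two tiers are identical
        tried.append(metadata_filter)
        matches = search(metadata_filter)
        if matches:
            return matches, metadata_filter
    return [], tried[-1]


CHUNKS = [
    {"fileId": "file-123", "subject": "Physics", "topic": "Sound", "text": "Echo vs reverberation"},
]


def fake_search(metadata_filter):
    # Stand-in for a filtered vector query: exact match on every filter field.
    return [c for c in CHUNKS if all(c.get(k) == v for k, v in metadata_filter.items())]


# The model guessed topic "Music", which no chunk is tagged with.
matches, used_filter = search_with_fallback(fake_search, "file-123", "Physics", "Music")
print(used_filter)
print(matches)
```

Returning the filter that actually matched also makes it easy to log how often each tier is hit.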

Step 4: Smarter Guesses Using a Sidecar Metadata DB

But I wasn’t done.
The LLM still occasionally hallucinated weird or off-topic guesses — like "Music" or "Hearing", which weren’t present in my dataset.
So I took it further:

What if the LLM could only choose from actual metadata I already have in the DB?

So I built a small sidecar metadata index, basically a MongoDB collection or even an in-memory cache, that stores all known:

Subjects
Topics
Keywords (per file or per namespace)

Now, before guessing filters, I preload those known values into the LLM prompt like:

Available subjects: [Physics, Chemistry, Biology]
Available topics: [Sound, Light, Motion, Laws of Motion, Photosynthesis]

And the LLM only guesses from this known list.
No more hallucinated filters. No more Music. Just sharp, valid filter predictions.
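A sketch of that idea in Python, assuming the sidecar index has already been loaded into a plain dict. The prompt wording and helper names are hypothetical. The second helper is an extra safety net I am assuming here: it re-checks the model's answer against the known values, since a prompt constraint alone does not strictly guarantee the model stays inside the list.

```python
KNOWN = {
    "subject": ["Physics", "Chemistry", "Biology"],
    "topic": ["Sound", "Light", "Motion", "Laws of Motion", "Photosynthesis"],
}


def build_constrained_prompt(query, known):
    return (
        f"Available subjects: [{', '.join(known['subject'])}]\n"
        f"Available topics: [{', '.join(known['topic'])}]\n"
        "Pick the subject and topic for the question below using only these values. "
        "Reply with JSON only.\n"
        f"Question: {query}"
    )


def keep_known_values(guess, known):
    """Drop any guessed field whose value is not in the sidecar index."""
    lookup = {key: {v.lower(): v for v in values} for key, values in known.items()}
    cleaned = {}
    for key, value in guess.items():
        canonical = lookup.get(key, {}).get(str(value).strip().lower())
        if canonical is not None:
            cleaned[key] = canonical  # normalise casing to the stored value
    return cleaned


prompt = build_constrained_prompt("How does echo differ from reverberation?", KNOWN)
cleaned = keep_known_values({"subject": "physics", "topic": "Music"}, KNOWN)
print(prompt)
print(cleaned)
```

Here the unknown topic "Music" is dropped and "physics" is normalised to the stored "Physics", so the resulting filter can only contain values that exist in the data.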

Final Setup (and why I love it)

Now my query pipeline looks like this:

User asks anything: simple, free-form
LLM guesses subject/topic from a list of known values
Filter-based vector search runs (fileId + guessed filters)
Fallback if no results
Top chunks reranked
Answer generated via LLM

It’s smart. It’s resilient. It’s clean.

🤝 Final Thoughts

This whole process turned out to be one of the most valuable architectural upgrades I’ve made.
It added intelligence to the system without sacrificing robustness, and it cost me no extra infra, just smarter orchestration.
If you’re building a RAG-based system with user-uploaded content, definitely consider:

Storing chunk-level metadata
Letting LLMs predict filters
Building a fallback tree
Feeding known metadata into the LLM prompt

You’ll end up with a system that feels 10x smarter — and your users won’t even know why.

⚠️ A quick word of caution

This kind of intelligent metadata-driven filtering is best suited for high-accuracy domains (like education, legal, healthcare) or apps operating at production scale, where precision, performance, and cost-efficiency truly matter.
If you're building a simple MVP or experimenting with RAG for the first time, you likely don't need this level of orchestration.
In those cases, a clean vector search with a good reranker is more than enough to get you started. Add this layer only when your app (or your users) demand it.