AI Engineering

I Tried RAG on My Production Rails App — Then Replaced It with a $0.002 SQL Generator

I built the full RAG pipeline — vector database, 77,000 embeddings, chunking service. Then I asked a real operational question and realized SQL generation beats vector search for structured data. Here's the $0.04 lesson.

Vibol Teav · 12 min read

The app: a medical services management platform

I maintain a Rails 8 application that manages cleaning operations for medical facilities. Staff check in and out of jobs via GPS, managers approve time-off requests, and the system tracks everything — tens of thousands of jobs, hundreds of users, and a substantial audit trail.

The super admin gets questions like:

  • “Did Staff Member A check in for their job on Tuesday?”
  • “Is this GPS check-in legitimate or did someone fake their location?”
  • “Why is Thursday's job missing from the dashboard?”
  • “How many sick days has the team taken this quarter?”
  • “What changed on this job record? Who changed it?”

Answering each one means SSHing into the server, opening a Rails console, writing a query, interpreting the results. It takes 10-30 minutes per question. I wanted to build an AI assistant that could answer these instantly.

What I built: the full RAG pipeline

I did what every AI tutorial tells you to do. I built RAG — Retrieval Augmented Generation:

  1. Upgraded to MySQL 9 for native VECTOR(512) columns
  2. Added the neighbor gem for vector similarity search
  3. Built a chunking service that converts database records into text passages
  4. Indexed 77,000 chunks from job notes, time-off requests, accident reports, chat messages, site info
  5. Built an embedding service with OpenAI text-embedding-3-small at 512 dimensions
  6. Created a query pipeline: embed question → vector search → feed top 10 chunks to LLM → answer
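Step 3's chunker was essentially a formatter from rows to prose. A minimal sketch of the idea, with illustrative field names rather than the app's actual service:

```ruby
# Convert a structured job-note row into a text passage for embedding.
# Field names are illustrative, not the app's actual schema.
def chunk_for(note)
  <<~CHUNK.strip
    [Job Note] #{note[:title]}
    Job ##{note[:job_id]} | #{note[:client]} | #{note[:site]}
    Date: #{note[:date]}

    #{note[:body]}
  CHUNK
end
```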

Here's what a chunk looked like after indexing:

[Job Note] General Note
Job #4821 | Client A | North Scanning Centre
Date: 2024-08-15

Use the correct colour coding for waste bins.
Ensure sharps containers are sealed before disposal.
Check all hand sanitizer dispensers are refilled.

The infrastructure was impressive. MySQL 9 with native vector columns. Cosine similarity search in SQL. Deduplication via SHA256 digests. Real-time sync via after_commit callbacks. Nightly bulk re-sync via Solid Queue.
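The deduplication step is worth a sketch: hash each chunk's text and skip re-embedding when the digest hasn't changed since the last sync. A minimal version of that idea (the digest store here is a plain hash; the real one lived in a database column):

```ruby
require "digest"

# Skip re-embedding chunks whose text hasn't changed since the last sync.
# `store` maps a chunk key to its last-seen SHA256 digest (illustrative).
def needs_embedding?(store, key, chunk_text)
  digest = Digest::SHA256.hexdigest(chunk_text)
  return false if store[key] == digest # unchanged, reuse existing vector

  store[key] = digest
  true
end
```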

Total embedding cost: $0.55 for 77,000 chunks.

The moment RAG failed

I asked the first real question:

“Which staff have checked in but not checked out?”

RAG found text chunks that mentioned check-ins — job notes about check-in procedures, site instructions about GPS requirements. It returned paragraphs of text. None of them answered the question.

The actual answer is a SQL query:

SELECT u.first_name, u.last_name, ju.checked_in_at, j.id
FROM job_users ju
JOIN users u ON ju.user_id = u.id
JOIN jobs j ON ju.job_id = j.id
WHERE ju.checked_in_at IS NOT NULL
  AND ju.checked_out_at IS NULL
  AND ju.deleted_at IS NULL

That's when it clicked. My data is structured. My questions have exact answers. RAG finds similar text — but I need exact rows.

Every question the admin asks — check-in status, holiday balances, GPS locations, audit trails — maps to a SELECT query with JOINs, WHERE clauses, and aggregations. Vector similarity search is the wrong tool entirely.

The replacement: Text-to-SQL in 200 lines

I ripped out the vector search and replaced it with something much simpler. The entire pipeline is two LLM calls:

Call 1: Send database schema + question → LLM returns a SQL query

Execute: Run the SQL against MySQL (read-only, max 50 rows)

Call 2: Send question + SQL + results → LLM explains the answer
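The whole pipeline fits in one method. Here's a sketch with the LLM call abstracted behind a callable so the shape is clear; in production that callable wraps the Anthropic API, and `run_sql` stands in for the validated read-only executor described later:

```ruby
# Two LLM calls around one read-only SQL execution.
# `llm` is any callable taking (system_prompt, user_message) and returning
# text. `run_sql` executes validated, read-only SQL and returns rows.
def answer_question(question, llm:, schema_prompt:, run_sql:)
  # Call 1: schema + question -> SQL
  sql = llm.call(schema_prompt, question).strip

  # Execute against MySQL (read-only user, capped rows)
  rows = run_sql.call(sql)

  # Call 2: question + SQL + results -> plain-English answer
  context = "Question: #{question}\nSQL: #{sql}\nRows: #{rows.inspect}"
  answer = llm.call("Explain these query results for a non-technical admin.",
                    context)

  { sql: sql, rows: rows, answer: answer }
end
```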

The schema prompt is the key. I describe every table, column, type, and relationship the LLM needs to know:

SYSTEM_PROMPT = <<~PROMPT
  You are a SQL expert for a medical services platform.

  ## Schema
  - users: id, email, first_name, last_name, role(integer),
    archived(boolean), sign_in_count, last_sign_in_at
  - jobs: id, client_id, site_id, datetime, state(string),
    deleted_at(soft delete)
  - job_users: id, job_id, user_id, checked_in_at,
    checked_out_at, bypassed_location(boolean)
  - user_locations: id, user_id, job_id, name(address),
    out_of_range(boolean), parameters(json with distance,
    latitude, longitude)
  - time_off_requests: id, user_id, start_time, end_time,
    state, request_type, duration_in_days, notes
  - versions: id, item_type, item_id, event, whodunnit,
    object_changes(json), created_at

  ## Rules
  1. Only generate SELECT queries
  2. Always LIMIT to 50 rows
  3. Join to users/sites to show names, not IDs
  4. For soft deletes, check deleted_at IS NULL
PROMPT

That's it. No vector database. No embeddings. No chunking pipeline. No MySQL 9. The LLM reads the schema description and writes SQL that runs on plain MySQL 8.4.

The model: Claude Haiku (claude-haiku-4-5-20251001). The cheapest Claude model at $0.25/M input tokens. It consistently generates correct SQL with JOINs, subqueries, aggregations, and date arithmetic.

Real queries against production data

Here are actual results from the system running against production data. Names and identifiers are anonymized.

“Which staff have checked in but not checked out?”

The LLM generated a query joining job_users, users, jobs, clients, and sites. Result:

50 staff members checked in without checking out.

Most concerning:
1. Staff Member A — checked in March 2024 (over a year ago)
   Job state: "started" — never progressed
2. Staff Member B — checked in March 2024
   Job state: "read_note" — unusual state
3. Staff Member C — 7 separate incomplete check-ins
   Suggests repeated check-in issues

Recommendation: Investigate the oldest entries (March 2024)
as these are likely data anomalies that should be manually
resolved.

“Show out-of-range check-ins. Any anomalous patterns?”

The LLM queried the location tracking table, joined to users and jobs, filtered for out_of_range = 1, and analyzed the patterns:

Anomalous patterns detected:

Staff Member D:
- Significantly more out-of-range flags than any other user
- Multiple identical coordinates in quick succession
  (genuine movement shows continuous variation)
- Some recorded distances physically implausible

Staff Member E:
- 100% of location pings were out-of-range across
  multiple jobs

Recommendation: Flag for review by operations manager.
Cross-reference with job completion records.

“Summarize sick leave usage this year”

Sick leave summary (approved requests only):

- Total employees with sick leave: 25
- Range: 1 to 9 days per person
- Top user: 2 requests but longest total duration
- Most frequent: 6 separate requests, shorter each time

Pattern note: one employee has the most frequent
requests despite fewer total days — could indicate
a recurring condition or a pattern worth reviewing
with HR.

“What are the busiest sites?”

Top 5 sites by job volume:
1. Site A — over 1,500 jobs (40% completed)
2. Site B — over 1,300 jobs (37% completed)
3. Site C — over 1,000 jobs (34% completed)
4. Site D — over 1,000 jobs (44% completed)
5. Site E — over 600 jobs (13% completed)

One client dominates with 4 of the top 5 sites.
Site E has the lowest completion rate (13%) — the AI
flagged this and suggested investigating whether it
indicates operational issues or a different job type.

Security: how to safely let AI write SQL

Letting AI generate SQL against production sounds dangerous. It is — if you rely on application-level validation alone. Here's the layered approach:

Layer 1: Database-level (the real guardrail)

Read-Only Database User

The AI assistant connects with a MySQL user that only has SELECT privileges. Even if the LLM generates DROP TABLE, MySQL refuses it. This is the only guardrail that truly matters.

-- Create a read-only user for the AI assistant
CREATE USER 'ai_readonly'@'%' IDENTIFIED BY '...';
GRANT SELECT ON your_database.* TO 'ai_readonly'@'%';

-- Optionally restrict to specific tables
REVOKE SELECT ON your_database.sensitive_table FROM 'ai_readonly'@'%';

Layer 2: Application-level (defense in depth)

Query Validation

Must start with SELECT. Dangerous keywords blocked via word-boundary matching (so deleted_at passes but DELETE is blocked). Max 50 rows via forced LIMIT. Query timeout at 5 seconds.

Column Blocklist

Queries referencing encrypted_password, reset_password_token, or other sensitive columns are rejected before execution.

Table Allowlist

Only operational tables are queryable. System tables like information_schema and auth tables are excluded from the schema prompt entirely.

Full Audit Log

Every question, generated SQL, result count, model used, and duration are logged with the user ID. Every query is reviewable.
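Building the log entry is trivial; the discipline is writing one on every request, including failures. A sketch of the attributes worth capturing — the names are my illustrative guesses, not the app's actual schema, and the hash would be persisted with something like a `QueryLog.create!` call:

```ruby
# Attributes for one audit-log row. Names are illustrative; the point
# is what gets captured, not the exact schema.
def audit_entry(user_id:, question:, sql:, row_count:, model:, duration_ms:,
                error: nil)
  {
    user_id: user_id,
    question: question,
    generated_sql: sql,
    row_count: row_count,
    model: model,
    duration_ms: duration_ms,
    error: error, # log failed queries too; they reveal schema-prompt gaps
    created_at: Time.now
  }
end
```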

Layer 3: Infrastructure (production best practice)

  • Read replica: Point the AI at a replica, not the primary database. Zero risk to production writes, zero load on the primary.
  • Query timeout: Kill any query that runs longer than 5 seconds. Prevents accidental CROSS JOINs from consuming resources.
  • Rate limiting: Cap queries per user per hour. Prevents abuse or runaway automation.
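In a Rails app, rate limiting can be a few lines of configuration. A sketch assuming the rack-attack gem, with an arbitrary limit and a hypothetical `/ai_assistant` route:

```ruby
# config/initializers/rack_attack.rb — assumes the rack-attack gem.
# Cap AI-assistant queries per client per hour; limit and path are
# illustrative.
Rack::Attack.throttle("ai_queries/ip", limit: 30, period: 1.hour) do |req|
  req.ip if req.path.start_with?("/ai_assistant")
end
```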

Important

Application-level SQL validation is defense in depth, not your primary security layer. A determined attacker or a hallucinating LLM can craft queries that bypass regex checks (e.g., SELECT * INTO OUTFILE, subqueries with write operations). The read-only database user is the guardrail that actually protects you.

The application-level validation code:

def validate_and_execute(sql)
  sql = sql.strip.chomp(";")
  normalized = sql.gsub(/\s+/, " ").upcase
  return { error: "Only SELECT" } unless normalized.start_with?("SELECT")

  # Word-boundary check — "deleted_at" passes, "DELETE" is blocked
  sql_words = normalized.split(/[\s;,()]+/)
  dangerous = %w[INSERT UPDATE DELETE DROP ALTER TRUNCATE GRANT REVOKE]
  if dangerous.any? { |kw| sql_words.include?(kw) }
    return { error: "Forbidden keyword" }
  end

  # Block sensitive columns
  blocked = %w[ENCRYPTED_PASSWORD RESET_PASSWORD_TOKEN CONFIRMATION_TOKEN]
  if blocked.any? { |col| normalized.include?(col) }
    return { error: "Sensitive column blocked" }
  end

  # Block system tables
  return { error: "System table blocked" } if normalized.include?("INFORMATION_SCHEMA")

  # Force a row cap if the model omitted one
  sql = "#{sql} LIMIT 50" unless normalized.include?("LIMIT")

  # Execute with a 5-second server-side timeout on this session
  conn = ActiveRecord::Base.connection
  conn.execute("SET SESSION MAX_EXECUTION_TIME=5000")
  { data: conn.select_all(sql).to_a, error: nil }
end

Cost breakdown: $0.04 for 15 queries

I tested 15 real questions against production data. Total cost:

| What | Cost |
| --- | --- |
| 14 queries on Claude Haiku ($0.002 each) | $0.028 |
| 1 query on Claude Sonnet (comparison test) | $0.015 |
| Total (15 queries) | $0.043 |

For comparison, the RAG approach would have cost:

| Component | RAG | Text-to-SQL |
| --- | --- | --- |
| Infrastructure | MySQL 9 + vector columns | Existing MySQL 8.4 |
| Initial indexing | $0.55 (77K embeddings) | $0 (no indexing) |
| Per query | ~$0.003 (embed + LLM) | $0.002 (2 LLM calls) |
| Extra gems | neighbor, tiktoken | anthropic only |
| Sync pipeline | Chunker + embedder + nightly job | None needed |
| Answer accuracy | Fuzzy (similar text) | Exact (real database rows) |

The Sonnet comparison was interesting — it produced slightly more polished prose but generated the same SQL. Haiku at 7.5x cheaper with the same data accuracy is the clear winner for this use case.

When RAG is actually the right choice

RAG isn't bad. I used it wrong. Here's the decision framework:

| Your data is... | Use | Why |
| --- | --- | --- |
| MySQL/Postgres tables with defined columns | Text-to-SQL | Questions have exact answers in rows. SQL can calculate, join, aggregate. |
| PDFs, contracts, legal documents | RAG | Unstructured text. No schema. Need semantic similarity to find relevant passages. |
| Emails, support tickets, free-text notes | RAG | User's vocabulary differs from source text. “Hygiene issues” should match “mopping not done.” |
| Mix of both (structured DB + document attachments) | Hybrid | SQL for operational queries, RAG for document search. Route based on question type. |
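For the hybrid case, the router doesn't need to be clever. A cheap first pass is a keyword heuristic, falling back to SQL for operational phrasing; a real system might instead ask the LLM to classify. The keyword lists here are illustrative:

```ruby
# Route a question to :sql or :rag. Keyword lists are illustrative;
# a production router might use an LLM classification call instead.
SQL_HINTS = %w[many count total which who when status balance missing].freeze
RAG_HINTS = %w[policy procedure contract document says mention note].freeze

def route(question)
  words = question.downcase.scan(/[a-z]+/)
  sql_score = (words & SQL_HINTS).size
  rag_score = (words & RAG_HINTS).size
  rag_score > sql_score ? :rag : :sql
end
```

Defaulting ties to `:sql` matches this app's traffic: most admin questions are operational.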

The mistake I made: treating structured data like unstructured data. My 77,000 “chunks” were just database rows converted to text paragraphs. They were structured all along — I was throwing away the structure by embedding them.

If your app has a relational database and your users ask operational questions, you probably don't need RAG either. A 200-line SQL generator with the cheapest LLM will outperform a vector search pipeline every time.

But I already had a free solution

Here's the part I should have realized sooner. Before building any of this, I had already built a Claude Code skill file that answers the exact same questions — for free, using my existing Claude Code subscription.

The skill generates ActiveRecord queries, runs them on the production server via kamal console, and interprets the results. Same questions, same data, same answers. No API key, no extra cost.
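For reference, a Claude Code skill is just a markdown file with frontmatter telling the agent when to apply it. A stripped-down sketch of the shape (the name, model list, and steps are illustrative, not my actual file):

```markdown
---
name: production-data-investigator
description: Answer operational questions about jobs, check-ins, and
  time off by querying the production Rails console.
---

When the user asks an operational data question:

1. Translate it into an ActiveRecord query using the app's models
   (Job, JobUser, TimeOffRequest, User).
2. Run it read-only on the production server via `kamal console`.
3. Interpret the results in plain English. Never run writes.
```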

So when do you need each approach?

| | AI Skill File | Text-to-SQL Assistant |
| --- | --- | --- |
| Who uses it | You (developer, in terminal) | Anyone (admin, in browser) |
| Cost | $0 (Claude Code subscription) | ~$0.002/query (API key) |
| Query power | Full Ruby/ActiveRecord — can run any logic | SELECT only — read-only queries |
| Context | Knows your codebase, models, scopes, business logic | Only knows the schema you describe |
| Setup | One markdown file | Service class + controller + UI + API key |
| Best for | Developer investigating bugs | Non-technical admin self-service |

The skill file is better for developers. It has full codebase knowledge, can run any Ruby code, understands your models and scopes, and costs nothing. If you're the one answering data questions, build a skill file and stop there.

The SQL assistant is for when someone else needs answers. If your operations manager, client support team, or non-technical admin needs to investigate data without asking a developer — that's when the browser-based SQL assistant earns its $0.002 per query. It's a self-service tool. The skill file is a power tool for the person who built the system.

FAQ

Can the AI hallucinate data?

No. The AI generates SQL. Your database executes it. The results are real rows. If the SQL is wrong, you get an error or empty results — not fabricated numbers. The AI only interprets what the database actually returns.

What if the AI generates a bad query?

It happens. Column name mismatches are the most common failure. The system returns the MySQL error message, and the admin can rephrase. In testing, Haiku generated valid SQL about 85% of the time on the first try. The other 15% were schema mismatches that improved as I refined the schema description.

Why not just give the admin a SQL console?

Because the admin doesn't know SQL. The value is the translation layer — natural language in, human-readable analysis out. The SQL is hidden by default, expandable for technical users who want to verify.

Did you keep any of the RAG infrastructure?

No. The vector table, embedding service, chunking pipeline, and MySQL 9 dependency were all removed. The final system is just the SQL assistant, a query log table, and the Anthropic gem. Sometimes the best engineering is knowing what to delete.

The takeaway

RAG is a powerful pattern — for the right problem. If your data lives in a relational database and your questions have factual answers, skip the vector pipeline. Give the LLM your schema, let it write SQL, and let your database do what it's been doing for 40 years: return exact answers to precise queries.

Total cost of the lesson: $0.04 in API calls, one Docker container I no longer need, and 77,000 embeddings that taught me when not to use AI's trendiest pattern.

Related

Why Your Next DevOps Hire Might Be a 200-Line Skill File

How I built 3 AI skill files that query production like a senior dev — the companion piece about encoding domain knowledge into AI.