The app: a medical services management platform
I maintain a Rails 8 application that manages cleaning operations for medical facilities. Staff check in and out of jobs via GPS, managers approve time-off requests, and the system tracks everything — tens of thousands of jobs, hundreds of users, and a substantial audit trail.
The super admin gets questions like:
- “Did Staff Member A check in for their job on Tuesday?”
- “Is this GPS check-in legitimate or did someone fake their location?”
- “Why is Thursday's job missing from the dashboard?”
- “How many sick days has the team taken this quarter?”
- “What changed on this job record? Who changed it?”
Answering each one means SSHing into the server, opening a Rails console, writing a query, interpreting the results. It takes 10-30 minutes per question. I wanted to build an AI assistant that could answer these instantly.
What I built: the full RAG pipeline
I did what every AI tutorial tells you to do. I built RAG — Retrieval Augmented Generation:
- Upgraded to MySQL 9 for native `VECTOR(512)` columns
- Added the `neighbor` gem for vector similarity search
- Built a chunking service that converts database records into text passages
- Indexed 77,000 chunks from job notes, time-off requests, accident reports, chat messages, site info
- Built an embedding service with OpenAI `text-embedding-3-small` at 512 dimensions
- Created a query pipeline: embed question → vector search → feed top 10 chunks to LLM → answer
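Under the hood, the retrieval step ranks chunks by cosine similarity to the question's embedding. The real system did this in SQL against MySQL 9 vector columns; here is the same math as an in-memory Ruby sketch with toy 3-dimensional vectors (real embeddings were 512-dimensional):

```ruby
# Rank text chunks by cosine similarity to a query embedding.
# Hypothetical in-memory version of what the neighbor gem and
# MySQL 9 VECTOR columns did in the real pipeline.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  mag = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
  mag.zero? ? 0.0 : dot / mag
end

def top_chunks(query_embedding, chunks, k: 10)
  chunks.max_by(k) { |c| cosine_similarity(query_embedding, c[:embedding]) }
end

# Toy data for illustration
chunks = [
  { text: "GPS check-in procedure", embedding: [0.9, 0.1, 0.0] },
  { text: "Waste bin colour coding", embedding: [0.0, 0.2, 0.9] }
]
top_chunks([1.0, 0.0, 0.0], chunks, k: 1).first[:text]
# => "GPS check-in procedure"
```

This is exactly what vector search is good at: finding the most *similar* text, which, as it turned out, is not the same thing as finding the *answer*.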
Here's what a chunk looked like after indexing:
```
[Job Note] General Note
Job #4821 | Client A | North Scanning Centre
Date: 2024-08-15

Use the correct colour coding for waste bins.
Ensure sharps containers are sealed before disposal.
Check all hand sanitizer dispensers are refilled.
```
The infrastructure was impressive. MySQL 9 with native vector columns. Cosine similarity search in SQL. Deduplication via SHA256 digests. Real-time sync via `after_commit` callbacks. Nightly bulk re-sync via Solid Queue.
Total embedding cost: $0.55 for 77,000 chunks.
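The chunking service itself was the simple part: each database row flattened into a text passage like the one above. A sketch with hypothetical field names (the real service read from ActiveRecord models):

```ruby
# Flatten a job-note row into the passage format shown above.
# Field names here are illustrative, not the app's actual schema.
def chunk_for(note)
  <<~TEXT
    [Job Note] #{note[:title]}
    Job ##{note[:job_id]} | #{note[:client]} | #{note[:site]}
    Date: #{note[:date]}
    #{note[:body]}
  TEXT
end
```

Note what this throws away: the row's types, foreign keys, and nullability all collapse into prose, which is the root of the problem described next.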
The moment RAG failed
I asked the first real question:
“Which staff have checked in but not checked out?”
RAG found text chunks that mentioned check-ins — job notes about check-in procedures, site instructions about GPS requirements. It returned paragraphs of text. None of them answered the question.
The actual answer is a SQL query:
```sql
SELECT u.first_name, u.last_name, ju.checked_in_at, j.id
FROM job_users ju
JOIN users u ON ju.user_id = u.id
JOIN jobs j ON ju.job_id = j.id
WHERE ju.checked_in_at IS NOT NULL
  AND ju.checked_out_at IS NULL
  AND ju.deleted_at IS NULL
```
That's when it clicked. My data is structured. My questions have exact answers. RAG finds similar text — but I need exact rows.
Every question the admin asks — check-in status, holiday balances, GPS locations, audit trails — maps to a SELECT query with JOINs, WHERE clauses, and aggregations. Vector similarity search is the wrong tool entirely.
The replacement: Text-to-SQL in 200 lines
I ripped out the vector search and replaced it with something much simpler. The entire pipeline is two LLM calls:
Call 1: Send database schema + question → LLM returns a SQL query
Execute: Run the SQL against MySQL (read-only, max 50 rows)
Call 2: Send question + SQL + results → LLM explains the answer
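The three steps above fit in one method. A sketch with the LLM and database calls injected as callables so the flow is self-contained; in the app these would wrap the anthropic gem and a read-only ActiveRecord connection:

```ruby
# Two-call text-to-SQL pipeline. `llm` and `executor` are injected
# callables standing in for the Anthropic client and the read-only
# database connection.
def answer_question(question, schema:, llm:, executor:)
  # Call 1: schema + question in, SQL out
  sql = llm.call("#{schema}\n\nQuestion: #{question}\nReturn only the SQL query.")
  # Execute: read-only connection, row cap enforced separately
  rows = executor.call(sql)
  # Call 2: question + SQL + rows in, plain-English answer out
  llm.call("Question: #{question}\nSQL: #{sql}\nRows: #{rows.inspect}\nExplain the answer.")
end
```

Injecting the two dependencies also makes the pipeline trivially testable with stub lambdas, no API key required.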
The schema prompt is the key. I describe every table, column, type, and relationship the LLM needs to know:
```ruby
SYSTEM_PROMPT = <<~PROMPT
  You are a SQL expert for a medical services platform.

  ## Schema
  - users: id, email, first_name, last_name, role(integer),
    archived(boolean), sign_in_count, last_sign_in_at
  - jobs: id, client_id, site_id, datetime, state(string),
    deleted_at(soft delete)
  - job_users: id, job_id, user_id, checked_in_at,
    checked_out_at, bypassed_location(boolean)
  - user_locations: id, user_id, job_id, name(address),
    out_of_range(boolean), parameters(json with distance,
    latitude, longitude)
  - time_off_requests: id, user_id, start_time, end_time,
    state, request_type, duration_in_days, notes
  - versions: id, item_type, item_id, event, whodunnit,
    object_changes(json), created_at

  ## Rules
  1. Only generate SELECT queries
  2. Always LIMIT to 50 rows
  3. Join to users/sites to show names, not IDs
  4. For soft deletes, check deleted_at IS NULL
PROMPT
```
That's it. No vector database. No embeddings. No chunking pipeline. No MySQL 9. The LLM reads the schema description and writes SQL that runs on plain MySQL 8.4.
The model: Claude Haiku (claude-haiku-4-5-20251001), the cheapest Claude model at $0.25/M input tokens. It consistently generates correct SQL with JOINs, subqueries, aggregations, and date arithmetic.
Real queries against production data
Here are actual results from the system running against production data. Names and identifiers are anonymized.
“Which staff have checked in but not checked out?”
The LLM generated a query joining job_users, users, jobs, clients, and sites. Result:
```
50 staff members checked in without checking out.

Most concerning:
1. Staff Member A — checked in March 2024 (over a year ago)
   Job state: "started" — never progressed
2. Staff Member B — checked in March 2024
   Job state: "read_note" — unusual state
3. Staff Member C — 7 separate incomplete check-ins
   Suggests repeated check-in issues

Recommendation: Investigate the oldest entries (March 2024)
as these are likely data anomalies that should be manually
resolved.
```
“Show out-of-range check-ins. Any anomalous patterns?”
The LLM queried the location tracking table, joined to users and jobs, filtered for out_of_range = 1, and analyzed the patterns:
```
Anomalous patterns detected:

Staff Member D:
- Significantly more out-of-range flags than any other user
- Multiple identical coordinates in quick succession
  (genuine movement shows continuous variation)
- Some recorded distances physically implausible

Staff Member E:
- 100% of location pings were out-of-range across
  multiple jobs

Recommendation: Flag for review by operations manager.
Cross-reference with job completion records.
```
“Summarize sick leave usage this year”
```
Sick leave summary (approved requests only):
- Total employees with sick leave: 25
- Range: 1 to 9 days per person
- Top user: 2 requests but longest total duration
- Most frequent: 6 separate requests, shorter each time

Pattern note: one employee has the most frequent
requests despite fewer total days — could indicate
a recurring condition or a pattern worth reviewing
with HR.
```
“What are the busiest sites?”
```
Top 5 sites by job volume:
1. Site A — over 1,500 jobs (40% completed)
2. Site B — over 1,300 jobs (37% completed)
3. Site C — over 1,000 jobs (34% completed)
4. Site D — over 1,000 jobs (44% completed)
5. Site E — over 600 jobs (13% completed)

One client dominates with 4 of the top 5 sites.
Site E has the lowest completion rate (13%) — the AI
flagged this and suggested investigating whether it
indicates operational issues or a different job type.
```
Security: how to safely let AI write SQL
Letting AI generate SQL against production sounds dangerous. It is — if you rely on application-level validation alone. Here's the layered approach:
Layer 1: Database-level (the real guardrail)
Read-Only Database User
The AI assistant connects with a MySQL user that only has SELECT privileges. Even if the LLM generates DROP TABLE, MySQL refuses it. This is the only guardrail that truly matters.
```sql
-- Create a read-only user for the AI assistant
CREATE USER 'ai_readonly'@'%' IDENTIFIED BY '...';
GRANT SELECT ON your_database.* TO 'ai_readonly'@'%';

-- Optionally restrict to specific tables
REVOKE SELECT ON your_database.sensitive_table FROM 'ai_readonly'@'%';
```
Layer 2: Application-level (defense in depth)
Query Validation
Must start with SELECT. Dangerous keywords blocked via word-boundary matching (so deleted_at passes but DELETE is blocked). Max 50 rows via forced LIMIT. Query timeout at 5 seconds.
Column Blocklist
Queries referencing encrypted_password, reset_password_token, or other sensitive columns are rejected before execution.
Table Allowlist
Only operational tables are queryable. System tables like information_schema and auth tables are excluded from the schema prompt entirely.
Full Audit Log
Every question, generated SQL, result count, model used, and duration are logged with the user ID. Every query is reviewable.
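Before it hits an ActiveRecord `create!`, the audit record is just a hash. A sketch with assumed column names matching the fields listed above:

```ruby
# Build the audit record logged for every AI query.
# Column names here are illustrative, not the app's exact schema.
def audit_entry(user_id:, question:, sql:, rows:, model:, started_at:)
  {
    user_id: user_id,
    question: question,
    generated_sql: sql,
    result_count: rows.size,
    model: model,
    duration_ms: ((Time.now - started_at) * 1000).round
  }
end
```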
Layer 3: Infrastructure (production best practice)
- Read replica: Point the AI at a replica, not the primary database. Zero risk to production writes, zero load on the primary.
- Query timeout: Kill any query that runs longer than 5 seconds. Prevents accidental CROSS JOINs from consuming resources.
- Rate limiting: Cap queries per user per hour. Prevents abuse or runaway automation.
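The rate-limiting layer needs nothing exotic. Production would likely use Rack::Attack or Redis counters; this in-memory sketch shows the sliding-window idea:

```ruby
# Minimal per-user sliding-window rate limiter (illustrative only;
# not thread-safe and loses state on restart).
class QueryRateLimiter
  def initialize(limit:, period: 3600)
    @limit = limit
    @period = period
    @hits = Hash.new { |h, k| h[k] = [] }
  end

  # Returns true and records the hit if the user is under the cap.
  def allow?(user_id, now: Time.now.to_i)
    @hits[user_id].reject! { |t| t <= now - @period }
    return false if @hits[user_id].size >= @limit
    @hits[user_id] << now
    true
  end
end
```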
Important
Application-level SQL validation is defense in depth, not your primary security layer. A determined attacker or a hallucinating LLM can craft queries that bypass regex checks (e.g., SELECT * INTO OUTFILE, subqueries with write operations). The read-only database user is the guardrail that actually protects you.
The application-level validation code:
```ruby
def validate_and_execute(sql)
  normalized = sql.strip.gsub(/\s+/, " ").upcase
  return { error: "Only SELECT" } unless normalized.start_with?("SELECT")

  # Word-boundary check — "deleted_at" passes, "DELETE" is blocked
  sql_words = normalized.split(/[\s;,()]+/)
  dangerous = %w[INSERT UPDATE DELETE DROP ALTER TRUNCATE GRANT REVOKE]
  if dangerous.any? { |kw| sql_words.include?(kw) }
    return { error: "Forbidden keyword" }
  end

  # Block sensitive columns
  blocked = %w[ENCRYPTED_PASSWORD RESET_PASSWORD_TOKEN CONFIRMATION_TOKEN]
  if blocked.any? { |col| normalized.include?(col) }
    return { error: "Sensitive column blocked" }
  end

  # Block system tables
  return { error: "System table blocked" } if normalized.include?("INFORMATION_SCHEMA")

  # Force a row cap if the LLM forgot one
  sql = "#{sql.chomp(';')} LIMIT 50" unless normalized.include?("LIMIT")

  # Execute with a 5-second server-side timeout
  connection = ActiveRecord::Base.connection
  connection.execute("SET SESSION MAX_EXECUTION_TIME=5000")
  data = connection.select_all(sql).to_a
  { data: data, error: nil }
end
```
Cost breakdown: $0.04 for 15 queries
I tested 15 real questions against production data. Total cost:
| What | Cost |
|---|---|
| 14 queries on Claude Haiku ($0.002 each) | $0.028 |
| 1 query on Claude Sonnet (comparison test) | $0.015 |
| Total (15 queries) | $0.043 |
For comparison, the RAG approach would have cost:
| Component | RAG | Text-to-SQL |
|---|---|---|
| Infrastructure | MySQL 9 + vector columns | Existing MySQL 8.4 |
| Initial indexing | $0.55 (77K embeddings) | $0 (no indexing) |
| Per query | ~$0.003 (embed + LLM) | $0.002 (2 LLM calls) |
| Extra gems | neighbor, tiktoken | anthropic only |
| Sync pipeline | Chunker + embedder + nightly job | None needed |
| Answer accuracy | Fuzzy (similar text) | Exact (real database rows) |
The Sonnet comparison was interesting — it produced slightly more polished prose but generated the same SQL. Haiku at 7.5x cheaper with the same data accuracy is the clear winner for this use case.
When RAG is actually the right choice
RAG isn't bad. I used it wrong. Here's the decision framework:
| Your data is... | Use | Why |
|---|---|---|
| MySQL/Postgres tables with defined columns | Text-to-SQL | Questions have exact answers in rows. SQL can calculate, join, aggregate. |
| PDFs, contracts, legal documents | RAG | Unstructured text. No schema. Need semantic similarity to find relevant passages. |
| Emails, support tickets, free-text notes | RAG | User's vocabulary differs from source text. “Hygiene issues” should match “mopping not done.” |
| Mix of both (structured DB + document attachments) | Hybrid | SQL for operational queries, RAG for document search. Route based on question type. |
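For the hybrid row, routing can start as simply as a keyword check and be upgraded to an LLM classifier later. A naive sketch; the hint list is hypothetical, not from the app:

```ruby
# Route operational questions to text-to-SQL and document
# questions to RAG. Keyword hints are illustrative only.
OPERATIONAL_HINTS = /\b(check(ed)?[- ]?in|sick (day|leave)|how many|count|busiest|overdue)\b/i

def route(question)
  question.match?(OPERATIONAL_HINTS) ? :text_to_sql : :rag
end
```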
The mistake I made: treating structured data like unstructured data. My 77,000 “chunks” were just database rows converted to text paragraphs. They were structured all along — I was throwing away the structure by embedding them.
If your app has a relational database and your users ask operational questions, you probably don't need RAG either. A 200-line SQL generator with the cheapest LLM will outperform a vector search pipeline every time.
But I already had a free solution
Here's the part I should have realized sooner. Before building any of this, I had already built a Claude Code skill file that answers the exact same questions — for free, using my existing Claude Code subscription.
The skill generates ActiveRecord queries, runs them on the production server via kamal console, and interprets the results. Same questions, same data, same answers. No API key, no extra cost.
So when do you need each approach?
| | AI Skill File | Text-to-SQL Assistant |
|---|---|---|
| Who uses it | You (developer, in terminal) | Anyone (admin, in browser) |
| Cost | $0 (Claude Code subscription) | ~$0.002/query (API key) |
| Query power | Full Ruby/ActiveRecord — can run any logic | SELECT only — read-only queries |
| Context | Knows your codebase, models, scopes, business logic | Only knows the schema you describe |
| Setup | One markdown file | Service class + controller + UI + API key |
| Best for | Developer investigating bugs | Non-technical admin self-service |
The skill file is better for developers. It has full codebase knowledge, can run any Ruby code, understands your models and scopes, and costs nothing. If you're the one answering data questions, build a skill file and stop there.
The SQL assistant is for when someone else needs answers. If your operations manager, client support team, or non-technical admin needs to investigate data without asking a developer — that's when the browser-based SQL assistant earns its $0.002 per query. It's a self-service tool. The skill file is a power tool for the person who built the system.
FAQ
Can the AI hallucinate data?
No. The AI generates SQL. Your database executes it. The results are real rows. If the SQL is wrong, you get an error or empty results — not fabricated numbers. The AI only interprets what the database actually returns.
What if the AI generates a bad query?
It happens. Column name mismatches are the most common failure. The system returns the MySQL error message, and the admin can rephrase. In testing, Haiku generated valid SQL about 85% of the time on the first try. The other 15% were schema mismatches that improved as I refined the schema description.
Why not just give the admin a SQL console?
Because the admin doesn't know SQL. The value is the translation layer — natural language in, human-readable analysis out. The SQL is hidden by default, expandable for technical users who want to verify.
Did you keep any of the RAG infrastructure?
No. The vector table, embedding service, chunking pipeline, and MySQL 9 dependency were all removed. The final system is just the SQL assistant, a query log table, and the Anthropic gem. Sometimes the best engineering is knowing what to delete.
The takeaway
RAG is a powerful pattern — for the right problem. If your data lives in a relational database and your questions have factual answers, skip the vector pipeline. Give the LLM your schema, let it write SQL, and let your database do what it's been doing for 40 years: return exact answers to precise queries.
Total cost of the lesson: $0.04 in API calls, one Docker container I no longer need, and 77,000 embeddings that taught me when not to use AI's trendiest pattern.
Related
Why Your Next DevOps Hire Might Be a 200-Line Skill File
How I built 3 AI skill files that query production like a senior dev — the companion piece about encoding domain knowledge into AI.