Text to SQL - Today
Overview
The current text to SQL experience is magical. Given a prompt, an AI will almost always get to a reasonable query, assuming the provided context is sufficient to answer it. The remaining edge cases involve ambiguity and clarification, avoiding hallucination when there is no valid answer, and extremely complex syntactic constructs.
So why isn't it everywhere?
SQL is hard.
The Problem with SQL
Agents have solved SQL syntax. But getting the right answer depends on a lot more than syntax. You need to understand the schema, the grain of tables, the meaning of columns, and more. You need to understand joins, foreign keys, and relationships. You need to understand filtering, nulls, and edge cases. You need to understand performance - some queries are just too slow to run in practice.
It's not that agents can't figure that out - it's that it's a lot of context to provide, and a lot of context to get right. The problem of text to SQL today is a problem of sustainable context engineering.
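To make the grain point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` and `order_items` tables and their rows are hypothetical, invented purely for illustration. Both queries are syntactically valid, but the first one silently changes the grain from one row per order to one row per item and over-counts revenue.

```python
import sqlite3

# Hypothetical schema: one row per order, and one row per item within an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 11, 50.0);
    INSERT INTO order_items VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 1, 'c'), (4, 2, 'd');
""")

# Valid syntax, wrong answer: the join repeats each order once per item,
# so order 1 (three items) has its amount summed three times.
naive_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.order_id
""").fetchone()[0]

# Aggregating at the order grain gives the intended total.
correct_total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

print("naive:", naive_total)
print("correct:", correct_total)
```

Here the naive total comes out as 350.0 against a correct total of 150.0. Nothing in the SQL text signals the problem; only knowledge of each table's grain does.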
The Problem of Query Rot
Even worse, bespoke SQL queries age very poorly - a typical warehouse might turn over 50% of key analytic tables every year or two. Your generation test cases rapidly fall out of date, and your prompts need to be updated. Prior artifacts are not reusable, and the cost of maintenance is high.
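As a rough illustration of query rot, the sketch below saves a query against a hypothetical `dim_customer` table, then simulates a warehouse migration that replaces it with `dim_customer_v2` and swaps the `is_active` flag for a `status` column. The migration, the new table name, and the data are all assumptions made up for this example, run against an in-memory SQLite database.

```python
import sqlite3

# A saved query, e.g. a generation test case or a prompt example.
SAVED_QUERY = """
    SELECT customer.state, COUNT(DISTINCT customer.id) AS customer_count
    FROM dim_customer AS customer
    WHERE customer.is_active = 1
    GROUP BY customer.state
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (id INTEGER, state TEXT, is_active INTEGER);
    INSERT INTO dim_customer VALUES (1, 'CA', 1), (2, 'CA', 1), (3, 'NY', 0);
""")

# The query works against the schema it was written for.
before = conn.execute(SAVED_QUERY).fetchall()

# Hypothetical migration: the table is replaced and the flag column is renamed.
conn.executescript("""
    CREATE TABLE dim_customer_v2 AS
        SELECT id, state,
               CASE WHEN is_active = 1 THEN 'active' ELSE 'inactive' END AS status
        FROM dim_customer;
    DROP TABLE dim_customer;
""")

# The same saved query now fails, even though the business question is unchanged.
try:
    conn.execute(SAVED_QUERY)
    rotted = False
    error_message = ""
except sqlite3.OperationalError as exc:
    rotted = True
    error_message = str(exc)

print("before migration:", before)
print("rotted:", rotted, "-", error_message)
```

The question being asked ("active customers by state") never changed; only the physical tables did. Every saved query, test case, and prompt example that names the old table has to be found and rewritten.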
The Role of Trilogy
We think Trilogy is the right fit for this problem - it's lightweight, but expressive enough to capture everything an LLM needs. And the query resolution engine handles grain and joins automatically, removing a huge source of errors for LLMs.
Your queries change less, since they don't need to be updated for schema changes.
SQL like this has a shelf life:
select
    customer.state,
    count(distinct customer.id) as customer_count
from
    dim_customer as customer
where
    customer.is_active = true
group by
    customer.state
;
The Trilogy equivalent will inherently last longer - you can swap out tables or update the model and filters, and the query will always be the same.
```trilogy
select
    active_customer.state,
    active_customer.id.count
;
```
Industry
There are lots of pretty cool examples out there. We would generally hope that the techniques applied to these models can then be applied to Trilogy generation as well, as an easier target.
Model time