Thesis
SQL has endured as the default language for analytics not because it is perfect, but because alternatives attack the wrong layer. The real pain comes from SQL tightly coupling semantic meaning to physical tables, joins, and storage details, forcing the two to change in lockstep even when that is unnatural.
Trilogy solves this by making semantics first-class while preserving the declarative workflow that makes SQL powerful. Less duplicated logic, fewer decisions made with bad data, safer schema evolution, and more time spent on getting value from data instead of mechanical plumbing.
The Problem
SQL excels at defining relational algebra. That's important - but not what analytics needs. When you need an answer, tracking which physical table currently stores the data is a chore, and whether a metric now comes from a replicated, aggregated, or backfilled source is a landmine. You care about the meaning of the data and the transformations you want to express.
Data teams hit walls with the same problems at scale:
- Business logic gets duplicated across dashboards, reports, notebooks, and ad hoc queries
- Lineage and governance become harder as logic spreads through raw SQL
- Schema changes break downstream work or produce silently stale results
- Grain mistakes and accidental duplication lead to bad decisions
- Teams spend too much time on data infrastructure instead of acting on insight
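The grain point is worth making concrete. Joining an order-level table to an item-level table fans out the order rows, and an aggregate computed at the wrong grain silently double counts. A self-contained sketch using Python's built-in sqlite3, with illustrative table names:

```python
import sqlite3

# One order worth 100, containing two line items (illustrative schema).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_id INTEGER, amount INTEGER)")
cur.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT)")
cur.execute("INSERT INTO orders VALUES (1, 100)")
cur.executemany("INSERT INTO order_items VALUES (?, ?)", [(1, "A"), (1, "B")])

# Correct: revenue measured at the order grain.
correct = cur.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Accidental fan-out: the join duplicates each order row per line item,
# so an order-level measure is counted once per item.
fanned = cur.execute(
    "SELECT SUM(o.amount) FROM orders o "
    "JOIN order_items i ON o.order_id = i.order_id"
).fetchone()[0]

print(correct, fanned)  # 100 200
```

Nothing flags the second query as wrong; the number is simply inflated. That is exactly the failure mode grain-aware tooling is meant to catch.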
Teams can try to solve this with software engineering patterns, semantic layers, BI tools, or new query languages. But analytics has a constraint that application developers do not face in the same way: data has gravity. The physical layer matters. Warehouses, tables, partitions, and execution engines are real constraints, and SQL has been the best tool for working within them: native to warehouses, meeting data where it lives, and playing to their strengths.
The Solution
Trilogy separates those concerns.
Trilogy includes a lightweight built-in semantic language for defining business concepts directly alongside queries. That makes it possible to move intent, relationships, and grain out of repeated query text and into reusable definitions. Queries stay declarative and familiar, while the compiler handles the physical details that create so much SQL sprawl.
Trilogy compiles to raw SQL and works with your existing databases, so adoption does not require replacing your warehouse or committing to a closed execution model.
This gives you:
- Static typing
- Automatic join inference where relationships are explicitly modeled
- Grouping and filtering with grain awareness
- Null and aggregation safety
- Declarative asset creation and updates
- A path from exploration to production in the same language
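As a rough sketch of what that enables, consider a model with a few declared concepts and a query that never mentions a table. Names here are hypothetical, loosely following the syntax shown later in this document:

```
# declare meaning once
key customer_id int;
properties customer_id {
state string
}
key order_id int;
properties order_id {
revenue float
}

# datasources (defined elsewhere) bind these concepts to physical tables;
# a datasource exposing both order_id and customer_id establishes the join path,
# and the compiler groups at the grain implied by the select
SELECT
state,
sum(revenue) as total_revenue
ORDER BY
total_revenue desc;
```

The query reads like SQL, but joins, grouping, and table names are resolved from the model rather than restated per query.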
Design Principles
- Declarative first — express the answer you want, not the warehouse plumbing needed to get it
- Close to SQL — preserve the familiarity and flexibility that made SQL durable
- Safe by default — model grain, nullability, relationships, and aggregation semantics explicitly
- Reusable — define logic once and use it across many queries and assets
- Incremental to adopt — start with one domain, one model, or one workflow without rebuilding your stack
What Does It Look Like?
Trilogy looks like SQL, but removes the warehouse-specific machinery that makes analytics queries brittle and onboarding hard.
You should not need to restate join paths in every query. You should not need to manually manage GROUP BY clutter when the semantic grain is already known. You should not need special-case mental models for filtering over windowed or aggregated logic when the language can reason about it directly.
Experts may know how to thread all of those details together. Non-experts often get them wrong. Neither group benefits from reconstructing semantic intent from physical implementation every time they need to pull a number from the database.
We can remove that burden at the language level.
WHERE
SELECT
# FROM - moved to semantic layer
# JOIN - moved to semantic layer
# GROUP BY - resolved automatically
# QUALIFY - no special handling required for window functions
HAVING
ORDER BY
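Filled in, that skeleton might read as follows. The concepts are hypothetical and assume a model that already defines them:

```
WHERE
order.status = 'complete'
SELECT
customer.region,
count(order.id) as order_count
HAVING
order_count > 100
ORDER BY
order_count desc
LIMIT 10;
```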
Tips
SQL is already declarative. Trilogy takes this further - write the business question without worrying about warehouse plumbing and be confident it will keep working.
Deeper Example
# models can be imported and reused
import store_sales as store_sales;
# the where clause idiomatically comes first, but can be placed later for familiarity
WHERE
store_sales.date.year=2001
and store_sales.date.month_of_year=1
and store_sales.item.current_price > 1.2 *
avg(store_sales.item.current_price) by store_sales.item.category
# select looks normal
SELECT
store_sales.customer.state,
count(store_sales.customer.id) as customer_count
# having filters the output of select
HAVING
customer_count >= 10
# order by is normal
ORDER BY
customer_count asc nulls first,
store_sales.customer.state asc nulls first
LIMIT 100;
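The `import store_sales as store_sales;` line brings in a semantic model defined in its own file. Reconstructed from the query above, a fragment of such a model might look like this - hypothetical, with column names borrowed from the physical schema used in the SQL comparison:

```
# inside the store_sales model (sketch)
key item_id int;
properties item_id {
current_price float,
category string
}
key customer_id int;
properties customer_id {
state string
}

# bind concepts to a physical table
datasource item (
i_item_sk: item_id,
i_current_price: current_price,
i_category: category
)
grain (item_id)
address item;
```

Every query that imports the model reuses these definitions instead of restating join keys and table names.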
A Fair SQL Comparison
SELECT
a.ca_state AS state,
COUNT(*) AS customer_count
FROM customer_address a
INNER JOIN customer c
ON a.ca_address_sk = c.c_current_addr_sk
INNER JOIN store_sales s
ON c.c_customer_sk = s.ss_customer_sk
INNER JOIN date_dim d
ON s.ss_sold_date_sk = d.d_date_sk
INNER JOIN item i
ON s.ss_item_sk = i.i_item_sk
WHERE d.d_month_seq = (
SELECT DISTINCT d_month_seq
FROM date_dim
WHERE d_year = 2001
AND d_moy = 1
)
AND i.i_current_price > 1.2 * (
SELECT AVG(j.i_current_price)
FROM item j
WHERE j.i_category = i.i_category
)
GROUP BY
a.ca_state
HAVING COUNT(*) >= 10
ORDER BY
customer_count ASC NULLS FIRST,
a.ca_state ASC NULLS FIRST
LIMIT 100;
SQL can clearly express what is needed.
The complexity lives in places that are expensive to repeat across hundreds or thousands of queries:
- Physical tables and join paths
- Repeated subquery structure
- Manual grouping mechanics
- Logic that is hard to share, review, and evolve safely
Traditional table definitions usually do not carry enough semantic metadata for this: keys, grain, nullability, relationship direction, business meaning, and safe aggregation behavior are often implicit, scattered, or tribal knowledge.
In Trilogy, that complexity moves into reusable semantic definitions. Joins no longer need to be rebuilt by hand every time someone asks a new question.
A concrete example: the physical storage changes from normalized tables to a denormalized table, but the query does not change because the semantic interface is stable.
# what do we need to know?
key order_id int;
properties order_id {
order_time datetime # placed time
}
key product_id string; # SKU; ABC-123 for example
properties <order_id,product_id> {
qty float, # number ordered
}
# how is it stored
datasource orders (
id: order_id,
submit_time: order_time
)
grain (order_id)
address tbl_order;
# check our bindings; the physical layout is checked against logical model
validate orders;
# Joins are implicitly based on shared keys
datasource orders_items (
order_id,
product_id,
qty
)
grain (order_id, product_id)
address tbl_order_items_final;
# queries do not need to specify tables
select
order_time::date as order_date,
avg(sum(qty) by order_id) avg_order_product_count,
avg(sum(qty) by order_id, product_id) avg_order_per_product_count
;
# refactor when convenient
datasource order_denormalized (
order_id,
order_time,
product_id,
qty
)
grain (
order_id, product_id
)
address tbl_order_denormalized;
# build our table
refresh order_denormalized;
# same query, no joins needed, will hit denormalized.
select
order_time::date as order_date,
avg(sum(qty) by order_id) avg_order_product_count,
avg(sum(qty) by order_id, product_id) avg_order_per_product_count
;
Why don't we all just use SQL, again?
SQL is fantastic. It is portable, declarative, expressive, and deeply integrated into every serious data system, from DuckDB to distributed clusters spanning the world. For many tasks, it is still the right tool.
But SQL solves a different problem than the one most analytics teams actually have.
In modern analytics systems, the table is often not the interface people care about. Tables are replicated, renamed, backfilled, aggregated, deprecated, partitioned, and reorganized. The physical container may change; the business question does not.
That is where raw SQL starts to break down.
SQL is a language for operating on tables. Trilogy is a language for getting value from imperfect, evolving data and warehouses - the kind that exist in real data work.
Why Existing Alternatives Fall Short
Many tools recognize the same pain, but struggle in similar ways:
- They move too far away from SQL and lose the familiarity that makes adoption practical
- They treat semantics as an external modeling workflow rather than a core part of exploration that can organically evolve to production
- They improve syntax - such as moving the table earlier, or adding a dot - without really improving reuse, safety, or schema resilience
- They stay focused on metrics or BI use cases rather than the broader surface area of analytics logic
Trilogy's bet is different:
- stay close to SQL
- keep the workflow declarative
- make semantics explicit
- let modeling and querying happen together
Related Projects
We're not alone in thinking SQL can be improved; a number of related projects tackle parts of the same problem from different angles.
Trilogy stands apart by staying close to SQL while making semantics, reuse, and safety more central to the language itself.
How do you get started?
Trilogy starts with a small investment: define a thin semantic layer around one domain, workflow, or dataset. Once that exists, every query built on top of it becomes easier to write, safer to change, and easier to share.
You do not need to model everything up front.
You can start close to the consumption layer, use Trilogy for query authoring and reusable definitions, and then push backward into ETL or transformation workflows over time. At every step, you can integrate with the systems you already use.
Unlike most semantic layers, Trilogy does not require a completely separate modeling workflow before you can do useful work. You can define and extend models inline as part of exploration.
You do not need a separate file, tool, or phase just to get started. You can model and query in one place.
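A thin first layer can be a single file: a few concept declarations, one datasource binding, and a query, together. Names here are hypothetical; the `.year` access follows the date pattern used in the earlier example:

```
key user_id int;
properties user_id {
signup_date datetime
}

datasource users (
id: user_id,
created_at: signup_date
)
grain (user_id)
address analytics.users;

# query against meaning, not tables
SELECT
signup_date.year,
count(user_id) as signups
ORDER BY
signup_date.year asc;
```

From here, more domains can be modeled incrementally as queries need them.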
Read more in the concepts and references section to learn how Trilogy works under the hood and the nuances of query design and setup.
Usage
You can:
