SQL is often criticzed for being verbose and missing primitives for reuse and composability.
While those do exist, they're often internal to the database - such as functions and views.
Trilogy attempts to holistically address these concerns from two angles: relationship reuse via the semantic layer, and pattern reuse through functions/macros.
These are applied at the generation layer above SQL, meaning they are database agnostic and simpler to test and migrate.
Semantic Reuse
Semantic reuse is a core Trilogy value proposition, which has two subdimensions:
- Decoupling from the physical layer via the semantic/business layer
- Syntax for reuse of common expressions via CTEs and Functions
Physical Asset Abstraction / Reusable Business Logic
The first tenant of semantic reuse is reuse across physical assets. This faciliates testing, portability, and OLAP style query resolution.
When a basic model is defined:
key order int;
property order.order_placed datetime;
key customer_id int;
property customer_id.customer_name string;
metric order_count <-count(orders);
If you wanted to run
SELECT
customer.name,
customer.id,
order_count
ORDER BY
order_count desc limit 10;
That model can be "bound" against any physical model.
Split Datasources
datasource orders (
id: order,
placed: order_placed
c_id: customer_id
)
grain (id)
address orders
;
datasource customer (
id: customer_id,
name: customer_name
)
grain (customer_id)
address customers
;
Flatted
datasource orders_flat (
id: order,
placed: order_placed,
c_id:customer_id,
c_name: customer_name
)
grain (id)
address orders_flat
No query update is required.
SELECT
customer.name,
customer.id,
order_count
ORDER BY
order_count desc limit 10;
Can resolve from either.
You can continue to extend the semantic model to add new derivations and definitions, and those will be automatically accessible when you bind it to physical data.
This makes testing and refactoring simpler:
- for tests, simply bind synthetic datasources and validate outputs
- for refactoring, you can run with new datasources/old datasources and validate results without changing your queries.
Optimal Source Resolution
This also means that if you define an aggregated source, we can automatically resolve directly from that by selecting the source that resolves our query with the least joins. [OLAP style aggregation/drilldown].
Warning
This is a much more complicated space with regards to correctness/invalidation - it can be powerful, but use with care!
datasource customer_orders_agg (
c_id:customer_id,
order_count: order_count
)
grain (c_id)
address customer_agg;
Semantic Model Reuse
Semantic model reuse is the next principle. Once defined, models can be imported and composed to create new models.
In this case, with a model of
key id int;
property id.street string;
property id.zip string;
auto text_address <- concat(street, ',', zip);
and an import referencing it twice in different contexts:
import address as ship_address;
import address as billing_address;
key order_id int;
datasource orders(
id:order_id,
addr_id_ship: ship_address.id,
addr_id_bill: billing_address.id
)
grain (id)
address orders
;
The address model has been reused to enable independent access and querying of the address model in two different contexts.
At a lower level, the 'text_address' derivation means that that derived expression is accessible in any context where both components can be resolved.
SELECT
count(order_id)->order_count,
order.ship_address.zip,
order.billing_address.zip,
;
For a practical example, check out the reusable funnel analysis example.
Logical Reuse
The other aspect of reuse is enabling concisely expressing repeated transformations.
Trilogy currently supports this in two ways - functions and CTs
Functions
Functions are reusable expression fragments with templating.
Tips
It's reasonable to ask why functions don't work on entire table statements, such as selects, to enable one to generate a partially templated query on demand.
There's no particular reason - we just haven't implemented it yet.
def pretty_print_ratio(a,b, sigfig=4) -> round(a::float / b, sigfig)*100;
select
@pretty_print_ratio(33, 100, 2);
Within the scope of a function, any declared interface reference is replaced in the expression by the passed in values at runtime. References that are not defined in local scope will be pulled in from global scope.
SELECT
SUM(CASE WHEN date.day_of_week = 0 THEN web_sales.extra_sales_price ELSE 0.0 END) +
SUM(CASE WHEN date.day_of_week = 0 THEN catalog_sales.extra_sales_price ELSE 0.0 END)
as sunday_sales,
SUM(CASE WHEN date.day_of_week = 1 THEN web_sales.extra_sales_price ELSE 0.0 END) +
SUM(CASE WHEN date.day_of_week = 1 THEN catalog_sales.extra_sales_price ELSE 0.0 END)
as monday_sales,
SUM(CASE WHEN date.day_of_week = 2 THEN web_sales.extra_sales_price ELSE 0.0 END) +
SUM(CASE WHEN date.day_of_week = 2 THEN catalog_sales.extra_sales_price ELSE 0.0 END)
as tuesday_sales,
SUM(CASE WHEN date.day_of_week = 3 THEN web_sales.extra_sales_price ELSE 0.0 END) +
SUM(CASE WHEN date.day_of_week = 3 THEN catalog_sales.extra_sales_price ELSE 0.0 END)
as wednesday_sales,
SUM(CASE WHEN date.day_of_week = 4 THEN web_sales.extra_sales_price ELSE 0.0 END) +
SUM(CASE WHEN date.day_of_week = 4 THEN catalog_sales.extra_sales_price ELSE 0.0 END)
as thursday_sales,
SUM(CASE WHEN date.day_of_week = 5 THEN web_sales.extra_sales_price ELSE 0.0 END) +
SUM(CASE WHEN date.day_of_week = 5 THEN catalog_sales.extra_sales_price ELSE 0.0 END) as friday_sales,
SUM(CASE WHEN date.day_of_week = 6 THEN web_sales.extra_sales_price ELSE 0.0 END) +
SUM(CASE WHEN date.day_of_week = 6 THEN catalog_sales.extra_sales_price ELSE 0.0 END)
as saturday_sales,
This can be rewritten as per below, pulling in two references from the global scope and one from the function bound.
def weekday_sales(weekday) ->
SUM(CASE WHEN date.day_of_week = weekday
THEN web_sales.extra_sales_price ELSE 0.0 END) +
SUM(CASE WHEN date.day_of_week = weekday
THEN catalog_sales.extra_sales_price ELSE 0.0 END)
;
SELECT
@weekday_sales(0) as sunday_sales,
@weekday_sales(1) as monday_sales,
@weekday_sales(2) as tuesday_sales,
@weekday_sales(3) as wednesday_sales,
@weekday_sales(4) as thursday_sales,
@weekday_sales(5) as friday_sales,
@weekday_sales(6) as saturday_sales
;
Named Queries/CTEs
A select statement can be named using the with keyword and reused.
This is helpful to define reusable base rowsets:
SELECT
customer_id as id,
sum(order.revenue)- sum(order.profit) as revenue;
SELECT
customer_value.id
customer_values.revenue,
;
But also to build business logic before you even have datasources, and then later bind datasources to the input of your query.