Trilogy

Datasources, Datasinks, Data

Datasources, datasinks, and data (canonically called datasources in Trilogy) all represent the same thing: materialized or on-demand computed data.

They bind the logical computation model to physical resources and let you move data into and out of the logical model.

Roots

Root datasources, marked with the 'root' prefix, represent data that arrives in your system from outside and is not managed by Trilogy.

Trilogy will generally not operate on (write to) these sources; instead, it uses them as the canonical watermarks for the concepts bound to them.

Tips

For example, if you import external data with a primary key (PK) into your warehouse, that table might be the root datasource for every computation derived from that data.


root datasource (...);

Standard

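A standard datasource can be read from or written to by Trilogy, depending on the statements you run. The 'incremental by <field>' clause names the datasource's incrementality field, which Trilogy uses as a watermark when refreshing.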


datasource (...)
incremental by <field>;

Partial

'partial' is an optional keyword that marks every bound field on the datasource as partial. Marking the whole datasource at once removes a source of error when managing concepts, since no bound field can be accidentally left unmarked.

Partial datasources make the most sense when several roots combine to produce one full dataset: for example, an archive table plus a recent table, or multiple sources merged into one canonical dataset (such as tree datasets from multiple cities).

A partial source will almost always have a 'complete where' modifier, which specifies the filter condition under which the source is complete, that is, holds the full dataset for rows matching that filter.

Tips

When a query's filter matches a datasource's complete where clause, the query can be resolved directly from that partial source, because for that filter the source is complete. Trilogy always attempts this pushdown when possible.


partial datasource (...)
complete where field=thing;
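The pushdown rule above can be illustrated with a small Python model. This is a conceptual sketch, not Trilogy's resolver: 'PartialSource' and 'covering_source' are hypothetical names, and the complete where clause is assumed to be a set of equality conditions. The check is slightly more general than an exact match: a source qualifies when every one of its complete where conditions also appears in the query filter, because the query then only asks for rows the source is guaranteed to hold in full.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PartialSource:
    """Hypothetical model of a partial datasource.

    complete_where maps field -> value: the source holds every row
    where all of these equality conditions are true.
    """
    name: str
    complete_where: Dict[str, str]


def covering_source(sources: List[PartialSource],
                    query_filter: Dict[str, str]) -> Optional[str]:
    """Return the first partial source that can answer the query alone.

    A source qualifies when each condition in its complete-where clause
    is also present in the query's equality filter. Returns None when
    no single partial source is complete for this query.
    """
    for source in sources:
        if all(query_filter.get(k) == v for k, v in source.complete_where.items()):
            return source.name
    return None
```

With per-city tree tables, a query filtered to city "sf" can be served by the San Francisco table alone, while a query with no city filter has no single complete partial source and would need the sources combined.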

Lifecycle

Persist

Datasources can be managed in one of two ways. The first is to modify them directly with persist statements, which is similar to running an insert statement in SQL. Both full and incremental persists are possible.

persist into <datasource> from <select>;
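To make the difference between full and incremental persists concrete, here is a toy Python model over lists of dict rows. It is an illustration under stated assumptions, not Trilogy's behavior: the function 'persist' and its parameters are hypothetical, a full persist is assumed to overwrite the target, and an incremental persist is assumed to append only rows whose incrementality field is past the target's current watermark (its highest existing value).

```python
from typing import List, Optional


def persist(target: List[dict], source_rows: List[dict],
            incremental_field: Optional[str] = None) -> List[dict]:
    """Hypothetical model of a persist statement.

    Full persist (no incremental_field): assumed to replace the target
    contents with the query result.
    Incremental persist: append only rows whose incremental_field value
    is greater than the target's current watermark.
    """
    if incremental_field is None:
        return list(source_rows)
    existing = [row[incremental_field] for row in target]
    mark = max(existing) if existing else None  # None means the target is empty
    new_rows = [row for row in source_rows
                if mark is None or row[incremental_field] > mark]
    return target + new_rows
```

The incremental path touches only the rows newer than the watermark, which is why incremental persists are cheaper when most of the target is already up to date.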

Refresh

When running from the CLI, a data model can also be refreshed. A refresh checks the watermark of every datasource's incrementality field and updates any datasource that is stale.
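The staleness check a refresh performs can be sketched as a comparison of watermarks. This toy model is not Trilogy's refresh logic: 'Datasource' and 'stale_datasources' are hypothetical names, each datasource is assumed to have at most one upstream source, and integer values stand in for the incrementality field. A root has no upstream and is never refreshed, matching the idea that roots are not managed by Trilogy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Datasource:
    """Hypothetical datasource: its upstream source (None for a root)
    and the values currently present in its incrementality field."""
    name: str
    upstream: Optional[str]
    values: List[int]

    def watermark(self) -> Optional[int]:
        # Highest incrementality value present, or None if empty.
        return max(self.values) if self.values else None


def stale_datasources(sources: Dict[str, Datasource]) -> List[str]:
    """Return names of managed datasources whose watermark lags their upstream's."""
    stale = []
    for ds in sources.values():
        if ds.upstream is None:
            continue  # roots are not managed, so they are never refreshed
        upstream_mark = sources[ds.upstream].watermark()
        own_mark = ds.watermark()
        if upstream_mark is not None and (own_mark is None or own_mark < upstream_mark):
            stale.append(ds.name)
    return stale
```

Only datasources that fall behind their upstream watermark are selected for an update; everything already caught up is skipped, which is where the computational savings of the refresh model come from.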

This is a more asset-oriented model that minimizes computation and is recommended when possible.