Datasources
Datasources, datasinks, and data - all canonically called datasources in Trilogy - represent the same thing: materialized or on-demand-computed data.
They are what bind the logical computation model to physical resources, and enable you to move data in and out of the logical model.
Backends
Datasources support three backend types, each documented in their own page:
- Database — tables and queries in a connected database (e.g. DuckDB, BigQuery, Snowflake)
- File — local or remote files (Parquet, CSV, JSON)
- Python — UV-style Python scripts that emit an Arrow table
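As a sketch (the concept, table, and file names here are hypothetical), the database and file backends are declared with the same shape and differ only in the binding clause:

```
# database backend: bind physical columns to concepts
datasource orders (
    order_id: id,
    order_amount: amount
)
grain (id)
address my_schema.orders;

# file backend: same shape, bound to a local Parquet file
datasource orders_file (
    order_id: id,
    order_amount: amount
)
grain (id)
file `data/orders.parquet`;
```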
Formatted Addresses (f-strings)
Any address or file path can be written as a backtick string prefixed with f, enabling concept interpolation. This works across all backend types — database addresses, file paths, and remote URLs.
# the data_version concept drives the path at query time
datasource my_table (...)
grain (id)
file f`https://storage.googleapis.com/my-bucket/data_v{data_version}.parquet`;
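The same interpolation applies to database addresses. A sketch, assuming a `shard` concept exists in the model:

```
# the shard concept selects the physical table at query time
datasource events (...)
grain (event_id)
address f`analytics.events_{shard}`;
```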
Roots
Root datasources - labeled with the prefix root - represent data arriving into your system that is not managed by Trilogy.
Trilogy will generally not operate on these, and uses them as canonical watermarks for concepts bound to them.
Tips
For example, if you are importing external data with a primary key into your warehouse, that table might be the 'root' datasource for all computations derived from that data.
root datasource (...);
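A fuller sketch of a root declaration, with hypothetical names: an externally loaded feed that Trilogy reads and watermarks but never writes.

```
# raw vendor feed, loaded by an external process; never rebuilt by Trilogy
root datasource raw_orders (
    order_id: id,
    loaded_at: loaded_at
)
grain (id)
address landing.raw_orders;
```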
Standard
A standard datasource can be either read or written to by Trilogy, depending on your exact statements.
Datasources may be marked with one of two freshness-tracking fields:
Incremental By
Defines an incrementing key to check freshness. Typical examples include integer primary keys or dates. An incremental datasource can be appended to using this key.
datasource (...)
incremental by <field>;
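Filling in the template above with a hypothetical integer primary key: an append only needs rows whose key exceeds the stored watermark.

```
# append only rows where id exceeds the current maximum
datasource orders (...)
grain (id)
incremental by id;
```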
Freshness By
Defines a watermark column; a stale source will be rebuilt.
datasource (...)
freshness by latest_landmark_update_through;
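A fuller sketch using the watermark column named above (the datasource and grain are hypothetical): when upstream data advances past the watermark, the entire source is rebuilt.

```
# rebuild the whole source whenever it falls behind the watermark
datasource daily_summary (...)
grain (day)
freshness by latest_landmark_update_through;
```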
Tips
Use freshness when a datasource needs to be entirely rebuilt; use incrementality when you want to do optimized incremental loading.
Partial
Partial is an optional keyword that marks every bound field on the datasource as partial - this removes a source of error when managing concepts.
Partial datasources often make the most sense when many roots are combined to produce one full dataset, such as an archive table plus a recent table, or multiple sources merged into one canonical dataset (for example, tree datasets from multiple cities).
A partial source will almost always have a 'complete where' modifier, which marks the filter condition under which the source is "complete" - that is, contains the full dataset.
Tips
When a complete where clause matches a query filter, queries can be optimally resolved directly from partial sources - they are complete! Trilogy will always attempt to push down in this way when possible.
partial datasource (...)
complete where field=thing;
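Following the archive-and-recent pattern described above (names and the date cutoff are hypothetical), each source declares the range for which it is complete:

```
# complete only for the current year; an archive source covers the rest
partial datasource orders_recent (...)
grain (id)
complete where order_date >= '2024-01-01';
```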
Lifecycle
Persist
Datasources can be managed one of two ways. They can be directly modified via persist statements; this is similar to running an insert statement in SQL. Full and incremental persists are possible.
persist into <datasource> from <select>
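Filling in the template above with hypothetical names (exact select syntax may vary by model):

```
# full persist: materialize the select into the order_summary datasource
persist into order_summary
from select
    order_date,
    sum(amount) -> total_amount;
```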
Refresh
When running from the CLI, a data model can also be 'refreshed' - this will watermark all datasource incrementality fields and update anything that is stale.
This is a more asset-oriented model that minimizes computation and is recommended when possible.
trilogy refresh <path_to_folder>
Tips
Persist vs Refresh is imperative vs declarative; we recommend the declarative approach whenever it makes sense, especially for managing warehouse processing and updates. Persist can be useful for ad hoc scripts or exports.
