Python Datasources
Python datasources are backed by a UV-style Python script that emits a PyArrow table to stdout. Trilogy executes the script and reads the result as the datasource. Support for Python sources depends on the backend engine.
Script Requirements
The script must:
- Use a `#!/usr/bin/env -S uv run` shebang with an inline `/// script` dependency block.
- Write a `pyarrow.Table` to `sys.stdout.buffer` using the Arrow IPC stream format.
```python
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.13"
# dependencies = ["pyarrow"]
# ///
import sys

import pyarrow as pa


def emit(table: pa.Table) -> None:
    with pa.ipc.new_stream(sys.stdout.buffer, table.schema) as writer:
        writer.write_table(table)


if __name__ == "__main__":
    table = pa.table({
        "id": pa.array([1, 2, 3]),
        "name": pa.array(["Alice", "Bob", "Carol"]),
    })
    emit(table)
```
Datasource Declaration
Reference the script with a backtick file path: the `.py` extension tells Trilogy to execute the script rather than read it as a flat file.
```trilogy
key user_id int;
property user_id.name string;

datasource users (
    id: user_id,
    name: name,
)
grain (user_id)
file `./users.py`;
```
Root Python Datasources
Python scripts are commonly used as root datasources when pulling from external APIs or doing enrichment that Trilogy should not try to persist itself.
```trilogy
root datasource sf_raw_tree_info (
    TreeID: tree_id,
    city: city,
    qSpecies: species,
    PlantDate: ?plant_date,
    Latitude: ?latitude,
    Longitude: ?longitude,
)
grain (tree_id)
complete where city = 'USSFO'
file `./sf_tree_info.py`;
```
The script fetches from the SF Open Data API and emits the result as an Arrow table. Trilogy treats it as a read-only root and will not attempt to write back to it.
Freshness Probes
A separate probe script can be used as the `freshness by` target. The probe should emit a single-row table containing a timestamp column, which Trilogy compares against the current watermark.
```trilogy
datasource tree_enrichment (
    species,
    is_evergreen,
    wildlife_value,
)
grain (species)
file f`https://storage.googleapis.com/my-bucket/tree_enrichment_v{data_version}.parquet`
    : f`gcs://my-bucket/tree_enrichment_v{data_version}.parquet`
freshness by `./tree_enrichment_probe.py`;
```
The probe script follows the same UV conventions and emits a table with a single datetime column. Trilogy reads the value to decide whether the cached file is stale.
Tips
Separating the probe from the main script keeps refreshes cheap — the probe is a lightweight API call or metadata read, while the full script only runs when data is genuinely stale.
