File Datasources
File datasources bind concepts to local or remote files. Support for file ingestion depends on the engine: DuckDB supports a wide range of file sources, while cloud databases cannot see local files.
Local File
A bare string path (relative to the model file) points to a local file.
datasource tree_enrichment (
    species,
    common_names,
    is_evergreen,
    mature_height_min_ft,
    mature_height_max_ft,
    wildlife_value,
    fire_risk,
)
grain (species)
file `./tree_enrichment.parquet`;
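To make "relative to the model file" concrete, the following is a minimal Python sketch of how such a path could be resolved. It is an illustration under assumptions, not the library's implementation: the helper `resolve_datasource_path` and the model location `models/trees.preql` are hypothetical.

```python
from pathlib import Path


def resolve_datasource_path(model_file: str, datasource_path: str) -> Path:
    """Resolve a datasource path against the directory that holds the model file."""
    # Hypothetical helper: join the model file's directory with the declared path.
    return Path(model_file).parent / datasource_path


# Hypothetical model location; only the datasource path comes from the example above.
resolved = resolve_datasource_path("models/trees.preql", "./tree_enrichment.parquet")
print(resolved.as_posix())  # models/tree_enrichment.parquet
```

The point of the sketch is that the path is anchored to the model file's directory rather than to the process's working directory, so the model can be run from anywhere.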
Remote File
An HTTP or cloud-storage URL can be used directly. DuckDB will fetch the file at query time.
datasource tree_info (
    tree_id,
    city,
    species,
    ?plant_date,
    ?latitude,
    ?longitude,
)
grain (tree_id)
file `https://storage.googleapis.com/my-bucket/trees/tree_info_v1.parquet`;
Primary/Fallback Paths
A colon (:) separates a read path from a write path. This is useful for models that are queried in a web browser and refreshed offline: queries fetch the file through the HTTPS address, while offline updates write to the GCS path.
datasource tree_info (
    tree_id,
    city,
    species,
    ?plant_date,
    ?latitude,
    ?longitude,
    latest_update_through,
)
grain (tree_id)
file f`https://storage.googleapis.com/my-bucket/trees/tree_info_v{data_version}.parquet`
: f`gcs://my-bucket/trees/tree_info_v{data_version}.parquet`
freshness by latest_update_through;
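As a rough picture of what the templated read/write pair above expands to, here is a small Python sketch. It assumes `data_version` is a parameter supplied elsewhere in the model. The `FilePaths` class, the `render_paths` helper, and the version value `"2"` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilePaths:
    read: str   # where queries fetch the file from
    write: str  # where offline refreshes write the file


def render_paths(read_template: str, write_template: str, **params: str) -> FilePaths:
    """Fill the same parameters into both templates (hypothetical helper)."""
    return FilePaths(
        read=read_template.format(**params),
        write=write_template.format(**params),
    )


paths = render_paths(
    "https://storage.googleapis.com/my-bucket/trees/tree_info_v{data_version}.parquet",
    "gcs://my-bucket/trees/tree_info_v{data_version}.parquet",
    data_version="2",  # assumed value, for illustration only
)
print(paths.read)
print(paths.write)
```

Because both templates share the same parameter, the read and write locations always refer to the same version of the file.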
Partial File Sources
Like database datasources, file datasources can be declared partial with a complete where clause. This is useful when data is split across per-city or per-partition files that are unioned into one logical dataset.
partial datasource sf_tree_info (
    tree_id,
    city,
    species,
    ?plant_date,
    ?latitude,
    ?longitude,
)
grain (tree_id)
complete where city = 'USSFO'
file f`https://storage.googleapis.com/my-bucket/trees/ussfo_tree_info_v{data_version}.parquet`
: f`gcs://my-bucket/trees/ussfo_tree_info_v{data_version}.parquet`
freshness by ussfo_data_updated_through;
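One way to picture how partial sources combine is the Python sketch below. It is a conceptual model only and does not describe the actual query planner. The `PartialSource` class, the `sources_for` helper, and the second source `nyc_tree_info` with city code `USNYC` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PartialSource:
    name: str
    city: str  # the value from `complete where city = '...'`


# sf_tree_info comes from the example above; nyc_tree_info is a hypothetical sibling.
SOURCES = [
    PartialSource("sf_tree_info", "USSFO"),
    PartialSource("nyc_tree_info", "USNYC"),
]


def sources_for(city_filter: Optional[str], sources: List[PartialSource]) -> List[str]:
    """Return the names of the partial sources needed to answer a query."""
    if city_filter is None:
        # No city filter: every partial file is needed to cover the logical dataset.
        return [s.name for s in sources]
    # A source is complete for its own city, so it alone can serve that filter.
    return [s.name for s in sources if s.city == city_filter]


print(sources_for("USSFO", SOURCES))  # ['sf_tree_info']
print(sources_for(None, SOURCES))     # ['sf_tree_info', 'nyc_tree_info']
```

Under this model, a query filtered to one city touches a single file, while an unfiltered query reads every partial file.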
