Datasource Statements
Datasource Statements
Datasource statements are used to both define sources of data - the roots of a model - and to define targets for writing data back out.
Syntax
"datasource" IDENTIFIER "(" column_assignment_list ")" grain_clause? (address | query)
Example
import std.money;
import nation as nation;
key id int;
property id.name string;
property id.address string;
property id.phone string;
property id.account_balance float::usd;
property id.market_segment string; # Capitalized; one of BUILDING | FURNITURE | MACHINERY | AUTOMOBILE | HOUSEHOLD
property id.comment string;
datasource customers (
c_custkey:id,
c_name:name,
c_address:address,
c_nationkey:nation.id,
c_phone:phone,
c_acctbal:account_balance,
c_mktsegment:market_segment,
c_comment:comment
)
grain (id)
address memory.customer;
Components
Identifier
The named alias of the datasource is used if the datasource is referenced in a persist, validate, or publish statement.
Column Assignment List
The column assignment list contains the binding of physical columns on the datasource to concepts.
A binding can take one of 3 forms:
customers(
# a bare binding represents a mapping from "id" on the table to "id" as a concept
id,
# a colon binding maps "name" on the table to "user_name" as a concept
name: user_name,
# a function binding can reference a prior filed
UPPER(name): upper_name,
# a rawsql binding can reference a sql expression
)
Grain Clause
The grain clause defines unique identifiers for rows in the table. (A primary key, or composite primary key).
An empty grain means the table is a "heap" - there is nothing that uniquely identifies a row.
Queries on heaps will typically need to group before using any data.
Address/Query
Datasources can be bound directly to tables, raw SQL scripts, or sql or python files.
Table
The standard approach.
address database.my_fun_table
Query Binding
Triple quotes, surrounding a sql statement. Do not include semicolons or other statement terminators.
query '''
select fun_data
'''
Tips
Query sources are very useful for experimentation, injecting in values, and bespoke database SQL syntax
File Binding
A SQL binding is equivalent to a a query binding, but it will load the file from a relatively located sql script.
The same rules around terminators apply.
file '.my_script.sql'
Python File Binding
For DuckDB, a python file with appropriate configuration can also be used as a source. The python file must be a UV style script (with appropriate headers) and emit a a pyarrow table to stdout.
An example is given below.
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.13"
# dependencies = ["pyarrow"]
# ///
import sys
import pyarrow as pa
def fibonacci(n: int) -> list[int]:
if n <= 0:
return []
if n == 1:
return [0]
fibs = [0, 1]
for _ in range(2, n):
fibs.append(fibs[-1] + fibs[-2])
return fibs
def emit(table: pa.Table) -> None:
with pa.ipc.new_stream(sys.stdout.buffer, table.schema) as writer:
writer.write_table(table)
if __name__ == "__main__":
n = 20 # Keep small to avoid int overflow in pyarrow
fibs = fibonacci(n)
table = pa.table({"index": list(range(1, n + 1)), "fibonacci": fibs})
emit(table)
It can then be used like
key fib_index int;
property fib_index.value int;
datasource fib_numbers(
index:fib_index,
fibonacci: value
)
grain (fib_index)
file `./fib.py`;
select
sum(value) as total_fib;
