Defining a Model
Recently, we were designing a Trilogy model to support analytics against a stock portfolio tracking application. This provides an opportunity to walk through building a data driven app using Trilogy in practice.
This example is simplified a bit to make it easier to follow.
The Context
We have access to a collection of information around the holdings of a portfolio, dividends paid, and orders placed. We also have some generic information about stocks.
We can imagine the various kind of questions we want to be able to answer:
- What's my return for stocks?
- Which stocks have paid the most dividends?
- What's my dividend return by sector?
- etc
We're just shooting for descriptive analytics/reporting here - there's a rich, rich ecosystem of quantitive finance libraries that can help you with more advanced questions, but we're just trying to get a handle on what we have.
THe Tech Stack
We're using Trilogy to define our model, and duckdb to run our queries. We'll load the relevant data into a local duckdb instance as it's relatively small in the grand scheme of things - thousands of stocks, divideneds, orders, not billions.
The Model - Entrypoints
Let's assume we have 4 core domains we care about.
- Stocks
- Orders
- Dividends
- Portfolio
The Model - Stocks
So let's get down to creation. We know we need to know things about stocks. Stocks can be identified in many ways - ISIN, ticker, etc. We'll assume that ticker is the unique key, but that we can load a integer surrogate key for it. (we like integers for anything we might use in a join).
We'll start a model with concepts. We want our surrogate id, and hten we can associate that with the ticker, ISIN, name, sector, string, industry
key id int; # surrogate identifier for a stock
property id.ticker string # the ticker;
property id.isin string # the isin;
property id.name string # the name of the stock;
property id.sector string # the sector of the stock;
property id.industry string # the industry of the stock;
We'll need some way to get that data. Let's assume we ran this DDL in duckdb.
DROP TABLE IF EXISTS symbols CASCADE;
CREATE TABLE symbols (
id INTEGER PRIMARY KEY,
ticker VARCHAR,
isin VARCHAR,
name VARCHAR,
sector VARCHAR,
industry VARCHAR,
);
We'd bind that in our trilogy like this. Since all the column names match up, this will look quite symmetric.
Note we know the primary key is id, so we can set the grain that way.
datasource stock_info (
id: id,
ticker: ticker,
isin: isin,
name: name,
sector: sector,
industry: industry,
)
grain (id,)
address symbols;
Let's save this file as stocks.preql
.
The Model - Orders
Let's pivot over to orders. We already defined a stock dataset, so let's pull that in first.
import stocks as stock;
We know an order will be placed for a stock with a certain quantity and price. We'll also want to know the date of the order. Let's assume a surrogate PK tooo.
key id int; # surrogate identifier for an order
property id.quantity float;
property id.price float;
property id.date_placed date;
We'll need some way to get that data. Let's assume we ran this DDL in duckdb.
DROP TABLE IF EXISTS orders CASCADE;
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
stock_id INTEGER,
quantity FLOAT,
price FLOAT,
date_placed DATE,
);
We'd bind that in our trilogy like this. Note that we bind the import stock.id concept to the stock_id column directly.
datasource order_info (
id: id,
stock_id: stock.id,
quantity: quantity,
price: price,
date_placed: date_placed,
)
grain (id)
address orders;
Let's save this file as orders.preql
.
Let's Take A Query Break
We've set up two components of our model - stocks and orders. Let's take a break to see what we can do now.
Let's answer - how many orders have we placed by industry?
There's actually two ways to slice this; let's start with the easiest one.
import orders as orders;
SELECT
orders.stock.industry,
count(orders.id)-> order_count
order by
order_count desc;
Since orders references the stock model, we can directly access the industry property. We're counting the number of orders, and ordering by that count.
But what if you do industry level reporting, and you want to know all the industries, even if no orders were placed?
We can do that too, but we'll need to pivot the query around the stock model. The stock model right now doesn't have any order information.
We can merge in our order information by saying that every property on the orders.stock model should be considered to be equivalent to a partial source for the stock model.
import stocks as stocks;
import orders as orders;
MERGE orders.stock.* into ~stocks.*;
-- this will return all industries, even if no orders were placed
SELECT
stocks.industry,
ifnull(count(orders.id),0)-> order_count
order by
order_count desc;
Back to Model - Dividends
Let's pivot over to dividends.
We already defined a stock dataset, so let's pull that in first.
import stocks as stock;
We know a dividend will be paid for a stock with a certain amount and date. We'll also want to know the date of the dividend. Let's assume a surrogate PK as well.
key id int;
property id.amount float;
property id.date_paid date;
We'll need some way to get that data. Let's assume we ran this DDL in duckdb.
DROP TABLE IF EXISTS dividends CASCADE;
CREATE TABLE dividends (
id INTEGER PRIMARY KEY,
stock_id INTEGER,
amount FLOAT,
date_paid DATE,
);
Create the datasource - note that we bind the import stock.id concept to the stock_id column directly, again.
datasource dividend_info (
id: id,
stock_id: stock.id,
amount: amount,
date_paid: date_paid,
)
grain (id)
address dividends;
Save it as dividends.preql
;
Query Break 2
Let's check out how many dividends we got
import stocks as stocks;
import orders as orders;
import dividends as dividends;
MERGE orders.stock.* into ~stocks.*;
MERGE dividends.stock.* into ~stocks.*;
-- this will return all industries, even if no orders were placed
SELECT
stocks.industry,
ifnull(count(orders.id),0)-> order_count,
ifnull(sum(dividends.amount_paid)) -> dividend_return
order by
order_count desc;
Holdings
Last but not least, let's define our holdings.
Surprise surprise, we'll start with our stock info.
import stocks as stock;
We know we'll have a certain quantity held. Since this is 1-1 with stocks, we can just use the stock ID as our PK.
property stock.id.qty_held float;
property stock.id.value float;
property stock.id.cost_basis float;
Create a datasource