Defining a Model

Recently, we were designing a Trilogy model to support analytics against a stock portfolio tracking application. This provides an opportunity to walk through building a data driven app using Trilogy in practice.

This example is simplified a bit to make it easier to follow.

The Context

We have access to a collection of information around the holdings of a portfolio, dividends paid, and orders placed. We also have some generic information about stocks.

We can imagine the various kind of questions we want to be able to answer:

  • What's my return for stocks?
  • Which stocks have paid the most dividends?
  • What's my dividend return by sector?
  • etc

We're just shooting for descriptive analytics/reporting here - there's a rich, rich ecosystem of quantitive finance libraries that can help you with more advanced questions, but we're just trying to get a handle on what we have.

The Tech Stack

We're going to use Trilogy to define our model, and duckdb to run our queries. We'll load the relevant data into a local duckdb instance as it's relatively small in the grand scheme of things - thousands of stocks, divideneds, orders, not billions. We're going to skip over where we got our information from and loading it to duckdb; imagine that we're pulling from a Robinhood account or similar.

The Model - Entrypoints

Let's assume we have 4 core domains we care about.

  • Stocks
  • Orders
  • Dividends
  • Portfolio

The Model - Stocks

So let's get down to creation. We know we need to know things about stocks. Stocks can be identified in many ways - ISIN, ticker, etc. We'll assume that ticker is the unique key, but that we can load a integer surrogate key for it. (we like integers for anything we might use in a join).

We'll start a model with concepts. We want our surrogate id, and hten we can associate that with the ticker, ISIN, name, sector, string, industry


key id int; # surrogate identifier for a stock
property id.ticker string # the ticker;
property id.isin string # the isin;
property id.name string # the name of the stock;
property id.sector string # the sector of the stock;
property id.industry string # the industry of the stock;

We'll need some way to get that data. Let's assume we ran this DDL in duckdb.

DROP TABLE IF EXISTS symbols CASCADE;
CREATE TABLE symbols (
    id INTEGER PRIMARY KEY,
    ticker VARCHAR,
    isin VARCHAR,
    name VARCHAR,
    sector VARCHAR,
    industry VARCHAR,
);

We'd bind that in our trilogy like this. Since all the column names match up, this will look quite symmetric.

Note we know the primary key is id, so we can set the grain that way.


datasource stock_info (
    id: id,
    ticker: ticker,
    isin: isin,
    name: name,
    sector: sector,
    industry: industry,
)
grain (id,)
address symbols;

Let's save this file as stocks.preql.

The Model - Orders

Let's pivot over to orders. We already defined a stock dataset, so let's pull that in first.

import stocks as stock;

We know an order will be placed for a stock with a certain quantity and price. We'll also want to know the date of the order. Let's assume a surrogate PK tooo.


key id int; # surrogate identifier for an order
property id.quantity float;
property id.price float;
property id.date_placed date;

We'll need some way to get that data. Let's assume we ran this DDL in duckdb.

DROP TABLE IF EXISTS orders CASCADE;
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    stock_id INTEGER,
    quantity FLOAT,
    price FLOAT,
    date_placed DATE,
);

We'd bind that in our trilogy like this. Note that we bind the import stock.id concept to the stock_id column directly.


datasource order_info (
    id: id,
    stock_id: stock.id,
    quantity: quantity,
    price: price,
    date_placed: date_placed,
)
grain (id)
address orders;

Let's save this file as orders.preql.

Let's Take A Query Break

We've set up two components of our model - stocks and orders. Let's take a break to see what we can do now.

Let's answer - how many orders have we placed by industry?

There's actually two ways to slice this; let's start with the easiest one.

import orders as orders;

SELECT
    orders.stock.industry,
    count(orders.id)-> order_count
order by
    order_count desc;

Since orders references the stock model, we can directly access the industry property. We're counting the number of orders, and ordering by that count.

But what if you do industry level reporting, and you want to know all the industries, even if no orders were placed?

We can do that too, but we'll need to pivot the query around the stock model. The stock model right now doesn't have any order information.

We can merge in our order information by saying that every property on the orders.stock model should be considered to be equivalent to a partial source for the stock model.

import stocks as stocks;
import orders as orders;

MERGE orders.stock.* into ~stocks.*;

-- this will return all industries, even if no orders were placed
SELECT
    stocks.industry,
    ifnull(count(orders.id),0)-> order_count
order by
    order_count desc;

Back to Model - Dividends

Let's pivot over to dividends.

We already defined a stock dataset, so let's pull that in first.

import stocks as stock;

We know a dividend will be paid for a stock with a certain amount and date. We'll also want to know the date of the dividend. Let's assume a surrogate PK as well.

key id int;
property id.amount float;
property id.date_paid date;

We'll need some way to get that data. Let's assume we ran this DDL in duckdb.

DROP TABLE IF EXISTS dividends CASCADE;
CREATE TABLE dividends (
    id INTEGER PRIMARY KEY,
    stock_id INTEGER,
    amount FLOAT,
    date_paid DATE,
);

Create the datasource - note that we bind the import stock.id concept to the stock_id column directly, again.

datasource dividend_info (
    id: id,
    stock_id: stock.id,
    amount: amount,
    date_paid: date_paid,
)
grain (id)
address dividends;

Save it as dividends.preql;

Query Break 2

Let's check out how many dividends we got

import stocks as stocks;
import orders as orders;
import dividends as dividends;

MERGE orders.stock.* into ~stocks.*;
MERGE dividends.stock.* into ~stocks.*;

-- this will return all industries, even if no orders were placed
SELECT
    stocks.industry,
    ifnull(count(orders.id),0)-> order_count,
    ifnull(sum(dividends.amount_paid)) -> dividend_return
order by
    order_count desc;

Holdings

Last but not least, let's define our holdings.

Surprise surprise, we'll start with our stock info.

import stocks as stock;

We know we'll have a certain quantity held. Since this is 1-1 with stocks, we can just use the stock ID as our PK.


property stock.id.qty_held float;
property stock.id.value float;
property stock.id.cost_basis float;

We'll need some way to get that data. Let's assume we ran this DDL in duckdb.

DROP TABLE IF EXISTS holdings CASCADE;
CREATE TABLE holdings (
    stock_id INTEGER PRIMARY KEY,
    value FLOAT,
    cost_basis FLOAT,
    qty_held FLOAT,
);

Create a datasource

datasource dividend_info (
    stock_id: stock.id,
    value:value,
    cost_basis:cost_basis,
    qty_held:qty_held
)
grain (stock.id)
address holdings;

Query Break 3

Now we can calculate which of our holdings give us the most dividends.

import stocks as stocks;
import orders as orders;
import dividends as dividends;
import holdings as holdings;

MERGE orders.stock.* into ~stocks.*;
MERGE dividends.stock.* into ~stocks.*;
MERGE holdings.stock.* into ~stocks.*;
-- this will return all industries, even if no orders were placed
SELECT
    symbol.sector,
    symbol.industry,
    sum(holdings.qty) as total_holding_qty,
    sum(holdings.value) as total_holding_value,
    sum(dividend.amount) as total_dividend,
    total_dividend/total_holding_value * 100 as dividend_yield,
    total_holding_value / sum(holdings.value) by * as pct_of_total
order by
    total_holding_qty desc
;

Wrap up

If we were going to use this model a lot, we'd probably define an entrypoint.preql file that would import and do the merges.

Each analysis/view you need can create a customer entrypoint.preql to replicate teh intended environment.