You Need a Semantic Layer
Review the Problem
Semantic layers have been going in and out of vogue for years. You might also hear about ontologies, graphs, data domains, and other terms. They're all going after one core concept - it's pretty hard to get the right answer out of a SQL database without context. A new analyst, starting at a new company, asked a business question - "how many users do we have?" - is almost certainly going to get the wrong answer. They don't know which table has users; they don't even know which database has users. They don't know whether "users" means website visitors, customers, accounts, emails, or something else. They don't know that IDs from 0-1000000 are test users. And so on.
So the problem is clear - how do you get someone context as quickly as possible? That's what a semantic layer is, at its heart: a structured format for contextualizing raw data - what it means, how to join it, and more.
You don't need the modern data stack
Some relational databases - such as Postgres - have rich typing and comment support. In that context, your semantic layer can live directly in the database: add column descriptions, create foreign keys, define enum types, and so on. Other databases - such as BigQuery - have less rich typing, but still support comments and some constraints. You can get a long way with just these features.
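For example, in Postgres the metadata itself can carry the context. The sketch below is illustrative, not a real schema - the table, column names, and comments are hypothetical:

```sql
-- Hypothetical schema: the type, constraint, and comments ARE the semantic layer.
CREATE TYPE account_status AS ENUM ('trial', 'active', 'churned');

CREATE TABLE accounts (
    account_id BIGINT PRIMARY KEY,
    status     account_status NOT NULL,
    org_id     BIGINT NOT NULL REFERENCES orgs (org_id)  -- join path is explicit
);

COMMENT ON TABLE accounts IS
    'Paying customer accounts. Excludes website visitors; IDs 0-1000000 are test accounts.';
COMMENT ON COLUMN accounts.status IS
    'Lifecycle state; "active" is what "how many users do we have?" usually means.';
```

A new analyst (or a tool introspecting the catalog) can recover the join path, the valid statuses, and the test-data caveat without asking anyone.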
Long-time DBAs recognize and value these features, and there are ERD and database design tools that excel at this layer.
But Can We Do Better?
There are issues with this approach: features are mismatched between vendors, and coupling the metadata to the database is a problem. It's not portable, it's not reusable, and it's not version controlled. You can't easily share it between databases, or even between different instances of the same database.
Enter AI
To get an LLM to write good SQL, you need to inject context. You can fine-tune - which is expensive, slow, and out of date as soon as your engineering team pushes the next update to your customer model. You can let the agent explore - a popular option - using its familiarity with databases to query around. However, this is slow, inconsistent in results, requires security tradeoffs, and is hard to test and evolve - you can't really A/B test a whole new database schema without jumping through a lot of hoops.
So people tend to end up giving the AI context outside the database. That's where a semantic layer shines: it's a right-sized, structured injection of context exactly where an LLM needs it, one you can test, version, and evolve to your heart's content.
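A minimal sketch of what that injection can look like, assuming a hypothetical hard-coded layer (a real one would be loaded from a version-controlled file):

```python
# Hypothetical semantic layer: concept -> table, meaning, caveats, join paths.
SEMANTIC_LAYER = {
    "users": {
        "table": "analytics.accounts",
        "meaning": "Paying customer accounts, not website visitors.",
        "caveats": ["IDs 0-1000000 are test accounts and must be excluded."],
        "joins": {"orgs": "accounts.org_id = orgs.org_id"},
    }
}

def build_context(layer: dict) -> str:
    """Flatten the semantic layer into a text block for a system prompt."""
    lines = []
    for concept, spec in layer.items():
        lines.append(f"Concept '{concept}' maps to table {spec['table']}.")
        lines.append(f"  Meaning: {spec['meaning']}")
        for caveat in spec.get("caveats", []):
            lines.append(f"  Caveat: {caveat}")
        for other, condition in spec.get("joins", {}).items():
            lines.append(f"  Join to {other} via: {condition}")
    return "\n".join(lines)

# The rendered context is prepended to the user's question before calling the LLM.
prompt = (
    "You write SQL. Use only the definitions below.\n\n"
    + build_context(SEMANTIC_LAYER)
    + "\n\nQuestion: how many users do we have?"
)
print(prompt)
```

Because the layer is just data, you can diff it, review it, and A/B test two versions against an evaluation set of known questions - none of which is possible when the context lives only in an agent's exploratory queries.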
Humans Too
But if you treat it as AI-only, you get semantic drift - humans are still the ones keeping the model up to date. Sharing the same context between users and AIs ensures that improvements to the semantic layer benefit both.
Takeaway
You do want a semantic layer. It makes your LLMs better, and your humans more productive. But it needs to be more than just a set of comments in a database, or a YAML file. It needs to be a first-class, version-controlled, testable, and evolvable artifact that all consumption surfaces can share.
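Concretely, such an artifact might look like the sketch below - illustrative, vendor-neutral YAML, not any specific tool's syntax - checked into a repo and reviewed like code:

```yaml
# users.yml - hypothetical semantic layer definition, versioned alongside the code
concept: users
table: analytics.accounts
description: Paying customer accounts, not website visitors.
filters:
  - name: exclude_test_accounts
    sql: account_id > 1000000
measures:
  - name: user_count
    type: count_distinct
    column: account_id
joins:
  - to: orgs
    on: accounts.org_id = orgs.org_id
```

Dashboards, notebooks, and LLM agents can all consume the same definition, so "how many users do we have?" has one answer everywhere.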
What meets this bar?
LookML is probably the closest; Malloy is a newer take. All the major vendors have rolled out semantic layer offerings as well: Databricks, Snowflake, etc.
We're biased, but we think Trilogy checks those boxes pretty well too.