To Have or Have Not
The Having Clause
Trilogy originally dropped the having clause entirely. This was in the service of simplicity; since a query would have a where clause that operated on the output projection, all values would be available for filtering and the where/having distinction was unneeded.
However, performance and authoring simplicity started to challenge that assumption. We've brought back the having clause, with a new purpose.
Implicit Where
The restrict to filtering on the output projection only made some authoring statements quite tedious. Since there was no implicit 'table' in the from clause, it was no longer possible to straightforwardly filter on a value that you didn't need to output.
While this could be worked around with hidden fields, it didn't feel... delightful.
SELECT
name,
--age,
where
age>30
;
Implicit where projection was the solution to this. For the above query, we could infer that age was referenced as a filter, so that we need a first projection with name + age to apply the filter to. We can then optionally group/restrict our values down to return the original projection set.
Now we can write:
SELECT
name,
where
age>30
;
That's delightful!
The Aggregate Issue
However, aggregates present a problem. Imagine the following query:
select
age,
count(person.id) ->person_count
where
state in ('vt', 'ma')
;
Does the user mean?
select
age,
--state,
count(person.id) ->person_count
where
state in ('vt', 'ma')
;
Likely not. They'll have duplicate rows for each state. To get the reduced set, the user will need to first define a CTE.
with restricted as select
age,
state,
person.id
where state in ('vt', 'ma');
select
restricted.age,
count(restricted.id) ->person_count ;
This starts to defeat the purpose of the implicit where. If every aggregate forces you to define a CTE first, authoring feels slow and disjointed.
Can we fix it?
Select Context
Aggregates already respond to the query context they exist in, such as being assigned a grain of the other dimensions. We can extend this to include the conditions as well.
If what the user really means is this:
select
age ? state in ('vt', 'ma'),
count(person.id ? state in ('vt', 'ma')) ->person_count
Then we can evaluate that automatically from a global where clause.
So now we get the right output for a condition that is not referenced in teh select:
select
age,
count(person.id) -> person_count,
where
state in ('vt', 'ma')
;
But what if the query is instead this?
select
age,
count(person.id) -> person_count,
where
state in ('vt', 'ma')
and
person_count>10
We can't push the condition clause into the aggregate definition because it's a circular dependency - th epushdown of the person_count would depend on the person_count itself.
Back to Having
To resolve this, we've reintroduced the having clause. It is restricted to filtering on the output of the select, and will not propagate context implicity up.
To promote a more natural reading order that matches operations, we've also enabled the where clause to come before the select.
Our final query might look like this:
-- the conceptual universe we're operating in
where
state in ('vt', 'ma')
-- define the projection
select
age,
count(person.id) -> person_count,
-- and filter the results
having
person_count>10
Future Steps
It's currently not strictly enforced on what conditions can be placed in each clause, as scalar values derived in the select can be safely pushed up if they don't have circular dependencies.
For improved DevX, we'll need to ensure invalid where clause definitions are surfaced a syntax error with guidance. [TODO: do this]