To Have or Have Not

The Having Clause

Trilogy originally dropped the having clause entirely. This was in the service of simplicity; since a query would have a where clause that operated on the output projection, all values would be available for filtering and the where/having distinction was unneeded.

However, performance and authoring simplicity started to challenge that assumption. We've brought back the having clause, with a new purpose.

Implicit Where

The restrict to filtering on the output projection only made some authoring statements quite tedious. Since there was no implicit 'table' in the from clause, it was no longer possible to straightforwardly filter on a value that you didn't need to output.

While this could be worked around with hidden fields, it didn't feel... delightful.

SELECT
    name,
    --age,
where
    age>30
;

Implicit where projection was the solution to this. For the above query, we could infer that age was referenced as a filter, so that we need a first projection with name + age to apply the filter to. We can then optionally group/restrict our values down to return the original projection set.

Now we can write:

SELECT
    name,
where
    age>30
;

That's delightful!

The Aggregate Issue

However, aggregates present a problem. Imagine the following query:

select
    age,
    count(person.id) ->person_count
where 
    state in ('vt', 'ma')
;

Does the user mean?

select
    age,
    --state,
    count(person.id) ->person_count
where 
    state in ('vt', 'ma')
;

Likely not. They'll have duplicate rows for each state. To get the reduced set, the user will need to first define a CTE.

with  restricted as select
age,
state,
person.id
where state  in ('vt', 'ma');

select
    restricted.age,
    count(restricted.id) ->person_count ;

This starts to defeat the purpose of the implicit where. If every aggregate forces you to define a CTE first, authoring feels slow and disjointed.

Can we fix it?

Select Context

Aggregates already respond to the query context they exist in, such as being assigned a grain of the other dimensions. We can extend this to include the conditions as well.

If what the user really means is this:

select
    age ? state in ('vt', 'ma'),
    count(person.id ? state in ('vt', 'ma')) ->person_count

Then we can evaluate that automatically from a global where clause.

So now we get the right output for a condition that is not referenced in teh select:

select
    age,
    count(person.id) -> person_count,
where 
    state in ('vt', 'ma')
;

But what if the query is instead this?

select
    age,
    count(person.id) -> person_count,
where 
    state in ('vt', 'ma')
    and
    person_count>10

We can't push the condition clause into the aggregate definition because it's a circular dependency - th epushdown of the person_count would depend on the person_count itself.

Back to Having

To resolve this, we've reintroduced the having clause. It is restricted to filtering on the output of the select, and will not propagate context implicity up.

To promote a more natural reading order that matches operations, we've also enabled the where clause to come before the select.

Our final query might look like this:

-- the conceptual universe we're operating in
where
    state in ('vt', 'ma')
-- define the projection
select
    age,
    count(person.id) -> person_count,
-- and filter the results
having 
    person_count>10

Future Steps

It's currently not strictly enforced on what conditions can be placed in each clause, as scalar values derived in the select can be safely pushed up if they don't have circular dependencies.

For improved DevX, we'll need to ensure invalid where clause definitions are surfaced a syntax error with guidance. [TODO: do this]