Understanding not-clause in Datalog

tejonaco · April 25, 2024, 5:17pm

In a few words, why this:

(not
  [?b :block/refs ?ref]
  [?ref :block/name ?name]
  [(contains? ?ignore ?name)]
)

is not the same as this:

[?b :block/refs ?ref]
[?ref :block/name ?name]
(not
  [(contains? ?ignore ?name)]
)

Above this code I query for all TODO tasks; ?ignore is a map with all the refs that I want to exclude.
The first example returns all tasks except those with a ref to a page included in ?ignore (the expected behaviour).
The second returns all tasks, including the ones I didn’t want to retrieve.

I’m sure it’s just a basic concept of this language that I misunderstood (I don’t even know what this is called: Datascript? Datalog? Datomic? ClojureScript? I’m a bit confused).

Thanks!

mentaloid · April 25, 2024, 5:31pm

The meaning is roughly this:

for the first example:
- the block should not contain a reference to something that is in the ignore list
for the second example:
- the block should contain a reference to something that is not in the ignore list
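
The two readings above can be sketched in plain Python. This is only a model of the logic, not Datascript itself: the block names, the page names, and the `refs` dictionary are made-up toy data, and `ignore` stands in for `?ignore`.

```python
# Toy data (hypothetical): each block maps to the names of the pages it references.
refs = {
    "task-1": {"project"},
    "task-2": {"archive"},
    "task-3": {"project", "archive"},
}
ignore = {"archive"}  # stands in for ?ignore

# Example 1: the block must NOT have any reference whose name is in the ignore list.
example_1 = {b for b, names in refs.items() if not any(n in ignore for n in names)}

# Example 2: the block must have SOME reference whose name is not in the ignore list.
example_2 = {b for b, names in refs.items() if any(n not in ignore for n in names)}

print(sorted(example_1))  # ['task-1']
print(sorted(example_2))  # ['task-1', 'task-3']
```

The two results differ only on `task-3`, the block that has both an ignored and a non-ignored reference.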

tejonaco · April 25, 2024, 5:36pm

Great explanation! I think I’m a bit closer to managing queries.

Siferiax · April 29, 2024, 11:00am

PS. The way to look at the not clause in general is this:
you build a dataset within that clause, which then gets subtracted from the query results as they are.

The set that gets removed in your first example is any ?b entities with a reference from your ignore list
The set that gets removed in your second example is any ?name values that are in your ignore list.

And then you basically get what mentaloid indicates.
So the way to think of it is, “what subset of data do I wish to remove from my results?”

Whenever I get confused about what my not is supposed to be, I actually build a query trying to get the exact subset I don’t want to see. This will then inform my not clause.

Hope this helps
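
The "build the unwanted subset, then subtract it" approach can be sketched with Python sets. This is a rough model under assumptions, not how Datascript evaluates the query: the rows are invented toy data, every toy task has at least one reference, and `ignore` stands in for `?ignore`.

```python
# Toy data (hypothetical): (block, referenced page name) pairs, one row per reference.
rows = {
    ("task-1", "project"),
    ("task-2", "archive"),
    ("task-3", "project"),
    ("task-3", "archive"),
}
ignore = {"archive"}  # stands in for ?ignore
all_blocks = {b for b, _ in rows}

# Example 1: the unwanted subset is made of blocks, so whole blocks are subtracted.
unwanted_blocks = {b for b, name in rows if name in ignore}
result_1 = all_blocks - unwanted_blocks

# Example 2: the unwanted subset is made of (block, name) rows, so only rows are subtracted.
unwanted_rows = {(b, name) for b, name in rows if name in ignore}
result_2 = {b for b, _ in rows - unwanted_rows}

print(sorted(unwanted_blocks))  # ['task-2', 'task-3']
print(sorted(result_1))         # ['task-1']
print(sorted(result_2))         # ['task-1', 'task-3']
```

Printing the unwanted subset on its own first, as suggested above, makes it easy to check that it contains exactly what should disappear from the results.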

mentaloid · April 29, 2024, 3:39pm

That perspective works well for the first level of nesting.
- In the second example, it would essentially read like this:
  - create a set A of blocks that contain a reference
  - create a set B of blocks that contain a reference in the ignore list
  - subtract B from A
- It would be easier to read it like this:
  - include blocks that contain references
  - from the resulting set, exclude blocks whose reference is in the ignore list
Things get tricky when adding more nested levels.
- Consider a query like this:
```
:where
  [...a...]
  (not
    [...b...]
    (not [...c...])
  )
```
- The above perspective would read it like this:
  - create a set A that passes condition a
  - create a set B that additionally passes condition b
  - create a set C that additionally passes condition c
  - subtract C from B to get a set S
  - subtract S from A
- Would be easier to read it like this:
  - include blocks that pass condition a
  - from the resulting set, exclude blocks that pass condition b
    - but don’t exclude them if they also pass condition c
- However, I prefer reading it like this:
  - blocks that pass condition a
  - but don’t pass condition b
    - except if they also pass condition c
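
As a quick check that the nested form and the plain-words reading agree, here is a small Python sketch. It assumes each condition is simply true or false for a given block, so it ignores variable bindings and multi-valued attributes; the function names are made up for illustration.

```python
from itertools import product


def keep(a: bool, b: bool, c: bool) -> bool:
    """Model of: [...a...] (not [...b...] (not [...c...]))."""
    inner_not = not c                   # (not [...c...])
    outer_not = not (b and inner_not)   # (not [...b...] (not [...c...]))
    return a and outer_not


def keep_plain_words(a: bool, b: bool, c: bool) -> bool:
    """Passes a, but doesn't pass b, except if it also passes c."""
    return a and (not b or c)


# Both readings agree on all eight combinations of a, b and c.
for a, b, c in product([False, True], repeat=3):
    assert keep(a, b, c) == keep_plain_words(a, b, c)

print(keep(True, True, False))  # False: passes b without the exception c
print(keep(True, True, True))   # True: passes b, but c rescues it
```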

Siferiax · April 29, 2024, 4:52pm

I don’t think so?
I would read it as:

From the dataset, remove
- those ?name values that are present in the ?ignore list
- and therefore those ?ref entities with those ?name values as their :block/name

Crucially, that leaves all ?b entities that have a reference outside of the ignore list
- regardless of whether they also have a reference present in the ignore list.
  - this is because :block/refs is not a single value

To work around the multiple values, we need to include the ?b entities in our subset
- this way the ?b entity gets subtracted as a whole, even if there are references outside the ignore list.

Example 2 therefore only excludes ?b entities that only have references present in the ignore list.
And example 1 excludes all ?b entities that have 1 or more references in the ignore list.

I think I will just conclude not clauses can be very opaque if you want to understand them thoroughly.
I was trying to think of a way to better articulate how not clauses work, but I feel I’m writing in circles.
And as you demonstrated already, when it comes to nesting it gets even more confusing! I have such an example in my own graph:

Blocks that don’t have a reference
and whose lineage doesn’t have a reference
don’t count references to note or activity as having a reference

Basically a nested example 2.

(not [?b :block/refs])
(not
  [?b :block/parent ?par]
  [?par :block/refs ?r]
  (not 
    [?r :block/name ?name]
    [(contains? #{"notitie" "activiteit"} ?name)]
  )
)

(The actual query uses a rule to get through the whole lineage)
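
A rough Python model of the query as shown, with a single parent level only (no rule for the whole lineage). The graph, the block names, and the `allowed` set name are invented for illustration, and references are modelled directly by page name, so every reference is assumed to have a name.

```python
# Toy graph (hypothetical): parent links and referenced page names per block.
parent = {"child-1": "top-1", "child-2": "top-2", "child-3": "top-3"}
refs = {
    "top-1": set(),
    "top-2": {"notitie"},
    "top-3": {"project"},
    "child-1": set(),
    "child-2": set(),
    "child-3": set(),
}
allowed = {"notitie", "activiteit"}  # references that don't count as a reference


def keep(block: str) -> bool:
    # (not [?b :block/refs]): the block itself has no reference at all.
    if refs[block]:
        return False
    # (not [?b :block/parent ?par] [?par :block/refs ?r] (not ...)):
    # reject when the parent has a reference that is not in the allowed set.
    par = parent.get(block)
    if par is not None and any(name not in allowed for name in refs[par]):
        return False
    return True


print(sorted(b for b in refs if keep(b)))  # ['child-1', 'child-2', 'top-1']
```

In this model `top-2` is dropped even though its only reference is to "notitie", because the first not applies to the block itself without any exception; `child-2` is kept because the exception only covers the parent's references.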

mentaloid · April 30, 2024, 8:20am

Describing sets comes less naturally to people who don’t use them daily.

This description right there reads better in my opinion.
- Although the last line is vague.
  - e.g. not clearing up whether the exception applies:
    - to blocks as well
    - exclusively to their lineage
  - Indenting it would clarify its scope.
I would slightly change it like this:
- blocks that **neither** have a reference themselves
- **nor** their lineage has a reference
  - **except** (or **other than**) to note or activity
In other words, it helps to express the various nots with more specific negative words (the bold ones).

Siferiax · April 30, 2024, 9:32am

True!

Perhaps it is a language barrier thing; for me “except” is not natural.
I mean when using it in the sense of a not clause.
I guess that’s why I went for the “not count as” wording.
Other than would work I guess. Not sure why I’m tripping over the word except.

Probably a language barrier that I use simpler/less specific terms.

mentaloid · April 30, 2024, 9:54am

“except” can be read as:

- besides
- excluding
  - This should be natural enough.
- with the exception/exclusion of
- without including/counting those
  - This is closer to your own expression.
- keep out those
  - This is the literal meaning.