How advanced queries work - step-by-step explainer

naught101 · December 20, 2024, 2:07am

Following some discussion at Advanced Query Question: Find page with 2 specific blocks, I found it useful to write down my understanding of the overarching process of advanced queries. This was useful because I’m new to Clojure, and the syntax is pretty alien.

Working step by step through the logic of an example query of medium complexity helped me understand what each piece is doing.
I’m sharing it here in case anyone else finds it useful.

Explainer

Logseq queries are written in Datalog, a query language, in the dialect popularised by Datomic. The database they run against is called DataScript.
Logseq data is held in memory as a series of identically formatted data chunks (datoms), which can be written as vectors that look like:
- [id attribute value]. In Logseq:
  - id is generally a numeric ID of a block
  - attribute is any property a block can have, often namespaced keywords, e.g. :block/tags
  - value can be anything, a string, number, or any other data type
    - For some attributes, such as :block/tags or :block/page, the value is an id - a reference to another id, the properties of which can be looked up separately.
  - Technically there is a 4th value, transaction-id, but this is usually ignored for Logseq use-cases (it can be excluded from the vectors). So practically, all Logseq data vectors have 3 values.
- If you write a clause like [?variable1 :something ?variable2], then:
  - The vectors in the database are filtered to only those with a :something attribute in the second position.
  - If the ?variables already have values, those values also act as a filter, further limiting the results.
  - Values from all the matching vectors are bound to the ?variables, and values that come from the same vector stay paired with each other.
  - This usually means that each ?variable now holds a subset of what it held before that clause.
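A rough way to picture this matching is the Python sketch below. It is a toy model only, not how DataScript works internally; the sample facts and the names DATOMS, is_var and match_clause are made up for illustration.

```python
# Toy model: a few made-up facts in [id attribute value] form.
DATOMS = [
    (1, ":block/name", "programming"),
    (2, ":block/name", "clojure"),
    (2, ":block/tags", 1),
    (3, ":block/content", "A note about macros"),
    (3, ":block/page", 2),
]


def is_var(term):
    """Variables are strings starting with '?', as in the query syntax."""
    return isinstance(term, str) and term.startswith("?")


def match_clause(clause, binding, datoms=DATOMS):
    """Yield each way of extending `binding` so that `clause` matches a fact."""
    for datom in datoms:
        candidate = dict(binding)
        for term, actual in zip(clause, datom):
            if term == "_":
                continue  # wildcard: matches anything, binds nothing
            if is_var(term):
                if term in candidate and candidate[term] != actual:
                    break  # variable already holds a different value
                candidate[term] = actual
            elif term != actual:
                break  # a fixed term (such as the attribute) must match exactly
        else:
            yield candidate


# No variables bound yet: every :block/name fact matches.
all_pages = list(match_clause(("?e", ":block/name", "?name"), {}))

# ?name already has a value, so it acts as a filter.
one_page = list(match_clause(("?e", ":block/name", "?name"), {"?name": "clojure"}))

print(all_pages)
print(one_page)
```

Each result is one consistent set of values, which is why an ID and the name found alongside it never get mixed up with values from a different fact.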
Pages are a special type of block. Their IDs appear:
- in the id position when the :block/name attribute is used:
  - [?page :block/name _] - this finds all blocks that have a :block/name (i.e. pages - other blocks don’t have names).
    - _ is like a wildcard that matches anything, and doesn’t insert it into a variable.
- in the value position, when attributes such as :block/page are used (also other page references, such as :block/tags):
  - [?block :block/page ?page] - this matches all database vectors that record which page a block belongs to, and stores the block IDs in ?block and the page IDs in ?page.
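The two patterns above can be pictured with a small Python sketch. It is a toy model with made-up facts, not Logseq's real data or implementation.

```python
# Made-up facts in [id attribute value] form.
# 1 and 2 are pages (they have a name); 3 and 4 are ordinary blocks.
DATOMS = [
    (1, ":block/name", "programming"),
    (2, ":block/name", "clojure"),
    (3, ":block/content", "A note about macros"),
    (3, ":block/page", 2),
    (4, ":block/content", "Why learn to code?"),
    (4, ":block/page", 1),
]

# [?page :block/name _]  -> IDs of everything that has a name, i.e. pages.
# The value is ignored, which is what the _ wildcard does.
page_ids = {entity for (entity, attribute, _value) in DATOMS
            if attribute == ":block/name"}

# [?block :block/page ?page]  -> (block ID, page ID) pairs.
# Each block ID stays attached to its own page ID.
block_page_pairs = [(entity, value) for (entity, attribute, value) in DATOMS
                    if attribute == ":block/page"]

print(page_ids)
print(block_page_pairs)
```

Note that the second pattern produces pairs rather than two independent lists, so a later clause that uses ?page still knows which ?block it came with.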
Worked example

Let’s take an example from the Logseq docs:

#+BEGIN_QUERY
{:title "All pages have a *programming* tag"
 :query [:find ?name
         :in $ ?tag
         :where
           [?t :block/name ?tag]
           [?p :block/tags ?t]
           [?p :block/name ?name]
       ]
 :inputs ["programming"]
 :view (fn [result]
         [:div.flex.flex-col
           (for [page result]
             [:a {:href (str "#/page/" page)} (clojure.string/capitalize page)]          
           )
         ]
       )
 }
#+END_QUERY

This query looks up names of pages that have the programming tag, and then formats them as a bunch of links.

To break it down, there are a few basic chunks you should pay attention to:
- :query is the main element. Nearly everything else can be removed and it will still work.
  - :inputs are the values passed in to the :in part of the query.
    - $ refers to the database being queried.
- :view does the formatting
- :title just sets the title of the results block
The :query has a few main chunks:
- :find - the values that you want to return
- :in - variables being passed in to the query
- :where - the clauses that filter the database
- :keys - names of the returned values, used in a map (not used in this example)
The chunks resolve in this order:
- :in → :where → :find → :keys
Working through the query logic
So, working through the query in logical order, we have:
```
:inputs ["programming"]
```
- Set the inputs vector, which just contains one value, the string “programming”
```
:in $ ?tag
```
- This accepts the inputs from :inputs, unpacks them and assigns them to variable(s), in this case ?tag.
- $ is a reference to the database.
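As a rough Python picture of that unpacking: the helper name bind_inputs and the one-fact stand-in database are made up, and the rule "$ gets the database, every other name gets the next input" is a simplifying assumption rather than a description of Logseq's code.

```python
def bind_inputs(in_names, inputs, database):
    """Pair each name listed after :in with a value.

    Simplified assumption: $ receives the database and every other name
    receives the next item from :inputs, in order.
    """
    remaining = iter(inputs)
    binding = {}
    for name in in_names:
        binding[name] = database if name == "$" else next(remaining)
    return binding


database = [(1, ":block/name", "programming")]  # stand-in for the real database
binding = bind_inputs(["$", "?tag"], ["programming"], database)
print(binding["?tag"])
```

The order matters: with two input variables, the first value in :inputs goes to the first variable after $, the second value to the next one.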
```
:where
```
- This marks the beginning of the database filtering. All the following filters are run in sequence and implicitly joined, so a result has to satisfy every one of them.
```
[?t :block/name ?tag]
```
The first filter:
- Find all database entries that:
  - use the :block/name attribute (and therefore are pages), and
  - have a value matching ?tag (currently “programming”)
- Since ?t doesn’t yet exist, create it and assign it all the matching block IDs
  - it’s now a list with just one entry, the (numeric) ID of the page with the name “programming”
```
[?p :block/tags ?t]
```
Second filter:
- Find all database entries that:
  - use the :block/tags attribute, and
  - have a value matching any of the IDs in ?t (the vector of IDs created by the previous filter, containing only the “programming” page ID)
- Since ?p doesn’t yet exist, create it and assign it all the matching IDs
  - it’s now a vector of IDs of blocks that include the “programming” tag
- ?t already exists, so here it acts as a filter and is narrowed to the IDs that also appear as :block/tags values
  - in this example it still holds only the “programming” page ID (or nothing at all, if no block has that tag)
```
[?p :block/name ?name]
```
Third filter:
- Find all database entries that:
  - use the :block/name attribute, and
  - have an ID matching any of the IDs in ?p (vector of IDs created by the previous filter)
- ?p gets updated to only include rows with a :block/name attribute
  - Which means that they are pages
- ?name doesn’t yet exist, so it gets populated with a vector of values (the 3rd element of the database vectors), which are
  - all of the names of pages (blocks that have a name, from filter 3), out of
  - all of the blocks which have “programming” tags (filter 2 + filter 1)
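The three filters can be traced by hand with a Python sketch. The facts are made up and this is only a toy model of the logic, not of how DataScript executes a query.

```python
# Made-up facts in [id attribute value] form.
DATOMS = [
    (1, ":block/name", "programming"),
    (2, ":block/name", "clojure"),
    (2, ":block/tags", 1),
    (3, ":block/content", "TODO learn macros"),
    (3, ":block/tags", 1),   # a tagged block that is not a page
    (4, ":block/name", "python"),
    (4, ":block/tags", 1),
    (5, ":block/name", "food"),
    (6, ":block/name", "pasta"),
    (6, ":block/tags", 5),   # tagged, but with a different tag
]

tag = "programming"  # the value handed to ?tag by :in

# Filter 1: [?t :block/name ?tag]
rows_t = [t for (t, attr, value) in DATOMS
          if attr == ":block/name" and value == tag]

# Filter 2: [?p :block/tags ?t]
rows_p_t = [(p, t) for t in rows_t
            for (p, attr, value) in DATOMS
            if attr == ":block/tags" and value == t]

# Filter 3: [?p :block/name ?name]
rows_p_t_name = [(p, t, name) for (p, t) in rows_p_t
                 for (entity, attr, name) in DATOMS
                 if attr == ":block/name" and entity == p]

print(rows_t)
print(rows_p_t)
print(rows_p_t_name)
```

Block 3 survives filter 2 but is dropped by filter 3 because it has no :block/name, and page 6 never gets past filter 2 because its tag is a different page.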
```
:find ?name
```
Return the ?name variable, which is a list of page names.
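A loose Python picture of what :find (and :keys, when present) does with the rows left over from :where. The function name find and the sample rows are made up, and modelling results as a list of tuples is an assumption; the view function in the example treats each result as a plain string, so Logseq appears to unwrap single-value results before they reach :view.

```python
def find(rows, find_vars, keys=None):
    """Keep only the :find variables from each row, dropping duplicate results."""
    results = []
    for row in rows:
        picked = tuple(row[var] for var in find_vars)
        if picked not in results:
            results.append(picked)
    if keys is not None:
        # :keys turns each result into a map with the given names
        return [dict(zip(keys, picked)) for picked in results]
    return results


# Rows as they might look after the three :where filters (made-up values).
rows = [
    {"?t": 1, "?p": 2, "?name": "clojure"},
    {"?t": 1, "?p": 4, "?name": "python"},
]

print(find(rows, ["?name"]))
print(find(rows, ["?p", "?name"], keys=["id", "page"]))
```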

:view (fn [result]
      [:div.flex.flex-col
       (for [page result]
         [:a {:href (str "#/page/" page)} (clojure.string/capitalize page)])])}

This is an anonymous Clojure function, which:
- Creates an HTML div tag with flex classes
- loops over the values in result (which is the ?name vector)
  - makes an HTML link using each string in the results.
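For readers more at home in Python, the same shape can be sketched as below. It builds nested lists that loosely imitate the [tag attributes content] structure used in the Clojure version; the function is a made-up illustration and produces data, not actual HTML.

```python
def view(result):
    """Build a nested-list description of the links, one per page name."""
    links = [
        ["a", {"href": "#/page/" + page}, page.capitalize()]
        for page in result
    ]
    return ["div.flex.flex-col", *links]


print(view(["clojure", "python"]))
```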

mentaloid · December 20, 2024, 11:15am

Here is a review:

Datalog is only a query language.
- The respective database is (counter-intuitively) called Datascript.
[?variable1 :something ?variable2] is not a call, it is a clause
IDs are not exactly numeric; they include dashes and may even be held as objects.
- “a reference to another id” should be “a reference to another block”
- blocks (thus also pages) don’t appear anywhere, only their IDs appear
  - Should somehow separate:
    - the low-level descriptions: db entries, IDs etc.
    - from the high-level descriptions: blocks, pages etc.
$ is for the target database.
- Remove “Not sure what the $ is for”
- It is meaningful in applications that query multiple databases.
  - Logseq’s queries typically target only one database at a time.
Special terminology (vector, anonymous etc.) should be avoided whenever possible.
- Otherwise, make it clear that this guide is written from the perspective of a programmer.
A proper guide should discuss multiple queries, each one of them introducing a new concept in a gradual manner, probably ordered similarly to this:
- :query
  - :find
  - :where
    - nested clauses
- :title
- modifiers: :collapsed? etc.
- :inputs / :in
  - :rules
- :keys
- Clojure stuff:
  - :result-transform
  - :view
    - hiccup
