Following some discussion at Advanced Query Question: Find page with 2 specific blocks I found it useful to write down my understanding of the over-arching process of advanced queries. This was useful because I’m new to Clojure, and the syntax is pretty alien.
Working step by step through the logic of an example query of medium complexity helped me understand what each piece is doing.
I’m sharing it here in case anyone else finds it useful.
Explainer
-
Logseq queries are based on datomic, which is based on datalog, which is both a programming language and a database
-
Logseq data is stored (in memory?) in the form of a series of identically formatted data chunks called vectors, that look like:
[id attribute value]. In Logseq:idis generally a numeric ID of a blockattributeis any property a block can have, often namespaced keywords, e.g.:block/tagsvaluecan be anything, a string, number, or any other data type- For some attributes, such as
:block/tagsor:block/page, the value is anid- a reference to another id, the properties of which can be looked up separately.
- For some attributes, such as
- Technically there is a 4th value,
transaction-id, but this is usually ignored for Logseq use-cases (it can be excluded from the vectors). So practically, all Logseq data vectors have 3 values.
- If you make a call like
[?variable1 :something ?variable2], then:- Filter the vectors in the database to only include vectors with a
:somethingattribute in the second position. - If the
?variableshave values already, then they also act as a filter, further limiting the results - Values from all the matching vectors in the database are injected into the
?variables - This usually means that each
?variableis now a subset of what it was before that line.
- Filter the vectors in the database to only include vectors with a
-
Pages are a special type of block, which appear
- in the
idposition when the:block/namepredicate is used:[?page :block/name _]- this finds all blocks that have a:block/name(i.e. pages - other blocks don’t have names)._is like a wildcard that matches anything, and doesn’t insert it into a variable.
- in the
valueposition, when attributes such as:block/pageare used (also other page references, such as:block/tags):[?block :block/page ?page]- this filters all database vectors that describe blocks that belong to a page and stores the block IDs in?blockin the page IDs in the?pagevariable.
- in the
-
Worked example
-
Let’s take an example from the Logseq docs:
-
#+BEGIN_QUERY {:title "All pages have a *programming* tag" :query [:find ?name :in $ ?tag :where [?t :block/name ?tag] [?p :block/tags ?t] [?p :block/name ?name] ] :inputs ["programming"] :view (fn [result] [:div.flex.flex-col (for [page result] [:a {:href (str "#/page/" page)} (clojure.string/capitalize page)] ) ] ) } #+END_QUERY - This query looks up
namesofpagesthat have theprogrammingtag, and then formats them as a bunch of links.
-
-
To break it down. There are a few basic chunks you should pay attention to:
:querythis is the main important element. Nearly everything else can be removed and it will still work.:inputsare the values passed in to the:inpart of the query.- Not sure what the
$is for?
- Not sure what the
:viewdoes the formatting:titlejust sets the title of the results block
-
The
:queryhas a few main chunks::find- the values that you want to return:in- variables being passed in to the query:keys- names of the returned values, used in a map (not used in this example)
-
The chunks resolve in this order:
:in→:where→:find→:keys
-
Working through the query logic
-
So, working through the query in logical order, we have:
-
:inputs ["programming"]- Set the inputs vector, which just contains one value, the string “programming”
-
:in $ ?tag- This accepts the inputs from
:inputs, unpacks them and assigns them to variable(s), in this case?tag. $is a reference to the database.
- This accepts the inputs from
-
:where- This demarks the beginning of the database filtering, all the following filters are run in sequence, and implicitly joined
-
[?t :block/name ?tag] -
The first filter:
- Find all database entries that:
- use the
:block/nameattribute (and therefore are pages), and - have a value matching
?tag(currently= “programming”)
- use the
- Since
?tdoesn’t yet exist, create it and assign it all the matching block IDs- it’s now a list with just one entry, the (numeric) ID of the page with the name “programming”
- Find all database entries that:
-
[?p :block/tags ?t] -
Second filter:
- Find all database entries that:
- use the
:block/tagsattribute, and - have a value matching any of the ID
?t(vector of IDs created by the previous filter, containing only the “programming” page ID)
- use the
- Since
?pdoesn’t yet exist, create it and assign it all the matching IDs- it’s now a vector of IDs of blocks that include the “programming” tag
?tdoes exist, but this filter narrows the results, so the variable gets updated- it’s now a smaller vector of IDs (that appear as
:block/tagsvalues AND blocks that have the ‘programming’ tag
- it’s now a smaller vector of IDs (that appear as
- Find all database entries that:
-
[?p :block/name ?name] -
Third filter:
- Find all database entries that:
- use the
:block/nameattribute, and - have an ID matching any of the IDs in
?p(vector of IDs created by the previous filter)
- use the
?pgets updated to only include rows with a:block/nameattribute- Which means that they are pages
?namedoesn’t yet exist, so it gets populated with the a vector of values (the 3rd element of the database vectors), which are- all of the
namesof pages (blocks that have a name, from filter 3), out of - all of the blocks which have “programming” tags (filter 2 + filter 1)
- all of the
- Find all database entries that:
-
:find ?name -
Return the
?namevariable, which is a list of page names. -
:view (fn [result] [:div.flex.flex-col (for [page result] [:a {:href (str "#/page/" page)} (clojure.string/capitalize page)])])} -
This is a anonymous Clojure function, which:
- Creates an HTML div tag with flex classes
- loops over the values in
result(which is the?namevector)- makes an HTML link using each string in the results.