Following some discussion at “Advanced Query Question: Find page with 2 specific blocks”, I found it useful to write down my understanding of the overarching process of advanced queries. This was useful because I’m new to Clojure, and the syntax is pretty alien.
Working step by step through the logic of an example query of medium complexity helped me understand what each piece is doing.
I’m sharing it here in case anyone else finds it useful.
Explainer
- Logseq queries are written in Datalog, a declarative query language, in the dialect used by Datomic. Logseq itself stores its data in DataScript, a database modelled on Datomic.
- Logseq data is stored (in memory?) in the form of a series of identically formatted data chunks called vectors, that look like `[id attribute value]`. In Logseq:
  - `id` is generally a numeric ID of a block
  - `attribute` is any property a block can have, often namespaced keywords, e.g. `:block/tags`
  - `value` can be anything: a string, number, or any other data type
    - For some attributes, such as `:block/tags` or `:block/page`, the value is an `id`: a reference to another id, the properties of which can be looked up separately.
  - Technically there is a 4th value, `transaction-id`, but this is usually ignored for Logseq use-cases (it can be excluded from the vectors). So practically, all Logseq data vectors have 3 values.
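To make the `[id attribute value]` shape concrete, here is a minimal sketch of how a tiny graph might look as vectors (DataScript calls them datoms). The IDs (21, 57, 102) and all the strings are made up for illustration, and `:block/content` is assumed to be the attribute that holds a block's text, which may differ between Logseq versions.

```clojure
;; Hypothetical vectors; every ID and string here is invented for illustration.
[21  :block/name    "programming"]  ; entity 21 is a page named "programming"
[57  :block/name    "clojure"]      ; entity 57 is a page named "clojure"
[57  :block/tags    21]             ; page 57 is tagged with page 21 (the value is an id)
[102 :block/content "Some notes"]   ; entity 102 is an ordinary block (attribute name assumed)
[102 :block/page    57]             ; block 102 lives on page 57 (the value is an id)
```

Note how a single block or page is described by several vectors that share the same id, one per attribute.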
- If you make a call like `[?variable1 :something ?variable2]`, then:
  - Filter the vectors in the database to only include vectors with a `:something` attribute in the second position.
  - If the `?variables` have values already, then they also act as a filter, further limiting the results.
  - Values from all the matching vectors in the database are injected into the `?variables`.
    - This usually means that each `?variable` is now a subset of what it was before that line.
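As a rough sketch of how two such calls combine, the query below finds the blocks that sit on one particular page. The page name "programming" is only an example value, and `(pull ?b [*])` is used in `:find` so that Logseq has whole blocks to display rather than bare IDs.

```clojure
#+BEGIN_QUERY
{:title "Blocks on the programming page"
 :query [:find (pull ?b [*])
         :where
         ;; ?p has no value yet, so this line fills it with the id(s)
         ;; of whatever has the name "programming"
         [?p :block/name "programming"]
         ;; ?p already has a value here, so it acts as a filter;
         ;; ?b is new, so it is filled with the matching block ids
         [?b :block/page ?p]]}
#+END_QUERY
```

The second line only makes sense because `?p` appears in both calls: that shared variable is what joins them.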
- Pages are a special type of block, which appear:
  - in the `id` position when the `:block/name` predicate is used: `[?page :block/name _]`
    - this finds all blocks that have a `:block/name` (i.e. pages; other blocks don’t have names). `_` is like a wildcard that matches anything, and doesn’t insert it into a variable.
  - in the `value` position, when attributes such as `:block/page` are used (also other page references, such as `:block/tags`): `[?block :block/page ?page]`
    - this filters all database vectors that describe blocks that belong to a page, and stores the block IDs in `?block` and the page IDs in `?page`.
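A minimal sketch of the first case, listing every page in the graph. It assumes that anything with a `:block/name` is a page, as described above, and uses `(pull ?page [*])` so that Logseq gets whole page entities to display.

```clojure
#+BEGIN_QUERY
{:title "All pages"
 :query [:find (pull ?page [*])
         :where
         ;; _ matches any name without storing it in a variable
         [?page :block/name _]]}
#+END_QUERY
```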
Worked example
- Let’s take an example from the Logseq docs:

#+BEGIN_QUERY
{:title "All pages have a *programming* tag"
 :query [:find ?name
         :in $ ?tag
         :where
         [?t :block/name ?tag]
         [?p :block/tags ?t]
         [?p :block/name ?name]]
 :inputs ["programming"]
 :view (fn [result]
         [:div.flex.flex-col
          (for [page result]
            [:a {:href (str "#/page/" page)} (clojure.string/capitalize page)])])}
#+END_QUERY

- This query looks up the `names` of `pages` that have the `programming` `tag`, and then formats them as a bunch of links.
- To break it down, there are a few basic chunks you should pay attention to:
  - `:query` is the most important element. Nearly everything else can be removed and it will still work.
  - `:inputs` are the values passed in to the `:in` part of the query.
    - The `$` in `:in` stands for the database itself (covered in the walkthrough further down).
  - `:view` does the formatting.
  - `:title` just sets the title of the results block.
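As a sketch of the claim that `:query` carries most of the weight, here is the same example with `:title` and `:view` removed. `:inputs` has to stay, because `:in` still expects a value for `?tag`. Without a `:view` I would expect Logseq to show the raw result instead of links, though the exact rendering may depend on the version.

```clojure
#+BEGIN_QUERY
{:query [:find ?name
         :in $ ?tag
         :where
         [?t :block/name ?tag]
         [?p :block/tags ?t]
         [?p :block/name ?name]]
 :inputs ["programming"]}
#+END_QUERY
```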
- The `:query` has a few main chunks:
  - `:find` - the values that you want to return
  - `:in` - variables being passed in to the query
  - `:where` - the filters that the database vectors have to match
  - `:keys` - names of the returned values, used in a map (not used in this example)
- The chunks resolve in this order: `:in` → `:where` → `:find` → `:keys`
Working through the query logic
- So, working through the query in logical order, we have:
- `:inputs ["programming"]`
  - Set the inputs vector, which just contains one value, the string “programming”
- `:in $ ?tag`
  - This accepts the inputs from `:inputs`, unpacks them and assigns them to variable(s), in this case `?tag`.
  - `$` is a reference to the database.
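A minimal sketch of how several inputs line up with `:in`: the values in `:inputs` are matched by position to the variables that follow `$`. The second tag, "clojure", and the variable names `?tag1` and `?tag2` are made up for illustration.

```clojure
#+BEGIN_QUERY
{:title "Pages tagged with both programming and clojure"
 :query [:find ?name
         :in $ ?tag1 ?tag2          ; first input -> ?tag1, second input -> ?tag2
         :where
         [?t1 :block/name ?tag1]
         [?t2 :block/name ?tag2]
         [?p :block/tags ?t1]       ; ?p must carry the first tag ...
         [?p :block/tags ?t2]       ; ... and also the second
         [?p :block/name ?name]]
 :inputs ["programming" "clojure"]}
#+END_QUERY
```

As far as I know, `:block/name` stores page names in lower case, so the inputs are written in lower case too.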
- `:where`
  - This marks the beginning of the database filtering. All the following filters are run in sequence, and implicitly joined.
- `[?t :block/name ?tag]`
  - The first filter:
    - Find all database entries that:
      - use the `:block/name` attribute (and therefore are pages), and
      - have a value matching `?tag` (currently = “programming”)
    - Since `?t` doesn’t yet exist, create it and assign it all the matching block IDs
      - it’s now a list with just one entry, the (numeric) ID of the page with the name “programming”
- `[?p :block/tags ?t]`
  - Second filter:
    - Find all database entries that:
      - use the `:block/tags` attribute, and
      - have a value matching any of the IDs in `?t` (the vector of IDs created by the previous filter, containing only the “programming” page ID)
    - Since `?p` doesn’t yet exist, create it and assign it all the matching IDs
      - it’s now a vector of IDs of blocks that include the “programming” tag
    - `?t` does exist, but this filter narrows the results, so the variable gets updated
      - it now keeps only IDs that both belong to a page named “programming” (filter 1) AND appear as `:block/tags` values, which here is at most the one page ID
- `[?p :block/name ?name]`
  - Third filter:
    - Find all database entries that:
      - use the `:block/name` attribute, and
      - have an ID matching any of the IDs in `?p` (the vector of IDs created by the previous filter)
    - `?p` gets updated to only include rows with a `:block/name` attribute
      - which means that they are pages
    - `?name` doesn’t yet exist, so it gets populated with a vector of values (the 3rd element of the database vectors), which are
      - all of the `names` of pages (blocks that have a name, from filter 3), out of
      - all of the blocks which have “programming” tags (filter 2 + filter 1)
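The three filters can be traced by hand on a tiny made-up database. This is only a sketch: the IDs and page names are invented, and the arrows in the comments show what each variable would hold after that line under the mental model used above.

```clojure
;; Hypothetical vectors in the database (IDs and names invented):
;;   [21 :block/name "programming"]
;;   [57 :block/name "clojure"]   [57 :block/tags 21]
;;   [63 :block/name "python"]    [63 :block/tags 21]
;;   [88 :block/name "cooking"]
;;
;; ?tag = "programming" comes in via :inputs
[?t :block/name ?tag]    ; ?t    -> 21
[?p :block/tags ?t]      ; ?p    -> 57, 63
[?p :block/name ?name]   ; ?name -> "clojure", "python"
```

Strictly speaking, the variables are not tracked as separate lists: the query engine keeps them together as rows, here `(21 57 "clojure")` and `(21 63 "python")`, and `:find ?name` then picks the `?name` column out of those rows. Page 88 never shows up because no row connects it to the "programming" page.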
- `:find ?name`
  - Return the `?name` variable, which is a list of page names.
- `:view (fn [result] [:div.flex.flex-col (for [page result] [:a {:href (str "#/page/" page)} (clojure.string/capitalize page)])])`
  - This is an anonymous Clojure function, which:
    - creates an HTML div tag with flex classes
    - loops over the values in `result` (which is the `?name` vector)
      - makes an HTML link using each string in the results.
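As a sketch of how the view can be varied, the function below renders the same result as a bulleted list instead of a flex column. It assumes, as in the docs example, that `result` arrives as a flat sequence of page-name strings and that Logseq renders `:ul` and `:li` elements the same way it renders `:div` and `:a`.

```clojure
;; Drop-in replacement for the :view entry of the query map
:view (fn [result]
        [:ul
         (for [page result]
           ;; one list item per page name, linking to that page
           [:li [:a {:href (str "#/page/" page)} page]])])
```

The nested vectors are plain data describing the markup: the first element names the tag, an optional map holds its attributes, and the rest are its children.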