Advanced Query Question: Find page with 2 specific blocks

Hi all

I’m wondering if it is possible to split the properties of a page across several blocks and use an advanced query to find specific pages.

Example Page Structure:

Product 123

- ## Summary
  type:: Product
  Name:: Product 123
  Material-Category:: Metal
  Color:: Gray
	- ### Description
		- some Text
	- ### Details
		- ### Material
		  Material:: Aluminium
		- ### Dimensions
		  Height:: 0.5cm
		  Width:: 0.8cm

Is it possible to write an advanced query to find all pages where Material=Aluminium AND type=Product?

I’m able to find each block separately, but I’m unable to “link” them together (I think they must have the same block/page value).

Any help is appreciated!

Welcome. Try something like this:

#+BEGIN_QUERY
{
 :query [:find (pull ?page [*])
 :where
   [?details :block/properties ?details-props]
   [(get ?details-props :material) ?material]
   [(= ?material "Aluminium")]

   [?details :block/page ?page]
   [?summary :block/page ?page]

   [?summary :block/properties ?summary-props]
   [(get ?summary-props :type) ?type]
   [(= ?type "Product")]
 ]
}
#+END_QUERY

Hi mentaloid

Thank you for this hint! I’ll try this tomorrow.

Hi mentaloid

Your query works like a charm, thanks!

I’m trying to do something similar to this, but really struggling to comprehend the query syntax. This is probably the wrong place, but it’s a good example, so a couple of questions:

  1. Where does ?details come from? and ‘?summary’?
    • Are they somehow generated from the page content itself? If so, how does this happen? What other things become available?
    • If they are built-in things, then where do I find a list of these? They’re not in the advanced queries doc
  2. How do the where clauses combine? Is it something like:
    • each one applied sequentially to the results of the find clause, as a filter, or
    • as a function that may retrieve extra data and store it in a temporary variable (any vector that ends in ?something where that something is not already defined?)
      • (that variable might be returned later with :keys)
  3. Are your 3 groupings of vectors semantic, or just for easier reading?
  4. Your first 3 where lines and your last three seem basically the same (two filters on block properties), but you’re using one before the :block/page filter in the middle group, and one after it. How does this work? Does this mean that it’s NOT sequential?

More than a couple. Should definitely read much more and practice a lot.

  • Nobody grasped the syntax just by guessing.
  • The advanced queries doc is merely a starting point.
  • Should experiment with simpler queries in tiny test-graphs.

I’m going to answer your individual points in an imprecise but simplified way:

  1. All parts that look like ?name are arbitrarily-named variables.
    • They don’t relate to the page content.
      • They apply to a database maintained by Logseq separately from the markdown files.
    • They are not built-in, so there cannot be a list of them.
      • The names are chosen by the users, the computer doesn’t care.
    • Same-named variables represent:
      • the same single value at any given time
      • but potentially multiple values during the same execution of the same query
    • They are not defined, they are placeholders.
  2. The :where clauses:
    • combine:
      • with an implicit top-level AND
      • sequentially from top to bottom
    • apply before the :find clause
      • They apply to the entries of the database.
        • The first clause filters the whole database, the second clause filters the results of the first clause and so on.
      • Technically speaking, all :find, :keys and :where are at the same level.
        • However, they apply in this order: :where, then :find, then :keys
  3. Easier reading.
  4. It is sequential.
    • The same results can be achieved in different line-order (and also in multiple ways).
    • Different orders (and/or ways) can differ in their performance.
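
Since you mention being a Python coder, here is a rough Python sketch of points 2 and 4. It is not how Logseq or its database actually evaluates queries; the entries in `DB`, the IDs, and the helpers `match`, `prop_equals` and `run` are all invented for illustration. Each clause is modelled as a filter over rows of variable bindings, and the same clauses in a different order produce the same pages.

```python
# Toy database of (entity, attribute, value) entries. All IDs are made up.
DB = [
    (1, "block/name", "product 123"),
    (2, "block/page", 1),
    (2, "block/properties", {"type": "Product", "material-category": "Metal"}),
    (3, "block/page", 1),
    (3, "block/properties", {"material": "Aluminium"}),
    (4, "block/name", "product 456"),
    (5, "block/page", 4),
    (5, "block/properties", {"type": "Product"}),
    (6, "block/page", 4),
    (6, "block/properties", {"material": "Steel"}),
]

def match(attr, e_var, v_var):
    """Data clause [?e attr ?v]: extend each row with every matching entry."""
    def step(rows):
        out = []
        for row in rows:
            for e, a, v in DB:
                if a != attr:
                    continue
                # A variable that is already bound must agree with the entry.
                if row.get(e_var, e) != e or row.get(v_var, v) != v:
                    continue
                out.append({**row, e_var: e, v_var: v})
        return out
    return step

def prop_equals(props_var, key, wanted):
    """Stands in for [(get ?props key) ?x] followed by [(= ?x wanted)]."""
    def step(rows):
        return [row for row in rows if row[props_var].get(key) == wanted]
    return step

def run(clauses):
    rows = [{}]               # one empty row: nothing is bound yet
    for clause in clauses:    # implicit AND, applied from top to bottom
        rows = clause(rows)
    return {row["?page"] for row in rows}   # the :find step

original = [
    match("block/properties", "?details", "?details-props"),
    prop_equals("?details-props", "material", "Aluminium"),
    match("block/page", "?details", "?page"),
    match("block/page", "?summary", "?page"),
    match("block/properties", "?summary", "?summary-props"),
    prop_equals("?summary-props", "type", "Product"),
]
# Same clauses, but starting from ?summary instead of ?details.
reordered = [original[i] for i in (4, 5, 3, 2, 0, 1)]

print(run(original), run(reordered))
```

Both runs return the same set of page IDs. Only the amount of intermediate work differs, which is where the performance difference comes from.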

Here is a rough but thorough description of what goes on at a middle level:

  • Begin with :where
    • Gather (the IDs of) all blocks which have properties.
      • This is equivalent to filtering-out all blocks without properties.
      • Name those blocks ?details and their properties ?details-props
        • At this point, variable ?details holds a relatively big number of IDs.
    • Then keep only those ?details-blocks that have property material::
      • And name the value ?material
      • At this point, variable ?details holds fewer IDs than in the beginning.
    • Then keep only if that property’s value is “Aluminium”.
      • This clause cannot be placed earlier than where the variable ?material appears.
      • At this point, variable ?details holds even fewer IDs than in the beginning.
    • Then gather all (the IDs of) the pages that contain the blocks with ?details-IDs.
      • Name those pages ?page
      • At this point, variable ?page holds relatively many IDs.
        • But not more than the number of ?details-blocks.
    • Then keep only the ?page-pages that have blocks.
      • They all have, because they already contain ?details-blocks.
      • We still need to do this step, in order to consider all other blocks in ?page-pages.
        • Name those blocks ?summary
          • At this point:
            • variable ?summary holds a relatively big number of IDs.
            • ?details-blocks is a subset of ?summary-blocks
              • because although they both belong to the same ?page-pages:
                • ?details-blocks got further filtered by property
                • while ?summary-blocks are not filtered yet
      • Alternatively, it is possible to:
        • begin with ?summary-blocks and deal with ?details-blocks later
          • The relative performance of that approach depends on the actual graph.
        • move the two [?... :block/page ?page] clauses to the bottom
          • That would be less performant.
        • even swap the two [?... :block/page ?page] clauses with each-other
          • That would be worse, though fully valid.
    • Then from ?page-pages keep only the ones that contain blocks with properties.
      • They all do, because they already contain ?details-blocks.
      • We still need to do this step, in order to consider all other blocks with properties.
        • Name those properties ?summary-props
          • At this point, ?details-props is a subset of ?summary-props
    • Then keep only those that have property type::
      • And name the value ?type
      • At this point:
        • variable ?summary holds fewer IDs than earlier
        • thus variable ?page holds fewer IDs than earlier
        • thus variable ?details also shrinks (if it were to be used again)
    • Then keep only if that property’s value is “Product”.
      • This clause cannot be placed earlier than where the variable ?type appears.
      • At this point:
        • variable ?summary holds even fewer IDs (if it were to be used again)
        • thus variable ?page holds much fewer IDs
        • thus variable ?details shrinks further (if it was to be used again)
  • Continue with :find
    • Return all the fields of the remaining ?page-pages.
    • If no pages remain, return nothing.
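
The walkthrough above can be sketched in Python as one set of IDs per variable, shrinking step by step. This is a deliberate simplification (a real engine tracks combinations of values, not independent sets), and the toy graph in `page_of` and `props` is invented for illustration.

```python
# Hypothetical toy graph (IDs and values are invented for illustration).
page_of = {10: 1, 11: 1, 12: 1, 20: 2, 21: 2, 30: 3}   # block ID -> page ID
props = {                                              # block ID -> properties
    10: {"type": "Product"},
    11: {"material": "Aluminium"},
    20: {"type": "Product"},
    21: {"material": "Steel"},
    30: {"material": "Aluminium"},
}                                                      # block 12 has none

trace = []  # (clause, variable, number of IDs held after the clause)

# [?details :block/properties ?details-props]
details = set(props)
trace.append(("has properties", "?details", len(details)))

# [(get ?details-props :material) ?material]
details = {b for b in details if "material" in props[b]}
trace.append(("has material::", "?details", len(details)))

# [(= ?material "Aluminium")]
details = {b for b in details if props[b]["material"] == "Aluminium"}
trace.append(("material is Aluminium", "?details", len(details)))

# [?details :block/page ?page]
pages = {page_of[b] for b in details}
trace.append(("pages of ?details", "?page", len(pages)))

# [?summary :block/page ?page]
summary = {b for b, p in page_of.items() if p in pages}
details_is_subset = details <= summary   # ?summary is not filtered yet
trace.append(("blocks of ?page", "?summary", len(summary)))

# [?summary :block/properties ?summary-props]
summary = {b for b in summary if b in props}
trace.append(("has properties", "?summary", len(summary)))

# [(get ?summary-props :type) ?type]
summary = {b for b in summary if "type" in props[b]}
trace.append(("has type::", "?summary", len(summary)))

# [(= ?type "Product")]
summary = {b for b in summary if props[b]["type"] == "Product"}
trace.append(("type is Product", "?summary", len(summary)))

# Shrinking ?summary also shrinks ?page, and through it ?details.
pages = {p for p in pages if any(page_of[b] == p for b in summary)}
details = {b for b in details if page_of[b] in pages}
trace.append(("after all clauses", "?page", len(pages)))

for clause, variable, count in trace:
    print(f"{clause:<24}{variable:<10}{count}")
```

In this toy graph, page 3 drops out at the end because it has an Aluminium block but no block with type:: Product.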

Thanks @mentaloid!

I have been doing other reading (e.g. the Datomic docs, learndatalogtoday, and I found some of those resources in the wiki you linked, but hadn’t found the wiki itself). I’ve been reading Clojure tutorials too (I’m primarily a Python coder, and only know Lisp-family stuff out of general interest in coding languages). I have been trying to experiment a lot, but I’ve been struggling with the lack of clear feedback on errors. I see you posted about keeping the console open in another post, so I’ll try that.

Your extended reply is EXTREMELY helpful, and very clear, so thank you very much.

This is a game-changer for me. This makes it make a lot more sense.

A clarification: I think this means that advanced queries always start with blocks (this is implicit, not obvious from the docs), and in order to get pages, you have to use the :block/page key(?), and then the (pull ?page [*]) is used as a secondary database query that retrieves all the other attributes of a page and puts them in the results? Is that right?

OK, this [?summary :block/page ?page] line is a bit of a head-scratcher. Let me get this right:

  • there is an implicit ?ALL_BLOCKS variable that contains the list of all blocks in the database
    • (consider this pseudo-code thinking if I don’t have the technical details right)
  • ?details contains ?ALL_BLOCKS filtered to only include blocks that have the material:: Aluminium property.
  • ?page contains all the pages that the blocks in ?details are owned by (not sure the right terminology here)
  • because the [?summary :block/page ?page] doesn’t reference ?details or some other variable first, ?summary is implicitly a new filter against ?ALL_BLOCKS,
    • which then also gets run through a block property filter on the next few lines.

This is wild/impressive to me. It sounds somehow recursive and difficult to implement (I probably don’t need to know the details yet). Very cool.


This worked-example explanation format is extremely useful. I’m wondering if it might be valuable to add such an example to the docs directly. If I wrote one up, referencing this thread, would you be up for reviewing it?

Also, I think the advanced queries docs page could do with a paragraph or two of plain-english intro to add some of this context, perhaps I could have a go at writing that too, if my understanding of the above is correct.

  • Logseq treats pages as special types of blocks.
    • For example, blocks can also have properties.
    • It is useful to query (pull ?block [*])
  • As for the database and its queries, they don’t even know what a block or page is.
    • They only know IDs.
      • And they don’t start with IDs, they start with entries.
  • Logseq fills the database in such a way that:
    • blocks are IDs that participate in the left part of entries [left :block/page right]
    • pages are IDs that participate in the right part of entries [left :block/page right]
      • Though a page may be empty.
        • So it primarily participates in the left part of a single entry [left :block/name right]
  • As far as the database is concerned, any ID can participate in any part.
    • i.e. clause [left :block/page right] simply filters all entries with middle part :block/page
    • It is Logseq’s conventions that prevent some IDs from participating in the “wrong” part.
      • It also ensures that each block belongs to a single page.
    • So your ?ALL_BLOCKS should be understood as ?ALL_LEFT_PARTS.
      • It is Logseq’s conventions when applied to clause [left :block/page right] that turn:
        • its left part from ?ALL_LEFT_PARTS to ?ALL_BLOCKS
        • its right part from ?ALL_RIGHT_PARTS to ?ALL_PAGES.
      • Whenever such a clause is used, every time any of the two parts shrinks, the other part also shrinks accordingly, so:
        • parts represent sets
        • variables identify sets among clauses
          • Whenever we name a variable, we tell the query to consider a different set of arbitrary values.
            • Humans care about the meaning of each name.
              • It indicates the intended content of the respective set.
            • Queries only care about the uniqueness of each name.
              • In order for same-named variables to shrink the same set.
            • Clauses are how we instruct queries to remove the undesired values from each set.
  • This is welcome, but premature.
    • Give it one more month to grow your understanding before attempting something like that.
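
To make the entry-based picture above more concrete, here is a rough Python sketch. It is only a mental model, not Logseq’s actual storage; the entries, the IDs and the helper `clause` are invented for illustration. A clause such as [left :block/page right] simply keeps the entries whose middle part is :block/page, and it is only by convention that the left parts are blocks and the right parts are pages.

```python
# Hypothetical entries in the form (left, attribute, right); all IDs are invented.
entries = [
    (1, "block/name", "product 123"),
    (2, "block/name", "empty page"),
    (3, "block/name", "product 789"),
    (10, "block/page", 1),
    (11, "block/page", 1),
    (30, "block/page", 3),
    (11, "block/properties", {"material": "Aluminium"}),
]

def clause(attribute):
    """Keep only the entries whose middle part is `attribute`."""
    return [(left, right) for left, attr, right in entries if attr == attribute]

pairs = clause("block/page")
all_blocks = {left for left, _ in pairs}    # left parts: blocks
all_pages = {right for _, right in pairs}   # right parts: pages that have blocks

# An empty page never appears on the right of :block/page,
# only on the left of :block/name.
named = {left for left, _ in clause("block/name")}
empty_pages = named - all_pages

# Shrinking the left part shrinks the right part accordingly:
# keep only the blocks that also have properties.
with_props = {left for left, _ in clause("block/properties")}
kept_pairs = [(left, right) for left, right in pairs if left in with_props]
kept_pages = {right for _, right in kept_pairs}

print(all_blocks, all_pages, empty_pages, kept_pages)
```

Here page 3 disappears from `kept_pages` as soon as its only block is filtered out, even though no clause mentioned pages directly.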

Excellent, thanks again! That’s all very clear.

Agreed. I had a go anyway, and it was really useful for consolidating my understanding. Maybe I will sit on it for a couple of weeks and see if I find any mistakes, then I’ll post it.

Posted here, FYI: How advanced queries work