Limitations on combining queries? ChatGPT assisted

mlanza · February 2, 2024, 12:39am

I wanted to combine two advanced queries, both of which returned results on their own. The intent was for the joined query to combine the results of both. I wasn’t sure how, given my limited understanding of Datalog, so I asked ChatGPT.

It had little trouble combining the queries. I continued to prompt it to limit nesting and other things, but in the end, although all the queries it generated looked correct, none of them worked in Logseq. Each returned nothing, so the query block simply dropped out of the view.

Query #1

{
      :title [:h4 "☀️ Today"]
      :query [:find (pull ?block [*])
              :in $ ?day
              :where
              [?block :block/marker ?marker]
              (or
                [?block :block/scheduled ?d]
                [?block :block/deadline ?d])
              [(contains? #{"TODO","DOING","WAITING"} ?marker)]
              [(<= ?d ?day)]]
      :result-transform
        (fn [result]
          (sort-by (fn [h] [(get h :block/priority "Z") (get h :block/created-at)]) result))
      :inputs [:today]
      :collapsed? false
    }

Query #2

{
      :title [:h4 "🪣 Next"]
      :query [:find (pull ?block [*])
              :where
              [?block :block/marker ?marker]
              [(contains? #{"DOING","WAITING"} ?marker)]]
      :result-transform
        (fn [result]
          (sort-by (fn [h] [(get h :block/priority "Z") (get h :block/created-at)]) result))
      :collapsed? false
    }

ChatGPT Attempts

{
  :title [:h4 "Merged Query"]
  :query [:find (pull ?block [*])
          :in $ ?day
          :where
          (or
            (and 
              [?block :block/marker ?marker]
              (or
                [?block :block/scheduled ?d]
                [?block :block/deadline ?d])
              [(contains? #{"TODO","DOING","WAITING"} ?marker)]
              [(<= ?d ?day)])
            (and
              [?block :block/marker ?marker]
              [(contains? #{"DOING","WAITING"} ?marker)]))]
  :result-transform
    (fn [result]
      (sort-by (fn [h] [(get h :block/priority "Z") (get h :block/created-at)]) result))
  :inputs [:today]
  :collapsed? false
}

{
  :title [:h4 "Merged Query"]
  :query [:find (pull ?block [*])
          :in $ ?day
          :where
          (or-join
            [?block :block/marker ?marker]
            [(contains? #{"TODO","DOING","WAITING"} ?marker)]
            [(<= ?d ?day)]
            (or
              [?block :block/scheduled ?d]
              [?block :block/deadline ?d])
            [(contains? #{"DOING","WAITING"} ?marker)])]
  :result-transform
    (fn [result]
      (sort-by (fn [h] [(get h :block/priority "Z") (get h :block/created-at)]) result))
  :inputs [:today]
  :collapsed? false
}

Why would I not be able to simply wrap the two queries in parentheses, as below? I’m not sure why ChatGPT’s many valid-looking attempts all seem to fail, as do mine. I don’t understand why nesting a couple of levels deeper would be any more difficult in Datalog than in other languages like JavaScript.

{
  :title [:h4 "Merged Query"]
  :query [:find (pull ?block [*])
          :in $ ?day
          :where
          (or
            (and ;Query 1 -- comment
              [?block :block/marker ?marker]
              (or
                [?block :block/scheduled ?d]
                [?block :block/deadline ?d])
              [(contains? #{"TODO","DOING","WAITING"} ?marker)]
              [(<= ?d ?day)])
            (and ;Query 2 -- comment
              [?block :block/marker ?marker]
              [(contains? #{"DOING","WAITING"} ?marker)]))
         ]
  :result-transform
    (fn [result]
      (sort-by (fn [h] [(get h :block/priority "Z") (get h :block/created-at)]) result))
  :inputs [:today]
  :collapsed? false
}

mentaloid · February 2, 2024, 1:53am

Your “limited understanding of Datalog” is more than ChatGPT’s zero understanding.
- Whether it is aware of the syntactical rules or not, it doesn’t understand them.
  - Can ask it for confirmation.
- When it happens to follow the rules, it is by chance.
  - When the chance happens to be high, it remains a mere chance.
    - If you keep trying for a long enough time, you may even get something working.
      - Just because it works, it doesn’t mean that it works correctly.
        - Although this case is a relatively simple one.
The generated queries are syntactically wrong.
- The error messages mention the reason.
  - You could try feeding the error messages back into the prompt.
- One attempt got very close to the correct query.
  - So it looks correct, but looks don’t matter during execution.
There are no relevant limitations in either Logseq or Datalog, other than the syntax itself.
- Different languages, different rules.
  - Even when most of them look like English, that doesn’t make them similar.
  - Check the syntax here.
    - In particular what or-join is for and what its syntax is.
      - Or ask ChatGPT.
        - There is a chance that it gives a nice explanation while being unable to understand it itself.
- Logseq and Datalog differ from ChatGPT in that:
  - they are actually fully aware of the relevant rules
  - they follow them consistently
  - they have hard-coded both:
    - the exact rules
    - the exact way to follow them

Siferiax · February 3, 2024, 11:10am

A better approach may be to examine what you want to ask and first write that down in plain English and work from there. Ignore the queries you have and focus on the new result you are trying to establish.

In your case, reading the queries, what you are essentially asking is:

Give me all my tasks that are either DOING or WAITING.
AND give me those tasks that are TODO and that have a schedule or deadline on or before today.

We can break that down for an OR construction. Valid for both is the line:

[?block :block/marker ?marker]

Then for the first ask the necessary line is:

[(contains? #{"DOING","WAITING"} ?marker)]

For the second it is:

[(contains? #{"TODO"} ?marker)]
(or
  [?block :block/scheduled ?d]
  [?block :block/deadline ?d]
)
[(<= ?d ?day)]

These are the components we need to combine.
First.

Whenever we use an OR, it is important to understand that Datalog will try to bind all variables (the names starting with ?) with the rest of the query.
- To accomplish this all parts of the OR should use the same variables.
- In our case this isn’t true. We only use ?d in that last part and nowhere else in the query.
So we will need to use the or-join construction to let datalog know which variables are relevant for the rest of the query.
- In this case that is all the variables except ?d.
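The rule described above (every part of a plain OR should use the same variables) can be checked by hand against the failed "Merged Query". The following is a minimal Python sketch, not Logseq or DataScript code: the clauses are transcribed as nested lists, and `collect_vars` is a hypothetical helper that gathers every symbol starting with `?`.

```python
def collect_vars(clause):
    """Recursively collect symbols that start with '?' from a nested clause."""
    if isinstance(clause, str):
        return {clause} if clause.startswith("?") else set()
    found = set()
    for part in clause:
        found |= collect_vars(part)
    return found

# The two branches of the failed "Merged Query", transcribed as nested lists.
branch_today = ["and",
    ["?block", ":block/marker", "?marker"],
    ["or", ["?block", ":block/scheduled", "?d"],
           ["?block", ":block/deadline", "?d"]],
    [["contains?", ["TODO", "DOING", "WAITING"], "?marker"]],
    [["<=", "?d", "?day"]],
]
branch_next = ["and",
    ["?block", ":block/marker", "?marker"],
    [["contains?", ["DOING", "WAITING"], "?marker"]],
]

vars_today = collect_vars(branch_today)
vars_next = collect_vars(branch_next)

# A plain OR expects both sets to match; here they do not.
same_vars = vars_today == vars_next
extra = sorted(vars_today - vars_next)
print(same_vars, extra)
```

In this model the first branch uses variables that the second never mentions, which is the mismatch the post points to as the reason for reaching for or-join.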

Second.

We need multiple lines within our or-join to be true at the same time. So we will need an extra AND statement for that.

So the correct solution for the where is:

[?block :block/marker ?marker]
(or-join [?block ?marker ?day]
  [(contains? #{"DOING","WAITING"} ?marker)]
  (and
    [(contains? #{"TODO"} ?marker)]
    (or
      [?block :block/scheduled ?d]
      [?block :block/deadline ?d]
    )
    [(<= ?d ?day)]
  )
)

mlanza · February 4, 2024, 1:07pm

Thank you for the big attempt to help, but even your suggested alternative did not work. Basically, the above two queries are to be united into what is effectively a SQL union. So if I have on my screen a list of As and a list of Bs, then I can be 100% certain that the combined list will have all the As and Bs, none of which get repeated. And if that list won’t even appear, it doesn’t work.

Alas, it’s no longer a concern as I have decided to work around the dilemma and found a good alternative which I am now using. I am simply troubled that unions are so challenging. Unions, it seems, are the most basic thing and I cannot understand why they are not as simple as:

([[As]] union [[Bs]])

Or to put it another way, why, in Datalog, can I not easily just jam the two queries together with something like a “union” so that the original queries are fully intact? Why must they be separated into parts and then carefully woven together? This is trivial in SQL.

Siferiax · February 5, 2024, 11:05am

Why not? What result did you get and what was the expected result?

Why would you want to? It creates unnecessary bloat in your code.

Whether in Datalog or SQL, the question of what you want to have returned should always be clear.
Going back to my post, what about my plain English statement was incorrect? What question are you trying to ask of your data?

Since you’re versed in SQL, let me pose this.
Why would you write

select *
from tasks
where marker in ('TODO', 'DOING', 'WAITING')
AND (scheduled <= getdate() or deadline <= getdate())
UNION
select *
from tasks
where marker in ('DOING', 'WAITING')

Instead of

select *
from tasks
where marker in ('DOING', 'WAITING')
OR (marker = 'TODO'
  AND (scheduled <= getdate() or deadline <= getdate())
  )

The result is effectively the same.
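The equivalence of the two SQL statements above can be checked on a small data set. This is a minimal Python sketch under stated assumptions: the task list, the field names and the fixed `TODAY` date are all made up for illustration and do not reflect Logseq's actual schema.

```python
from datetime import date

# Hypothetical sample data; field names mirror the SQL above, not Logseq's schema.
TODAY = date(2024, 2, 5)
tasks = [
    {"id": 1, "marker": "TODO", "scheduled": date(2024, 2, 1), "deadline": None},
    {"id": 2, "marker": "TODO", "scheduled": None, "deadline": date(2024, 3, 1)},
    {"id": 3, "marker": "DOING", "scheduled": None, "deadline": None},
    {"id": 4, "marker": "WAITING", "scheduled": date(2024, 2, 4), "deadline": None},
    {"id": 5, "marker": "DONE", "scheduled": date(2024, 1, 1), "deadline": None},
    {"id": 6, "marker": "TODO", "scheduled": None, "deadline": None},
]

def is_due(task):
    """True when the task is scheduled or has a deadline on or before TODAY."""
    return any(d is not None and d <= TODAY
               for d in (task["scheduled"], task["deadline"]))

def today_query(task):
    # First SELECT of the UNION version.
    return task["marker"] in {"TODO", "DOING", "WAITING"} and is_due(task)

def next_query(task):
    # Second SELECT of the UNION version.
    return task["marker"] in {"DOING", "WAITING"}

def merged_query(task):
    # The single SELECT with OR.
    return task["marker"] in {"DOING", "WAITING"} or (
        task["marker"] == "TODO" and is_due(task))

# A set union removes duplicates, like SQL UNION.
union_ids = ({t["id"] for t in tasks if today_query(t)}
             | {t["id"] for t in tasks if next_query(t)})
merged_ids = {t["id"] for t in tasks if merged_query(t)}
print(sorted(union_ids), sorted(merged_ids))
```

Task 4 matches both original queries but appears only once in either result, which is the no-repeats behaviour a union is expected to have.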

To put it visually, your today query, your next query and my merged suggestion.

mentaloid · February 5, 2024, 12:46pm

@mlanza’s point is that the language (and the engine that implements it) should be able to perform this kind of merging by itself (which is not technically difficult), allowing the user to just throw some queries together and get the union of their results. Of course Datalog has more serious shortcomings than this one.

jonor · February 10, 2024, 12:41pm

Could you elaborate on what these shortcomings are, and perhaps in what directions one could look for alternative solutions in such cases?

mentaloid · February 10, 2024, 1:27pm

Datalog’s shortcomings include the following:
- it doesn’t support nested expressions
  - with some exceptions
- it has very limited vocabulary and general expressiveness
  - e.g. it doesn’t have the concept of “all”
- it has very few datatypes
- it is not Turing-complete
  - i.e. some things are simply impossible
Alternative solutions:
- some things are still possible, just not in a sane way
  - they need complex tricks
    - e.g. inverse logic, double negations, comparisons against infinity, etc.
- other things need a different language
  - clojurescript, javascript etc.
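The double-negation trick mentioned in the list can be illustrated outside Datalog. A statement like "all tasks on a page are DONE" can be rephrased as "no task on the page is not DONE", which needs only "exists" and "not". The following is a minimal Python sketch of that rephrasing, with hypothetical page names and markers; it is not a Logseq query.

```python
# Hypothetical data: page name -> markers of the tasks on that page.
pages = {
    "project-a": ["DONE", "DONE"],
    "project-b": ["DONE", "TODO"],
    "project-c": ["DOING"],
}

def all_done_direct(markers):
    # Uses the "all" concept directly.
    return all(m == "DONE" for m in markers)

def all_done_double_negation(markers):
    # Same answer using only "exists" and "not":
    # no task exists whose marker is not DONE.
    return not any(m != "DONE" for m in markers)

finished_direct = sorted(p for p, ms in pages.items() if all_done_direct(ms))
finished_negated = sorted(p for p, ms in pages.items() if all_done_double_negation(ms))
print(finished_direct, finished_negated)
```

Both functions select the same pages; the second form is the shape that can be built from negation alone when a language has no "all".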