Knowledge Management for Tags / Tag Hierarchies

brsma · August 17, 2022, 3:09pm

Yes and no. For some of the use cases I read here it totally seems to make sense (as far as I can judge). From my own perspective not so much, though. Here’s why:

In contrast to qualified links, page properties stored in the header are neither contextual nor specific. I can neither create nor discover them while writing. Moreover, if more than one relationship is modelled via page properties, I cannot know or indicate which of them is meaningful in my note’s context. You need to model, manage, and keep all relationships in the specific page header rather than having them spelled out where they actually matter. This obstructs one of Logseq’s core features, namely connections between notes emerging within context.

I think that we have two significantly different – and moreover: conceptually incompatible – mental models at play here in how we approach working with knowledge:

Relational Database (RDB): distinct (semi-)structured objects of more or less similar types with specific key/value fields (modeled as tables) storing data and modelling relationships
Graph: unstructured objects of varying types with relationships expressed as edges between graph nodes. (Nodes can have key/value properties, ideally storable anywhere in the node)

Several of the contributors here seem to approach Logseq from an RDB perspective and not from a note graph perspective. Fair enough. But if you approach a graph-based tool with an RDB perspective you won’t get far (and vice versa). You can observe similar tendencies in the Obsidian neighbourhood, where people are using DataView to tweak the system into a classical database (with all relevant data stored as a key/value table in the header rather than in the note body). To me that’s like trying to swim faster by putting on the best running shoes because they work so well on land.

TL;DR: If you need an RDB, use an RDB and not a graph. The only tool I know that does both fairly well is Notion. And it does so by encapsulating the RDB blocks and keeping RDB records and graph nodes separate from each other. That way you can interact with both using an appropriate interface etc. without shoehorning one concept badly into the other.

PS: I totally acknowledge that there’s a warranted need for combining both models in one ‘Integrated Thinking Environment’, and for being able to use the fitting one depending on which kind of content you deal with. I just strongly object to turning any decidedly graph-based tool into a mishmash that tries to do both at once – and consequently does neither well, because the fundamental models inevitably clash. That said: how about following Notion in that respect and implementing an RDB model on a block basis (perhaps using CSV instead of MD for storage)?

alex0 · August 17, 2022, 4:58pm

So if I understand it correctly you want to enrich references with a property. But the syntax you mentioned would create a page for a single statement.

Instead I would extend my approach to blocks:

- property:: <block-id>
  A statement.

for example:

- because:: 1234-5678-1234-5678
  A statement.

The block with ID 1234-5678-1234-5678 contains another statement that, in this example, is the reasoning that justifies the first statement.

What do you think?

Yes, Logseq derives a graph from references but I don’t see why Logseq should prefer this model over the RDB one. I think Logseq just provides very general and abstract tools and each user tries to come up with a data structure that works for them.

But I think both the graph UI/UX and the RDB UI/UX (such as queries) are not yet mature in Logseq. I hope Logseq will develop both, along with the tools to integrate them.

Isn’t this already the case? Queries work on a block basis by default: the results of queries are blocks and their properties can be used in queries to filter or in the visualization (columns of a table at the moment).

brsma · August 17, 2022, 5:05pm

Not necessarily (see Introduction to Semantic MediaWiki - semantic-mediawiki.org – where I borrowed the syntax): the first part before the :: separator would actually create a page or block property with the second part as the property’s value.

brsma · August 17, 2022, 5:16pm

The significant difference lies in the storage format of the data (and consequently how you can edit/mangle/query/… it). In an RDB you have a fixed set of fields in a fixed tabular structure. When you add a field to the DB schema or delete one from it, this impacts all records in the table. And you can only store content that adheres to the RDB table schema (a massive benefit if your data is structured). The only common aspect is that in both cases you can store and query key/value pairs per record.
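To make the storage difference concrete, here is a minimal sketch using Python's built-in sqlite3 module next to a plain dictionary standing in for a graph. The second book record and the `year` field are hypothetical illustration data, and the dictionary is only a stand-in, not how Logseq's database actually stores nodes.

```python
import sqlite3

# Relational model: one schema shared by every record in the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
conn.execute(
    "INSERT INTO books VALUES (?, ?)",
    ("The Picture of Dorian Gray", "Oscar Wilde"),
)
# Hypothetical second record, only here to show the schema-wide effect.
conn.execute("INSERT INTO books VALUES (?, ?)", ("Example Title", "Example Author"))

# Adding a field changes the schema, so every existing row gains the column.
conn.execute("ALTER TABLE books ADD COLUMN year INTEGER")
rows = conn.execute(
    "SELECT title, author, year FROM books ORDER BY title"
).fetchall()

# Graph-style model: each node carries its own properties,
# and relationships are stored per record as edges.
nodes = {
    "The Picture of Dorian Gray": {"type": "book"},
    "Oscar Wilde": {"type": "author"},
}
edges = [("The Picture of Dorian Gray", "author", "Oscar Wilde")]

# Adding a property touches exactly one node; the others are unaffected.
nodes["The Picture of Dorian Gray"]["year"] = 1890

print(rows)
print(nodes)
```

After the `ALTER TABLE`, both rows report a `year` column (empty until filled), whereas only the one node that was edited has a `year` property.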

brsma · August 17, 2022, 5:25pm

Which is most probably why the database engine on which Logseq is built is a graph database: It needs to be highly malleable

The first answer here (sql - Comparison of Relational Databases and Graph Databases - Stack Overflow) explains it better than I could quickly:

There actually is conceptual reasoning behind both styles. Wikipedia on the relational model and graph databases gives good overviews of this.

The primary difference is that in a graph database, the relationships are stored at the individual record level, while in a relational database, the structure is defined at a higher level (the table definitions).

This has important ramifications:

A relational database is much faster when operating on huge numbers of records. In a graph database, each record has to be examined individually during a query in order to determine the structure of the data, while this is known ahead of time in a relational database.

Relational databases use less storage space, because they don’t have to store all of those relationships.

Storing all of the relationships at the individual-record level only makes sense if there is going to be a lot of variation in the relationships; otherwise you are just duplicating the same things over and over. This means that graph databases are well-suited to irregular, complex structures. But in the real world, most databases require regular, relatively simple structures. This is why relational databases predominate.

(note the highlight :))

alex0 · August 17, 2022, 5:26pm

Oh OK, so from the linked PDF:

The page is about the book “The Picture of Dorian Gray” and somewhere in the page the author “Oscar Wilde” is mentioned.

So I suppose the syntax for “Oscar Wilde” would be something like:

[[author::Oscar Wilde]]

That would be treated by Logseq as a link like [[Oscar Wilde]].

But in this example author:: clearly refers to the page The Picture of Dorian Gray. I think this wouldn’t work in Logseq, which is block-based.

I think a block like this in Logseq would make sense:

- [[title::The Picture of Dorian Gray]] is a [[type::book]]
  written by [[author::Oscar Wilde]].

Rendered, with links, as:

The Picture of Dorian Gray is a book written by Oscar Wilde.

But at the same time that block gains the properties:

type:: book
title:: The Picture of Dorian Gray
author:: Oscar Wilde

This is brilliant! Now I really want this implemented!
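A rough sketch of how such a block could be processed, in Python. The `[[property::value]]` syntax is only the proposal from this thread, not something Logseq supports, and the function name `parse_inline_properties` is made up for illustration.

```python
import re

# Hypothetical inline syntax proposed in this thread: [[property::value]].
# Plain links such as [[Oscar Wilde]] contain no "::" and are left alone.
INLINE_PROPERTY = re.compile(r"\[\[([^\[\]:]+)::\s*([^\[\]]+?)\s*\]\]")


def parse_inline_properties(block):
    """Return (text with plain links, properties found in the block)."""
    properties = {}

    def to_plain_link(match):
        key = match.group(1).strip()
        value = match.group(2)
        properties[key] = value
        # Keep the value as an ordinary wikilink in the rendered text.
        return f"[[{value}]]"

    text = INLINE_PROPERTY.sub(to_plain_link, block)
    return text, properties


block = (
    "[[title::The Picture of Dorian Gray]] is a [[type::book]] "
    "written by [[author::Oscar Wilde]]."
)
text, properties = parse_inline_properties(block)
print(text)
print(properties)
```

The sketch keeps only the last value when a property appears twice in one block; a real implementation would have to decide how to handle that case.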

brsma · August 17, 2022, 5:52pm

A bit better, isn’t it?

Still, while the format above works well for SMW’s page-centric model it does not yet feel completely fit for Logseq’s block model to me. But I have a hunch that with another round of deeper hands-on thinking we could actually come up with a model & syntax that’s truly block-based.

Basically we would somehow need to explicitly declare the left-hand side of the relation, as well. Your solution of using title:: for that purpose might come close. On first sight it’s utterly elegant from my perspective

But there’s a catch (sigh…): does [[title:: The Picture of Dorian Gray]] then create a title:: property for the block (I assume not, because it would break the current file storage model) or for the page [[The Picture of Dorian Gray]]. How do we know then that the other properties declared in that block are either properties of the page [[The Picture of Dorian Gray]] or of the block? This is tricky.

alex0 · August 17, 2022, 9:26pm

They are all properties of a block; that’s just how we add data to the database in Logseq. If a block is the first one of a page, its properties become the properties of the page (basically a page is a block with a name). Here is an example of a page with properties and a block used to add an entry to the database:

This is the title of the page.md
- title:: This is the title of the page
  another-page-property:: ...
- type:: book
  title:: The Picture of Dorian Gray
  author:: Oscar Wilde

So you can just use blocks with all the properties you want, even title::
If you want to use a page to define a DB entry you can do so: but if you use the special property title:: you must be OK with that being the title of the page (and the associated file name).
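The "first block holds the page properties" rule can be sketched in a few lines of Python. This is a simplification made for illustration: it only looks at top-level bullets, ignores nesting, and `parse_blocks` is a made-up helper, not Logseq's parser.

```python
import re

# A property line looks like "key:: value".
PROPERTY_LINE = re.compile(r"^([A-Za-z0-9_-]+)::\s*(.*)$")


def parse_blocks(markdown):
    """Split top-level '- ' bullets into blocks and collect their properties.

    Simplification: properties of nested child blocks would be attributed
    to the enclosing top-level block.
    """
    blocks = []
    for line in markdown.splitlines():
        if line.startswith("- "):
            blocks.append({})  # a new top-level block starts here
            line = line[2:]
        elif not blocks:
            continue  # skip anything before the first bullet
        match = PROPERTY_LINE.match(line.strip())
        if match:
            blocks[-1][match.group(1)] = match.group(2)
    return blocks


page = """\
- title:: This is the title of the page
  another-page-property:: ...
- type:: book
  title:: The Picture of Dorian Gray
  author:: Oscar Wilde
"""

blocks = parse_blocks(page)
page_properties = blocks[0]  # the first block's properties belong to the page
entries = blocks[1:]         # every other block is an ordinary entry
print(page_properties)
print(entries)
```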

At the moment if I write this block:

- [[The Picture of Dorian Gray]] is a [[book]] written by [[Oscar Wilde]]

This block links together three pages without specifying any kind of relation; they are all just mentions.

If I want to store the above info with the RDB model I would write this block instead:

- type:: book
  title:: The Picture of Dorian Gray
  author:: Oscar Wilde

These two methods are redundant; the former is better for human reading and fits into a longer text; the latter is an RDB entry.

I’m saying that with the proposed syntax it would be possible to specify a DB entry with a sentence, so just one block that fits in a longer text and also encodes the relations:

- [[title::The Picture of Dorian Gray]] is a [[type::book]] written by [[author::Oscar Wilde]].

So basically “Oscar Wilde” is the author of a block, this block is defined as a “book” and again this block has a property, “title”, that is “The Picture of Dorian Gray”.

I hope this makes clear how we use Logseq with an RDB model.

Anyway thank you very much for making me aware of this!

alex0 · August 17, 2022, 9:53pm

P.S.

It would be really cool if an algorithm (maybe some Machine Learning) could automatically suggest references and properties.

For example if I write:

- The Picture of Dorian Gray is a book written by Oscar Wilde

it would ask me if I want to turn it into this:

- [[title::The Picture of Dorian Gray]] is a [[type::book]] written by [[author::Oscar Wilde]]

So that with a query I can for example display a table with all the blocks with type::book and see the above as an entry:

Title	Author
The Picture of Dorian Gray	Oscar Wilde
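As a sketch of what such a query amounts to once the block properties are known, here is a small Python stand-in. It is not Logseq's query engine; `query` and `to_table` are hypothetical helpers, and the second block is made-up data that the filter is expected to skip.

```python
# Block properties as they might look after parsing.
blocks = [
    {"type": "book", "title": "The Picture of Dorian Gray", "author": "Oscar Wilde"},
    {"type": "author", "name": "Oscar Wilde"},  # hypothetical non-book block
]


def query(blocks, **wanted):
    """Keep the blocks whose properties match every requested key/value pair."""
    return [
        block
        for block in blocks
        if all(block.get(key) == value for key, value in wanted.items())
    ]


def to_table(rows, columns):
    """Render rows as tab-separated text with a capitalized header line."""
    header = "\t".join(column.capitalize() for column in columns)
    lines = ["\t".join(row.get(column, "") for column in columns) for row in rows]
    return "\n".join([header, *lines])


table = to_table(query(blocks, type="book"), ["title", "author"])
print(table)
```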

Even the opposite would be cool: if I have some kind of structured data (MySQL DB, CSV, JSON etc) I could programmatically generate Markdown files, for example:

The Picture of Dorian Gray.md
- [[title::The Picture of Dorian Gray]] is a [[type::book]] written by [[author::Oscar Wilde]].

and

Oscar Wilde.md
- type:: author
- ## Books
  - Oscar Wilde wrote the following books:
    {{ query (and (property type [[book]]) (property author [[Oscar Wilde]])) }}
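Generating such files from structured data could look roughly like this in Python, using the standard csv module. The CSV column names and the sentence template are assumptions made for the example, the author is tagged inline as `[[author::…]]` so that a query on `author` could match, and the sketch only builds the file contents in memory instead of writing to disk.

```python
import csv
import io

# Hypothetical input; in practice this could come from a CSV export.
CSV_DATA = """title,type,author
The Picture of Dorian Gray,book,Oscar Wilde
"""


def record_to_block(record):
    """Turn one record into a sentence using the proposed inline syntax."""
    return (
        f"- [[title::{record['title']}]] is a [[type::{record['type']}]] "
        f"written by [[author::{record['author']}]]."
    )


def generate_pages(csv_text):
    """Map each record to a Markdown file name and its content."""
    pages = {}
    for record in csv.DictReader(io.StringIO(csv_text)):
        pages[f"{record['title']}.md"] = record_to_block(record) + "\n"
    return pages


pages = generate_pages(CSV_DATA)
for name, content in pages.items():
    print(name)
    print(content)
```

Titles containing characters that are not valid in file names would need escaping, which this sketch does not attempt.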

brsma · August 17, 2022, 10:43pm

jq (jqlang.github.io) can do this easily. See GitHub - 0dB/dayone-json-to-obsidian: Update Obsidian vault from Day One (“DayOne”) JSON using command line scripts. for an example of how to do it. I recently converted the JSON export of my pinboard bookmarks to markdown files that way. Having ca. 23k bookmarks from ca. 15 years of bookmarking with pinboard lying around as individual MD files turned out to be a somewhat dead-end approach, though.

alex0 · August 18, 2022, 11:14am

@brsma can I write a proposal for [[property::reference]] syntax or do you have any other doubts?

GaiusScotius · August 18, 2022, 11:40pm

Before any proposals are written, may I make a couple of points?

First and foremost, giving a link a type using the syntax [[property::reference]] will break interoperability between Logseq and other Markdown editors that interpret [[ … ]] as a Wikilink. Logseq uses (fairly standard) markdown files as persistent storage backing a more transient graph database. Moreover, Logseq continuously monitors the markdown files so that if you edit one using an external editor it updates immediately, not on the next restart. Clearly Logseq’s designers thought interoperability sufficiently important to forego some of the benefits of just using a database. If we’re not going to stick with standard markdown, Logseq might as well just use a database.

Second, is the suggestion that the syntax [[property::reference]] constrains the reference to be to a note that is somehow defined to be an appropriate target? In a relational database this can be implemented using referential integrity, i.e. a constraint requiring that a foreign key in one table be an extant primary key in some other table, but this requires a type system and, at least as yet, Logseq is entirely untyped. There are, for example, no constraints on the values you give to purportedly the same property in different notes; it is entirely legitimate, albeit nonsensical, to define author:: [[Oscar Wilde]] (a link) in one note; author:: Oscar Wilde (a string) in a second; and author:: 42 (a string that looks like a number) in a third.

Logseq might well benefit from a type system, but I suggest that typing links and typing properties are closely related, and any proposal should consider both. That said, I don’t see adding a type system as inordinately difficult. In a functional language like Clojure types are ultimately defined as functions; fundamentally all that is needed is a mechanism for calling Clojure code when certain events occur on a block.
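The "types as functions" idea can be sketched with plain predicates. The example is in Python rather than Clojure, and the schema, the predicate names and `check_block` are all hypothetical; Logseq has no such mechanism today.

```python
import re


def is_link(value):
    """True if the value is exactly one wikilink such as [[Oscar Wilde]]."""
    return re.fullmatch(r"\[\[[^\[\]]+\]\]", value) is not None


def is_integer(value):
    """True if the value is a string that looks like an integer."""
    return re.fullmatch(r"-?\d+", value) is not None


# Hypothetical schema: each property name maps to the predicate it must satisfy.
SCHEMA = {"author": is_link, "year": is_integer}


def check_block(properties, schema=SCHEMA):
    """Return the names of properties whose values fail their predicate."""
    return [
        key
        for key, value in properties.items()
        if key in schema and not schema[key](value)
    ]


# The three notes from the example above.
notes = [
    {"author": "[[Oscar Wilde]]"},  # a link
    {"author": "Oscar Wilde"},      # a string
    {"author": "42"},               # a string that looks like a number
]
violations = [check_block(note) for note in notes]
print(violations)
```

Such a check could run whenever a block is saved, which is the kind of event hook described above.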

alex0 · August 19, 2022, 12:11am

That’s a good point, and my reply is: don’t use this syntax if you want to keep interoperability, just as we do with the tons of custom syntax that Logseq adds to Markdown.

That doesn’t seem to be the case at all to me. If I open my Logseq graph with Obsidian the pages are a mess of custom syntax. Basically only wikilinks are in common with Obsidian. By your reasoning we shouldn’t use queries or block references, for example. Sorry, but Logseq is not built for interoperability at all; you need to export your pages to standard Markdown manually.

But I agree that if a user adopts the proposed syntax then even compatibility with Obsidian’s wikilinks is lost, and I don’t have a solution other than the following, if it’s possible to implement it without ambiguities and UI/UX issues:

- title:: [[The Picture of Dorian Gray]] is a class:: [[book]] by author:: [[Oscar Wilde]].

But the above properties:: must take as value only the [[wikilink]] that follows and ignore the comma (,) that usually separates multiple values. This can be confusing to the user.

The idea is that [[property::value]] is exactly the same as property:: [[value]] but inline. There wouldn’t be a way to write the equivalent of property:: value and I don’t see why it would be needed since Logseq interprets [[value]] and value as the same, as you explained when mentioning types.

And about types, I don’t see what would be the advantage and it would make everything more complicated for the user. For me Logseq does a good job in understanding what the user means (42 is an integer etc).

Yours is an interesting reply anyway, thanks.

GaiusScotius · August 19, 2022, 9:24am

I suspect we use Logseq in different ways, I like using Typora as an editor. Whilst it is odd to see every block as a bullet point, it is at least correctly rendered and links work. Altering the link syntax will break that, which may be perfectly acceptable, and as you say one doesn’t have to use it.

Regarding Logseq’s syntax for queries, you’re right. In my mind it should use the fairly widely accepted triple backtick+language syntax, which is interpreted as a code block for inline execution (see, for example, R Markdown). The advantage is that it’s backward compatible with the plain triple backtick syntax for rendering a code block; editors that can’t launch an interpreter just render the code. Given that queries are written as a data structure defining the operations to be undertaken that is passed to a function, it wouldn’t be too difficult to convert {{query (….) }} to (query (…)), i.e. a Clojure form, within a triple backtick code block. I suspect that’s what’s going on under the hood in any case.

I know this is slightly off topic, but giving Logseq the capability to execute arbitrary Clojure code blocks on the occurrence of various internal events would be enormously useful. It is, however, a matter for another thread I feel.

alex0 · August 19, 2022, 10:11am

Just out of curiosity, how would you escape the execution and just render the code?

Since we are having fun discussing syntax, here is a possible one:

- ```
  some code in a supported language
  ```
  code-exec:: true

GaiusScotius · August 19, 2022, 7:06pm

@alex0 Apologies for the delay in replying. The approach I’m suggesting is drawn from R Markdown and more fully described here.

The original Markdown spec used indentation with at least four spaces or a tab to delineate a block that should be typeset in a fixed width, typewriter style font. This was common for code snippets, so this syntax was designated as a “code block”. As an alternative to indentation, code could be “fenced” using ``` or, in some (most?) interpreters, ~~~.

That was fine, but it meant that all programming languages were shown in the same manner. Programmers, on the other hand, are used to syntax highlighting editors and IDEs, which pick out and emphasise syntactic features of the language such as keywords. Thus the ``` notation was further extended with a parameter list, bounded by braces { … }, further instructing the markdown interpreter how to display the fenced code.

The first element in the parameter list is a language designator. Thus ```{clojure} means pretty print/syntax highlight the remainder of the block as Clojure code. There is now a long list of supported languages!

If the parameter list has only one element – the language name – the { } can be dropped, hence ```clojure and ```{clojure} are syntactically identical.

If the parameter list has more than one element, the remaining elements are passed to whatever formatter the interpreter calls. Typically these arguments control the HTML output (see, for example, Python Markdown). I don’t think there is any standardisation of this.
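The rules above for the text that follows the opening fence can be sketched as a small parser. This is an illustrative simplification, not the behaviour of any particular Markdown interpreter: `parse_info_string` is a made-up name, and parameters whose values contain spaces or commas are not handled.

```python
def parse_info_string(info):
    """Split the text after an opening fence into (language, extra parameters).

    Braces are optional, so "clojure" and "{clojure}" give the same result.
    Commas and whitespace both separate the elements of the parameter list.
    """
    info = info.strip()
    if info.startswith("{") and info.endswith("}"):
        info = info[1:-1]
    parts = info.replace(",", " ").split()
    if not parts:
        return None, []  # a bare fence with no language designator
    return parts[0], parts[1:]


print(parse_info_string("clojure"))
print(parse_info_string("{clojure}"))
print(parse_info_string("{r, echo=FALSE}"))
```

Whatever comes back as extra parameters would simply be handed on to the formatter (or, in the R Markdown case, to the code that executes the chunk).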

R Markdown was designed as a tool for creating high quality, data rich documents and reports. It treats an otherwise standard markdown file (with a YAML header block) as a sequence of “chunks”, interpreting standard markdown in the normal way, but executing rather than formatting ``` fenced code blocks (at least those in R and some other languages – Python and bash for certain). If you want to show the code rather than execute it, you simply use indentation or (I think, because it’s been a while) ~~~ fencing.

R Markdown’s process is two stage. First, fenced (R) code blocks are executed in the context of the interpreter, meaning that they have access to data that has already been loaded and to any YAML data. This means that an initial code chunk can, for example, load data from a file, database or website, later chunks process it and still later chunks display the results of that processing; I’ve done this for some reports that involved Monte-Carlo modelling.

The output (if any) of each code chunk must be valid markdown; typically, as raw HTML is itself valid markdown, the output is an HTML div. These, plus any raw markdown in the file, are then passed to a process called knitr, which “knits” them into a unified markdown file. This process is typically controlled by values passed through the code block’s parameter list, so you frequently see knitr directives that control the captioning, size and positioning of the div created when a code block is executed.

Finally, the second stage markdown/HTML is passed through pandoc to render it as HTML pages, PDF, LaTeX, Word or whatever.

Now, R Markdown (and an extension called Bookdown, which is designed for long form technical documentation, writing theses etc.) have very different objectives from Logseq; they are publishing, not knowledge management, tools. Nevertheless, the ability to dynamically execute code within a markdown document (a block in Logseq) is hugely powerful. Given that Logseq is coded in Clojure and that R and Clojure share an ability to dynamically load and execute code chunks (using the eval function, but I think there may be better ways involving pre-compilation to JVM byte-code and dynamic loading) taking a leaf from R Markdown’s book is something that I feel is well worth investigating.

What would particularly interest me is if such code could have the ability to walk the page/block graph, extract text and properties for processing, set property values, create links etc. (perhaps calling Logseq’s own functions to do so). Other PKM applications, notably Tinderbox, can do this, but Tinderbox uses its own language (Action Script) and a “semi-proprietary” file format (everything is stored as XML) so it’s pretty much a walled garden. Clojure, on the other hand, opens the door to just about anything written in Java.

Sorry for the long explanation, I hope it was useful.

alex0 · August 19, 2022, 11:14pm

Thank you for the information, it was interesting to me!

I still don’t understand if you are suggesting Logseq should run Clojure code in ``` ``` or something else involving other languages.

I just understand that you want to keep it compatible with other editors, so it should be “standard” Markdown.

If you are suggesting Logseq should run code, when would it happen? When exiting edit mode and rendering the block, maybe? Or should it execute all the code blocks in a page when the page is opened? Or manually with buttons?

And how would you display the output? Maybe display it as text in a child (or parent) block? Or do you want the output to replace the text of the block once rendered (as it happens now with queries)?

For example:

```bash
var="This is a [[reference]]"
echo $var
```

Would it render like a block containing this text?

This is a [[reference]]

And do you want the language to be Clojure? To be honest I prefer simpler languages like Python for something like this. I understand that supporting a language other than Clojure would be hard.

GaiusScotius · August 20, 2022, 5:42pm

The gist of my suggestion is that:

- text fenced by ~~~ or indented should be displayed as code in the conventional fixed width font;
- text fenced by ```lang or ```{lang code-block-parameters} should be executed prior to rendering and its result (if any) rendered in place of the block’s text;
- if lang is {clojure …}, the code should be executed in the context of Logseq so that it has access to Logseq’s then current runtime data (pages and blocks) and can perform CRUD operations on it – effectively the code extends Logseq’s built in functionality;
- if lang is {anything-other-than-clojure …}, it is invoked with any code-block-parameters in its environment and the code block (if any) passed to it via stdin, the result of the code block’s execution being read by Logseq from the invoked process’ stdout and stderr. This approach allows not only scripts to be interpreted, but other programs to be run with the contents of the code block (if any) passed as input data.

Only after execution would Logseq then interpret the block’s Markdown/ HTML and render the block on screen.
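For the non-Clojure case, the proposed mechanism (parameters in the environment, code on stdin, result read from stdout and stderr) could be sketched like this in Python. The `INTERPRETERS` table, the `run_code_block` name and the `BOOKS` parameter are assumptions made for the example; nothing like this exists in Logseq, and running arbitrary code from notes has obvious security implications that the sketch ignores.

```python
import os
import subprocess
import sys

# Hypothetical mapping from language designator to the command to invoke.
# "python -" tells the interpreter to read the program from stdin.
INTERPRETERS = {"python": [sys.executable, "-"]}


def run_code_block(lang, code, params=None):
    """Run a code block in an external process and return (stdout, stderr)."""
    command = INTERPRETERS[lang]
    env = dict(os.environ)
    env.update(params or {})  # code-block parameters go into the environment
    result = subprocess.run(
        command,
        input=code,           # the code block is passed via stdin
        capture_output=True,  # the result is read from stdout and stderr
        text=True,
        env=env,
        timeout=10,
    )
    return result.stdout, result.stderr


code = 'import os\nprint("You have read", os.environ["BOOKS"], "books")'
stdout, stderr = run_code_block("python", code, {"BOOKS": "3"})
print(stdout)
```

The captured stdout would then be treated as Markdown/HTML and rendered in place of the block's text.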

Given that Logseq cannot know what data the code block is touching, I think it would be difficult to cache the results of execution, so I suspect that Logseq will have to run the code every time the block containing it is rendered. Your suggestion of decorating the code block with block parameters that can control execution is very sensible, although I would suggest that they would be things like whether the results are cacheable, width and height constraints for rendering etc. There remains much to be thought of here.

alex0 · August 20, 2022, 10:36pm

If Logseq won’t implement this it could be implemented with an external tool:

- You have read <output-here> books this year.
  - <code>

The external tool would parse the .md file for <code>, exec it and replace the <output-here> in the parent block that would be then rendered by Logseq. Since the code is in a child block it can be collapsed.
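A minimal sketch of such an external tool, in Python. It only handles the text substitution; the actual execution is delegated to an `execute` callable supplied by the caller. The inline-backtick child block, the `count-books` command and the number it returns are all made up for the example.

```python
PLACEHOLDER = "<output-here>"


def fill_placeholders(markdown, execute):
    """Replace the placeholder in a parent block with the output of its child.

    The child is assumed to be the next line, indented deeper than the parent,
    and to hold the code as a bullet (optionally wrapped in backticks).
    """
    lines = markdown.splitlines()
    for i in range(len(lines) - 1):
        parent, child = lines[i], lines[i + 1]
        if PLACEHOLDER not in parent:
            continue
        parent_indent = len(parent) - len(parent.lstrip())
        child_indent = len(child) - len(child.lstrip())
        child_text = child.strip()
        if child_indent > parent_indent and child_text.startswith("- "):
            code = child_text[2:].strip("`")
            lines[i] = parent.replace(PLACEHOLDER, execute(code))
    return "\n".join(lines)


def fake_execute(code):
    # Stand-in executor: a real tool would run the code.
    # This one only knows a single hypothetical command.
    return {"count-books": "12"}[code]


page = "- You have read <output-here> books this year.\n  - `count-books`"
result = fill_placeholders(page, fake_execute)
print(result)
```

One design issue the sketch leaves open: once the placeholder has been replaced in the file, a later run has nothing left to substitute, so a real tool would need to keep the template somewhere, for example by writing its output to a copy of the page.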

This is worth discussing in a dedicated thread, in General Discussion maybe.

alex0 · August 23, 2022, 10:02am

Please continue the discussion about inline properties here:

CC @brsma