Knowledge Management for Tags / Tag Hierarchies

jq can do this easily :slight_smile: See GitHub - 0dB/dayone-json-to-obsidian: Update Obsidian vault from Day One (“DayOne”) JSON using command line scripts for an example of how to do it. I recently converted the JSON export of my pinboard bookmarks to Markdown files that way. Having ca. 23k bookmarks from ca. 15 years of bookmarking with pinboard lying around as individual MD files turned out to be a somewhat dead-end approach, though :grimacing:
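For anyone wanting to avoid jq, the same conversion can be sketched in a few lines of Python. This is a rough sketch only: the field names (`href`, `description`, `tags`) follow pinboard's JSON export format, and the output file layout is just one possibility.

```python
import json
from pathlib import Path

# Convert a pinboard JSON export into one Markdown file per bookmark.
# Field names (href, description, tags) follow pinboard's export format.
def export_bookmarks(json_path, out_dir):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    bookmarks = json.loads(Path(json_path).read_text())
    for i, b in enumerate(bookmarks):
        tags = " ".join(f"#{t}" for t in b.get("tags", "").split() if t)
        body = (f"# {b.get('description', 'Untitled')}\n\n"
                f"{b['href']}\n\n{tags}\n")
        (out / f"bookmark-{i:05d}.md").write_text(body)
```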

1 Like

@brsma can I write a proposal for the [[property::reference]] syntax, or do you have any other doubts?

Before any proposals are written, may I make a couple of points?

First and foremost, giving a link a type using the syntax [[property::reference]] will break interoperability between Logseq and other Markdown editors that interpret [[ … ]] as a Wikilink. Logseq uses (fairly standard) markdown files as persistent storage backing a more transient graph database. Moreover, Logseq continuously monitors the markdown files so that if you edit one using an external editor it updates immediately, not on the next restart. Clearly Logseq’s designers thought interoperability sufficiently important to forego some of the benefits of just using a database. If we’re not going to stick with standard markdown, Logseq might as well just use a database.

Second, is the suggestion that the syntax [[property::reference]] constrains the reference to be to a note that is somehow defined to be an appropriate target? In a relational database this can be implemented using referential integrity, i.e. a constraint requiring that a foreign key in one table be an extant primary key in some other table, but this requires a type system and, at least as yet, Logseq is entirely untyped. There are, for example, no constraints on the values you give to purportedly the same property in different notes; it is entirely legitimate, albeit nonsensical, to define author:: [[Oscar Wilde]] (a link) in one note, author:: Oscar Wilde (a string) in a second, and author:: 42 (a string that looks like a number) in a third.
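To make the point concrete, here is a hypothetical lint pass over property values, showing how the three author:: variants above would be classified differently. None of this exists in Logseq; the classification rules are my own assumptions.

```python
import re

# Classify the raw value of a property:: as a link, a number or a string.
# Hypothetical sketch: Logseq itself applies no such typing.
def classify(value):
    if re.fullmatch(r"\[\[.+\]\]", value.strip()):
        return "link"
    if value.strip().isdigit():
        return "number"
    return "string"

def inconsistent(values):
    # True if the same property was given values of different kinds.
    return len({classify(v) for v in values}) > 1
```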

Logseq might well benefit from a type system, but I suggest that typing links and properties are closely related and any proposal should consider both. That said, I don’t see adding a type system as inordinately difficult. In a functional language like Clojure types are ultimately defined as functions; fundamentally all that is needed is a mechanism for calling Clojure code when certain events occur on a block.

1 Like

That’s a good point and my reply is don’t use this syntax if you want to keep interoperability, just as we do with tons of custom syntax that Logseq adds to Markdown.

That doesn’t seem to be the case at all to me. If I open my Logseq graph with Obsidian, the pages are a mess of custom syntax; basically only wikilinks are in common with Obsidian. By your reasoning we shouldn’t use queries or block references, for example. Sorry, but Logseq is not built for interoperability at all; you need to export your pages to standard Markdown manually.

But I agree that if a user adopts the proposed syntax then even compatibility with Obsidian’s wikilinks is lost, and I don’t have a solution other than the following, if it’s possible to implement it without ambiguities and UI/UX issues:

- title:: [[The Picture of Dorian Gray]] is a class:: [[book]] by author:: [[Oscar Wilde]].

But each of the above properties:: must take as its value only the [[wikilink]] that follows it, and ignore the , that usually separates multiple values. This can be confusing to the user.

The idea is that [[property::value]] is exactly the same as property:: [[value]] but inline. There wouldn’t be a way to write the equivalent of property:: value and I don’t see why it would be needed since Logseq interprets [[value]] and value as the same, as you explained when mentioning types.
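Assuming this equivalence, here is a sketch of how an inline [[property::value]] could be normalised into a plain wikilink plus a block-level property line. The regex and function name are purely illustrative, not a proposed implementation.

```python
import re

# Sketch of the proposed equivalence: rewrite an inline [[property::value]]
# into a plain [[value]] wikilink plus a block-level "property:: [[value]]".
def extract_inline_properties(block_text):
    props = []
    def repl(m):
        prop, value = m.group(1), m.group(2)
        props.append(f"{prop}:: [[{value}]]")
        return f"[[{value}]]"
    text = re.sub(r"\[\[([\w-]+)::(.+?)\]\]", repl, block_text)
    return text, props
```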

And about types, I don’t see what the advantage would be, and it would make everything more complicated for the user. For me Logseq does a good job of understanding what the user means (42 is an integer, etc.).

Yours is an interesting reply anyway, thanks.

I suspect we use Logseq in different ways; I like using Typora as an editor. Whilst it is odd to see every block as a bullet point, it is at least correctly rendered and links work. Altering the link syntax will break that, which may be perfectly acceptable, and as you say one doesn’t have to use it.

Regarding Logseq’s syntax for queries, you’re right. In my mind it should use the fairly widely accepted triple backtick+language syntax, which is interpreted as a code block for inline execution (see, for example, R Markdown). The advantage is that it’s backwards compatible with the plain triple backtick syntax for rendering a code block; editors that can’t launch an interpreter just render the code. Given that queries are written as a data structure defining the operations to be undertaken that is passed to a function, it wouldn’t be too difficult to convert {{query (….) }} to (query (…)) — i.e. a Clojure form — within a triple backtick code block. I suspect that’s what’s going on under the hood in any case.
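A rough sketch of that conversion, assuming query bodies don't themselves nest double braces (real query syntax is more complicated than this regex allows for):

```python
import re

# Rewrite Logseq's {{query (...)}} macro into a fenced Clojure code block.
# Sketch only: query bodies containing "}}" would defeat this regex.
def query_to_code_block(text):
    def repl(m):
        return "```clojure\n(query " + m.group(1).strip() + ")\n```"
    return re.sub(r"\{\{query\s+(.*?)\}\}", repl, text, flags=re.S)
```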

I know this is slightly off topic, but giving Logseq the capability to execute arbitrary Clojure code blocks on the occurrence of various internal events would be enormously useful. It is, however, a matter for another thread I feel.

Just out of curiosity, how would you escape the execution and just render the code?

Since we are having fun discussing syntax, here is a possible one:

- ```
  some code in a supported language
  code-exec:: true
  ```

@alex0 Apologies for the delay in replying. The approach I’m suggesting is drawn from R Markdown and more fully described here.

The original Markdown spec used indentation with at least four spaces or a tab to delineate a block that should be typeset in a fixed width, typewriter style font. This was common for code snippets, so the construct was designated a “code block”. As an alternative to indentation, code could be “fenced” using ``` or, in some (most?) interpreters, ~~~.

That was fine, but it meant that all programming languages were shown in the same manner. Programmers, on the other hand, are used to syntax highlighting editors and IDEs, which pick out and emphasise syntactic features of the language, such as keywords. Thus the ``` notation was extended with a parameter list, bounded by braces { … }, further instructing the markdown interpreter how to display the fenced code.

The first element in the parameter list is a language designator. Thus ```{clojure} means pretty print/ syntax highlight the remainder of the block as Clojure code. There is now a long list of supported languages!

If the parameter list has only one element – the language name – the { } can be dropped, hence ```clojure and ```{clojure} are syntactically identical.

If the parameter list has more than one element, the remaining elements are passed to whatever formatter the interpreter calls. Typically these arguments control the HTML output (see, for example, Python Markdown). I don’t think there is any standardisation of this.
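For illustration, the info-string convention described above could be parsed along these lines. This is a sketch; actual interpreters differ in detail (quoting, commas, nested values and so on).

```python
# Parse a fenced-code info string such as "clojure", "{clojure}" or
# "{clojure echo=false cache=true}" into (language, options).
def parse_info_string(info):
    info = info.strip()
    if info.startswith("{") and info.endswith("}"):
        info = info[1:-1]
    parts = info.split()
    if not parts:
        return None, {}
    lang, opts = parts[0], {}
    for p in parts[1:]:
        key, _, val = p.partition("=")
        opts[key] = val if val else True
    return lang, opts
```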

R Markdown was designed as a tool for creating high quality, data rich documents and reports. It treats an otherwise standard markdown file (with a YAML header block) as a sequence of “chunks”, interpreting standard markdown in the normal way but executing, rather than formatting, ``` fenced code blocks (at least those in R and some other languages – Python and bash for certain). If you want to show the code rather than execute it, you simply use indentation or (I think, because it’s been a while) ~~~ fencing.

R Markdown’s process is two stage. First, fenced (R) code blocks are executed in the context of the interpreter, meaning that they have access to data that has already been loaded and to any YAML data. This means that an initial code chunk can, for example, load data from a file, database or website, later chunks process it and still later chunks display the results of that processing; I’ve done this for some reports that involved Monte-Carlo modelling.

The output (if any) of each code chunk must be valid markdown; typically, as raw HTML is itself valid markdown, the output is an HTML div. These, plus any raw markdown in the file, are then passed to a process called knitr, which “knits” them into a unified markdown file. This process is typically controlled by values passed through the code block’s parameter list, so you frequently see knitr directives that control the captioning, size and positioning of the div created when a code block is executed.

Finally, the second stage markdown/ HTML is passed through pandoc to render it as HTML pages, PDF, LaTeX, Word or whatever.

Now, R Markdown (and an extension called Bookdown, which is designed for long form technical documentation, writing theses etc.) have very different objectives from Logseq; they are publishing, not knowledge management, tools. Nevertheless, the ability to dynamically execute code within a markdown document (a block in Logseq) is hugely powerful. Given that Logseq is coded in Clojure and that R and Clojure share an ability to dynamically load and execute code chunks (using the eval function, but I think there may be better ways involving pre-compilation to JVM byte-code and dynamic loading) taking a leaf from R Markdown’s book is something that I feel is well worth investigating.

What would particularly interest me is if such code could have the ability to walk the page/ block graph, extract text and properties for processing, set property values, create links etc. (perhaps calling Logseq’s own functions to do so). Other PKM applications, notably Tinderbox, can do this, but Tinderbox uses its own language (Action Script) and a “semi-proprietary” file format (everything is stored as XML), so it’s pretty much a walled garden. Clojure, on the other hand, opens the door to just about anything written in Java.

Sorry for the long explanation, I hope it was useful.


Thank you for the information, it was interesting to me!

I still don’t understand if you are suggesting Logseq should run Clojure code in ``` ``` or something else involving other languages.

I just understand that you want to keep it compatible with other editors, so it should be “standard” Markdown.

If you are suggesting Logseq should run code, when would it happen? When exiting edit mode and rendering the block maybe? Or exec all the code blocks in page when it’s open? Or manually with buttons?

And how would you display the output? Maybe display it as text in a child (or parent) block? Or do you want the output to replace the text of the block once rendered (as it happens now with queries)?

For example:

var="This is a [[reference]]"
echo $var

Would it render like a block containing this text?

This is a [[reference]]

And do you want the language to be Clojure? To be honest I prefer simpler languages like Python for something like this. I understand that supporting a language other than Clojure would be hard.

The gist of my suggestion is that:

  1. text fenced ~~~ or indented should be displayed as code in the conventional fixed width font;

  2. text fenced by ```lang or ```{lang code-block-parameters} should be executed first and its result (if any) rendered in place of the block’s text;

  3. if lang is {clojure …}, the code should be executed in the context of Logseq so that it has access to Logseq’s then current runtime data (pages and blocks) and can perform CRUD operations on it – effectively the code extends Logseq’s built in functionality;

  4. if lang is {anything-other-than-clojure …}, it is invoked with any code-block-parameters in its environment and the code block (if any) passed to it via stdin, the result of the code block’s execution being read by Logseq from the invoked process’ stdout and stderr. This approach allows not only scripts to be interpreted, but other programs to be run with the contents of the code block (if any) passed as input data.

Only after execution would Logseq then interpret the block’s Markdown/ HTML and render the block on screen.
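Point 4 could be sketched roughly as follows. The interpreter invocation and the convention of passing code-block-parameters as BLOCK_* environment variables are my own assumptions, not part of the proposal.

```python
import os
import subprocess

# Sketch of point 4: run a non-Clojure code block by piping it to an
# interpreter on stdin, with the block parameters in the environment.
# The BLOCK_* naming convention is an assumption for illustration.
def run_code_block(interpreter, code, params=None):
    env = dict(os.environ)
    for key, val in (params or {}).items():
        env["BLOCK_" + key.upper()] = str(val)
    proc = subprocess.run([interpreter], input=code, env=env,
                          capture_output=True, text=True)
    return proc.stdout, proc.stderr
```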

Given that Logseq cannot know what data the code block is touching, I think it would be difficult to cache the results of execution, so I suspect that Logseq will have to run the code every time the block containing it is rendered. Your suggestion of decorating code blocks with block parameters that can control execution is very sensible; although I would suggest that they would be things like whether the results are cacheable, width and height constraints for rendering, etc. There remains much to be thought of here.

If Logseq won’t implement this it could be implemented with an external tool:

- You have read <output-here> books this year.
  - <code>

The external tool would parse the .md file for <code>, exec it and replace the <output-here> in the parent block that would be then rendered by Logseq. Since the code is in a child block it can be collapsed.
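A minimal sketch of such an external tool's core step, keeping the <code> and <output-here> markers from the example. Everything else, including using sh to run the child block's code, is an assumption.

```python
import subprocess

# Hypothetical external tool: given a parent block containing the
# <output-here> marker and the shell code found in its child block,
# execute the code and substitute its output into the parent.
def fill_output(parent_line, child_code):
    result = subprocess.run(["sh", "-c", child_code],
                            capture_output=True, text=True)
    return parent_line.replace("<output-here>", result.stdout.strip())
```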

This is worth discussing in a dedicated thread, in General Discussion maybe.

1 Like

Please continue the discussion about inline properties here:

CC @brsma


Isn’t this graph a good insight for you to review the things that you read and note?
Maybe it shows that you read too many unrelated random things.
Too much noise, maybe?

I don’t think a graph is bad per se. If a graph has an additional structure for navigation, it can be very useful.

The problem I see is with Logseq’s graph display only. This is a display issue: Logseq’s internal data model can already do what is suggested here, but the presentation is just not suitable for browsing any larger number of tags without any structure.

If you look at the Zotero example on top (very first picture in this thread), the graph presentation is pretty but useless. If users need to get rid of information just to make the Logseq graph presentation usable, then something is wrong with the presentation.

A big part of knowledge management is curating the information. Your example of a lot of random things could be handled by putting them in a suitable (poly)hierarchy to make the information findable.

1 Like

I haven’t read the entire thread yet, excuse me if I’m commenting on a trifle or something that has already been commented on, but for the sake of simplicity, in my opinion, and without going into the issue of visualizations or user interfaces, regarding the hierarchy of categories, I think that:

  • Tag links should point to special pages where there is a template that references not only pages with the same tag, but also special category pages, like the page in question, that have been tagged with that category.

That would provide a taxonomic structure of categories and could perhaps be a good starting point.

Yes, exactly. In principle, Logseq’s data model already supports all of this. All we need is to give every tag page a couple properties (related, narrower, broader etc.) and this is all there is to it.

What is missing is a way to traverse the properties and display the results as a browsable tree.
@alex0 wrote a proposal to do so.
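For illustration, once each tag page carries a broader:: property, the traversal itself could be as simple as this sketch (the plain-dict data layout is an assumption, not Logseq's actual model):

```python
# Sketch: given each tag's broader:: property as a dict of
# {tag: broader_tag}, render the tags below a root as an indented tree.
def tag_tree(broader, root, depth=0, lines=None):
    lines = [] if lines is None else lines
    lines.append("  " * depth + root)
    for child in sorted(t for t, b in broader.items() if b == root):
        tag_tree(broader, child, depth + 1, lines)
    return lines
```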

Ultimately some other UI/UX features would be nice, e.g. a way to easily access broader/narrower/related items, a tree view in the sidebar, a way to efficiently manage tags etc.


Great, then. Something that I would like to find, regarding, say, special pages, is something that visually differentiates them from, say, regular pages, in order to minimize possible confusion. However, I really like the concept that in Logseq everything is a block. In that sense, I would not treat special pages in a special way, beyond the fact that they could include a predefined template, but one easily readable and editable by anyone.

That is why I think that solving the issue of dynamic templates is a priority (allowing templates to work as macros do, by transclusion), with templates only making a substitution when it is expressly indicated at the time they are called. This is imperative for developing a modular model of structured knowledge, both for taxonomic structures of blocks, categories or properties, and for defining concepts through semantic queries. Or simply to allow users to edit their templates without having to fix all the pages in which they were used before.

I will take a look at it, but right now that layer worries me less.

Of course, and once the base is solid and flexible, as well as scalable, the possibilities open up and can be very creative and innovative in this regard. Sure. And it is a very motivating field, I imagine. I myself, without going any further, can have many colorful and extravagant ideas about it, being a simple jazz musician, but the important thing here, IMHO, is that these two aspects must be independent layers: the architecture should allow all those possibilities but not be chained to any of them.

1 Like

After writing my last message in this thread, I have been thinking for a while and it has occurred to me that, since the purpose of categories is to determine the classes to which certain entities can belong, in relation to a given ontology, and since classes allude to the fulfilment of certain properties or attributes, the act of categorizing an entity should entail assigning to it the properties presumed of that class.

That is, tagging would, in effect, apply a template that assigns the relevant properties to the entity in question.
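A toy sketch of that idea, with the class templates and property names entirely made up for illustration:

```python
# Hypothetical: tagging a page as [[book]] applies a class template
# that seeds the page's properties with defaults for that class.
CLASS_TEMPLATES = {
    "book": {"author": None, "year": None, "status": "to-read"},
}

def apply_tag(page_properties, tag):
    for key, default in CLASS_TEMPLATES.get(tag, {}).items():
        page_properties.setdefault(key, default)  # keep existing values
    page_properties.setdefault("tags", []).append(tag)
    return page_properties
```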

In this way, the structure is implicit and consistent with the ontology you are working with.

1 Like

@Didac I think the question that needs to be addressed is whether LogSeq is primarily an application for managing linked, unstructured text items that can be optionally annotated with loosely defined structured data (tags, attributes) or primarily a database of structured data, any record within which can be annotated with text. Your suggestion of tags constituting well formed classes is essentially proposing the latter.

The issue of classification ontologies has been discussed at length elsewhere, e.g. Knowledge Management for Tags / Tag Hierarchies - #16 by GaiusScotius. I still subscribe to the position that synonym sets and the relations between them should underpin any classification system that is intended to be both long lived and extensible.

On the issue of associating structured data with concepts (tags), my view is that LogSeq would do well to look at prototypical/ differential inheritance. This is the tack taken by applications like Tinderbox.

The concept is that a note may have a prototype from which it inherits attributes, including tags. It may then add new attributes and/or override inherited values and may itself act as the prototype for further notes.

In this model, tags provide the semantics of notes independent of the structure of any data that a user may wish to associate with notes related to that concept. “Root notes” — call them classes if you wish — define named attributes and default values. These act as prototypes for other notes that “inherit” their attributes and classifications.

Prototypical inheritance — sometimes also called slot based inheritance — is well understood and is seen in several programming languages including JavaScript, Self and Io. In these implementations, however, creating a high level object called, say, Person does not explicitly define what being a Person means. Adding a reference to a separate note that defines the meaning of Person would, I suggest, be a very powerful extension.
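A minimal sketch of prototypical inheritance applied to notes; the Note class and its attributes are illustrative, not anything Logseq or Tinderbox actually provides:

```python
# Slot-based (prototypical) inheritance for notes: a note resolves an
# attribute locally, then walks up its prototype chain.
class Note:
    def __init__(self, prototype=None, **attrs):
        self.prototype = prototype
        self.attrs = attrs

    def get(self, name):
        if name in self.attrs:
            return self.attrs[name]
        if self.prototype is not None:
            return self.prototype.get(name)
        raise KeyError(name)

person = Note(kind="Person", greeting="Hello")
alice = Note(prototype=person, name="Alice")  # inherits kind and greeting
```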


I think the question that needs to be addressed is whether LogSeq is primarily an application for managing linked, unstructured text items that can be optionally annotated with loosely defined structured data (tags, attributes) or primarily a database of structured data, any record within which can be annotated with text.

@GaiusScotius At the moment, I find it difficult to rule out that this dilemma of “to be or not to be structured” is a false dilemma, here.

I understand that, in any case, we are talking about a feature request that would make use of the Logseq Core Hooks and that would be located in one of the Plugin layers.

Because I don’t think it is on the table to change the architecture of Logseq Core or its Outliner Module, which already works with structured data and establishes the relationships between blocks by determining their position in the graph.

So, this point seems to be overcome, from what I understand, since the proposal assumes the infrastructure on which it is based. And both options can coexist, being not mutually exclusive, so I don’t see that resolving this issue is a prerequisite for moving forward.

Do you think that so far we could agree on this in general terms? Or, why do you think we should first opt for one or the other option? Because one seems to be an evolution of the other…

1 Like

@brsma, just wanted to thank you for the clarification. This has been quite revealing to me. I’ll be thinking and reading about it for a while, for sure.