Knowledge Management for Tags / Tag Hierarchies

How would you build the hierarchy? Typical users will have hundreds to tens of thousands of tags, so we’ll need an efficient way to sort them.

You may need to run a separate program, like Tematres or Vocbench, to manage vocabulary. I have that stuff installed but I don’t have a workflow… so it just sits there. I prefer the Pool Party interface, but it’s not for individuals.

I don’t see why you think this would be more computationally intensive than the rest.

It’s just like queries but iterative to “follow” a property.

Anyway, we shouldn’t really discuss this kind of thing here.

This isn’t a forum for discussion, BUT this might be the real, underlying technical feature request that will ultimately enable the behaviour @gax and I have been wanting from the software.

Something like “Make iterative queries so users can construct their own property-based hierarchies”. There may be other use cases for iterative queries too.
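To make the idea concrete, here is a rough Python sketch of what an "iterative query" could look like. It is not how Logseq works internally; the `pages` dictionary, the `parent` property name and both function names are assumptions made up for illustration. It reads one property from every page once and then follows it level by level.

```python
from collections import defaultdict

def build_hierarchy(pages, prop="parent"):
    """Map each parent to its children by reading one property from every page."""
    children = defaultdict(list)
    for name, props in pages.items():
        parent = props.get(prop)
        if parent is not None:
            children[parent].append(name)
    return children

def descendants(children, root):
    """Follow the property downwards from root, level by level, ignoring cycles."""
    seen, order, frontier = {root}, [], [root]
    while frontier:
        node = frontier.pop(0)
        for child in sorted(children.get(node, [])):
            if child not in seen:
                seen.add(child)
                order.append(child)
                frontier.append(child)
    return order

# Hypothetical pages: each one names its parent through a property.
pages = {
    "animal": {},
    "canine": {"parent": "animal"},
    "dog": {"parent": "canine"},
    "labrador": {"parent": "dog"},
    "cat": {"parent": "animal"},
}

tree = build_hierarchy(pages)
print(descendants(tree, "animal"))
```

In this sketch every page is read once and every edge is followed once, so the cost grows with the number of pages rather than with the depth of the hierarchy.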

So this inappropriately-situated discussion may be background for that real feature request. If @gax agrees, maybe we can ask an admin account holder to move this thread back to general discussion, and write a new, more focused feature request. I think admins can do that here.

Let’s continue the discussion on the other thread: Would a rich commitment to hierarchies and classification be an anathema to Logseq culture? - #10 by gax
I’ll post there shortly.

I realise that it’s been some time since this thread opened, but if the topic of tagging is still being discussed, may I add a comment?

I’ve searched the entire forum, but it appears nobody has thought of using the model employed by WordNet as a structure for defining tags; I would recommend taking a look, not least because it is very well tested.

For those who may not have encountered it, WordNet is in essence a structure for defining natural language concepts – words, phrases – and relationships between them. Think of it as a “metamodel” for language (not just English, any language). We describe everything using language, so the metamodel concepts are generalisable to all fields of endeavour. WordNet is not (indeed cannot be) complete in respect of every discipline – history, medicine, biology, chemistry etc. – but it can be extended into them without having to add to its “language metamodel”.

From the point of view of tagging, the key elements of WordNet that I would focus on are synonyms, parts of speech, hyper- and hyponyms, and mero- / holonyms. There is more, but all tags I have come across relate to either things (the tag is a noun) or actions (the tag is a verb) and these concepts suffice. If there are live examples of parts of speech other than nouns and verbs being used as tags I’d like to hear about them.

WordNet recognises that the same concept may be described by different words, i.e. synonyms. In fact its core construct is the “synset”, the group of words that all describe the same thing. A big part of the problem with tagging as an information discovery tool is that different people at different times use different synonyms for the same concept. When we are looking to find something we are basically asking the question, what do I know about X? X is a concept, like a ship or a fight or a battle, not necessarily any specific term (word). In Logseq a tag is a note, and a note can easily define a synset (i.e. a list of words). It ought to be relatively straightforward to identify the note containing the given “tag word” and link to that note, rather than straight to a note having that word as its title.

Words, of course, may have multiple meanings. English (for example) freely uses nouns as verbs; there is a big conceptual difference between a fight and to fight or a fly and to fly. Synsets have descriptions of the concept they define (there are four for fly as a noun and 14 for fly as a verb!). Tagging to the synset rather than the word resolves the ambiguity, making subsequent queries more precise.
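As a rough illustration of tagging to a synset rather than to a word, here is a small Python sketch. Everything in it is hypothetical: the `Synset` fields, the note titles and the glosses are invented for the example and are not taken from WordNet or from Logseq.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Synset:
    note: str     # title of the note that defines the concept
    pos: str      # part of speech: "noun" or "verb"
    words: tuple  # synonyms that all name this concept
    gloss: str    # short description used to tell senses apart

def build_index(synsets):
    """Index every synonym (case-insensitively) to the synsets that contain it."""
    index = {}
    for synset in synsets:
        for word in synset.words:
            index.setdefault(word.lower(), []).append(synset)
    return index

def resolve(index, word, pos=None):
    """Return the candidate synsets for a tag word, optionally filtered by part of speech."""
    return [s for s in index.get(word.lower(), []) if pos is None or s.pos == pos]

# Invented example data.
synsets = [
    Synset("fight (combat)", "noun", ("fight", "battle"), "a hostile meeting of opposing forces"),
    Synset("fly (insect)", "noun", ("fly",), "a two-winged insect"),
    Synset("fly (travel through the air)", "verb", ("fly", "wing"), "to travel through the air"),
]
index = build_index(synsets)

print([s.note for s in resolve(index, "Battle")])
print([s.note for s in resolve(index, "fly")])
print([s.note for s in resolve(index, "fly", pos="verb")])
```

When a word maps to more than one synset, the glosses are what the user would pick from, which is where the ambiguity gets resolved.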

Hyper- and hyponyms are respectively generalisations and specialisations of concepts. Thus “dog” (in the sense of a member of the genus Canis that has been domesticated by man since prehistoric times; occurs in many breeds) has “canine” (any of various fissiped mammals with nonretractile claws and typically long muzzles) as a hypernym and various breeds of dog as hyponyms. Importantly, WordNet differentiates between a concept and an instance of that concept. “Labrador Retriever” is a hyponym of “dog”; my dog “Pepper” is an instance hyponym of “Labrador Retriever”.

Finally – at least from the perspective of tagging – WordNet describes mereological (i.e. part-part of) relationships; thus the concept of an (internal combustion) engine has part meronyms of camshaft and cylinder. Cylinder and camshaft have engine as their holonym.
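The relationships described above can be sketched as typed edges between tags. This is only an illustration in Python: the relation names (`hypernym`, `instance_of`, `part_of`) and the helper functions are my own labels, not WordNet's API or anything Logseq provides.

```python
# Each triple reads: subject --relation--> object.
relations = [
    ("dog", "hypernym", "canine"),
    ("Labrador Retriever", "hypernym", "dog"),
    ("Pepper", "instance_of", "Labrador Retriever"),
    ("camshaft", "part_of", "engine"),
    ("cylinder", "part_of", "engine"),
]

def related(relations, subject, kind):
    """Objects reached from subject by one edge of the given kind."""
    return [o for s, k, o in relations if s == subject and k == kind]

def inverse(relations, obj, kind):
    """Subjects pointing at obj with the given kind (e.g. the parts of a whole)."""
    return sorted(s for s, k, o in relations if o == obj and k == kind)

def generalisations(relations, tag):
    """Walk upwards from a tag: first instance -> concept, then hypernym by hypernym.

    Only the first parent is followed at each step, which is enough for this sketch.
    """
    chain, seen, current = [], {tag}, tag
    while True:
        parents = related(relations, current, "instance_of") + related(relations, current, "hypernym")
        parents = [p for p in parents if p not in seen]
        if not parents:
            return chain
        current = parents[0]
        seen.add(current)
        chain.append(current)

print(generalisations(relations, "Pepper"))
print(inverse(relations, "engine", "part_of"))   # meronyms of engine
print(related(relations, "camshaft", "part_of"))  # holonym of camshaft
```

Keeping `instance_of` separate from `hypernym` is what lets a query distinguish "all dog breeds" from "all my dogs".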

Anyway, a suggestion that I feel is worth investigating further.


Interesting! I’d never heard of WordNet before.
I found the Wikipedia article helpful: WordNet - Wikipedia.

There has been another suggestion by @menelic to add properties to every link:

This is something of a superset of this proposal, I wonder if both could be merged.

Cosma looks very interesting, I’d not come across it before or Jugglr for that matter. Just goes to show the value of this forum in cross pollinating ideas. Thanks.

I think there’s a conceptual problem here in how people store information and take notes rather than a technological one to be solved by features.

It’s key to understand that extensive keyword tagging in PKM graphs is detrimental. Exactly the kind of keyword tagging that you use in databases like Zotero to find stuff will completely ruin your PKM system (been there, done that :flushed:).

Here is a discussion about that topic that I found extremely helpful:

Key quote:

Extensive content-based tagging is a known anti-pattern because tags create a weak association at best between notes.
By using content-based tags you are making yourself feel that you are creating associations but you are still really shifting the burden to your future self to figure out why the notes are associated.
This becomes a significant problem when you have a larger corpus and your tag ontology begins growing beyond control.

That said, I’d totally vote for qualified semantic links in the form of something like [[qualifier::note]], e.g. [[because::tags only create a weak association between notes]] or [[parent::product management MOC]]. Currently I am sometimes using qualifier tags (paired with tag-specific CSS for making them visually stand out) as a workaround. (Disadvantage: separate qualifier tags are not coupled with the referred note, just weakly associated by position.)

PS: tags can already be hierarchical if you want them to be, because tags and pages are equal in Logseq. That feature is already there.

Isn’t this proposal enough to encode more relational information? The references we have now would be weak connections while relations specified by properties would be stronger connections in the knowledge graph.

Yes and no. For some of the use cases I read here it totally seems to make sense (as far as I can judge). From my own perspective not so much, though. Here’s why:

In contrast to qualified links, page properties stored in the header are not contextual and not specific. I can neither create nor discover them in writing. Moreover, if there is more than one relationship modelled via page properties I cannot know or indicate which of these is meaningful in my note’s context. You need to model, manage, and keep all relationships in the specific page header rather than having them spelled out where they actually matter. This obstructs one of Logseq’s core features, namely emerging connections between notes within context.

I think that we have two significantly different – and moreover: conceptually incompatible – mental models at play here in how we approach working with knowledge:

  1. Relational Database (RDB): distinct (semi-)structured objects of more or less similar types with specific key/value fields (modeled as tables) storing data and modelling relationships
  2. Graph: unstructured objects of varying types with relationships expressed as edges between graph nodes. (Nodes can have key/value properties, ideally storable anywhere in the node)

Several of the contributors here seem to approach Logseq from a RDB perspective and not from a note graph perspective. Fair enough. But if you approach a graph-based tool with a RDB perspective you won’t get far (and vice versa). You can observe similar tendencies in the Obsidian neighbourhood where people are using DataView to tweak the system into a classical database (with all relevant data stored as key/value table in the header rather than the note body). To me that’s like trying to swim faster by putting on the best running shoes because they work so well on land :wink:

TL;DR: If you need a RDB, use a RDB and not a graph. The only tool I know that does both fairly well is Notion. And it does so by encapsulating the RDB blocks and keeping RDB records and graph nodes separate from each other. That way you can interact with both using an appropriate interface etc. without shoehorning one concept badly into the other.

PS: I totally acknowledge that there’s a warranted need for combining both models in one ‘Integrated Thinking Environment’, and for being able to use the fitting one depending on which kind of content you deal with. I just strongly object to turning any decidedly graph-based tool into a mishmash that tries to do both at once – and consequently does neither well, because the fundamental models inevitably clash. That said: how about following Notion in that respect and implementing an RDB model on a block basis (perhaps using CSV instead of MD for storage)?


So if I understand it correctly you want to enrich references with a property. But the syntax you mentioned would create a page for a single statement.

Instead I would extend my approach to blocks:

- property:: <block-id>
  A statement.

for example:

- because:: 1234-5678-1234-5678
  A statement.

The block with ID 1234-5678-1234-5678 contains another statement that, in this example, is the reasoning that justifies the first statement.

What do you think?

Yes, Logseq derives a graph from references but I don’t see why Logseq should prefer this model over the RDB one. I think Logseq just provides very general and abstract tools and each user tries to come up with a data structure that works for them.

But I think both the graph UI/UX and the RDB UI/UX (like queries) are not yet mature in Logseq. I hope Logseq will develop both, along with the tools to integrate them.

Isn’t this already the case? Queries work on a block basis by default: the results of queries are blocks and their properties can be used in queries to filter or in the visualization (columns of a table at the moment).

Not necessarily (see Introduction to Semantic MediaWiki, where I borrowed the syntax): the first part before the :: separator would actually create a page or block property with the second part as the property’s value.

The significant difference lies in the storage format of the data (and consequently how you can edit/mangle/query/… it). In an RDB you have a fixed set of fields in a fixed tabular structure. When you add or delete a field to the DB schema this impacts all records in the table. And you can only store content that adheres to the RDB table schema (a massive benefit if your data is structured). The only common aspect is that in both cases you can store and query key/value pairs per record.

Which is most probably why the database engine on which Logseq is built is a graph database: It needs to be highly malleable :slight_smile:

The first answer here (sql - Comparison of Relational Databases and Graph Databases - Stack Overflow) explains it better than I could quickly:

There actually is conceptual reasoning behind both styles. Wikipedia on the relational model and graph databases gives good overviews of this.

The primary difference is that in a graph database, the relationships are stored at the individual record level, while in a relational database, the structure is defined at a higher level (the table definitions).

This has important ramifications:

  • A relational database is much faster when operating on huge numbers of records. In a graph database, each record has to be examined individually during a query in order to determine the structure of the data, while this is known ahead of time in a relational database.
  • Relational databases use less storage space, because they don’t have to store all of those relationships.

Storing all of the relationships at the individual-record level only makes sense if there is going to be a lot of variation in the relationships; otherwise you are just duplicating the same things over and over. This means that :point_right:graph databases are well-suited to irregular, complex structures.:point_left: But in the real world, most databases require regular, relatively simple structures. This is why relational databases predominate.

(note the highlight :))


Oh OK, so from the linked PDF:

The page is about the book “The Picture of Dorian Gray” and somewhere in the page the author “Oscar Wilde” is mentioned.

So I suppose the syntax for “Oscar Wilde” would be something like:

[[author::Oscar Wilde]]

That would be treated by Logseq as a link like [[Oscar Wilde]].

But in this example author:: clearly refers to the page The Picture of Dorian Gray. I don’t think this would work in Logseq, which is block-based.

I think a block like this in Logseq would make sense:

- [[title::The Picture of Dorian Gray]] is a [[type::book]]
  written by [[author::Oscar Wilde]].

Rendered, with links, as:

The Picture of Dorian Gray is a book written by Oscar Wilde.

But at the same time that block gains the properties:

type:: book
title:: The Picture of Dorian Gray
author:: Oscar Wilde
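Just to show that the proposed syntax is mechanically simple, here is a small Python sketch that turns such a sentence into the rendered links plus the block properties. To be clear, this is not Logseq code: the regular expression and the `parse_block` name are assumptions for illustration, and a repeated key would simply overwrite the earlier value here.

```python
import re

# Matches [[key::value]] but not a plain [[page]] link.
INLINE_PROP = re.compile(r"\[\[([A-Za-z][\w-]*)::\s*([^\]]+?)\s*\]\]")

def parse_block(text):
    """Return (text with plain [[value]] links, dict of extracted properties)."""
    props = {}

    def keep_value(match):
        key, value = match.group(1), match.group(2)
        props[key] = value          # later duplicates of a key overwrite earlier ones
        return f"[[{value}]]"       # the link itself stays, the qualifier is dropped

    rendered = INLINE_PROP.sub(keep_value, text)
    return rendered, props

block = ("[[title::The Picture of Dorian Gray]] is a [[type::book]] "
         "written by [[author::Oscar Wilde]].")
rendered, props = parse_block(block)
print(rendered)
print(props)
```

Plain links such as [[Oscar Wilde]] pass through untouched, so existing notes would not change meaning under this sketch.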

This is brilliant! Now I really want this implemented!


A bit better, isn’t it? :relaxed:

Still, while the format above works well for SMW’s page-centric model it does not yet feel completely fit for Logseq’s block model to me. But I have a hunch that with another round of deeper hands-on thinking we could actually come up with a model & syntax that’s truly block-based.

Basically we would somehow need to explicitly declare the left-hand side of the relation, as well. Your solution of using title:: for that purpose might come close. On first sight it’s utterly elegant from my perspective :slight_smile:

But there’s a catch (sigh…): does [[title:: The Picture of Dorian Gray]] then create a title:: property for the block (I assume not, because it would break the current file storage model) or for the page [[The Picture of Dorian Gray]]? And how do we then know whether the other properties declared in that block are properties of the page [[The Picture of Dorian Gray]] or of the block? This is tricky.

They are all properties of a block; that’s just how we add data to the database in Logseq. If a block is the first one of a page, its properties become the properties of the page (basically a page is a block with a name). Here is an example of a page with properties and a block used to add an entry to the database:

This is the title of the page
- title:: This is the title of the page
  another-page-property:: ...
- type:: book
  title:: The Picture of Dorian Gray
  author:: Oscar Wilde

So you can just use blocks with all the properties you want, even title::
If you want to use a page to define a DB entry you can do so: but if you use the special property title:: you must be OK with that being the title of the page (and the associated file name).

At the moment if I write this block:

- [[The Picture of Dorian Gray]] is a [[book]] written by [[Oscar Wilde]]

This block links together three pages without specifying any kind of relation; they are all just mentions.

If I want to store the above info with the RDB model I would write this block instead:

- type:: book
  title:: The Picture of Dorian Gray
  author:: Oscar Wilde

These two methods are redundant; the former is better for human reading and fits into a longer text; the latter is a RDB entry.

I’m saying that with the proposed syntax it would be possible to specify a DB entry with a sentence, so just one block that fits in a longer text and also encode the relations:

- [[title::The Picture of Dorian Gray]] is a [[type::book]] written by [[author::Oscar Wilde]].

So basically “Oscar Wilde” is the author of a block, this block is defined as a “book” and again this block has a property, “title”, that is “The Picture of Dorian Gray”.

I hope this makes clear how we use Logseq with a RDB model.

Anyway thank you very much for making me aware of this!



It would be really cool if an algorithm (maybe some Machine Learning) could automatically suggest references and properties.

For example if I write:

- The Picture of Dorian Gray is a book written by Oscar Wilde

it would ask me if I want to turn it into this:

- [[title::The Picture of Dorian Gray]] is a [[type::book]] written by [[author::Oscar Wilde]]

So that with a query I can for example display a table with all the blocks with type::book and see the above as an entry:

| Title | Author |
| --- | --- |
| The Picture of Dorian Gray | Oscar Wilde |
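For what it’s worth, the table view is easy to picture as a filter over block properties. A minimal Python sketch, assuming each block has already been reduced to a plain dictionary of properties (the `blocks` data and the `query_table` name are made up, this is not Logseq’s query engine):

```python
def query_table(blocks, where, columns):
    """Keep blocks whose properties match `where`, then pick the requested columns."""
    matches = [b for b in blocks if all(b.get(k) == v for k, v in where.items())]
    return [[b.get(c, "") for c in columns] for b in matches]

# Hypothetical blocks, already parsed into property dictionaries.
blocks = [
    {"type": "book", "title": "The Picture of Dorian Gray", "author": "Oscar Wilde"},
    {"type": "author", "title": "Oscar Wilde"},
    {"note": "a block without a type"},
]

rows = query_table(blocks, where={"type": "book"}, columns=["title", "author"])
print(rows)
```

A block that lacks one of the requested columns would get an empty cell rather than being dropped, which seems like the friendlier default for loosely structured notes.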

Even the opposite would be cool: if I have some kind of structured data (MySQL DB, CSV, JSON etc) I could programmatically generate Markdown files, for example:

The Picture of Dorian Gray
- [[title::The Picture of Dorian Gray]] is a [[type::book]] written by [[author::Oscar Wilde]].


- type:: author
- ## Books
  - Oscar Wilde wrote the following books:
    {{query (and (property type [[book]]) (property author [[Oscar Wilde]]))}}
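Generating those files from structured data could look roughly like the Python sketch below. It is only an illustration under a few assumptions: the JSON field names (`title`, `author`), the one-file-per-page layout and the function names are mine, the inline [[key::value]] syntax is still just the proposal from this thread, and the author is written as an inline author property so that the query on the author page has something to match.

```python
import json

def book_block(record):
    """One sentence-style block carrying the proposed inline properties."""
    return (f"- [[title::{record['title']}]] is a [[type::book]] "
            f"written by [[author::{record['author']}]].")

def author_page(author):
    """Author page with a query listing that author's books."""
    query = ("{{query (and (property type [[book]]) (property author [["
             + author + "]]))}}")
    return "\n".join([
        "- type:: author",
        "- ## Books",
        f"  - {author} wrote the following books:",
        f"    {query}",
    ])

def generate_pages(records):
    """Return {file name: Markdown content}; nothing is written to disk here."""
    pages = {}
    for record in records:
        pages[record["title"] + ".md"] = book_block(record) + "\n"
        # Create each author page only once, however many books they wrote.
        pages.setdefault(record["author"] + ".md", author_page(record["author"]) + "\n")
    return pages

records = json.loads('[{"title": "The Picture of Dorian Gray", "author": "Oscar Wilde"}]')
pages = generate_pages(records)
for name in sorted(pages):
    print(f"--- {name}\n{pages[name]}")
```

Returning a dictionary instead of writing files makes it easy to inspect the output before letting anything touch a real graph.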

jq can do this easily :slight_smile: See GitHub - 0dB/dayone-json-to-obsidian: Update Obsidian vault from Day One (“DayOne”) JSON using command line scripts. for an example of how to do it. I recently converted the JSON export of my pinboard bookmarks to Markdown files that way. Having ca. 23k bookmarks from ca. 15 years of bookmarking with pinboard lying around as individual MD files turned out to be a somewhat dead-end approach, though :grimacing: