Knowledge Management for Tags / Tag Hierarchies

gax · July 6, 2022, 1:39am

Why do we need Knowledge Management for Tags?

Graphs will quickly have far too many tags
We need a way to organize and browse tags

The Problem

Tagging leads to a large number of unrelated tags (thousands) without any structure
Browsing these tags through a list, graph, or tag cloud, is not very efficient
As shown in another thread by @ChrisYT and @Zdenek_Hurak, items imported from other systems (e.g. Zotero) have many overlapping tags, such as “History, 20th Century”, “history”, “History / World”. In practice, this creates graphs that look like this:

[Image: screenshot of the resulting graph view]

The result is practically unusable, and the recommended solution is to just delete the tags (and discard the information contained in them).

The Solution

Many efficient search strategies exist elsewhere:
- faceted search on [OpenLibrary](https://openlibrary.org/subjects/architecture) (search for “Architecture”, then narrow down by Subjects, Places, People, Times, …) or Amazon
  
  [Image: Amazon facet navigation]
- browsable (poly)hierarchies, like the Dewey system or taxonomies.
  
- Graphical visualizations with embedded links
  
  [Image: animal taxonomy diagram]
To efficiently search e.g. this animal taxonomy, we need to encode the relationships between tags.
Generally, these hierarchies will often not be strict containment hierarchies; many elements will have multiple parents. This is called a polyhierarchy:

[Image: an example of part of a place-name polyhierarchy]
Logseq’s hierarchies of the form [[parent/child/teddy]] are not sufficient for several reasons:
- Each child can only have one parent (teddy can be both in the child, and in the stuffedAnimals category, but currently this can’t be recorded)
- The classification is specified on the pages themselves and can’t be added on later
  - If I tag 100 pages with teddy and later want to add the tag to a hierarchy, I need to edit every single page. Instead, it should be possible to tag pages, and then later classify tags centrally, making all of the original pages findable under the proper hierarchies

What needs to be done?

Logseq needs a way to specify relationships between tags. One widely used approach, e.g. by libraries, is the Simple Knowledge Organization System (SKOS):
- TagA is a broader/narrower version of TagB
- TagA is related to TagB
These relationships are captured centrally, such that tags can be managed without editing each tagged page individually
We need a user interface to hierarchically browse Logseq pages (not part of this feature request)
We need a user interface to easily add and edit tag hierarchies (not part of this feature request)
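
To illustrate what "captured centrally" could mean, here is a minimal Python sketch. It is not Logseq code; the names `BROADER`, `PAGE_TAGS` and `pages_under` are assumptions made for this example. Pages carry only plain tags, and the relations live in one separate table, so classifying teddy later does not require touching any page.

```python
from collections import defaultdict

# Hypothetical central store of SKOS-style relations between tags:
# (narrower_tag, broader_tag) pairs. A tag may have several broader tags.
BROADER = [
    ("teddy", "child"),
    ("teddy", "stuffedAnimals"),
]

# Pages only carry plain tags; they never mention the hierarchy.
PAGE_TAGS = {
    "page-1": {"teddy"},
    "page-2": {"frying"},
}

def narrower_index(pairs):
    """Map each broader tag to the set of its direct narrower tags."""
    index = defaultdict(set)
    for narrower, broader in pairs:
        index[broader].add(narrower)
    return index

def pages_under(tag, pairs, page_tags):
    """Pages tagged with `tag` or with one of its direct narrower tags."""
    wanted = {tag} | narrower_index(pairs)[tag]
    return sorted(page for page, tags in page_tags.items() if tags & wanted)

print(pages_under("child", BROADER, PAGE_TAGS))           # ['page-1']
print(pages_under("stuffedAnimals", BROADER, PAGE_TAGS))  # ['page-1']
```

Adding or removing a pair in `BROADER` changes which hierarchies a page shows up under, while the pages themselves stay untouched.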

Example use case

Historian Bob studies animals in history.
He reads the following book:
Ark Royal : the Life on an Aircraft Carrier at War 1939-41.

Author:	Sir William Jameson
Publisher:	Penzance : Periscope Pub., 2004.

The book has these library classifications:

Dewey 940.545
- Class 900 – History and geography
  - 940 – History of Europe
LCC
- World History And History Of Europe, Asia, Africa, Australia, New Zealand, Etc.
  - History (General)
  - World War II (1939-1945)
    - Naval operations
      - Anglo-German By engagement, ship, etc., A-Z

The book is tagged with Dewey:History of Europe and LCC:Anglo-German By engagement, ship, etc., A-Z. “Dewey:” is the namespace for the Dewey system, and “LCC:” is the namespace for the Library of Congress Classification. These tags could easily be added automatically by an improved Zotero plugin.
So just by automatically importing library classifications, we can already browse our books by the Dewey and LCC systems.

The book is also automatically classified by author last/first and year.

The book also mentions a cat, Unsinkable Sam, so
Bob tags the book with animals:F. catus. This makes the book appear in a hierarchical search about animals as well.

So with very little effort (a single manually added tag so far), Logseq can already automatically generate 5 browsable hierarchies:

/Books/ByAuthor/Jameson/William
/Books/ByYear/2004
/Dewey/History and geography/History of Europe
/LCC/World History and …/History (General)/World War II … /Naval Operations/Anglo-German…
/animals/…/…/Mammalia/…/Felinae/…/F. catus

Additionally, Bob has a few lightweight classification schemes that fit his work, so he tags the book with bob:aircraft-carriers and bob:non-fiction. This additionally makes the book available under

/bob/military/navy/aircraft-carriers
/bob/literature/non-fiction/

Further, some plugin might provide a faceted search, so Bob can search under /bob/military and then narrow down by animal type.

Bob told Logseq that aircraft-carriers is narrower than navy which is narrower than military, so Logseq can also generate these search hierarchies automatically.
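
As a rough sketch of how such a path could be derived (plain Python, not an existing Logseq feature; `BROADER` and `hierarchy_path` are hypothetical names), assume each of Bob's tags has at most one broader tag. A polyhierarchy would yield several paths instead of one.

```python
# Hypothetical record of what Bob told Logseq: each tag maps to its broader tag.
BROADER = {
    "aircraft-carriers": "navy",
    "navy": "military",
    "non-fiction": "literature",
}

def hierarchy_path(namespace, tag, broader):
    """Walk from `tag` up to the root and return a browsable path."""
    chain = [tag]
    seen = {tag}
    while chain[-1] in broader:
        parent = broader[chain[-1]]
        if parent in seen:  # guard against accidental cycles
            break
        chain.append(parent)
        seen.add(parent)
    # The chain was collected bottom-up, so reverse it for display.
    return "/" + "/".join([namespace] + chain[::-1])

print(hierarchy_path("bob", "aircraft-carriers", BROADER))
# /bob/military/navy/aircraft-carriers
```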

The process is very lightweight, so Bob can easily tag individual blocks of his notes.
It does not affect the current use of tags either, so Bob does not need to classify all tags from the beginning; he doesn’t even need to use the hierarchical capabilities at all.

The Library of Congress also uses SKOS. Similarly, Logseq would provide navigation to broader, narrower, and related terms:

LCC:

Example implementation

As @alex0 mentioned, tags are pages themselves, so in principle, the information about broader and more general tags can be stored directly in the tag page without changing the Logseq data model:

Would a rich commitment to hierarchies and classification be an anathema to Logseq culture?

For example, [[Parent]] can have properties like

subcategories:: [[Child 1]], [[Child 2]]
another-hierarchy-subcategories:: [[Child A]], [[Child B]]
extends:: [[X]], [[Y]]
extended-by:: [[Z]]
generalizes:: [[W]]
whatever-you-want:: ...

The problem is that currently no user interface exists for editing these relations across files.

The following example uses Markdown to represent a subset of SKOS. Such a Markdown file could be autogenerated based on the parsed tag files to present all hierarchies in a single document. Changes would then be propagated back to the individual files.

Tags are connected using the relations broader, narrower, broaderTransitive, narrowerTransitive and related
- broader, narrower
  - specify that one tag represents a broader/narrower concept than another
- broaderTransitive, narrowerTransitive
  - specify that one tag represents a broader/narrower concept than another and all its children/parents
  - e.g. A Cat is a narrower concept of a mammal, which automatically makes it a narrower concept of Animal
- related
  - two tags are related, e.g. Apples and ApplePie
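
The difference between the direct and the transitive relations can be sketched in a few lines of Python. This is only an illustration with made-up data structures (`NARROWER`, `narrower_direct`, `narrower_transitive`), not a proposal for how Logseq should store anything.

```python
# Direct "narrower" links for a small slice of the taxonomy (illustrative data).
NARROWER = {
    "Animalia": ["Chordata"],
    "Chordata": ["Mammalia"],
    "Mammalia": ["Carnivora"],
}

def narrower_direct(tag, links):
    """Non-transitive: only the immediate narrower tags."""
    return set(links.get(tag, []))

def narrower_transitive(tag, links):
    """Transitive: every tag reachable by following narrower links."""
    found, stack = set(), list(links.get(tag, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(links.get(current, []))
    return found

print(narrower_direct("Animalia", NARROWER))              # {'Chordata'}
print(sorted(narrower_transitive("Animalia", NARROWER)))  # ['Carnivora', 'Chordata', 'Mammalia']
```

With the transitive variant, a search for Animalia also finds items tagged only with Mammalia or Carnivora.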
The Markdown representation is not meant to suggest any specific syntax; it is just an example of how to display the data as Markdown in the spirit of Logseq for easy editing.
- Alternatively, Logseq could directly parse SKOS RDF/Turtle description files.
- Several editors exist to create these description files.
- SKOS is a minimal example, other knowledge management systems exist, and in principle Logseq could record arbitrary relations between tags.
- SKOS relationships can have additional metadata added, such as descriptions, translations, or even images, which opens up the possibility of providing an image carousel for the search.
The following relationship is a (small) section of the animal taxonomy.
Sub-items of the list are narrower terms for their parent items. The lists can be arbitrarily nested. For example, Chordata and Mammalia are both narrower terms for Animalia. For a non-transitive relationship, Chordata would be a narrower term for Animalia, but Mammalia would not.
- The animal taxonomy has the namespace animals to distinguish it from other hierarchies that can exist in parallel. One item can be in multiple hierarchies at the same time
```
semanticRelation::narrowerTransitive
concept::animals
- Animalia
  - Chordata
    - Mammalia
      - Carnivora
        - Feliformia
          - Felidae
            - Felinae
              - Felis
                - F. catus
                - F. silvestris
```
- If a user tags an item with animals:F. catus, the item will automatically appear in a search for Animalia
- The user does not need to tag with the entire hierarchy ~~animals:Animalia/Chordata/Mammalia/Carnivora/Feliformia/Felidae/Felinae/…~~, as this would duplicate the hierarchy on every item. The tag is only animals:F. catus, from which Logseq can infer that we are dealing with a type of cat.
This is an example of a “related” relationship. All of the tags [frying, deepFrying, airFrying, grilling] are marked as related.
- If a user tags an item with the tag frying, a search for related items will bring up the other 3
```
semanticRelation::related
concept::cooking
frying
deepFrying
airFrying
grilling
```
  - Related tags can also live in different namespaces
```
semanticRelation::related
cooking:frying
nutrition:fat
```

Many thanks to @boisjere and @alex0 for their contributions to this draft.
For more discussion, see this thread: Would a rich commitment to hierarchies and classification be an anathema to Logseq culture? - #25 by boisjere

alex0 · July 6, 2022, 9:50am

In general, when asking for a feature, you have better chances if you keep it as simple as possible. Your post makes it sound like a complicated thing, while, as we said, it is already there.

Still, I don’t know what UI/UX you would like. I suggested a command like {{tree <property-name>}} to render an indented list of pages/tags. What else do you think would make the UX better? Maybe a graph of pages/tags whose relations are specified by properties? Maybe something like “Linked references” but with these relations?

Basically Logseq UI/UX is built around a relation, “references” (with the [[ ]] or #hashtag syntax). Everything we have in Logseq could be re-implemented with relations defined by the user with properties.

Are we asking Logseq devs to generalize Logseq’s features around “references” to include other types of relations defined by users using properties? This way it may sound more appealing to devs. You see, Logseq devs like to provide generic powerful features and let the user build a custom workflow around them freely.

Let’s say you are in Graph View; some checkboxes let you choose what kind of relations you want to display. By default it is set to “references” only (as it is now). Wouldn’t that be a huge improvement? For me yes!

gax · July 6, 2022, 3:01pm

You are right, it is pretty simple. I probably gave way too much background, but I thought that many people might not be familiar with the subject. I’ve noticed lately that people tend to more and more prefer searches over hierarchies.

Having trees as first class objects would also be my preference. It also needs some way to filter the results (e.g. to show all pages that have a TODO tag within a hierarchy)

I am not a UX specialist, but I would like to have some switch to display items in their parent folder. For example, if you browse the animal taxonomy, many entries will be way at the bottom. If I select mammals, there should immediately be an option to display all mammals without opening all leaf nodes, or alternatively to hide all the empty branches.
Your {{tree ...}} search should contain enough information that the interface can ignore the empty branches (e.g. each node can have a property how many children and other nodes it has to speed up displaying).
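
A small Python sketch of the "ignore empty branches" idea, under the assumption that each node knows how many pages are tagged with it directly. The tuple layout and the functions `prune_empty` and `render` are made up for this illustration and are not part of Logseq.

```python
# Illustrative tree: each node is (name, number_of_tagged_pages, children).
TREE = ("Mammalia", 0, [
    ("Carnivora", 0, [
        ("Felidae", 2, []),
        ("Canidae", 0, []),
    ]),
    ("Rodentia", 0, []),
])

def prune_empty(node):
    """Return (name, total_pages, children) with empty branches removed, or None."""
    name, own_count, children = node
    kept = [c for c in (prune_empty(child) for child in children) if c is not None]
    total = own_count + sum(child_total for _, child_total, _ in kept)
    return (name, total, kept) if total > 0 else None

def render(node, depth=0):
    """Format a pruned tree as an indented list with per-node totals."""
    name, total, children = node
    lines = [f"{'  ' * depth}- {name} ({total})"]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

print("\n".join(render(prune_empty(TREE))))
# - Mammalia (2)
#   - Carnivora (2)
#     - Felidae (2)
```

The totals computed during pruning are the kind of per-node count that could be cached to speed up display.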

alex0 · July 6, 2022, 7:30pm

Relevant:

alex0 · July 8, 2022, 2:30pm

Hey there, I’m experimenting with a convenient way to combine properties and namespaces.

Basically I use properties to store all the data and namespaces only as a shortcut for queries.

Example of data/block placed somewhere:

title:: Example
type:: definition
area:: [[Geometry]]

Then in a subpage like [[Geometry/definitions]] I place the following query:

{{query (and (property type definition) (property area geometry))}}

that will display only “definitions” from the area “Geometry”.

The point is that we can have as many properties as we like and each of them can have multiple values.

I manually create only the namespaces that I think are convenient to store useful queries.

I think this could be automated to a certain degree, but first I need something to parse the key:: value syntax in text files.
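
For reference, a very small Python sketch of such a parser. It is a simplification and not Logseq's actual parser: it assumes one property per line, splits values on commas, and strips [[ ]] from page references, so page names that contain commas would be split incorrectly. The names `PROPERTY_LINE` and `parse_properties` are made up here.

```python
import re

# Matches "key:: value", optionally preceded by a "- " bullet (simplified syntax).
PROPERTY_LINE = re.compile(r"^\s*(?:-\s+)?([\w-]+)::\s*(.*)$")

def parse_properties(text):
    """Return {key: [values]}; comma-separated values are split, [[ ]] stripped."""
    properties = {}
    for line in text.splitlines():
        match = PROPERTY_LINE.match(line)
        if not match:
            continue
        key, raw = match.groups()
        values = [v.strip().strip("[]") for v in raw.split(",") if v.strip()]
        properties[key] = values
    return properties

block = """title:: Example
type:: definition
area:: [[Geometry]]"""

print(parse_properties(block))
# {'title': ['Example'], 'type': ['definition'], 'area': ['Geometry']}
```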

alex0 · July 13, 2022, 4:56pm

I think I have figured out the best way to manage “polyhierarchies” with current Logseq, i.e. using an indented list of blocks:

- [[Parent]]
  - [[Child]]
    - [[Teddy]]

Specify relations between pages

Parent.md
<property>:: [[Child]]

Child.md
<property>:: [[Teddy]]

Tree

{{tree <property>}}

displays an indented list of pages by following that <property> specified by the user, i.e.

- [[Parent]]
  - [[Child]]
    - [[Teddy]]
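
The proposed behaviour could be sketched like this in Python. Note that {{tree}} is only a suggestion in this thread, and `PAGES` and `tree` are hypothetical names; the sketch assumes the pages and the values of the chosen property have already been read into a dictionary.

```python
# Hypothetical in-memory view: page name -> values of the chosen property.
PAGES = {
    "Parent": ["Child"],
    "Child": ["Teddy"],
    "Teddy": [],
}

def tree(pages, root, depth=0, path=()):
    """Follow the property from `root` and return indented list lines."""
    lines = [f"{'  ' * depth}- [[{root}]]"]
    if root in path:  # stop if the property loops back on itself
        return lines
    for child in pages.get(root, []):
        lines.extend(tree(pages, child, depth + 1, path + (root,)))
    return lines

print("\n".join(tree(PAGES, "Parent")))
# - [[Parent]]
#   - [[Child]]
#     - [[Teddy]]
```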

Reverse tree

{{reverse-tree <property>}}

displays another indented list but by following <property> in the other direction:

- [[Teddy]]
  - [[Child]]
    - [[Parent]]
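
A Python sketch of the reverse direction, again purely illustrative ({{reverse-tree}} is a proposal, and `invert` and `reverse_tree` are made-up names): the property is stored on the parent, so the mapping is inverted first and then walked upwards.

```python
from collections import defaultdict

# Hypothetical in-memory view: page name -> values of the chosen property.
PAGES = {"Parent": ["Child"], "Child": ["Teddy"], "Teddy": []}

def invert(pages):
    """Build target -> [sources] from source -> [targets]."""
    inverted = defaultdict(list)
    for source, targets in pages.items():
        for target in targets:
            inverted[target].append(source)
    return inverted

def reverse_tree(pages, start):
    """Indented list that follows the property backwards from `start`."""
    inverted = invert(pages)
    lines = []

    def walk(name, depth, path):
        lines.append(f"{'  ' * depth}- [[{name}]]")
        if name in path:  # stop on cycles
            return
        for parent in inverted.get(name, []):
            walk(parent, depth + 1, path | {name})

    walk(start, 0, frozenset())
    return lines

print("\n".join(reverse_tree(PAGES, "Teddy")))
# - [[Teddy]]
#   - [[Child]]
#     - [[Parent]]
```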

Breadcrumb

{{breadcrumb <property>}}

used in a page involved in the hierarchy above, for example Child.md, displays:

[[Parent]] > [[Child]] > [[Teddy]]

in case Child.md is involved in a more complex hierarchy it would be:

[[Parent 1]] > [[Child]] > [[Teddy 1]]
                         > [[Teddy 2]]
[[Parent 2]] > [[Child]] > [[Teddy 1]]
                         > [[Teddy 2]]
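
One way to compute such breadcrumbs, sketched in Python under the assumption that the hierarchy has no cycles. The macro itself is only proposed here, and `chains_up`, `chains_down` and `breadcrumbs` are hypothetical helper names. The sketch prints every full path on its own line rather than aligning the shared prefix.

```python
# Hypothetical in-memory view: page name -> values of the chosen property.
PAGES = {
    "Parent 1": ["Child"],
    "Parent 2": ["Child"],
    "Child": ["Teddy 1", "Teddy 2"],
}

def chains_down(pages, name):
    """All paths from `name` down to a leaf, following the property."""
    children = pages.get(name, [])
    if not children:
        return [[name]]
    return [[name] + rest for child in children for rest in chains_down(pages, child)]

def chains_up(pages, name):
    """All paths from a root down to `name`."""
    parents = [page for page, targets in pages.items() if name in targets]
    if not parents:
        return [[name]]
    return [rest + [name] for parent in parents for rest in chains_up(pages, parent)]

def breadcrumbs(pages, name):
    """Every root-to-leaf path that passes through `name`."""
    return [
        " > ".join(f"[[{page}]]" for page in up + down[1:])
        for up in chains_up(pages, name)
        for down in chains_down(pages, name)
    ]

for line in breadcrumbs(PAGES, "Child"):
    print(line)
# [[Parent 1]] > [[Child]] > [[Teddy 1]]
# [[Parent 1]] > [[Child]] > [[Teddy 2]]
# [[Parent 2]] > [[Child]] > [[Teddy 1]]
# [[Parent 2]] > [[Child]] > [[Teddy 2]]
```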

Reverse breadcrumb

(you got the idea)

Graph view

In addition it would be nice to display in the graph view only certain relations and not other ones (including “reference” that is the only type of relation we have now in the graph view).

For me it would be also useful to display the graph directly on a page with the command:

{{graph <property>}}

and for the graph of the current page only (like the one we already have in the sidebar):

{{page-graph <property>}}

I think this condenses the proposal and I hope it would cover all the use cases we have in mind. An “editable” tree looks way more complicated to implement to me and I would be fine without it anyway.

gax · July 19, 2022, 7:34pm

How would you build the hierarchy? Typical users will have hundreds to tens of thousands of tags, we’ll need an efficient way to sort them.

boisjere · July 19, 2022, 8:00pm

You may need to run a separate program, like Tematres or Vocbench, to manage vocabulary. I have that stuff installed but I don’t have a workflow… so it just sits there. I prefer the Pool Party interface but it’s not for individuals

alex0 · July 19, 2022, 8:22pm

I don’t see why you think this would be more computationally intensive than the rest.

It’s just like queries but iterative to “follow” a property.

Anyway we shouldn’t really discuss this kind of things here.

boisjere · July 19, 2022, 8:43pm

This isn’t a forum for discussion, BUT this might be the real, underlying technical feature request that will ultimately enable the behaviour @gax and I have been wanting from the software.

Something like “Make iterative queries so users can construct their own property-based hierarchies”. There may be other use cases for iterative queries too.

So this inappropriately-situated discussion may be background for that real feature request. If @gax agrees, maybe we can ask an admin account holder to move this thread back to general discussion, and write a new, more focused feature request. I think admins can do that here.

gax · July 19, 2022, 8:45pm

Let’s continue the discussion on the other thread Would a rich commitment to hierarchies and classification be an anathema to Logseq culture? - #10 by gax
I’ll post there shortly.

GaiusScotius · August 10, 2022, 9:18pm

I realise that it’s been some time since this thread opened, but if the topic of tagging is still being discussed may I add a comment.

I’ve searched the entire forum, but it appears nobody has thought of using the model employed by WordNet as a structure for defining tags; I would recommend taking a look, not least because it is very well tested.

For those who may not have encountered it, WordNet is in essence a structure for defining natural language concepts – words, phrases – and relationships between them. Think of it as a “metamodel” for language (not just English, any language). We describe everything using language, so the metamodel concepts are generalisable to all fields of endeavour. WordNet is not (indeed cannot be) complete in respect of every discipline – history, medicine, biology, chemistry etc. – but it can be extended into them without having to add to its “language metamodel”.

From the point of view of tagging, the key elements of WordNet that I would focus on are synonyms, parts of speech, hyper- and hyponyms, and mero- / holonyms. There is more, but all tags I have come across relate to either things (the tag is a noun) or actions (the tag is a verb) and these concepts suffice. If there are live examples of parts of speech other than nouns and verbs being used as tags I’d like to hear about them.

WordNet recognises that the same concept may be described by different words, i.e. synonyms. In fact its core construct is the “synset”, the group of words that all describe the same thing. A big part of the problem with tagging as an information discovery tool is that different people at different times use different synonyms for the same concept. When we are looking to find something we are basically asking the question, what do I know about X? X is a concept, like a ship or a fight or a battle, not necessarily any specific term (word). In Logseq a tag is a note, and a note can easily define a synset (i.e. a list of words), so it ought to be relatively straightforward to identify the note containing the given “tag word” and link to it rather than straight to a note having that word as its title.

Words, of course, may have multiple meanings. English (for example) freely uses nouns as verbs; there is a big conceptual difference between a fight and to fight or a fly and to fly. Synsets have descriptions of the concept they define (there are four for fly as a noun and 14 for fly as a verb!). Tagging to the synset rather than the word resolves the ambiguity making subsequent query more precise.
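
A toy Python sketch of tagging to a synset note instead of a word. The synsets below are invented for illustration and are not taken from WordNet, and `SYNSETS` and `synsets_for` are hypothetical names. A word that appears in more than one synset comes back with several candidates, which is exactly the ambiguity the user would then resolve.

```python
# Hypothetical synset notes: note title -> words that name the same concept.
SYNSETS = {
    "battle (noun)": {"battle", "fight", "engagement", "conflict"},
    "fight (verb)": {"fight", "struggle", "contend"},
}

def synsets_for(word, synsets):
    """Return titles of all synset notes that list `word` (case-insensitive)."""
    word = word.lower()
    return sorted(title for title, words in synsets.items() if word in words)

print(synsets_for("Engagement", SYNSETS))  # ['battle (noun)']
print(synsets_for("fight", SYNSETS))       # ['battle (noun)', 'fight (verb)']
```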

Hyper- and hyponyms are respectively generalisations and specialisations of concepts. Thus “dog” (in the sense of a member of the genus Canis that has been domesticated by man since prehistoric times; occurs in many breeds) has “canine” (any of various fissiped mammals with nonretractile claws and typically long muzzles) as a hypernym and various breeds of dog as hyponyms. Importantly, WordNet differentiates between a concept and an instance of that concept. “Labrador Retriever” is a hyponym of “dog”; my dog “Pepper” is an instance hyponym of “Labrador Retriever”.

Finally – at least from the perspective of tagging – WordNet describes mereological (i.e. part-part of) relationships; thus the concept of an (internal combustion) engine has part meronyms of camshaft and cylinder. Cylinder and camshaft have engine as their holonym.

Anyway, a suggestion that I feel is worth investigating further.

gax · August 11, 2022, 7:13pm

Interesting! I’d never heard of WordNet before.
I found the Wikipedia article helpful: WordNet - Wikipedia.

There has been another suggestion by @menelic to add properties to every link:

This is something of a superset of this proposal, I wonder if both could be merged.

GaiusScotius · August 12, 2022, 8:00am

Cosma looks very interesting, I’d not come across it before or Jugglr for that matter. Just goes to show the value of this forum in cross pollinating ideas. Thanks.

brsma · August 16, 2022, 7:22pm

I think there’s a conceptual problem here in how people store information and take notes rather than a technological one to be solved by features.

It’s key to understand that extensive keyword tagging in PKM graphs is detrimental. Exactly the kind of keyword tagging that you use in databases like Zotero to find stuff will completely ruin your PKM system (been there, done that).

Here is a discussion about that topic that I found extremely helpful: https://www.reddit.com/r/ObsidianMD/comments/n7m5gx/backlinks_vs_tags/

Key quote:

Extensive content-based tagging is a known anti-pattern because tags create a weak association at best between notes.
By using content-based tags you are making yourself feel that you are creating associations but you are still really shifting the burden to your future self to figure out why the notes are associated.
This becomes a significant problem when you have a larger corpus and your tag ontology begins growing beyond control.

That said, I’d totally vote for qualified semantic links in the form of something like [[qualifier::note]], e.g. [[because::tags only create a weak association between notes]] or [[parent::product management MOC]]. Currently I am sometimes using qualifier tags (paired with tag-specific CSS for making them visually stand out) as a workaround. (Disadvantage: separate qualifier tags are not coupled with the referred note, just weakly associated by position.)
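
To show that such a syntax would be easy to pick out of a note, here is a small Python sketch. The [[qualifier::note]] form is only the suggestion made above and is not supported by Logseq today; `QUALIFIED_LINK` and `qualified_links` are made-up names.

```python
import re

# Hypothetical syntax from the suggestion above: [[qualifier::target note]].
QUALIFIED_LINK = re.compile(r"\[\[([^\[\]:]+)::([^\[\]]+)\]\]")

def qualified_links(text):
    """Return (qualifier, target) pairs for every qualified link in `text`."""
    return [(q.strip(), t.strip()) for q, t in QUALIFIED_LINK.findall(text)]

note = (
    "Avoid keyword tags "
    "[[because::tags only create a weak association between notes]] "
    "[[parent::product management MOC]] and see [[Plain Page]]."
)
print(qualified_links(note))
# [('because', 'tags only create a weak association between notes'), ('parent', 'product management MOC')]
```

Plain references such as [[Plain Page]] are ignored because they contain no `::` separator.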

PS: tags already are hierarchical if you want to. Because tags and pages are equal in Logseq. That feature is already there.

alex0 · August 17, 2022, 1:13pm

Isn’t this proposal enough to encode more relational information? The references we have now would be weak connections while relations specified by properties would be stronger connections in the knowledge graph.