Would a rich commitment to hierarchies and classification be anathema to Logseq culture?

If I had to design this from scratch:

  • every folder in our file system would potentially be a “graph”, even when nested inside one another
  • the info about a given graph would be saved in a hidden .logseq folder inside every folder that is a graph
  • it would be possible to open any Markdown file with Logseq; doing so would create a graph starting from that file’s folder (creating the hidden .logseq folder)
  • mentioning pages across different graphs would be possible using relative links (the ../ syntax to go up)
  • the journal would be a plugin that manages one or more specified folders/graphs (even hidden ones inside those above)
  • Logseq wouldn’t be set to “one graph” per instance; instead it would manage graphs the way file managers manage folders

For example:

📂 All Encompassing Graph
  📂 Subgraph 1
    📂 .logseq
    📂 Journal
    📄 Page.md
    ...
  📂 Subgraph 2
    📂 .logseq
    📂 .journal
    📄 Page.md
    ...

and relative paths would be hidden outside editing mode in Logseq, i.e.

[[./journal/2022-07-22.md]]

is displayed as

22 July 2022
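A minimal sketch (Python, assuming the hypothetical .logseq-marker convention described above) of how a client could find the enclosing graph root for a Markdown file and resolve a relative page link:

```python
from pathlib import Path
from typing import Optional

def find_graph_root(start: Path) -> Optional[Path]:
    """Walk upwards from a file or folder until a directory that
    contains a hidden .logseq folder is found -- the hypothetical
    marker of a graph root in this design."""
    current = start if start.is_dir() else start.parent
    for folder in [current, *current.parents]:
        if (folder / ".logseq").is_dir():
            return folder
    return None

def resolve_page_link(current_page: Path, link: str) -> Path:
    """Resolve a relative page link such as '../Subgraph 2/Page.md'
    against the folder of the page that mentions it."""
    return (current_page.parent / link).resolve()
```

The nesting then falls out for free: opening `All Encompassing Graph/Subgraph 1/Page.md` would find `Subgraph 1` as the graph root, while opening the top-level folder itself would find the outer graph.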

I’ve not long come across Logseq and am interested in it for both knowledge and project management in the professional services (in my case, legal) arena. I posted some thoughts here on structuring information/ notes for easier (re)discovery using concept tagging, but think that perhaps this thread is now more appropriate. Comments welcome!


See my reply in the related thread: Knowledge Management for Tags / Tag Hierarchies - #19 by brsma

I still have trouble understanding your concrete use case for such an extensive hierarchical taxonomy. How do you intend to make all these tags actually productive? (see also The Collector’s Fallacy • Zettelkasten Method – I had a sobering ‘ouch!’ moment reading that… :see_no_evil:)

Your comments are well taken, but I would differentiate between the ability to build a rich classification scheme and actually deploying one at a scale larger than needed. I like the generality of the WordNet metamodel, but that doesn’t mean I want or need to load all 170,000+ entries into my graphs.

As far as a use case is concerned, I’m thinking of knowledge management in the typical SME professional practice, particularly law. There is, I think, a bright future in this area for tools like Logseq by virtue of the fact that they are text-based and can readily be distributed using Git or the like. For an industry that purportedly relies on knowledge, “knowledge management” remains pretty problematic. I have experienced enough failures to appreciate that the biggest problem with current systems is that of “not knowing what you know”, a problem that grows with time and scope. Put another way, discovery is a key issue.

Systems like SharePoint and its ilk allow one to save documents in folder hierarchies and to build controlled and uncontrolled vocabularies with which to classify (i.e. to tag) them, but, at least in my opinion, they fail in two ways. First, although not directly related to this discussion, they don’t support document linking to the degree that Logseq (or, on macOS, Hook) does. Second, and relevant here, educating users in how taxonomies are structured and how they should be used is a real difficulty, and making them do it consistently even more so. What makes it so difficult is the “discount rate” we apply to our time: how much effort am I prepared to invest now so that you can find something later, perhaps years later? For most people, indexing as you enter is an adjunct to their role; it’s just not worth a significant investment of their time, especially when you’re being evaluated on the basis of billable hours. For academics and researchers that’s clearly not the case (their role is to link, classify, assimilate and derive), but I would suggest that for the majority of knowledge workers the value of future discovery by others is rather small.

To my mind, the trick to knowledge discovery is to make life as easy as possible for those entering information. Don’t try to force users into sticking to a rigidly defined vocabulary; don’t get in their way; by all means guide them (say, with properties), but accept anything that they think appropriate.

Generally the difficult question is something along the lines of “what do we know about X?”. Here X is typically a concept, not a specific thing like the contract we wrote for some client. It’s the sort of question that library indices are designed to facilitate, albeit for physical information that is not directly searchable…

Digital information has the advantage that it can be directly searched using full-text indexing. Full-text searches fail, however, if the word(s) I use to envisage X differ from those actually in the document. The same applies to tagging. The word or brief phrase that comes to my mind when I think of a concept is not necessarily the one you used when you tagged the document years ago. It is highly likely, however, that whatever word(s) I think of will be a synonym of the one you actually used.

That, however, requires a means of equating terms/ phrases to concepts, which is where WordNet synsets come in. As far as hyper- / hyponyms are concerned, I view them as enabling a sense of scale, of zooming in or out. I can start with a high level concept and quickly refine a search by looking at what hyponyms have actually been used, or if I happen to start with a hyponym that hasn’t been used, pull back to a hypernym and see which of its hyponyms have been used. Other linguistic relationships provide different ways of navigating the search space.
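As an illustration, here is a toy sketch of that idea in Python. The mini “synset” table is hand-made for the example (a real implementation would query WordNet itself); it shows how a search term can be expanded to its synonyms and, optionally, to the words of narrower (hyponym) or broader (hypernym) concepts:

```python
# Toy fragment of a WordNet-style structure: each synset has
# member words (synonyms) plus hypernym/hyponym links.
# All entries here are invented for the example.
SYNSETS = {
    "contract.n": {"words": {"contract", "agreement"},
                   "hypernyms": ["document.n"], "hyponyms": ["lease.n"]},
    "lease.n":    {"words": {"lease", "rental agreement"},
                   "hypernyms": ["contract.n"], "hyponyms": []},
    "document.n": {"words": {"document", "instrument"},
                   "hypernyms": [], "hyponyms": ["contract.n"]},
}

def synsets_of(word):
    """All synsets a word belongs to."""
    return [sid for sid, s in SYNSETS.items() if word in s["words"]]

def expand_query(word, include_hyponyms=True, include_hypernyms=False):
    """Expand a search term to its synonyms, optionally adding the
    words of narrower (zoom in) or broader (zoom out) concepts."""
    terms = set()
    for sid in synsets_of(word):
        s = SYNSETS[sid]
        terms |= s["words"]
        if include_hyponyms:
            for h in s["hyponyms"]:
                terms |= SYNSETS[h]["words"]
        if include_hypernyms:
            for h in s["hypernyms"]:
                terms |= SYNSETS[h]["words"]
    return terms
```

So a search for “agreement” would also match documents tagged “lease”, and pulling back to the hypernym surfaces “instrument”, exactly the zoom-in/zoom-out navigation described above.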

Hope this explains better!


Thanks a lot! Actually I was asking @gax for clarification :wink: (whose use case seems to lie more on the academic side of knowledge work), but I consider yours very interesting as well, being on the business side of (P)KM myself and facing at least partly similar challenges (one of my major professional concerns, besides organising my own work as a manager, is enabling cross-functional business intelligence across my teams in order to help everyone make the best decisions). Yet law, with its long cultural history of elaborate reference systems, is still a different kind of beast compared to (digital) product.

educating users into how taxonomies are structured and how they should be used is a real difficulty and making them do it consistently even more so.

the trick to knowledge discovery is to make life as easy as possible for those entering information. Don’t try to force users into sticking to a rigidly defined vocabulary; don’t get in their way; by all means guide them (say with properties), but accept anything that they think appropriate

+1 to both of those. (And to your observation re: the puzzling lack of digitisation in knowledge-based industries)

At the same time I would like to propose that fuzzy information retrieval is more of a software problem than something to implement manually in your graph. DEVONthink (my universal vault for all kinds of documents and reference materials) is quite good at surfacing related documents, for example. Besides, your last paragraph reads like an informal description of an algorithm that shouldn’t be that hard to implement using WordNet’s API or a similar service in other languages. Add a smart user interface and you have a nice product for legal services that should even be commercially quite viable (assuming this is a common pain point). :slight_smile: As for Logseq, this could actually make a nice search plug-in.


Thanks @brsma for your comments! The quote about the downsides of tagging is worth reading:

Extensive content-based tagging is a known anti-pattern because tags create a weak association at best between notes.

By using content-based tags you are making yourself feel that you are creating associations but you are still really shifting the burden to your future self to figure out why the notes are associated.

This becomes a significant problem when you have a larger corpus and your tag ontology begins growing beyond control.

Once you decide to tag based on subject you have to keep expanding the subjects you tag.

Then every time you add a tag later you have to decide if you will go back and re-tag all applicable prior notes, which quickly becomes untenable.

But if you don’t do that then your tagging system becomes untrustworthy, because it is not returning all notes that it should, so you start developing workarounds to compensate for the faulty tagging system, which increases the friction of using the system.

To overcome these limitations, we need (poly)hierarchical tags that can be edited independently of individual notes.

My use case for tagging is classification of knowledge (instead of building relationships between the content at a block level, which comes at a later stage).
I have many thousands of items (e.g. articles, books etc.) that need to be classified by subject areas. Luckily in the case of articles, they fit nicely into a small group of polyhierarchies.

Backlinks are great for building relationships between individual blocks of these pages (e.g. a fine-grained link between items, such as “cites: [[other article]]”, “contradicts: [[other article]]”), but they are not good for sorting the knowledge on a large scale.

An additional advantage of tags over backlinks (for classification vs. linking) is that most (all?) fields of research already have well-established classification hierarchies. When a biologist talks about cats, it is clear where such a page will sit in the biological taxonomy. It is also easy to develop limited taxonomies for the purpose of a project.
This makes sharing knowledge easy: any biologist will find it very natural to browse a graph based on the well-known taxonomy of animals. Also, a new student could, for example, import a scheme from someone else and use it to get started in their own research.

Taxonomies can be edited, exported, imported, shared without touching the individual pages, which addresses many of the issues in the quote above.
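A minimal sketch (Python, with hypothetical page and tag names) of such an externally stored polyhierarchy: the parent relationships live in one structure, so they can be edited, exported or shared without touching any page, and querying a broad tag still finds pages tagged with narrower ones:

```python
# Polyhierarchy: each tag may have several parents, stored
# separately from the pages that use the tags.
PARENTS = {
    "cats": ["mammals", "pets"],   # two parents: a polyhierarchy
    "mammals": ["animals"],
    "pets": [],
    "animals": [],
}

# Pages carry only their most specific tags.
PAGES = {
    "Unsinkable Sam": ["cats"],
    "Whale songs": ["mammals"],
}

def ancestors(tag):
    """All broader tags, transitively."""
    seen, stack = set(), list(PARENTS.get(tag, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, []))
    return seen

def pages_under(tag):
    """Pages tagged with `tag` or any narrower tag."""
    return {page for page, tags in PAGES.items()
            if any(tag == t or tag in ancestors(t) for t in tags)}
```

Renaming or re-parenting a tag is then a one-line edit to `PARENTS`, which addresses the re-tagging problem from the quote: no prior note ever needs to be revisited.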

A lot of effort has been put into existing ontologies, and articles and books have typically been classified, so a large fraction of Logseq pages could be automatically organized without touching any of the pages themselves.

Logseq could easily build a searchable classification just by pulling subject classifications from library catalogs, e.g. in the Unsinkable Sam case. As @GaiusScotius suggested, this does not need to be limited to simple taxonomies, but can be extended to general ontologies like WordNet.


I see. Wouldn’t that be much easier using a more traditional relational database rather than a graph?

I think that both approaches would work together very naturally. I really like Logseq’s concept of graphs built from links and backlinks for working with information at a block level. This is definitely the way to go to refine the information. It wouldn’t make any sense to use tags for this (I think that was one of the criticisms in the articles you linked). Once you get to the level of collating and synthesizing information, links are far superior to tags. One can also push this further in the link direction and augment individual links with information, as suggested by @menelic.

So I don’t want to get rid of the graph at all, but I want to augment the graph with some structured hierarchical way to get to the relevant nodes.

I have about 10k items in my literature database. I would love to import this into Logseq to be able to leverage Logseq’s graph, but importing the 10k items would lead to a graph that looks like the Milky Way and would be pretty much unusable. On the other hand, the items in Zotero are already heavily tagged and sorted into Zotero Collections, and as nearly all of them are library items, we can get a lot of data for free from existing databases. These tags and collections also come with a hierarchy, but currently Logseq can’t use this information.
Even without doing any tagging this could be used to build a very impressive browser to get to specific locations on the graph.


Would you really need to import the data into your Logseq graph or might it actually be sufficient to just reference the items in Zotero easily? Especially given that with all the available integrations, the interoperability, etc. you would probably still keep them in there. ⇒ Which is your single point of truth?

From what you are writing so far it seems to me that you might rather want to bridge Logseq and Zotero instead of duplicating your literature database to your graph (and, moreover, keeping it in sync – while trying to stay sane ;)).

What if you thought about it more from an outcome perspective rather than from a data/artefact perspective? While I start to get a rough picture I still do not fully understand how you are going to use the literature data in Logseq. What’s the purpose of e.g. “get(ting) to specific locations on the graph”? What’s the context in which that would be useful and how so? What happens then?

I feel that when you manage to clarify your intent and make it as concrete and tangible as possible, the solution should yield itself easily :slight_smile:

Great discussion, thanks for your comments!
This is the workflow I am targeting, more in-depth discussion is at Scientific Workflows with Zotero (currently this doesn’t work, due to relative paths not working):

  • I capture items into Zotero (mostly journal articles and books with attached documents).
  • Each item comes with lots of tags, both my own and automatic, but I’ve also heavily sorted items into hierarchical Zotero Collections (these are 1:n relationships, one item can be in multiple collections)
  • I annotate the item. Currently I do so in Zotero, but I would like to switch to Logseq to use a link graph, because Zotero’s annotation mechanism, which has only tags at the page level, seems to be exactly what you are advocating against.
  • Once there is a way to import notes from Zotero to Logseq, I’d be happy to stop using Zotero notes to stay sane, but of course it would be wonderful if we get to the point of a 2-way integration.
  • In Logseq, each Zotero item would have a single page associated with it; this page holds the annotations copied from the PDF, screenshots etc.
  • Not all data comes in from Zotero; I might also create pages for conferences, videos etc. directly.
  • Once I have extracted the important information (one page per article), I create other pages as needed and link heavily back to the original pages. These pages could be for a subject area, a topic, or an article that I am writing. They rely heavily on the graph for linking between different blocks.
  • All of the pages, both imported and manually created, very naturally sit in well-defined natural hierarchies that are meaningful to me as well as to other people in the same field.
  • I would like to browse the graph by these hierarchies. In the Knowledge Management for Tags proposal, I gave the following examples of searches that can be automatically generated from library records:
      • /Books/ByAuthor/Jameson/William
      • /Books/ByYear/2004
      • /Dewey/History and geography/History of Europe
      • /LCC/World History and …/History (General)/World War II … /Naval Operations/Anglo-German…
      • /animals/…/…/Mammalia/…/Felinae/…/F. catus
  • I would also like to use my existing tags and collections. Zotero collections are (poly)hierarchical tags, but Logseq doesn’t yet understand that one tag can be a generalization of another. For example, selecting “animals” in Zotero will show all sub-collections, e.g. mammals and cats, but Logseq can’t use this information.
  • Also, I would like to edit tags and their relationships independently of the locations where they are used. I have plenty of automatic tags in Zotero that I still need to classify, I would like to be able to place these tags onto my hierarchy so that I can easily reference the pages and blocks that are thus tagged, including future imports that use these tags.
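A rough sketch (Python; the record fields and facet names are hypothetical, modelled on the example paths above) of how such browse paths could be generated automatically from a bibliographic record:

```python
def browse_paths(item):
    """Generate hierarchical browse paths (facets) from a
    bibliographic record; all field names are hypothetical."""
    paths = []
    if item.get("type") == "book":
        last, first = item["author"].split(", ", 1)
        paths.append(f"/Books/ByAuthor/{last}/{first}")
        paths.append(f"/Books/ByYear/{item['year']}")
    # Classification trails (Dewey, LCC, ...) map directly to paths.
    for scheme, trail in item.get("classifications", {}).items():
        paths.append("/" + "/".join([scheme, *trail]))
    return paths

record = {
    "type": "book",
    "author": "Jameson, William",
    "year": 2004,
    "classifications": {
        "Dewey": ["History and geography", "History of Europe"],
    },
}
```

Since library catalogs already ship this data with each record, a plugin could build such a browser without the user tagging anything at all.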

Should I import all my existing information from Zotero or just reference it? That is a difficult question, but it is not material to my issue. I already have a huge amount of notes in Zotero, and it would be nice to eventually get this information into the Logseq graph, but even if I just add the items as I go through them one at a time, I would very quickly run into the same problem. Currently @Aryan’s Zotero plugin is set up to create a (Logseq) page when a Zotero item is cited, so referencing already is importing. I think this is a reasonable solution.


This topic appears to be somewhat related to the concept of Hierarchical Navigable Small World graphs (HNSW) used for approximate nearest-neighbor search in vector databases: given an item, how do you quickly traverse a graph hierarchy to find the most similar items? Both the graph and the hierarchy are required.
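For illustration, a greatly simplified toy version of the idea (not the real HNSW algorithm): items live in a dense bottom layer, a sparse upper layer provides coarse long-range links, and search descends greedily, moving to whichever neighbour is closer to the query:

```python
# Items are 1-D points for simplicity; real HNSW works in high dimensions.
POINTS = {i: float(i) for i in range(10)}
LAYERS = [
    {0: [5], 5: [0, 9], 9: [5]},                    # sparse upper layer
    {i: [j for j in (i - 1, i + 1) if j in POINTS]  # dense bottom layer
     for i in POINTS},
]

def greedy_search(query, entry=0):
    """Descend the layers, greedily hopping to any neighbour that is
    closer to the query, until no neighbour improves on the current node."""
    current = entry
    for layer in LAYERS:                       # top layer first
        improved = True
        while improved:
            improved = False
            for neighbour in layer.get(current, []):
                if abs(POINTS[neighbour] - query) < abs(POINTS[current] - query):
                    current, improved = neighbour, True
    return current
```

Real HNSW uses many probabilistically sampled layers and a beam search rather than a single greedy walk, but the layered greedy descent is the core navigational trick.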


I believe that a rich commitment to hierarchies and classification can actually enhance the Logseq culture. Also I like that users can adapt Logseq to their own preferences and needs.


@mnp456 Thank you for the article! I understand that HNSW is first of all an algorithm for efficient calculation of approximate k-nearest neighbors.

Do the individual layers have any meaning to the user as well, e.g. would it be useful to manually navigate this graph?


You should give RemNote a try. I was a long-time Logseq user for about 2 years, but I struggled daily with the lack of hierarchy and of a visual representation of the structure of my knowledge from a high level (e.g. breaking down highly complex systems into their simpler concepts).

The biggest difference is that RemNote treats everything as a rem (equivalent to a block in Logseq). Rems can be turned into pages, and if a page contains pages inside it, it becomes a folder. The contents of your pages and folders are displayed inline, so there’s no need to open a page to see its contents. Plus you get the same bidirectional-link and tag functionality you get in Logseq, except RemNote separates the two in the reference panel, so you see things that are “tagged” vs. things that are “bidirectional links”.

Each rem is also unique in its position in the outline, so you can have two rems with the same name but completely different meanings based on where they lie in the structure:

So the end result is that you can build some fairly large structures that give you a bird’s-eye view of the entirety of your knowledge graph, or narrow it down by clicking the bullets to focus on specific branches.

Plus, RemNote supports offline graphs, so your data is on your system. It is stored in a DB file, so you will have to export it to Markdown or some other format if you plan to use it elsewhere, but overall I have been very happy with it.


@gax @boisjere

Check this new plugin:

This might be one of the best Logseq plugins ever: it organizes favourites hierarchically using the tags:: property for pages.

The hierarchy can be mixed with the built-in one based on namespaces.

Basically it’s poly-hierarchies that expand the namespace mono-hierarchy.


There is also Generate explicit hierarchy out of properties by @mentaloid.
