Would a rich commitment to hierarchies and classification be an anathema to Logseq culture?

I think we’ve been talking past each other. I agree with @alex0’s suggestion for adding tree and graph searches and for storing the information.
I did not think about designing hierarchies efficiently, which @boisjere’s program can do.
For now, I’d be happy to even design the hierarchy by hand in Markdown.

I was solely talking about efficiently classifying existing nodes. While theoretically this can be done on each individual page, it is too cumbersome to classify hundreds or many thousands of nodes this way.

What I would like to see is this:

The process should be designed to be as efficient as possible, such that it is realistic to sort a few thousand yet unclassified tags into the hierarchy.

3 Likes

If I understand it correctly, you want to be able to select multiple pages (from the graph or from a list) and assign/remove the same properties at the same time. This could be useful in general and could be its own feature request.

Also, the advantage of using text files to store data is that you can write scripts or entire programs to generate or manipulate them programmatically, so you could manage properties/tags with another tool.
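For example, a small external script could bulk-edit page properties. Here is a minimal, hypothetical Python sketch; the pages/ folder layout and the tags:: property name are assumptions about a typical Logseq graph, not a guaranteed format:

```python
# Hypothetical sketch: bulk-add a value to the tags:: property of matching pages.
from pathlib import Path

GRAPH = Path("~/logseq-graph/pages").expanduser()  # assumed graph location

def add_tag(page: Path, tag: str):
    lines = page.read_text(encoding="utf-8").splitlines()
    for i, line in enumerate(lines):
        if line.startswith("tags::"):
            if tag not in line:
                lines[i] = f"{line}, {tag}"   # append to an existing property
            break
    else:
        lines.insert(0, f"tags:: {tag}")      # or create it at the top of the page
    page.write_text("\n".join(lines) + "\n", encoding="utf-8")

# Example: tag every page whose name starts with "Felis" with "animals".
for page in GRAPH.glob("Felis*.md"):
    add_tag(page, "animals")
```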

1 Like

Exactly. I don’t worry about the taxonomy generation: there are a couple of tools available, and in the worst case it can be done in a text editor. Compared to the number of tags, this is a small number of items.

What I would like to see is:

  • a way to efficiently add properties. I think it makes sense to have this in the Logseq UI to leverage the existing search functionality on the graph. It should have a (poly)hierarchical representation of the nodes on the left, so that we don’t just move items between random tag clouds. It would be easy to parse the properties from the pages, but at the moment I can’t think of a more efficient way than something similar to the UI I suggested. I am open to suggestions.
  • tree search, as suggested by you. This also needs to be a core part of Logseq.
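To make the tree-search part concrete, here is a minimal Python sketch of what such a search could do: a hand-written (poly)hierarchy of tags is expanded to all of its descendants, so that a search for “animals” also matches pages tagged only with “cats”. All names are hypothetical; this is not how Logseq stores tags today.

```python
# Hypothetical sketch: expand a tag to its descendants in a (poly)hierarchy,
# then use the expanded set to filter pages.
from collections import defaultdict

children = defaultdict(set, {
    "animals": {"mammals", "birds"},
    "mammals": {"cats", "dogs"},
})

def descendants(tag):
    """All tags at or below `tag` in the hierarchy."""
    result, stack = set(), [tag]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children[current])
    return result

pages = {"Unsinkable Sam": {"cats"}, "Penguins": {"birds"}, "Contract law": {"law"}}

query = descendants("animals")
print([name for name, tags in pages.items() if tags & query])
# -> ['Unsinkable Sam', 'Penguins']
```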
1 Like

I’m getting indications that it is a bit contrary to the culture. The coolest feature of Roam and Logseq is bidirectionally linked, block-based graph traversal. Early evangelists are super-excited about this, and kind of dump on earlier (millennia-old) ways of organizing knowledge.

I get it as a way of highlighting what’s new, but I think the baby’s going out with the bathwater.

The main evangelist who is trying to balance both the old and the new ways of managing information is Nick Milo. He loves minimal foldering, Maps of Content, and Datascopes, as ways of maintaining enough high-level knowledge about your content to enter the graph with some positional awareness.

For those of us who need that initial moment of orientation, doing pure block-based graph traversal sucks, quite frankly.

Early evangelists are full of excitement about the moment they “got it”, and they’re trying to help everyone else “get it”, but aren’t widening the funnel much for those of us who prefer to do things a bit differently.

Historically, this has been very associated with the culture around Roam, but it happens a bit around Logseq too.

I’m the kind of person who is very mentally clear at the 50,000 ft overview level of my information, and I need to orient myself to my knowledge landscape before I decide where to land and hunt. But early evangelists of block-based PKM tools are sometimes most excited about ruling that out.

I want the new ways of exploring information, but I want to choose my starting place, somewhere besides the Daily Journal, or an un-indented list of Favourites.

So my question remains genuine. I’m not sure if a rich commitment to hierarchies and classification is kosher for Logseq’s developers.

  • Is it an exciting way for Logseq to differentiate itself from Roam and support a wider market of users?
  • Or is it an irritating distraction, and people like me should go and struggle with Obsidian instead (not outline-based, but with a culture supporting soft hierarchies of knowledge)?

I decided to stop looking for tools, and chose Logseq, because of that “Hierarchies” section on a page. It felt like I was “coming home” to a tool that combined the ancient magic of classification with the new magic of graph traversal.

But I fear that the “Hierarchies” feature will be considered to be an aberration that they will leave to wither and then phase out - ironically because people may use it too much, the wrong way.

Not everyone will have that “conversion experience” to pure block-based graph traversal, and they’ll choose to focus on the true believers, and not the larger masses.

I anticipated this kind of issue. I wanted to know I could discuss hierarchies and classification with some clarity and energy. I didn’t want to disrupt this community, if in fact I was wrong about what Logseq plans to become.

So I’m still unsure if this is the place for me. It’s a bit sad, because I don’t want to be in “tool-choosing hell” again… but I don’t feel good when my needs are delegitimized (Different ways to structure data - #27 by boisjere) - especially because I love this tool, and I think my needs reflect those of future users, farther along the technology adoption curve, “across the chasm”.

3 Likes

If I had to design this from scratch:

  • every folder in our file system would potentially be a “graph”, even nested inside each other
  • the info about a certain graph would be saved in a hidden .logseq folder in every folder that is a graph
  • it would be possible to open any Markdown file with Logseq; when doing so, it creates a graph starting from that folder (creating its .logseq folder)
  • mentioning pages between different graphs could be possible using relative links (../ syntax to go up)
  • the journal would be a plugin to manage one or more specified folders/graphs (even hidden ones inside the ones above)
  • Logseq wouldn’t be set to “one graph” per instance; instead it would manage graphs like file managers do with folders

For example:

📂 All Encompassing Graph
  📂 Subgraph 1
    📂 .logseq
    📂 Journal
    📄 Page.md
    ...
  📂 Subgraph 2
    📂 .logseq
    📂 .journal
    📄 Page.md
    ...

and hide relative paths in non-editing mode in Logseq, i.e.

[[./journal/2022-07-22.md]]

is displayed as

22 July 2022
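As an illustration of that display rule, a minimal Python sketch (the relative-link syntax is the one proposed in this post, not something Logseq supports today):

```python
# Hypothetical sketch: render relative journal links as human-readable dates.
import re
from datetime import datetime

LINK = re.compile(r"\[\[\./journal/(\d{4}-\d{2}-\d{2})\.md\]\]")

def render(text):
    def pretty(match):
        d = datetime.strptime(match.group(1), "%Y-%m-%d")
        return f"{d.day} {d.strftime('%B %Y')}"
    return LINK.sub(pretty, text)

print(render("See [[./journal/2022-07-22.md]]"))  # -> See 22 July 2022
```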
3 Likes

I’ve not long come across Logseq and am interested in it for both knowledge and project management in the professional services (in my case, legal) arena. I posted some thoughts here on structuring information/ notes for easier (re)discovery using concept tagging, but think that perhaps this thread is now more appropriate. Comments welcome!

1 Like

See my reply in the related thread: Knowledge Management for Tags / Tag Hierarchies - #19 by brsma

I still have trouble understanding what your concrete use case for such an extensive hierarchical taxonomy is. How do you intend to make all these tags actually productive? (see also The Collector’s Fallacy • Zettelkasten Method – I had a sobering ‘ouch!’ moment reading that… :see_no_evil: )

Your comments are well taken, but I would differentiate between being able to build a rich classification scheme and actually deploying one at a scale larger than needed. I like the generality of the WordNet metamodel, but that doesn’t mean I want or need to load all 170,000+ entries into my graphs.

As far as a use case is concerned, I’m thinking of knowledge management in the typical SME professional practice, particularly law. There is, I think, a bright future in this area for tools like Logseq by virtue of the fact that they are text based and can readily be distributed using Git or the like. For an industry that purportedly relies on knowledge, “knowledge management” remains pretty problematic. I have experienced enough failures to appreciate that the biggest problem with current systems is that of “not knowing what you know”, a problem that grows with time and scope. Put another way, discovery is a key issue.

Systems like SharePoint and its ilk allow one to save documents in folder hierarchies and to build controlled and uncontrolled vocabularies with which to classify (i.e. to tag) them, but, at least in my opinion, they fail in two ways. First, although not directly related to this discussion, they don’t support document linking to the degree that Logseq (or, on macOS, Hook) does. Second, which is relevant, educating users into how taxonomies are structured and how they should be used is a real difficulty and making them do it consistently even more so. What makes it so difficult is the “discount rate” we apply to our time: how much effort am I prepared to invest now so that you can find something later, perhaps years later? For most people, indexing as you enter is an adjunct to their role; it’s just not worth a significant investment of their time, especially when you’re being evaluated on the basis of billable hours. For academics and researchers that’s clearly not the case, since their role is to link, classify, assimilate and derive, but I would suggest that for the majority of knowledge workers the value to them of future discovery by others is rather small.

To my mind, the trick to knowledge discovery is to make life as easy as possible for those entering information. Don’t try to force users into sticking to a rigidly defined vocabulary; don’t get in their way; by all means guide them (say with properties), but accept anything that they think appropriate.

Generally the difficult question is something along the lines of “what do we know about X?”. Here X is typically a concept, not a specific thing like the contract we wrote for some client. It’s the sort of question that library indices are designed to facilitate, albeit for physical information that is not directly searchable…

Digital information has the advantage that it can be directly searched using full-text indexing. Full-text searches fail, however, if the word(s) I use to envisage X differ from those actually in the document. The same applies to tagging. The word or brief phrase that comes to my mind when I think of a concept is not necessarily the one you used when you tagged the document years ago. It is highly likely, however, that whatever word(s) I think of will be a synonym of the one you actually used.

That, however, requires a means of equating terms/phrases to concepts, which is where WordNet synsets come in. As far as hyper-/hyponyms are concerned, I view them as enabling a sense of scale, of zooming in or out. I can start with a high-level concept and quickly refine a search by looking at what hyponyms have actually been used, or, if I happen to start with a hyponym that hasn’t been used, pull back to a hypernym and see which of its hyponyms have been used. Other linguistic relationships provide different ways of navigating the search space.
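To make that concrete, here is a minimal Python sketch using NLTK’s WordNet interface (it assumes the WordNet corpus has been downloaded via nltk.download("wordnet")). It only illustrates the synonym lookup and the hypernym/hyponym “zooming” described above; it is not a Logseq feature.

```python
# Sketch: map a term to WordNet synsets, then zoom out (hypernyms) or in (hyponyms).
from nltk.corpus import wordnet as wn

# Candidate concepts for the user's search term.
for synset in wn.synsets("cat", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())

cat = wn.synset("cat.n.01")  # the "feline mammal" sense

print([s.name() for s in cat.hypernyms()])    # zoom out: more general concepts
print([s.name() for s in cat.hyponyms()])     # zoom in: more specific concepts
print(cat.lemma_names())                      # synonyms that may match other people's tags
```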

Hope this explains better!

1 Like

Thanks a lot! Actually I was asking @gax for clarification :wink: (their use case seems to lie more on the academic side of knowledge work), but I find yours very interesting as well, since I am on the business side of (P)KM and face at least partly similar challenges (one of my major professional concerns, besides organising my own work as a manager, is enabling cross-functional business intelligence across my teams in order to help everyone make the best decisions). Yet law, with its long cultural history of elaborate reference systems, is still a different kind of beast compared to (digital) product work.

educating users into how taxonomies are structured and how they should be used is a real difficulty and making them do it consistently even more so.

the trick to knowledge discovery is to make life as easy as possible for those entering information. Don’t try to force users into sticking to a rigidly defined vocabulary; don’t get in their way; by all means guide them (say with properties), but accept anything that they think appropriate

+1 to both of those. (And to your observation re: the puzzling lack of digitisation in knowledge-based industries.)

At the same time I would like to propose that fuzzy information retrieval is more of a software problem than something to implement manually in your graph. DEVONthink (my universal vault for all kinds of documents and reference materials) is quite good at surfacing related documents, for example. Besides, your last paragraph reads like the informal description of an algorithm that shouldn’t be this hard to implement using WordNet’s API or a similar service in other languages. Add a smart user interface and you have a nice product for legal services that should even be commercially quite viable (assuming this is a common pain point). :slight_smile: As for Logseq, this could actually be a nice search plug-in.

1 Like

Thanks @brsma for your comments! The quote about the downsides of tagging is worth reading:

Extensive content-based tagging is a known anti-pattern because tags create a weak association at best between notes.

By using content-based tags you are making yourself feel that you are creating associations but you are still really shifting the burden to your future self to figure out why the notes are associated.

This becomes a significant problem when you have a larger corpus and your tag ontology begins growing beyond control.

Once you decide to tag based on subject you have to keep expanding the subjects you tag.

Then every time you add a tag later you have to decide if you will go back and re-tag all applicable prior notes, which quickly becomes untenable.

But if you don’t do that then your tagging system becomes untrustworthy, because it is not returning all notes that it should, so you start developing workarounds to compensate for the faulty tagging system, which increases the friction of using the system.

To overcome these limitations, we need (poly)hierarchical tags that can be edited independently of individual notes.

My use case for tagging is classification of knowledge (instead of building relationships between the content at a block level, which comes at a later stage).
I have many thousands of items (e.g. articles, books etc.) that need to be classified by subject areas. Luckily in the case of articles, they fit nicely into a small group of polyhierarchies.

Backlinks are great for building relationships between individual blocks of these pages (e.g. a fine-grained link between items, such as “cites: [[other article]]”, “contradicts: [[other article]]”), but they are not good for sorting the knowledge on a large scale.

An additional advantage of tags over backlinks (for classification vs. linking) is that most (all?) fields of research already have well-established classification hierarchies. When a biologist talks about cats, it is clear where such a page will sit in the biological taxonomy. It is also easy to develop limited taxonomies for the purpose of a project.
This makes sharing of knowledge easy: any biologist will find it very natural to browse a graph based on the well-known taxonomy of animals. Also, a new student could e.g. import a scheme from someone else and use it to get started in their own research.

Taxonomies can be edited, exported, imported, shared without touching the individual pages, which addresses many of the issues in the quote above.

A lot of effort has been put into existing ontologies, and articles and books have typically been classified, so a large fraction of Logseq pages could be automatically organized without touching any of the pages themselves.

Logseq could easily build a searchable classification just by pulling subject classifications from library catalogs, e.g. in the Unsinkable Sam case. As @GaiusScotius suggested, this does not need to be limited to simple taxonomies, but can be extended to general ontologies like WordNet.

3 Likes

I see. Wouldn’t that be much easier using a more traditional relational database rather than a graph?


I think that both approaches would work together very naturally. I really like Logseq’s concept of graphs built by links and backlinks for working with information on a block level. This is definitely the way to go to refine the information. It wouldn’t make any sense to use tags for this (I think that was one of the criticisms in the articles you linked). Once you get to the level of collating and synthesizing information, links are far superior to tags. One can also push this further in the link direction and augment individual links with information, as suggested by @menelic.

So I don’t want to get rid of the graph at all, but I want to augment the graph with some structured hierarchical way to get to the relevant nodes.

I have about 10k items in my literature database. I would love to import this into Logseq to be able to leverage Logseq’s graph, but importing the 10k items would lead to a graph that looks like the Milky Way and would be pretty much unusable. On the other hand, the items in Zotero are already heavily tagged and sorted into Zotero Collections, and as nearly all of them are library items we can get a lot of data for free from existing databases. These tags and collections also come with a hierarchy, but currently Logseq can’t use this information.
Even without doing any tagging this could be used to build a very impressive browser to get to specific locations on the graph.

1 Like

Would you really need to import the data into your Logseq graph or might it actually be sufficient to just reference the items in Zotero easily? Especially given that with all the available integrations, the interoperability, etc. you would probably still keep them in there. ⇒ Which is your single point of truth?

From what you are writing so far it seems to me that you might rather want to bridge Logseq and Zotero instead of duplicating your literature database to your graph (and, moreover, keeping it in sync – while trying to stay sane ;)).

What if you thought about it more from an outcome perspective rather than from a data/artefact perspective? While I start to get a rough picture I still do not fully understand how you are going to use the literature data in Logseq. What’s the purpose of e.g. “get(ting) to specific locations on the graph”? What’s the context in which that would be useful and how so? What happens then?

I feel that when you manage to clarify your intent and make it as concrete and tangible as possible, the solution should yield itself easily :slight_smile:

Great discussion, thanks for your comments!
This is the workflow I am targeting; a more in-depth discussion is at Scientific Workflows with Zotero (currently this doesn’t work, due to relative paths not working):

  • I capture items into Zotero (mostly journal articles and books with attached documents).
  • Each item comes with lots of tags, both my own and automatic, but I’ve also heavily sorted items into hierarchical Zotero Collections (one item can be in multiple collections).
  • I annotate the item. Currently I do so in Zotero, but I would like to switch to Logseq to use a link graph, because Zotero’s annotation mechanism, which has only tags at the page level, seems to be exactly what you are advocating against.
  • Once there is a way to import notes from Zotero into Logseq, I’d be happy to stop using Zotero notes to stay sane, but of course it would be wonderful if we get to the point of a two-way integration.
  • In Logseq, each Zotero item would have a single page associated with it; this page holds the annotations copied from the PDF, screenshots etc.
  • Not all data comes in from Zotero; I might also create pages for conferences, videos etc. directly.
  • Once I have extracted the important information (one page per article), I create other pages as needed and heavily link back to the original pages. These pages could be for a subject area, a topic, or an article that I am writing. They rely heavily on the graph for linking between different blocks.
  • All of the pages, both imported and manually created, very naturally sit in well-defined natural hierarchies that are meaningful to me as well as to other people in the same field.
  • I would like to browse the graph by these hierarchies. In the Knowledge Management for Tags proposal, I gave the following example for searches that can be automatically generated from library records:
      • /Books/ByAuthor/Jameson/William
      • /Books/ByYear/2004
      • /Dewey/History and geography/History of Europe
      • /LCC/World History and …/History (General)/World War II … /Naval Operations/Anglo-German…
      • /animals/…/…/Mammalia/…/Felinae/…/F. catus
  • I would also like to use my existing tags and collections. Zotero collections are (poly)hierarchical tags, but Logseq doesn’t yet understand that one tag can be a generalization of another. For example, selecting “animals” in Zotero will show all sub-collections, e.g. mammals and cats, but Logseq can’t use this information (see the sketch after this list for how the collection tree can be read out of Zotero).
  • Also, I would like to edit tags and their relationships independently of the locations where they are used. I have plenty of automatic tags in Zotero that I still need to classify, I would like to be able to place these tags onto my hierarchy so that I can easily reference the pages and blocks that are thus tagged, including future imports that use these tags.
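As a concrete note on the Zotero side (referenced in the list above), the collection tree with its parent/child links is readily available from the Zotero API. Here is a minimal sketch using the pyzotero library, with placeholder credentials; how to map the result onto Logseq pages or tags:: properties is left open.

```python
# Sketch: read the (poly)hierarchical Zotero collection tree via pyzotero.
from pyzotero import zotero

LIBRARY_ID = "1234567"   # hypothetical
API_KEY = "..."          # hypothetical

zot = zotero.Zotero(LIBRARY_ID, "user", API_KEY)

# Every collection record carries a parentCollection key (False for top-level ones).
collections = {c["key"]: c["data"] for c in zot.everything(zot.collections())}

def path(key):
    """Return the /parent/child path of a collection, root first."""
    data = collections[key]
    parent = data.get("parentCollection")
    return (path(parent) if parent else "") + "/" + data["name"]

for key in collections:
    print(path(key))   # e.g. /animals/mammals/cats
```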

Should I import all my existing information from Zotero or just reference it? That is a difficult question, but it is not material to my issue. I already have a huge amount of notes in Zotero, and it would be nice to eventually get this information into the Logseq graph, but even if I just add the items as I go through them one at a time, I would very quickly run into the same problem. Currently @Aryan’s Zotero plugin is set up to create a (Logseq) page when a Zotero item is cited, so referencing already is importing. I think this is a reasonable solution.

2 Likes

This topic appears to be somewhat related to the concept of Hierarchical Navigable Small World graphs (HNSW) used for approximate nearest neighbor search in vector databases: given an item, how to quickly traverse a graph hierarchy to find the most similar items. Both the graph and the hierarchy are required.
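For the curious, a minimal sketch of HNSW-based approximate nearest-neighbour search using the hnswlib library, with random vectors standing in for note embeddings. It only illustrates the “traverse a layered graph to find similar items” idea, not any Logseq integration.

```python
# Sketch: build an HNSW index over random vectors and query for similar items.
import numpy as np
import hnswlib

dim, num_items = 128, 10_000
data = np.random.rand(num_items, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_items, ef_construction=200, M=16)
index.add_items(data, np.arange(num_items))
index.set_ef(50)  # query-time trade-off between recall and speed

# Given one item, find its 5 most similar neighbours (the item itself comes first).
labels, distances = index.knn_query(data[0], k=5)
print(labels, distances)
```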

1 Like

I believe that a rich commitment to hierarchies and classification can actually enhance the Logseq culture. Also I like that users can adapt Logseq to their own preferences and needs.

1 Like

@mnp456 Thank you for the article! I understand that HNSW is primarily an algorithm for efficient calculation of approximate k-nearest neighbors.

Do the individual layers have any meaning to the user as well, e.g. would it be useful to manually navigate this graph?


You should give RemNote a try. I was a long-time Logseq user for about 2 years, but I struggled daily with the lack of hierarchy and the lack of a visual representation of the structure of my knowledge from a high level (e.g. breaking down highly complex systems into their simpler concepts).

The biggest difference is that RemNote treats everything as a rem (equivalent to a block in Logseq). A rem can be turned into a page, and if a page contains pages inside of it, it becomes a folder. The contents of your pages and folders are displayed inline, so there’s no having to open a page to see the contents. Plus you get the same bidirectional-link and tag functionality you get in Logseq, except it separates the two in the reference panel so you see things that are “tagged” vs. things that are “bidirectional links”.

Each rem is also unique in its position in the outline, so you can have two rems with the same name but completely different meanings based upon where they lie in the structure.

So the end result is you can come up with some fairly large structures that give you a bird’s-eye view of the entirety of your knowledge graph, or narrow it down by clicking the bullets to focus on specific branches.

Plus RemNote supports offline graphs, so your data is on your system. It is stored in a DB file, so you will have to export it to Markdown or some other format if you plan to use it for other things, but overall I have been very happy with it.

1 Like

@gax @boisjere

Check this new plugin:

This might be one of the best Logseq plugins ever: it organizes favourites hierarchically using the tags:: property for pages.

The hierarchy can be mixed with the built-in one based on namespaces.

Basically it’s poly-hierarchies that expand the namespace mono-hierarchy.

4 Likes

There is also Generate explicit hierarchy out of properties by @mentaloid.

1 Like