Zotero referencing is ballooning up my graph

Hi all,

I have really been enjoying the ease of the new Zotero functionality for academic work when taking notes. However, the nature of the tagging is now blowing up my graph:

Whilst this looks all well and good from a bird’s-eye view, zooming in reveals an explosion of what I think are irrelevant nodes, added automatically during referencing:

Clearly this is due to the number of active tags that are created when referencing an item from Zotero:

Does anyone have any suggestions on how to tame this so that I get only (or mainly) the tags I created myself?

Thanks in advance!

5 Likes

Did you ever figure out a fix? I’m having the same problem with irrelevant auto-generated pages, including pages created for the “rights” field.

2 Likes

Hey, no, I didn’t have a chance to play around. In the end I decided to add things manually rather than using the Zotero import. Perhaps in the future the devs could expand the Zotero import settings to let us choose which fields to import, seeing as there are already options for attachments etc.

Edit: I made a thread in the feature request forum: Expanded Zotero import selection

2 Likes

I think that by default Zotero automatically adds tags to your articles (the items under tags:: in your example). To me this is pretty useless, and I prefer to add my own tags, so I turned this feature off. This should remove quite a lot of tags when you add an article to Logseq.
There is still a lot of other extra information, such as long lists of authors, and that needs an official fix from the Logseq team. This has been requested before.

3 Likes

I can confirm that I encounter (and suffer from) the same problem. Some Zotero papers/books contain tag entries. All of these introduced new pages in Logseq. Visualizations of my graph are now essentially useless.

But it is not only the explicit tags. The heaviest nodes in my graph are now Abstract, Attachments and the like.

This issue really bothers me, which is why I am trying to revive this thread.

Automatically imported (and mostly useless) tags within Zotero are the culprit

Obviously one of the culprits is the tags within the Zotero record for the given publication. Upon referencing this publication in Logseq through /zotero, its tags end up in the Logseq workspace and clutter the graph heavily.


An easy fix for using Logseq from now on is to delete the imported tags from the publication records while they are still in Zotero (before importing them into Logseq). Mostly they are useless because they are so trivial. Even if somebody wants to use tags in Zotero, it is more useful to come up with one’s own tags.

But not only them

But this will not solve all the problems, because the heaviest nodes in my Logseq graph are now Abstract, Annotation and the like. Authors’ names also frequently show up as (undesirable) nodes. I cannot think of a way to prevent importing these as Logseq pages/nodes.

How to remove the mess (the automatically created but unwanted pages) safely?

Anyway, whether imported into Logseq as Zotero’s tags or something else, is there a systematic (even if manual) way to get rid of these from my Logseq graph? With a few hundred nodes, this is still doable. Shall I just manually delete the corresponding pages from the pages listing in Logseq?

1 Like

It’s a well-known and heavily discussed issue, but as far as I know (which is very little, I’m just an active user) there will be no update of the Zotero integration for the moment. I think they will improve it at some point.

I tried to use it as well. I started by manually deleting all the entries that I did not want, but in the end I found it useless. I use Zotero only to manage citations and bibliography. So when I publish something, I collect all the DOIs (or whatever) at the end and then add them to my long-form text (using Overleaf or Zettlr). So as not to lose a footnote along the way by failing to write it down the moment I think of it, I write it directly in Logseq, using Markdown footnotes and a BibTeX-key citation.

Example:

Something is about [^1]

[^1]: about is a work [AuthorDate, chapter, page]

That way, when I move everything to Zettlr, it can automatically link those BibTeX entries to my Zotero database and create my bibliography. Then, if needed, I export to LaTeX and finish everything in Overleaf.

To summarize, I start my research by adding to Zotero everything that could be useful.
Then in Logseq I have created my own template and a database of entries that gathers everything I have actually read and annotated.
When I write, I use BibTeX keys and footnotes in Logseq.
At the end, I select in Zotero what I have actually used and export it all to Zettlr to do the layout.

1 Like

You have certainly made me rethink what I expect from the interconnection between Logseq and Zotero. Perhaps not that much. I certainly do not plan to use Logseq for writing a paper, not even a draft of a paper (for that I will resort to a favorite LaTeX editor with all its bells and whistles). Instead, I (currently) use Logseq just to make notes while learning new stuff – a personal knowledge management system indeed.

And while writing manually something like (Luenberger, 1969) will mostly work fine for me for this purpose (well, sort of), the possibility of typing /Zotero followed by Luen (and relying on Zotero’s search within my collection) is quite convenient. And the results are consistent and full details of the publication (for the purpose of unique identification) are available, just in case the short citation does not ring a bell in a year or two.

Maybe just using the citation keys produced by Better BibTeX to link each citation uniquely to its record in the Zotero collection (the path provided by Zettlr) would be perfectly sufficient for my purpose. I do not actually need to import all those details of the publication into the Logseq workspace and create pages corresponding to the authors, keywords and whatnot. I just want to identify the referenced work uniquely (and conveniently).

After investigating this a bit, I have found solutions to some problems at least. Not all.

First, Zotero can be configured not to add tags automatically. This option can be set in the General settings tab. By default it is on (unless I turned it on myself and have already forgotten).

Second, if some publications in an existing Zotero collection already contain tags, and if these were created automatically and you are not happy with them (having used Zotero daily for many years, I have never found them useful, but they did not bother me too much either), you can delete them; see the paragraph on Automatic Tags at the bottom of the page on Tags and Collections.

Third, surprisingly, the previous procedure aimed at deleting the tags will (probably) leave some tags in the collection. Obviously not all tags qualify as automatic, and the procedure only deletes those tags that were created automatically. Honestly, I do not understand this at all, because I have never created a single tag in Zotero manually. But obviously Zotero makes some further distinction among the tags (I do not know, maybe between tags extracted from the PDF and tags obtained from the web page). Anyway, those remaining tags can be deleted manually. This procedure is also described on the page linked above, but it is as simple as right-clicking on the tag and choosing Delete Tag from the context menu.

Fourth, the annoying limitation of the above procedure is that it only works on a single tag. With a few hundred tags in the whole collection this would be a nightmare. And no, selecting all tags is not currently possible in pure Zotero. There is, however, an extension to Zotero called Zutilo, which enables choosing multiple tags. It is just that the corresponding functionality must be explicitly enabled (by default most functionalities are disabled or actually hidden). The functionality is called Remove Tags. Select all papers in the Zotero collection, right-click and choose Remove all tags… Done.

So, all these are steps on the Zotero side, before you actually import a reference using the /Zotero command. But what if you have already imported quite a few Zotero references into your graph (as I did), which cluttered the space with all those pages such as Attachments, Abstract, paper and the like? Here, on the contrary, I only have questions.

Will it suffice to remove the double square brackets from the pages created for the imported publications? For example, instead of [[Abstract]] leave just Abstract? Is this all that needs to be done?

I am tempted to automate the process of stripping the left and right double brackets from all those files whose names start with @ (the pages created for the imported references) using sed. A single line of code and I am done. But could I damage my Logseq graph/database by editing these files directly with an external tool? In particular, I do not quite understand what the role of the pages-metadata.edn file is. But with a few dozen imported references, I would certainly prefer to avoid editing all those pages manually in Logseq.
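
For anyone with the same worry, a dry run is one way to see what such a one-liner would do before any file is touched. This is only a sketch: preview_strip is a made-up helper name, and it assumes the imported pages are Markdown files whose names start with @, as described above. It prints a diff and leaves the files alone.

```shell
#!/bin/sh
# preview_strip DIR (hypothetical helper, not part of Logseq or Zotero)
# Shows what removing [[ and ]] would change in every @*.md file under DIR.
# Nothing is written: sed runs without -i, so its result only goes to diff.
preview_strip() {
    find "$1" -type f -name '@*.md' | while IFS= read -r f; do
        # diff exits with 1 when the texts differ; "|| true" keeps that
        # from being reported as a failure of the helper itself.
        sed -e 's/\[\[//g' -e 's/\]\]//g' "$f" | diff -u "$f" - || true
    done
}
```

Called as preview_strip path/to/pages, it lists each line that would lose its brackets, so surprises show up before the graph is modified. File names containing newlines are not handled.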

1 Like

To me the solution, on paper, is simple. We need to be able to create our own template with precisely what we want to import, and only that.

The main problem for me with the Zotero import is that it messes up my properties and leads to some ugly query tables.

That’s why I finally preferred not to use it, and to have only a template for literature notes that I quickly fill in by hand when I add a reference to my Logseq.

For each reference I always add a literature note, even if I have no time to write down my own thoughts in detail and it is just there as a reference. I have a status property where I record whether I need to dig into it further.

The pages-metadata file is very important. It contains the id of each block you created and its logbook. The ability to change a page title without having to update each link by hand is partly thanks to it.

But in the worst-case scenario, if you lose it, Logseq is able to create a new one just by analyzing your graph. So messing with it is not a very big deal, as long as you are not editing your notes at the same time.

What you can do is simply extract the pages you want to batch-process to another folder. If you remove the brackets, the pages will be unlinked and the connections will disappear from the graph view, but the pages that were created earlier will not; you will need to delete those too. Moreover, if after “abstract” (for example) you have two dates separated by a comma, Logseq will treat them as new pages and link your note to them.
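
One way to read the “extract to another folder” advice is to copy the imported pages aside before any batch edit, so the originals can be restored. A minimal sketch, assuming the imported pages are the @*.md files and that the backup folder lies outside the pages directory; backup_zotero_pages is a hypothetical helper name.

```shell
#!/bin/sh
# backup_zotero_pages PAGES_DIR BACKUP_DIR (hypothetical helper)
# Copies every @*.md file found under PAGES_DIR into BACKUP_DIR.
# Assumption: BACKUP_DIR is outside PAGES_DIR, so find never sees the copies.
# Files from subdirectories land flat in BACKUP_DIR.
backup_zotero_pages() {
    mkdir -p "$2" || return 1
    # cp -p keeps modification times, which helps when comparing later.
    find "$1" -type f -name '@*.md' -exec cp -p {} "$2"/ \;
}
```

After running something like backup_zotero_pages path/to/pages path/to/backup, the batch operation can be tried on the originals and undone by copying the files back.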

2 Likes

Should anyone find themselves in a similar position to mine, that is, wanting to remove the tags from the Logseq pages created by importing references from Zotero and to replace all page links with plain text (for example replacing [[Abstract]] with Abstract), here is how I used sed (run from within the corresponding pages (sub)directory):

# Back up the graph first: sed -i edits the files in place.
# GNU sed syntax; with macOS/BSD sed write -i '' instead of -i.
# Strips [[ and ]] and deletes tag::/tags:: property lines in the imported @*.md pages.
find . -type f -name "@*.md" -exec sed -i -e 's/\[\[//g' -e 's/\]\]//g' -e '/tags\{0,1\}::/d' {} +

The graph now looks much more reasonable. Of course, I still have to delete the undesired pages like Abstract that have already been created, but I think there is no way to automate it, and I will just go through my graph and delete these manually.

3 Likes

Thanks all for digging into this and for your suggestions. I also decided to go about making pages manually in the end, rather than relying on the Zotero import shortcut, but this of course adds extra work.

This seems extremely useful in clearing up the clutter, thanks for sharing!

I am still quite confused. As described in my posts above, I have deleted the page links in the bibliographic records created automatically during import from Zotero. This way I replaced [[Abstract]] with Abstract, for example. But I also removed the links from the names of the authors as listed within the authors property. Finally, I manually deleted pages corresponding to the authors. I did this step within Logseq. So far so good.

Now, after switching between two synchronized computers, I did a refresh and a re-index, and looking at the full list of pages in Logseq, the pages corresponding to the authors are back.

At first I thought that this must be some sync issue (I am actually struggling with syncing too), but I checked the pages corresponding to the imported publications and they seem to stay stripped of the links as I wanted. So I removed the pages generated for the authors, but after a while they are back again. As viewed in Logseq.

Apparently I am still missing some understanding of Logseq. If I now look into the corresponding directory/folder on my computer where Logseq data reside, there are no files corresponding to the authors. But within Logseq the All pages gives me a list that does contain the authors.

In fact, in this list all the authors’ pages have at least 1 backlink, and these seem to come from the publication record. But now comes another confusing part. See the two screenshots below.

In the editing mode it is obvious that the authors property in the publication record is stripped of the double square brackets [[ ]]. Good. But as soon as I switch to the display mode (or whatever it is called) just by clicking elsewhere, the authors are displayed “clickable”.

How come?

To summarize, I have two questions:

  1. How come some pages are displayed in the list of pages in Logseq (clicking All pages) although no corresponding files exist in the directory (as viewed in a system file browser)?
  2. How come a block contains an authors property followed by simple unlinked text, for example authors:: Stephen J. Wright, Benjamin Recht, and yet the names are clickable (and the corresponding author pages are listed, as in question 1)?

I ended up figuring it out.

A line with the structure someword:: thing, morethings, evenmorethings is interpreted with someword as a property and thing, morethings and evenmorethings as its values. The “[[” and “]]” are actually optional.

My solution was to use sed to replace the “::” with a “:”.

That will do for the time being.
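
For reference, a replacement along those lines could look like the sketch below. It is not necessarily the exact command used here: demote_authors is a made-up name, and it assumes that only the authors property should be changed and that the imported pages are the @*.md files. The -i.bak form keeps a copy of each original next to the edited file.

```shell
#!/bin/sh
# demote_authors PAGES_DIR (hypothetical helper)
# Rewrites "authors:: ..." as "authors: ..." in every @*.md file under
# PAGES_DIR, so the line no longer has the key:: value property form.
# Each original is kept as <name>.bak; delete those once the result looks right.
demote_authors() {
    find "$1" -type f -name '@*.md' \
        -exec sed -i.bak -e 's/^\([[:space:]]*authors\)::/\1:/' {} +
}
```

Other properties (anything else written as key:: value) are left untouched; extending the pattern to them is a matter of adding more names inside the group.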

1 Like

It is actually related to this: Automatically created pages from the keys in block properties?