Scientific Workflows with Zotero

gax · June 18, 2022, 7:23pm

I’d like to start a discussion on possible Logseq-Zotero workflows for scientific research.

The goal is to work towards a complete workflow that leverages the strengths of Zotero, Logseq, and a word processor. The first question is where to draw the boundaries between the different programs.

These are the typical steps that a researcher might follow:

1. Capture references
- Collect articles by searching Scopus, Google Scholar etc.
- Zotero is excellent; it is unlikely Logseq will be able to compete with the Zotero connector machinery
2. Manage references
- Maintain a database of articles
- Zotero seems to be the standard, even though others might prefer JabRef or similar
- This is not a space for Logseq to compete in
3. Annotate documents
- Read individual documents and add notes, capture screenshots, images, etc.
- Zotero 6.0 added a great PDF reader and note editor, which still has some limitations
  - Math formulas are not supported
  - Only PDF is supported; no EPUB, HTML, or DjVu
  - Code snippets are not supported
  - There is no linking/referencing system that comes anywhere close to Logseq’s capabilities
  - This is nearly a draw between Zotero and Logseq, but Logseq has a slight edge:
    - Zotero has the advantage of closer integration with the literature database
    - Logseq has the edge with respect to annotation and information management
    - Zotero is not very open: annotations are stored in a database, and there is currently no easy way to export them
    - If Logseq were to support more formats (EPUB, HTML, DjVu), it could be far superior
4. Assemble information
- Combine information extracted from multiple individual documents
- Add own research notes
- Logseq was designed for this and is vastly superior.
- It is highly unlikely Zotero will ever be competitive in this space
5. Outline new article
- Create an outline of a new article
- Similar to step 4, but with some differences
  - Needs ability to easily reference external materials, own diagrams etc.
  - Export of content to next stage needs to be seamless and not lose any information
  - While Logseq is an amazing outliner, its export is not perfect. What is needed is an easy way to copy and paste outlines into Word, including images and references. Ideally, Logseq would export a .docx file with the reference information stored in field codes (for Zotero bibliographies), or as \cite{} commands (for BibTeX).
- Candidates for outlining are Logseq and Word.
6. Write articles
- Currently most people are using Word and LaTeX
- Many constraints exist to fit into existing workflows (templates from publishers, coworkers not used to other formats, the need for Word collaboration features, etc.)
- While there are some attempts at scientific writing in Markdown (see e.g. Scientific Writing with Markdown | Jaan Tollander de Balsch), formatting requirements (footnotes, references, templates, typesetting) go beyond the capabilities of basic Markdown
- For many fields, Word (or LaTeX) will remain the default option for a long time

How to split workflow between Zotero and Logseq?

The first big question is where to switch from Zotero to Logseq in the workflow. Zotero is superior for collecting and managing references (1. and 2.) and Logseq is superior for annotation and information assembly (3. and 4.).

While Zotero now has a solid annotation feature, I think it makes sense to annotate in Logseq instead, as this allows annotations to be seamlessly included in other documents, which would not be possible in Zotero.

Has anyone done an in-depth comparison between Zotero and Logseq PDF annotation? Are there any downsides of Logseq?

How to transfer data from Zotero to Logseq?

The next question is how to integrate Zotero and Logseq for a workflow that uses Zotero for collecting and managing references, and Logseq for annotating documents.

Options for integrating Logseq with Zotero and other reference managers:

Loose integration through files: Zotero writes a .bib or CSL JSON file and Logseq opens these files for citing
- Advantages
  - Simple, automatically updated export to files has already been implemented in BetterBibtex
  - Loose coupling with Zotero, if Zotero is down everything still works
    - Would also work with JabRef and other reference managers
- Disadvantages
  - No automatic creation of back-links from Zotero
Tight integration with a custom Zotero client plugin: A plugin that runs in the Zotero client provides direct access to the Zotero database through a local web server. The plugin could provide bidirectional coupling and Logseq could modify Zotero items.
- Advantages
  - No need for .bib export
  - Can automatically add a note to a Zotero item that links back to all Logseq pages that reference the item
- Disadvantages
  - Zotero currently has no client-API
    - Currently the only option is to install a local server into Zotero using the debug-bridge and then send JavaScript commands
  - Overly tight integration with Zotero: if Zotero is down or there is a problem with the plug-in, Logseq doesn’t work either.
Integration using the Zotero web-API
- Not an option
  - Expensive, needs unlimited Zotero subscription for any realistically-sized library
  - No privacy, need to sync entire Zotero collection and annotations to cloud
  - Doesn’t work when offline or when Zotero is down
  - High latency, documents (potentially very large) and information not sourced locally

While it is tempting to try to set up a direct integration with the Zotero client, the lack of a supported client-API makes this approach somewhat sketchy. At the moment, the only realistic option is to use Zotero+BetterBibtex to write automatically updated .bib files, which can then be imported by Logseq. BetterBibtex probably needs to export a more complete set of information for each entry, including the item identifiers, so that Logseq can automatically add zotero://select links, but this is a minor issue.
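To make the file-based option more concrete, here is a minimal sketch of how an importer might read an auto-exported .bib file. It assumes simple `field = {value}` pairs without nested braces, and the sample entry is made up for illustration. A real importer would use a proper BibTeX parser; this is not how Logseq or Better BibTeX actually work internally.

```python
import re

# Matches "@type{citekey," at the start of an entry.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Matches simple "field = {value}" pairs; nested braces are NOT handled.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_bib(text):
    """Return {citekey: {"type": ..., field: value, ...}} for simple entries."""
    entries = {}
    headers = list(ENTRY_RE.finditer(text))
    for i, header in enumerate(headers):
        # An entry body runs until the next entry header (or the end of the text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        body = text[header.end():end]
        fields = {name.lower(): value.strip() for name, value in FIELD_RE.findall(body)}
        fields["type"] = header.group(1).lower()
        entries[header.group(2)] = fields
    return entries


# Made-up sample entry, for illustration only.
SAMPLE = """
@article{doe2020example,
  title = {An Example Article},
  author = {Doe, Jane},
  year = {2020}
}
"""

if __name__ == "__main__":
    for key, fields in parse_bib(SAMPLE).items():
        print(key, fields["title"], fields["year"])
```

Because the file is re-read on demand, nothing here depends on Zotero running, which is the main attraction of the loose coupling.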

Did I miss any options for Zotero integration?

How to get outlines from Logseq into Word/TeX?

The third question is how to turn a Logseq outline into a complete article. Most likely, Word and LaTeX will stay with us for a while. While Logseq can export to html and hopefully soon pandoc, this process isn’t very robust and doesn’t seem to work well for e.g. images, formulas, and references. Realistically, one will need to manually re-enter all references, formulas, and images into the pasted text. It might be best to do the outlining directly in Word.
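One possible route for the references part, sketched under assumptions: if citations in the outline are written as page references of the form `[[@citekey]]` (an assumed convention, not something Logseq enforces), they can be rewritten into pandoc's `[@citekey]` citation syntax before conversion. The function name is hypothetical, and images and formulas are not addressed here.

```python
import re

# Assumption: citations appear as Logseq page references named "@citekey".
CITE_RE = re.compile(r"\[\[@([^\]\s]+)\]\]")


def outline_to_pandoc(text):
    """Rewrite [[@citekey]] page references as pandoc citations [@citekey]."""
    lines = []
    for line in text.splitlines():
        # Leading tabs are turned into four spaces per level, which
        # markdown nested lists handle more predictably.
        stripped = line.lstrip("\t")
        depth = len(line) - len(stripped)
        lines.append("    " * depth + CITE_RE.sub(r"[@\1]", stripped))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    outline = "- Introduction\n\t- Prior work [[@doe2020example]]\n"
    print(outline_to_pandoc(outline), end="")
```

The result could then be passed to something like `pandoc outline.md --citeproc --bibliography refs.bib -o outline.docx` (file names are placeholders). As far as I know, this yields formatted citations as static text rather than live Zotero field codes, so it only partly covers the wish list above.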

Has anyone any experience actually outlining an article in Logseq and transferring the content to Word?

Any thoughts on other workflows?

hotaro · June 21, 2022, 12:02am

Thank you for the well-done write-up! I can’t say that I am a power-user in Zotero yet, so perhaps others can fill in their experiences, but here are my thoughts on some of your questions:

How to split workflow between Zotero and Logseq

…
Has anyone done an in-depth comparison between Zotero and Logseq PDF annotation? Are there any downsides of Logseq?

Zotero 6 Annotation Pros:
- Text Search
- Can edit highlight annotations
- Highlight annotations have appropriate spacing between lines (there is an extra space between words at the end of a line and the first word in the next line which is missing from logseq highlight annotations).
- Can highlight images
- Can export annotations to Zotero’s new note format (at the cost of cloud space if you have image annotations), and then export to markdown (I have not tested how it works with image annotations)
- Very stable (no data loss or links breaking)
- MarkdownDBConnect plugin in Zotero can link to Obsidian, logseq and other software to add an icon on articles in the Zotero Database. This helps differentiate between articles I have created a note for in Logseq and others that I haven’t made a note for yet. It’s simple to setup especially if you’re using citekeys as your markdown file names.
- If I annotate the pdf file directly, the annotations show up in the sidebar similar to if I made the highlight in Zotero. However, I can’t edit the highlighted text.
Zotero 6 Cons:
- As you mentioned, there is no note or article linking feature, which Logseq is best for.
- Without exporting, annotations are stuck inside Zotero. However, there are hyperlinks at the end of each annotation that can open local Zotero when we need to see the context.
Logseq PDF Annotation Pros:
- Highlights text and images that can be easily referenced anywhere in logseq.
- Highlight annotations can be edited to include anything that can be rendered in logseq (mathjax, code, bold, italics, links…)
- Zotero settings in logseq allows for importing of links to pdfs from our Zotero database.
Logseq PDF Annotation Cons:
- No pdf text search
- Image highlights don’t work with Zotero PDFs with spaces in the name (see the GitHub issue). There is a small fix for that currently in the issue comments, but it requires file renaming with Zotfile.
- Current Zotero plugin in logseq isn’t customizable like the one in Obsidian (no customizable template for yaml properties) which results in creating too many pages for all the authors. The search option is also very slow compared to Obsidian and shows less information (missing the authors, year of publication). Otherwise it does what it needs to do.
- UI zoom scaling resets while editing or resizing the logseq window. When that happens the view also resets to the beginning of the file.
- If the pdf file has highlights already, they do show up in the Logseq pdf viewer, but they don’t fill up the Logseq annotation file, unlike in Zotero.

The most stable and consistent workflow, I would think, is to take all my notes in Zotero and then export the notes and images to markdown. Unfortunately, I’m more used to taking notes and summarizing as I read, which leads to me annotating in Logseq more. The caveat here is that I need to screenshot figures and diagrams instead of linking to image annotations, since that is still buggy at the moment.

If the Logseq team fixes the zoom scaling bug and the image highlight bug with Zotero PDFs, then I think the workflow where Zotero is used to capture articles and Logseq for annotation and linking would work well, especially for those who work mostly with text and less with figures/diagrams.

Note: I use Windows 10

hotaro · June 21, 2022, 12:28am

How to transfer data from Zotero to Logseq?

… Did I miss any options for Zotero integration?

Another method I’ve seen floating about is to use Obsidian’s Zotero Integration Plugin to make the markdown file in Logseq with a custom template (see the post). It’s essentially the same as the ‘loose-integration’ approach you described.

How to get outlines from Logseq into Word/Tex?

I can see the value of outlining in Logseq itself because it’ll keep a record of where I used my ideas and what new connections I can make. Still, I find I like to make the outlines in the software where I will make my full draft. Every time I make an outline in Logseq, I end up rewriting it anyway (for the reasons you pointed out).

Then again I have less experience with the output part of the workflow, so perhaps someone else could chime in?

nhanjkl · June 21, 2022, 4:39am

I think many of us are setting up Zotero with Zotfile and are able to use Zotero for free.

The setup is based on this article, Zotero hacks: unlimited synced storage and its smooth use with rmarkdown • Ilya Kashnitsky, with quite a bit of tweaking. The underlying mechanism is to use Zotero’s own sync for everything except attachment files, because it is free for this purpose, while all the attachments such as PDFs can be synced by using Zotfile and a 3rd party sync service (like Google Drive).

Setting up Zotero to play nicely with Logseq in order to preserve the annotations when moving the Logseq graph around is another headache. There is some effort in the works to improve this UX (see the Discord thread here), but I have no idea when this will be done.

I’m writing this just to argue that cost should not be a reason to not use “Integration using the Zotero web-API”. Your other 3 reasons are valid.

Regarding the missing spaces in Logseq highlight annotations: there’s a pull request to address this, but apparently there’s some incompatibility with the old implementation, and I have no idea when it will be done: feat(pdf): fix formatting of copied text

Highlighting figures has been very stable for me and I do a lot of this. Maybe there’s something wrong in your setup. My issue with PDF annotations in Logseq is that there are many moving parts that can go wrong (usually in the file name). You can get help with that in Discord by others, or tag me at @Nhan.

gax · June 22, 2022, 4:31am

Thank you for your thoughts and for your in-depth comparison!

You mentioned quite a few issues with doing annotations in Logseq that I wasn’t aware of. They are not unsolvable, so let’s hope that they will be fixed soon.

I’ve accumulated a lot of annotations in Zotero (using the old notes and now the new annotations), but it feels very limited. Having the ability to add block-level tags is quite nice.

I’ll need to have a closer look at the MarkdownDBConnect plugin, this type of plugin could solve the backlink issue for the loosely coupled approach via a bib file.

gax · June 22, 2022, 4:56am

You are right about Zotero storage. I think it is also possible to directly sync the storage folder with syncthing or similar, just the database itself has to be sync’ed through the Zotero server.

I had a look at how the Zotero annotations are stored in the database: The annotations are stored individually in the sqlite file. Images are stored as regular Items in the storage folder. So most likely most users will be able to stay under the free tier if they sync the storage folder manually.
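For anyone who wants to repeat this kind of inspection, a small sketch that lists the tables in an SQLite file opened read-only. It deliberately assumes nothing about Zotero's schema (table and column names are not guessed here), and the function name is hypothetical. It is probably safest to run it on a copy of the database file rather than on the live one while Zotero is open.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path, contains=""):
    """List table names in an SQLite file, optionally filtered by substring."""
    # mode=ro opens the file read-only, so nothing can be modified by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows if contains.lower() in name.lower()]


# Example call (the path is a placeholder for a COPY of the database):
# print(list_tables("zotero-copy.sqlite", contains="annot"))
```

From there, `PRAGMA table_info(<table>)` on any table of interest shows its columns without having to rely on undocumented details.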

For me it is still not an option to upload all my database to the Zotero cloud due to privacy concerns, but it might be ok for some.

Personally, I’d like to move away from Zotero for anything beyond collecting and managing items. The architecture of Zotero is too closed for my taste. Moving items around is surprisingly difficult if not impossible, for example, moving items between libraries resets the created date, which would mess up my workflow. Also, Zotero’s tagging and filtering is lacking compared to Logseq, no hierarchies etc.

geo_fan · June 23, 2022, 6:07pm

Hi there, a scientist here. A heavy user of Zotero, Zettlr, etc. Very recently a new Zotero plugin was announced, and it seems that the author is keeping it well updated. It is still not well known, but it looks promising for fast outlining and linking when working with PDFs.
Thanks for the interesting discussion!

David_Annetts · June 29, 2022, 12:15am

Another card carrier here.

I’d agree with the previous responses: a well thought out writeup of issues surrounding what is potentially a very useful workflow.

I’d hesitate to call myself a power-user in any of the programs under review (LibreOffice / LaTeX / zotero / logseq), despite a reasonable amount of experience in all.

For me, tight integration between zotero & logseq would be ideal. It strikes me that a useful avenue to pursue might be along the lines of zotero plugins for Libre(MS)office, which appear to reference local storage.

An equally workable solution would be for Logseq to be able to import .bib files, much like LaTeX’s bibliography. This would obviate the need to work with large bibliographies.

For me, Logseq’s ability to directly reference PDFs in notes is a game changer.

It might also be worthwhile asking what you require of each component of your workflow. I don’t require much more from Logseq than concept linkage and export of a few dot points. I don’t require much more from Zotero than to store references for searching. Any writing that needs to be done, I’m doing in the end program (LibreOffice or LaTeX, as the case may be) so that I can leverage the strengths of each component. However, it is useful to export a series of dot points with notes and references through (e.g.) pandoc (such as Zettlr) to the end program.

$0.02.

gax · June 29, 2022, 4:26pm

As far as integration goes, I found out that Zotero is not very open and that it is quite difficult to get access to the data locally.
I looked at the office integration a while ago, and it was a very complex and limited protocol that was also completely different between MS Word (COM-based, I think) and OO. There is also this protocol:

Overall, I am torn about the Zotero integration. I see that Zotero is developing quite slowly and I feel that relying on Zotero internals might be dangerous in the long run. My library has become very large, and the Zotero citation picker has become extremely slow, a problem shared by many users.

For some reason, Zotero does not provide a local API to access the database, so there is no official way to interact with a local Zotero instance (which is needed for privacy reasons and to work offline). Zotero also plans to switch to Electron, a switch which might or might not affect any plugins Logseq would rely on.

For these reasons, I feel that the safest route is to go through .bib files (which would also open workflows with other reference managers).

An option for a tight integration could be to have a scanner that goes through the Logseq documents, finds any links to zotero, then opens Zotero and adds linked documents back to the markdown files. If the Zotero plugin goes down for whatever reason, it wouldn’t stop Logseq from working. I think this would be the best and most stable solution, short of an officially supported local API that exposes the full database (similar to Calibre’s API and the Content Server).
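The first half of such a scanner could look roughly like the sketch below. It walks the markdown files of a graph, collects `zotero://select` links, and builds a map from Zotero item key to the pages that mention it. The function name and the restriction to the `library/items/KEY` and `groups/ID/items/KEY` link forms are my assumptions. Writing the result back into Zotero is left out, since that is exactly the part that lacks a supported local API.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches zotero://select links for personal and group libraries and captures
# the 8-character item key. Other or older link styles are not covered.
LINK_RE = re.compile(
    r"zotero://select/(?:library|groups/\d+)/items/([A-Z0-9]{8})"
)


def find_backlinks(graph_dir):
    """Map Zotero item key -> sorted list of page names that link to it."""
    backlinks = defaultdict(set)
    for md_file in Path(graph_dir).rglob("*.md"):
        text = md_file.read_text(encoding="utf-8", errors="replace")
        for key in LINK_RE.findall(text):
            # The file stem is used as the page name (a simplification).
            backlinks[key].add(md_file.stem)
    return {key: sorted(pages) for key, pages in backlinks.items()}


# Example call (the directory is a placeholder for a Logseq graph folder):
# for key, pages in find_backlinks("my-graph").items():
#     print(key, pages)
```

Since the scan only reads files, a failure on the Zotero side would leave the Logseq graph untouched, which matches the stability argument above.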

I agree with you that writing needs to be done in a word processor or LaTeX for the time being.

nhanjkl · June 29, 2022, 6:17pm

I’m new to Zotero so I don’t know much about it. Is it that you feel the development is slow or is this relative to another reference manager? Do you have an alternative in mind?

Could you give a few links of example of workflow using .bib file? I don’t know anything and would like to learn about this.

Zotero also plans to switch to Electron

They’ve talked about it for 5 years and the latest is “won’t be […] anytime soon” ha ha.

gax · June 29, 2022, 10:01pm

[quote="nhanjkl, post:11, topic:8205"]

I’m new to Zotero so I don’t know much about it. Is it that you feel the development is slow or is this relative to another reference manager? Do you have an alternative in mind?

[/quote]
Zotero is a great program and I don’t see anything coming even remotely close, but I still have the feeling that Zotero is starting to lag behind. I am sure many problems are due to technical debt from being tied to the browser platform, which also makes it difficult to interface with 3rd party software. If you compare Zotero to Calibre, the latter has a much more vibrant developer community that has created a huge number of plugins.
Over the years, I have run into many limitations of Zotero, such as

- no easy way to transfer items between libraries while maintaining all information
- no way to support complex workflows
- search is very slow
- too much emphasis on cloud sync, which has privacy issues
- citation picker is very slow
- no supported local API
- tag system is primitive compared to how it should be
- no way to automatically populate collections based on tags (search folders have no hierarchy)
- no automatic renaming of tags
Zotero notes are great, but they lack Logseq’s features for assembling the information into other documents. Can’t tag individual blocks in Zotero’s Notes, tags are per note.
The new note support is great, but it still doesn’t support TeX, and currently there is no good way to export notes. Writing a note is a substantial investment (many hours per article), and I don’t like my notes to end up in a format that I can’t export properly. I don’t want to rely on a plugin either that might stop working in a few years when they move to Electron.

All of these issues could be addressed with a couple lines of Python, but the lack of a local API makes this difficult and one has to rely on the unofficial debug-bridge or write a Zotero plugin.
The Zotero development is also not very open, they have a mailing list, but no public roadmap.
I don’t want to be too critical of Zotero, like I said, it is a unique program, but I am still worried about putting too much of my intellectual work into the Zotero ecosystem.

[quote="nhanjkl, post:11, topic:8205"]
Could you give a few links of example of workflow using .bib file? I don’t know anything and would like to learn about this.
[/quote]

The Better BibTeX plugin can automatically write a .bib file and keep it synced. It still misses some information that would be useful (such as Zotero IDs for zotero://select links), but the author would probably be willing to add those.
Logseq could then parse this file. This has some major advantages: it still works if Zotero is down and it doesn’t rely on the cloud, so there are no latency or privacy issues.
I wrote some more comments here.
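As a rough illustration of what Logseq (or a helper script) could do with one parsed entry, here is a sketch that renders a reference as a markdown page with properties and a `zotero://select` link. The property names (`citekey`, `item-title`, `authors`, `year`), the function name, and the availability of the Zotero item key in the export are all assumptions on my part.

```python
def logseq_page(citekey, fields, item_key=None):
    """Return markdown for a Logseq page describing one reference.

    `fields` is a dict of BibTeX fields. `item_key` is the Zotero item key,
    which the .bib export is ASSUMED to provide in some form (hypothetical).
    """
    # Hypothetical property names; adjust to whatever the graph already uses.
    lines = [f"citekey:: {citekey}"]
    for source, prop in (("title", "item-title"), ("author", "authors"), ("year", "year")):
        if source in fields:
            lines.append(f"{prop}:: {fields[source]}")
    lines.append("")
    if item_key:
        # zotero://select links open the item in the local Zotero client.
        lines.append(f"- [Open in Zotero](zotero://select/library/items/{item_key})")
    lines.append("- Notes")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up entry and item key, for illustration only.
    print(logseq_page("doe2020example", {"title": "An Example Article", "year": "2020"}, "ABCD1234"))
```

Without an item key the page is still generated, just without the link, so missing identifiers in the export would degrade the result rather than break it.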

[quote="nhanjkl, post:11, topic:8205"]

Zotero also plans to switch to Electron

They’ve talked about it for 5 years and the latest is “won’t be […] anytime soon” ha ha.
[/quote]

That’s a good example of the lack of openness. Three years ago it was supposed to happen within half a year, and now it has been postponed indefinitely without much of an explanation. I don’t care about the GUI, but if the switch eventually happens it might break add-ons. I am also not very inclined to write add-ons for this reason.

Tigersen · June 30, 2022, 1:23pm

I recommend a zotero plug-in called “Zotero IF pro max”. For highlighting content marked up by Zotero’s own PDF reader, it supports automatic generation and export of markdown files, with or without highlighting colors. The location of the exported file is the location of Logseq’s data. It is designed for Obsidian, but Logseq is also applicable.

The problem is that it’s a Chinese plugin, and that you have to pay for it. I’m not sure if it’s available in English. If you guys would like to try using translation software, I’m sure it would be very helpful. (Zotero IF Pro Max: notes for first-time use)

Ken_Arnold · July 18, 2022, 8:26pm

I just noticed GitHub - sawhney17/logseq-citation-manager — has anyone tried it?

gax · July 18, 2022, 9:41pm

It works great! I have an issue where Logseq doesn’t work with relative links (see Comprehensive Zotero Plugin - #42 by Luhmann ), but that is a Logseq bug.
It might be related to my having the Zotero storage folder in a different location.

yangjincai · August 16, 2022, 8:38am

zotero-better-notes is great on this.
I take all my notes in zotero with zotero-better-notes, and then export markdown and sync them under Logseq folder.
Each note has a link to the reference pdf in zotero.
You can open the pdf from the note in Logseq with one click.
It works great.

yangjincai · August 16, 2022, 8:42am

geo_fan also mentions zotero plugin zotero-better-notes above, Scientific Workflows with Zotero - #8 by geo_fan

Tigersen · August 23, 2022, 1:27pm

I’ve tried zotero-better-notes, but in the markdown file exported to Logseq, all of my annotations end up in one block. It’s kind of annoying, I have to say.

Flaunster · August 26, 2022, 8:10am

@yangjincai what export settings do you use from zotero-better-notes? (see snapshot below).
I took a screenshot of an arbitrary selection, but I feel like whatever combination I try, the links aren’t working within Logseq. But this is an amazing project, and I hope I can get it working.

yangjincai · August 29, 2022, 5:47am

Hi, @Flaunster , I use this export setting.

And if you want the [[bi-directional links]] to work, you need to remove the random tag (used to avoid conflicts) from the export file names.
Zotero → Edit → Note Template Editor → ExportMDFileName:

related discussion in zotero-better-notes issue125.

Flaunster · August 30, 2022, 1:12am

Thank you @yangjincai !!! You just saved me untold hours trying to figure that out.

Also, the Zotero-better-notes plug-in sync is unidirectional… so accidentally overwriting notes seems likely (especially over time, when you forget about the sync and revisit a paper).

It’s frustrating because this solution is SO close to working if it could just sync both ways. Do any developers out there have a sense of how much work this would require to develop bi-directional sync? Like would it be an arm and leg to hire a freelancer or just a leg?