Scientific Workflows with Zotero

I’d like to start a discussion on possible Logseq-Zotero workflows for scientific research.

The goal is to work towards a complete workflow that leverages the strengths of Zotero, Logseq, and a word processor. The first question is where to draw the boundaries between the different programs.

These are the typical steps that a researcher might follow:

  1. Capture references

    • Collect articles by searching Scopus, Google Scholar etc.
    • Zotero is excellent here; it is unlikely that Logseq will be able to compete with the Zotero Connector machinery
  2. Manage references

    • Maintain a database of articles
    • Zotero seems to be the standard, even though others might prefer JabRef or similar
    • This is not a space for Logseq to compete in
  3. Annotate documents

    • Read individual documents and add notes, capture screenshots, images, etc.
    • Zotero 6.0 added a great PDF reader and note editor, which still has some limitations
      • Math formulas are not supported
      • Only PDF is supported; no EPUB, HTML, or DjVu
      • Code snippets are not supported
      • There is no linking/referencing system that comes anywhere close to Logseq’s capabilities
      • This is nearly a draw between Zotero and Logseq, but Logseq has a slight edge:
        • Zotero has the advantage of closer integration with the literature database
        • Logseq has the edge with respect to annotation and information management
        • Zotero is not very open: annotations are stored in a database, and currently there is no easy way to export them
        • If Logseq were to support more formats (EPUB, HTML, DjVu), it could be far superior
  4. Assemble information

    • Combine information extracted from multiple individual documents
    • Add own research notes
    • Logseq was designed for this and is vastly superior.
    • It is highly unlikely Zotero will ever be competitive in this space
  5. Outline new article

    • Create an outline of a new article
    • Similar to 4., but some differences
      • Needs ability to easily reference external materials, own diagrams etc.
      • Export of content to next stage needs to be seamless and not lose any information
      • While Logseq is an amazing outliner, export is not perfect. Need an easy way to copy and paste outlines into Word, including images and references. Ideally Logseq would export a .docx file with the reference information stored in field codes (for Zotero bibliographies), or as \cite{} fields (for BibTeX).
    • Candidates for outlining are Logseq and Word.
  6. Write articles

    • Currently most people are using Word and LaTeX
    • Many constraints exist to fit into existing workflows (templates from publishers, coworkers not used to other formats, the need for Word collaboration features, etc.)
    • While there are some attempts at scientific writing in Markdown (see e.g. Scientific Writing with Markdown | Jaan Tollander de Balsch), formatting requirements (footnotes, references, templates, typesetting) go beyond the capabilities of basic Markdown
    • For many fields, Word (or LaTeX) will remain the default option for a long time

How to split workflow between Zotero and Logseq?

The first big question is where to switch from Zotero to Logseq in the workflow. Zotero is superior for collecting and managing references (1. and 2.) and Logseq is superior for annotation and information assembly (3. and 4.).

While Zotero now has a solid annotation feature, I think it makes sense to annotate in Logseq instead, as this allows annotations to be seamlessly included in other documents, which would not be possible in Zotero.

Has anyone done an in-depth comparison between Zotero and Logseq PDF annotation? Are there any downsides of Logseq?

How to transfer data from Zotero to Logseq?

The next question is how to integrate Zotero and Logseq for a workflow that uses Zotero for collecting and managing references, and Logseq for annotating documents.

Options for integrating Logseq with Zotero and other reference managers:

  • Loose integration through files: Zotero writes a .bib or CSL-JSON file and Logseq opens these files for citing
    • Advantages
      • Simple; automatically updated export to files has already been implemented in BetterBibtex
      • Loose coupling with Zotero: if Zotero is down, everything still works
        • Would also work with JabRef and other reference managers
    • Disadvantages
      • No automatic creation of back-links from Zotero
  • Tight integration with a custom Zotero client plugin: A plugin that runs in the Zotero client provides direct access to the Zotero database through a local web server. The plugin could provide bidirectional coupling and Logseq could modify Zotero items.
    • Advantages
      • No need for .bib export
      • Can automatically add a note to a Zotero item that links back to all Logseq pages that reference the item
    • Disadvantages
      • Zotero currently has no client-API
        • Currently the only option is to install a local server into Zotero using the debug-bridge and then send JS commands
      • Overly tight integration with Zotero: if Zotero is down or there is a problem with the plug-in, Logseq doesn’t work either.
  • Integration using the Zotero web-API
    • Not an option
      • Expensive, needs unlimited Zotero subscription for any realistically-sized library
      • No privacy, need to sync entire Zotero collection and annotations to cloud
      • Doesn’t work when offline or when Zotero is down
      • High latency, documents (potentially very large) and information not sourced locally

While it is tempting to try to set up a direct integration with the Zotero client, the lack of a supported client-API makes this approach somewhat sketchy. At the moment, the only realistic option is to use Zotero+BetterBibtex to write automatically updated .bib files, which can then be imported by Logseq. Probably BetterBibtex needs to export a more complete set of information for each file, including the item identifiers, so that Logseq can automatically add zotero://select links, but this is a minor issue.
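To make the file-based option more concrete, here is a minimal sketch of what the Logseq side could do with an automatically exported CSL-JSON file. It is only an illustration under assumptions: the function names and the Logseq property names are made up, and the "zotero-key" field is hypothetical (it stands in for the item identifier that the export would need to include). The standard CSL-JSON fields (id, title, author, issued) are used as they normally appear.

```python
import json


def csl_to_logseq_page(entry):
    """Render one CSL-JSON entry as Logseq page text with properties."""
    names = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in entry.get("author", [])
    ]
    date_parts = entry.get("issued", {}).get("date-parts", [[]])
    year = str(date_parts[0][0]) if date_parts and date_parts[0] else ""
    lines = [
        f"title:: @{entry['id']}",            # page named after the citation key
        f"item-title:: {entry.get('title', '')}",
        f"authors:: {'; '.join(names)}",
        f"year:: {year}",
    ]
    # ASSUMPTION: the exporter writes the Zotero item key under "zotero-key".
    key = entry.get("zotero-key")
    if key:
        lines.append(f"zotero-link:: zotero://select/library/items/{key}")
    return "\n".join(lines) + "\n"


def load_pages(csl_json_text):
    """Map each citation key to its generated page text."""
    return {e["id"]: csl_to_logseq_page(e) for e in json.loads(csl_json_text)}
```

Because the export file is the only contact point, the same code would work with any reference manager that can write CSL-JSON; only the back-link to the Zotero item depends on the extra key being present.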

Did I miss any options for Zotero integration?

How to get outlines from Logseq into Word/TeX?

The third question is how to turn a Logseq outline into a complete article. Most likely, Word and LaTeX will stay with us for a while. While Logseq can export to html and hopefully soon pandoc, this process isn’t very robust and doesn’t seem to work well for e.g. images, formulas, and references. Realistically, one will need to manually re-enter all references, formulas, and images into the pasted text. It might be best to do the outlining directly in Word.
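As a rough idea of what a more robust export path could look like for the text and references at least, the sketch below turns a Logseq outline into Pandoc-style Markdown. It assumes that reference pages are named with an "@" prefix followed by the citation key (so they appear as [[@citekey]] in the outline) and that top-level bullets are section titles; both are assumptions, as is the function name. Images and formulas are not handled at all.

```python
import re

# A Logseq page reference to a citation page, e.g. [[@doe2020]]
CITE_REF = re.compile(r"\[\[@([^\]]+)\]\]")
# A bullet line: optional indentation, "- ", then the block text
BULLET = re.compile(r"^(\s*)- (.*)$")


def outline_to_pandoc(outline):
    """Convert a Logseq outline to Markdown with Pandoc citations."""
    out = []
    for raw in outline.splitlines():
        match = BULLET.match(raw)
        if not match:
            continue  # this sketch skips blank lines and non-bullet content
        indent, text = match.groups()
        # One tab or two spaces count as one nesting level.
        depth = indent.count("\t") + indent.count(" ") // 2
        text = CITE_REF.sub(r"[@\1]", text)
        # Top-level bullets become headings, nested bullets become paragraphs.
        out.append(f"# {text}" if depth == 0 else text)
    return "\n\n".join(out) + "\n"
```

The result could then be passed to pandoc together with the exported .bib file (using its --citeproc and --bibliography options) to produce a .docx or LaTeX draft. The citations in such a .docx would be plain text rather than live Zotero field codes, so this only covers part of what is asked for above.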

Does anyone have experience with actually outlining an article in Logseq and transferring the content to Word?

Any thoughts on other workflows?

Thank you for the well-done write-up! I can’t say that I am a power user of Zotero yet, so perhaps others can fill in their experiences, but here are my thoughts on some of your questions:

How to split workflow between Zotero and Logseq


Has anyone done an in-depth comparison between Zotero and Logseq PDF annotation? Are there any downsides of Logseq?

  • Zotero 6 Annotation Pros:
    • Text Search
    • Can edit highlight annotations
    • Highlight annotations have appropriate spacing between lines (there is an extra space between words at the end of a line and the first word in the next line which is missing from logseq highlight annotations).
    • Can highlight images
    • Can export annotations to Zotero’s new note format (at the cost of cloud space if you have image annotations), and then export to markdown (I have not tested how it works with image annotations)
    • Very stable (no data loss or links breaking)
    • The MarkdownDBConnect plugin in Zotero can link to Obsidian, Logseq and other software to add an icon on articles in the Zotero database. This helps differentiate between articles I have created a note for in Logseq and others that I haven’t made a note for yet. It’s simple to set up, especially if you’re using citekeys as your markdown file names.
    • If I annotate the pdf file directly, the annotations show up in the sidebar similar to if I made the highlight in Zotero. However, I can’t edit the highlighted text.
  • Zotero 6 Cons:
    • As you mentioned, there is no note or article linking feature, which is what Logseq is best for.
    • Without exporting, annotations are stuck inside Zotero. However, there are hyperlinks at the end of each annotation that can open local Zotero when we need to see the context.
  • Logseq PDF Annotation Pros:
    • Highlights text and images that can be easily referenced anywhere in logseq.
    • Highlight annotations can be edited to include anything that can be rendered in logseq (mathjax, code, bold, italics, links…)
    • The Zotero settings in Logseq allow importing links to PDFs from our Zotero database.
  • Logseq PDF Annotation Cons:
    • No pdf text search
    • Image highlights don’t work with Zotero PDFs with spaces in the name: github issue. There is a small fix for that currently in the issue comments, but it requires file renaming with Zotfile.
    • Current Zotero plugin in logseq isn’t customizable like the one in Obsidian (no customizable template for yaml properties) which results in creating too many pages for all the authors. The search option is also very slow compared to Obsidian and shows less information (missing the authors, year of publication). Otherwise it does what it needs to do.
    • UI zoom scaling resets while editing or resizing the logseq window. When that happens the view also resets to the beginning of the file.
    • If the PDF file has highlights already, they do show up in the Logseq PDF viewer, but they don’t fill up the Logseq annotation file, unlike in Zotero.
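
For the file-name issue in the list above, a small dry-run check can show which PDFs would be affected before renaming them with Zotfile. This is only a sketch: the function names are made up for illustration, the folder to scan is whatever your own setup uses, and nothing is renamed on disk.

```python
from pathlib import Path


def pdfs_with_spaces(root):
    """Return all PDFs under root whose file name contains a space."""
    return sorted(p for p in Path(root).rglob("*.pdf") if " " in p.name)


def suggested_name(path):
    """Propose a space-free name; the file itself is left untouched."""
    return path.with_name(path.name.replace(" ", "_"))
```

Doing the actual renaming through Zotfile rather than a script keeps Zotero’s own record of the attachment in sync with the file on disk.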

The most stable and consistent workflow, I would think, is to take all my notes in Zotero and then export the notes and images to markdown. Unfortunately, I’m more used to taking notes and summarizing as I read, which leads to me annotating in Logseq more. The caveat here is that I need to screenshot figures and diagrams instead of linking an image annotation, since that is still buggy at the moment.

If the Logseq team fixes the zoom scaling bug and the image highlight bug with Zotero PDFs, then I think the workflow where Zotero is used to capture articles and Logseq for annotation and linking would work well, especially for those who work mostly with text and less with figures/diagrams.

Note: I use Windows 10

How to transfer data from Zotero to Logseq?

Did I miss any options for Zotero integration?

Another method I’ve seen floating about is to use Obsidian’s Zotero Integration Plugin to make the markdown file in Logseq with a custom template (see the post). It’s essentially the same as the ‘loose-integration’ approach you described.

How to get outlines from Logseq into Word/TeX?

I can see the value of outlining in Logseq itself because it’ll keep a record of where I used my ideas and what new connections I can make. However, I find I like to make the outlines in the software where I will write my full draft. Every time I make an outline in Logseq, I end up rewriting it anyway (for the reasons you pointed out).

Then again I have less experience with the output part of the workflow, so perhaps someone else could chime in?

I think many of us are setting up Zotero with Zotfile and are able to use Zotero for free.

The setup is based on this article, Zotero hacks: unlimited synced storage and its smooth use with rmarkdown • Ilya Kashnitsky, with quite a bit of tweaking. The underlying mechanism is to use Zotero’s proprietary sync for everything except attachment files, because it is free for this purpose, while all the attachments such as PDFs can be synced using Zotfile and a third-party sync service (like Google Drive).

Setting up Zotero to play nicely with Logseq in order to preserve the annotations when moving the Logseq graph around is another headache. There is some effort in the works to improve this UX (see the Discord thread here), but I have no idea when this will be done.

I’m writing this just to argue that cost should not be a reason to not use “Integration using the Zotero web-API”. Your other 3 reasons are valid.

There’s a pull request to address this (feat(pdf): fix formatting of copied text), but apparently there’s some incompatibility with the old implementation, and I have no idea when it will be done.

Highlighting figures has been very stable for me and I do a lot of this. Maybe there’s something wrong in your setup. My issue with PDF annotations in Logseq is that there are many moving parts that can go wrong (usually in the file name). You can get help with that in Discord by others, or tag me at @Nhan.

Thank you for your thoughts and for your in-depth comparison!

You mentioned quite a few issues with doing annotations in Logseq that I wasn’t aware of. They are not unsolvable, so let’s hope that they will be fixed soon.

I’ve accumulated a lot of annotations in Zotero (using the old notes and now the new annotations), but it feels very limited. Having the ability to add block-level tags is quite nice.

I’ll need to have a closer look at the MarkdownDBConnect plugin, this type of plugin could solve the backlink issue for the loosely coupled approach via a bib file.

You are right about Zotero storage. I think it is also possible to directly sync the storage folder with Syncthing or similar; only the database itself has to be synced through the Zotero server.

I had a look at how the Zotero annotations are stored in the database: The annotations are stored individually in the sqlite file. Images are stored as regular Items in the storage folder. So most likely most users will be able to stay under the free tier if they sync the storage folder manually.
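
For anyone who wants to look at this themselves, here is a hedged sketch of how the annotations could be read from a copy of the sqlite file and turned into Logseq blocks. The table name (itemAnnotations) and the column names are assumptions that should be checked against your own database, the function names are made up, and it is safer to work on a copy rather than on the live file.

```python
import sqlite3


def read_annotations(conn):
    """Return (page_label, text, comment) for every stored annotation."""
    # ASSUMPTION: table and column names; inspect your own schema first.
    rows = conn.execute(
        "SELECT pageLabel, text, comment FROM itemAnnotations ORDER BY sortIndex"
    )
    return [tuple(row) for row in rows]


def to_logseq_blocks(annotations):
    """Format annotations as an outline, with comments as child blocks."""
    blocks = []
    for page, text, comment in annotations:
        # Image annotations may have no text, so fall back to an empty string.
        blocks.append(f"- {text or ''} (p. {page})")
        if comment:
            blocks.append(f"  - {comment}")
    return "\n".join(blocks)


# Example of opening a COPY of the database read-only (path is hypothetical):
# conn = sqlite3.connect("file:/path/to/copy/zotero.sqlite?mode=ro", uri=True)
```

This is not a supported export route, so the schema could change between Zotero versions.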

For me it is still not an option to upload all my database to the Zotero cloud due to privacy concerns, but it might be ok for some.

Personally, I’d like to move away from Zotero for anything beyond collecting and managing items. The architecture of Zotero is too closed for my taste. Moving items around is surprisingly difficult, if not impossible; for example, moving items between libraries resets the created date, which would mess up my workflow. Also, Zotero’s tagging and filtering are lacking compared to Logseq: no hierarchies, etc.

Hi there, a scientist here. I am a heavy user of Zotero, Zettlr, etc. Very recently a new Zotero plugin was announced, and it seems that the author is keeping it well updated. It is still not well known, but it looks promising for fast outlining and linking when working with PDFs.
Thanks for the interesting discussion!
