Converting / importing existing text and markdown notes into Logseq

  • I have two main formats (very loosely speaking) of personal knowledge management notes that date back to about when the term “personal knowledge management” was coined, apparently.
    • 10,000 notes that started in the late, great Sbook5 by Simson L. Garfinkel and then grew for a good decade in Tomboy. They have all already been converted to Zim wiki using a command-line tool i have since lost track of. That tool did a key piece of the work of getting them most of the way to what Logseq wants: it replaced the hash note filenames with names like Example_Note.txt and, crucially, put the cross-references into exactly the [[Note Title]] format. Overall the contents look like this:
      • Content-Type: text/x-zim-wiki
        Wiki-Format: zim 0.26
        Creation-Date: Not found
        Modification-Date: Not found
        
        ====== Example Note ======
        
        https://example.com
        
        via Friend at [[Another Note Title]]
        
      • Zim wiki never worked out for me, so these have remained pretty much static since then, found with fsearch when i need them and occasionally edited with a text editor. I am really excited about them becoming a living, breathing part of my brain again.
      • So i can happily throw out all the metadata at the top (including the title?), change the filenames to not have underscores, and (ideally i guess) convert the content’s paragraphs (double line breaks) to list items and be all set?
    • 3,000 notes that are all in markdown. Logseq is already pointing at the directory that contains them and, to Logseq’s credit, is not freaking out too much other than the occasional “Large block will not be editable or searchable to not slow down the app, please use another editor to edit this block.” message. (Poor Tomboy, if it had taken this approach i might still be able to use it today.) The format here is more all over the place, and there is probably not enough consistent interlinking of any kind to be worth trying to preserve, but i figure i should at least fix the note titles so i’m not making a bunch of references to [[example-organization]] when i should be using [[Example Organization]].
      • The closest thing to a typical one would have a filename like example-organization.md and contents that look like this:
        • # Example organization
          
          > We manufacture examples.
          
          [About Example Organization](https://example.com/about)
          
          123 Something Street
          Nowheresville, MA 12345
          
      • Another large number of these would be really cool to convert into journals. Probably not worth the effort, but in case anyone has quick code hands or recommendations: they all have filenames like 2021-03-19-near-north-camp-defense-thoughts.md, and their content typically has no title or really any other formatting to speak of:
        • militant defenders appeared to be the key
          
          but possibly could have held off without arrests (only physical contact with police being shield wall for instance)
          
          
          
          
          An unjust economy built on genocide, theft, and exploitation so extreme it required the invention of racism has left thousands of people without housing
          
        • As with all of these, any formatting that it has would be standard markdown.
      • The important thing is these are all plain markdown files, but some have # Heading 1 titles and others have none at all.
      • A small matter: i have followed a format of c: 2019-01-17 for the created date, if that can easily be converted into some kind of useful metadata in Logseq, but that’s really not important at all.
    • Tools for making text and markdown files more Logseq friendly:
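For the dated notes, a minimal sketch of one possible conversion, assuming Logseq's default journal file name format (yyyy_MM_dd.md in the graph's journals folder) and assuming the rest of the filename is worth keeping as a topic bullet. The function name `to_journal` is hypothetical, and the sketch does not merge several notes from the same day into one journal page.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches names like 2021-03-19-near-north-camp-defense-thoughts.md
DATED = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.md$")


def to_journal(path: Path) -> Optional[Tuple[str, str]]:
    """Return (journal_filename, content), or None for undated filenames.

    Assumes Logseq's default journal file name format, yyyy_MM_dd.md.
    """
    m = DATED.match(path.name)
    if m is None:
        return None
    year, month, day, slug = m.groups()
    topic = slug.replace("-", " ")  # near-north-camp-... -> near north camp ...
    text = path.read_text(encoding="utf-8")
    # Drop blank lines; nest each remaining line under the topic bullet.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    content = "\n".join([f"- {topic}"] + [f"\t- {ln}" for ln in lines]) + "\n"
    return f"{year}_{month}_{day}.md", content
```

Writing the result into the journals folder, and checking whether that day already has a journal page, is deliberately left out.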

I already posted the scripts i found above; they are off in the other resources category of the Awesome Logseq listing.

I could not find any documentation on importing except for this empty page, which is presumably meant to document the Import feature in the three-dots menu. That feature, for its part, only has “Import existing notes” and promises “If they are in a JSON, EDN or Markdown format Logseq can work with them.” but only has these buttons:

  • RoamResearch Import a JSON Export of your Roam graph
  • EDN / JSON Import an EDN or a JSON Export of your Logseq graph
  • OPML Import OPML files

Nothing for plain text or plain markdown. And besides, as covered above, Logseq can already see my markdown files but i still need to improve the formatting and filenames.

There is a feature request to import from markdown that mentions the need to add bullets, to move assets to an assets directory, and to fix references to assets, which i realize now i have a small need for also:

But fixing up frontmatter and to some extent fixing filenames will continue to require customized scripts for random formats.
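As an example of that kind of custom script for the markdown notes, a rough sketch, assuming a leading # heading is the page title and otherwise title-casing the filename slug. It writes Logseq's title:: page property instead of renaming the file, and turns the first c: 2019-01-17 line into a property named created (an arbitrary, hypothetical choice). Title-casing a slug is lossy for acronyms and small words, so the fallback is only a starting point.

```python
import re
from pathlib import Path

H1 = re.compile(r"^#\s+(.+?)\s*$")                      # "# Example organization"
CREATED = re.compile(r"^c:\s*(\d{4}-\d{2}-\d{2})\s*$")  # "c: 2019-01-17"


def with_page_properties(path: Path) -> str:
    """Return the note's text with title:: (and created::) page properties.

    Assumptions: a leading '# Heading' is the page title, otherwise the
    filename slug is title-cased; 'created' is an arbitrary property name.
    """
    lines = path.read_text(encoding="utf-8").splitlines()
    while lines and not lines[0].strip():
        lines.pop(0)
    m = H1.match(lines[0]) if lines else None
    if m:
        title = m.group(1)
        lines = lines[1:]
    else:
        # example-organization.md -> "Example Organization" (lossy for acronyms)
        title = path.stem.replace("-", " ").title()
    props = ["title:: " + title]
    body = []
    for line in lines:
        c = CREATED.match(line)
        if c and len(props) == 1:  # only the first c: line becomes a property
            props.append("created:: " + c.group(1))
        else:
            body.append(line)
    while body and not body[0].strip():
        body.pop(0)
    return "\n".join(props + [""] + body).rstrip() + "\n"
```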

Other import questions from the forums:

There is clearly the need to have some conversion tools that people can modify to help them with getting their files formatted for Logseq.


Here is my simple Python script (in progress) for my 15,000 notes and addresses, which removes unused Zimwiki frontmatter and converts the title to Logseq-amenable metadata:
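The linked script is not reproduced here. As a rough illustration of the same idea, and not that script: drop the Zim header lines and turn the ====== Title ====== line into a title:: page property. That the header always ends at the first blank line is an assumption that holds for the Zim example at the top of this thread; `zim_to_logseq` is a hypothetical name.

```python
import re

# Matches Zim page titles such as "====== Example Note ======"
ZIM_TITLE = re.compile(r"^=+\s*(.+?)\s*=+\s*$")


def _drop_leading_blanks(lines):
    while lines and not lines[0].strip():
        lines.pop(0)


def zim_to_logseq(text: str) -> str:
    """Strip the Zim header and turn the top-level title into title::.

    Assumption: the header (Content-Type:, Wiki-Format:, ...) ends at the
    first blank line, as in the Zim example in this thread.
    """
    lines = text.splitlines()
    if lines and lines[0].startswith("Content-Type: text/x-zim-wiki"):
        while lines and lines[0].strip():
            lines.pop(0)  # discard header lines
    _drop_leading_blanks(lines)
    props = []
    if lines and ZIM_TITLE.match(lines[0]):
        props.append("title:: " + ZIM_TITLE.match(lines[0]).group(1))
        lines.pop(0)
    _drop_leading_blanks(lines)
    out = props + [""] + lines if props else lines
    return "\n".join(out).rstrip() + "\n"
```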


That script is now not so simple and does everything i need for the ~15,000 batch of notes. I planned to keep it simple and then use Longdown, but did not for two reasons:

  1. It broke my frontmatter (properties/metadata)
  2. The decisions it made about when to make a Logseq bullet and when not to were kind of the opposite of the way i made notes: when i have blank lines between paragraphs, it is probably a single section of text that does not need to be bulleted, and when i have a bunch of lines with no space between them, they are probably sequential points that should be bulleted. So i deleted blank lines and bulleted everything (except where i had """, which i sometimes used to offset quotations; i did not try to turn those into Markdown block quotes because there was no real reason to).
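A minimal sketch of the rule in point 2, not the author's script: blank lines are dropped, every other line becomes a top-level bullet, and lines between two """ markers stay together as one multi-line bullet (continuation lines indented two spaces). That the markers sit on their own lines is an assumption; `bullet_everything` is a hypothetical name.

```python
QUOTE_MARK = '"""'


def _flush(out, quote):
    # One bullet for the whole quotation; continuation lines indented two spaces.
    if quote:
        out.append("- " + quote[0])
        out.extend("  " + q for q in quote[1:])


def bullet_everything(text: str) -> str:
    out = []
    quote = None  # list of lines while inside a quotation, else None
    for raw in text.splitlines():
        line = raw.strip()
        if line == QUOTE_MARK:
            if quote is None:
                quote = []  # opening marker
            else:
                _flush(out, quote)  # closing marker
                quote = None
            continue
        if not line:
            continue  # blank lines are simply dropped
        if quote is not None:
            quote.append(line)
        else:
            out.append("- " + line)
    if quote is not None:  # unterminated quotation: keep what was collected
        _flush(out, quote)
    return "\n".join(out) + "\n"
```

Usage: `bullet_everything('first point\nsecond point\n\n"""\nquoted line one\nquoted line two\n"""\n\nlast')` yields bullets for the two points, one two-line bullet for the quotation, and a bullet for `last`.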

Important to note: for the ~3,000 notes described second in the original post above, all i did was put the markdown files into Logseq, or rather, i started a Logseq project in the notes folder, so all the legacy notes are in the main directory and all the new Logseq ones are in Pages.

For what turned out to be more than 17,000 notes that were originally in Tomboy and converted to ZimWiki .txt files (i moved the Notebooks folder, which had ~10,000 notes of its own plus additional subfolder notebooks), here is what i did:

I had to run grep -axv '.*' *.txt to find a rogue file that blew up Python (in a UTF-8 locale, -a treats the files as text, -x requires a whole-line match, and -v inverts it, so only lines that are not valid UTF-8 are printed), and iconv -f UTF-8 -t UTF-8//IGNORE Example_File_broken.txt >Example_File_fixed.txt to fix it (moving the file first so the fixed output would end up with the correct filename). After that, this frighteningly un-artful script, zimwiki_txt_to_logseq_md.py in the agaric/logseq_helpers repository on Agaric's Forgejo, did the job of getting the 17,343 Tomboy → Zimwiki notes into Logseq.
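For anyone who would rather do that check and repair from Python, a sketch of the same two steps: find .txt files whose bytes are not valid UTF-8, and rewrite them with the invalid bytes dropped. Decoding with `errors="ignore"` discards undecodable bytes, which is analogous to iconv's //IGNORE; rewriting in place avoids the move-then-fix step. Both function names are hypothetical.

```python
from pathlib import Path


def find_non_utf8(folder: Path) -> list:
    """Return the .txt files in folder whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(folder.glob("*.txt")):
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad


def repair_utf8(path: Path) -> None:
    """Rewrite path in place, silently dropping invalid UTF-8 bytes."""
    text = path.read_bytes().decode("utf-8", errors="ignore")
    path.write_bytes(text.encode("utf-8"))
```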

I simply cd'd into each folder i wanted to convert and ran:

python /home/mlncn/Projects/agaric/logseq_helpers/zimwiki_txt_to_logseq_md.py

and then checked some of the results; if things looked good, i ran rm *.txt (if not, i removed the markdown, tweaked the script, and ran it again).
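Instead of cd'ing into each folder by hand, a loop like this could run the script with each folder as its working directory and report how many .txt and .md files each folder ends up with, as a quick sanity check before deleting anything. It assumes the script only needs to be started from inside the folder, as described above; the script path is the one from this post and will differ on other machines, and `convert_folders` is a hypothetical name.

```python
import subprocess
import sys
from pathlib import Path

# Path from the post above; adjust for your own checkout.
SCRIPT = Path("/home/mlncn/Projects/agaric/logseq_helpers/zimwiki_txt_to_logseq_md.py")


def convert_folders(folders, script=SCRIPT):
    """Run the converter inside each folder; return {folder: (n_txt, n_md)}."""
    counts = {}
    for folder in folders:
        folder = Path(folder)
        # cwd= plays the role of cd'ing into the folder first.
        subprocess.run([sys.executable, str(script)], cwd=folder, check=True)
        counts[folder] = (len(list(folder.glob("*.txt"))),
                          len(list(folder.glob("*.md"))))
    return counts
```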

If you have code comments with this formatting:

/**
 * Comment.
 */

the script does mess it up by turning the standalone asterisks into indented Logseq bullets, but i was not supposed to have code in this set of notes, so i let it slide.
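If code does turn up, one way to keep a bulleting pass away from such comments is to first mark the lines that belong to /** ... */ blocks and leave them untouched. A sketch, assuming comments of exactly that shape; `doc_comment_lines` is a hypothetical helper, and how the untouched lines should then be presented in Logseq (for example inside a code fence) is left open.

```python
def doc_comment_lines(lines: list) -> set:
    """Return the indexes of lines that belong to /** ... */ comment blocks."""
    inside = False
    marked = set()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not inside and stripped.startswith("/**"):
            inside = True
        if inside:
            marked.add(i)
            if stripped.endswith("*/"):  # also handles one-line /** ... */
                inside = False
    return marked
```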

Thank you for making and sharing this. I have two old zimwiki notebooks that I have been meaning to port. I will try it this weekend.


@etc i’m afraid you will need to change the start of my script quite a bit. Because my notes were originally in Tomboy, my Zimwiki frontmatter was all identical, so my script confirms it is there (and identical) in a very clunky manner and throws it away. Yours probably won’t be identical, and you’ll probably want to pull out at least the creation date.
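For notebooks with varying headers, a sketch of reading the Zim header into a dictionary so the creation date can become a Logseq page property. It assumes Key: value header lines that end at the first blank line, and assumes the date is ISO-style (starting with YYYY-MM-DD); anything else, including the "Not found" placeholder, is skipped. The created:: property name and both function names are hypothetical choices.

```python
import re
from typing import Dict, Optional

ISO_DATE = re.compile(r"^(\d{4}-\d{2}-\d{2})")


def zim_header(text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines up to the first blank line."""
    header = {}
    for line in text.splitlines():
        if not line.strip():
            break
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return header


def created_property(text: str) -> Optional[str]:
    """Return e.g. 'created:: 2012-04-04', or None if there is no usable date."""
    value = zim_header(text).get("Creation-Date", "")
    m = ISO_DATE.match(value)  # "Not found" and other non-dates give None
    return f"created:: {m.group(1)}" if m else None
```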
