Converting / importing existing text and markdown notes into Logseq

mlncn · May 27, 2024, 10:19pm

I have two main formats (very loosely speaking) of personal knowledge management notes that date back to about when the term “personal knowledge management” was coined, apparently.
- 10,000 notes that started in the late, great Sbook5 by Simson L. Garfinkel and then grew for a good decade in Tomboy. They have all already been converted to Zim wiki using a command-line tool i have lost track of that did a key piece of work in getting them a lot of the way to what Logseq will want, replacing the hash note filenames with Example_Note.txt and crucially putting the cross-references into exactly the [[Note Title]] format. Overall the contents look like this:
  - ```
  Content-Type: text/x-zim-wiki
  Wiki-Format: zim 0.26
  Creation-Date: Not found
  Modification-Date: Not found
  
  ====== Example Note ======
  
  https://example.com
  
  via Friend at [[Another Note Title]]
```
- Zim wiki never worked out for me so these have remained pretty much static since then, found when i need them with fsearch and occasionally edited with a text editor. I am really excited about them becoming a living, breathing part of my brain again.
- So i can happily throw out all the metadata at the top (including the title?), change the filenames to not have underscores, and (ideally i guess) convert the content’s paragraphs (double line breaks) to list items and be all set?
- 3,000 notes that are all in markdown. Logseq is already pointing at the directory that contains them and, to Logseq’s credit, is not freaking out too much other than the occasional “Large block will not be editable or searchable to not slow down the app, please use another editor to edit this block.” message. (Poor Tomboy, if it had taken this approach i might still be able to use it today.) The format here is more all over the place and there is probably not enough consistent interlinking of any kind to be worth trying to preserve, but i figure i should at least fix the note titles so i’m not making a bunch of references to [[example-organization]] when i should be using [[Example Organization]].
  - As close as there is to a typical one would have a filename like example-organization.md and contents that look like this:
    - ```
    # Example organization
    
    > We manufacture examples.
    
    [About Example Organization](https://example.com/about)
    
    123 Something Street
    Nowheresville, MA 12345
```
- Another large number of these would be really cool to convert into journals. Probably not worth the effort, but in case anyone has quick code hands or recommendations, they all have titles like 2021-03-19-near-north-camp-defense-thoughts.md and then content that typically does not have a title or really any other formatting to speak of:
  - ```
  militant defenders appeared to be the key
  
  but possibly could have held off without arrests (only physical contact with police being shield wall for instance)
  
  
  
  
  An unjust economy built on genocide, theft, and exploitation so extreme it required the invention of racism has left thousands of people without housing
```
  - As with all of these, any formatting that it has would be standard markdown.
  - The important thing is these are all plain markdown files, but some have # Heading 1 titles and others have none at all.
  - A small matter, i have followed a format of c: 2019-01-17 for created date if that can easily be converted into some kind of useful metadata in Logseq but that’s really not important at all.
- Tools for making text and markdown files more Logseq friendly:
  - dundalek/longdown: Convert longform markdown files to outline format used by Logseq
  - Tools for formats more specific than text or markdown, but which may be better built and better to build on?
  - What have i missed? Other recommendations or simple little scripts people have used and would like to share now?
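
A minimal sketch of the Zim cleanup asked about above (drop the header, drop the title line, remove underscores from the filename, turn paragraphs into list items). It assumes every note has the same four header keys as the example; the function names are made up for illustration and are not taken from any existing tool.

```python
import re
from pathlib import Path

# Header keys seen in the example note above; other Zim exports may differ.
ZIM_HEADER_KEYS = ("Content-Type:", "Wiki-Format:", "Creation-Date:", "Modification-Date:")


def strip_zim_header(text):
    """Return (title, body) with the Zim header lines and the title line removed."""
    lines = text.splitlines()
    i = 0
    # Skip the "Key: value" header lines at the top of the file.
    while i < len(lines) and lines[i].startswith(ZIM_HEADER_KEYS):
        i += 1
    # Skip blank lines between the header and the title.
    while i < len(lines) and not lines[i].strip():
        i += 1
    title = ""
    if i < len(lines):
        # Zim titles look like "====== Example Note ======".
        match = re.fullmatch(r"=+\s*(.*?)\s*=+", lines[i].strip())
        if match:
            title = match.group(1)
            i += 1
    return title, "\n".join(lines[i:]).strip()


def paragraphs_to_bullets(body):
    """Turn each blank-line-separated paragraph into one "- " list item."""
    paragraphs = [p for p in re.split(r"\n\s*\n", body) if p.strip()]
    out = []
    for paragraph in paragraphs:
        first, *rest = paragraph.strip().splitlines()
        out.append("- " + first.strip())
        # Remaining lines of the paragraph stay inside the same list item.
        out.extend("  " + line.strip() for line in rest)
    return "\n".join(out)


def logseq_filename(zim_filename):
    """Example_Note.txt -> "Example Note.md"."""
    return Path(zim_filename).stem.replace("_", " ") + ".md"
```

The title is returned rather than silently discarded so it can be compared against the filename before it is thrown away.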

mlncn · May 27, 2024, 11:05pm

Already posted above the scripts i found— off in the other resources category of the Awesome Logseq listing.

I could not find any documentation on importing except for this empty page which is presumably meant to document the Import feature in the three-dots menu. That, for its part, only has “Import existing notes” and promises “If they are in a JSON, EDN or Markdown format Logseq can work with them.” but only has these buttons:

RoamResearch Import a JSON Export of your Roam graph
EDN / JSON Import an EDN or a JSON Export of your Logseq graph
OPML Import OPML files

Nothing for plain text or plain markdown. And besides, as covered above, Logseq can already see my markdown files but i still need to improve the formatting and filenames.
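
For the dated notes described in the first post, the filename half of that cleanup could be sketched as below. It assumes Logseq's journal files are named in the `yyyy_MM_dd.md` style (the default as far as i know, but configurable, so check your own graph); `journal_target` is a hypothetical helper name.

```python
import re
from datetime import date

# Matches names like 2021-03-19-near-north-camp-defense-thoughts.md
DATED_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.md$")


def journal_target(filename):
    """Return (journal_filename, heading) for a dated note, or None if it is not one."""
    match = DATED_NAME.match(filename)
    if not match:
        return None
    year, month, day, slug = match.groups()
    try:
        # Reject things that look like dates but are not (month 13, day 40, ...).
        date(int(year), int(month), int(day))
    except ValueError:
        return None
    # ASSUMPTION: journal files are named yyyy_MM_dd.md.
    return f"{year}_{month}_{day}.md", slug.replace("-", " ")
```

Several notes can share a date, so anything built on this would need to append to an existing journal file rather than overwrite it; the returned heading could serve as the parent bullet for each note's content.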

Feature request to import from markdown that mentions the need to add bullets and to move assets to an assets directory and fix references to assets, which i realize now i have a small need for also:

Import standard markdown to Logseq format - Feedback / Feature Requests - Logseq

But fixing up frontmatter and to some extent fixing filenames will continue to require customized scripts for random formats.
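
As one example of such a customized script, here is a small sketch for the two fixes mentioned in the first post: turning a `c: 2019-01-17` line into a Logseq `key:: value` property, and turning a slug filename into a page title. The property name `created` is an arbitrary choice for illustration, and both function names are hypothetical.

```python
import re

# A line consisting only of "c: YYYY-MM-DD" (spaces or tabs allowed around the date).
CREATED_LINE = re.compile(r"^c:[ \t]*(\d{4}-\d{2}-\d{2})[ \t]*$", re.MULTILINE)


def created_to_property(text):
    """Rewrite "c: 2019-01-17" lines as "created:: 2019-01-17"."""
    # ASSUMPTION: "created" is just an illustrative property name.
    return CREATED_LINE.sub(r"created:: \1", text)


def title_from_slug(filename):
    """example-organization.md -> "Example Organization"."""
    stem = filename[:-3] if filename.endswith(".md") else filename
    # capitalize() lowercases the rest of each word, so acronyms would need manual fixing.
    return " ".join(word.capitalize() for word in stem.split("-"))
```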

Other import questions from the forums:

Import process to import 3.5k notes from Apple Notes and 15 years of Things App - Questions & Help - Logseq
- This one is by far the closest to what i need, and had a lot of engagement, but did not end up with a link to any working script.
Advice for converting many long Word Docs to LogSeq? - Questions & Help - Logseq
- Another very similar use case; some good discussion but no code yet.
How can I import my notes from Dendron to Logseq? - Questions & Help - Logseq
- Some good working-out-in-public from a linked script, though the format it needs to come from is at my best guess quite different from my need.
Some useful info for importing stuff into Logseq - General - Logseq
- Good blog post about a mixed manual and machine-aided approach.
How do i import another DB of .txt files into Logseq? - Questions & Help - Logseq
- This one has a helpful response pointing to a Mac tool that would do part of what i need for the 10,000, changing the file extension from .txt to .md.
PLEASE - Need help importing my graph from Reflect Notes - Questions & Help - Logseq
- One me-too response, no help.
OPML Import not working. Is there a workaround to import from Dynalist? - Feedback / Bug Reports - Logseq
- Not that related; no resolution.
How to Import Text File into Journals? - Questions & Help - Logseq
- Different scenario (a single file rather than many) but some code that still may be useful!
Remnote import feature - Feedback / Feature Requests - Logseq
- Did not even get into the details of how but some nice discussion on why to leave proprietary data silos.
Migrating from big ZIM wiki to Logseq, are there scaling or other issues to expect? - Questions & Help - Logseq
- So someone was able to import Zim wiki files but did not say how, and stopped using Logseq because the graph view froze with their data/computer.
Import/Convert WikidPad-Pages - Questions & Help - Logseq
- No responses.
What does the ‘Import files from the local directory’ button do? - Questions & Help - Logseq
- No responses.
How to import markdown files without re-indexing? - Questions & Help - Logseq
- Turned out someone writing an import script found a bug (unfortunately the import script was not shared).
Import Markdown files into Logseq failed - Questions & Help - Logseq
- At least two people reporting that Logseq is not “seeing” markdown files in its directory. Interestingly i have not had that problem but it is possibly one that conversion to more Logseq-friendly formatting would fix.
Can’t import because of duplicate filenames? - Feedback - Logseq
- Not my need but points to the need for an import tool that can combine folder names with filenames to flatten the directory structure while creating unique filenames.
Import from a spreadsheet into logseq (each row a page, as outline not as tables) / one time automated linking? - Questions & Help - Logseq
- Some cool help here, but pretty outside my needs.
A lot of empty pages after import - Feedback / Archive - Logseq
- Some really useful debugging pointing to the importance, it would seem, of keeping filenames to plain letters (no accents) if possible to avoid troubles, and an (apparently unfulfilled) intention to share a migration guide.
Missing content after import (not entirely) - Feedback / Bug Reports - Logseq
- One me-too but no useful debugging like above, but still points to the need for more help around importing existing markdown files.

There is clearly the need to have some conversion tools that people can modify to help them with getting their files formatted for Logseq.

mlncn · June 23, 2024, 4:29am

Here is my simple python script (in progress) for my 15,000 notes and addresses, removing unused Zimwiki frontmatter and converting the title to Logseq-amenable metadata:

mlncn · June 24, 2024, 1:47pm

That script is now not so simple and does everything i need for the ~15,000 batch of notes. I planned to keep it simple and then use Longdown, but did not for two reasons:

- It broke my frontmatter (properties/metadata).
- The decisions it made about when to make a Logseq bullet and when not were kind of the opposite of the way i made notes: when i have two spaces between paragraphs it is probably a single section of text that does not need to be bulleted, and when i have a bunch of lines with no space they are probably sequential points that should be bulleted. So i deleted blank spaces and bulleted everything (except where i had """, which i sometimes used to offset quotations; i did not try to turn those into Markdown block quotes because there was no real reason to).
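
The bulleting rule described above could be sketched roughly like this. It is not the code from the linked script; it assumes the `"""` markers sit on lines of their own, and it keeps everything between a pair of them inside a single bullet.

```python
def bullet_lines(text):
    """Drop blank lines and bullet every line, keeping triple-quoted spans in one bullet."""
    out = []
    in_quote = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are simply deleted
        if line == '"""':
            # ASSUMPTION: a line containing only three double quotes opens or closes a quotation.
            out.append(('  ' if in_quote else '- ') + '"""')
            in_quote = not in_quote
            continue
        # Quoted lines are indented so they stay part of the bullet that opened the quotation.
        out.append(("  " if in_quote else "- ") + line)
    return "\n".join(out)
```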

mlncn · June 25, 2024, 4:42pm

Important to note for the ~3,000 described second in the original post above: all i did was put the markdown files into Logseq, or rather, i started a Logseq project in the notes folder, so all the legacy notes are in the main directory and all the new Logseq ones are in Pages.

For what turned out to be the more than 17,000 notes that were originally in Tomboy and converted to ZimWiki .txt files (i moved the Notebooks folder, which had ~10,000 of its own notes plus additional subfolder notebooks), here is what i did:

Had to run grep -axv '.*' *.txt to find a rogue file that blew up python, and iconv -f UTF-8 -t UTF-8//IGNORE Example_File_broken.txt >Example_File_fixed.txt to fix it (moving the file first so it would end up with the correct filename). But this frighteningly un-artful script, logseq_helpers/zimwiki_txt_to_logseq_md.py at main - agaric/logseq_helpers - Agaric's Forgejo, did the job for me, getting the 17,343 Tomboy → Zimwiki notes into Logseq.
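
The same check can be done from Python before running a conversion, which avoids the script blowing up halfway through a folder. This is a sketch with hypothetical function names: it finds files that are not valid UTF-8 and drops the offending bytes, which is roughly what the iconv //IGNORE call does.

```python
from pathlib import Path


def is_valid_utf8(data):
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def drop_invalid_bytes(data):
    """Decode as UTF-8, silently discarding any bytes that are not valid."""
    return data.decode("utf-8", errors="ignore")


def find_invalid_utf8(folder, pattern="*.txt"):
    """List the files in a folder whose contents are not valid UTF-8."""
    return [
        path
        for path in sorted(Path(folder).glob(pattern))
        if not is_valid_utf8(path.read_bytes())
    ]
```

Discarding bytes loses whatever character was there, so it is worth looking at each file this reports before overwriting it.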

Simply cd'd into each folder i wanted to convert and ran:

python /home/mlncn/Projects/agaric/logseq_helpers/zimwiki_txt_to_logseq_md.py

and then checked some of the results; if things looked good, then rm *.txt (or if not, removed the markdown, tweaked the script, and ran again).

If you have code comments with this formatting:

/**
 * Comment.
 */

it does mess that up by turning the standalone asterisks into indented Logseq bullets, but i was not supposed to have code in this set of notes so i let it slide.

etc · June 28, 2024, 7:00am

Thank you for making and sharing this. I have two old zimwiki notebooks that I have been meaning to port. I will try it this weekend.

mlncn · June 28, 2024, 11:14am

@etc i’m afraid you will need to change the start of my script quite a bit. Because my notes were originally in Tomboy, my Zimwiki frontmatter was all identical, so my script confirms it is there (and identical) in a very clunky manner and throws it away. Yours probably won’t be identical, and you’ll probably want to pull out at least the creation date.
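
A more forgiving start could parse the header instead of requiring it to be identical. The sketch below is not from the linked script: it assumes the header runs from a first line starting with Content-Type: down to the first blank line, and that a real creation date begins with YYYY-MM-DD. The `created` property name and both function names are made up for illustration.

```python
import re


def split_zim_header(text):
    """Return (headers, body); headers is empty if the file has no Zim header."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("Content-Type:"):
        return {}, text
    headers = {}
    end = len(lines)
    for i, line in enumerate(lines):
        if not line.strip():
            end = i  # the first blank line ends the header
            break
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()
    return headers, "\n".join(lines[end + 1:])


def creation_property(headers):
    """Return a "created:: YYYY-MM-DD" line, or None if there is no usable date."""
    match = re.match(r"\d{4}-\d{2}-\d{2}", headers.get("Creation-Date", ""))
    # ASSUMPTION: "created" is an illustrative property name, not a Logseq built-in.
    return f"created:: {match.group(0)}" if match else None
```

With Tomboy-derived notes like mine the value is "Not found", so `creation_property` returns None and nothing is written.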

etc · August 10, 2024, 3:33pm

So… by this weekend I meant this weekend

After removing the frontmatter checks:

- it does as advertised.
- it is fast!

Items I would need to work on:

- It does not seem to go into subfolders (or I am doing something wrong).
  - I quickly ran it recursively through subfolders with a quick bash script, but additional changes are needed to flatten the structure into a single folder and rename the files such that Logseq understands the namespace structure.
- formatting of headings
- some solution to manage assets included within a note
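
The flattening step from the first item could be sketched as follows. It assumes the graph uses the triple-underscore filename convention for namespaces (so `Work___Projects___Note.md` becomes the page Work/Projects/Note); older graphs may use a different separator, so check the graph's config first. `namespaced_filename` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def namespaced_filename(relative_path, separator="___"):
    """Work/Projects/Example_Note.txt -> "Work___Projects___Example Note.md"."""
    path = PurePosixPath(relative_path)
    # Folder names become namespace levels; the file stem is the last level.
    parts = list(path.parent.parts) + [path.stem]
    # Zim uses underscores for spaces, so convert them within each level
    # before joining; otherwise they would blur into the separator.
    return separator.join(part.replace("_", " ") for part in parts) + ".md"
```

Paths are expected relative to the notebook root, with forward slashes.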

Thanks again for sharing your script!

Kurkurator · January 7, 2025, 10:32pm

Thank you - this is wonderful and serves as a great encouragement for migrating my external brains from zim to logseq.