Advice for converting many long Word Docs to LogSeq?

I have hundreds of Word docs/Google Docs I’d like to just import into LogSeq. About 10 of them are 100+ pages, as I used to keep a doc open every day and use it as my running journal. Anyone have any tips for how to import these into LogSeq? I think they’d be much more useful inside LogSeq.

Ideas I have had:

  • Export the docs to plaintext using Word’s export function and merge a bunch into one big text doc. Then run a FIND + REPLACE to add a dash before every line to put it in LogSeq format, plus some further FIND + REPLACE passes I come up with to clean it up a bit.
  • Use something like pandoc (https://pandoc.org/) to convert them, then run the output through a similar cleanup pass

Or hopefully someone has come up with something else that is way awesomer than this drudgery!

After reading your question, the first thing that came to my mind is just using pandoc to convert to Markdown.

You can convert each of them to a .md file (a logseq page). You’d need to put a dash (- ) at the start of every paragraph if you want each paragraph to be a block.
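A rough Python sketch of that idea, in case it helps. It assumes pandoc is installed and on your PATH; the function names and the choice of `--wrap=none` (which keeps each paragraph on one line) are mine, not anything Logseq requires. It treats every blank-line-separated paragraph as one block, so nested lists, tables and code blocks in the converted Markdown would still need a manual look.

```python
import re
import subprocess
from pathlib import Path


def pandoc_command(src: Path, dest: Path) -> list[str]:
    """Build a pandoc call that converts a .docx file to Markdown."""
    # --wrap=none stops pandoc from hard-wrapping paragraphs across lines.
    return ["pandoc", str(src), "-f", "docx", "-t", "markdown",
            "--wrap=none", "-o", str(dest)]


def paragraphs_to_blocks(markdown: str) -> str:
    """Prefix every blank-line-separated paragraph with "- "."""
    blocks = []
    for paragraph in re.split(r"\n\s*\n", markdown):
        lines = [ln.strip() for ln in paragraph.splitlines() if ln.strip()]
        if not lines:
            continue
        # Continuation lines get two spaces so they stay inside the same block.
        blocks.append("- " + "\n  ".join(lines))
    return "\n".join(blocks) + "\n" if blocks else ""


def convert(src: Path, pages_dir: Path) -> Path:
    """Convert one .docx into a dash-prefixed .md page (hypothetical helper)."""
    dest = pages_dir / (src.stem + ".md")
    subprocess.run(pandoc_command(src, dest), check=True)
    dest.write_text(paragraphs_to_blocks(dest.read_text(encoding="utf-8")),
                    encoding="utf-8")
    return dest
```

To process a whole folder, you could loop over `Path("docs").glob("*.docx")` and call `convert(doc, Path("pages"))` for each file; both folder names are placeholders.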

If these documents are static and won’t be edited anymore, you could export them as PDF and just put the PDFs in logseq.

I like the PDF idea. I was dreading the transfer process from Google Docs/MS Word

Thanks!

I am working on importing Google Docs into Logseq 0.10.8.
It seems to work quite well if you copy all of the Google Docs text, leaving the ToC and the footnotes out of the selection.

This is then easily pasted into a new Logseq 0.10.8 page.

The manual cleanup still needed, which it would be great to automate somehow, is:

  1. tab/indent headings H2-H6 (H1 is fine on the left margin of the page);

  2. tab/indent text under headings H1-H6;

  3. Citations look like they’ll have to be added manually: type [^1] in the text, go to the bottom of the page, type [^1]: followed by the footnote text, and increment the numbers by hand.
    Automated citations/footnotes would be handy.
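
Steps 1 and 2 could probably be scripted. Below is a minimal Python sketch, assuming the page text is Markdown with `#`-style headings and one paragraph per line; the function name `outline_by_heading` and the tab indent are my own choices. It makes every non-empty line a block, indents each heading by its level minus one, and puts body text one level below the most recent heading. It does not touch footnotes, and it would mis-nest documents that skip heading levels or contain `#` lines inside code blocks.

```python
import re

# Matches Markdown headings from "# H1" to "###### H6".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def outline_by_heading(markdown: str, indent: str = "\t") -> str:
    """Nest headings and their text as indented "- " blocks."""
    out = []
    depth = 0  # indent level for body text; 0 until the first heading appears
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no content in an outline
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            # H1 sits at the left margin, H2 one level in, and so on.
            out.append(f"{indent * (level - 1)}- {line}")
            depth = level
        else:
            # Body text goes one level deeper than its heading.
            out.append(f"{indent * depth}- {line}")
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    sample = "# Title\nIntro\n\n## Part\nBody\n"
    print(outline_by_heading(sample), end="")
```

The output of that sample is `- # Title`, then `- Intro` and `- ## Part` one tab in, then `- Body` two tabs in.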