A lot of empty pages after import

macedotavares · October 22, 2021, 8:21am

Hi,

I’ve imported 2000+ notes from Obsidian. Logseq took around 30 minutes to parse them. I then started checking them at random to see if anything looked wrong.

The problem

A lot of pages appear to be empty, even though their files are not. When this happens, the options to show the file in its directory and to open it in the default app are unavailable.

The workaround

If I write something in them (inside Logseq), I get a warning about different content on disk, and I have the option to restore it.

UPDATE: This is no use, because whenever I Refresh the graph the pages are erased again.

What I’ve tried

Restarting, re-indexing and refreshing.

The cause?

I haven’t figured out a pattern for this. Initially, I thought that it happened to notes with diacritics in the title, but that doesn’t seem to be a sufficient condition.

My setup

macOS 11.6
Logseq Desktop v. 0.4.4

UPDATE

I’ve listed all the filenames with diacritics (500+) and I’m going through their respective Logseq pages. The vast majority of them seem to suffer from the same problem.

On the upside, I think I’ve pinned down part of the problem:

When I search Logseq for the title, I get two (seemingly) identical entries. One is empty and the other has the original content. The root of the problem seems to be that the accented characters are encoded as two different Unicode code point sequences.

For example, one í is the single code point U+00ED (í Latin Small Letter I with Acute) and the other is an unaccented i (U+0069) followed by a combining acute accent (U+0301).
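To make the difference concrete, here is a minimal Python sketch of the two forms. It only uses the standard unicodedata module; the variable names are just for illustration.

```python
import unicodedata

composed = "\u00ed"      # í as a single precomposed code point
decomposed = "i\u0301"   # plain i followed by U+0301 COMBINING ACUTE ACCENT

# Both render as the same glyph, but they are different strings.
print(composed == decomposed)            # False
print(len(composed), len(decomposed))    # 1 2

# Normalizing to a common form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
```

Any lookup that compares titles as raw strings will treat these as two different pages, which would match the duplicate search entries.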

I kind of hope that it’s something on Logseq’s side, as I have no idea how to fix it.

UPDATE 2

I’ve been experimenting with Python’s unicodedata module, namely its normalize function. I ran all the files through a script that NFC-normalizes both filenames and contents, and then started a new graph from scratch. The problem persisted.
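For reference, a script of that kind could look roughly like the sketch below. It is not the exact script used here: the function names are hypothetical, and it assumes the notes are UTF-8 encoded .md files under a single root folder.

```python
import unicodedata
from pathlib import Path


def nfc_normalize_note(path: Path) -> Path:
    """Rewrite one note so its contents and filename are in NFC form."""
    text = path.read_text(encoding="utf-8")
    normalized_text = unicodedata.normalize("NFC", text)
    if normalized_text != text:
        path.write_text(normalized_text, encoding="utf-8")

    normalized_name = unicodedata.normalize("NFC", path.name)
    if normalized_name == path.name:
        return path

    target = path.with_name(normalized_name)
    # Do not clobber a different file that already has the NFC name.
    if target.exists() and not target.samefile(path):
        return path
    # On a filesystem that treats both forms as the same name,
    # this rename may have no visible effect.
    path.rename(target)
    return target


def nfc_normalize_graph(root: Path) -> int:
    """Normalize every .md file under root; return how many were visited."""
    count = 0
    for path in list(root.rglob("*.md")):
        nfc_normalize_note(path)
        count += 1
    return count
```

Running it on a copy of the notes first is the safer option, since it rewrites and renames files in place.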

Then, I noticed that the diacritics in the file names were still decomposed. Apparently, macOS decomposes them automatically, so it seems that there’s nothing I can do on that front.
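A quick way to check what the filesystem actually hands back is to list a folder and flag every name that is not in NFC form. This is a small sketch with a hypothetical helper name; it only reports names and changes nothing on disk.

```python
import os
import unicodedata
from typing import List


def decomposed_names(directory: str) -> List[str]:
    """Return the entries of `directory` whose names are not in NFC form."""
    return [
        name
        for name in os.listdir(directory)
        if unicodedata.normalize("NFC", name) != name
    ]
```

If names still show up in this list right after being renamed to their NFC form, the decomposition is happening below the script, at the filesystem level.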

Maybe normalising everything inside Logseq would solve it. Does this make any sense?
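To illustrate the idea, here is a toy sketch of a page index that normalizes titles before storing or looking them up. It is purely hypothetical and says nothing about how Logseq is actually implemented; the class and method names are made up.

```python
import unicodedata


class PageIndex:
    """Hypothetical title -> content index that ignores normalization form."""

    def __init__(self):
        self._pages = {}

    @staticmethod
    def _key(title):
        # Every title is reduced to NFC before it is used as a key.
        return unicodedata.normalize("NFC", title)

    def add(self, title, content):
        self._pages[self._key(title)] = content

    def get(self, title):
        return self._pages.get(self._key(title))


index = PageIndex()
index.add("di\u0301a", "original content")   # decomposed title, as read from disk
print(index.get("d\u00eda"))                  # composed title, as typed in a link
```

With this kind of lookup, the composed and decomposed titles resolve to the same page instead of producing one full entry and one empty one.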

Oleg_Lustenko · October 22, 2021, 8:41pm

Thanks for sharing your migration in public! Such feedback is very helpful!

macedotavares · October 22, 2021, 10:04pm

Thanks!

I intend to share a proper migration guide when it’s over (and maybe even a script).

In the meantime, I’ll have to wait for this bug to be fixed.