How important is pages-metadata.edn?

I have noticed this single-line file that keeps growing as I add pages and does not seem to contain more information than the creation and modification timestamps of my pages. I am wondering how important it is for Logseq to work properly, i.e. would I lose any information if I removed it and re-indexed my graph?

I am asking this because I use Syncthing to share my data across various machines and this file often gets conflicts. Also I suspect it could cause some data loss if some page gets updated in the background and the metadata does not match anymore. Basically I am thinking of asking Syncthing to ignore it so each machine works with its own file, but first would like to understand a bit better what it is for to avoid breaking something.

This is very closely related to the issues discussed in Format pages-metadata.edn before saving.

FWIW, I have stomped on versions of this file by simply picking one version in a Git merge and never had any problems. As we discuss in that linked suggestion, however, there is no way to reliably get that creation / modification info back. This will presumably mess up things like your recent pages list, etc. But your notes all keep working AFAICT.


Thanks for the answer. The introduction of newlines in that file has made it much easier to manage with version control, so my initial problem is mostly moot. :+1:

One thing that puzzles me a bit about this file, though, is that creation/modification dates should already be provided by the filesystem (e.g. mtime on *nix; ctime there is actually the inode change time rather than a creation time), so why duplicate them? Having everything in one file might also not scale well: I suppose the whole file is rewritten to update the metadata every time any page is modified, so the cost of each save grows linearly with the number of files in the graph.

While I can’t speak for the devs here, file creation and modification data is not reliable across arbitrary approaches to syncing across machines (a simple example: someone might use cp -r, and it is also impossible to track the behavior of the many systems for networked files). So I grok why an “it’s just files” approach like Logseq’s might not want to rely on filesystem metadata.
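To illustrate the point about plain copies, here is a minimal Python sketch (not part of Logseq; the file names are hypothetical) that backdates a file's modification time and then copies it two ways. shutil.copy copies content and permission bits but not timestamps, so the copy looks freshly modified, much like a plain cp -r; shutil.copy2 tries to preserve the mtime.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import Tuple


def copied_mtimes(old_mtime: float) -> Tuple[float, float, float]:
    """Create a file with an old mtime, copy it two ways, return all three mtimes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "page.md")
        src.write_text("- a note\n", encoding="utf-8")
        # Backdate the source file as if it had been edited long ago.
        os.utime(src, (old_mtime, old_mtime))

        # shutil.copy copies content and permission bits, not timestamps,
        # so the copy's mtime is "now".
        plain = Path(shutil.copy(src, Path(tmp, "plain.md")))
        # shutil.copy2 additionally tries to preserve metadata such as mtime.
        preserved = Path(shutil.copy2(src, Path(tmp, "preserved.md")))

        return src.stat().st_mtime, plain.stat().st_mtime, preserved.stat().st_mtime


if __name__ == "__main__":
    one_year_ago = time.time() - 365 * 24 * 3600
    src_m, plain_m, kept_m = copied_mtimes(one_year_ago)
    print("source:", time.ctime(src_m))
    print("copy:  ", time.ctime(plain_m))
    print("copy2: ", time.ctime(kept_m))
```

Whether timestamps survive therefore depends entirely on the tool doing the copying, which is exactly what an app cannot control.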

I’ve also noticed that pages-metadata.edn is responsible for many of my merge conflicts. So far, I’ve arbitrarily chosen the RHS or LHS side, but I suspect that in time, there may be information loss with this approach. Is there a guide somewhere on how to properly set up and use Git with more than one copy of Logseq Desktop?

See also pages-metadata.edn data structure does not scale with distributed workflow · Issue #3907 · logseq/logseq.

I wrote a small Python script for resolving conflicts in pages-metadata.edn.
The merge strategy is to take the minimum of the two files' created-at timestamps and the maximum of their updated-at timestamps.
Please take a look if you are interested:
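A minimal sketch of that merge rule (earliest created-at, latest updated-at) in Python. It is not the linked script: it assumes both conflicting versions have already been parsed into a hypothetical dict mapping each page name to a (created_at, updated_at) pair, and it leaves out reading and writing the EDN file.

```python
from typing import Dict, Tuple

# Hypothetical in-memory form: page name -> (created_at, updated_at),
# with timestamps as integers (e.g. milliseconds since the epoch).
Meta = Dict[str, Tuple[int, int]]


def merge_page_metadata(ours: Meta, theirs: Meta) -> Meta:
    """Merge two versions of the page metadata.

    For pages present in both versions, keep the earliest created-at and the
    latest updated-at timestamp. Pages present in only one version are kept
    unchanged. Neither input is modified.
    """
    merged: Meta = dict(ours)
    for name, (created, updated) in theirs.items():
        if name in merged:
            our_created, our_updated = merged[name]
            merged[name] = (min(our_created, created), max(our_updated, updated))
        else:
            merged[name] = (created, updated)
    return merged


if __name__ == "__main__":
    ours = {"foo": (100, 500), "bar": (200, 300)}
    theirs = {"foo": (90, 450), "baz": (400, 400)}
    print(merge_page_metadata(ours, theirs))
    # {'foo': (90, 500), 'bar': (200, 300), 'baz': (400, 400)}
```

Because min and max do not care about argument order, the result is the same whichever side Git labels "ours".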

Thanks @juzbox for sharing this script! For non-technical users, like myself, what directory should this .py script be placed in? Any other steps (beyond making executable) users should be aware of? Or cross-platform differences (I’m syncing my LogSeq graph via git/GitHub between two machines, one Linux, and one Windows)? This whole thing is a stopgap measure until LogSeq’s sync service comes online.

Thanks @Flaunster for your interest in my script.
The Python script can be placed anywhere. (I personally put it in my Logseq graph’s root directory, where you find subdirectories like assets, journals, logseq and pages.)
If no argument is supplied, the script treats the current working directory as the Logseq graph’s root directory. You can also supply the graph’s root directory as the first argument to the script.

E.g. you can run either:

$ cd path/to/your/logseq/graph/directory
$ python path/to/fix-pages-metadata-conflicts.py

or else

$ python path/to/fix-pages-metadata-conflicts.py path/to/your/logseq/graph/directory
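For readers curious how that argument handling and the conflict detection might fit together, here is a rough Python sketch. It is not the actual script: the function names are hypothetical, the location logseq/pages-metadata.edn inside the graph root is an assumption, and the splitter only understands Git's standard conflict markers (including the optional diff3 "base" section, which it drops).

```python
import sys
from pathlib import Path
from typing import List, Optional, Tuple


def metadata_path(argv: List[str]) -> Path:
    """Graph root is the first argument if given, else the current directory."""
    root = Path(argv[1]) if len(argv) > 1 else Path.cwd()
    # Assumption: the metadata file lives in the graph's logseq/ subdirectory.
    return root / "logseq" / "pages-metadata.edn"


def split_conflict(text: str) -> Optional[Tuple[str, str]]:
    """Split text containing Git conflict markers into (ours, theirs).

    Lines outside conflict blocks go to both sides; a diff3-style base
    section is dropped. Returns None if there are no conflict markers.
    """
    ours: List[str] = []
    theirs: List[str] = []
    state = None  # None, "ours", "base" or "theirs"
    found = False
    for line in text.splitlines(keepends=True):
        if state is None and line.startswith("<<<<<<<"):
            state, found = "ours", True
        elif state == "ours" and line.startswith("|||||||"):
            state = "base"
        elif state in ("ours", "base") and line.startswith("======="):
            state = "theirs"
        elif state == "theirs" and line.startswith(">>>>>>>"):
            state = None
        elif state is None:
            ours.append(line)
            theirs.append(line)
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
        # Lines in the "base" section are ignored.
    return ("".join(ours), "".join(theirs)) if found else None


if __name__ == "__main__":
    path = metadata_path(sys.argv)
    parts = split_conflict(path.read_text(encoding="utf-8")) if path.is_file() else None
    print(path, "has conflicts" if parts else "has no conflicts (or is missing)")
```

The two returned texts would then still need to be parsed and combined with the min/max rule before writing the file back.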

That was certainly easy! I've added it as an executable in the root directory, and so far so good. Thank you again.

UPDATE: Working well on Ubuntu 20.04 through LogSeq version 0.74 (installed through flatpak hub). Thanks again!

That is amazing! Thank you!
I use Git to sync between Android phones (with Termux) and Linux computers (with Logseq’s own auto-commit system), and was about to add pages-metadata.edn to my .gitignore before stumbling upon this. I'm certainly going to try it first.
There is one question, though. On Android the sync process is manual (I press buttons and scripts get run), so I know how to add your script to those (there must be an easy way to use Python inside Termux).
But how can we integrate your script into Logseq’s auto-commit system? Do we just put the command that calls it inside ‘.git/hooks/pre-commit’?

Not sure if this will work, but I run my Python script that accesses multiple Logseq graphs (I have several) in the Pydroid 3 app on Android 11.

It seems like in the latest changelog it was announced that pages-metadata.edn is no longer being used. I would really like to know what is being done instead currently to persist that metadata (creation and modification).


I agree, @danilofaria: how are dates being stored now? This seems like a big, unanswered question. I have written about this here:

My current understanding is that they’re simply using the file’s metadata. The limitation of that is that if you’re using git then you won’t always have the right metadata since git does not synchronize file metadata, only file content. The other limitation is that for pages without files, the metadata is reset once the app is restarted.
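For illustration, a short Python sketch of reading the timestamps a filesystem exposes (the helper name is hypothetical, and this is not how Logseq itself reads them). Git stores file content but not modification times, so after a clone or checkout the mtime reflects when Git wrote the file. A true creation time is only exposed as st_birthtime on some platforms (e.g. macOS and other BSDs), so the sketch returns None for it where it is missing.

```python
import os
from datetime import datetime, timezone
from typing import Optional, Tuple


def file_timestamps(path: str) -> Tuple[Optional[datetime], datetime]:
    """Return (created, modified) for a path, as far as the OS reports them.

    st_mtime is available everywhere. A creation ("birth") time is only
    available as st_birthtime on some platforms, so created is None where
    it is missing. On Linux/Unix, st_ctime is the inode change time, not a
    creation time, so it is deliberately not used as a fallback.
    """
    st = os.stat(path)
    modified = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
    birth = getattr(st, "st_birthtime", None)
    created = datetime.fromtimestamp(birth, tz=timezone.utc) if birth is not None else None
    return created, modified


if __name__ == "__main__":
    created, modified = file_timestamps(".")
    print("created:", created, "modified:", modified)
```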

@danilofaria @BenjiFrank
Hope my reply on GitHub is somewhat helpful: "Created At" date updating after Re-Index · Issue #8556 · logseq/logseq · GitHub