How important is pages-metadata.edn?

I have noticed this single-line file, which keeps growing as I add pages and does not seem to contain anything beyond creation and modification timestamps for my pages. I am wondering how important it is for Logseq to work properly, i.e. would I lose any information if I removed it and re-indexed my graph?

I am asking this because I use Syncthing to share my data across various machines and this file often gets conflicts. Also, I suspect it could cause some data loss if a page gets updated in the background and the metadata no longer matches. Basically, I am thinking of telling Syncthing to ignore it so each machine works with its own copy, but first I would like to understand a bit better what it is for, to avoid breaking something.

This is very closely related to the issues discussed in "Format pages-metadata.edn before saving".

FWIW, I have stomped on versions of this file by simply picking one version in a Git merge and never had any problems. As we discuss in that linked suggestion, however, there is no way to reliably get that creation / modification info back. This will presumably mess up things like your recent pages list, etc. But your notes all keep working AFAICT.


Thanks for the answer. The introduction of newlines in that file has made it much easier to manage with version control, so my initial problem is mostly moot thanks to that. 👍

One thing that puzzles me a bit about this file, though, is that creation/modification dates should already be provided by the filesystem (e.g. ctime/mtime on *nix), so why duplicate them? Having everything in one file might also not scale well: I suppose the whole file is rewritten every time another file is modified to update the metadata, so the cost of each write grows linearly with the number of files on disk.

While I can’t speak for the devs here, file creation and modification times are not reliable across arbitrary approaches to syncing between machines (simple example: someone might use cp -r, which does not preserve timestamps by default, and it’s also impossible to track the behavior of the many systems for networked files). So, I grok why an “it’s just files” approach like Logseq’s might not want to rely on filesystem metadata.
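To make the cp -r point concrete, here is a small Python sketch (not anything Logseq does, just an illustration). It gives a file an old modification time, copies it with and without metadata preservation, and compares the results. The helper name `copy_and_compare_mtime` and the file names are made up for the example; `shutil.copyfile` stands in for a plain `cp`, and `shutil.copy2` for `cp -p`.

```python
import os
import shutil
import tempfile
from typing import Tuple


def copy_and_compare_mtime(preserve: bool) -> Tuple[float, float]:
    """Copy a file and return (source_mtime, copy_mtime) in epoch seconds.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "page.md")
        dst = os.path.join(tmp, "page-copy.md")
        with open(src, "w", encoding="utf-8") as f:
            f.write("- some note\n")

        # Pretend the page was last edited long ago (epoch 1_000_000_000).
        os.utime(src, (1_000_000_000, 1_000_000_000))

        if preserve:
            shutil.copy2(src, dst)     # copies data and timestamps, like `cp -p`
        else:
            shutil.copyfile(src, dst)  # copies data only, like plain `cp`

        return os.stat(src).st_mtime, os.stat(dst).st_mtime


if __name__ == "__main__":
    print("plain copy:     ", copy_and_compare_mtime(preserve=False))
    print("preserving copy:", copy_and_compare_mtime(preserve=True))
```

With the plain copy, the destination gets the time of the copy rather than the time of the last edit, so any tool that trusted mtime would think every page had just been modified.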

I’ve also noticed that pages-metadata.edn is responsible for many of my merge conflicts. So far, I’ve arbitrarily chosen either the LHS or the RHS, but I suspect that, in time, this approach may lose information. Is there a guide somewhere on how to properly set up and use Git with more than one copy of Logseq Desktop?
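In the absence of such a guide, one alternative to picking a side wholesale would be a per-page merge. The sketch below is only an illustration under loud assumptions: it does not parse EDN, the record fields `name`, `created_at` and `updated_at` are hypothetical stand-ins for whatever the file actually stores, and the policy (keep the earliest creation time and the latest modification time for each page) is my guess at a sensible rule, not documented Logseq behavior.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]  # hypothetical shape: name, created_at, updated_at


def merge_page_metadata(ours: Iterable[Record], theirs: Iterable[Record]) -> List[Record]:
    """Merge two versions of per-page timestamps without dropping either side.

    Assumed policy: earliest created_at wins, latest updated_at wins.
    """
    merged: Dict[str, Record] = {}
    for record in list(ours) + list(theirs):
        name = record["name"]
        seen = merged.get(name)
        if seen is None:
            # First time we see this page: keep a copy of the record.
            merged[name] = dict(record)
        else:
            # Page exists on both sides: combine the timestamps.
            seen["created_at"] = min(seen["created_at"], record["created_at"])
            seen["updated_at"] = max(seen["updated_at"], record["updated_at"])
    # Sort by page name so the output is stable and diffs stay small.
    return sorted(merged.values(), key=lambda r: r["name"])


if __name__ == "__main__":
    ours = [{"name": "a", "created_at": 100, "updated_at": 200}]
    theirs = [{"name": "a", "created_at": 90, "updated_at": 150},
              {"name": "c", "created_at": 10, "updated_at": 20}]
    for row in merge_page_metadata(ours, theirs):
        print(row)
```

Pages that exist on only one side are kept as they are, so neither machine's entries are silently discarded the way they are when one side of the conflict is chosen outright.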

See also "pages-metadata.edn data structure does not scale with distributed workflow" (Issue #3907, logseq/logseq).