Format pages-metadata.edn before saving

The logseq/pages-metadata.edn file is currently saved as a single line, which makes it impossible to track its git history: every change to any page modifies that one line, so git treats the whole file as rewritten on every save.

I suggest pretty-printing it before saving, ideally with the array entries and map keys sorted deterministically. That would limit the diff in pages-metadata.edn to only the pages we actually changed instead of rewriting the entire single line every time.
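To make the suggestion concrete, here’s a minimal Clojure sketch of the kind of normalization I mean. It is not the code from the PR; the function names are mine, and it assumes the file parses as plain EDN collections whose top-level vector can be ordered by a :block/name key.

```clojure
(ns example.format-metadata
  (:require [clojure.edn :as edn]
            [clojure.pprint :as pprint]
            [clojure.walk :as walk]))

(defn sort-maps
  "Recursively turn every map into a sorted map so key order is deterministic.
   Assumes keys within a map are mutually comparable (e.g. all keywords)."
  [data]
  (walk/postwalk #(if (map? %) (into (sorted-map) %) %) data))

(defn sort-pages
  "If the top level is a vector of page maps, order it by :block/name
   (assumed key) so the pages themselves come out in a stable order."
  [data]
  (if (and (vector? data) (every? map? data))
    (vec (sort-by :block/name data))
    data))

(defn pretty-edn
  "Pretty-print EDN with stable ordering, one entry per line."
  [data]
  (with-out-str (pprint/pprint (-> data sort-maps sort-pages))))

(defn rewrite-metadata-file!
  "Read pages-metadata.edn, normalize it, and write it back pretty-printed."
  [path]
  (->> (slurp path)
       edn/read-string
       pretty-edn
       (spit path)))
```

With something like this, editing one page only touches that page’s lines in the diff.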

The single-line format also makes it extremely hard to merge two git branches - and diverging branches happen a lot, because Logseq often fails to commit pages-metadata.edn on exit.

This problem makes it impossible to run Logseq on two different machines if you’re using git to manage your Logseq data. Sorting is definitely required, since the file is essentially a hash table dump and its entry order isn’t stable between writes.

I have simply used the version from one branch, and no problems yet. I’m sure it introduces small issues, but I need to use this on multiple machines, so…

I’ve created a PR with a first iteration of the formatting: enhance: pretty print pages-metadata.edn by viktomas · Pull Request #2909 · logseq/logseq

I’m just now coming back to this - but I’m also wondering whether there is any data in pages-metadata.edn that isn’t somehow available via the Markdown. If there isn’t, could we just NOT TRACK pages-metadata.edn in Git?
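If everything in the file really were derivable from the Markdown, the workaround would just be to stop tracking it - a sketch, and only safe under exactly that assumption:

```
# Only safe if pages-metadata.edn holds nothing that can't be rebuilt
# from the Markdown files - see the reply below about timestamps.
logseq/pages-metadata.edn
```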


pages-metadata.edn contains timestamps for when each page was created and updated, and the page’s Markdown file doesn’t contain these timestamps (AFAIK).
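For context, an entry in that file looks roughly like this (field names and values are illustrative, written from memory rather than copied from a real graph):

```clojure
;; Illustrative pages-metadata.edn entry; timestamps are epoch milliseconds
;; and have no counterpart in the page's .md file.
{:block/name       "my page"
 :block/created-at 1614601628782
 :block/updated-at 1616238601690}
```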


Interesting… I agree, you’re not gonna get this easily from Git-sync’d files. And definitely not from some other forms of transport.

This is the kind of thing that you COULD track well enough with Git if Logseq created a dependency on Git… but that also seems like a bad idea. However, it could be done in a more “git-compatible” way. We did this at Gigantum for tracking Jupyter and RStudio notebook activity. The idea isn’t strange to a Clojure programmer or a Git programmer: you have records whose history is immutable - you simply leave previous records alone and only append new ones.

Of course, this might get expensive to crawl once you have a long history - but then something like pages-metadata.edn could be constructed and updated as the current state of the system.
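A rough sketch of that idea, with everything here hypothetical (the names and the log format are mine, not anything Logseq or Gigantum actually ships): each metadata change is appended as an immutable record, and the equivalent of pages-metadata.edn is derived by folding over the log, so concurrent branches only ever add lines and merge cleanly in git.

```clojure
(ns example.metadata-log
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io]))

(defn append-record!
  "Append one immutable record per metadata change; earlier records are
   never edited, so two machines writing concurrently only add lines."
  [log-path record]
  (spit log-path (str (pr-str record) "\n") :append true))

(defn current-state
  "Replay the log to get the latest metadata per page - the same information
   pages-metadata.edn holds today, but computed instead of stored."
  [log-path]
  (with-open [rdr (io/reader log-path)]
    (reduce (fn [state {:keys [page] :as record}]
              (update state page merge (dissoc record :page)))
            {}
            (map edn/read-string (line-seq rdr)))))

(comment
  (append-record! "metadata.log" {:page "my page" :updated-at 1616238601690})
  (current-state "metadata.log"))
```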

Again, this is all stuff I’ve worked on in a more data science context. I’d be happy to do a design session and maybe even help implement. Not sure if it’d be worth it at this point (or how to suggest it where it’d be considered), but this feels like one weak point in an otherwise really tight system.


Cross-linking a related issue: pages-metadata.edn data structure does not scale with distributed workflow · Issue #3907 · logseq/logseq.


Linking here some information from another discussion that might be useful as a temporary solution, for those like me who absolutely need sync working. Looking forward to seeing a definitive enhancement, though.