Format pages-metadata.edn before saving

The logseq/pages-metadata.edn file is currently saved as a single line, which makes it impossible to track its git history: every change to any page modifies that one line, so git treats the whole file as rewritten on every save.

I suggest pretty-printing it before saving, ideally with the array entries and map keys sorted deterministically. That would limit the diff in pages-metadata.edn to only the pages we actually changed instead of rewriting the entire single line every time.
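To make the suggestion concrete, here’s a minimal Clojure sketch of the kind of normalization I mean. It is not the code from the PR; the function names are mine, and it assumes the file parses as plain EDN collections whose top-level vector can be ordered by a :block/name key.

```clojure
(ns example.format-metadata
  (:require [clojure.edn :as edn]
            [clojure.pprint :as pprint]
            [clojure.walk :as walk]))

(defn sort-maps
  "Recursively turn every map into a sorted map so key order is deterministic.
   Assumes keys within a map are mutually comparable (e.g. all keywords)."
  [data]
  (walk/postwalk #(if (map? %) (into (sorted-map) %) %) data))

(defn sort-pages
  "If the top level is a vector of page maps, order it by :block/name
   (assumed key) so the pages themselves come out in a stable order."
  [data]
  (if (and (vector? data) (every? map? data))
    (vec (sort-by :block/name data))
    data))

(defn pretty-edn
  "Pretty-print EDN with stable ordering, one entry per line."
  [data]
  (with-out-str (pprint/pprint (-> data sort-maps sort-pages))))

(defn rewrite-metadata-file!
  "Read pages-metadata.edn, normalize it, and write it back pretty-printed."
  [path]
  (->> (slurp path)
       edn/read-string
       pretty-edn
       (spit path)))
```

With something like this, editing one page only touches that page’s lines in the diff.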

The single-line format also makes it extremely hard to merge two git branches - and diverging branches happen a lot, because Logseq often fails to commit pages-metadata.edn on exit.

This problem makes it impossible to run Logseq on two different machines if you’re using git to manage your Logseq data. Sorting is definitely required, since the file is essentially a hash table dump and its entry order isn’t stable between writes.

I have simply used the version from one branch, and no problems yet. I’m sure it introduces small issues, but I need to use this on multiple machines, so…

I’ve created a PR with a first iteration of the formatting: enhance: pretty print pages-metadata.edn by viktomas · Pull Request #2909 · logseq/logseq

I’m just now coming back to this - but I’m also wondering whether there is any data in pages-metadata.edn that isn’t somehow available via the Markdown. If there isn’t, could we just NOT TRACK pages-metadata.edn in Git?
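If everything in the file really were derivable from the Markdown, the workaround would just be to stop tracking it - a sketch, and only safe under exactly that assumption:

```
# Only safe if pages-metadata.edn holds nothing that can't be rebuilt
# from the Markdown files - see the reply below about timestamps.
logseq/pages-metadata.edn
```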


pages-metadata.edn contains timestamps for when each page was created and updated, and the page’s Markdown file doesn’t contain these timestamps (AFAIK).
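For context, an entry in that file looks roughly like this (field names and values are illustrative, written from memory rather than copied from a real graph):

```clojure
;; Illustrative pages-metadata.edn entry; timestamps are epoch milliseconds
;; and have no counterpart in the page's .md file.
{:block/name       "my page"
 :block/created-at 1614601628782
 :block/updated-at 1616238601690}
```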


Interesting… I agree, you’re not gonna get this easily from Git-sync’d files. And definitely not from some other forms of transport.

This is the kind of thing that you COULD track well enough with Git if Logseq created a dependency on Git… but that also seems like a bad idea. However, it could be done in a more “git-compatible” way. We did this at Gigantum for tracking Jupyter and RStudio notebook activity. The idea isn’t strange to a Clojure programmer or a Git programmer: you have records whose history is immutable - you simply leave previous records alone and only append new ones.

Of course, this might get expensive to crawl once you have a long history - but then something like pages-metadata.edn could be constructed and updated as the current state of the system.
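A rough sketch of that idea, with everything here hypothetical (the names and the log format are mine, not anything Logseq or Gigantum actually ships): each metadata change is appended as an immutable record, and the equivalent of pages-metadata.edn is derived by folding over the log, so concurrent branches only ever add lines and merge cleanly in git.

```clojure
(ns example.metadata-log
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io]))

(defn append-record!
  "Append one immutable record per metadata change; earlier records are
   never edited, so two machines writing concurrently only add lines."
  [log-path record]
  (spit log-path (str (pr-str record) "\n") :append true))

(defn current-state
  "Replay the log to get the latest metadata per page - the same information
   pages-metadata.edn holds today, but computed instead of stored."
  [log-path]
  (with-open [rdr (io/reader log-path)]
    (reduce (fn [state {:keys [page] :as record}]
              (update state page merge (dissoc record :page)))
            {}
            (map edn/read-string (line-seq rdr)))))

(comment
  (append-record! "metadata.log" {:page "my page" :updated-at 1616238601690})
  (current-state "metadata.log"))
```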

Again, this is all stuff I’ve worked on in a more data science context. I’d be happy to do a design session and maybe even help implement. Not sure if it’d be worth it at this point (or how to suggest it where it’d be considered), but this feels like one weak point in an otherwise really tight system.


Cross-linking a related issue: pages-metadata.edn data structure does not scale with distributed workflow · Issue #3907 · logseq/logseq.


Linking here some information from another discussion that might be useful as a temporary solution, for those like me who absolutely need sync working. Looking forward to seeing a definitive enhancement, though.