Is there still a bi-directional approach of DB-Markdown or only export to Markdown remains?

I am a big fan of open-format text-based databases, not sure if for any good reasons. But I would like to be able to access my journals and knowledge base from Logseq as well as from other apps/scripts/LLMs.

Placing all my data inside a database might be OK from some perspectives, but I don't like it if it drops this bi-directional approach: automatically saving all my data (including the metadata I have set myself, though maybe not plugin metadata) to an open text-based format (be it AsciiDoc, Markdown, reST, or even JSON, as with MongoDB), at least at set intervals, and also re-reading the text files at set intervals to pick up any updates made by means other than the Logseq GUI.

While I am not very optimistic about this, I had to ask; maybe I get some good news somehow :slight_smile:

  • The stated goal is to:
    • make the database the so-called “single source of truth”
      • Big gains in speed, reliability and features.
    • support import/export to files in various formats
      • Only to the degree that these formats are compatible.
      • Auto-import/export in intervals is feasible but secondary.
  • The degree and timing of achieving that goal remains to be seen.

OK, but do we know the nature of this database? Will it be SQL or NoSQL? Will it be something related to Datomic, or maybe even something like MongoDB with JSON?

  • Ideally, this should be an implementation detail.
  • In practice however:
    • the choices are (very) limited to databases that are:
      • primarily: public domain, stable, serverless, mobile (for both Android and iOS)
      • secondarily: without dependencies, established, encryptable, with small footprint
    • the currently used db is here

AFAIK Logseq will keep using DataScript, a reimplementation of Datomic, but on top of SQLite, storing graph content there instead of in MD/Org files.


Will this make sync through cloud services (Google Drive, Dropbox, etc.) dangerous?


The danger exists now, as everyone tries to update the files while ignoring everyone else. Once the database becomes the point of reference, only Logseq will be updating it directly, thus having true control of the updating process.

I would argue it's the opposite. Given how generic cloud sync works (not Logseq Sync), once you have a single file being updated and the in-memory state differs across multiple devices, the whole database file will be replaced, potentially erasing recent changes.

My main concern is that Logseq Sync still seems to have a few rough edges (that's why I haven't enabled it), but once there is a database, all the data will live there instead of in separate files. With separate files, any loss is limited to some of the data; with a database, all data is lost at once.

Picture me very concerned


I’m not sure what you mean. To sync the database itself as a single file? That would defeat the whole effort. In my understanding, syncing will use some intermediate files, letting Logseq control what gets updated in the database and when.

Let me try to explain.

Right now, Logseq data is stored in many small files.
There are 3 ways to sync these files:

  • Git
  • Cloud (Dropbox/Google Drive/Nextcloud etc.)
  • Logseq Sync

I’ll skip Git since I never used it for Logseq, so I’m not familiar with how it would work.

In the case of Logseq Sync, there are some rough edges; various people have reported data loss, so I have to avoid it until it's stable.
Cloud sync is based on one assumption: edit things on only one side, because the "file reload" might be late and what's in memory might be outdated.

In a bad scenario where the user has Logseq running on two machines and writes on one and then, right after, on the other, with multiple files the worst case is data loss for the one file that was edited.
With SQLite, however, the database is a single file, so data loss is far more dangerous, since it could involve corruption too (and as such, complete loss of all data).

I'm looking forward to the migration to a database; it makes a lot of sense, but I'm also very stressed since my notes are really important to me.

  • Sounds like rewriting the whole database on every single change.
    • I don’t think that this is how it is meant to be used.
  • In any case, taking backups of everything important is the only sane approach.
    • This is true for every software.
    • Losing even a tiny important file is not an acceptable loss.
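On the backup point: one reason a SQLite file is less fragile than it sounds is that SQLite can snapshot a live database safely. A minimal sketch in Python, using the stdlib `sqlite3` module's `Connection.backup()`; the `notes` table here is made up for the example, not Logseq's actual schema:

```python
import sqlite3

# A throwaway in-memory database standing in for a Logseq graph DB;
# the table and its columns are invented for this example.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, content TEXT)")
src.execute("INSERT INTO notes (content) VALUES ('hello')")
src.commit()

# Connection.backup() copies the database page by page and always
# yields a consistent snapshot, even while the source is in use --
# unlike a cloud-sync client copying the raw file mid-write.
# In practice the destination would be a file,
# e.g. sqlite3.connect("backup.db").
dst = sqlite3.connect(":memory:")
with dst:
    src.backup(dst)

rows = dst.execute("SELECT content FROM notes").fetchall()
print(rows)  # [('hello',)]
```

A periodic script doing this (or the CLI's `.backup` command) gives versioned snapshots that a cloud folder can then sync as plain files.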

Is the primary driver for db as ground truth about issues of filesystem concurrency making it hard to keep the bugs out?
Or is it primarily about the semantics of logseq not being restricted to what is representable cleanly in markdown?

In my understanding, the primary driver is to support advanced features (real-time collaboration, etc.). Granted, advanced features are worthwhile only if their quality is acceptable.

SQLite is one of the best ways to store data and is preferred to plain text files for both performance and reliability.

There is even a compressed archive format (like Zip, Rar, etc.) based on SQLite that is very performant.

In theory one could write a program that reads the SQLite DB, uses graph-traversal algorithms with the Logseq DB schema, and mounts a virtual file system that every application can access. This is surely possible as read-only; I am not sure how feasible it is to edit the files and thus the DB.

OK, for Markdown we have Hugo (at least) to take a bunch of files and turn them into a webpage / intranet site / etc. I would be happy with SQLite if there were a reliable way to "publish" journals & pages as webpages. There are certainly CLI ways to read and alter the DB file, so it is now better suited to technically-inclined people outside the app itself. From what I can find online, there are far fewer tools for working on a SQLite database than there are for working on Markdown files.

It’s because each SQLite file has its own schema. By knowing the schema, one can write SQL queries to extract specific data.

Reading and writing a SQLite file is easier to do programmatically because that is what DBs are all about: storing data for programs. Plain text files like Markdown, instead, need to be parsed, and the fact that parsing and features on top of it can be implemented in different ways leads to many different tools. With SQLite you just need the official SQLite command-line tool (or a library for your programming language of choice) and the queries specific to your SQLite DB.
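To illustrate the point, here is what such a query could look like from Python's stdlib `sqlite3` module. The `blocks` table and its columns are hypothetical stand-ins; the real Logseq schema will differ and can be inspected with `sqlite3 graph.db ".schema"`:

```python
import sqlite3

# Hypothetical schema standing in for a Logseq graph DB.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE blocks (id INTEGER PRIMARY KEY, page TEXT, content TEXT)"
)
con.executemany(
    "INSERT INTO blocks (page, content) VALUES (?, ?)",
    [("journal/2024-01-01", "Started the year"),
     ("projects", "Ship v1")],
)

# Once the schema is known, extracting specific data is a plain SQL query.
rows = con.execute(
    "SELECT content FROM blocks WHERE page LIKE 'journal/%'"
).fetchall()
print(rows)  # [('Started the year',)]
```

The same query works unchanged from the official `sqlite3` CLI, so no custom parser is needed on the reading side.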

Yes, that's what I meant by saying that it narrows outside-Logseq usage down to devs, compared to Markdown. There are some ways for laymen to use it too, but they are very niche, and the tools are not nearly as mainstream as Markdown editors.

For one, I would be interested in publishing SQLite to a webpage like I do now for Markdown with Hugo. I am also interested in having a way for an LLM to interact with a SQLite database in a RAG paradigm. I would like this from within Logseq but also from outside, for both future-proofing and functionality; maybe a voice-based way to "converse" with the database.
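The retrieval step of such a RAG setup could lean on SQLite itself: the FTS5 full-text extension (bundled with most SQLite builds, including Python's stdlib module) can fetch the notes relevant to a question before they are handed to the LLM as context. A sketch with an invented `notes` table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# FTS5 virtual tables index their text columns for full-text search.
con.execute("CREATE VIRTUAL TABLE notes USING fts5(page, content)")
con.executemany(
    "INSERT INTO notes VALUES (?, ?)",
    [("Journal", "met Alice to discuss the database migration"),
     ("Recipes", "pancakes with maple syrup")],
)

# The retrieval step of a RAG pipeline: find the notes matching the
# user's question, then pass their text to the LLM as context.
hits = con.execute(
    "SELECT page FROM notes WHERE notes MATCH ?", ("database",)
).fetchall()
print(hits)  # [('Journal',)]
```

This runs entirely outside Logseq against the same file, which is exactly the kind of outside access being discussed.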

All in all, SQLite seems like a good idea; I just need to see what it stores inside and how, so that query results are useful when done from outside…

What I mean is that even if Logseq won't support Markdown export, someone will write it by reading the SQLite file, and it should be more reliable than writing a parser that turns Logseq's custom Markdown files into standard Markdown.
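Such an exporter could be quite small. A minimal sketch, again assuming a made-up `blocks(page, position, content)` table rather than Logseq's real schema, emitting one Markdown bullet per block in the outliner style Logseq uses today:

```python
import sqlite3

# Hypothetical minimal schema; the real Logseq schema will differ.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE blocks (page TEXT, position INTEGER, content TEXT)")
con.executemany(
    "INSERT INTO blocks VALUES (?, ?, ?)",
    [("Journal", 1, "First thought"),
     ("Journal", 2, "Second thought")],
)

def export_page(con, page):
    # Read the page's blocks in order and emit a Markdown bullet list.
    rows = con.execute(
        "SELECT content FROM blocks WHERE page = ? ORDER BY position",
        (page,),
    )
    lines = [f"# {page}"] + [f"- {content}" for (content,) in rows]
    return "\n".join(lines)

print(export_page(con, "Journal"))
# # Journal
# - First thought
# - Second thought
```

Writing the returned string to `Journal.md` would give Hugo-consumable files without any parsing of Logseq's own Markdown dialect.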


Auto-import/export in intervals would be great!
