Why the database version, and how is it going?

tienson · April 29, 2024, 8:51am

Hi everyone, we know a lot of people have questions about the upcoming database version, such as why we’re developing it and why it’s taking so long.

We apologize for spending almost all of our time developing it without communicating well with the community. This post will try to answer some of those questions.

Please don’t hesitate to leave a comment if you have questions!

Context

Everyone loves plain-text files, and with tools like Git and Obsidian, we can use them in conjunction with Logseq. However, there are some limitations:
- Building real-time collaboration on top of Markdown files is extremely challenging, for example:
  - Creating a new block requires rewriting the entire Markdown file.
  - Renaming a page updates all files that reference it.
- Support for structured data is limited compared to a database, lacking features like persistent IDs, timestamps, and more.
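To make the page-renaming limitation above concrete, here is a minimal Python sketch. It is an illustration only, not Logseq's code: the in-memory `files` dict, both helper functions, and the ID-keyed `pages` table are hypothetical. It shows that when links embed the page name in the text, a rename touches every referencing file, while a store with persistent IDs changes a single record.

```python
# Hypothetical illustration; none of these names come from Logseq.

def rename_page_in_files(files, old, new):
    """Rewrite every file that links to `old`; return the names of changed files."""
    changed = []
    for name, text in list(files.items()):
        updated = text.replace(f"[[{old}]]", f"[[{new}]]")
        if updated != text:
            files[name] = updated
            changed.append(name)
    return sorted(changed)


def rename_page_by_id(pages, page_id, new):
    """With persistent IDs, references store the ID, so only one record changes."""
    pages[page_id] = new
    return [page_id]


# File-based graph: links embed the page name in the text itself.
files = {
    "journal_1.md": "- Read [[Books]] today",
    "journal_2.md": "- Added to [[Books]]",
    "other.md": "- Nothing relevant",
}
changed_files = rename_page_in_files(files, "Books", "Library")

# ID-based store: blocks reference page 42, whatever its current name is.
pages = {42: "Books"}
block_refs = {"block-a": 42, "block-b": 42}
changed_records = rename_page_by_id(pages, 42, "Library")
```

In the file-based case the number of rewritten files grows with the number of references, which is part of what makes concurrent editing hard to reconcile.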
Templates and properties make it easy to add new books, papers, and more, but they’re difficult to maintain and collaborate on.
Our vision is to create a better environment for learning and collaboration. The current app falls short of our goals, with limitations including:
- No web support (except for limited support on https://demo.logseq.com).
- Data loss when using Logseq sync with multiple clients.
- Poor performance with large graphs.
- Unreliable undo functionality.
- No built-in publishing support for pages.
We received so much love and support from you guys that it’s unacceptable that Logseq still loses data. We wanted to do better, so we started to build a solid foundation for the future: the database version. Its goals are:
- Be stable: improved data stability and reliable undo/redo
- Be performant: fast to open, fast to type
- Be joyful: anyone can create any workflow with the new classes and properties
We’ve also decided to develop the new database version with real-time collaboration (RTC) in parallel, as implementing RTC with offline support is extremely complex. By considering RTC early in the design process, we can minimize the risks of having to change our implementation later on.

Challenges

Storage
- The new database version should be accessible across multiple platforms, including Web, Electron, and Mobile.
- It should be capable of handling large-scale data, effortlessly supporting up to 50,000 pages.
- Your data should be safe; it should never be erased by browsers.
- To facilitate advanced querying, the new database version should offer support for Datalog queries.
- Furthermore, it should provide flexibility by allowing users, plugins or even other apps to create custom classes and properties.
An intuitive UX for classes and properties
- Writing should be a delightful experience
RTC should work offline
- We’re committed to local-first, where users have full control over their data
RTC should support End-to-End encryption
- Yes, privacy-first

Project status (roadmap)

Database [85%]
- Trello
RTC [70%]
- Trello

FAQ

Are you going to deprecate Markdown files support?
- No, we’ll continue to support both file-based and database-based graphs, with a long-term goal of achieving seamless two-way sync between the database and markdown files. This will allow you to leverage the benefits of the database version while still being able to use other tools.
Why is it taking so long?
- When we began, there was no existing solution that met our requirements for a persistent database, so we had to build one from scratch.
- We initially explored CRDT for real-time collaboration with offline support, but ultimately found that current solutions didn’t meet our needs.
- We spent significant time refining the user experience for classes and properties.
- Our goal is to ensure that the new database version doesn’t affect the existing version’s functionality.
Is the database version open-source?
- Yes, you can find it on GitHub: https://github.com/logseq/logseq/pull/9858.
Is the database version free?
- Yes, all local features will be free to use. We’ll only charge for features that rely on our servers, such as real-time collaboration.

Future plan

We plan to start pre-alpha testing of the database version in 2 to 3 months, initially inviting a small group of users to help us improve it. As it becomes more stable, we’ll expand the testing group to include more users.
We’ll also extend invitations to a select group of users and companies to test our real-time collaboration feature once it’s ready for feedback.

FlorianF · April 29, 2024, 9:01am

I’m delighted to hear this … thanks for sharing!
Also hopeful for better markdown compatibility (e.g. CommonMark)

tienson · April 29, 2024, 9:09am

For sure! Exporting to markdown will be improved a lot with the database version.

FlorianF · April 29, 2024, 9:34am

If there is no “native” writing to the markdown files as a back end for Logseq data, I suppose exporting/importing will be the only way to get the contents of the database into markdown, or to read the modified content of the markdown files back into the database. Since import/export suggests a manual action, I hope it can be configured to run automatically (at intervals, or maybe even live?)…

tienson · April 29, 2024, 10:02am

We plan to experiment with two-way sync (either real-time or periodically) between the db and markdown files once the db is more stable.

zizhuo · April 29, 2024, 10:32am

I want to know if I can find and replace specific text in the DB version. Currently I do this by editing Logseq’s md files in VScode.

Bader · April 29, 2024, 10:35am

Thank you for the insightful update. It is impressive to see the remarkable progress made so far, and I eagerly anticipate the release of the DB version. Best of luck!

Siferiax · April 29, 2024, 10:37am

Very nice!

This makes me excited.

I look forward to the future. I think it will be great!
I’m only a little nervous about the transition, but if the DB version is just as solid as the current version of Logseq, then I’m completely sold! I can’t imagine not having Logseq in my life.

mentaloid · April 29, 2024, 11:27am

In SQL-supporting databases it is possible to use some external client to execute a statement looking like this:

UPDATE blocks SET content = REPLACE(content, 'specific text', 'new text');

tienson · April 29, 2024, 11:31am

Find and replace will be built-in.

tienson · April 29, 2024, 11:36am

AFAIK, the DB version is much more stable than the current version because we’ve been focusing on avoiding any issue that can result in invalid data. We’ll also try our best to reduce any issues that can lead to data loss.

tienson · April 29, 2024, 11:45am

Thanks for the suggestion!

This doesn’t work for the db version because the graph data is stored as key-value pairs (id -> serialized node in a tree) in a table. A safe way to update contents might look like this:

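;; Assumes `conn` is the graph's DataScript connection.
(require '[datascript.core :as d]
         '[clojure.string :as string])

;; Get every block id together with its content
(def blocks-with-content
  (d/q
    '[:find ?b ?content
      :where [?b :block/content ?content]]
    @conn))

;; Replace "From" with "To"; each query result is a [block-id content] tuple
(def tx-data
  (map (fn [[block-id content]]
         (let [new-content (string/replace content "From" "To")]
           [:db/add block-id :block/content new-content]))
    blocks-with-content))

;; Apply all replacements in a single transaction
(d/transact! conn tx-data)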
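
To illustrate why a blind SQL `REPLACE` is risky on this kind of storage, here is a small Python sketch. It assumes a hypothetical `kvs` table whose `content` column holds a JSON-serialized node containing several facts; the table name, column names, and JSON layout are assumptions made for illustration, not Logseq's actual schema or serialization format. The text-level replace rewrites every occurrence of the string inside the serialized value, while decoding the node first lets the update target block content only.

```python
import json
import sqlite3

# Hypothetical layout: one row per tree node, each node serialized as JSON.
node = {
    "datoms": [
        [1, "block/content", "From Paris"],
        [2, "page/name", "From the archive"],
    ]
}


def make_db():
    """Create an in-memory database holding the single serialized node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kvs (addr INTEGER PRIMARY KEY, content TEXT)")
    conn.execute("INSERT INTO kvs VALUES (?, ?)", (1, json.dumps(node)))
    return conn


def read_node(conn):
    """Load and decode the node stored at addr 1."""
    (raw,) = conn.execute("SELECT content FROM kvs WHERE addr = 1").fetchone()
    return json.loads(raw)


# Blind replace: operates on the serialized text, so it hits every field.
blind = make_db()
blind.execute("UPDATE kvs SET content = REPLACE(content, 'From', 'To')")
blind_node = read_node(blind)

# Structure-aware replace: decode, change only block content, re-encode.
safe = make_db()
decoded = read_node(safe)
for datom in decoded["datoms"]:
    if datom[1] == "block/content":
        datom[2] = datom[2].replace("From", "To")
safe.execute("UPDATE kvs SET content = ? WHERE addr = 1", (json.dumps(decoded),))
safe_node = read_node(safe)
```

In this sketch the blind version also rewrites the page name, which the user never asked for. With a binary or compressed serialization, a text-level replace could additionally leave the stored value undecodable.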

mentaloid · April 29, 2024, 12:27pm

I think that @zizhuo’s question was twofold:

Whether the database version will support replacing.
- Which got a positive answer.
  - No good reason for not supporting it.
- Actually Logseq could implement replacing even in its current version.
  - What editors like VScode do is relatively simple, though not always safe.
Whether it will still be possible to directly edit the storage in the database version.
- This ability is an important feature to many users.
  - Especially those coming from text-file alternatives.
  - Their worry is about becoming too dependent on Logseq’s implementation choices.
- If Logseq’s new database is SQL-compatible, there should be a way to edit it directly, even if:
  - it takes more effort to manipulate the serialized data
  - it is unsafe in potentially breaking Logseq’s assumptions
- The point is that, as long as Logseq remains open, the possibilities are plenty.

FlorianF · April 29, 2024, 1:00pm

With the current implementation, if I modify a template or a custom command, I can’t have all templates in the markdown-based “db” updated accordingly (even if only for an additional property I inserted in a template block). With VSCode it’s quite easy to use regex to refactor any amount of complex find-and-replace work. I wish (but don’t hold my breath) that Logseq will have an easy way for users to use regex outside of advanced queries, like right in the Search (Ctrl+k) command or in some sort of visual query builder geared towards non-devs…

liuhancheng.cn · April 29, 2024, 1:23pm

Curious about who will qualify and how you will select the “group of users”. Will users have a chance to apply to participate?

tienson · April 29, 2024, 1:25pm

Thanks for the question, we’ll start with our active contributors and sponsors.

liuhancheng.cn · April 29, 2024, 1:31pm

Thanks for your quick reply!
I have no idea whether I can be categorized as an “active contributor” (2 PRs, 1 merged and 1 WIP), but as a heavy user of Logseq, I really want to participate in early testing and am willing to explore and fix problems encountered during testing.

tienson · April 29, 2024, 1:37pm

Thank you for your contributions!
Can you send me a direct message with your email address? I’ll add you to the testers when it’s ready.

SpiderMatt · April 29, 2024, 1:43pm

I’m also very excited for the DB version as the improved stability will be very welcome. But as referenced in another comment, I’m a bit worried about lock-in. Will the new database be open source and allow for SQL queries? What’s the reason Logseq couldn’t go with a solution like SQLite?

Also, regarding sync, will that be one of the paywalled features that “rely on [Logseq] servers”? I’m wondering if there will be options for third-party syncing, possibly similar to how Joplin implements it (offering people the ability to host their own notes or pay to use Joplin Cloud). Or will people looking for free syncing have to stick with the markdown version?

Jordan_Garrison · April 29, 2024, 2:09pm

As a follow on to this, will this also be true for org file support?