Why the database version, and how is it going?

Hi everyone, we know a lot of people have questions about the upcoming database version, such as why we’re developing it and why it is taking so long.

We apologize for spending almost all of our time developing it without communicating well with the community. This post will try to answer some of those questions.

Please don’t hesitate to leave a comment if you have questions!

Context

  • Everyone loves plain-text files, and with tools like Git and Obsidian, we can use them in conjunction with Logseq. However, there are some limitations:

    • Building real-time collaboration on top of Markdown files is extremely challenging, for example:

      • Creating a new block requires rewriting the entire Markdown file.
      • Renaming a page updates all files that reference it.
    • Support for structured data is limited compared to a database: features like persistent IDs and timestamps are missing.

  • Templates and properties make it easy to add new books, papers, and more, but they’re difficult to maintain and collaborate on.

  • Our vision is to create a better environment for learning and collaboration. The current app falls short of our goals, with limitations including:

    • No web support (except for limited support on https://demo.logseq.com).
    • Data loss when using Logseq sync with multiple clients.
    • Poor performance with large graphs.
    • Unreliable undo functionality.
    • No built-in publishing support for pages.
  • We’ve received so much love and support from you guys that it’s unacceptable that Logseq still loses data. We wanted to do better, so we started building a solid foundation for the future: the database version. Its goals are:

    • Be stable: improved data stability and reliable undo/redo
    • Be performant: fast to open, fast to type
    • Be joyful: anyone can create any workflow with the new classes and properties
  • We’ve also decided to develop the new database version with real-time collaboration (RTC) in parallel, as implementing RTC with offline support is extremely complex. By considering RTC early in the design process, we can minimize the risks of having to change our implementation later on.
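To make the page-rename limitation above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Logseq's actual code: `rename_page_in_files` and `Graph` are hypothetical names, files are modeled as a plain dictionary of page name to Markdown text, and references are assumed to be written as `[[Page Name]]`.

```python
from dataclasses import dataclass, field


def rename_page_in_files(files: dict[str, str], old: str, new: str) -> list[str]:
    """Rename a page when references are stored as [[Page Name]] text.

    Hypothetical model of a file-based graph (page name -> Markdown text).
    Returns the names of every file that had to be rewritten.
    """
    touched = []
    updated = {}
    for name, text in files.items():
        new_name = new if name == old else name
        new_text = text.replace(f"[[{old}]]", f"[[{new}]]")
        if new_name != name or new_text != text:
            touched.append(name)
        updated[new_name] = new_text
    files.clear()
    files.update(updated)
    return touched


@dataclass
class Graph:
    """Hypothetical ID-based store: blocks refer to pages by persistent ID."""

    page_names: dict[int, str] = field(default_factory=dict)
    block_refs: dict[int, list[int]] = field(default_factory=dict)  # block id -> page ids

    def rename_page(self, page_id: int, new: str) -> int:
        """Change one record; references keep pointing at the same ID."""
        self.page_names[page_id] = new
        return 1  # number of records changed


files = {
    "Logseq": "- An outliner",
    "Journal": "- Tried [[Logseq]] today",
    "Ideas": "- Compare [[Logseq]] with other tools",
    "Unrelated": "- Nothing here",
}
touched = rename_page_in_files(files, "Logseq", "Logseq DB")
print(touched)  # the renamed page plus every page that referenced it
```

In the text-based model the cost of a rename grows with the number of referencing files, and each rewritten file is a potential conflict for another client. With persistent IDs the rename is a single small change, which is much easier to merge during collaboration.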

Challenges

  • Storage

    • The new database version should be accessible across multiple platforms, including Web, Electron, and Mobile.
    • It should be capable of handling large-scale data, effortlessly supporting up to 50,000 pages.
    • Your data should be safe; it should never be erased by browsers.
    • To facilitate advanced querying, the new database version should offer support for Datalog queries.
    • Furthermore, it should provide flexibility by allowing users, plugins or even other apps to create custom classes and properties.
  • An intuitive UX for classes and properties

    • Writing should be a delightful experience
  • RTC should work offline

    • We’re committed to local-first, where users have full control over their data
  • RTC should support End-to-End encryption

    • Yes, privacy-first

Projects status (roadmap)

FAQ

  • Are you going to deprecate Markdown files support?

    • No, we’ll continue to support both file-based and database-based graphs, with a long-term goal of achieving seamless two-way sync between the database and markdown files. This will allow you to leverage the benefits of the database version while still being able to use other tools.
  • Why is it taking so long?

    • When we began, there was no existing solution that met our requirements for a persistent database, so we had to build one from scratch.
    • We initially explored CRDT for real-time collaboration with offline support, but ultimately found that current solutions didn’t meet our needs.
    • We spent significant time refining the user experience for classes and properties.
    • Our goal is to ensure that the new database version doesn’t affect the existing version’s functionality.
  • Is the database version open-source?

  • Is the database version free?

    • Yes, all local features will be free to use. We’ll only charge for features that rely on our servers, such as real-time collaboration.

Future plan

  • We plan to start pre-alpha testing of the database version in 2 to 3 months, initially inviting a small group of users to help us improve it. As it becomes more stable, we’ll expand the testing group to include more users.

  • We’ll also extend invitations to a select group of users and companies to test our real-time collaboration feature once it’s ready for feedback.

43 Likes

I’m delighted to hear this :slight_smile: … thanks for sharing!
Also hopeful for better markdown compatibility (ex: CommonMark)

9 Likes

For sure! Exporting to markdown will be improved a lot with the database version.

5 Likes

If there is no “native” writing to Markdown files as a back end for Logseq data, I suppose exporting/importing will be the only way to get the database contents into Markdown, or to read modified Markdown content back into the database. Since import/export suggests a voluntary action, I hope this can be configured to run automatically (at intervals, or maybe even live?)…

3 Likes

We plan to experiment with two-way sync (either real-time or periodically) between the db and markdown files once the db is more stable.

8 Likes

I’d like to know whether I can find and replace specific text in the DB version. Right now I do it by editing Logseq’s md files in VS Code.

1 Like

Thank you for the insightful update. It is impressive to see the remarkable progress made so far, and I eagerly anticipate the release of the DB version. Best of luck!

3 Likes

:pray: :smiling_face_with_three_hearts:
Very nice!

This makes me excited.

I look forward to the future. I think it will be great!
I’m only a little nervous about the transition :stuck_out_tongue: but if the DB version is just as solid as the current version of Logseq, then I’m completely sold! I can’t think of not having Logseq in my life :heart:

9 Likes

In SQL-supporting databases it is possible to use some external client to execute a statement looking like this:

UPDATE blocks SET content = REPLACE(content, 'specific text', 'new text');

1 Like

Find and replace will be built-in.

10 Likes

Afaik the DB version is much more stable than the current version, because we’ve been focusing on avoiding any issue that can result in data invalidation. We’ll also try our best to reduce any issues that can lead to data loss.

8 Likes

Thanks for the suggestion!

This doesn’t work for the db version because the graph data is stored as key-value pairs (id->serialized node in a tree) in a table. The safe way to update contents might look like this:

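;; Assumes (require '[datascript.core :as d] 'clojure.string)
;; `conn` is the DataScript connection and `db` its current value, e.g. @conn

;; Get all blocks as a set of [entity-id content] tuples
(def blocks-with-content
  (d/q
    '[:find ?b ?content
      :where [?b :block/content ?content]]
    db))

;; Replace "From" with "To"; each query result is one tuple, so destructure it
(def tx-data
  (map (fn [[block-id content]]
         (let [new-content (clojure.string/replace content "From" "To")]
           [:db/add block-id :block/content new-content]))
    blocks-with-content))

;; Apply all updates in a single transaction
(d/transact! conn tx-data)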
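As a rough illustration of why the SQL statement suggested earlier has nothing to act on, here is a small Python sketch using the standard `sqlite3` module. Everything in it is an assumption made for the example: the table name `kvs`, its columns, and the use of JSON as a stand-in for the real serialization format, which is not described in this thread.

```python
import json
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical layout: one key-value table, id -> serialized node.
    # JSON stands in for the real (unspecified) serialization format.
    conn.execute("CREATE TABLE kvs (id INTEGER PRIMARY KEY, node TEXT)")
    conn.execute(
        "INSERT INTO kvs VALUES (?, ?)",
        (1, json.dumps({"content": "From content to done"})),
    )

    # 1. There is no row-per-block table, so the suggested statement fails.
    try:
        conn.execute("UPDATE blocks SET content = REPLACE(content, 'From', 'To')")
        missing_table = False
    except sqlite3.OperationalError:
        missing_table = True

    # 2. A raw REPLACE over the serialized text cannot tell keys from values:
    #    it rewrites the "content" key as well as the word inside the value.
    conn.execute("UPDATE kvs SET node = REPLACE(node, 'content', 'text')")
    row = conn.execute("SELECT node FROM kvs WHERE id = 1").fetchone()
    return missing_table, json.loads(row[0])


missing_table, node = demo()
print(missing_table, node)
```

In this toy setup the string replacement silently renames the `content` key, so the node no longer has the shape the application expects. Going through the database's own transaction API, as in the Clojure snippet, avoids that class of problem.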
2 Likes

I think that @zizhuo’s question was twofold:

  • Whether the database version will support replacing.
    • Which got a positive answer.
      • No good reason for not supporting it.
    • Actually Logseq could implement replacing even in its current version.
      • What editors like VScode do is relatively simple, though not always safe.
  • Whether in the database version it will still be possible to edit the storage directly.
    • This ability is an important feature to many users.
      • Especially those coming from text-file alternatives.
      • Their worry is about becoming too dependent on Logseq’s implementation choices.
    • If Logseq’s new database is SQL-compatible, there should be a way to edit it directly, even if:
      • it takes more effort to manipulate the serialized data
      • it is unsafe in potentially breaking Logseq’s assumptions
    • The point is that, as long as Logseq remains open, the possibilities are plenty.
7 Likes

With the current implementation, if I modify a template or a custom command, I can’t have all uses of that template in the markdown-based “db” updated accordingly (even for just an additional property I inserted in a template block). With VS Code it’s quite easy to use regex and to refactor with find-and-replace operations of any complexity. I wish, though I’m not holding my breath, that Logseq will have an easy way for users to use regex outside of advanced queries (for example right in the Search command, Ctrl+k, or in some sort of visual query builder geared towards non-devs)…

1 Like

I’m curious about who will qualify and how you will select the “group of users”. Will users have a chance to apply to participate?

Thanks for the question, we’ll start with our active contributors and sponsors.

Thanks for your quick reply!
I have no idea whether I can be categorized as an “active contributor” (2 PRs, 1 merged and 1 WIP) or not, but as a heavy user of Logseq, I really want to participate in early testing and am willing to explore and fix problems encountered during testing.

1 Like

Thank you for your contributions! :heart:
Can you send me a direct message with your email address? I’ll add you to the testers when it’s ready.

1 Like

I’m also very excited for the DB version as the improved stability will be very welcome. But as referenced in another comment, I’m a bit worried about lock-in. Will the new database be open source and allow for SQL queries? What’s the reason Logseq couldn’t go with a solution like SQLite?

Also, regarding sync, will that be one of the paywalled features that “rely on [Logseq] servers”? I’m wondering if there will be options for third-party syncing, possibly similar to how Joplin implements it (offering people the ability to host their own notes or pay to use Joplin Cloud). Or will people looking for free syncing have to stick with the markdown version?

1 Like

As a follow on to this, will this also be true for org file support?

4 Likes