Specification for public graph discovery: a decentralized social network on Logseq

That pushed me to the realization that we may use name-based addressing locally, switching to content addressing only when we publish our graphs.
For that, we could maintain a mapping between names and hashes.
When we want to publish a part of our graph, a mapping from :db/ids to CIDs would be created and the CID-addressed version would be uploaded.
The same goes the other way: when we download a part of somebody’s graph, we would save the mapping between CIDs and the :db/ids they resolve to locally.
Huh…

I will give this thought another spin with a fresh head.

I’m not sure this addresses your concerns in general, but maybe it would be good enough to fetch the content of a block from an online graph when referencing it, cache that content, and provide some UI to later fetch that block’s content again and ask the user whether they want to update the cached copy?

That sounds like a more generic description of the flow above.
As I give it more thought, it seems to be a fine solution.

To sum up (a rough sketch of the freeze step follows this list):

  • local graph uses :db/ids
  • global graph uses CIDs (“frozen”)
  • on upload:
    – frozen version is derived from a live one (CIDs instead of :db/ids)
    – CID-to-:db/id mapping saved locally (it is of no use outside of the local graph)
    – frozen version uploaded
    – versioning saved locally and globally (for others to discover that block 1.1 supersedes block 1.0)
  • on download:
    – frozen version downloaded, live version derived (with :db/ids instead of CIDs)
    – mapping saved locally
    – versioning downloaded
    – your blocks that have a new version (e.g., uploaded from another device) are auto-upgraded
    – your blocks that reference another’s block with a new version are proposed for an upgrade
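
Here is a rough sketch of the freeze step, assuming blocks are EDN maps shaped like the example later in this thread ({:db/id …, :block/content …, :block/parent …, :block/refs […]}). The cid helper is a stand-in content hash (SHA-256 over the printed block); a real implementation would mint proper IPFS CIDs.

(import 'java.security.MessageDigest)

(defn cid
  "Stand-in content hash: SHA-256 over the printed block, hex-encoded."
  [frozen-block]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes (pr-str frozen-block) "UTF-8"))]
    (apply str (map #(format "%02x" %) digest))))

(defn freeze-block
  "Swap :db/id references for CIDs; referenced blocks must be frozen first."
  [id->cid block]
  (cond-> (dissoc block :db/id)
    (:block/parent block) (update :block/parent id->cid)
    (:block/refs block)   (update :block/refs #(mapv id->cid %))))

(defn freeze-graph
  "Freeze blocks given in dependency order (parents and referenced blocks
  first). Returns the :db/id-to-CID mapping to keep locally, plus the
  frozen blocks to upload."
  [blocks]
  (reduce (fn [acc block]
            (let [frozen (freeze-block (:id->cid acc) block)
                  c      (cid frozen)]
              (-> acc
                  (assoc-in [:id->cid (:db/id block)] c)
                  (assoc-in [:frozen c] frozen))))
          {:id->cid {} :frozen {}}
          blocks))

Thawing on download is the same walk in reverse: allocate fresh :db/ids, substitute them for the CIDs, and save the CID-to-:db/id mapping locally.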

Some reasoning on authorship and authorization below.

In order to track who the author is, we may keep a version history akin to git’s.

  • Why track the author?

    • To auto-upgrade your reasoning.
      • E.g., when you commit from two different devices, you want to keep your stuff up-to-date across those devices.
    • To auto-upgrade your reasoning that depends on your upgraded reasoning.
      • Why upgrade dependent reasoning?
        • To keep the graph explicitly up-to-date.
      • Why auto-upgrade your dependent reasoning?
        • To save you time, as the reasoning you publish is most likely up-to-date within itself.
          • Still, that could be an explicit action. One may not take into account all dependent reasoning when making a change.
    • To let one filter blocks, as there can be a myriad of them, but you may want to see specific ones, e.g., those from a specific person.
  • How to track the author?

    • With a data model similar to git’s :commit/author, having commits signed by the author’s key (a minimal sketch follows this list)
      • I.e., having authorship on the data itself.
      • Data and the fact that it has been authorized can be freely shared across peers.
    • Content is authorized if it comes from the author’s place (e.g., DNS, IPNS)
      • Not robust: as soon as that place stops serving content, it is no longer authorized, and taking things away is not good.
      • The place needs to be always accessible.
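
A minimal sketch of the first approach, with authorship carried on the data itself, git-style. The :commit/* keys are invented for illustration, and Ed25519 here comes straight from the JDK (15+):

(import '[java.security KeyPairGenerator Signature])

(defn sign [private-key s]
  (.sign (doto (Signature/getInstance "Ed25519")
           (.initSign private-key)
           (.update (.getBytes s "UTF-8")))))

(defn authorized? [public-key s signature]
  (.verify (doto (Signature/getInstance "Ed25519")
             (.initVerify public-key)
             (.update (.getBytes s "UTF-8")))
           signature))

;; a signed commit over a frozen block's CID:
(let [keys   (.generateKeyPair (KeyPairGenerator/getInstance "Ed25519"))
      commit {:commit/author    "alice"
              :commit/block-cid "bafy..."  ;; illustrative CID
              :commit/signature (sign (.getPrivate keys) "bafy...")}]
  ;; anyone holding alice's public key can check authorship,
  ;; no matter which peer the data came from:
  (authorized? (.getPublic keys)
               (:commit/block-cid commit)
               (:commit/signature commit)))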

What do you mean by “who is the author”? I thought we were talking about references between graphs, not collaborating on the same graph.

Basically, when I see a reference to another graph I just expect to be able to follow a link and get to the block on the original graph. Then how that graph was made (a personal graph, the documentation of some project, etc.) doesn’t matter; I would eventually figure out who is the person/organization that made the graph.

Instead, if you are looking for collaboration on the same graph, I would suggest looking into the Matrix protocol, since it’s focused on ensuring consistency of data that users of a decentralized network can edit, with a system of permissions.

Examples:


By “reference” I mean a usual Logseq link that leads to a page, created via [[]], to a block, created via (()), and a link from a child block to a parent block (based on which the hierarchy of blocks is rendered).
The thing is that a reference leads to data about a block, not an HTML page, and this data can be interpreted by a Logseq app to render a page.
It’s like publishing the source of your graph in, say, .md, but a step further, as we publish the Logseq data model (in, say, JSON, or whatever).


Example Logseq data.

Such a graph

* A
** AA
** ((AA))

is represented in roughly such Logseq data:

{:db/id <A>
 :block/content "A"}
{:db/id <AA>
 :block/content "AA"
 :block/parent <A>}
{:db/id <(AA)>
 :block/content "((AA))"
 :block/parent <A>
 :block/refs [<AA>]}

So now we use references to point both to your blocks and to blocks of others.
And, in order to display others’ blocks, they are downloaded and become no different from yours.

Or should they differ?
Perhaps you wouldn’t like referencing another’s block, walking through that block into their graph, wandering around that interesting graph, and then having the blocks you visited listed in your graph, indistinguishable from yours.

A solution is to track authorship of a block.


In the context of linking to another’s reasoning, we need to be sure that it won’t change, as a change may corrupt your reasoning.
Matrix permits change; that is why I think it is not the best fit for the job.


The listed Matrix projects are interesting; I hadn’t known of them, thanks.


Hey there, maybe in the meanwhile we could start an initiative to “import” other remote graphs/pages into our personal one, like the following:

Alice Graph
  /pages
  ...

is published as usual at alice.com.

Bob runs a simple script (sketched after the list) that

  1. downloads Alice’s /pages folder
  2. renames its files by prepending alice.com%3A (to create a namespace)
  3. edits the references in those files by prepending alice.com/, for example [[alice.com/Something]]
  4. moves those files into Bob’s graph, in a subfolder like /pages/alice.com
  5. checks alice.com for updated pages and, when needed, downloads and edits them again
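
A quick sketch of steps 2-4 in Clojure (runnable with Babashka), assuming Alice’s /pages folder has already been downloaded to /tmp/alice-pages (step 1). The paths and the %3A file-name convention follow the steps above; the regex is naive and would also rewrite links that are already namespaced:

(require '[clojure.string :as str]
         '[clojure.java.io :as io])

(let [remote "alice.com"]
  (doseq [f (file-seq (io/file "/tmp/alice-pages"))
          :when (.isFile f)]
    (let [content (str/replace (slurp f)
                               #"\[\[([^\]]+)\]\]"
                               (str "[[" remote "/$1]]"))    ; step 3
          out     (io/file "pages" remote
                           (str remote "%3A" (.getName f)))] ; steps 2 and 4
      (io/make-parents out)
      (spit out content))))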

Moreover, if Alice adopted some kind of standard, her pages could have properties like

page-authors:: [[me@alice.com]]
publishing-date:: 2022-02-02

that are not edited by the script so that Bob can for example query for pages written by [[me@alice.com]].

Not very powerful but at least a simple script and a standard could be enough while a proper solution is discussed and developed.


Hello, I’m a new member of your community and became fascinated by this topic.

@Alex_QWxleA’s answer, back on June 20th, really attracted my attention. I also noticed that the discussion goes in a different direction than the one I would have hoped for, so here’s me reviving some of the old conversations. This is all in good faith, from a humble member of your community who is thankful for Logseq.

Maybe some big-picture notes on sharing structured graphs, where it could be useful, and its pitfalls with Logseq’s current serialisations:

Although I doubt sharing structured graphs will become even remotely mainstream in the foreseeable future. As in, would they be adopted at the level of current social networks? While that would be awesome, by the way, it just doesn’t feel like humanity can handle such a big task at the moment. The one thing that might push some communities to take a position is this recent (as of Nov 1, 2022) question of how to tap into the wisdom of the crowd as rapidly as possible. Then I wonder what Logseq’s role could be in that. And frankly, it seems too big a battle to position Logseq as a social networking platform, as some of the examples mentioned here indicate.

Those solutions won’t scale, and I mean functional scaling in this case. No particular vocabulary, grammar, or model governs those properties, etc., which means there’s no guarantee that a meaningful message sent from one graph to another won’t be misinterpreted. If Logseq does want to do what @Alex_QWxleA is suggesting here, you may want to start thinking of a neat way of mapping each Logseq block to an RDF resource description. The Semantic Web was a movement that took a while to mature, but the formalisms it has come up with by now seem colossal, and they are being adopted by the elite of the enterprise world as well.

SKOS can be a good first example, I think. It’s one of the simplest, yet most useful ontologies that have been developed. Essentially, it provides a very simple yet powerful language for organising concepts, by the terms used to label them or the taxonomies used to categorise them. One of the direct applications of this is a person’s ability to easily document their view on existing concepts. Take Wikipedia as an example. Wikipedia editors aren’t supposed to reflect their personal views in encyclopedia articles. However, when a mere mortal knowledge artist is publishing a personal graph, they may want to refer to concepts that are already nicely defined in Wikipedia in a much more native way than is possible today, and then tell their story, whatever it may be, about those concepts.
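
For a hypothetical taste of that, a block’s properties could map onto skos:* predicates; the property names and the DBpedia URLs below are illustrative, not an existing Logseq convention:

{:block/content "Zettelkasten, as I practice it"
 :block/properties
 {:skos/related "http://dbpedia.org/resource/Zettelkasten"
  :skos/broader "http://dbpedia.org/resource/Personal_knowledge_management"}}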

Any information system has master, reference, transactional, analytical, and operational data, correct? More or less. Well, Logseq can be the platform used to create transactions (each block being a transaction), where the master and reference data used to document those transactions come more naturally from shared resources on the Web via URLs. That links the power of Logseq, which lies in its ability to create structured content, to the power of the Semantic Web, which has been building up for years and has enough decent implementations out there to start considering its usage.

Sorry for the long message. I hope you’ll find the time to read it.

And BTW, you see how I said Logseq is responsible for creating those transactions? Well, social networks such as those that implement an open protocol like ActivityPub can then be used to broadcast those transactions using RSS, another standard that is nicely linked to Semantic Web standards.


Welcome!

To be clear: “social network,” strictly speaking, doesn’t imply a microblogging platform with an ephemeral stream of content like Twitter, Facebook, etc.

Of course a Logseq-based network would look much more like a decentralized wiki(pedia).

I don’t think ActivityPub would be useful for Logseq since it’s more about streams of content that you probably won’t edit later. It’s RSS on steroids.

I suggested the Matrix protocol because it’s de facto a decentralized database where users on different servers can access and edit data, with a sort of built-in ACL (access control list). Indeed, I think what’s really important is being able to give users on other servers the rights to read/write portions of our graphs.

The point of Matrix is keeping the data consistent even if portions of the network split and rejoin later. Plus Matrix comes with e2e encryption.

If you think about Matrix as an instant messaging protocol like XMPP, you are looking at it from the wrong perspective: it can be a new general-purpose layer on top of the Internet stack, providing features we generally see in silos, like logging in and interacting with other users of the same platform, but decentralized and federated.


+1 to the idea of using blocks to express Semantic Web data.
Blocks are powerful in that they can hold both human-friendly text and computer-friendly props,
allowing both humans and computers to reason on them.
And we want that computer assistance.
The Semantic Web is a great idea and fits perfectly with the design of blocks: by referencing other blocks, they resemble a directed labeled graph, just like the Semantic Web’s model.

I like the metaphor of distributed wiki.
I believe one important trait of such a wiki is immutability:
as we scale to the size of the Web, we had better not have link rot in the design.
Luckily, it’s possible to have an immutable Semantic Web graph by using content hashes to identify blocks. That way we have an immutable directed acyclic graph, and it’s possible to host it on top of content-addressable storages, such as IPFS or GNUnet, or use a CDN, Matrix, Solid, or all at once - the more the merrier.
Due to blocks being content-addressed we can get ‘trust’ out of the equation - we care little where data for a content hash is coming from, because we can verify that it’s exactly what we asked for by comparing hashes.
The same applies to ways of distributing data; we can use any number of protocols we like - p2p, ActivityPub, RSS, Solid, IPFS, OrbitDB, Matrix - you name it.=)
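
That “trust out of the equation” bit fits in a few lines: given any store mapping CID -> block (IPFS, a CDN, a friend’s laptop…), accept the data only if it hashes to what we asked for. Here cid is the content-hash helper sketched earlier in the thread:

(defn fetch-verified
  "Return the block for wanted-cid from store, or nil if the content
  does not hash to the CID we asked for."
  [store wanted-cid]
  (let [block (get store wanted-cid)]
    (when (= wanted-cid (cid block))  ; the hash check replaces trust in the host
      block)))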


I love the way you think, and surely so do many others here in the Logseq community, whom I am looking forward to hearing from as well.

I think the protocol discussions are way too early, especially since there are other things the Logseq community can think of. The thing about Logseq is that, for me as a mere bystander, Logseq is how HTTP/HTML should have been designed in the first place. Somehow, it didn’t happen and we are where we are.

So I think there’s an opportunity to first re-introduce the simple concept of a website to people and re-introduce it with Logseq as the starting point. Simply due to the exact ability that you yourself mentioned: easy input of structured data.

In fact, Logseq may want to look at two aspects: the workflow side of things and closer integration with the Unix platform. Logseq is client-based and can be tied to the power of the command line. Another aspect Logseq can look into, potentially, is ways to consume the data that is produced inside of it. The query concept is very powerful but we should easily be able to turn queries into tables, tile views, etc.

So, as you can see, I think there’s an immense opportunity ahead to let people know that anyone can now have an extremely nuanced website. At least people like me will join. I am a programmer in the sense that I can use an API if it’s created by responsible engineers. As soon as a library becomes experimental or poorly designed, people like me cannot use it. Logseq, on the other hand, compares (at least for me) with Excel, as a piece of software that runs the risk of being adopted by a larger audience than the usual geeks who adopt however-difficult tech in our industry. Kudos to them, but what about those of us who are not that technical?


In fact, Logseq may want to look at two aspects: the workflow side of things

Agree, UX could be improved, and that would be of big value for users.

closer integration with the Unix platform

Going in the direction of OSes gets us into bog-land, because software reproducibility is bollocks.
I.e., it’s darn hard to get an environment set up so that software runs reproducibly across different machines.
There is Docker, which is meh. There are Nix and Guix, which are pretty good and push in this direction, but they still have their limitations. So it’s hard to rely on OS-level software, which in turn would make OS-level features of Logseq unreliable.
Personally, I would prefer Logseq to deepen in the Web direction first.

Another aspect Logseq can look into, potentially, is ways to consume the data that is produced inside of it.

Agree, having computer assistance on knowledge we create is of immense value.
And we can get a whole new level more out of it by interlinking our graphs with the existing Semantic Web.


Publishing a Logseq graph as a website is already there.
It’s a nice feature, but such a website will be yet another one out there, disjoint from the rest.
I think Logseq can be made into a far more powerful tool - a Semantic Web browser, where we would build one distributed wiki, having data and text co-aligned for computers and humans to reason on.
Where such a wiki is to be stored and how it is to be distributed is of less importance.

Publishing UX can be as scary as ticking an ‘auto-publish’ checkbox and having your public blocks published out there once you’re finished updating them.
Discovery of content made by others can be done in numerous ways, to name a few:

  • getting notified when your block got referenced by somebody
  • when browsing the Web, being shown blocks that refer to the current page (à la Hypothes.is)
  • browsing your friends’ graphs
  • browsing some public graphs
  • browsing existing Semantic Web graphs

The more we interlink the easier it is to travel this Semantic Web.


About the Unix tools thing: the fact that notes are just Markdown is a huge thing.

I’ve been able to mass-process some things just using sed & awk, but you could use Perl or Python. Still, it would be much better if Logseq exported the graph somehow with a kind of API, even if it’s only usable when Logseq is open.


From my understanding, since Logseq is written in ClojureScript, it is compiled into JavaScript that is supposed to be run by a browser.

But it seems you can use Clojure(Script) from a command-line interface using a tool called Babashka, and this is what logseq-query (the lq command) does:

Somehow it can connect to your Logseq graphs without running Logseq and perform queries. It’s already very useful and can be combined with the usual Bash/Python/etc. scripts.

Now what would be nice is the same with other Logseq features, not only queries.

Even better would be a library (maybe in C or Rust for max compatibility) that could be used from any programming language to manipulate Logseq files and entities programmatically without any JavaScript (or other interpreted languages) involved.

For instance, I spent some time trying to write a parser for properties:: (to read and write their values programmatically), but I failed because I’m not very used to this.
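
For reference, a naive sketch of such a parser in Clojure, matching “key:: value” lines; real Logseq properties have more edge cases (multi-valued properties, page vs block properties, org-mode syntax):

(defn parse-properties
  "Collect key:: value pairs from a Markdown string into a map."
  [markdown]
  (into {}
        (for [[_ k v] (re-seq #"(?m)^\s*([\w-]+):: (.*)$" markdown)]
          [(keyword k) v])))

(parse-properties "page-authors:: [[me@alice.com]]\npublishing-date:: 2022-02-02")
;; => {:page-authors "[[me@alice.com]]", :publishing-date "2022-02-02"}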


You raise a good use case - programmatic manipulation on top of our notes; that’s powerful and we want that. And a good problem - dealing programmatically with text is a pain.

So we would want data (e.g., Logseq’s inner representation of our text notes) to be programmatically accessible.
One way would be to have Logseq serve it via an API. It’s a common approach.
Another would be to export Logseq data, in JSON or EDN.
But yet another way, one that seems most appealing to me, would be to derive a Semantic Web representation of Logseq’s data. Then it could be queried with SPARQL (akin to the Datalog that DataScript uses, but able to perform queries across the whole Semantic Web, not just local dbs). Also, it can be serialized as JSON for those cases where we don’t need SPARQL queries, and JSON can be handled by any programming language out there. And another dope thing is that we won’t be limited to accessing only our knowledge graph, but will have access to the graphs of others and the rest of the Semantic Web, building one interconnected graph of knowledge. ^.^
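
To make that concrete, here is a sketch of deriving triples from the example blocks earlier in this thread. The ls: vocabulary and the base URL are made up; a real mapping would pick or define a proper ontology:

(defn block->triples
  "Turn one block map into subject-predicate-object triples."
  [base block]
  (let [subject (str base (:db/id block))]
    (concat
     [[subject "ls:content" (:block/content block)]]
     (when-let [parent (:block/parent block)]
       [[subject "ls:parent" (str base parent)]])
     (for [r (:block/refs block)]
       [subject "ls:refs" (str base r)]))))

(block->triples "https://example.org/blocks/"
                {:db/id 3
                 :block/content "((AA))"
                 :block/parent 1
                 :block/refs [2]})
;; => (["https://example.org/blocks/3" "ls:content" "((AA))"]
;;     ["https://example.org/blocks/3" "ls:parent" "https://example.org/blocks/1"]
;;     ["https://example.org/blocks/3" "ls:refs" "https://example.org/blocks/2"])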

I agree that having programming access to our data is of huge value, and it’s more valuable the more data there is. Integrating our graphs into the Semantic Web would be like merging our lakes of data into the ocean.

To have our notes as data would be a dream! Then indeed we can work on them programmatically from whatever language we prefer.

Good pointer to lq.
From what I reckon, atm Logseq’s graphs are stored as ~/.logseq/graph1.transit (.transit is a serialization of EDN).
lq runs in NodeJS, reads a graph, and feeds it to the DataScript engine. For this use case .transit is ideal.
For reach from other languages, .transit is… not that accessible. Having it in JSON would give us a way broader reach.
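
For illustration, such a conversion on the JVM could look like the sketch below, assuming the cognitect.transit and clojure.data.json libraries are on the classpath. Note that a real Logseq graph would also need DataScript’s custom transit read handlers (the datascript-transit library), elided here:

(require '[cognitect.transit :as transit]
         '[clojure.data.json :as json])

(defn transit-file->json [path]
  (with-open [in (java.io.FileInputStream. path)]
    (-> (transit/reader in :json)  ; Logseq's .transit is transit-json
        transit/read
        json/write-str)))

;; (spit "graph1.json" (transit-file->json "graph1.transit"))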

To have our notes in a semantic form, such as JSON, and have a client of your choice to work with them (e.g., Logseq), yet retaining programmatic access to do whatever we wish with them - that sounds very appealing.

FYI Tienson did an experiment about using EDN instead of Markdown/Org.

Interesting, more info?

I saw this tweet by Tienson:

https://twitter.com/tiensonqin/status/1583170757823430657

Sorry, I don’t have any other information; as I said, it seems to be just an experiment.

Edit: found the branch


It is very important … not only from the knowledge-sharing perspective, but also to provide ready reference graphs to new users (like me) who wish to use Logseq for publishing. In fact, I landed on https://demo.diasporamemory.com/#/page/Diasporic%20Memory , which inspired me to explore this area … my initial attempt, with just a couple days of work, is at https://shutri.com

I fully realize that Logseq is primarily targeted at mining your mind - notes, todos, journals, etc. … but that doesn’t stop it from being the best publishing platform as well. As regards social features like “follow” or “share” - they are surely important, but they are not a MUST-have to get started …

Maybe it is just a thread, in this community (or elsewhere), where expert-hosted graphs are listed with basic instructions on features, and how to use them as a reference template …


Hey @shutosha: Digitized Diasporic Memory is actually my graduate thesis project!!! I’ve been following this thread quietly and was pleasantly surprised to see it mentioned here. Thank you for the shoutout! Happy to hear that it inspired you. 🙂
If you’d like to reach out to chat some more, let me know.
