Specification for public graph discovery. Decentralized social network on logseq

shutosha · December 8, 2022, 8:09pm

Wow …great to meet you ! Please keep up the good work …

gww · December 9, 2022, 10:25am

Regarding implementation, I think a lot of good work has been done in the Semantic Web space, and to me combining the two (Logseq with triples, and URLs as entities in the RDF sense) makes perfect sense when thinking about multi-graph setups or federations in Logseq.

(potentially related topics: RDF, JSON-LD, triple stores, Schema.org, etc.)

Erich_Greenebaum · January 6, 2023, 8:39pm

Hi all, I just replied to @andrewzhurov on that referenced thread on this subject and just read over the latest posts here, so wanted to follow up here.

Exactly so, and that is basically what JSON-LD is designed for. To start, I think it might be as simple as the sketch I offered on the other thread for decorating property blocks with JSON-LD @context and @type tags at the system level.
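A minimal sketch of that decoration idea, assuming a Logseq property block has already been parsed into a plain dictionary. The helper name `properties_to_jsonld`, the convention of reading the type from a `type` property, and the sample block are all hypothetical; only the `@context` and `@type` keywords and the schema.org vocabulary come from JSON-LD and schema.org themselves.

```python
import json

SCHEMA_ORG_CONTEXT = "https://schema.org"

def properties_to_jsonld(properties, default_type="Thing"):
    """Wrap parsed block properties in a JSON-LD node (hypothetical helper)."""
    props = dict(properties)  # copy so the caller's dict is left untouched
    # Assumption: the block names its schema.org type in a "type" property.
    node_type = props.pop("type", default_type)
    node = {"@context": SCHEMA_ORG_CONTEXT, "@type": node_type}
    node.update(props)
    return node

# Hypothetical property block, as it might look after parsing.
block_properties = {"type": "Person", "name": "Ada Lovelace", "jobTitle": "Mathematician"}
doc = properties_to_jsonld(block_properties)
print(json.dumps(doc, indent=2))
```

The point of doing this at the system level is that users keep writing ordinary properties, and the `@context` and `@type` decoration is added on export.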

From the schema.org homepage:

Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata and JSON-LD. These vocabularies cover entities, relationships between entities and actions, and can easily be extended through a well-documented extension model. Over 10 million sites use Schema.org to markup their web pages and email messages. Many applications from Google, Microsoft, Pinterest, Yandex and others already use these vocabularies to power rich, extensible experiences.

Beyond providing a shared namespace of types which benefit from relational composability, there are huge wins baked in, including making real graph queries across federated graphs, a path to being indexed by traditional search engines, and so on. In short, JSON-LD is the lingua franca of linked data on the web these days, and is isomorphic to standard RDF triples.
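To illustrate the correspondence with triples, here is a deliberately simplified sketch that flattens one flat JSON-LD node into (subject, predicate, object) tuples. It is not the real JSON-LD to RDF algorithm: it assumes a string `@context` that is simply used as a prefix (a real processor resolves terms through the context document, so the resulting IRIs may differ), an explicit `@id`, and literal values only. The function name and the example.org identifier are made up for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def node_to_triples(node):
    """Flatten one flat JSON-LD node into (subject, predicate, object) triples.

    Simplification: the @context string is used as a vocabulary prefix.
    Nested nodes, term definitions and blank nodes need a full processor.
    """
    vocab = node["@context"].rstrip("/") + "/"
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue  # these describe the node itself, not its properties
        if key == "@type":
            triples.append((subject, RDF_TYPE, vocab + value))
        else:
            triples.append((subject, vocab + key, value))
    return triples

person = {
    "@context": "https://schema.org",
    "@id": "https://example.org/people/ada",  # hypothetical identifier
    "@type": "Person",
    "name": "Ada Lovelace",
}
triples = node_to_triples(person)
for triple in triples:
    print(triple)
```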

Having gone there, I’m going to take one more swing at my Fluree pitch to support a lot of the goals I read through in this thread. Andrew had expressed (in the other thread) that while Fluree would be a fairly straightforward engine to drop in, it would come with added complexity that seemed misplaced in the application. Given the discussion here, I beg to disagree.

I’ll preface this by recognizing that Logseq’s origins and mission are around being a second brain, which is a more inwardly focused mindset with a more ad hoc evolution that doesn’t lend itself to strong typing. Yet, to be interoperable with the semantic web, type consistency is needed.

Anyway, I realize the notion of shared editing of graphs may seem sacrilegious, yet, I need to share my knowledge and not by rendering out graph segments and hosting them as webpages, and I actually do want to co-edit knowledge graphs. And even further, I want to have granular access control on the graph. And to be clear, I get that isn’t/wasn’t the target use-case of Logseq. But it would be great, lol.

Anyway, to clear up a couple of things about the Fluree architecture. It has two layers which run (and scale) independently in separate containers:

A blockchain persistence/state, and
An RDF graph overlaying/indexing the state (this can even be run in client-side JavaScript).

So, yes, the underlying state is immutable and append only. When an object is deleted or updated, a new block is written to reflect that change of state, while the graph (which is what is queryable) is updated to the new state. The graph can time travel across state for free… queries include an “at time t” input and it costs the same to look into the past as the present. It also provides for independent scaling of read and write performance. The engine can achieve millisecond response time for queries. Clients register for updates from the ledger nodes for commits to triples in their local cache (basically functions like a CDN).
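The append-only idea with an “at time t” query can be sketched in a few lines. This is a toy model, not Fluree’s API or storage format: the `Entry` and `AppendOnlyGraph` names and the integer logical clock are assumptions for illustration. It also replays the log on every query, so unlike the indexed graph described above, looking further forward in time costs more here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    t: int         # logical commit time (assumed: a simple counter)
    op: str        # "assert" or "retract"
    triple: tuple  # (subject, predicate, object)

class AppendOnlyGraph:
    """Toy append-only log of triple changes with as-of queries."""

    def __init__(self):
        self._log = []  # entries are only ever appended, never edited

    def commit(self, op, triple):
        t = len(self._log) + 1
        self._log.append(Entry(t, op, triple))
        return t

    def as_of(self, t):
        """Rebuild the set of triples that held at logical time t."""
        state = set()
        for entry in self._log:
            if entry.t > t:
                break
            if entry.op == "assert":
                state.add(entry.triple)
            else:
                state.discard(entry.triple)
        return state

graph = AppendOnlyGraph()
t1 = graph.commit("assert", ("page:fluree", "status", "draft"))
t2 = graph.commit("retract", ("page:fluree", "status", "draft"))
t3 = graph.commit("assert", ("page:fluree", "status", "published"))
print(graph.as_of(t1))  # state before the update
print(graph.as_of(t3))  # current state
```

An update is modelled as a retraction followed by an assertion, so the old value stays in the log and remains reachable through `as_of`.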

Granted, there is complexity added around consensus, and why do it if you don’t need consensus. But for shared editing of a graph state that all parties can rely on, it would be well worth it either on a local node or in the cloud.

Turning to the out of the box advantages:

Semantic Web native out of the box. Can be queried using SPARQL and GraphQL. Native JSON-LD in the first half of the year.
All transactions are cryptographically signed by identities and encrypted; absolute provenance.
Built in “smart functions” for identity based access control
ACID transactions
Client edge-node graph only reads in the data it needs, and local queries are blazing fast. Write performance can be scaled (and paid for) according to need.

Then, finally, the power to use real, nestable, graph queries executed directly against the Fluree API seems very powerful.

Anyway, I hope you’ll give Fluree a second look in light of the developments in this thread as it seems to hit a lot of the features that have been discussed here.

All the best!

ddavo · January 7, 2023, 9:29am

This was done in 0.8.15!! (PR #7699)

andrewzhurov · January 11, 2023, 11:27am

Thank you for a thoughtful response, a delight to read.
I’ll give it more thought later on and will get back to you.

I’m ignorant of that, how’s that done?

Erich_Greenebaum · January 11, 2023, 7:32pm

Essentially, the schema.org vocabularies were established as an initiative amongst the big search engines to put a semantic patina across the web 2.0 space. In sum, Google et al. “understand” these types and can use them to contextualize results. Obviously Logseq would need to present an HTTP endpoint for a search engine to crawl, but if the contained information is coded according to the schema.org vocabularies, it enables the indexer to reason about the content and contextualize results.
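As a rough illustration of what such an endpoint could serve, the sketch below embeds a JSON-LD node in an HTML `<script type="application/ld+json">` element, which is the usual way to put JSON-LD into a web page. The helper name and the sample article are hypothetical, and nothing here reflects how Logseq publishes pages today.

```python
import json

def jsonld_script_tag(node):
    """Render a JSON-LD node as an embeddable HTML script element."""
    payload = json.dumps(node, ensure_ascii=False)
    # Escape "</" so the payload cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'

# Hypothetical published page described with schema.org terms.
page = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Notes on federated graphs",
}
tag = jsonld_script_tag(page)
print(tag)
```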

(edited for clarity)

Erich_Greenebaum · January 20, 2023, 4:36am

Hello @andrewzhurov - In case you aren’t familiar with it, I wanted to share a link to ipld.io which is part of the larger universe of IPFS and all that related tech (libp2p, etc).

IPLD is the data model of the content-addressable web. It allows us to treat all hash-linked data structures as subsets of a unified information space, unifying all data models that link data with hashes as instances of IPLD.
(from their home page)

So, for example, it can interact with data on Git as easily as IPFS, or any other hash-based address space. Given the earlier discussion in this thread regarding CIDs, IPFS, IPNS, etc., it seems potentially useful to have an interface that allows you to interact with all of them as a unified namespace.
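The core idea of hash-linked data can be shown with a small in-memory sketch. It is only loosely modelled on IPLD: addresses are plain SHA-256 hex digests of canonical JSON rather than real CIDs, the `{"/": address}` link shape merely imitates the DAG-JSON convention, and the `ContentStore` class is hypothetical.

```python
import hashlib
import json

class ContentStore:
    """Toy content-addressed store: the address is the hash of the bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, node):
        # Canonical encoding so equal content always gets the same address.
        data = json.dumps(node, sort_keys=True, separators=(",", ":")).encode("utf-8")
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address):
        data = self._blobs[address]
        # Verify integrity: the bytes must still hash to their address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored bytes do not match their address")
        return json.loads(data)

store = ContentStore()
child = store.put({"content": "IPLD links data by hash"})
parent = store.put({"content": "Notes on IPLD", "children": [{"/": child}]})

# Follow the hash link from the parent block to the child block.
linked = store.get(store.get(parent)["children"][0]["/"])
print(linked["content"])
```

Because the address is derived from the content, the same block stored anywhere (Git, IPFS, a local cache) resolves to the same identifier, which is what makes a unified namespace possible.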

I also wanted to add my two cents on the mutable vs immutable question. In my mind, I’d like to have both. Certainly there is canonical knowledge and fields: science is obviously built on the canonical history of published works. Another way to look at it is that while I never want to lose the history of “my” graph, I want the state of my graph to evolve. I want the state of my second brain to reflect my current state of mind, which may be different than it was five years ago. There are lots of contexts where this sort of dynamic graph is useful.

Anyway, I’m a big proponent of leveraging everything that comes for free from the IPFS universe, and it seems to offer a lot of flexibility on the immutable->mutable dimension. To those ends one thing to be aware of is that IPNS (iirc) has recently introduced multiple modes including peer to peer meshing to update records between nodes, creating faster consensus amongst coordinated peers.

Perhaps by leveraging the IPLD tooling, you could have a federated graph space across, e.g., Git and IPFS.

Another approach to both persistence and shared state is the Ceramic project, which is in the web3 space, using things like Lit Protocol (smart wallets with encryption features), which can write encrypted blobs to IPFS. They’ve recently updated their database to be what they frame as a graph database, but which is backed by Postgres, so it’s fine for short path traversal. The cool thing about it is that they use GraphQL schemas to define the types in their system, which yields a composable network of types.

It does have the downside risk of being a startup web3 project, but it is seeing big adoption in the space so is likely to thrive. Offsetting the downside risk is that anyone can run their own network of nodes.

On the positive side, it lets you offload a lot of the big lifts for what’s being discussed here. I’ve also discovered their DB layer is totally pluggable, so replacing (or complementing) Postgres with Fluree should be an easy lift, which I’m stoked about.

Last thing… Check out dunlin.xyz. It’s a basic Logseq/Roam style thing, although not nearly as feature complete. That said, it’s web3 based, and provides for sharing graphs with all sorts of permissions and conditions. Might be worth a gander.

Hope this is helpful.

andrewzhurov · February 1, 2023, 8:00pm

Hi, @Erich_Greenebaum! Thank you again for such thoughtful responses.
Sorry for taking so long to get back to you.

I’ve been surprised to see how you share understanding of both Semantic Web and content-addressable stores, proposing an intriguing synergy between the two.

It’s been interesting to learn about the Ceramic project, perhaps we could think of use for it.
Also it’s been interesting to get familiar with dunlin.xyz, a project close in spirit to ours. May be a source of inspiration.

Thank you for bringing attention to the Fluree project, it does have a ton of interesting aspects that we could find of use. Aside from other good points, it seems to be a mighty fine solution for when we’re in need of consensus. And the good thing about content-addressable data (in RDF, in our case) is that it can be stored and shared in any number of ways! We essentially care little about where it comes from ^.^, so it may come from a Fluree blockchain, as well as from any other source, such as ActivityPub, Matrix, IPFS & other sweet tech, as has been creatively thought of by folks in this thread. We’re not locked to just one - the more the merrier.=)

All of them seem to possess features that make them particularly suitable for different use-cases.
For example,
Fluree - for when we need consensus and Datalog capabilities. It seems excellent in that.
IPFS - for when we need a global content-addressable storage.
& other mentioned sweet tech has its strong sides, which I’d fail to present at the level it deserves.

As you mention, the data can be in JSON-LD, making it possible to use within the Semantic Web.
How would we publish blocks as JSON-LD into a content-addressable storage?
So we can ref to such blocks from Logseq and run Semantic Web queries on them.
Curious to learn your thoughts!

alex0 · February 1, 2023, 9:06pm

Are you aware of this script that generates RDF from Logseq pages?

andrewzhurov · February 2, 2023, 8:26am

I wasn’t! Thanks for bringing it to my attention. Very nice to see work towards the Semantic Web!
Also stumbled upon it in an adjacent thread, where it’s been kindly shared by @cldwalker, here.

Erich_Greenebaum · February 3, 2023, 1:26pm

Hey Andrew - thanks for the reply and no worries on the time!

Regarding IPFS, I’ve been looking at using IPLD for persisting graph structure, but definitely just writing an IPFS CID onto the Fluree ledger gives you provenance, and then using a combination of smart-functions and, e.g., Lit, we get full access control - which I grant is not always something you want for shared semantic data, lol, but sometimes is.

I think the biggest win is that by using an RDF database underneath Logseq, you get all that great semantic querying from the graph engine itself. People have done some interesting things to extend the semantics of GraphQL, which can be a lot more accessible than having to learn SPARQL, if they don’t need all of its features. That has the benefit of making Logseq data objects accessible using one of the most popular API styles on the web, easily consumable by things like React apps.

But I think your basic notion of putting blobby content into IPFS and referencing it in the graph is probably the best strategy for that aspect. I’m using that approach elsewhere and it seems to be a good fit.

andrewzhurov · February 12, 2023, 6:27pm

Storing JSON-LD in IPLD sounds interesting, what would be the benefits over storing JSON-LD as blobs on IPFS?