RDF/JSON-LD/Triple Store/Schema.org, etc.

andrewzhurov · December 13, 2022, 6:10am

There is an ongoing effort to have our graphs integrated with the Semantic Web, discussed under this post, as you have already discovered.=)

Logseq uses DataScript's quad store (loosely speaking in RDF terms, a quad here has the shape subject+predicate+value+time).
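
To make the quad shape concrete, here is a minimal Python sketch. The `Quad` type, the `block:1` ids and the `as_of` helper are made up for illustration and are not DataScript's API; real datoms carry a bit more (e.g. an added/retracted flag).

```python
from typing import Dict, List, NamedTuple, Tuple


class Quad(NamedTuple):
    subject: str    # the entity, e.g. a block id (hypothetical naming)
    predicate: str  # the attribute
    value: str      # the attribute's value
    time: int       # when the fact was asserted


def to_triple(quad: Quad) -> Tuple[str, str, str]:
    """Drop the time component to get a plain RDF-style triple."""
    return (quad.subject, quad.predicate, quad.value)


def as_of(quads: List[Quad], time: int) -> Dict[Tuple[str, str], str]:
    """Latest value per (subject, predicate) asserted at or before `time`."""
    state: Dict[Tuple[str, str], str] = {}
    for q in sorted(quads, key=lambda q: q.time):
        if q.time <= time:
            state[(q.subject, q.predicate)] = q.value
    return state


quads = [
    Quad("block:1", "content", "Hello", 1),
    Quad("block:1", "content", "Hello, world", 2),
]
```

Dropping the time component gives something a triple store can hold, while keeping it lets you ask what a block looked like at an earlier point.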

Fluree is an interesting solution.
It uses the Flakes data model, which is closely akin to DataScript's, so it seems possible to drop a Logseq graph into Fluree with little effort.
It can provide a SPARQL endpoint on top of it, deriving RDF data from Flakes.

However, there are some downsides which may make it a less appealing solution for the problem at hand.
It’s a blockchain, meant to guarantee consensus in a distributed system, whereas we don’t need consensus when we build an immutable, accrete-only public graph: there are no conflicts in the first place, so it’s eventually consistent. So it seems to me.

It needs to be run from a shell, perhaps in Docker for better reproducibility, and that is a hefty footprint on the user’s OS. It may not be practical, or even possible, on some devices, such as mobile phones, and it may require setup steps that are not very user-friendly.

There is consensus chit-chat, which makes publishing data slower and more costly in computation and network traffic than publishing a signed immutable block to a content-addressable storage of your choice.

Using Fluree solely as the source of truth would limit our options for where we can get the data from.
Whereas we could be using any combination of content-addressable stores to publish and discover data (IPFS, GNUnet, HTTPS CDNs).
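
A rough Python sketch of why the choice of store stops mattering once data is content-addressed: the address is the hash of the content, so whatever a store returns can be checked locally. The stores here are plain dicts standing in for IPFS, a CDN, etc., and `address_of`, `publish` and `fetch` are hypothetical helper names; signing is left out.

```python
import hashlib
from typing import Dict, List, Optional


def address_of(content: bytes) -> str:
    """The address is derived from the content itself (SHA-256 here)."""
    return hashlib.sha256(content).hexdigest()


def publish(store: Dict[str, bytes], content: bytes) -> str:
    """Put content into a store under its own hash and return the address."""
    addr = address_of(content)
    store[addr] = content
    return addr


def fetch(stores: List[Dict[str, bytes]], addr: str) -> Optional[bytes]:
    """Try each store in turn; accept only content that matches the address."""
    for store in stores:
        content = store.get(addr)
        if content is not None and address_of(content) == addr:
            return content
    return None


# Two stand-in stores; the first one serves corrupted data for the address.
ipfs_like: Dict[str, bytes] = {}
cdn_like: Dict[str, bytes] = {}
addr = publish(cdn_like, b"block content")
ipfs_like[addr] = b"tampered"
```

Here `fetch([ipfs_like, cdn_like], addr)` skips the corrupted copy and returns the real one from the second store, without having to trust either of them.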

In Fluree, interconnection with the Semantic Web happens by exposing a SPARQL endpoint on a Fluree server. Atm this server seems to include only some Semantic Web sources, such as the Fluree blockchain, Wikidata and BigData. source
I haven’t found them mention how to add more sources, although I guess it is possible.

SPARQL endpoints are expensive on the server. source
Serving Semantic Web data as Linked Data Fragments strikes a good balance between client and server cost, and allows for federated queries from the browser, via js libs such as Comunica.
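
To illustrate where that balance comes from, here is a toy Python sketch in the spirit of Triple Pattern Fragments: the server only answers single triple patterns (cheap), and the client does the join itself. The data, the predicate names and both functions are invented for the example; a real server would also add paging and metadata, and a real client such as Comunica plans queries far more cleverly.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy dataset held by the "server" (hypothetical ids and predicates).
TRIPLES: List[Triple] = [
    ("block:1", "refersTo", "block:2"),
    ("block:2", "content", "Hello"),
    ("block:3", "content", "Unrelated"),
]


def fragment(s: Optional[str] = None,
             p: Optional[str] = None,
             o: Optional[str] = None) -> List[Triple]:
    """Server side: match ONE triple pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def referenced_contents(source: str) -> List[str]:
    """Client side: join two patterns, `source refersTo ?b` and `?b content ?c`."""
    results = []
    for _, _, target in fragment(s=source, p="refersTo"):
        for _, _, content in fragment(s=target, p="content"):
            results.append(content)
    return results
```

The server never evaluates a full query, only simple lookups that are easy to cache, and the client pays for the join with a few extra requests.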

Overall, it seems Fluree is a good fit for consensus-requiring use-cases, and this feature comes with hefty architectural complexity. It also seems our use-case does not require consensus, so putting Fluree to use would add accidental complexity.

While the time dimension is interesting, as it allows finding the exact version of a Logseq block that another block referred to (which I believe is a must-have, as it prevents link rot), it comes with the need to keep this time blockchain and search through it, recreating the state as of some time. That is costly on storage and requires a blockchain architecture or some other way to guarantee that the time log is not tampered with.
An alternative solution is to use content-based addressing for block references, baking in a reference to the exact content/block. Such a reference can be represented as a HashURI, being a part of the RDF representation of a block, and lazily resolved to its actual content by a query engine such as Comunica (which would need to be extended with a module for resolving such hash-based URIs; this is in plans).
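
A minimal Python sketch of such a reference, to show how it pins an exact version without any time log. The `hash://sha256/` prefix, the `refersTo` predicate and the dict-based store are assumptions made for the example, not an existing Logseq or Comunica feature.

```python
import hashlib
from typing import Dict

PREFIX = "hash://sha256/"  # illustrative URI scheme, chosen for this sketch


def hash_uri(content: str) -> str:
    """Build a URI whose identifier is the SHA-256 of the block content."""
    return PREFIX + hashlib.sha256(content.encode("utf-8")).hexdigest()


def resolve(uri: str, store: Dict[str, str]) -> str:
    """Lazily look the content up and check that it matches the URI."""
    content = store[uri]
    if hash_uri(content) != uri:
        raise ValueError("content does not match its hash URI")
    return content


# Two versions of the same block; each one gets its own address.
v1 = "Fluree uses Flakes."
v2 = "Fluree uses the Flakes data model."
store: Dict[str, str] = {hash_uri(v): v for v in (v1, v2)}

# A triple-like reference that points at the exact version it was written against.
reference = {
    "subject": "block:42",
    "predicate": "refersTo",
    "object": hash_uri(v1),
}
```

Editing the block produces a new URI, so the old reference keeps resolving to the version it was written against.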

Thank you for bringing Fluree to attention; it is an interesting piece of tech, and perhaps some of its ideas could be used as inspiration, e.g., how data can be signed and encrypted.
Keen to hear more ideas this way, it gets us one step closer towards our dream.=)