RDF/JSON-LD/Triple Store/Schema.org, etc

Erich_Greenebaum · December 5, 2022, 1:41pm

I’m curious if there is any thinking/work around using JSON-LD/Schema.org, e.g. leveraging @context and @type to create structure in LogSeq: instances of typed objects with available properties. I’m also wondering if the LogSeq graph uses a triple store of some kind.

I was speculating that Fluree could be a good fit. It is Clojure-based and has a blockchain state model with an RDF graph layered on top, giving it some interesting properties:

The fact it is built on an append only log means one can “time travel” over the data… everything is stored as a delta.
Having the state engine separate from the graph engine offers some great methods for graph consumption; one can have many graph nodes streaming updates from the ledger.

Meanwhile, they’re about to add support for JSON-LD, which will make interop with Schema.org and other ontologies/vocabularies easy to work with.

Finally, everything is encrypted.

Anyway, mostly curious about how the developers see LogSeq in relation to technologies like RDF, etc.

Thanks much for the great tool and any feedback on this!

-Erich

gww · December 5, 2022, 2:24pm

It looks like we have similar interests!

E.g. to distinguish between “like [[Apples]]” and “hate [[Apples]]”:
https://discuss.logseq.com/t/colored-graph-relationship-types/13044

Also, to be able to export the graph (see “How to export graph from commandline?”) so it can later be converted to other formats (e.g. N-Triples).

(bot asked me to edit this message…)

Erich_Greenebaum · December 5, 2022, 4:10pm

Hey Hey! Sorry I didn’t find your other post when I was searching around. I think one basic thing to explore with the devs is whether there is some structural reason that e.g. RDF doesn’t work for this use case.

At the simplest level, it would be great to be able to “prototype” based on types available in common ontologies/vocabularies, e.g. schema.org. It seems like a natural solution to integrate LogSeq graphs into the larger semantic web. It would open up a giant search space of structured content to LogSeq, and vice versa.

gww · December 5, 2022, 4:39pm

Structurally, most RDF stores (e.g. those queried via SPARQL) take data from different sources and map it into RDF.

E.g. here is an example of CSV=>RDF, from the CSVW working group:

https://github.com/w3c/csvw
https://www.w3.org/community/csvw/
https://www.w3.org/2013/csvw/wiki/CSV2RDF

So if one can map logseq (e.g. relationships) into CSV ( https://discuss.logseq.com/t/how-to-export-graph-from-commandline/13047 ) ,
AND one can define mapping from CSV to RDF,
then transitively mapping from logseq to RDF can be provided!
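
The second half of that chain (CSV to RDF) can be sketched in a few lines of Python. This is only an illustration: the CSV column names, the sample rows and the example.org base IRI are assumptions, and the mapping is a hand-rolled one rather than the CSVW procedure linked above.

```python
import csv
import io

# Hypothetical CSV export of Logseq relationships, one edge per row.
# Column names and sample data are assumptions made for this sketch.
CSV_EXPORT = """subject,predicate,object
Alice,likes,Apples
Bob,hates,Apples
"""

# Assumed base IRI; a real graph would pick its own namespace.
BASE = "http://example.org/logseq/"


def iri(name: str) -> str:
    """Wrap a page or relation name as an N-Triples IRI under BASE.

    Only spaces are replaced; full IRI escaping is out of scope here.
    """
    return f"<{BASE}{name.strip().replace(' ', '_')}>"


def csv_to_ntriples(text: str) -> list[str]:
    """Map each CSV row to one N-Triples line: subject predicate object ."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        f"{iri(r['subject'])} {iri(r['predicate'])} {iri(r['object'])} ."
        for r in rows
    ]


ntriples = csv_to_ntriples(CSV_EXPORT)
print("\n".join(ntriples))
```

The resulting lines could then be loaded into a store such as Apache Jena for SPARQL queries.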

Needs may differ,
but to me it’s two things that I am looking for:

get graph visible in visualisation page
if possible, extended to colour edges (so it’s not only “Alice–Apple” but “Alice–(likes)–Apple”)

And as a bonus, to add all the other vertices and edges, e.g. for text blocks, properties and their values, etc., as further steps (e.g. importing into Apache Jena to query via SPARQL https://jena.apache.org/tutorials/sparql.html )

Erich_Greenebaum · December 6, 2022, 2:11pm

Regarding LogSeq->CSV->RDF…

This seems like a viable method, but a couple points:

this is what JSON-LD was developed to address in the first place and
it already provides for including the edge “type” (the predicate in the subject->predicate->object triple) via reference to e.g. schema.org vocabularies.
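
As a rough illustration of the second point, here is a small JSON-LD node written as a Python dict, plus a deliberately naive expansion into triples. The example.org identifiers are made up, and `expand_simple` is a hypothetical helper that only handles flat nodes with an `@vocab` context; a real JSON-LD processor does far more.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The @context maps short keys onto schema.org IRIs, so the key "knows"
# becomes the typed edge (predicate) https://schema.org/knows.
doc = {
    "@context": {"@vocab": "https://schema.org/"},
    "@id": "http://example.org/logseq/Alice",  # assumed identifier
    "@type": "Person",
    "knows": {"@id": "http://example.org/logseq/Bob"},
}


def expand_simple(node: dict) -> list[tuple[str, str, str]]:
    """Naive expansion for flat nodes using only @vocab (illustration only)."""
    vocab = node["@context"]["@vocab"]
    subject = node["@id"]
    triples = [(subject, RDF_TYPE, vocab + node["@type"])]
    for key, value in node.items():
        if key.startswith("@"):
            continue  # keywords were handled above
        obj = value["@id"] if isinstance(value, dict) else str(value)
        triples.append((subject, vocab + key, obj))
    return triples


triples = expand_simple(doc)
print(json.dumps(doc, indent=2))
for t in triples:
    print(t)
```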

If any of the developers are reading this, if you could comment on LogSeq’s philosophical disposition towards semantic web tech that would be great. Maybe the more direct question is, how LogSeq imagines sharing/linking/collaborating, “federated” graphs, etc?

Hope this finds everyone well!

andrewzhurov · December 13, 2022, 6:10am

There is an ongoing effort to have our graphs integrated with the Semantic Web, discussed under this post, as you have already discovered.=)

Logseq uses DataScript’s quad store (speaking in RDF terms, a quad has the shape subject+predicate+value+time).
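
To picture that shape, here is a minimal Python sketch of quads and their projection onto plain RDF-style triples. The attribute names and values are illustrative assumptions, and this is not DataScript’s API.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """subject + predicate + value + time, as described above."""
    subject: int    # entity id
    predicate: str  # attribute name
    value: object   # attribute value
    time: int       # transaction id standing in for "time"


# Illustrative data only; ids, attributes and values are made up.
quads = [
    Quad(1, "block/content", "I like [[Apples]]", 100),
    Quad(2, "block/name", "apples", 100),
    Quad(1, "block/refs", 2, 101),
]


def to_triples(items: list[Quad]) -> list[tuple[int, str, object]]:
    """Drop the time component to get subject-predicate-value triples."""
    return [(q.subject, q.predicate, q.value) for q in items]


triples = to_triples(quads)
for t in triples:
    print(t)
```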

Fluree is an interesting solution.
It uses the Flakes data model, which is closely akin to DataScript’s, so it seems possible to drop a Logseq graph into Fluree with little effort.
It can provide a SPARQL endpoint on top of it, deriving RDF data from Flakes.

However, there are some downsides which may make it a less appealing solution for the problem at hand.
It’s a blockchain, meant to guarantee consensus in a distributed system, whereas we don’t need consensus when we build an immutable accrete-only public graph: there are no conflicts in the first place, and it’s eventually consistent. So it seems to me.

It needs to be run from a shell, perhaps in Docker for better reproducibility, and that is a hefty footprint on the user’s OS. It may not be practical, or even possible, on some devices, such as mobile phones, and it may require setup steps that are not very user-friendly.

There is consensus chit-chat, which makes publishing data slower and more costly in computation and network traffic than publishing a signed immutable block to a content-addressable storage of your choice.

Using Fluree solely as the source of truth would limit options of where we can get the data from.
Whereas we could be using any combination of content-addressable stores to publish and discover data (IPFS, GNUNet, https CDNs).

In Fluree, interconnection with the Semantic Web happens by exposing a SPARQL endpoint on a Fluree server. Atm this server seems to include only some Semantic Web sources, such as the Fluree blockchain, Wikidata and BigData. source
I haven’t found them mention how to add more sources, although I guess it is possible.

SPARQL endpoints are expensive on the server. source
Serving Semantic Web data as Linked Data Fragments strikes a good balance between client and server cost, and allows for federated queries from the browser, via js libs such as Comunica.
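
The cost split can be sketched like this: the server only answers single triple patterns (cheap), and the client does the join itself. The snippet below is an in-memory stand-in written in Python, with made-up data and hypothetical function names; it is not Comunica’s API and skips HTTP and paging entirely.

```python
# Made-up data standing in for a published graph.
TRIPLES = [
    ("Alice", "likes", "Apples"),
    ("Bob", "likes", "Apples"),
    ("Alice", "knows", "Bob"),
]


def fragment(s=None, p=None, o=None):
    """Server side: answer one triple pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def who_knows_someone_liking(thing):
    """Client side: join '?x knows ?y' with '?y likes thing'."""
    results = []
    for (y, _, _) in fragment(p="likes", o=thing):
        for (x, _, _) in fragment(p="knows", o=y):
            results.append((x, y))
    return results


matches = who_knows_someone_liking("Apples")
print(matches)
```

The server never evaluates the full query, which is where the saving comes from compared with a SPARQL endpoint; the price is more round trips from the client.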

Overall, it seems Fluree is a good fit for consensus-requiring use-cases, and this feature comes with a hefty architecture complexity. It also seems our use-case does not require consensus, so putting Fluree to use would add accidental complexity.

While time travel is interesting, since it allows finding the exact version of a Logseq block that another block referred to (which I believe is a must-have, as it prevents link rot), it comes with the need to keep this time-ordered blockchain and search through it, recreating state as of some point in time. That is costly on storage and requires a blockchain architecture, or some other way to guarantee that the time log has not been tampered with.
An alternative solution is to use content-based addressing for block references, baking in a reference to the exact content/block. Such a reference can be represented as a HashURI, being a part of the RDF representation of a block, and lazily resolved to its actual content by a query engine such as Comunica (it would need to be extended with a module that resolves such hash-based URIs, which is in the plans).
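
A minimal Python sketch of the content-addressing idea follows. The `urn:sha256:` scheme, the in-memory store and the function names are assumptions made for illustration, not an existing HashURI format or a Logseq feature.

```python
import hashlib

# In-memory stand-in for any content-addressable store (IPFS, a CDN, ...).
store: dict[str, str] = {}


def hash_uri(content: str) -> str:
    """Derive a reference from the content itself (hypothetical URI scheme)."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"urn:sha256:{digest}"


def publish(content: str) -> str:
    """Store a block under its own hash and return the reference."""
    uri = hash_uri(content)
    store[uri] = content
    return uri


def resolve(uri: str) -> str:
    """Fetch a block and verify that it still matches its reference."""
    content = store[uri]
    if hash_uri(content) != uri:
        raise ValueError("content does not match its hash reference")
    return content


ref_v1 = publish("I like [[Apples]]")
ref_v2 = publish("I love [[Apples]]")  # an edit yields a new reference
print(ref_v1 != ref_v2, resolve(ref_v1))
```

Because an edited block gets a different reference, an old reference keeps pointing at the exact version it was created for, without needing a time log.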

Thank you for bringing Fluree to attention, it’s an interesting piece of tech and perhaps some of its ideas could be used as sources of inspiration, e.g. how data can be signed and encrypted.
Keen to hear more ideas this way, it gets us one step closer towards our dream.=)

Erich_Greenebaum · January 6, 2023, 5:12pm

Hi Andrew, I thought I’d sent a reply to your thoughtful response, but it seems not, so returning to it here.

There are a few points on Fluree that would be worth following up on at some point, but I think my original post focussed on it too much. My main curiosity is about applying shared type vocabularies, à la schema.org, to scope/contextualize Logseq parameters as types from those vocabs. E.g. a “person::” could be in the context of the schema.org @Person type. Likewise an @Author in the Schema.org context is a @Person with additional parameters, e.g. refs to @Publications.
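
To make the idea a bit more concrete, here is a small Python sketch that reads Logseq-style `key:: value` properties and emits a JSON-LD node typed against schema.org. The page text, the `type::` convention and the example.org identifier are assumptions for illustration; this is not an existing Logseq feature.

```python
import json

# Hypothetical page properties; the "type::" convention is assumed.
PAGE_TEXT = """type:: Person
name:: Alice
jobTitle:: Editor"""


def properties_to_jsonld(page_id: str, text: str) -> dict:
    """Turn 'key:: value' lines into a JSON-LD node in the schema.org context."""
    props = {}
    for line in text.splitlines():
        if "::" not in line:
            continue  # not a property line
        key, value = line.split("::", 1)
        props[key.strip()] = value.strip()
    node = {"@context": "https://schema.org", "@id": page_id}
    node["@type"] = props.pop("type", "Thing")  # fall back to the root type
    node.update(props)
    return node


node = properties_to_jsonld("http://example.org/logseq/Alice", PAGE_TEXT)
print(json.dumps(node, indent=2))
```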

Anyway, just curious if that kind of thing might be possible in the future.

Hope you had a great New Years, and thanks for replying.

cldwalker · January 25, 2023, 9:15pm

With Logseq’s properties I’m able to create RDF and leverage schema.org concepts. I’ve put up a well-commented example script at nbb-logseq/examples/linked-data on the main branch of logseq/nbb-logseq on GitHub. The script should be configurable enough to handle different approaches to making ontologies in Logseq. A small portion of Logseq’s docs can now generate Turtle RDF. See https://twitter.com/cldwalker/status/1618355498176352259 for some more info.

Cristian_Vasquez · January 27, 2023, 11:25pm

Hi! do you have an example of what the RDF looks like?

andrewzhurov · February 1, 2023, 8:28pm

Wow, so great to see work in this direction! ^.^
Thank you for sharing the code and docs, Gabriel, well written, a pleasure to read.
I’m very curious, do you have any further plans on expansion towards the Semantic Web??

cldwalker · February 2, 2023, 9:41pm

Answered this in the “linked-data example” issue (#5) on logseq/nbb-logseq on GitHub, for anyone interested.

cldwalker · February 3, 2023, 2:23am

Glad it was helpful. I’m hoping to bring semweb-inspired features to logseq that are pragmatic. I’d love it if logseq would allow us to write, query and read things, not strings. I’d also love it if we as a community could get to a point where we published and consumed each other’s things.

I shared the rdf example to show the community that logseq already provides the structure to communicate “things”. Of course, we can make it easier and more intuitive to share types and properties and manage them. But we don’t have to wait for that to start sharing how we structure data. In the meantime, I’m eager to see others’ public graphs and the ontologies they put into practice.

andrewzhurov · February 3, 2023, 8:51am

Oh, wow, I’m stoked about the direction of your thought.
This initiative is not only a move towards allowing users’ graphs to become a part of the Semantic Web, but also a move towards building Logseq itself around data rather than text.
That is some next-level game to be had!

Going through the Logseq codebase I noticed how views that work on blocks accept the name of a block (or the name of an alias block of that block) as an argument, and they end up needing to resolve name → entity if they want to work on the data of that entity. I guess that is a small example where a data-first approach would shine, as we would be passing the entity in arguments right away, making it possible to have pure views, faster render time, simpler codebase - magic! Loving it!

That of course would be a small gain in comparison to the other data-first gains Logseq will benefit from; the rest, I imagine, would be a pure game-changer. Just thinking about the concept you introduced to me today, that semantics/data is the source of truth and text will be just an interface to express those semantics, gives me shivers. Yet again, it’s so nice to see Logseq moving towards that wonderland, damn good initiative.

I’ve been shown only some possibilities that data-first approach would give us, and it’s already gold, wonder what kind of superpowers are out there yet to be discovered

What’s the next step?? Any plans yet?

Erich_Greenebaum · February 3, 2023, 1:05pm

Great to see this happening!

One initial thought I’d had was that it would be great to get autocompletion for types… So, e.g., once you define a @context, one would get autocomplete of types, etc.

Once Fluree completes their JSON-LD features, your efforts above would make it even easier to integrate with Fluree - at that point, the sky’s the limit for federated queries, shared editing, etc. SPARQL/GraphQL queries could execute directly against the database. Plus all the cool aspects of Fluree, like guaranteed provenance, granular access control, etc, etc, out of the box. @cldwalker - if you aren’t familiar with it, you might want to check it out.

Anyway, thanks much for heading down the semweb path!

andrewzhurov · February 7, 2023, 5:57pm

That would be a handy feature, in my opinion.
It is appealing how Fluree gives us a bunch of features out of the box and how easy it seems to be to integrate with. I’d be really interested in hearing updates on how it goes if somebody would like to take it on.

cldwalker · February 10, 2023, 1:24am

The next step is for the community (including myself) to build graphs with large ontologies. Once there are a couple of large graphs with overlapping rdf vocabulary, we can start to see if linked data’s strengths hold up, e.g. querying across graphs. Ultimately I’m more interested in what linked data can do for us than in us doing something with linked data. So if anyone has published graphs with ontologies, however simple, please do share.

cldwalker · February 10, 2023, 2:06am

One initial thought I’d had was that it would be great to get autocompletion for types… So, e.g., once you define a @context, one would get autocomplete of types, etc.

I don’t know what else a context-like property would solve, but as far as completion goes, that’s an already solved problem for how I write rdf in the logseq/docs repository on GitHub (the Logseq documentation). Logseq pages can be any part of an rdf triple. In other words, we already have type autocompletion in any block with page and tag autocompletion. Also, logseq autocompletes property values that are unique to a given property, so we have contextual autocompletion. See https://docs.logseq.com/#/page/properties/block/usage for more.

Once Fluree completes their JSON-LD features, your efforts above would make it even easier to integrate with Fluree - at that point, the sky’s the limit for federated queries, shared editing, etc. SPARQL/GraphQL queries could execute directly against the database. Plus all the cool aspects of Fluree, like guaranteed provenance, granular access control, etc, etc, out of the box. @cldwalker - if you aren’t familiar with it, you might want to check it out.

It’s been a while since I’ve looked at it but I doubt Fluree can run in all the environments we’d need it to - desktop, mobile and web. Did Athens manage to take advantage of Fluree’s semweb features while using it as their backend? I don’t see anything in their docs.

andrewzhurov · February 10, 2023, 6:20am

Nice, querying across graphs would be so exciting to see! ^.^
Do you have your eye on some ontologies you’d like to try yet?

Erich_Greenebaum · February 10, 2023, 1:07pm

It’s been a while since I’ve looked at it but I doubt Fluree can run in all the environments we’d need it to - desktop, mobile and web. Did Athens manage to take advantage of Fluree’s semweb features while using it as their backend? I don’t see anything in their docs.

Ah, I’d forgotten Athens had done that, I’ll have to check it out. I think without the JSON-LD support in place, it might not have been a generic enough solution, given the way they define collections, etc. They’ve been saying they’re close to release for going on a year, so maybe the Athens folks lost patience waiting.

I think it could be lightweight enough to configure clients on desktop, and they do have a very light javascript client that can run in a browser. That said, I certainly take your point that it might be just too heavy overall to be a fit.

Thanks for the feedback!

sid597 · February 22, 2023, 1:45pm

At Athens we were only using the Fluree ledger. We discussed moving to a full Fluree backend a few times, but other things got higher priority.

We wanted to use it due to all the points mentioned in this thread. We wanted to use the semweb features of Fluree, but we did not get to that point from a product development pov. I hope we get there in Logseq.

Regarding performance, I asked similar questions during a Fluree demo in the Athens Discord; you can see that thread for more info.