Using Fuseki and RDF4J to load and retrieve data from a remote graph datastore

Allowing readers to add comments to my blogs led me to figure out how to use Fuseki as a backing store for the comments. Making my site interactive rather than just read-only is an exciting step forward.

Posted: Wed 18 Dec, 2019, 16:01
Having used Apache Jena to hold all the personal data behind my website for many years now, I am totally sold on the benefits. It gives me a flexible, natural way to hold my content and meta-content, which is indexed, searchable and easily saved to a Turtle (.ttl) file. When I wrote triki, however, Fuseki was in its infancy, so my graph store is held in memory right next to my render engine.

Architecturally, however, having an external graph store that is accessible via SPARQL over HTTP has always seemed like a natural progression, and that is precisely what Fuseki offers. Adding comments to my site was the perfect opportunity to trial Fuseki, with a view to eventually moving all the content to out-of-process graph stores.

Blogging with Comments

Deciding to add comments has been a long time coming. Writing blogs without letting readers comment is a very one-way, hectoring, 'I talk, you listen' approach that I have never been comfortable with. I'm trying to be more TCP than UDP. So why has it taken so long? The first blocker was having a good way to authenticate people. That has now worked its way into the site via OAuth2 integration with Google, Amazon and Outlook, three supporters of the OpenID authentication standard. The only reason for authentication is to reduce spam, which is essential to protect the site.

The other delay has been WebMentions, which are the IndieWeb way of commenting on content published on the internet. IndieWeb is an active and long-established movement, supported by some very smart people. I am totally aligned with their goal of promoting a web where we all own our own content, and I really wanted to implement WebMentions. However, comments can only be left by those running their own site with the correct software, which is a depressingly tiny minority. To make comments accessible to all, I have reluctantly built my own commenting functionality and will add WebMention support later.

The final delay was that I was unsure about holding comments next to my own content in the graph store. Content generated by me grows (very) slowly, but comments could grow much more quickly and fill up the graph store's memory. I really wanted to externalise them to a separate store, hence a Fuseki dataset accessed over HTTP.

Starting Fuseki

Fuseki can be run embedded in-process or as a standalone server. To run it standalone (as I am), just download and unpack the Fuseki distribution and run:

./fuseki-server

This spins up Fuseki on localhost:3030, and from there a dataset can be added through the web console. I added one called 'comments'. I also discovered that datasets cannot be added when the console is accessed from any machine other than localhost; to allow this, edit run/shiro.ini and comment out:

/$/** = localhostFilter
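Datasets can also be created without the console. Fuseki's HTTP administration protocol accepts a form POST to /$/datasets with dbName and dbType parameters (mem for in-memory, tdb or tdb2 for disk-backed storage). Below is a minimal sketch using only the JDK's java.net.http client, assuming a server on localhost:3030. The class and method names are hypothetical. Because this is an admin call under /$/, the shiro.ini restriction above applies when it is made from another machine.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CreateDataset {

    // Hypothetical helper: builds an admin-protocol request that creates a dataset.
    // dbType is "mem" (in-memory), or "tdb" / "tdb2" (persisted to disk).
    static HttpRequest createDatasetRequest(String fusekiBase, String name, String dbType) {
        String form = "dbName=" + URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "&dbType=" + URLEncoder.encode(dbType, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder()
                .uri(URI.create(fusekiBase + "/$/datasets"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = createDatasetRequest("http://localhost:3030", "comments", "tdb2");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // A 2xx status indicates the dataset was created
        System.out.println(response.statusCode());
    }
}
```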

Backups can be triggered via the console or over Fuseki's HTTP admin interface, which makes it easy to take regular snapshots of the data.
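Triggering a backup over HTTP takes a single admin call: a POST to /$/backup/{name}. As far as I know, Fuseki runs the backup as a background task and writes a gzip-compressed N-Quads file into its backups directory. The following is a minimal JDK-only sketch. It assumes localhost:3030 and the 'comments' dataset, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BackupDataset {

    // Hypothetical helper: POST /$/backup/{name} asks Fuseki to back up one dataset.
    // The dataset name is used as a raw path segment, so keep it to simple characters.
    static HttpRequest backupRequest(String fusekiBase, String dataset) {
        return HttpRequest.newBuilder()
                .uri(URI.create(fusekiBase + "/$/backup/" + dataset))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                backupRequest("http://localhost:3030", "comments"),
                HttpResponse.BodyHandlers.ofString());
        // The response body describes the backup task started on the server
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```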

Adding data to Fuseki

Once a user has written a comment and submitted it, the data needs to be pushed into the Fuseki store. My first question was: how? Very easily, as it turns out. After some trial and error I found that RDFConnection has a load(Model model) method. To get a connection that allows updates, the following worked for me:

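import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFuseki;

// Remote connection to the 'comments' dataset on the local Fuseki server
RDFConnection rdfConnection = RDFConnectionFuseki.create()
        .destination("http://localhost:3030/comments/")
        .queryEndpoint("sparql")   // SPARQL query service
        .updateEndpoint("update")  // SPARQL update service
        .acceptHeaderSelectQuery("application/sparql-results+json, application/sparql-results+xml;q=0.9")
        .build();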
        

After that, I just build a new Model from the form POST submission and add all the triples describing the comment. The result is a very small Model of only seven triples.
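To make those seven triples concrete, here is a library-free sketch that writes one comment out as Turtle. In the real code the triples go into a Jena Model. This version builds the text by hand instead, which makes visible a detail Jena otherwise handles for you: user-supplied text must be escaped. The property names are an assumption read off the SELECT query later in this post. The urn:uuid: subject, the plain literal for the website, and the class and method names are all hypothetical.

```java
import java.time.Instant;
import java.util.UUID;

public class CommentTurtle {

    // Escapes a string as a Turtle "..." literal: in this form only backslash,
    // double quote, line feed and carriage return must be escaped.
    static String literal(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '"':  out.append("\\\""); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                default:   out.append(c);
            }
        }
        return out.append('"').toString();
    }

    // Hypothetical: the seven triples of one comment, one predicate per line.
    // targetUrl is the site's own post URL, so it is trusted and not escaped.
    static String commentTurtle(String id, String text, String name, String website,
                                Instant created, String targetUrl) {
        return "@prefix comments: <http://www.opentechnology.net/triki/comments/0.1/> .\n"
             + "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
             + "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n"
             + "<urn:uuid:" + id + ">\n"
             + "    a comments:Comment ;\n"
             + "    comments:comment " + literal(text) + " ;\n"
             + "    comments:commentorName " + literal(name) + " ;\n"
             + "    comments:commentorWebsite " + literal(website) + " ;\n"
             + "    dcterms:identifier " + literal(id) + " ;\n"
             + "    dcterms:created " + literal(created.toString()) + "^^xsd:dateTime ;\n"
             + "    dcterms:references <" + targetUrl + "> .\n";
    }

    public static void main(String[] args) {
        String id = UUID.randomUUID().toString();
        System.out.print(commentTurtle(id, "Great post!", "Ann", "https://example.org",
                Instant.now(), "https://example.org/blog/fuseki"));
    }
}
```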

To save this to Fuseki over the network, just run:

// Appends the comment's triples to the dataset's default graph
rdfConnection.load(commentModel);
        

Done. Very intuitive, very simple.
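Behind the scenes, RDFConnection.load(Model) is documented as a SPARQL 1.1 Graph Store Protocol POST into the dataset's default graph. A rough JDK-only equivalent is sketched below. It assumes the dataset exposes Fuseki's usual 'data' graph store endpoint and sends Turtle, although Jena may choose a different RDF syntax for the request body. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GraphStoreLoad {

    // Graph Store Protocol: POST appends the RDF in the body to the target graph,
    // and the "default" query parameter targets the dataset's default graph.
    // datasetUrl is expected to end with "/", as in the connection above.
    static HttpRequest loadRequest(String datasetUrl, String turtle) {
        return HttpRequest.newBuilder()
                .uri(URI.create(datasetUrl + "data?default"))
                .header("Content-Type", "text/turtle; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(turtle))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String turtle = "<urn:example:c1> <http://purl.org/dc/terms/identifier> \"c1\" .\n";
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                loadRequest("http://localhost:3030/comments/", turtle),
                HttpResponse.BodyHandlers.ofString());
        // A 2xx status means the triples were added
        System.out.println(response.statusCode());
    }
}
```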

Reading data out of Fuseki using RDF4J

The beauty of Fuseki is that it offers standard SPARQL 1.1 interfaces over HTTP, so it can be used from any library that speaks the protocol. RDF4J is a popular library for talking to graph datastores, so it seemed like a good opportunity to try it out. More importantly, it offers SparqlBuilder, a safe, builder-pattern way of constructing queries that has no (mature) equivalent in native Apache Jena-land. Building up queries is simple and intuitive, and it means I can stop writing hand-crafted SPARQL queries, which is awesome.
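To show how little it takes to speak that protocol, here is a sketch of a raw SPARQL 1.1 Protocol query using only the JDK. The query is POSTed directly with Content-Type application/sparql-query, and the Accept header selects JSON results. The endpoint URL matches the 'sparql' query endpoint configured earlier. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SparqlProtocolQuery {

    // SPARQL 1.1 Protocol "query via POST directly": the body is the raw query
    // string and the Accept header chooses the results format.
    static HttpRequest selectRequest(String queryEndpoint, String sparql) {
        return HttpRequest.newBuilder()
                .uri(URI.create(queryEndpoint))
                .header("Content-Type", "application/sparql-query")
                .header("Accept", "application/sparql-results+json")
                .POST(HttpRequest.BodyPublishers.ofString(sparql))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String sparql = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5";
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                selectRequest("http://localhost:3030/comments/sparql", sparql),
                HttpResponse.BodyHandlers.ofString());
        // Prints the SPARQL JSON results document returned by Fuseki
        System.out.println(response.body());
    }
}
```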

Building up the query was as easy as:

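import static org.eclipse.rdf4j.sparqlbuilder.rdf.Rdf.iri;

import org.apache.jena.vocabulary.DCTerms;  // Jena's Dublin Core terms vocabulary
import org.eclipse.rdf4j.query.QueryLanguage;
import org.eclipse.rdf4j.query.TupleQuery;
import org.eclipse.rdf4j.sparqlbuilder.core.Prefix;
import org.eclipse.rdf4j.sparqlbuilder.core.SparqlBuilder;
import org.eclipse.rdf4j.sparqlbuilder.core.Variable;
import org.eclipse.rdf4j.sparqlbuilder.core.query.Queries;
import org.eclipse.rdf4j.sparqlbuilder.core.query.SelectQuery;

// Query variables: each one becomes a ?variable in the generated SPARQL
Variable comment = SparqlBuilder.var("comment");
Variable commentText = SparqlBuilder.var("commentText");
Variable commentorName = SparqlBuilder.var("commentorName");
Variable commentorWebsite = SparqlBuilder.var("commentorWebsite");
Variable created = SparqlBuilder.var("created");
Variable id = SparqlBuilder.var("id");

Prefix triki = SparqlBuilder.prefix("triki", iri("http://www.opentechnology.net/triki/0.1/"));
Prefix dcTermsPrefix = SparqlBuilder.prefix("dcterms", iri(DCTerms.getURI()));
Prefix commentsPrefix = SparqlBuilder.prefix("comments", iri("http://www.opentechnology.net/triki/comments/0.1/"));

// Latest 50 comments on the post at targetUrl, newest first
SelectQuery selectQuery = Queries.SELECT();
String queryString = selectQuery
        .prefix(triki)
        .prefix(commentsPrefix)
        .prefix(dcTermsPrefix)
        .select(comment)
        .select(commentText)
        .select(commentorName)
        .select(commentorWebsite)
        .select(created)
        .select(id)
        .where(comment.isA(commentsPrefix.iri("Comment")))
        .where(comment.has(commentsPrefix.iri("comment"), commentText))
        .where(comment.has(commentsPrefix.iri("commentorName"), commentorName))
        .where(comment.has(commentsPrefix.iri("commentorWebsite"), commentorWebsite))
        .where(comment.has(dcTermsPrefix.iri("identifier"), id))
        .where(comment.has(dcTermsPrefix.iri("created"), created))
        .where(comment.has(dcTermsPrefix.iri("references"), iri(targetUrl)))
        .orderBy(created.desc())
        .limit(50)
        .getQueryString();

// queryConnection() supplies an RDF4J RepositoryConnection to the Fuseki dataset
TupleQuery tupleQuery = queryConnection().prepareTupleQuery(QueryLanguage.SPARQL, queryString);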
        

The Variable objects become the question-mark variables of the generated query and are used in both the SELECT clause and the WHERE patterns. No more writing SPARQL queries by hand. So simple.
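The code above stops once the TupleQuery is prepared. One way to read the results is sketched below with RDF4J's SPARQLRepository, which speaks the SPARQL 1.1 Protocol and so can point straight at Fuseki. What queryConnection() actually does in triki isn't shown here, so the repository setup and the commentTexts helper are assumptions. The inline query is a trimmed-down stand-in for the SparqlBuilder one.

```java
import java.util.ArrayList;
import java.util.List;

import org.eclipse.rdf4j.query.BindingSet;
import org.eclipse.rdf4j.query.QueryLanguage;
import org.eclipse.rdf4j.query.TupleQuery;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;

public class ReadComments {

    // Hypothetical helper: runs a SELECT and collects the ?commentText binding of each row.
    static List<String> commentTexts(RepositoryConnection connection, String queryString) {
        TupleQuery tupleQuery = connection.prepareTupleQuery(QueryLanguage.SPARQL, queryString);
        List<String> texts = new ArrayList<>();
        // Close the result when done so the underlying resources are released
        try (TupleQueryResult result = tupleQuery.evaluate()) {
            while (result.hasNext()) {
                BindingSet row = result.next();
                texts.add(row.getValue("commentText").stringValue());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Assumed setup: a SPARQLRepository pointed at Fuseki's query endpoint
        SPARQLRepository repository = new SPARQLRepository("http://localhost:3030/comments/sparql");
        repository.init();
        try (RepositoryConnection connection = repository.getConnection()) {
            String query = "PREFIX comments: <http://www.opentechnology.net/triki/comments/0.1/> "
                    + "SELECT ?commentText WHERE { ?comment comments:comment ?commentText } LIMIT 50";
            commentTexts(connection, query).forEach(System.out::println);
        } finally {
            repository.shutDown();
        }
    }
}
```

Because commentTexts only depends on RepositoryConnection, the same helper works unchanged against any RDF4J repository, not just a remote SPARQL endpoint.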

So finally, anyone can comment on any blog on my site.

If you have any comments or questions, please sign in below and add a comment!