Building an RDFa importer service

RDFa is a way to embed Linked Data into HTML documents.  In this short post we describe how we can implement a microservice that imports such content into the mu.semte.ch stack.  We will go through our own development process and share what we discovered along the way.

In the process described here, we ignore file uploads and support for the delta service.  Both will be discussed in other posts.

Getting started

We create a new folder containing the basic stub of a service based on the mu-ruby-template.

# /path/to/importer/web.rb
get '/' do
  content_type 'application/json'
  { data: { attributes: { hello: 'world' } } }.to_json
end
# /path/to/importer/Dockerfile
FROM semtech/mu-ruby-template:2.4.0-ruby2.3
MAINTAINER Aad Versteden <madnificent@gmail.com>
# see https://github.com/mu-semtech/mu-ruby-template for more info

With these files in place we can wire this new service up in a standard mu-project by updating our docker-compose.yml and dispatcher.ex.

In the docker-compose.yml we add our development component and link it to the dispatcher.

dispatcher:
  ...
  links:
    - rdfaimporter:rdfaimporter
...
rdfaimporter:
  image: semtech/mu-ruby-template:2.4.0-ruby2.3
  links:
    - db:database
  ports:
    - "8888:80"
  environment:
    RACK_ENV: "development"
  volumes:
    - "/path/to/importer/:/app"
    - "./data/share:/share"

In dispatcher.ex, we add the following rule above the catch-all match _ do rule:

# new content in dispatcher.ex
match "/import/*path" do
  Proxy.forward conn, path, "http://rdfaimporter/"
end
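The glob rule strips the matched /import prefix and forwards whatever *path captured to the backend URL.  As a rough mental model only, the hypothetical forward_url helper below mimics that behaviour on path segments in Ruby; mu-dispatcher itself is written in Elixir and its internals may differ.

```ruby
# Hypothetical model of a glob rule like match "/import/*path":
# strip the prefix segments and append the rest to the backend URL.
# This is an illustration, not mu-dispatcher's actual implementation.
def forward_url(request_path, prefix, backend)
  prefix_segments = prefix.split('/').reject(&:empty?)
  segments = request_path.split('/').reject(&:empty?)

  # The rule only applies when the request starts with all prefix segments.
  return nil unless segments.first(prefix_segments.size) == prefix_segments

  # The remaining segments play the role of *path.
  backend + segments.drop(prefix_segments.size).join('/')
end

forward_url("/import/", "/import", "http://rdfaimporter/") # => "http://rdfaimporter/"
```

With the updated rule later in this post, the backend argument simply becomes http://rdfaimporter/import/.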

After starting our project, we can surf to http://localhost/import/ and receive our default hello world output.  As we update the code in web.rb, we will see the changes appear live.

Importing RDFa

Hint: If you’re working your way through this post, you won’t need to execute the steps in this section.

While looking for a good solution, we search online for an RDFa parsing library and find the rdf-rdfa library on GitHub.  The library looks clean, so we create a new Gemfile and add its latest version to it.

# /path/to/importer/Gemfile
gem 'rdf-rdfa', '2.2.2'

When we restart the container, which is necessary because we changed the dependencies, the mu-ruby-template prints the following output:

rdfaimporter_1 | You have requested:
rdfaimporter_1 | rdf-rdfa = 2.2.2
rdfaimporter_1 | 
rdfaimporter_1 | The bundle currently has rdf-rdfa locked at 2.1.0.
rdfaimporter_1 | Try running `bundle update rdf-rdfa`

It turns out the mu-ruby-template already made the decision for us: it ships with rdf-rdfa locked at 2.1.0.  We can remove our Gemfile and keep humming along with the version offered by the template.
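Before adding a Gemfile of our own, it can save a container restart to check which version the template has already locked.  The sketch below is a hypothetical helper, locked_version, that scans the text of a Gemfile.lock.  We assume you can get hold of that file from the template image (we haven't verified where it lives), and the sample lockfile is a made-up excerpt that only reuses the 2.1.0 version from the output above.

```ruby
# Hypothetical helper: return the version a Gemfile.lock pins for a gem.
# Resolved gems under "specs:" are indented by exactly four spaces,
# e.g. "    rdf-rdfa (2.1.0)"; their own dependencies use six spaces
# and are deliberately not matched.
def locked_version(lockfile_text, gem_name)
  pattern = /\A {4}#{Regexp.escape(gem_name)} \(([^)]+)\)\z/
  lockfile_text.each_line do |line|
    match = pattern.match(line.chomp)
    return match[1] if match
  end
  nil
end

# Illustrative excerpt only, not the template's real lockfile.
sample = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      rdf-rdfa (2.1.0)
        rdf (~> 2.0)
LOCK

locked_version(sample, "rdf-rdfa") # => "2.1.0"
```

In this excerpt rdf only appears as a dependency line, so locked_version(sample, "rdf") returns nil.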

Parsing the RDFa file

With the RDFa library selected and its documentation at hand, we work through a first version of the parser.

We save an RDFa-annotated file (without blank nodes) as ./data/share/our-example.html, where the import service can find it.

<div resource="http://test.com/Articles/81216194" vocab="http://test.com/vocabulary/" typeof="Article" class="article">
  <h2 property="hasTitle">New article</h2>
  <p property="hasContent">
    Content of an article which refers to <span property="referredPerson" typeof="foaf:Agent" resource="mailto:madnificent@gmail.com"><a property="email-address" href="mailto:madnificent@gmail.com">Aad Versteden</a></span>.
  </p>
</div>

We use this example as a simple test case with debugging output.  Pasting it into http://rdfa.info/play shows its contents in an easy-to-interpret format.  For our first try, we send the file through the rdf-rdfa library and dump the parsed contents.

For a cleaner interface, we move our GET route to /import/ and update the dispatcher accordingly:

# updated content in dispatcher.ex
match "/import/*path" do
  Proxy.forward conn, path, "http://rdfaimporter/import/"
end

Then we implement a basic dump of the parsed contents:

require 'rdf/rdfa'

get '/import/' do
  content_type 'application/json'

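  # Parse the RDFa file from the shared folder into an in-memory graph
  graph = RDF::Graph.load("/share/#{params[:file]}")
  # Serialize the graph as Turtle so we can inspect it in the response
  dump = graph.dump :ttl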

  { data: { attributes: { parsed: dump } } }.to_json
end

When we access http://localhost:8888/import/?file=our-example.html, we see the resulting Turtle in the response.  Yay, we’re ready to write this into the triplestore.
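To check the response from a script instead of the browser, the Turtle can be pulled back out of the JSON envelope the route returns.  parsed_turtle is a hypothetical helper; the commented lines sketch how it could be called with Ruby's Net::HTTP against the development port.

```ruby
require 'json'

# Hypothetical client-side helper: the /import/ route wraps the Turtle dump
# as { data: { attributes: { parsed: ... } } }; this extracts that string.
# Returns nil when the expected keys are missing.
def parsed_turtle(response_body)
  JSON.parse(response_body).dig('data', 'attributes', 'parsed')
end

# Usage (requires the running service):
#   require 'net/http'
#   body = Net::HTTP.get(URI('http://localhost:8888/import/?file=our-example.html'))
#   puts parsed_turtle(body)
```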

Writing contents

We can write contents to the triplestore by using sparql_client.insert_data.  At first, we try this with a temporary graph.

require 'rdf/rdfa'

get '/import/' do
  content_type 'application/json'

  graph = RDF::Graph.load("/share/#{params[:file]}")
  dump = graph.dump :ttl

  # Write every triple of the parsed graph into a temporary test graph
  sparql_client.insert_data graph, :graph => "http://test.com/1"

  { data: { attributes: { parsed: dump } } }.to_json
end
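Under the hood, inserting into a named graph amounts to a SPARQL INSERT DATA update wrapped in a GRAPH block.  The stdlib-only sketch below builds such an update by hand to show its shape.  insert_data_query is a hypothetical helper: it expects already-serialized N-Triples terms, does no escaping, and the exact query text sparql-client sends may well differ.

```ruby
# Hypothetical sketch of an INSERT DATA update targeting one named graph.
# Each triple is [subject, predicate, object] as serialized terms,
# e.g. "<http://...>" for IRIs and "\"...\"" for literals.
def insert_data_query(triples, graph_iri)
  lines = ["INSERT DATA {", "  GRAPH <#{graph_iri}> {"]
  triples.each { |s, p, o| lines << "    #{s} #{p} #{o} ." }
  lines << "  }" << "}"
  lines.join("\n")
end

puts insert_data_query(
  [['<http://test.com/Articles/81216194>',
    '<http://test.com/vocabulary/hasTitle>',
    '"New article"']],
  'http://test.com/1'
)
```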

When we surf to http://localhost:8888/import/?file=our-example.html, our data is inserted into the triplestore.  We can find it by going to http://localhost:8890 and executing a query that lists all triples in the specified graph:

SELECT * WHERE {
  GRAPH <http://test.com/1> {
    ?s ?p ?o.
  }
}
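The same check can be scripted.  Below is a minimal sketch, assuming the triplestore answers SPARQL queries at http://localhost:8890/sparql and can return the standard SPARQL 1.1 JSON results format.  query_uri and triples_from_results are hypothetical helpers, and the network call is left in comments so the helpers can be tried without a running stack.

```ruby
require 'json'
require 'uri'

# Build a GET URI carrying the SPARQL query as the "query" parameter.
def query_uri(endpoint, query)
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(query: query)
  uri
end

# Turn a SPARQL 1.1 JSON results document into [s, p, o] value arrays.
def triples_from_results(json_text)
  JSON.parse(json_text)['results']['bindings'].map do |binding|
    %w[s p o].map { |var| binding.dig(var, 'value') }
  end
end

# Usage (requires a running triplestore; endpoint path is an assumption):
#   require 'net/http'
#   uri = query_uri('http://localhost:8890/sparql',
#                   'SELECT * WHERE { GRAPH <http://test.com/1> { ?s ?p ?o . } }')
#   req = Net::HTTP::Get.new(uri, 'Accept' => 'application/sparql-results+json')
#   res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
#   triples_from_results(res.body).each { |t| puts t.join(' ') }
```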

To insert our contents into the application graph instead, we change the graph option to the MU_APPLICATION_GRAPH environment variable:

require 'rdf/rdfa'

get '/import/' do
  content_type 'application/json'

  graph = RDF::Graph.load("/share/#{params[:file]}")
  dump = graph.dump :ttl

  sparql_client.insert_data graph, :graph => ENV['MU_APPLICATION_GRAPH']

  { data: { attributes: { parsed: dump } } }.to_json
end

Conclusion

With this, mu.semte.ch has been extended to import RDFa documents.  There is one slight limitation: only documents without blank nodes are supported.  An extension to this twelve-line microservice could lift that restriction.
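One possible direction for such an extension, sketched here under our own assumptions rather than taken from the stack, is skolemization: replacing each blank node with a freshly minted IRI before writing, as RDF 1.1 suggests using IRIs under a /.well-known/genid/ path.  The helper works on triples of serialized terms, and GENID_BASE is a hypothetical base IRI.

```ruby
require 'securerandom'

# Hypothetical base IRI for minted skolem IRIs (an assumption).
GENID_BASE = 'http://test.com/.well-known/genid/'

# Replace blank nodes ("_:label") with fresh IRIs. Triples are arrays of
# serialized terms; literals start with a quote, so they are never touched.
def skolemize(triples, base = GENID_BASE)
  mapping = {} # the same blank node label must map to the same IRI
  triples.map do |triple|
    triple.map do |term|
      if term.start_with?('_:')
        mapping[term] ||= "<#{base}#{SecureRandom.uuid}>"
      else
        term
      end
    end
  end
end
```

Because the mapping is kept per call, a blank node referenced twice in one document gets a single IRI, but importing the same file twice mints new IRIs each time.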

In a future post we will address the connection with file uploads and with the delta service, making the importer safer and more robust.