[Profile picture of Ruben Verborgh]

Ruben Verborgh

Writing a SPARQL parser in JavaScript

JavaScript didn't have a parser for SPARQL 1.1 yet, so I created one.

If we want to make the Semantic Web more webby, we need software in the Web’s main language JavaScript. The support for the SemWeb’s data format RDF has been quite good in JavaScript, with several libraries supporting Turtle and JSON-LD. However, proper support for the SPARQL query language has been lacking so far, especially for the latest version SPARQL 1.1. Since I need SPARQL for several of my projects, such as Linked Data Fragments, I wrote a proper SPARQL parser myself. It is created through the Jison parser generator and converts SPARQL into JSON. Its main design goals are completeness, maintainability, and small size.

The Semantic Web world has so far mainly focused on intelligent servers, instead of the promised intelligent agents. This is reflected in the codebase: the majority of Semantic Web software is written in Java, which runs on servers, but not (natively) on browsers. While there are great libraries in other languages such as Ruby, JavaScript is still a bit behind. That’s a pity, because browsers are the most important clients of the Web, and they breathe JavaScript. Furthermore, JavaScript engines have become much more powerful over the past couple of years, and they have given rise to server-side JavaScript through Node.js.

I’ve recently worked a lot on client (and server) development of Semantic Web applications, and I implement in JavaScript because this allows to run the code on servers and browsers alike. This was my motivation 2 years ago to start a Turtle library, which provided streaming parsing at an unprecedented speed. Turtle is an important language for RDF data, but expressing queries over such data in a standardized way requires the SPARQL language. Existing parsers are outdated and/or buggy, and I needed a SPARQL parser I could rely on.

The full expressivity of the SPARQL 1.1 query language lets us write advanced clients. ©Karen Eliot

Design goals and implementation strategy

After some consideration, I outlined the main design goals of the SPARQL.js parser as:

  • completeness: correctly parse all of SPARQL 1.1
  • maintainability: allow evolution easily
  • size: be small to allow fast downloading in browsers

Now I had already written a Turtle parser, so since triples in SPARQL queries essentially follow the Turtle syntax, an obvious question would be: why not reuse or extend the existing Turtle parser?

While it seems that this would save a lot of work, it actually wouldn’t: triples are only a small, quite trivial part of SPARQL. Furthermore, the design goals of the Turtle library were quite different: there’s no need to parse SPARQL queries in a streaming way, as queries are generally only a few lines long, and never larger than memory. Furthermore, the N3.js parser is hand-written to obtain highly optimized code, which makes maintainability harder. This is especially a problem because the SPARQL grammar is much more complex than Turtle’s.

For this reason, I decided to implement the SPARQL.js parser using a parser generator. This allowed me to keep the code simple and to hand over common tasks to the generator. After a quick comparison with PEG.js, I picked Jison for the job.

From grammar to parser… semi-automatically

The SPARQL specification includes a grammar in Extended Backus-Naur Form (EBNF) notation. It has rules such as the following:

Query ::=
    Prologue
    (SelectQuery | ConstructQuery | DescribeQuery | AskQuery)
    ValuesClause

The above rule says that a SPARQL query consists of a prologue (prefix and base declarations), then one of the four query forms SELECT, CONSTRUCT, DESCRIBE, or ASK, and then a series of values. Each of those is then detailed by further rules.

Most people who implement a parser implement these rules by hand for their parser generator. However, since Jison already supports an EBNF dialect, I instead automatically translated the rules into the Jison format. First, I copied the EBNF grammar verbatim from the SPARQL document and then wrote a script to translate this into Jison syntax. This created the funny situation where I had a source file that was translated into a parser generator, which is translated into a parser. In a sense, I guess it was a parser generator generator.

After those easy steps, I had a SPARQL accepter. That is, I had a parser that could tell whether a given query was valid SPARQL or not. A big win for little work.

From SPARQL to JSON

To convert this accepter into an actual JSON parser, I had to extend the generated Jison document with code that tells what to do with each SPARQL component. For instance, PREFIX declarations should be remembered, so if we encounter a prefixed name, we can correctly expand it. While this was still a big job, the major benefit was that I knew I had a valid accepter, because the structure of the rules had been automatically generated from the SPARQL specification.

I specifically chose not to parse to the SPARQL algebra, because I think it can be unnecessarily complex for JavaScript (or other) developers. Maybe I will extend the parser to output SPARQL algebra as well some day, but it seemed more important to have something usable sooner. Like for my N3.js triple representation, I strived to find something that is as easy as possible in JSON.

Here is an example SPARQL query:

PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?p ?c WHERE {
    ?p a dbpedia-owl:Artist.
    ?p dbpedia-owl:birthPlace ?c.
    ?c <http://xmlns.com/foaf/0.1/name> "York"@en.
}

It translates into JSON as follows:

{
  "type": "query",
  "prefixes": {
    "dbpedia-owl": "http://dbpedia.org/ontology/"
  },
  "queryType": "SELECT",
  "variables": [ "?p", "?c" ],
  "where": [
    {
      "type": "bgp",
      "triples": [
        {
          "subject": "?p",
          "predicate": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
          "object": "http://dbpedia.org/ontology/Artist"
        },
        {
          "subject": "?p",
          "predicate": "http://dbpedia.org/ontology/birthPlace",
          "object": "?c"
        },
        {
          "subject": "?c",
          "predicate": "http://xmlns.com/foaf/0.1/name",
          "object": "\"York\"@en"
        }
      ]
    }
  ]
}

Note how you can still recognize the original query, while it is perfectly accessible as a JSON structure. It seems a good format to start developing applications.

Try it yourself

The SPARQL.js library has been released as open source on GitHub.
Try it out and let me know what you think!

The next major step would be a SPARQL execution engine in JavaScript. We have made some progress with this in the Linked Data Fragments client, which used an older but incomplete SPARQL parser. I can’t wait to upgrade to SPARQL.js ;-)

Fun fact: I tried to make this library as small as possible. To achieve this, I implemented two features in the Jison parser generator that brought down the size of generated parsers to less than half of the original size. The cool thing is that all Jison-generated parsers benefit from this: the same functionality with half the weight!
The final SPARQL.js parser is 18kb gzipped, which seems nice for full SPARQL 1.1 support.