Posts Tagged ‘sparql’

SPARQLing The Highest Point in Every US State

Wednesday, February 4th, 2009

In a previous post, I mentioned that it should be pretty easy to use SPARQL to make a map of the highest point in each of the 50 US States. Having written that, I thought I should maybe actually, you know, try it.

The following chunk of code uses ARC2, an RDF/semantic web library for PHP, to query the DBpedia endpoint and then put the results on a Google Map.

To try this out, you need to:

  1. Have a functional PHP installation
  2. Download ARC2 into your web path (no setup required)
  3. Set the path to ARC in the code below
  4. Get a Google Maps API Key (free)
  5. Set your API key in the code below
  6. Run

Note that this is a demo and is written to be easy to run – a real application might separate the data logic from the webpage and make more sophisticated use of JavaScript and the Google Maps API.

<?php
//include ARC2 libraries
include_once("path/to/ARC2.php");
//instantiate a RemoteStore
$config = array('remote_store_endpoint' => 'http://dbpedia.org/sparql');
$store = ARC2::getRemoteStore($config);
//build the SPARQL query
$q = '
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?state ?mtn ?lat ?long
WHERE {
?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> .
?state dbpedia2:highestpoint ?mtn .
?mtn geo:lat ?lat .
?mtn geo:long ?long
}
';
//process the results
$results = array();
if ($rows = $store->query($q, 'rows')) {
  foreach ($rows as $row) {
    $state = substr($row['state'], strlen("http://dbpedia.org/resource/"));
    $mtn = substr($row['mtn'], strlen("http://dbpedia.org/resource/"));
    $lat = $row['lat'];
    $lng = $row['long'];
    $results[] = array($state, $mtn, $lat, $lng);
  }
}
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<title>Google Maps JavaScript API Example</title>
<script type="text/javascript" src="http://maps.google.com/maps?file=api&amp;v=2&amp;key=YOUR_KEY"></script>
<script type="text/javascript">
function initialize() {
  if (GBrowserIsCompatible()) {
    var map = new GMap2(document.getElementById("map_canvas"));
    map.setCenter(new GLatLng(37.4419, -122.1419), 3);
    map.addControl(new GMapTypeControl());
    map.addControl(new GLargeMapControl());
<?php
//populate map with results
foreach ($results as $result) {
  list($state, $mtn, $lat, $lng) = $result;
  //json_encode() quotes and escapes the name, so an apostrophe cannot break the script
  echo "    map.addOverlay(new GMarker(new GLatLng($lat, $lng), {title: " . json_encode($mtn) . "}));\n";
}
?>
  }
}
</script>
</head>
<body onload="initialize()" onunload="GUnload()">
<div id="map_canvas" style="width: 100%; height: 100%"></div>
</body>
</html>

* Using substr() to chop off “http://dbpedia.org/resource/” from the names is probably cheating. I think you’re supposed to ask for each resource’s rdfs:label and keep the English one (the literal tagged @en) instead.
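
For anyone who wants to try the label route, the extra graph pattern might look roughly like the sketch below. Python is used here only to assemble the query text. label_pattern is a hypothetical helper name (it is not part of ARC2 or DBpedia), and the sketch assumes the resources carry language-tagged rdfs:label values and that the rdfs prefix is declared in the query.

```python
# Hypothetical helper: builds the graph pattern that fetches a resource's
# label and keeps only the literal tagged with the given language.
def label_pattern(resource_var, label_var, language="en"):
    return (
        f"{resource_var} rdfs:label {label_var} .\n"
        f'FILTER ( lang({label_var}) = "{language}" )'
    )

# Assumes the query also declares:
# PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
mtn_label = label_pattern("?mtn", "?mtnName")
print(mtn_label)
```

The two printed lines would go inside the WHERE block, with ?mtnName added to the SELECT list in place of the chopped-up URI.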

Anatomy of a SPARQL Query Part 1 – Select

Sunday, February 1st, 2009

...in which I try to describe a few aspects of the SPARQL query language in plain(er) English.

SPARQL is the query language for the semantic web and is a W3C recommendation (that is, practically a standard). To issue a query, you need an endpoint. All my examples use the DBpedia endpoint at http://dbpedia.org/sparql. Open that page in a new window; you should be able to copy and paste any query from this post into it and play around with the results.
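
Pasting into a web form is not the only way in: under the SPARQL protocol, a query can also travel to the endpoint as a URL parameter named query. Here is a minimal Python sketch that only builds such a URL and sends nothing. query_url is a made-up helper name, and the sample query is just a placeholder.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

def query_url(query, endpoint=ENDPOINT):
    """Return the GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

sample = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"
url = query_url(sample)

# Decoding the URL again gives back the original query text.
round_trip = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

How the results come back (XML, JSON and so on) depends on the endpoint and on the Accept header of the request, which this sketch leaves out.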

The basic thing you can do with SPARQL is SELECT. Let’s start by getting a list of US States:

SELECT ?state WHERE {
  ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> .
}

Now, it is important to note that the choice of variable names is arbitrary. You could have chosen ?s or even ?turnip just as easily. This query will give you the exact same results:

SELECT ?turnip WHERE {
  ?turnip skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> .
}

Okay, well that’s pretty sweet. So let’s learn something interesting about the nifty-fifty – how about state capitals?

SELECT ?state ?capital WHERE {
  ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> .
  ?state dbpedia2:capital ?capital
}

So that works. But WTF are skos and dbpedia2? They are namespace prefixes. The short story is that to be useful in a SPARQL query, everything needs a unique identifier. URLs (like a web address) provide just that. If two things have the same URL, they are exactly the same thing. Unfortunately, URLs are often quite long. To save typing and space on the screen, skos, dbpedia2 and rdf are all abbreviations. When you see skos, it actually means http://www.w3.org/2004/02/skos/core# and when you see skos:subject it means http://www.w3.org/2004/02/skos/core#subject. Likewise, dbpedia2 stands for http://dbpedia.org/property/.
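
The expansion is plain string gluing, which a few lines of Python can mimic. This is only a sketch of the idea: the PREFIXES table and the expand function are illustrative names, not anything the endpoint exposes.

```python
# Prefix table using the two namespaces discussed above.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dbpedia2": "http://dbpedia.org/property/",
}

def expand(prefixed_name, prefixes=PREFIXES):
    """Turn 'prefix:local' into the full URL it abbreviates."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local

print(expand("skos:subject"))
print(expand("dbpedia2:capital"))
```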

Namespaces – well that’s all very interesting, but how do you know that the property is called dbpedia2:capital instead of, say, dbpedia2:stateCapital? Happily, there is a query that gives a list of all the properties for a particular subject. In this case, the subject is Alaska:

SELECT ?prop WHERE {
  <http://dbpedia.org/resource/Alaska> ?prop ?obj
}
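
One way to picture what this query does: treat the data as a list of three-part statements and the query as a template with some slots left blank. The Python sketch below works on a tiny hand-made list rather than real DBpedia data. The short names stand in for full URLs, and match is a hypothetical helper.

```python
# Toy statements: (subject, predicate, object).
TRIPLES = [
    ("Alaska", "capital", "Juneau"),
    ("Alaska", "flower", "Forget-me-not"),
    ("Texas", "capital", "Austin"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every slot that is not None."""
    pattern = (subject, predicate, obj)
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]

# Fixed subject, open predicate and object: the properties of Alaska.
alaska_props = [predicate for _, predicate, _ in match(TRIPLES, subject="Alaska")]
# Fixed predicate instead: every capital statement.
capitals = match(TRIPLES, predicate="capital")
print(alaska_props, capitals)
```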

Of course, you don’t have to provide the subject. You could provide the predicate or the object instead. Now, on with the show. Let’s look at the State Flower:

SELECT ?state ?flower WHERE {
  ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> .
  ?state dbpedia2:flower ?flower
}

Uh oh – there are a few states missing! Awesome as it is, DBpedia doesn’t know everything. In this case, some states are missing the dbpedia2:flower property. Fortunately, there’s the OPTIONAL keyword:

SELECT ?state ?flower WHERE {
  ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> .
  OPTIONAL { ?state dbpedia2:flower ?flower }
}
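
The difference between the two flower queries can be imitated in Python with toy data. In the sketch below, Ohio's flower is left out on purpose to stand in for a missing property, and required and optional are illustrative names only.

```python
STATES = ["Alaska", "Texas", "Ohio"]
# Toy data: Ohio is deliberately missing.
FLOWERS = {"Alaska": "Forget-me-not", "Texas": "Bluebonnet"}

def required(states, flowers):
    """Like the plain pattern: a state without a flower drops out."""
    return [(s, flowers[s]) for s in states if s in flowers]

def optional(states, flowers):
    """Like OPTIONAL: every state stays, the flower may be empty (None)."""
    return [(s, flowers.get(s)) for s in states]

print(required(STATES, FLOWERS))
print(optional(STATES, FLOWERS))
```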

Okay, that’s better. We don’t have 50 flowers, but at least we still have all the states. So which state has the biggest population? The ORDER BY clause can help:

SELECT ?state ?pop WHERE {
  ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> .
  ?state dbpedia2:poprank ?pop
}
ORDER BY ?pop

Go ahead and try it. It works, but not exactly, right? The problem is that DBpedia’s dbpedia2:poprank property is not always an integer value. What does that mean? It means that sometimes a “1” isn’t a plain “1”; sometimes it is a “1”^^dbpedia:units/Rank, a literal tagged with a custom datatype. So what can you do about it? In this case, you can just cast it to an integer. This query will give you the states in order of population:

SELECT ?state ?pop WHERE {
  ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> .
  ?state dbpedia2:poprank ?pop
}
ORDER BY xsd:int(?pop)
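
Exactly how an endpoint orders a mix of differently typed literals is up to the implementation, so take this only as an analogy: in Python, the same values sort differently as text than as numbers, and converting them first is what restores the expected order.

```python
ranks = ["10", "2", "1", "33"]

as_text = sorted(ranks)              # compares character by character
as_numbers = sorted(ranks, key=int)  # converts each value before comparing

print(as_text)     # ['1', '10', '2', '33']
print(as_numbers)  # ['1', '2', '10', '33']
```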

So that’s useful and there are a number of conversion functions to do similar things. Now here’s something a little different – this query gets the latitude and longitude of the highest point in every state. With a little bit of scripting, you could plot these on a map:

PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?state ?mtn ?lat ?long WHERE {
  ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> .
  ?state dbpedia2:highestpoint ?mtn .
  ?mtn geo:lat ?lat .
  ?mtn geo:long ?long
}

Here it was necessary to declare the geo namespace using the PREFIX keyword. The DBpedia SPARQL endpoint already declares a bunch of namespaces for your convenience, but geo is not among them, so we need to add it in. It is worth noting that the name geo is arbitrary – you could have called it fred. Namespaces in widespread use have developed conventional prefixes that should be used to prevent confusion.
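
When queries are sent from a script rather than typed into DBpedia's page, it can be handy to add the declarations automatically. A minimal Python sketch follows. with_prefixes is a hypothetical helper and the query body is shortened to a single pattern.

```python
def with_prefixes(query, prefixes):
    """Put one PREFIX line per namespace in front of the query text."""
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    return "\n".join(lines + [query])

body = "SELECT ?mtn ?lat WHERE { ?mtn geo:lat ?lat }"
full = with_prefixes(body, {"geo": "http://www.w3.org/2003/01/geo/wgs84_pos#"})
print(full)
```

Since the prefix name is arbitrary, passing "fred" as the key would work just as well, as long as the query body then says fred:lat.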

SPARQL also supports a limited set of comparison operations using the FILTER keyword. The query below lists states that joined after January 1st, 1850:

SELECT ?state ?date WHERE {
  ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> .
  ?state dbpedia2:admittancedate ?date
  FILTER ( ?date > "1850-01-01T00:00:00Z"^^xsd:dateTime )
}

The recommendation lists a number of comparison operators (http://www.w3.org/TR/rdf-sparql-query/#SparqlOps), though in practice not all of them may be available in a particular implementation.
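
The date filter above boils down to comparing typed dates rather than strings. Here is the same test as a Python sketch over three hand-entered admission dates (toy data typed in by hand, not fetched from DBpedia):

```python
from datetime import datetime, timezone

# Toy data: admission dates for three states.
ADMITTED = {
    "Texas": datetime(1845, 12, 29, tzinfo=timezone.utc),
    "California": datetime(1850, 9, 9, tzinfo=timezone.utc),
    "Alaska": datetime(1959, 1, 3, tzinfo=timezone.utc),
}

# Same cutoff as "1850-01-01T00:00:00Z"^^xsd:dateTime in the query.
cutoff = datetime(1850, 1, 1, tzinfo=timezone.utc)

joined_later = sorted(s for s, d in ADMITTED.items() if d > cutoff)
print(joined_later)  # ['Alaska', 'California']
```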

I’ll be back with Part 2 soon. As always, I hope this will be useful, so comments/feedback are always welcome – especially from people who can correct any mistakes printed here.

* Update: Here’s Part 2 – SPARQLing The Highest Point in Every US State