Anatomy of a SPARQL Query Part 1 – Select
by craiget | February 1st, 2009
...in which I try to describe a few aspects of the SPARQL query language in plain(er) English.
SPARQL is the query language for the semantic web and is a W3C recommendation (that is, practically a standard). To issue a query, you need an endpoint. All my examples will be using the DBPedia endpoint. Open that page in a new window. You should be able to copy-paste any query on this page into DBpedia and play around with the results.
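If you would rather script your queries than paste them into a web form, here is a minimal Java sketch of how a query is usually turned into a request URL. It assumes the endpoint lives at http://dbpedia.org/sparql and accepts the query text in a `query` parameter, as the SPARQL protocol describes; the class and method names are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SparqlUrl {
    // Assumption: the public DBpedia endpoint address.
    static final String ENDPOINT = "http://dbpedia.org/sparql";

    // Append the URL-encoded query text as the "query" parameter.
    static String build(String endpoint, String query) {
        return endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = "SELECT ?state WHERE { ?state skos:subject "
            + "<http://dbpedia.org/resource/Category:States_of_the_United_States> . }";
        // Print the URL; fetching it is left to whatever HTTP client you prefer.
        System.out.println(build(ENDPOINT, query));
    }
}
```

This only builds the URL. Whether the endpoint predeclares the prefixes used in the query depends on the endpoint.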
The basic thing you can do with SPARQL is SELECT. Let’s start by getting a list of US States:
SELECT ?state WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . }
Now, it is important to note that the choice of variable names is arbitrary. You could have chosen ?s or even ?turnip just as easily. This query will give you the exact same results:
SELECT ?turnip WHERE { ?turnip skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . }
Okay, well that’s pretty sweet. So let’s learn something interesting about the nifty-fifty – how about state capitals?
SELECT ?state ?capital WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . ?state dbpedia2:capital ?capital }
So that works. But WTF are skos and dbpedia2? They are called namespaces. The short story is that to be useful in a SPARQL query, everything needs a unique identifier. URLs (like a web address) provide just that. If two things have the same URL, they are exactly the same. Unfortunately, URLs are often quite long. To save typing and space on the screen, skos and dbpedia2 and rdf are all abbreviations. When you see skos, it actually means http://www.w3.org/2004/02/skos/core# and when you see skos:subject it means http://www.w3.org/2004/02/skos/core#subject.
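To make the abbreviation idea concrete, here is a small Java sketch of what "expanding" a prefixed name amounts to: look up the part before the colon and glue the rest onto the namespace. The class is hypothetical and only illustrates the substitution; a real SPARQL engine does this for you.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixExpander {
    private final Map<String, String> prefixes = new HashMap<>();

    // Register an abbreviation, e.g. "skos" for the SKOS core namespace.
    void declare(String prefix, String namespace) {
        prefixes.put(prefix, namespace);
    }

    // Turn "skos:subject" into the full identifier; unknown prefixes are returned unchanged.
    String expand(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return name;
        }
        String namespace = prefixes.get(name.substring(0, colon));
        if (namespace == null) {
            return name;
        }
        return namespace + name.substring(colon + 1);
    }

    public static void main(String[] args) {
        PrefixExpander expander = new PrefixExpander();
        expander.declare("skos", "http://www.w3.org/2004/02/skos/core#");
        System.out.println(expander.expand("skos:subject"));
    }
}
```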
Namespaces – well that’s all very interesting, but how do you know that the property is called dbpedia2:capital instead of, say, dbpedia2:stateCapital? Happily, there is a query that gives a list of all the properties for a particular subject. In this case, the subject is Alaska:
SELECT ?prop WHERE { <http://dbpedia.org/resource/Alaska> ?prop ?obj }
Of course, you don’t have to provide the subject. You could provide the predicate or the object instead. Now, on with the show. Let’s look at the State Flower:
SELECT ?state ?flower WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . ?state dbpedia2:flower ?flower }
Uh oh – there are a few states missing! Awesome as it is, DBPedia doesn’t know everything. In this case, some states are missing the dbpedia2:flower property. Fortunately, there’s the OPTIONAL keyword:
SELECT ?state ?flower WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . OPTIONAL { ?state dbpedia2:flower ?flower } }
Okay, that’s better. We don’t have 50 flowers, but at least we still have all the states. So which state has the biggest population? The ORDER BY clause can help:
SELECT ?state ?pop WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . ?state dbpedia2:poprank ?pop } ORDER BY ?pop
Go ahead and try it. It works, but not exactly, right? The problem is that DBPedia’s dbpedia2:poprank property is not always an integer value. What does that mean? It means that sometimes a “1” isn’t a “1”, sometimes it is a “1”^^dbpedia:units/Rank. So what can you do about it? In this case, you can just cast it to an integer. This query will give you the states in order of population:
SELECT ?state ?pop WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . ?state dbpedia2:poprank ?pop } ORDER BY xsd:int(?pop)
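If you ever need to do the same clean-up on the client side, the idea can be sketched in Java: pull the text between the quotes out of each literal, parse it as an integer, and sort on that. This is only an illustration of the cast, not how the endpoint implements it, and the datatype URI in the example is a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class RankSort {
    // Extract the lexical form (the text between the first pair of quotes) and parse it.
    static int lexicalInt(String literal) {
        int start = literal.indexOf('"');
        int end = literal.indexOf('"', start + 1);
        return Integer.parseInt(literal.substring(start + 1, end));
    }

    // Return a copy sorted by numeric value, ignoring whatever datatype is attached.
    static List<String> sortByRank(List<String> literals) {
        List<String> copy = new ArrayList<>(literals);
        copy.sort(Comparator.comparingInt(RankSort::lexicalInt));
        return copy;
    }

    public static void main(String[] args) {
        // Mixed plain and typed literals; the datatype URI is made up for the example.
        List<String> ranks = Arrays.asList(
            "\"10\"", "\"2\"^^<http://example.org/Rank>", "\"1\"");
        System.out.println(sortByRank(ranks));
    }
}
```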
So that’s useful and there are a number of conversion functions to do similar things. Now here’s something a little different – this query gets the latitude and longitude of the highest point in every state. With a little bit of scripting, you could plot these on a map:
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> SELECT ?state ?mtn ?lat ?long WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . ?state dbpedia2:highestpoint ?mtn . ?mtn geo:lat ?lat . ?mtn geo:long ?long }
Here it was necessary to declare the geo namespace using the PREFIX keyword. The DBPedia SPARQL endpoint already declares a bunch of namespaces for your convenience, but geo is not among them so we need to add it in. It is worth noting that the name geo is arbitrary – you could have called it fred. Namespaces in widespread use have developed conventional prefixes that should be used to prevent confusion.
SPARQL also supports a limited set of comparison operations using the FILTER keyword. The query below lists states that joined after January 1st, 1850:
SELECT ?state ?date WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . ?state dbpedia2:admittancedate ?date FILTER ( ?date > "1850-01-01T00:00:00Z"^^xsd:dateTime ) }
The recommendation lists a number of comparisons (http://www.w3.org/TR/rdf-sparql-query/#SparqlOps), though in practice these may not be 100% available in a particular implementation.
I’ll be back with Part 2 soon. As always, I hope this will be useful, so comments/feedback are always welcome – especially from people who can correct any mistakes printed here.
* Update: Here’s Part 2 – SPARQLing The Highest Point in Every US State
Java Applet Instrumentation
by craiget | January 22nd, 2009
Java 6 has a neat feature that has received little attention: the Attach API
Using this API, you can hook your own code into a process already running on the JVM. It’s meant to build code-profilers and things of that sort, but with a little fooling around, you can attach to a running Applet and play with the live objects. I’ll describe my method – if you know a better way to do this, please let me know.
First, you need to turn off Applet Security. I’m lazy and went with a shotgun approach. Create a new .policy file:
grant { permission java.security.AllPermission; };
Then, set the browser plugin to use this policy. On Windows, try this:
- Open the Control Panel
- [doubleclick] Java Control Panel
- [click] Java tab
- [click] Java Applet Runtime Settings
- [click] View
- [click] Java Runtime Parameters
- [type] -Djava.security.policy=C:\path_to\your.policy
- [click] OK
Same thing for Linux, just figure out where your Applet Runtime settings are.
Now you need a way to run your Agent. Annoyingly, the Agent must live in a jar file. You can build it on the command line using the jar tool, but I chose to do it programmatically to avoid the extra step. This program takes a single command line argument, the PID (process id) of the JVM the applet is running on. There are different ways to find it. On Windows, try running “tasklist”. On Linux, “ps -A”. Place this file at “com/stuff/Runner.java”:

    package com.stuff;

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    import com.sun.tools.attach.VirtualMachine;

    public class Runner {
        public static void main(String[] args) throws Exception {
            // create .jar including all .class files under directory: com/stuff/
            List<String> jarFiles = new ArrayList<String>();
            for (String file : new File("com/stuff").list()) {
                if (file.endsWith(".class")) {
                    jarFiles.add("com/stuff/" + file);
                }
            }
            String[] filenames = jarFiles.toArray(new String[0]);
            String jarFile = System.getProperty("user.dir") + "/agent.jar";
            // JarUtil is a separate helper class that is not shown here
            JarUtil.jar(filenames, jarFile);

            // find PID of process to monitor
            String pid = args[0];

            // attach agent.jar
            if (Integer.parseInt(pid) > 0) {
                VirtualMachine vm = VirtualMachine.attach(pid);
                vm.loadAgent(jarFile, null);
            } else {
                System.out.println("Bad PID: " + pid);
            }
        }
    }
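The JarUtil helper is not shown in this post, so here is one way such a helper might look. Treat it as a hypothetical stand-in rather than the original: it writes a jar whose manifest carries the Agent-Class attribute, which the JVM needs in order to find agentmain when an agent is loaded into a running process. The class and method names are my own.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class AgentJar {
    // Write a jar with one entry and a manifest naming the agent class.
    static void write(File jar, String agentClass, String entryName, byte[] classBytes) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Without Manifest-Version the manifest is written out empty.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(new Attributes.Name("Agent-Class"), agentClass);
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), manifest)) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(classBytes);
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the Agent-Class attribute back, handy for checking the jar before attaching.
    static String agentClassOf(File jar) {
        try (JarFile jarFile = new JarFile(jar)) {
            return jarFile.getManifest().getMainAttributes().getValue("Agent-Class");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File jar = new File(System.getProperty("java.io.tmpdir"), "agent-demo.jar");
        // Placeholder bytes; a real agent jar would contain the compiled .class files.
        write(jar, "com.stuff.MyAgent", "com/stuff/MyAgent.class", new byte[] {0});
        System.out.println(agentClassOf(jar));
    }
}
```

In a real run you would feed it the bytes of every .class file under com/stuff/ instead of the placeholder.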
Rad, so now for the Agent itself. When the JVM calls our Agent, it
passes an instance of Instrumentation, which has the interesting method
getAllLoadedClasses(). A Class isn’t very useful unless there are
static methods to get instances of the Class. Fortunately,
AppletPanelCache.getAppletPanels() comes to the rescue as a way to get
at Objects instead of Classes. Here’s MyAgent.java:
    package com.stuff;

    import java.lang.instrument.Instrumentation;
    import java.lang.reflect.Method;

    public class MyAgent {
        // Reflection calls throw checked exceptions, so let them propagate
        public static void agentmain(String agentArgs, Instrumentation inst) throws Exception {
            for (Class<?> klass : inst.getAllLoadedClasses()) {
                if (klass.getName().endsWith("AppletPanelCache")) {
                    Method m = klass.getMethod("getAppletPanels", new Class[]{});
                    Object[] panels = (Object[]) m.invoke(null, new Object[]{});
                    for (Object panel : panels) {
                        // do something interesting with an instance of Panel
                    }
                }
            }
        }
    }
What can you do with a Panel? Well, let’s see, for an instance to be of any use in an Applet, it is likely connected *somehow* to the rest of those top level Panels where everything is shown. With a heavy dose of Reflection, you can recursively explore each Panel’s children, ultimately getting a reference to most (is it most or all? does anyone know?) of the live objects. Once you have a live object, you can do whatever you want – call methods, inspect fields, etc. The methods getComponents() and getWindows() are a good place to start.
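The recursive exploration can be sketched with nothing more than getComponents(). This is a simplified version that only follows the AWT parent/child links; it does not dig through arbitrary fields with Reflection, so it will miss objects that are not part of the component tree. The class name is made up.

```java
import java.awt.Component;
import java.awt.Container;
import java.util.ArrayList;
import java.util.List;

class ComponentWalker {
    // Depth-first walk: record the component, then descend into its children if it has any.
    static void collect(Component component, List<Component> found) {
        found.add(component);
        if (component instanceof Container) {
            for (Component child : ((Container) component).getComponents()) {
                collect(child, found);
            }
        }
    }

    public static void main(String[] args) {
        // A tiny stand-in tree; in the Agent you would start from each applet panel.
        Container root = new Container();
        Container inner = new Container();
        inner.add(new Container());
        root.add(inner);

        List<Component> found = new ArrayList<>();
        collect(root, found);
        System.out.println(found.size() + " components found");
    }
}
```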
So, how to run it? Depends where you have Java installed. Make sure tools.jar is on your classpath. I use this invocation:
- compile javac -classpath /your/path/to/tools.jar:. com/stuff/*.java
- run (first, open an applet in your web browser and find the PID of its JVM)
- java -Djava.security.policy=/home/path_to/your.policy -classpath /your/path/to/tools.jar:. com.stuff.Runner <pid>
Annoyances – When an Agent calls System.out.println(), it gets the Applet’s PrintStream instead of printing to the Console. My solution? Well, Runner *is* still connected to the Console, so have Agent send all its output to Runner over a socket. There may be a better way. Also, I’ve been unable to attach twice without shutting down and restarting Firefox – not sure exactly why.
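The socket workaround might look roughly like this. It is a sketch, not the code I actually used: the Runner side listens on a local port and prints whatever arrives, and the Agent side writes its lines to that port instead of System.out. All the names here are invented for the example, and both sides run in one process purely so the demo is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class OutputRelay {
    // Agent side: connect to the Runner and write each line to the socket.
    static void send(int port, String... lines) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            for (String line : lines) {
                out.println(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runner side: accept one connection and read until the agent closes it.
    static List<String> receive(ServerSocket server) {
        List<String> lines = new ArrayList<>();
        try (Socket socket = server.accept();
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Demo wiring: play both roles in one process on an ephemeral local port.
    static List<String> roundTrip(String... lines) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            int port = server.getLocalPort();
            Thread agent = new Thread(() -> send(port, lines));
            agent.start();
            List<String> received = receive(server);
            agent.join();
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : roundTrip("hello from the agent", "found 3 panels")) {
            System.out.println(line);
        }
    }
}
```

In the real setup, Runner would open the ServerSocket before calling loadAgent and pass the port number to the Agent, for example through the agent arguments string.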
One more neat thing you might try – hooking your own AWTEventListener into the AWT Event Processing pipeline:
    if (klass.getName().endsWith("Toolkit")) {
        long mask = Long.MAX_VALUE; // call our listener on ALL event types
        Method m = klass.getMethod("getDefaultToolkit", new Class[]{});
        Toolkit toolkit = (Toolkit) m.invoke(null, new Object[]{});
        Method n = klass.getMethod("addAWTEventListener", new Class[]{AWTEventListener.class, long.class});
        n.invoke(toolkit, new Object[]{new MyAWTEventListener(), mask});
    }
One cute idea – dispatch MouseEvent and KeyEvent to make a video game “play” itself.
Remember to reset your Java Applet Runtime Settings, if you are concerned about such things.
References: Hotpatching a Java 6 Application by Jack Shirazi
Hello Sharepoint, Meet PHP
by craiget | January 22nd, 2009
I am not a huge fan of Sharepoint, but we are starting to use it at work and it does do certain things well.
Trying to make Sharepoint do something non-obvious can be a lot of fun, especially on a slow day.
So I was trying to create a simple feedback form that would be stored by Sharepoint as a List. While this is well within Sharepoint’s capabilities, it ends up being kind of a pain. You end up either creating a custom WebPart or designing something in InfoPath or writing ASP code. None of these options really appeals to me, mostly because I don’t know much about Sharepoint and I don’t have privileges to deploy code on the server. Luckily most of the Sharepoint API is exposed through Web Services. Just make a PHP-backed webpage, produce the XML and fire it off to Sharepoint – no problem, right?
Unfortunately, this is easier said than done. SOAP is typically an all-or-nothing proposition. You don’t “kind of” get a response, you either get success or an opaque error message.
I spent the better part of a morning piecing this together from a zillion blog posts. There are two big tricks – getting the XML right and using NTLM authentication.
This code sample requires the NuSOAP library for PHP – you could also do it with PHP’s native SOAP I suppose. You will need to have a fairly recent version of CURL to get the NTLM authentication.
    <?php
    /* Requires the NuSOAP library */
    require_once('lib/nusoap.php');

    /* Your username and password, separated by a colon
       Domain may be optional, depending on your setup */
    $auth = "domain\\username:password";

    /* Location of the Lists.asmx file
       If the list is in a subsite, the subsite must be in the path */
    $wsdl = "http://domain.com/some-site/some-subsite/_vti_bin/Lists.asmx?WSDL";

    /* GUID of the list */
    $guid = "{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}";

    /* Setup NuSOAP
       Sharepoint requires NTLM Authorization
       You need a fairly recent version of CURL installed for this
       nusoap_client is NuSOAP's own class name; it avoids a clash with PHP's built-in SoapClient */
    $client = new nusoap_client($wsdl, true);
    $client->setCredentials("", "", "ntlm");
    $client->setCurlOption(CURLOPT_USERPWD, $auth);

    /* XML for the request, add extra Fields as necessary */
    $xml = '
    <UpdateListItems xmlns="http://schemas.microsoft.com/sharepoint/soap/">
      <listName>'.$guid.'</listName>
      <updates>
        <Batch>
          <Method ID="1" Cmd="New">
            <Field Name="Title">My Title</Field>
            <Field Name="ABC">My Value</Field>
          </Method>
        </Batch>
      </updates>
    </UpdateListItems>
    ';

    /* Invoke the Web Service */
    $result = $client->call('UpdateListItems', $xml);

    /* Check for Errors: a SOAP fault comes back in $result, other errors via getError() */
    if ($client->fault) {
        echo("<h2>Error</h2>" . htmlspecialchars(print_r($result, true), ENT_QUOTES));
    } elseif ($client->getError()) {
        echo("<h2>Error</h2>" . htmlspecialchars($client->getError(), ENT_QUOTES));
    }

    /* Debugging Info */
    echo("<h2>Request</h2>" . htmlspecialchars($client->request, ENT_QUOTES));
    echo("<h2>Response</h2>" . htmlspecialchars($client->response, ENT_QUOTES));
    echo("<h2>Debug</h2>" . htmlspecialchars($client->debug_str, ENT_QUOTES));

    unset($client);
    ?>
That’s pushing data into Sharepoint. You can also pull data using the GetListItems Web Service. The code is basically the same with different XML.
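For a rough idea of what the "different XML" looks like, here is a small sketch that assembles a minimal GetListItems body. It is written in Java only to keep it short and self-contained; the string it produces is what you would hand to the SOAP call in place of the UpdateListItems XML. I am assuming the simplest form of the request, with just the listName and rowLimit elements, and the helper name is invented.

```java
class GetListItemsXml {
    // Build a minimal request body: which list to read and how many rows to return.
    static String build(String listGuid, int rowLimit) {
        return "<GetListItems xmlns=\"http://schemas.microsoft.com/sharepoint/soap/\">"
            + "<listName>" + listGuid + "</listName>"
            + "<rowLimit>" + rowLimit + "</rowLimit>"
            + "</GetListItems>";
    }

    public static void main(String[] args) {
        // Placeholder GUID, same shape as in the example above.
        System.out.println(build("{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}", 10));
    }
}
```

Filtering and choosing columns need more elements in the request, which I have left out here.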
