Using GROUP BY, COUNT and SAMPLE in Apache Jena SPARQL

MassStrike · May 1, 2013

So I have an RDF/OWL file that contains many "groups". Each group has a "name" and contains a number of "elements". I need to select the name of every group, along with the number of elements in each. Here is a sample group from the file...

<Group rdf:ID="group_actinoid">
    <name rdf:datatype="&xsd;string">Actinoid</name>
    <element rdf:resource="#Ac"/>
    <element rdf:resource="#Th"/>
    <element rdf:resource="#Pa"/>
    <element rdf:resource="#U"/>
    <element rdf:resource="#Np"/>
    <element rdf:resource="#Pu"/>
    <element rdf:resource="#Am"/>
    <element rdf:resource="#Cm"/>
    <element rdf:resource="#Bk"/>
    <element rdf:resource="#Cf"/>
    <element rdf:resource="#Es"/>
    <element rdf:resource="#Fm"/>
    <element rdf:resource="#Md"/>
    <element rdf:resource="#No"/>
</Group>

...and here's the query I've been trying to get to work...

PREFIX pt:<http://www.daml.org/2003/01/periodictable/PeriodicTable#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>

SELECT (SAMPLE(?name) AS ?NAME) (COUNT(?elem) AS ?ELEMENTCOUNT)
WHERE {
        ?group rdf:type pt:Group .
        ?group pt:name ?name .
        ?elem pt:element ?group .
}
GROUP BY ?group

...but I'm getting an empty result and I'm not sure why. I should be getting a group name, along with the number of elements that group contains, for every group in the OWL file.

Answer

Joshua Taylor · May 1, 2013

It's much easier to answer these kinds of questions if a minimal working example is provided (e.g., a complete RDF dataset that we can query over). For instance, since we don't know the XML base or the namespace declarations of the document above, we can't tell whether the individual described by <Group rdf:ID="group_actinoid">...</Group> will actually match the pattern ?group rdf:type pt:Group.
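To make the base/namespace point concrete, here is a small Python sketch using only the standard library. It assumes the RDF/XML rule that rdf:ID="x" is resolved like rdf:about="#x" against the in-scope base, and that a typed node element such as <Group> contributes an rdf:type equal to its namespace IRI plus its local name. The bases and the example.org namespace are hypothetical; the question doesn't tell us the real ones.

```python
from urllib.parse import urljoin

# Namespace bound to pt: in the question's query.
PT = "http://www.daml.org/2003/01/periodictable/PeriodicTable#"


def resolve_rdf_id(base: str, rdf_id: str) -> str:
    """IRI denoted by rdf:ID="<rdf_id>" under the given base IRI.

    rdf:ID="x" is treated like rdf:about="#x": the fragment is resolved
    against the in-scope base (xml:base, else the document's own IRI).
    """
    return urljoin(base, "#" + rdf_id)


def node_element_type(namespace: str, local_name: str) -> str:
    """rdf:type contributed by a typed node element such as <Group>."""
    return namespace + local_name


# Hypothetical bases: the same markup names a different resource under each.
print(resolve_rdf_id("http://www.daml.org/2003/01/periodictable/PeriodicTable.owl",
                     "group_actinoid"))
print(resolve_rdf_id("http://example.org/my/groups.owl", "group_actinoid"))

# ?group rdf:type pt:Group only matches if <Group> is in the pt: namespace.
print(node_element_type(PT, "Group") == PT + "Group")                           # True
print(node_element_type("http://example.org/my/ns#", "Group") == PT + "Group")  # False
```

The subject IRI changes with the base, but since ?group is a variable it will match either way; whether the type pattern matches depends on the namespace that <Group> is declared in.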

Here's some data based on yours (in N3/Turtle syntax), with a second group added so that we can see the grouping and aggregation at work:

@prefix pt: <http://www.daml.org/2003/01/periodictable/PeriodicTable#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

pt:actinoid
  a pt:Group ;
  pt:name "Actinoid" ;
  pt:element pt:Ac ;
  pt:element pt:Th ;
  pt:element pt:Pa ;
  pt:element pt:U ;
  pt:element pt:Np ;
  pt:element pt:Pu ;
  pt:element pt:Am ;
  pt:element pt:Cm ;
  pt:element pt:Bk ;
  pt:element pt:Cf ;
  pt:element pt:Es ;
  pt:element pt:Fm ;
  pt:element pt:Md ;
  pt:element pt:No .

pt:beatles
  a pt:Group ;
  pt:name "Beatles" ;
  pt:element pt:John ;
  pt:element pt:Paul ;
  pt:element pt:George ;
  pt:element pt:Ringo .

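Here's a SPARQL query that is very similar to yours. I used some of the shorter forms where possible (a for rdf:type, and ; to repeat the subject), and I corrected the swapped pattern ?elem pt:element ?group to ?group pt:element ?element. In the data, each group is the subject of its pt:element triples and the elements are the objects, so the original pattern matches no triples; since every pattern in the WHERE block has to match, the whole query returns an empty result. With the corrected query, we get the kinds of results it sounds like you're looking for.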

PREFIX pt:<http://www.daml.org/2003/01/periodictable/PeriodicTable#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
SELECT (SAMPLE(?name) AS ?NAME) (COUNT(?element) as ?NELEMENTS)
WHERE {
  ?group a pt:Group ;
         pt:name ?name ;
         pt:element ?element .
}
GROUP BY ?group 
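Why the original pattern returns nothing can be sketched by evaluating both versions of the WHERE clause over a handful of the triples above. This is a toy Python illustration, not how ARQ works internally: terms are plain strings (prefixes are left unexpanded and literals aren't distinguished from IRIs), and DATA is a hypothetical excerpt of the example data.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical excerpt of the example data; IRIs kept in prefixed form.
DATA: List[Triple] = [
    ("pt:actinoid", "rdf:type", "pt:Group"),
    ("pt:actinoid", "pt:name", "Actinoid"),
    ("pt:actinoid", "pt:element", "pt:Ac"),
    ("pt:actinoid", "pt:element", "pt:Th"),
    ("pt:beatles", "rdf:type", "pt:Group"),
    ("pt:beatles", "pt:name", "Beatles"),
    ("pt:beatles", "pt:element", "pt:John"),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None  # constant in the pattern doesn't match
    return extended


def evaluate(bgp: List[Triple], data: List[Triple]) -> List[Binding]:
    """Join the triple patterns one at a time, like a naive BGP evaluator."""
    solutions: List[Binding] = [{}]
    for pattern in bgp:
        next_solutions = []
        for binding in solutions:
            for triple in data:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# The question's WHERE clause: ?elem is in subject position.
swapped = [("?group", "rdf:type", "pt:Group"),
           ("?group", "pt:name", "?name"),
           ("?elem", "pt:element", "?group")]
# The corrected WHERE clause: the group is the subject of pt:element.
fixed = [("?group", "rdf:type", "pt:Group"),
         ("?group", "pt:name", "?name"),
         ("?group", "pt:element", "?element")]

print(len(evaluate(swapped, DATA)))  # 0: no group is ever the object of pt:element
print(len(evaluate(fixed, DATA)))    # 3: two actinoids + one Beatle in this excerpt
```

The first two patterns bind ?group exactly the same way in both versions; only the third pattern's direction decides whether any solutions survive the join.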

With the data saved as groups.n3 and the query as groups.sparql, here are the results produced by Apache Jena's command-line ARQ tool:

$ /usr/local/lib/apache-jena-2.10.0/bin/arq  --data groups.n3 --query groups.sparql
--------------------------
| NAME       | NELEMENTS |
==========================
| "Beatles"  | 4         |
| "Actinoid" | 14        |
--------------------------
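The aggregation step can be sketched the same way. Assuming the corrected WHERE clause has already produced one solution per (group, name, element), the toy Python below partitions those solutions by ?group, counts the bound ?element values, and picks one ?name per group, which is roughly what GROUP BY ?group with COUNT(?element) and SAMPLE(?name) does. SAMPLE is needed because ?name isn't a grouping key, so it can't be projected on its own (GROUP BY ?group ?name would be an alternative). ROWS and group_count_sample are hypothetical names for this illustration.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, str]

ACTINOIDS = ["Ac", "Th", "Pa", "U", "Np", "Pu", "Am",
             "Cm", "Bk", "Cf", "Es", "Fm", "Md", "No"]
BEATLES = ["John", "Paul", "George", "Ringo"]

# Assumed output of the corrected WHERE clause over the example data:
# one solution per (group, name, element).
ROWS: List[Row] = (
    [{"?group": "pt:actinoid", "?name": "Actinoid", "?element": "pt:" + e} for e in ACTINOIDS]
    + [{"?group": "pt:beatles", "?name": "Beatles", "?element": "pt:" + e} for e in BEATLES]
)


def group_count_sample(rows: List[Row]) -> List[Dict[str, object]]:
    """Rough analogue of
    SELECT (SAMPLE(?name) AS ?NAME) (COUNT(?element) AS ?NELEMENTS) ... GROUP BY ?group
    """
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row["?group"]].append(row)  # GROUP BY ?group

    results: List[Dict[str, object]] = []
    for members in groups.values():
        results.append({
            # SAMPLE may return any value in the group; this sketch takes the first.
            "NAME": members[0]["?name"],
            # COUNT(?element) counts solutions in the group where ?element is bound.
            "NELEMENTS": sum(1 for m in members if "?element" in m),
        })
    return results


for result in group_count_sample(ROWS):
    print(result)
```

This prints 14 for Actinoid and 4 for Beatles, matching the ARQ output. SPARQL doesn't define an order for the groups unless you add ORDER BY, so ARQ listing Beatles first and this sketch listing Actinoid first are both fine.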

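When I run the same query on the data at http://www.daml.org/2003/01/periodictable/PeriodicTable.owl (after downloading it and saving it as PeriodicTable.owl), I get the following names and counts. The names show up as typed xsd:string literals because the OWL file declares them with rdf:datatype="&xsd;string":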

$ /usr/local/lib/apache-jena-2.10.0/bin/arq \
      --data ~/Downloads/PeriodicTable.owl \
      --query groups.sparql
--------------------------------------------------
| NAME                               | NELEMENTS |
==================================================
| "Lanthanoid"^^xsd:string           | 14        |
| "Noble gas"^^xsd:string            | 7         |
| "Halogen"^^xsd:string              | 6         |
| "Actinoid"^^xsd:string             | 14        |
| "Chalcogen"^^xsd:string            | 6         |
| "Pnictogen"^^xsd:string            | 6         |
| "Coinage metal"^^xsd:string        | 4         |
| "Alkali metal"^^xsd:string         | 7         |
| "Alkaline earth metal"^^xsd:string | 6         |
--------------------------------------------------