I am developing a Java application that uses ARQ to execute SPARQL queries against a Fuseki endpoint backed by TDB.
The application needs a query that returns each person's place of birth together with one other person who was born in the same place.
To start, I wrote this SPARQL query, which returns each person's ID and place of birth:
prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?person_id ?place_of_birth
where {
?person_id fb:type.object.type fb:people.person .
?person_id fb:people.person.place_of_birth ?place_of_birth_id .
?place_of_birth_id fb:type.object.name ?place_of_birth .
FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 10
----------------------------------
| person_id | place_of_birth |
==================================
| fb:m.01vtj38 | "El Centro"@en |
| fb:m.01vsy7t | "Brixton"@en |
| fb:m.09prqv | "Pittsburgh"@en |
----------------------------------
After that, I added a subquery (https://jena.apache.org/documentation/query/sub-select.html) to find another person who was born in the same place, but I get more than one related person and I only need one.
prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?person_id ?place_of_birth ?other_person_id
where {
?person_id fb:type.object.type fb:people.person .
?person_id fb:people.person.place_of_birth ?place_of_birth_id .
?place_of_birth_id fb:type.object.name ?place_of_birth .
{
select ?other_person_id
where {
?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
}
}
FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 10
---------------------------------------------------
| person_id | place_of_birth | other_person_id |
===================================================
| fb:m.01vtj38 | "El Centro"@en | fb:m.01vtj38 |
| fb:m.01vtj38 | "El Centro"@en | fb:m.01vsy7t |
| fb:m.01vtj38 | "El Centro"@en | fb:m.09prqv |
---------------------------------------------------
I have tried adding LIMIT 1 to the subquery, but that does not seem to work (the query runs but never finishes):
prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?person_id ?place_of_birth ?other_person_id
where {
?person_id fb:type.object.type fb:people.person .
?person_id fb:people.person.place_of_birth ?place_of_birth_id .
?place_of_birth_id fb:type.object.name ?place_of_birth .
{
select ?other_person_id
where {
?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
}
LIMIT 1
}
FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 3
Is there a way to return only one result from the subquery, or is that not possible in SPARQL?
You can use limits in subqueries. Here's an example:
select ?x ?y where {
values ?x { 1 2 3 4 }
{
select ?y where {
values ?y { 5 6 7 8 }
}
limit 2
}
}
limit 5
---------
| x | y |
=========
| 1 | 5 |
| 1 | 6 |
| 2 | 5 |
| 2 | 6 |
| 3 | 5 |
---------
As you can see, you get two values from the subquery (5 and 6), and these are combined with the bindings from the outer query, from which we get five rows in total (because of the limit).
However, keep in mind that subqueries are evaluated innermost first, then outward. That means that in your query,
select ?person_id ?place_of_birth ?other_person_id
where {
?person_id fb:type.object.type fb:people.person .
?person_id fb:people.person.place_of_birth ?place_of_birth_id .
?place_of_birth_id fb:type.object.name ?place_of_birth .
{
select ?other_person_id
where {
?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
}
LIMIT 1
}
FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 3
you are finding one match for
?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
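and passing only the ?other_person_id binding out to the outer query. Because the subquery does not project ?place_of_birth_id, the ?place_of_birth_id inside it is a separate variable from the one in the outer query, and the rest of the outer query doesn't use ?other_person_id either. The subquery therefore picks one arbitrary person, and that same person is attached to every row of the outer query instead of being chosen per place of birth.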
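The scoping behaviour can be seen without any Freebase data. The following is a minimal sketch that uses VALUES blocks in place of real triples; the example.org prefix and the names :alice, :bob, :carol, :dave, :paris, and :rome are made up for illustration.

```sparql
prefix : <http://example.org/>

select ?person ?place ?other where {
  # Outer pattern: two people born in two different places.
  values (?person ?place) { (:alice :paris) (:bob :rome) }
  {
    # Only ?other is projected, so the ?place used in here is not
    # the same variable as the ?place of the outer pattern.
    select ?other where {
      values (?place ?other) { (:paris :carol) (:rome :dave) }
    }
    limit 1
  }
}
```

Both result rows should carry the same ?other, whichever single value the LIMIT keeps, regardless of the outer ?place. That is the same effect as in the Freebase query: the inner LIMIT 1 applies once to the whole subquery, not once per place of birth.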
You said that the application needs a query that returns each person's place of birth and one other person who was born in the same place.
Conceptually, you could look at this as picking a person, finding their place of birth, and sampling one more person from the people born in that place. You can actually write the query like that, too:
select ?person_id ?place_of_birth (sample(?other_person_idx) as ?other_person_id)
where {
?person_id fb:type.object.type fb:people.person .
?person_id fb:people.person.place_of_birth ?place_of_birth_id .
?place_of_birth_id fb:type.object.name ?place_of_birth .
FILTER (langMatches(lang(?place_of_birth),"en"))
?place_of_birth_id fb:location.location.people_born_here ?other_person_idx .
filter ( ?other_person_idx != ?person_id )
}
group by ?person_id ?place_of_birth
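To try the same pattern on any endpoint, here is a small self-contained sketch that replaces the Freebase triples with VALUES data. The example.org prefix, the people, and the places are hypothetical, and ?candidate plays the role of ?other_person_idx.

```sparql
prefix : <http://example.org/>

select ?person ?place (sample(?candidate) as ?other) where {
  # Each person and their place of birth.
  values (?person ?place) {
    (:alice :paris) (:bob :paris) (:carol :paris)
    (:dave :rome) (:erin :rome)
  }
  # The same data again under another variable name; the two blocks
  # join on the shared ?place.
  values (?candidate ?place) {
    (:alice :paris) (:bob :paris) (:carol :paris)
    (:dave :rome) (:erin :rome)
  }
  # Do not pair a person with themselves.
  filter ( ?candidate != ?person )
}
group by ?person ?place
```

This should return one row per person, each with a single other person from the same place. Which candidate SAMPLE picks is up to the implementation, and a person who is the only one born in their place gets no row at all, because the filter removes their only candidate.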
This is a much trickier problem if you need more than one "other result" for each result. That's the problem in "Nested queries in sparql with limits". There's an approach in "How to limit SPARQL solution group size?" that can be used for this.