I can't describe my problem formally because my English is poor, so let me explain it with an example. The table below is already grouped by 'subject' and 'predicate'.
We define a set as all rows that have the same 'subject'. Now I want to merge any sets that contain exactly the same 'predicate's, sum the 'count' of each predicate across the merged sets, and count the number of distinct subjects that share each set.
| subject | predicate | count |
|---|---|---|
| s1 | p1 | 1 |
| s1 | p2 | 2 |
| s2 | p1 | 3 |
| s3 | p1 | 2 |
| s3 | p2 | 2 |
So what I want from this table is two sets:
{2, (p1, 3), (p2, 4)},
{1, (p1,3)}
where, in the first set, 2 means that two subjects (s1 and s3) have this set, and (p1, 3) is the sum of (s1, p1, 1) and (s3, p1, 2).
So how can I retrieve these sets and store them in Java?
Can I do it with SPARQL?
Or, if I first load these triples into Java, how can I compute these sets there?
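If the triples are loaded into Java first, one way to get these sets is two passes over the rows: group the rows by subject, then use each subject's predicate set as a map key so that subjects with identical predicate sets land in the same bucket. This is a minimal sketch assuming the rows are already in memory as (subject, predicate, count); the `Triple` and `MergedSet` types and the `merge` method are hypothetical names for illustration, not part of any library.

```java
import java.util.*;

/**
 * Sketch: merge subjects that have exactly the same predicate set.
 * Assumption: triples are already in memory as (subject, predicate, count) rows.
 * Triple, MergedSet and merge() are hypothetical names for illustration only.
 */
public class PredicateSets {

    public record Triple(String subject, String predicate, int count) {}

    /** One merged set: how many subjects share it, plus the summed count per predicate. */
    public static final class MergedSet {
        int subjects = 0;
        final SortedMap<String, Integer> sums = new TreeMap<>();

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("{" + subjects);
            sums.forEach((p, c) -> sb.append(", (").append(p).append(", ").append(c).append(")"));
            return sb.append("}").toString();
        }
    }

    public static Map<Set<String>, MergedSet> merge(List<Triple> triples) {
        // Pass 1: subject -> (predicate -> count).
        Map<String, Map<String, Integer>> bySubject = new LinkedHashMap<>();
        for (Triple t : triples) {
            bySubject.computeIfAbsent(t.subject(), s -> new TreeMap<>())
                     .merge(t.predicate(), t.count(), Integer::sum);
        }
        // Pass 2: key each subject by its predicate set (a Set, so order does not matter),
        // count the subjects per key and sum the counts per predicate.
        Map<Set<String>, MergedSet> result = new LinkedHashMap<>();
        for (Map<String, Integer> preds : bySubject.values()) {
            MergedSet set = result.computeIfAbsent(new TreeSet<>(preds.keySet()), k -> new MergedSet());
            set.subjects++;
            preds.forEach((p, c) -> set.sums.merge(p, c, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> triples = List.of(
            new Triple("s1", "p1", 1), new Triple("s1", "p2", 2),
            new Triple("s2", "p1", 3),
            new Triple("s3", "p1", 2), new Triple("s3", "p2", 2));
        merge(triples).values().forEach(System.out::println);
    }
}
```

For the example table this prints `{2, (p1, 3), (p2, 4)}` and `{1, (p1, 3)}`. Using a `Set` as the key means {p1, p2} and {p2, p1} are the same key, which avoids the ordering problems a concatenated string key can have.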
One solution might be to concatenate the predicates and counts:
SELECT ?propset (COUNT(?s) AS ?distinct)
       (group_concat(?count; separator = "\t") AS ?counts)
WHERE {
  SELECT ?s (group_concat(?p; separator = " ") AS ?propset)
            (group_concat(?c; separator = " ") AS ?count)
  WHERE { ?s ?p ?c }
  GROUP BY ?s
}
GROUP BY ?propset ORDER BY ?propset
The concatenated counts can then be split apart and summed. This works fine on a small dataset, but it is very time-consuming.
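A minimal Java sketch of the "split apart and sum" step for one result row of the query above, assuming `?propset` holds space-separated predicates and `?counts` holds each subject's space-separated counts, joined by tabs. It also assumes each subject's counts come out in the same order as its predicates; the SPARQL specification does not promise a particular GROUP_CONCAT order, so that should be checked on the triple store in use. `DecoupleCounts` and `sumCounts` are hypothetical names.

```java
import java.util.*;

/**
 * Sketch of the "decouple, then sum" step for one (?propset, ?counts) row.
 * Assumption: within each subject, counts are in the same order as the
 * predicates in ?propset (not guaranteed by SPARQL; verify on your store).
 */
public class DecoupleCounts {

    /** Returns predicate -> summed count for one result row. */
    public static Map<String, Integer> sumCounts(String propset, String counts) {
        String[] predicates = propset.trim().split(" +");
        Map<String, Integer> sums = new LinkedHashMap<>();
        // Each tab-separated chunk holds one subject's counts.
        for (String perSubject : counts.split("\t")) {
            String[] values = perSubject.trim().split(" +");
            if (values.length != predicates.length) {
                throw new IllegalArgumentException("count/predicate mismatch: " + perSubject);
            }
            for (int i = 0; i < predicates.length; i++) {
                sums.merge(predicates[i], Integer.parseInt(values[i]), Integer::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        // A row shaped like the {p1, p2} set from the example table (subjects s1 and s3).
        System.out.println(sumCounts("p1 p2", "1 2\t2 2"));
    }
}
```

The method fails loudly when a subject's chunk does not have exactly one count per predicate, rather than silently adding counts to the wrong predicate.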
I think I will give up on this odd problem. Thank you very much for your answers.
Let's start with
select ?predicate (sum(?count) as ?totalcount)
where { ?subject ?predicate ?count }
group by ?predicate
That's the basic part, but the grouping isn't right yet (the question has now been clarified).
The grouping variable should be computed like this (I hope the syntax is right):
select ?subject (group_concat(distinct ?p ; separator = ",") AS ?propset)
where { ?subject ?p ?c }
group by ?subject
I hope that gives:
| subject | propset |
|---|---|
| s1 | "p1,p2" |
| s2 | "p1" |
| s3 | "p1,p2" |
So the final query should be:
select ?propset ?predicate (sum(?count) as ?totalcount)
       (count(distinct ?subject) as ?subjects)
where {
  ?subject ?predicate ?count .
  {
    select ?subject (group_concat(distinct ?p ; separator = ",") AS ?propset)
    where { ?subject ?p ?c }
    group by ?subject
  }
}
group by ?propset ?predicate
Does that work?
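To store the result in Java, the rows of such a query can be folded into one object per predicate set. This is a sketch under two assumptions: each result row has already been read into a plain Java object (the `Row` and `PredicateSet` records and the `collect` method are hypothetical names), and the query also projects `(count(distinct ?subject) as ?subjects)` alongside `?propset`, `?predicate` and `?totalcount`.

```java
import java.util.*;

/**
 * Sketch: collect final-query rows into one object per predicate set.
 * Assumptions (hypothetical names): rows were already read from the SPARQL
 * result into Row objects, and the query projects a ?subjects count per row.
 */
public class CollectSets {

    public record Row(String propset, String predicate, int totalCount, int subjects) {}

    /** Number of subjects sharing the set, plus the summed count per predicate. */
    public record PredicateSet(int subjects, SortedMap<String, Integer> totals) {}

    public static Map<String, PredicateSet> collect(List<Row> rows) {
        Map<String, PredicateSet> sets = new LinkedHashMap<>();
        for (Row r : rows) {
            // All rows with the same ?propset carry the same subject count.
            PredicateSet set = sets.computeIfAbsent(r.propset(),
                    k -> new PredicateSet(r.subjects(), new TreeMap<>()));
            set.totals().put(r.predicate(), r.totalCount());
        }
        return sets;
    }

    public static void main(String[] args) {
        // Rows the example table would be expected to produce.
        List<Row> rows = List.of(
            new Row("p1,p2", "p1", 3, 2),
            new Row("p1,p2", "p2", 4, 2),
            new Row("p1", "p1", 3, 1));
        collect(rows).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Keying by the `?propset` string relies on every subject's predicates being concatenated in the same order; if the store does not guarantee that, sorting the split predicates before building the key is a simple safeguard.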