I have a relatively large set of nodes, and I want to find all pairs of nodes that have matching property values, but I don't know or care in advance what the property value is. This is basically an attempt to find duplicate nodes, but I can limit the definition of a duplicate to two or more nodes that have the same property value.
Any ideas how to proceed? Not finding any starting points in the neo4j docs. I'm on 1.8.2 community edition.
EDIT
Sorry for not being clear in the initial question, but I'm talking about doing this through Cypher.
Cypher to count values on a property, returning a collection of nodes as well:
start n=node(*)
where has(n.prop)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;
Example on console: http://console.neo4j.org/r/k2s7aa
You can also do an index scan with the property like so (to avoid looking at nodes that don't have this property):
start n=node:node_auto_index('prop:*') ...
2.0 Cypher with a label Label:
match (n:Label)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;
Update for 3.x: has
was replaced by exists
.