neo4j find all nodes with matching properties

Paul · May 29, 2013 · Viewed 22k times

I have a relatively large set of nodes, and I want to find all pairs of nodes that have matching property values, but I don't know or care in advance what the property value is. This is basically an attempt to find duplicate nodes, but I can limit the definition of a duplicate to two or more nodes that have the same property value.

Any ideas how to proceed? Not finding any starting points in the neo4j docs. I'm on 1.8.2 community edition.

EDIT
Sorry for not being clear in the initial question, but I'm talking about doing this through Cypher.

Answer

Eve Freeman · Jun 4, 2013

Cypher to count values on a property, returning a collection of nodes as well:

start n=node(*)
where has(n.prop)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;

Example on console: http://console.neo4j.org/r/k2s7aa

If prop is covered by the node auto-index, you can instead start from an index scan, which avoids looking at nodes that don't have this property:
start n=node:node_auto_index('prop:*') ...

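In 2.0 Cypher, for nodes with a label Label:

match (n:Label)
// skip nodes without prop; otherwise they would all group under null
where has(n.prop)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;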
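
Since the question asks for pairs rather than groups, here is a rough sketch of a pairwise variant in 2.0 syntax. It is not part of the original answer. Label and prop are the same placeholders as above, and a and b are just illustrative names. It compares every node with every other node, so on a large set it will likely be much slower than the grouping query.

```cypher
// Sketch only: return each pair of Label nodes sharing a prop value.
// id(a) < id(b) keeps one ordering per pair and drops self-matches.
// Nodes without prop drop out, because null = null is not true.
match (a:Label), (b:Label)
where a.prop = b.prop and id(a) < id(b)
return a.prop as prop, a, b;
```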

Update for 3.x: has() was replaced by exists(), so the filter becomes where exists(n.prop).
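
For reference, the label-based query on 3.x might look like the sketch below. Label and prop are the placeholders used above, and the alias cnt is only an illustrative name. On Neo4j 5, exists(n.prop) was itself removed in favor of n.prop IS NOT NULL, so adjust the filter there.

```cypher
// Sketch for 3.x: same grouping as above, with exists() instead of has().
MATCH (n:Label)
WHERE exists(n.prop)
WITH n.prop AS prop, collect(n) AS nodelist, count(*) AS cnt
WHERE cnt > 1
RETURN prop, nodelist, cnt;
```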