Clustering in Gephi (Louvain Method)

puelo · Apr 30, 2013

I have started working with Gephi to help me display a dataset. The dataset contains:

tags (terms for a certain picture) as nodes

Normalized Google Similarity Distance between those tags as edges with a weight (between 0 and 1)

Every tag is connected to every other tag, as long as they both belong to the same picture. So I have one cluster of nodes and edges for every picture.

I have now imported this dataset into Gephi in the following format:

nodes: id, label

edges: target, source, weight (between 0 and 1)

In total, about 500 nodes and 6000 edges.
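For reference, a minimal sketch of how files in this format could be produced. It assumes the column headers `Id`/`Label` and `Source`/`Target`/`Weight` that Gephi's spreadsheet import recognizes; the function name `write_gephi_csv` and the sample tags and weights are hypothetical, made up only for illustration.

```python
import csv
import io


def write_gephi_csv(nodes, edges, nodes_file, edges_file):
    """Write a node table and an edge table as CSV.

    nodes: iterable of (id, label)
    edges: iterable of (source, target, weight)
    """
    node_writer = csv.writer(nodes_file)
    node_writer.writerow(["Id", "Label"])
    node_writer.writerows(nodes)

    edge_writer = csv.writer(edges_file)
    edge_writer.writerow(["Source", "Target", "Weight"])
    edge_writer.writerows(edges)


# Hypothetical sample: three tags of one picture, fully connected.
nodes = [(1, "beach"), (2, "sea"), (3, "sand")]
edges = [(1, 2, 0.21), (1, 3, 0.35), (2, 3, 0.4)]

# In-memory buffers stand in for real files here.
nodes_buf, edges_buf = io.StringIO(), io.StringIO()
write_gephi_csv(nodes, edges, nodes_buf, edges_buf)
print(nodes_buf.getvalue())
print(edges_buf.getvalue())
```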

My problem is that after importing all those nodes and edges, the graph looks bunched up with no real order: the cluster of each picture is mixed into the clusters of other pictures. Using Modularity as the partition algorithm (which should use the Louvain method), the graph gets colored, with each color representing a picture. I can then untangle this mess using the Force Atlas 2 layout.

I now have a colored graph with about 15 clusters (every cluster represents one picture).

Now I want to cluster those clusters again, grouping the tags (nodes) according to their Normalized Google Distance (the edge weights), which should yield groups of tags that are roughly equal in meaning.

I hope you guys understand what I want to accomplish. I can also upload a picture to clarify it.

Thanks a lot

Answer

Vincent Labatut · May 4, 2013

I don't think you can do that with the standard version of Gephi. You would need to develop a plugin to implement the very last step of your process.
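As a rough illustration of what that last step might look like outside Gephi, here is a minimal sketch in plain Python. It is not a Gephi plugin and not the Louvain method: it simply keeps the edges whose distance is at most a threshold and groups the tags that remain connected (single-linkage grouping via union-find). It assumes that a lower Normalized Google Distance means closer meaning; the function name `threshold_clusters`, the threshold `0.3`, and the sample tags and distances are hypothetical.

```python
from collections import defaultdict


def threshold_clusters(nodes, edges, max_distance):
    """Group nodes connected by edges with distance <= max_distance."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for source, target, distance in edges:
        if distance <= max_distance:
            # Merge the two groups.
            parent[find(source)] = find(target)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical tags of one picture, with made-up distances in [0, 1].
tags = ["beach", "sea", "sand", "car", "road"]
distances = [
    ("beach", "sea", 0.20), ("sea", "sand", 0.25), ("beach", "sand", 0.30),
    ("car", "road", 0.15), ("beach", "car", 0.80), ("sea", "road", 0.90),
    ("sand", "car", 0.85), ("sand", "road", 0.70), ("beach", "road", 0.75),
    ("sea", "car", 0.95),
]

clusters = threshold_clusters(tags, distances, max_distance=0.3)
print(clusters)  # [['beach', 'sand', 'sea'], ['car', 'road']]
```

The threshold controls how fine the grouping is, so it would have to be tuned on the real data. Also note that modularity-based methods treat a larger weight as a stronger tie, so a distance would probably need to be turned into a similarity (for example `1 - distance`, again an assumption) before being used as an edge weight there.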

Gephi is good for visualizing and browsing graphs, but (for now) there are more complete tools when it comes to processing topological properties. For instance, the igraph library (available in C, R and Python) might be more appropriate for you. Note that you can use a file format compatible with both Gephi and igraph, which allows you to use both tools on the same data.
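GraphML is one such format that both tools can read. A minimal sketch of writing the tag graph as GraphML with only the Python standard library follows; the function name `build_graphml`, the attribute keys `label` and `weight`, and the sample data are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def build_graphml(nodes, edges):
    """Return an undirected weighted graph as a GraphML string.

    nodes: iterable of (id, label)
    edges: iterable of (source, target, weight)
    """
    ET.register_namespace("", GRAPHML_NS)  # serialize with a default namespace

    def tag(name):
        return f"{{{GRAPHML_NS}}}{name}"

    root = ET.Element(tag("graphml"))
    # Declare the node label and edge weight attributes.
    ET.SubElement(root, tag("key"), {
        "id": "label", "for": "node",
        "attr.name": "label", "attr.type": "string",
    })
    ET.SubElement(root, tag("key"), {
        "id": "weight", "for": "edge",
        "attr.name": "weight", "attr.type": "double",
    })
    graph = ET.SubElement(root, tag("graph"),
                          {"id": "G", "edgedefault": "undirected"})

    for node_id, label in nodes:
        node = ET.SubElement(graph, tag("node"), {"id": str(node_id)})
        ET.SubElement(node, tag("data"), {"key": "label"}).text = label

    for source, target, weight in edges:
        edge = ET.SubElement(graph, tag("edge"),
                             {"source": str(source), "target": str(target)})
        ET.SubElement(edge, tag("data"), {"key": "weight"}).text = str(weight)

    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data.
nodes = [(1, "beach"), (2, "sea"), (3, "sand")]
edges = [(1, 2, 0.21), (1, 3, 0.35), (2, 3, 0.4)]
graphml_text = build_graphml(nodes, edges)
print(graphml_text)
```

Saved to a `.graphml` file, the result should open in Gephi, and in python-igraph it could presumably be loaded with `igraph.Graph.Read_GraphML`.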