Using igraph for a non-network problem
I recently had a problem of the following form: suppose I have data on some companies, but these companies’ names might have changed over the years. For example, a company might have started out as Apple, then become Banana, and then Cabbage. In my use case, I had this information as a dataframe of pairs of company names. For example:
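As a rough sketch, a dataframe of this shape could look like the following. The column names `old_name` and `new_name` are placeholders chosen for illustration, not names from the real data:

```r
# Hypothetical example data: each row pairs two names of the same company.
# The column names are illustrative assumptions.
df <- data.frame(
  old_name = c("Apple", "Banana"),
  new_name = c("Banana", "Cabbage"),
  stringsAsFactors = FALSE
)
df
```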
With this as a starting point, I want to standardize the name of the company to be “Apple” across all years, i.e. I want to tell R that Apple = Banana = Cabbage --> Apple = Cabbage. I couldn’t think of a particularly clean way to code this up.
Then it hit me that this was just a network problem! I was just trying to find the variants of company names that were connected to each other (below is an illustration and a terrible excuse to play with the ggraph package).
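A picture like this can be drawn with a few lines of ggraph. This is only a minimal sketch on made-up data: the `from`/`to` column names, the layout, and the styling are all assumptions, not the code behind the original figure.

```r
library(igraph)
library(ggplot2)
library(ggraph)

# Hypothetical name pairs (column names are illustrative assumptions).
pairs <- data.frame(
  from = c("Apple", "Banana"),
  to   = c("Banana", "Cabbage"),
  stringsAsFactors = FALSE
)

# Each pair becomes an undirected edge between two company-name vertices.
g <- graph_from_data_frame(pairs, directed = FALSE)

# Draw the nodes, the edges between them, and a text label per node.
p <- ggraph(g, layout = "stress") +
  geom_edge_link() +
  geom_node_point(size = 4) +
  geom_node_text(aes(label = name), vjust = -1) +
  theme_void()
p
```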
Enter igraph, a popular R package for dealing with network data. I just need to use igraph functions to convert the dataframe of pairs of company names into an (undirected) graph (i.e. a network), and then it can tell me which company names belong to the same group by finding the connected components of the network.
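The steps just described might be sketched as follows. The data, the column names, and the rule for picking a standard name (the first name seen in each component) are assumptions made for illustration; `graph_from_data_frame()` and `components()` are the igraph functions doing the real work.

```r
library(igraph)

# Hypothetical data: each row says "these two names are the same company".
# The Durian/Eggplant pair is added only to show a second, separate group.
pairs <- data.frame(
  name1 = c("Apple", "Banana", "Durian"),
  name2 = c("Banana", "Cabbage", "Eggplant"),
  stringsAsFactors = FALSE
)

# Treat each pair as an undirected edge between two name vertices.
g <- graph_from_data_frame(pairs, directed = FALSE)

# components() assigns every vertex the id of its connected component.
comp <- components(g)
membership <- comp$membership

# Pick one representative name per component. Taking the first name
# encountered is just a convention assumed here.
representative <- tapply(names(membership), membership, function(x) x[1])

# Lookup table from every name variant to its standardized name.
lookup <- data.frame(
  name = names(membership),
  standard_name = unname(representative[as.character(membership)]),
  stringsAsFactors = FALSE
)
lookup
```

Joining `lookup` back onto the original company data by name would then give every row a standardized name. If a particular name should win (say, the earliest or the most recent one), the `x[1]` rule would need to be replaced with something that knows about the ordering.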
I’m not necessarily recommending this as the best way to handle the problem. I just think it’s fun when you’re able to reformulate a problem for a different domain, and in this case it also led to a reasonably straightforward solution.