Bipartite Graphs are interesting types of graphs that can be useful in many real-world scenarios. In the main part of the article, we would be looking at what actually are bipartite graphs, what are its applications and in the end, a use case would be presented based on one of its applications: Recommendation system. In order to solidify concepts, an introduction to graph theory to assist the better understanding of the article and some resources to learn about it would also be presented.

Table of Contents

  1. Introduction to Graph Theory.
  2. Introduction to Bipartite Graphs.
  3. Applications of Bipartite Graphs.
  4. Use Case : Graph based Recommendation System.
  5. Conclusion.

Introduction to Graph Theory

In simple words Graph Theory is the “ theory regarding the properties and applications of graphs “. A graph is a data structure that is defined by two components : a) Node or vertex b) Edges connecting the nodes.

The edges may represent metadata that defines the relationship between the nodes. For example in the case of social network the edges may represent the kind of relationship ( friend, follower, etc ) between the nodes that represent the users. Another important term that one needs to understand for this article’s perspective is Degree.The degree of the vertex v , written as δ(v) , is the number of edges with v as an end vertex. By convention, we count a loop twice and parallel edges contribute separately. This information is sufficient to follow this article but if you wanna delve deeper into graph theory then you will find Analytics Vidhya’s blog Introduction to graph theory helpful .

We will wrap this section by analysing a sample graph using networkx and nxviz libraries of python that are extremely helpful to deal with big networks.

We will model the above graph using networkx that provides some handy methods to analyse the networks.

Note: Make sure you have these libraries installed on your system before proceeding. You can do it easily by following the links provided above.

img

#importing libraries
import networkx as nx
import nxviz as nv
import matplotlib.pyplot as plt

NodesList=['A','B','C','D','E']
Graph=nx.DiGraph() #creates an instance of Directed Graph
Graph.add_nodes_from(NodesList) #adding nodes from a list

EdgeTuples=[('C','B'),('B','C'),('D','C'),('D','A'),('D','E'),('B','D'),('A','B'),('E','A')]  #list of tuples containing connected pairs

Graph.add_edges_from(EdgeTuples) #adding edges between nodes
print(Graph.nodes())
print(Graph.edges())
print(Graph.degree()) #will return the list of tuples ( containing degree of each node)

Output :

NodeView(('A', 'B', 'C', 'D', 'E'))
OutEdgeView([('A', 'B'), ('B', 'C'), ('B', 'D'), ('C', 'B'), ('D', 'C'), ('D', 'A'), ('D', 'E'), ('E', 'A')])
[('A', 3), ('B', 4), ('C', 3), ('D', 4), ('E', 2)]

Introduction to Bipartite Graphs

A bipartite graph, also called a bigraph, is a set of graph vertices decomposed into two disjoint sets such that no two graph vertices within the same set are connected. Nodes can only be connected to nodes in different partition. A bipartite graph is a special case of a k-partite graph with k=2.
Following is a bipartite graph containing two partitions: Users and Books.This kind of graph is obtained in the case of online shopping where users share relationship only with the products but not with other users. Similarly, no two books are related to each other. This perfectly qualifies to be represented by bipartite graphs according to the above definition.

graph

Now let us model above bipartite graph using networkx and nxviz.

Note : In order to make partitions we have to explicitly pass an argument bipartite to add_nodes_from method to define two partitions.

Import networkx as nx
Import matplotlib.pyplot as plt
From nxviz.plots import CircosPlot

Users=['user 1','user 2','user 3','user 4']
Books=['Book 1','Book 2','Book 3']
Graph=nx.Graph()
Graph.add_nodes_from(Users,bipartite='Users')  #passing bipartite argument
Graph.add_nodes_from(Books,bipartite='Books')  #passing bipartite argument 

EdgeList=[('user 1','Book 2'),('user 2','Book 2'),('user 3','Book 2'),('user 3','Book 1'),('user 4','Book 3')]

Graph.add_edges_from(EdgeList)
print(Graph.nodes(data=True)) #passing data=True will give metadata information

Output:

[('Book 1', {'bipartite': 'Books'}), ('Book 3', {'bipartite': 'Books'}), ('user 1', {'bipartite': 'Users'}), ('Book 2', {'bipartite': 'Books'}), ('user 4', {'bipartite': 'Users'}), ('user 2', {'bipartite': 'Users'}), ('user 3', {'bipartite': 'Users'})]

Each node has now a metadata associated with them that indicates in which partition they are. This will make things easier in the later parts of the article.

Import nxviz as nv


c = nv.plots.CircosPlot(Graph, node_color='bipartite', node_grouping='bipartite') #plotting circos plot which is nothing but nodes arranged in a circular manner and edges linking them together. 

# Draw c to the screen
c.draw()
# Display the plot
plt.show()

Output :

img

The plot obtained is called Circos Plot that Makes the visualization simpler. In this case we have grouped partitions and also coloured them accordingly.

Applications of Bipartite Graphs

Similarity Ranking

Bipartite graphs are used extensively in the online space , specifically in e-commerce domain and social network domain for Similarity Ranking . E-Commerce in real world is naturally modelled as a bipartite graph where Users forms one partition and Products forms another partition . This nature can greatly help us to measure the similarity between the users that can aid the recommendation of products that one user has bought to other person since both are similar in their choices.

Graph Based Recommendation System

Bipartite graphs can be used to make a simple recommendation system. We will introduce this application through an example which will be presented as a use case in the next section.

Recommending Github Repository to Users

Github is a kind of social network for developers in which users can collaborate with each other and can collaborate on different repositories according to their interests. This type of relationship between users and repositories can also be modelled as bipartite graphs with Users as one partition and Repositories as another. The relationship between them is that of contribution . A User contributes to Repositories. No two Repositories are connected to each other with the above relation and same goes for Users.

Below is a sample graph that shows the bipartite graph of Github network.

git

We can manually draw inference from above graph which will later be done with the help of networkx library. Below are the inferences drawn looking at the graph:

  1. User 1 contributes to DeepMind and SupportPredict Repositories based on his interest.
  2. User 2 do not contribute to any repo.
  3. User 3 being a data science enthusiast contributes to SupportPredict as well as DataViz.
  4. User 4 contributes to SupportPredict as well.

Now if we want to recommend User 1 to contribute to a repository, Which one would you recommend?

If your answer is DataViz , then you’re correct. Let us see how. User 1 contributes to SupportPredict , User 3 and User 4 also contributes to SupportPredict. Further User 3 contributes to DataViz. Hence it makes sense to recommend DataViz to User 1.

This is a smaller screenshot of a much larger network. In large network we can find the similarity ranking of two users depending on the common repo they contribute to.

Use Case : Graph based Recommendation System

Dataset Dataset we have with us is a Github Network consisting of Users and Repositories and links between them. Click here for the dataset.

Note : We will be using networkx and matplotlib library for the purpose of this use case. You can install them using below command.

$pip install matplotlib
$pip install networkx==1.1 
  1. We will start by loading our graph from a pickle file available with us.
import networkx as nx    # library for dealing with large networks
import matplotlib.pyplot as plt  # library for plotting
import nxviz as nv  # library for visualisation of graphs

Graph=nx.read_gpickle('git_network.p')   # loading graph from the file we have.
  1. Let us analyze the graph a little bit , to know the structure of the graph and what metadata is contained within the nodes.
print( list(Graph.nodes(data=True))[0] )   # accessing the first node and checking its metadata
print( list(Graph.edges(data=True))[0] )  # printing an edge sample and checking it's metadata

Output:

('u23432', {'bipartite': 'users'})
('u23432', 'p66574', {})

We can see that the nodes contain ‘ bipartite ‘ keyword that contains the information regarding the partition to which they belong. There are two partitions : ‘ users ‘ and ‘ projects ‘. We can also see that edges do not contain any metadata in them.

  1. Next we will write a function that returns the nodes from a given partition in a bipartite graph. We will be using list comprehension extensively in the code. If you are not familiar with the concept of list comprehension refer this.
# Define get_nodes_from_partition()
def get_nodes_from_partition(G,partition):
   '''
   G: Graph object
   partition: String , name of the partition we want info about.
   '''
   nodes=[n for n,d in G.nodes(data=True) if d['bipartite']==partition]
   return nodes   # this contains the node that belong to a particular partition

# Print the number of nodes in the 'projects' partition
print(len(get_nodes_from_partition(G, 'projects')))

# Print the number of nodes in the 'users' partition
print(len(get_nodes_from_partition(G,'users')))

Output:

11774
10677
  1. Finding ‘ User Similarity ‘ as mentioned in the Applications of Bipartite Graphs section is the most integral part of Recommendation System . In this code section we will be writing a function that takes in two nodes and returns the set of repository nodes shared between them. So let’s code .
def shared_partition_nodes(G,node1,node2):
   '''
   G: graph object
   node1: String
   node2: String
   '''
   assert G.node[node1]['bipartite'] == G.node[node2]['bipartite']

   nbrs1 = G.neighbors(node1)

   nbrs2 = G.neighbors(node2)

   # Compute the overlap using set intersections
   overlap = set(nbrs1).intersection(nbrs2)
   return overlap

print(shared_partition_nodes(G,'u7909','u2148'))
print(len(shared_partition_nodes(G,'u7909','u23432')))

Output:

{'p2000', 'p299', 'p607'}
0

It can be seen that projects ‘p2000’, ‘p299’, ‘p607’ are common to both ‘u7909’ and ‘u2148 ‘ and no common projects between ‘u7909’ and ‘u23432’

  1. We need to define the similarity score before proceeding to code it. Similarity Score can be defined as the ratio of no. of shared nodes between two users and no. of nodes in ‘projects’ partition.
def similarity_Score(G, user1, user2, proj_nodes):
   '''
   G: graph object
   user1: String
   user2: String
   proj_nodes: integer, no. of nodes in project partition
   '''
   assert G.node[user1]['bipartite'] == 'users'
   assert G.node[user2]['bipartite'] == 'users'

   shared_nodes = shared_partition_nodes(G,user1,user2)

   return len(shared_nodes) / proj_nodes

# Compute the similarity score between users 'u7909' and 'u2148'
project_nodes = get_nodes_from_partition(G,'projects')
similarity_score = similarity_Score(G,'u7909','u2148',len(project_nodes))

print(similarity_score)

Output

0.0002547987090198743

We can see from the above two code section that ‘u7909’ and ‘u2148’ have 3 repo in common and thus have a non zero similarity score.

  1. Last step in the recommendation system is to find the users most similar to the user we are recommending the repo to. Then from these list of users most similar to given user we can find the repo which is not common between them. This repository can be recommended to the given user. Let’s replicate this in our code.
def most_similar_users(G, user, user_nodes, project_nodes):
   '''
   G: graph
   user: String
   user_nodes: nodes in user partition
   project_nodes: nodes in project partition
   '''
   user_nodes = set(user_nodes)
   user_nodes.remove(user)
   similarities = defaultdict(list)
   for n in user_nodes:
       similarity = similarity_Score(G, user, n, len(project_nodes))
       similarities[similarity].append(n)

   max_similarity = max(similarities.keys())
   return similarities[max_similarity],max_similarity

user_nodes = get_nodes_from_partition(G, 'users')
project_nodes = get_nodes_from_partition(G, 'projects')

print(most_similar_users(G, 'u4560', user_nodes, project_nodes))

Output

(['u14984', 'u2800', 'u9525', 'u53', 'u1570', 'u363'], 8.493290300662476e-05)

The above output shows that the user ‘u4560’ is most similar to 6 output users.

Below code recommends projects from one user ( most similar to a given user ) to the given user.

def recommend_repositories(G, from, to):
   '''
     G : graph object
     from : String
     to : String
   '''
   from_repos = set(G.neighbors(from))
   to_repos = set(G.neighbors(to))
   # Identify repositories that the from_user is connected to that the to_user is not connected to
   return from_repos.difference(to)

# Print the repositories to be recommended
print(recommend_repositories(G, 'u14984', 'u4560'))

Output

{'p2033', 'p76356'}

In our case we wanted to recommend repos to ‘u4560’ for which we found the users that are most similar to the given users. Then we selected one user from the list of the most similar users and recommended the repos to which the selected user was connected to but not the given user. In this way a simple recommendation system is built.

Conclusion

In this article we looked at the ways in which the bipartite graphs can be useful in measuring similarity rankings between the users . This concept was extended to build a small recommendation system. There are many other networks that are naturally moulded as bipartite graphs and concepts learnt through this article can be applied to them. Hope you enjoyed the article and it helped you in some way.