Topological Sort
Last updated
Last updated
Topological sort is a technique used in graph theory to order the vertices of a directed acyclic graph (DAG).
It ensures that for every directed edge from vertex A to vertex B, vertex A comes before vertex B in the ordering.
It orders the vertices on a line such that all directed edges go from left to right. Such an ordering cannot exist if the graph is not a DAG and contains one or more directed cycle(s), because there is no way we can keep going on the right on a line and still return to a vertex already visited (we are talking about Back Edges here).
The topological sort is not necessarily unique.
An acyclic graph always has a topological sort
This is useful in scheduling problems, where tasks depend on the completion of other tasks.
Topological sort can sequence tasks while respecting all sequence constraints without any conflict.
Sink vertices are the only candidates for the final/last vertex position in the topological ordering.
We will search for the sink vertices, and as we get a sink vertex we will disconnect that vertex from the graph and will then backtrack and search for the next sink vertex along the search path, until all the vertices in the graph are visited.
This way we will get all the vertices in totally reverse order of that of what a topological ordering should be. So we would need to reverse the order to get the topological sort.
In DFS we print the nodes as we see them, which means when we print a node, it has just been discovered but not yet processed, which means it is in the Visiting state.
So DFS gives the order in which the nodes enter the Visiting state and not the Visited state.
For topological sorting we need to have the order in which the nodes are completely processed, i.e, the order in which the nodes are marked as Visited. Because when a node is marked Visited then all of its child node have already been processed
At any point during the DFS traversal of a graph we can divide the nodes in 3 groups
Nodes that we finished visiting, i.e. popped from the stack.
Nodes that are currently on the stack.
Nodes that are yet to be discovered.
This property actually tells us that any node is popped from the stack after all of its outgoing neighbours are popped. So in order to get a topological sorting we should just keep track of a list with the popped nodes in reverse order.
The BFS version for topological sort is known as Kahn's Algorithm.
The indegree of a node represents the number of arcs coming into a node. The outdegree represents the number of arcs going from a node.
The first node in a topological sorting should have the indegree equal to 0.
If the first node would have an arc coming into it, that particular arc would contradict the sorting property.
Find the in-degree of each node by initially calculating the number of incoming edges to each node.
Iterate through all the edges in the graph and increment the in-degree of the destination node for each edge.
Add all nodes with in-degree 0 to a queue.
While the queue is not empty:
Remove a node from the queue.
For each outgoing edge from the removed node, decrement the in-degree of the destination node by 1.
If the in-degree of a destination node becomes 0, add it to the queue.
If the queue is empty and there are still nodes in the graph, the graph contains a cycle and cannot be topologically sorted.
We can also use Kahn's algorithm to extract the lexicographically minimum topological sort by breaking ties lexographically. One can simply replace the queue
with a priority_queue
to implement this extension.
Topological Sorting with graph visualisation
DFS is a highly efficient way to compute topological sort of a graph in linear time
If we are on node that has an outgoing edge to another node . If the graph is a DAG we can be certain that falls in categories 1 or 3. If would be on the stack, than the arc would close a cycle, and this is impossible on a DAG.
Time Complexity: The outer for loop will be executed V number of times and the inner for loop will be executed E number of times.
Auxiliary Space: The queue needs to store all the vertices of the graph. So the space required is
Time Complexity: The outer for loop will be executed V number of times and the inner for loop will be executed E number of times.
Auxiliary Space: The queue needs to store all the vertices of the graph. So the space required is