Finding Minimum Completion Time of Scheduled Tasks with Topological Sort

Question 1

Finding Minimum Completion Time of Scheduled Tasks with Topological Sort

algorithm topological-sort

1110101001 · Jan 1, 2014 · Viewed 7.9k times · Source

Answer

Answer

I am assuming that the tasks do not have cyclic dependencies. As such, the tasks and their dependencies can be represented as a directed acyclic graph (abbreviated as dag from this point), where tasks are vertices, and an edge u -> v means that task u must be completed before task v can begin.

Your method is right in using topological sort to compute the order in which the tasks can be done, since the graph is a dag. I see that there are 2 main questions that you have.

Computing the tasks which can possibly be done simultaneously
Computing the minimum amount of time to complete all the tasks

Number 2 is simpler, so I shall address it first. Finding the minimum time to complete all the tasks is not difficult once you have computed the topological order. It is essentially finding the longest path in dag, you may have heard its application in the Critical path method. Essentially, the minimum amount of time needed to complete all the tasks is the time needed to complete the chain of tasks with the longest duration. This may seem counterintuitive at first glance, but the idea is that eventual completion of all tasks is dependent on how long we need to complete any chain of tasks, and hence, this is dependent on the time needed to complete the chain of tasks that take up the longest time.

Here is a pseudocode of the algorithm (I think it should be correct, but because I haven't been doing this for a bit of time, I may be wrong, so do verify yourself):

min_time_for_tasks(Graph, topological order):
    distance <- integer array whose size is number of vertices
    set all entries in visited array to negative infinity
    maxDist <- 0
    for v in topological order:
        if distance[v] == negative infinity:
            maxDist <- max(maxDist, longest_path(Graph, distance, v))
    return maxDist

// computes the longest path from vertex v
longest_path(Graph, distance, v):
    maxDist <- 0
    for all edges (v,w) outgoing from v:
        // vertex w has not been visited
        if distance[w] == negative infinity:
            longest_path(Graph, distance, w)
        // get the longest path among all vertices reachable from v
        maxDist <- max(maxDist, distance[w])
    distance[v] <- maxDist + time to complete task v
    return distance[v]

As for Number 1, computing the tasks which can be done simultaneously, this is slightly more tricky. I haven't actually done this before, but I do think that it can be done using a topological sort algorithm called Kahn's algorithm (it's the first algorithm for Topological Sort on wikipedia, link is here). I don't know about you, but normally I perform topological sort using a depth first search and a stack, and then popping vertices from the stack to obtain the order. Essentially, Kahn's algorithm is a topological sort algorithm, and with some slight modifications, it should be able to solve our first problem. I haven't done this before, so please verify this again. Pseudocode with explanation in the comments:

// This is a modified kahn's algorithm.
// Again, the graph is assumed to be a dag, and we first compute
// the number of incoming edges to every vertex, since Kahn's
// algorithm is dependent on this information.
// This function will return an array of sets of tasks which
// can be done at the same time.
// So all tasks in the set in returnArray[0] can be done at the same
// time, all tasks in the set in returnArray[1] can be done at the
// same time, etc. Explanation will be available after the
// pseudocode
compute_simultaneous_tasks(Graph):
    into <- integer array, all entries initialized to 0
    // this function is defined below
    compute_incoming_edges(Graph, into)
    returnArray <- empty array
    VSet <- Set of all vertices in the graph
    while VSet is not empty:
        // retrieve all vertices with no incoming edges
        // this means their prerequisite tasks have been completed,
        // and we can execute that particular task
        headVertices <- all vertices `v` in VSet such that into[v] is 0

        // remove all the vertices in headVertices from VSet
        VSet <- VSet.remove(headVertices)

        // we are going to remove the vertices in headVertices,
        // so the edges outgoing from them will be removed as well.
        for each vertex v in headVertices:
            for each vertex w outgoing from v:
                into[w] <- into[w] - 1

        // all the vertices in headVertices can be started at the
        // same time, since their prerequisite tasks have been
        // completed
        returnArray.append(headVertices)

    return returnArray

// computes the number of edges leading into each vertex
// so into[x] gives the number of edges leading into vertex x
// from some other vertex
compute_incoming_edges(Graph, into):
    for each vertex v in Graph:
        for each vertex w outgoing from v:
            into[w] <- into[w] + 1

So the compute_simultaneous_tasks function will give return an array of sets. Each set contains vertices/tasks which can be done at the same time. To get the main idea of why this is the case, inside the main loop of compute_simultaneous_tasks, we extract all vertices which have no incoming edges. This is equivalent to extracting all tasks with no prerequisites. Hence, we can see that it is safe to execute this set of tasks simultaneously, since they have no prereqs, and are certainly not dependent on each other! Then, we remove their outgoing edges, to simulate their completion. Then we move on to the next iteration of the loop, etc.

So we see that at each iteration of the while loop, we are doing what we just described, and hence it is safe to execute all tasks in returnArray[0] at the same time, all tasks at returnArray[1] at the same time but only after all tasks in returnArray[0] are done, all tasks in returnArray[2] after those in returnArray[1] are done, and so on.

So in this modification of Kahn's algorithm, instead of removing one vertex with 0 incoming edges at one step, we remove all vertices with 0 incoming edges at one step, in order to obtain all the tasks which can be done simulateneously in a "wave", with returnArray[0] being wave 0, returnArray[1] being wave 1, etc. Take note that we only remove outgoing edges from all vertices in a wave after we have extracted all the vertices with 0 incoming edges.

While I am not able to give a formal proof, I do believe that if one was to complete all the a wave simultaneously, one will be able to complete all the tasks in the minimum time we computed above, due to how dependencies are taken into account of.

Hope that helps, and Happy New Year =)

Question 2

Assume that there are an unlimited number of workers each of which can complete one task, each of which takes some time. There are also precedence constraints where one task cannot be completed until another is. What is the minimum amount of time needed for each task to be completed while respecting the precedence order?

As an example, lets say you have 3 tasks.

The first task takes 10 units of time to complete, the second takes 5 units of time to complete, and the third takes 6 units of time to complete.

The constraint is that the 2nd task cannot be started until the 3rd task is finished.

Given this, you can conclude that the minimum time needed for all the tasks to be completed while respecting the precedence is 11.

This is because you can do tasks 1 and 3 (which take times 10 and 6 respectively) simultaneously, and after task 3 is finished, you can begin on task 2 (which takes 5 units of time). Thus, all tasks will end at time 11.

It seems like this could be solved with a topological sort, which would give you the order in which to complete tasks in order to satisfy the constraints.

For example, in the problem, after topologically sorting the tasks you would end up with

Task 1 (10 Units of time) -> Task 3 (6 Units of time) -> Task 2 (5 Units of time)

However, once you obtain this order in which go about the tasks, how do you determine which ones can be done simultaneously and which ones need to be done one after the other? Further, how do you compute the minimum time needed for all of them to finish?

Finding Minimum Completion Time of Scheduled Tasks with Topological Sort

Answer

Related questions