The Science of Master Scheduling
How do you decide your project schedules? In this article, I explain the theory behind scheduling and how to create a master schedule. I also introduce pfd-tools, a tool based on the scheduling theory discussed here.
TL;DR
- Once you have a PFD and element tables, you can mechanically compute a master schedule (using the pfdplan and planmaster commands in pfd-tools)
- Using scheduling theory lets you pinpoint exactly where to improve (using the criticalpath command in pfd-tools)
- You can apply this to real projects and actually use it to create a master schedule
Preparing for the explanation
What is a master schedule?
In this article, I define a master schedule as something that specifies the start and end times of each phase defined in the project (the figure below is a made-up example).
Note that there is no formally agreed-upon definition of a master schedule. Various definitions exist (I will not cover other definitions here).
What is a process?
A process is an activity that takes deliverables as input and outputs deliverables.1
If you decompose an entire project into small processes and map them to higher-level phases, you can compute a master schedule. For example, assume that feature A is composed of feature A1 and feature A2, and feature A is not complete until both A1 and A2 are complete. If both A1 and A2 go through design, implementation, and system testing, then the high-level phases for feature A can be mapped to local processes as follows: design (A1 design, A2 design), implementation (A1 implementation, A2 implementation), and system testing (A1 system test, A2 system test):
The start time of a high-level phase is the earliest start time among its mapped processes. The end time of a phase is the earlier of (a) the start time of the next phase and (b) the latest end time among its mapped processes.
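The phase rule above is simple enough to state in a few lines of code. Here is a minimal sketch (not pfd-tools code; the function name and data shapes are my own for illustration):

```python
# A minimal sketch of the phase start/end rule above (illustrative only).
# Each phase maps to the local processes it contains; times are plain numbers.

def phase_window(mapped, next_phase_start=None):
    """mapped: list of (start, end) tuples for the processes in the phase."""
    start = min(s for s, _ in mapped)
    end = max(e for _, e in mapped)
    # The phase ends at the earlier of the next phase's start time and the
    # latest end time among its own mapped processes.
    if next_phase_start is not None:
        end = min(end, next_phase_start)
    return start, end

# Feature A's design phase: A1 design runs 0..3, A2 design runs 1..4,
# and the next phase (implementation) starts at 4.
print(phase_window([(0, 3), (1, 4)], next_phase_start=4))  # (0, 4)
```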
What is a Process Flow Diagram (PFD)?
A Process Flow Diagram (PFD) is one way to represent processes as a chain of deliverable transformations. It was proposed by Yoshio Shimizu.
In a PFD, deliverables are represented as rectangles,2 processes as ellipses, and the inputs and outputs of deliverables to/from processes as arrows.
These arrows explicitly show the dependency relationships between processes in the PFD. For example, if you pick one process and trace the arrows backward, you can see which processes must have completed before the chosen process can start.3
Among arrows, there is a special type called a feedback arrow that represents rework.4 If there is a feedback arrow from a deliverable to a process, that means the process must be re-executed when that deliverable is updated. Typical feedback arrows are, for example, from review comments or bug tickets back to implementation.
Strictly speaking, a solid arrow from a deliverable to a process means that the process at the arrow’s head cannot be executed unless the deliverable at the arrow’s tail is complete. In contrast, the dashed feedback arrow means that the process at the arrow’s head can be executed even if the deliverable at the arrow’s tail is not yet complete. In other words, implementation can start even without bug tickets, but once bug tickets appear, fixes (re-execution of implementation) are needed.
Information that does not fit in the PFD (detailed descriptions, etc.) is written in element tables. There are deliverable tables for deliverables, and process tables for processes.
Two important concepts are initial deliverables and final deliverables:
- Initial deliverables are deliverables that are not output from any process; in other words, they are given.
- Final deliverables are deliverables that are not used as input to any process; in other words, they are the project’s target deliverables. The project aims to output these final deliverables.
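Both kinds of deliverables fall out mechanically once the arrows are recorded. A small sketch, assuming a simple data model of my own (not the pfd-tools internals):

```python
# Illustrative sketch: a PFD reduced to its input/output arrows, from which
# initial and final deliverables follow mechanically.

processes = {
    "design":    {"inputs": {"requirements"}, "outputs": {"spec"}},
    "implement": {"inputs": {"spec"},         "outputs": {"code"}},
}
deliverables = {"requirements", "spec", "code"}

produced = set().union(*(p["outputs"] for p in processes.values()))
consumed = set().union(*(p["inputs"] for p in processes.values()))

initial = deliverables - produced  # not output by any process: given
final = deliverables - consumed    # not input to any process: the target

print(sorted(initial))  # ['requirements']
print(sorted(final))    # ['code']
```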
Scheduling theory
Scheduling theory is a framework that, given process dependencies and each process’s duration, defines all possible execution orders of processes and their total durations. Using scheduling theory, you can check whether an execution order follows the dependency constraints, and you can compute the total duration when that execution order is used.
Known scheduling theories include Program Evaluation and Review Technique (PERT), which uses PERT charts, and various models of the Resource-Constrained Project Scheduling Problem (RCPSP). Here, I introduce three models based on PFDs:
- Infinite resources, no feedback arrows
- Finite resources, no feedback arrows
- Finite resources, with feedback arrows (this is the model pfd-tools calls the Finite resources Single deliverable Model, or FSM)
Among the models introduced here, the infinite-resources, no-feedback model is the simplest. So I will first explain the infinite-resources, no-feedback model. Then I will step up to a model that considers resource limits, and finally a model that also considers feedback arrows.
Model overview
Infinite resources, no feedback arrows
In this model, a process is considered executable if all of its input deliverables are ready and the process has never been executed before. Because there are no feedback arrows, there is no rework, and no process is executed more than once. PERT corresponds to this model.
This model assumes infinite resources, which makes it simple but unrealistic. Normally, executing a process requires a human resource, and in reality people cannot do multiple tasks at the same time (or their efficiency drops badly even if they can). Therefore, schedules computed with this method tend to have shorter durations than what is realistically achievable. If you want to compute more realistic durations, you need to use a model that considers resource limits, as described later.
Below is an example of how the execution of a software development process progresses under the no-rework assumption.
The total duration is computed as follows. If there are multiple processes, they are connected either in parallel or in series. The duration of parallel processes is the maximum of the durations of the processes executed in parallel. The duration of a serial chain of processes is the sum of the durations of the processes in the chain. By recursively applying this, the total duration is uniquely determined.
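The recursive rule above (parallel is a max, serial is a sum) can be sketched directly. The tree encoding below is my own choice for illustration, not a pfd-tools format:

```python
# Sketch of the recursive duration rule: a plan node is a leaf duration,
# a serial chain ("seq"), or a parallel split ("par").

def duration(node):
    if isinstance(node, (int, float)):
        return node  # leaf: a single process's duration
    kind, parts = node
    child = [duration(p) for p in parts]
    # Serial chains add up; parallel branches take the longest branch.
    return sum(child) if kind == "seq" else max(child)

# Design (3), then A1 implementation (5) and A2 implementation (4)
# in parallel, then system test (2):
plan = ("seq", [3, ("par", [5, 4]), 2])
print(duration(plan))  # 10
```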
Finite resources, no feedback arrows
In this model, a process is considered executable if all of its input deliverables are ready, the process has never been executed before, and it can hold the required resources. There is still no rework because we do not consider feedback arrows. Compared to the previous model, we add the condition that resources must be held. In other words, multiple processes that execute at the same time cannot share the same resource. In the previous model, we could simply execute all processes that satisfied the conditions, but in this model, we must choose processes to execute such that their required resources do not conflict.
In this model, in addition to process dependencies, for each process we must specify which resources it needs and how much work those resources will consume. For example, for a process that only person A can handle, you might specify the required resource as “1 A” and set the work consumption to “1 person-day.” If work can be split among multiple people, you can model the work consumption to increase in proportion to the number of required resources.
Below is an example of how the execution of a software development process progresses without rework, under this model.
Processes P2 and P3 (which run from time 1 → 2) require overlapping resources, so they cannot be executed simultaneously; only one of P2 or P3 can run at a time. In this figure, we choose a plan where P2 is executed first, and P3 is executed after P2 finishes.
The total duration is defined by a state transition model. A state records the current time and the remaining work for each process.
- In the initial state, the time is 0, and the remaining work for each process is equal to its estimated work.
- The next state is determined as follows. From the current state, list all assignable processes. A process is assignable if all of its input deliverables are ready and its remaining work is non-zero (equivalently, it has not yet completed). By definition, an assignable process becomes executable once it is given its required resources. Among all subsets of assignable processes, any subset in which no two processes require overlapping resources is called a valid assignment set. Picking one valid assignment set fixes the rate at which each assigned process consumes work per unit time. We then compute the shortest elapsed time until at least one assigned process's remaining work reaches 0, and use the current time plus that elapsed time as the time of the next state. Each assigned process's remaining work is reduced by its consumption rate multiplied by the elapsed time; this gives the next remaining work. In this way, every valid assignment set yields a next state, giving a state transition model whose transition labels are the assignments.
- If in some state the remaining work of every process is zero (i.e., every process has been executed at least once), we call that state a completion state. Along any transition path, we eventually reach a completion state.5
We call a sequence of assignments (i.e., a transition path) from the initial state to a completion state an execution plan. Any execution plan can be converted into a Gantt chart, so execution plans are important. The duration of each execution plan is the time of its completion state. Usually, there are multiple possible execution plans. Therefore, we want to search for an execution plan with the shortest duration. If you need a guaranteed shortest plan, you can use Dijkstra’s algorithm; if you want a reasonably short plan without a formal guarantee, you can use beam search; if you want a result very quickly, you can use a greedy algorithm.
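To make the state transition concrete, here is a small greedy scheduler in the spirit of the model above. It is an illustrative sketch, not the pfd-tools implementation: the data model is my own, and every assigned process is assumed to consume work at rate 1 per unit time.

```python
# Greedy sketch of the finite-resources, no-feedback model (illustrative).

def greedy_schedule(procs):
    """procs: {name: {"work": number, "needs": set of predecessor names,
                      "resources": set of resource names}}"""
    remaining = {n: p["work"] for n, p in procs.items()}
    time, plan = 0.0, []
    while any(w > 0 for w in remaining.values()):
        done = {n for n, w in remaining.items() if w == 0}
        # Assignable: all inputs ready, not yet completed.
        ready = [n for n, w in remaining.items()
                 if w > 0 and procs[n]["needs"] <= done]
        # Greedily pick a conflict-free subset (no shared resources):
        # this is one valid assignment set, not necessarily the best one.
        used, assigned = set(), []
        for n in ready:
            if not (procs[n]["resources"] & used):
                assigned.append(n)
                used |= procs[n]["resources"]
        if not assigned:
            raise RuntimeError("deadlock: no assignable process")
        # Advance time until the first assigned process finishes.
        step = min(remaining[n] for n in assigned)
        plan.append((time, time + step, tuple(assigned)))
        for n in assigned:
            remaining[n] -= step
        time += step
    return time, plan

procs = {
    "P1": {"work": 1, "needs": set(),  "resources": {"A"}},
    "P2": {"work": 2, "needs": {"P1"}, "resources": {"A"}},
    "P3": {"work": 2, "needs": {"P1"}, "resources": {"A"}},  # conflicts with P2
}
total, plan = greedy_schedule(procs)
print(total)  # 5.0 (P1 first, then P2 and P3 serialized on resource A)
```

A greedy choice like this is fast but carries no optimality guarantee; replacing the subset selection with a search over all valid assignment sets (Dijkstra or beam search over states) recovers the stronger variants described above.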
Finite resources, with feedback arrows
In this model, a process is considered executable if (a) all of its input deliverables except those coming from feedback arrows are ready, (b) at least one of its input deliverables (including feedback deliverables) has a version update that the process has not yet consumed, and (c) it can hold the required resources. This model is implemented in pfd-tools.
In this model, every process can be executed more than once. Compared to the previous model, we add the condition of “whether version updates have been processed,” so we need the concept of deliverable versions. In the previous model, a deliverable was created only once, so we did not need versioning.
For feedback deliverables, we define a final version beyond which no further changes occur. Once a deliverable has reached its final version, we consider its version to remain unchanged thereafter. Otherwise, versions would keep changing forever and the process executions would never end. Since processes can be executed multiple times, we also need a way to compute the required work for the second and subsequent executions. In reality, the required work usually decreases with each rework. In this method, we model this as “each repetition uses x% of the previous work,” and so on.
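Under that model, the total work of a process over its rework cycles is a short geometric sum. A sketch (function name and arguments are mine, chosen to mirror the text):

```python
# Sketch of the rework-cost model above: each re-execution of a process
# costs x times the previous execution (x is a per-process ratio).

def total_work(estimated, x, reworks):
    """Work for the first run plus `reworks` re-executions."""
    return sum(estimated * x**n for n in range(reworks + 1))

# 10 person-days estimated, 50% rework ratio, two feedback rounds:
print(total_work(10, 0.5, 2))  # 17.5  (10 + 5 + 2.5)
```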
Below is an example of how the execution of a small software development process progresses under this model (you can check the progression with the pfdrun and pfdrungraph commands in pfd-tools).
The method for computing the total duration is similar to the state transition model in the previous section. A state records the current time, the remaining work for each process, the set of deliverables whose versions have been updated but not yet consumed for each process, and the version numbers per deliverable.
- In the initial state, the time is 0, the remaining work is equal to the estimated work, and for processes that take initial deliverables as input, those initial deliverables are considered to have been updated but not yet consumed.
- The definition of assignability differs slightly from the previous model. A process is assignable if all of its input deliverables except those from feedback arrows are ready and at least one of its input deliverables has a version that has been updated but not yet consumed. As before, an assignable process becomes executable once it is given its required resources, and any subset of assignable processes in which no two processes require overlapping resources is a valid assignment set. Picking a valid assignment set fixes each assigned process's work consumption rate; we compute the shortest elapsed time until at least one process's remaining work reaches 0, use that as the time of the next state, and subtract elapsed time × consumption rate from each assigned process's remaining work. For a process whose remaining work reaches 0 and will be re-executed, we add back the rework amount (here, each process has a re-execution work ratio x, and the rework is estimated work * pow(x, number_of_executions); the number of executions of a process equals the version of its output deliverables). This yields the next remaining work, so we again get a state transition model with the assignments as transition labels.
- If in some state no process is assignable, we call that state a completion state. It is not yet fully proven that every path from the initial state eventually reaches a completion state, but empirically this seems to hold.
As before, we call a sequence of assignments (i.e., a transition path) from the initial state to a completion state an execution plan. Any execution plan can be converted into a Gantt chart, so execution plans are important. The duration of an execution plan is the time of its completion state. As with the previous model, there can be multiple execution plans, so it is useful to search for the plan with the shortest duration. If you need a guaranteed shortest plan, use Dijkstra’s algorithm; if you want a reasonably short one, use beam search; if you just want something fast, use a greedy algorithm.
Important elements not covered in detail
For brevity, I skipped two concepts that are actually necessary to compute a realistic master schedule. I will at least name them here.
Time-stamped initial deliverables
This is an extension that allows you to specify when an initial deliverable becomes available. You can model deliverables that will be created in the future by external processes and whose delivery date has been agreed upon.
Triggers
This is an extension that adds extra conditions to assignability. Normally, you do not start the next test level before the previous test level has finished. You can model this as a trigger condition that prevents the next test level from starting until the feedback loop of the previous test level is complete.
Master schedule
As mentioned earlier, once you map processes to phases, you can compute the master schedule from their start
and end times. The start and end times are all included in the execution plan, so as long as you have an
execution plan and a mapping from processes to high-level phases, you can compute a master schedule (in
pfd-tools, planmaster is responsible for this).
Critical path
A critical path is the set of processes such that increasing their duration or required work increases the overall duration. For example, assume there are two parallel processes A and B with durations 5 and 10, respectively.
If you increase the duration of process B, the overall duration increases, but if you increase the duration of process A, the overall duration does not change.
So B is on the critical path, whereas A is not.
If the duration of a process on the critical path becomes longer than planned, the overall duration also increases. Therefore, processes on the critical path are ones you must pay special attention to.
On the other hand, increasing the duration or required work of a process that is not on the critical path will not affect the overall duration, up to some limit. This maximum range within which you can increase a process’s duration or required work without changing the overall duration is called total float. The total float of every process on the critical path is zero.
If there are processes whose durations or required work you have not yet estimated, you can compute their total float and, by interviewing the team, check whether the sum of serial durations in that part will exceed the total float. That lets you quickly judge whether the overall duration is likely to increase. If the sum exceeds the total float, that process becomes part of the critical path, so it should be treated with care.
The critical path is also important when shortening the project duration. To shorten the duration, you need to repeat the following cycle:
- Identify the critical path.
- Improve all processes on that critical path.
- Go back to step 1.
You need to repeat this because, as you reduce the duration or work of processes on one critical path, there comes a point where the overall duration no longer decreases. For example, in the parallel A/B example, if you shorten the duration of process B below 5, the overall duration will not become shorter than 5.
This happens because the critical path switches from the one containing B to the one containing A.
In this way, each process on the critical path may have a minimum duration or required work such that reducing it further no longer reduces the overall duration. I call this the minimum elasticity value.6 Processes that have a minimum elasticity value are worth focusing on because improving them can shorten the overall duration.
To identify the critical path, fix an execution plan and, for each process you want to inspect, increase its required work and check whether the overall duration increases. Assume all consumption rates are 1, let x be the amount by which you increase the process's required work (large enough to use up any slack), and let y be the amount by which the overall duration increases. Then x - y is the total float. Processes with total float 0 are on the critical path. This is how you identify the critical path (the criticalpath command does the same).
Similarly, again assuming consumption rates of 1, take a process on the critical path, reduce its required work to 0, and let z be the amount by which the overall duration decreases. Then original_required_work - z is the minimum elasticity value. This is how you compute the minimum elasticity value (the criticalpath command does the same).
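Both probes are easy to check on the earlier two-process example (parallel A with work 5 and B with work 10, all consumption rates 1, so the overall duration is simply the max). An illustrative sketch, not the criticalpath implementation:

```python
# Probing total float and minimum elasticity on the parallel A/B example.

def overall(work_a, work_b):
    # Two parallel processes at rate 1: duration is the longer of the two.
    return max(work_a, work_b)

base = overall(5, 10)  # 10

# Total float of A: increase A's work by x (large enough to use up any
# slack), measure the duration increase y; total float is x - y.
x = 7
y = overall(5 + x, 10) - base
print(x - y)  # 5 -> A has float 5; B's float is 0, so B is critical

# Minimum elasticity of B: reduce B's work to 0, measure the decrease z;
# the minimum elasticity value is original_required_work - z.
z = base - overall(5, 0)
print(10 - z)  # 5 -> shortening B below 5 no longer shortens the project
```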
Shortening the overall duration
Once you identify the critical path and improve all processes on it, you can shorten the overall duration. Possible ways to improve processes include the following.
Efficiency improvements
Turn serial processes into parallel ones
This corresponds to a graph operation that turns serial parts into parallel ones. The duration changes from a sum to a max, so you can shorten the duration. However, you often end up using earlier-stage deliverables instead of the originally planned deliverables, which makes this approach speculative.
Make work splittable
This corresponds to editing rows in the process table for resource assignments and consumed work. If there are idle resources, you can use them to increase the consumed work per unit time and thus shorten durations. For example, by decomposing deliverables so that independent parts can be created in parallel, you can realize this.
Shifting from Cost to Delivery
Reuse existing assets
This corresponds to a graph operation that removes all elements reachable when tracing backward from a deliverable and replaces them with initial deliverables. If the removed processes are on the critical path, their durations effectively become zero, which can significantly shorten the overall duration. However, the cost of acquiring the existing assets increases.
Automate with tools
This corresponds to loosening the resource assignment constraints and increasing consumed work in the process table. If using tools incurs costs, your financial cost will go up.
Add resources
This corresponds to adding rows to the resource table. Valid assignment sets increase, so if there are processes that can be split, you can increase the consumed work and shorten the duration. It is well-known from “The Mythical Man-Month” that adding people to a late project often makes it even later. However, if the PFD and element tables are sufficiently well-written, learning and communication costs may be relatively low, so I am less pessimistic than Brooks.
Shifting from Quality to Delivery
Reduce scope and thus reduce durations or required work
This corresponds to decreasing durations or required work in the process table. Note that this implies reducing the scope of all reachable deliverables (including final deliverables).
Accept the risk of quality degradation and reduce durations
This also corresponds to decreasing durations or required work in the process table. You take on the risk that final quality may become unacceptable. This should be avoided as much as possible.
How the schedule was created in this case
In this case study, I used the third model of scheduling theory described above (finite resources, with feedback arrows) to create a schedule as follows. Whether the plan can actually be achieved with the intended schedule and quality is something that will only become clear going forward.
$ # Create the PFD collaboratively with stakeholders on diagrams.net in Google Drive, then download it locally:
$ cat ./pfd.drawio
<mxfile host="65bd71144e">
<diagram id="ZC_KSqUXoirMDHc_9cWT" name="P0">
...
$ # Assign IDs with pfdrenum:
$ pfdrenum -inplace ./pfd.drawio
$ # Immediately overwrite the PFD on Google Drive with this.
$ # Create an empty composite deliverable table:
$ printf "ID\tDescription\tDeliverables\n" > ./cd.tsv
$ # Generate the process table from the PFD:
$ pfdtable -p ./pfd.drawio -cd cd.tsv -t ap > ./ap.tsv
$ # Add extended columns to the process table:
$ edit ./ap.tsv
$ head -1 ./ap.tsv
ID Description EstimatedWork EstimatedReworkRatio RequiredResources StartCondition Group Milestone
$ # Generate the deliverable table from the PFD:
$ pfdtable -p ./pfd.drawio -cd cd.tsv -t ad > ./ad.tsv
$ # Generate the composite process table from the PFD:
$ pfdtable -p ./pfd.drawio -cd cd.tsv -t cp > ./cp.tsv
$ # Generate the resource table from the PFD and process table:
$ pfdtable -p ./pfd.drawio -cd cd.tsv -t r > ./r.tsv
$ # Create the process group table for the master schedule (not yet supported by pfdtable):
$ edit ./g.tsv
$ # Create the milestone table for the master schedule (not yet supported by pfdtable):
$ edit ./m.tsv
$ # Have all processes on the PFD estimated with two-point estimates (optimistic and pessimistic) collaboratively with stakeholders.
$ # Because we had no historical data, we roughly decided re-execution ratios and the final version counts of deliverables by trial and error.
$ # Record optimistic estimated work in ./ap1.tsv:
$ cp ./ap{,1}.tsv
$ edit ./ap1.tsv
$ # Record pessimistic estimated work in ./ap2.tsv:
$ cp ./ap{,2}.tsv
$ edit ./ap2.tsv
$ # Since specifying all the element tables every time is cumbersome, create project files:
$ echo '{"pfd":"pfd.drawio","atomic_process_table":"ap1.tsv","atomic_deliverable_table":"ad.tsv","resource_table":"r.tsv","composite_deliverable_table":"cd.tsv","milestone_table":"m.tsv","group_table":"g.tsv"}' > ./project1.json
$ echo '{"pfd":"pfd.drawio","atomic_process_table":"ap2.tsv","atomic_deliverable_table":"ad.tsv","resource_table":"r.tsv","composite_deliverable_table":"cd.tsv","milestone_table":"m.tsv","group_table":"g.tsv"}' > ./project2.json
$ # Check consistency. If pfdlint finishes normally, there is no problem.
$ # If it exits abnormally, fix the problems until it finishes normally:
$ pfdlint -f ./project1.json
$ pfdlint -f ./project2.json
$ # Create the optimistic execution plan:
$ pfdplan -f ./project1.json -poor -out-format plan-json | tee ./plan.json
{
"initial_state": {
"time": 0,
...
$ # Convert the execution plan into a format readable by Google Sheets Timeline view:
$ pfdtimeline -f ./project1.json -start 2025-04-01 -not-biz-days <(holidays -locale ja) ./plan.json | tee ./timeline1.tsv
AtomicProcess NumOfComplete AllocatedResources Description StartTime EndTime Start End
P1 0 Biz Do requirements analysis 2025-04-08 10:00:00 2025-04-22 10:00:00 5 15
...
$ # Paste into Google Sheets:
$ pbcopy < ./timeline1.tsv
$ # Create the pessimistic execution plan:
$ pfdplan -f ./project2.json -poor -out-format plan-json | tee ./plan.json
{
"initial_state": {
"time": 0,
...
$ # Convert the execution plan into a format readable by Google Sheets Timeline view:
$ pfdtimeline -f ./project2.json -start 2025-04-01 -not-biz-days <(holidays -locale ja) ./plan.json | tee ./timeline2.tsv
AtomicProcess NumOfComplete AllocatedResources Description StartTime EndTime Start End
P1 0 Biz Do requirements analysis 2025-04-08 10:00:00 2025-04-22 10:00:00 5 15
...
$ # Paste into Google Sheets:
$ pbcopy < ./timeline2.tsv
$ # Check the execution plans with Google Timeline view, look for unintended dependencies or missing triggers.
$ # If there is a problem, use pfdquery and others to investigate the cause and fix the PFD or process table.
$ pfdquery -f ./project1.json P20
QUERY SOURCE RESULT
P20 PFD[desc] Release under IP restriction
...
$ # Check the difference between optimistic and pessimistic release times and report it as uncertainty.
$ # Create the master schedule using the optimistic estimate with a 1.5x buffer:
$ planmaster -ap ./ap1.tsv -g ./g.tsv -m ./m.tsv -p ./plan.json -b 1.5 -start 2025-04-01 -not-biz-days <(holidays -locale ja) | tee ./master.tsv
Group GroupDescription Milestone MilestoneDescription Start End
Admin Admin features M1 Requirements analysis 2025-04-10 14:30:00 2025-05-02 14:30:00
...
$ # Paste the master schedule into Google Sheets:
$ pbcopy < ./master.tsv
$ # At this point, check whether the deadline can be met.
$ # In this case, it could not be met, so we started process improvements.
$ # Perform critical path analysis. Processes whose TOTAL_FLOAT is 0 are on the critical path:
$ criticalpath -f ./project1.json -poor | tee ./criticalpath.tsv
ATOMIC_PROCESS TOTAL_FLOAT MINIMUM_ELASTICITY
P1 0.00 10.00
P2 35.73 5.00
...
$ # For every process on the critical path, interview the owners about options that could be scoped out,
$ # get approval from the responsible person, and then scope them out. Update the estimated work accordingly:
$ edit ./ap1.tsv
$ edit ./ap2.tsv
$ # Repeat critical path analysis and continue scope reduction until the target is reached.
$ # Once the target is reached, fix the scope and recompute the master schedule:
$ pfdplan -f ./project1.json -poor -out-format plan-json | tee ./plan.json
$ planmaster -ap ./ap1.tsv -g ./g.tsv -m ./m.tsv -p ./plan.json -b 1.5 -start 2025-04-01 -not-biz-days <(holidays -locale ja) | tee ./master.tsv
Group GroupDescription Milestone MilestoneDescription Start End
Admin Admin features M1 Requirements analysis 2025-04-10 14:30:00 2025-05-02 14:30:00
$ # Update the master schedule in Google Sheets and you’re done:
$ pbcopy < ./master.tsv
The contents of the deliverables that appeared along the way (these were created arbitrarily for this article):

./pfd.drawio (rendered as the PFD figure)

| ID | Description | Deliverables |
|---|---|---|
| D3 | Bug tickets for user-facing features in the BTS. | D3.1,D3.2,D3.3 |

| ID | Description |
|---|---|
| User | User-facing features |
| Admin | Admin features |
Conclusion
In this article, I introduced the theory of master scheduling and how it was applied in practice at Coincheck. To summarize:
- Once you have a PFD and element tables, you can mechanically compute a master schedule.
- Using scheduling theory lets you pinpoint exactly where to improve.
- You can apply this to real projects and actually use it to create schedules.
Footnotes
- ^ “A set of interrelated or interacting activities that transforms inputs into outputs.” ISO25000:2017
- ^ In Shimizu’s original proposal, deliverables are drawn with a “document” symbol. He also suggests that you don’t need to stick to the document symbol and can use any symbol that is easy to understand for the deliverable. Here, to keep the explanation simple, I use rectangles for all deliverables.
- ^ A representative notation for expressing process dependencies is the PERT chart (or arrow diagram). If you treat deliverables as events, processes as arrows, and annotate each arrow with the process duration, you can transform a PFD into a PERT chart. In other words, a PFD can be regarded as a superset of a PERT chart that emphasizes deliverables. It is a superset, not equivalent, because of the feedback arrows described later. Feedback arrows are not included in PERT charts.
- ^ In Shimizu’s original definition there are no dashed arrows. Dashed arrows here are my arrangement to make the direction of updates explicit.
- ^ The PFD, excluding feedback arrows, is a finite DAG, and all consumed work amounts are greater than 0. Therefore, in any state that is not a completion state, there is at least one assignable process. Hence the sum of remaining work strictly decreases with each transition, so we must eventually reach a completion state.
- ^ This is a term coined by the author. Here, I define elasticity as the extent to which changing the duration or required work of a local process changes the overall duration. With this definition, “total float + duration” becomes the maximum non-elastic value. There does not seem to be a PERT term corresponding directly to the minimum elasticity value.