AI Research for Planning Optimization in Rillsoft Project
Rillsoft conducts practical AI research on real planning problems. The focus is not a generic assistant but the core question of every project plan: how can resources and dates be better aligned under constraints? This page expands on the AI focus of the Rillsoft Roadmap and describes two concrete research scenarios for automated planning optimization.
This is explicitly about evaluation and development, not about finished product promises. We investigate how a model can learn to gradually transform a naive initial plan into a lower-conflict and more resource-realistic plan.
Rillsoft evaluates reinforcement learning methods (PPO/actor-critic) in two separate training setups: one for the optimal assignment of staff to activities and one for the even distribution of resource utilization. Both are assessed by objective plan quality, not by how closely they replicate an existing heuristic.
How we arrived at the approach
The data infrastructure required for this research is neither new nor speculative at Rillsoft; it has been in productive use for a long time.
Rillsoft has been using randomly generated projects for automated testing of Rillsoft Project for a long time. With large, diverse random projects, planning functions can be checked systematically and reproducibly – across very many configurations that could never be covered manually. These automated tests have already considerably improved the functionality and stability of Rillsoft Project.
This is exactly where the idea for the AI research came from: the same mature generation infrastructure that today supports quality assurance can also deliver realistic project instances as training data for learning methods – in any quantity and with a clearly defined initial and target state. The data core of the research therefore builds on a real, proven tool and does not first have to be invented.
Why project planning is well suited for AI-supported optimization
Large project plans are a good application field for learning-based optimization:
- They contain many interdependent decisions about sequences, dates and resource assignments.
- A naive early-start plan – all activities as early as possible – is quickly created, but rarely realistic.
- A good reference plan is hard to create manually in real projects and is therefore particularly valuable as a learning target.
- Resource bottlenecks make planning iterative and conflict-prone: one shift triggers follow-up changes.
- AI can help to learn good rescheduling strategies systematically, instead of searching for them by hand for every plan anew.
In technical terms, this is a classic Resource-Constrained Project Scheduling Problem (RCPSP) with resource leveling – a problem class of process and resource planning that has been studied for decades.
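One common textbook way to write the leveling variant is sketched below. The notation is an assumption made for illustration and is not taken from Rillsoft: $S_j$ is the start period of activity $j$, $d_j$ its duration, $r_{jk}$ its demand on resource $k$, $E$ the set of precedence relations, $T$ the number of periods in the project, and $u_{kt}$ the utilization of resource $k$ in period $t$.

```latex
\begin{aligned}
\min_{S}\quad & \sum_{k}\sum_{t=0}^{T-1} u_{kt}(S)^{2} \\
\text{s.t.}\quad
 & u_{kt}(S) = \sum_{j:\, S_j \le t < S_j + d_j} r_{jk}
   && \forall k,\ t \\
 & S_j \ge S_i + d_i && \forall (i,j) \in E \\
 & S_j \ge 0,\quad S_j + d_j \le T && \forall j
\end{aligned}
```

For a fixed total workload, the sum of squares is smallest when the load is spread evenly, which is why it is a frequent leveling objective. The classic RCPSP adds the hard capacity limit $u_{kt}(S) \le R_k$ and usually minimizes the project end date instead.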
Two separate optimization tasks
Good project planning combines two functionally different questions that are easier to solve separately:
- Who? Which person works on which activity? (resource assignment)
- When? At what time is each activity scheduled so that utilization stays even? (resource leveling)
Rillsoft therefore investigates two separate AI trainings on the same production-proven generation infrastructure – each with its own initial state, target state and action space. Both follow the same basic principle: first create a functionally good target state, then deliberately degrade it and let the AI learn the way back – each with a different damage logic.
| Aspect | Training A: Staff assignment | Training B: Utilization leveling |
|---|---|---|
| Functional question | Who works on which activity? | When is each activity scheduled? |
| Initial state | Activities with roles, without staff | Earliest-date plan with conflicts |
| Target state | Optimal staffing | Evenly distributed dates |
| Core action | assign, reassign, swap staff | shift activity in time |
Important: Training B builds on the data basis of Training A (it takes over its staffed, distributed plan as raw material), but remains an independent learning problem with its own initial and target state.
Training A — Optimal staff assignment
The goal of this training is to find, for a plan whose activities have roles but no staff, the best staffing for each activity. The assignment is based on the roles and takes vacations into account, so that all employees are utilized evenly and realistically.
How the training data pair is created
Here, too, the principle “target state first, then initial state” applies:
- A generated test resource pool is used – with sufficient roles, staff and a generated vacation plan per employee.
- A project of fixed length is created (for example three months).
- For each employee an activity is created and this employee is assigned to it; the activity initially runs over the full project duration. This way every employee is scheduled.
- Each activity is then split into several activities with a random duration of about 5 to 10 days (Rillsoft Project function “split into several activities”). The parts are deliberately not allowed to become too short.
- Activities with the same start and end are combined to increase complexity.
- The roles are derived from the assigned staff (Rillsoft Project function “determine roles from staff”). This produces the target state: a plan in which all employees are optimally and vacation-compatibly utilized.
- Finally, the entire staff is removed from the activities (Rillsoft Project function “remove staff from activities”). What remains are activities with roles only – that is the initial state, the starting point for the AI.
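The “target state first, then initial state” construction can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Rillsoft Project generator: the names `Activity`, `split_span` and `make_pair` are hypothetical, each employee has exactly one role, and the vacation plan and the combining of activities with identical dates are left out.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Activity:
    start: int             # first working day (0-based)
    end: int               # last working day, inclusive
    role: str
    staff: Optional[str]   # None means "not staffed"


def split_span(length, rng, lo=5, hi=10):
    """Cut the days 0..length-1 into consecutive parts of lo..hi days."""
    parts, start = [], 0
    while length - start > hi:
        remaining = length - start
        # never leave a rest shorter than lo days
        step = min(rng.randint(lo, hi), remaining - lo)
        parts.append((start, start + step - 1))
        start += step
    parts.append((start, length - 1))
    return parts


def make_pair(pool, project_days=60, seed=0):
    """pool maps employee -> role (assumption). Returns (initial, target)."""
    rng = random.Random(seed)
    target = []
    for employee, role in pool.items():
        # one full-length activity per employee, split into parts;
        # the role is derived from the assigned employee
        for start, end in split_span(project_days, rng):
            target.append(Activity(start, end, role, employee))
    # initial state: same activities and roles, staff removed
    initial = [Activity(a.start, a.end, a.role, None) for a in target]
    return initial, target


pool = {"Ann": "engineer", "Ben": "engineer", "Cho": "tester"}
initial, target = make_pair(pool)
print(len(initial), "unstaffed activities with a known staffed target")
```

Because the target is built first, every generated initial state is guaranteed to have at least one complete, feasible staffing.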
Initial and target state
- Initial state: activities with roles, but without concrete staff.
- Target state: the same activities with optimal staffing – even utilization, vacations taken into account.
- Action of the AI: assign a role-compliant, available staff member to an activity.
- Feasibility: only staff with a matching role and without a vacation collision is an admissible action (masking).
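The feasibility rule above amounts to a filter over the staff pool. The following sketch assumes a hypothetical data layout (dictionaries with inclusive day ranges); the function names are illustrative and not part of any Rillsoft API.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive day ranges share at least one day."""
    return a_start <= b_end and b_start <= a_end


def admissible_staff(activity, staff, vacations):
    """Return the employees that pass the mask for one activity.

    activity:  dict with 'role', 'start', 'end' (inclusive days)
    staff:     dict name -> set of roles
    vacations: dict name -> list of (start, end) inclusive day ranges
    """
    allowed = []
    for name, roles in staff.items():
        if activity["role"] not in roles:
            continue  # role does not match
        if any(overlaps(activity["start"], activity["end"], s, e)
               for s, e in vacations.get(name, [])):
            continue  # vacation collision
        allowed.append(name)
    return allowed


staff = {"Ann": {"engineer"}, "Ben": {"engineer", "tester"}, "Cho": {"tester"}}
vacations = {"Ben": [(8, 12)]}
activity = {"role": "engineer", "start": 10, "end": 15}
print(admissible_staff(activity, staff, vacations))  # ['Ann']
```

Ben has the right role but is on vacation during part of the activity, so only Ann remains as an admissible action.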
When an early assignment blocks later on
Staff assignment is not an independent problem per activity but a sequential, combinatorial matching. A locally plausible assignment, such as an employee placed on an activity that “also fits”, can turn out to block later steps and prevent further assignments:
- The early-bound employee would have been the only one able to cover a later activity in a role-compliant and vacation-free way.
- For this later activity, no admissible staffing remains – a dead end.
- Purely greedy, forward-only assignment aggravates this, because scarce specialists are bound to non-critical activities too early.
The AI must recognize and resolve such dead ends, instead of only assigning forward. Functionally, this is the difference between greedy matching and globally optimal assignment: an assignment that has already been made must be revisable, so that a scarce specialist becomes free again for the activity that only they can cover.
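The difference can be shown on a toy instance. The sketch below contrasts a forward-greedy pass with classical augmenting-path matching, which is allowed to move an earlier assignment. The matching algorithm is used here only to illustrate revisability; it is not the learning method under investigation. As a simplifying assumption, all activities overlap in time, so each employee can cover only one of them.

```python
def greedy(candidates):
    """Give each activity the first free candidate; never revise."""
    owner = {}  # employee -> activity
    for activity, names in candidates.items():
        for name in names:
            if name not in owner:
                owner[name] = activity
                break
    return {a: n for n, a in owner.items()}


def revisable(candidates):
    """Augmenting-path matching: earlier assignments may be moved."""
    owner = {}  # employee -> activity

    def place(activity, seen):
        for name in candidates[activity]:
            if name in seen:
                continue
            seen.add(name)
            # take a free employee, or ask the current holder's
            # activity to move to another admissible employee
            if name not in owner or place(owner[name], seen):
                owner[name] = activity
                return True
        return False

    for activity in candidates:
        place(activity, set())
    return {a: n for n, a in owner.items()}


candidates = {
    "A1": ["Dana", "Eli"],  # Dana is tried first here
    "A2": ["Dana"],         # but only Dana can cover A2
}
print(greedy(candidates))     # A1 gets Dana, A2 stays unstaffed
print(revisable(candidates))  # A1 moves to Eli, A2 gets Dana
```

The greedy pass binds the scarce specialist to the non-critical activity and ends in a dead end; the revisable variant withdraws that choice and staffs both activities.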
The resolution of such blockages draws on architectural ideas from multi-agent orchestration (such as Steve Yegge’s Gas Town, which keeps work persistent and traceable as a revisable ledger). Transferred to staff assignment, this means:
- Assignments stay revisable: they are traceable, reversible decisions rather than final commitments, so that a blocking assignment can be specifically withdrawn.
- Global view instead of local greedy choice: the AI takes the future staffability of scarce roles into account before binding a specialist early.
- Reassignment as a regular action: a blocking staffing is swapped to resolve a bottleneck, instead of merely passing on unstaffed activities.
In the action space of Training A there are therefore, besides “assign”, also “reassign/swap” and “withdraw assignment”, and the reward penalizes deadlocks and no-longer-staffable activities – not only the utilization quality of the current step.
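A minimal sketch of such an action space with a traceable history and a dead-end penalty might look as follows. The class `Ledger`, the reward weights and the one-activity-per-employee simplification are assumptions for illustration only; the actual reward design is part of the ongoing research.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    candidates: dict                               # activity -> admissible names
    assigned: dict = field(default_factory=dict)   # activity -> name
    history: list = field(default_factory=list)    # traceable decision log

    def assign(self, activity, name):
        if name not in self.candidates[activity]:
            raise ValueError("inadmissible assignment")
        if name in self.assigned.values():
            raise ValueError("employee already bound")
        self.assigned[activity] = name
        self.history.append(("assign", activity, name))

    def withdraw(self, activity):
        name = self.assigned.pop(activity)
        self.history.append(("withdraw", activity, name))

    def reassign(self, activity, name):
        self.withdraw(activity)
        self.assign(activity, name)

    def dead_ends(self):
        """Unstaffed activities with no free admissible employee left."""
        busy = set(self.assigned.values())
        return [a for a, names in self.candidates.items()
                if a not in self.assigned
                and not any(n not in busy for n in names)]


def step_reward(ledger, dead_end_penalty=5.0):
    # hypothetical weights: +1 per staffed activity, penalty per dead end
    return len(ledger.assigned) - dead_end_penalty * len(ledger.dead_ends())


ledger = Ledger({"A1": ["Dana", "Eli"], "A2": ["Dana"]})
ledger.assign("A1", "Dana")
print(step_reward(ledger))    # -4.0: A2 can no longer be staffed
ledger.reassign("A1", "Eli")
ledger.assign("A2", "Dana")
print(step_reward(ledger))    # 2.0: both activities staffed
```

The penalty makes the blocking assignment visibly worse than leaving the specialist free, even though it staffs one more activity in the current step.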
Training B — Distribute resource utilization evenly
The goal of this training is to reschedule a temporally compressed, conflict-laden plan by shifting the activities so that utilization is evenly distributed (resource leveling) – without violating logical dependencies.
How the training data pair is created
- The starting point is the staffed, distributed plan from Training A.
- All existing links (precedence relations) are deleted.
- Two milestones are created – one at the beginning and one at the end of the project.
- Random, finite link chains are created that start from the start milestone and end at the end milestone. This keeps the activity network well-defined and acyclic. This state is the target state with even temporal distribution.
- Then all activities are shifted to the earliest possible start dates. The good plan is thereby deliberately laden with conflicts – it compresses in time and creates overloads. That is the initial state handed over to the model.
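The compression step corresponds to a standard forward pass over the acyclic network. The sketch below assumes finish-to-start links without lag and counts one resource unit per activity; the function names and the tiny example network are hypothetical.

```python
from collections import defaultdict, deque


def earliest_starts(durations, links):
    """Forward pass: earliest start of every activity.

    durations: dict activity -> duration in days (milestones have 0)
    links:     list of (predecessor, successor) finish-to-start relations
    """
    succ = defaultdict(list)
    indeg = {a: 0 for a in durations}
    for p, s in links:
        succ[p].append(s)
        indeg[s] += 1
    start = {a: 0 for a in durations}
    queue = deque(a for a, d in indeg.items() if d == 0)
    done = 0
    while queue:
        a = queue.popleft()
        done += 1
        for s in succ[a]:
            start[s] = max(start[s], start[a] + durations[a])
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    if done != len(durations):
        raise ValueError("links contain a cycle")
    return start


def daily_load(start, durations):
    """Number of activities running on each day."""
    load = defaultdict(int)
    for a, s in start.items():
        for day in range(s, s + durations[a]):
            load[day] += 1
    return dict(load)


durations = {"S": 0, "A": 3, "B": 3, "C": 3, "E": 0}
links = [("S", "A"), ("A", "E"), ("S", "B"), ("B", "E"), ("S", "C"), ("C", "E")]
target = {"S": 0, "A": 0, "B": 3, "C": 6, "E": 9}   # leveled plan
initial = earliest_starts(durations, links)         # compressed plan
print(max(daily_load(target, durations).values()))   # 1
print(max(daily_load(initial, durations).values()))  # 3
```

Both plans satisfy the same links, but the earliest-date plan stacks all three activities on the first days and so creates the overload the model is meant to resolve.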
Initial and target state
- Initial state: earliest-date plan with overloads and conflicts.
- Target state: the previously created, evenly distributed plan.
- Action of the AI: shift activities so that the plan comes close to the target state.
- Feasibility: shifts must preserve the links (start and end milestone, finite chains) (masking).
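For shifts, the mask can be derived from the direct predecessors and successors of an activity. The following sketch makes several assumptions: finish-to-start links without lag, all other activities held fixed, and a small hypothetical set of candidate shifts in days.

```python
def shift_window(activity, start, durations, links):
    """Earliest and latest admissible start of one activity.

    Returns (earliest, latest); latest is None if nothing follows.
    """
    earliest = max(
        (start[p] + durations[p] for p, s in links if s == activity),
        default=0,
    )
    latest = min(
        (start[s] - durations[activity] for p, s in links if p == activity),
        default=None,
    )
    return earliest, latest


def admissible_shifts(activity, start, durations, links, deltas=(-2, -1, 1, 2)):
    """Keep only the shifts that preserve every link of the activity."""
    lo, hi = shift_window(activity, start, durations, links)
    result = []
    for d in deltas:
        new_start = start[activity] + d
        if new_start >= lo and (hi is None or new_start <= hi):
            result.append(d)
    return result


durations = {"S": 0, "A": 3, "B": 2, "E": 0}
links = [("S", "A"), ("A", "B"), ("B", "E")]
start = {"S": 0, "A": 0, "B": 4, "E": 8}
print(shift_window("B", start, durations, links))       # (3, 6)
print(admissible_shifts("B", start, durations, links))  # [-1, 1, 2]
```

Shifting B two days earlier would start it before A has finished, so that action is masked out before the policy ever sees it.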
This way the model learns the leveling problem specifically: it brings a naive earliest-date plan back into a state in which resource utilization is even over time.
The actor-critic approach
Planning is a sequential decision problem: every rescheduling step changes the state and influences which action makes sense next. Actor-critic methods of reinforcement learning are well suited to this.
- The actor proposes concrete changes to the plan – such as shifting an activity, reassigning a resource or changing a priority.
- The critic estimates the value of these changes, i.e. whether they move the plan toward a better state.
- Both components are trained jointly. As a result, the model learns not just rigid rules, but improvement strategies.
In both trainings the same actor-critic family is used, only with a different action space: in staff assignment the actions are assignments, reassignments and withdrawals; in utilization leveling they are temporal shifts of the activities.
Some technical specifics:
- As a concrete method, PPO (Proximal Policy Optimization) is investigated, an established policy-gradient method from the actor-critic family that can be applied to large, discrete action spaces.
- The planning state is encoded as an activity graph (directed network of precedence relations), suitable for graph-based learning methods for scheduling.
- Admissible actions are masked so that logical dependencies are always preserved (feasibility guarantee). The functional planning logic is therefore not violated.
- The approach follows the paradigm “Learning to Improve” – iterative plan repair starting from an initial state, not one-off plan generation from scratch.
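Two of these building blocks can be written down compactly: a policy distribution restricted to admissible actions, and the clipped surrogate term from the PPO paper (Schulman et al., 2017). The sketch uses plain Python instead of a deep learning framework, a one-step advantage estimate from the critic, and illustrative default values for `eps` and `gamma`; none of this reflects a confirmed Rillsoft configuration.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over admissible actions; masked actions get probability 0."""
    allowed = [l for l, m in zip(logits, mask) if m]
    if not allowed:
        raise ValueError("no admissible action")
    top = max(allowed)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def td_advantage(reward, value, next_value, gamma=0.99):
    """One-step advantage estimate from the critic's value predictions."""
    return reward + gamma * next_value - value


def ppo_clip_term(new_prob, old_prob, advantage, eps=0.2):
    """Clipped PPO surrogate for one sampled action (to be maximized)."""
    ratio = new_prob / old_prob
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# actor output for three actions; the second one violates a link
probs = masked_softmax([2.0, 1.0, 0.5], [True, False, True])
print(probs[1])  # 0.0, the inadmissible action can never be sampled

adv = td_advantage(reward=1.0, value=0.5, next_value=0.0)
print(ppo_clip_term(new_prob=0.9, old_prob=0.5, advantage=adv))
```

Masking before the softmax, rather than penalizing invalid actions afterwards, is what gives the feasibility guarantee: an inadmissible action has probability zero by construction.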
How plan quality is assessed
So that improvements are not only claimed but measured, the research needs clear evaluation standards. Among others, the following are examined:
- number and severity of resource conflicts,
- plan stability after a rescheduling,
- project end date or schedule shift,
- utilization quality across the resources,
- proportion of realistically assignable activities,
- comparison against the heuristic baseline of automatic staff assignment.
The reward is specific to each training: in staff assignment, assignment quality counts (even utilization, role fit, no unstaffable activities or deadlocks); in utilization leveling, leveling quality counts (even utilization over time, conflict reduction, schedule adherence).
What is decisive is the separation of reward and diagnostic metric: the reward in training is the objective plan quality. The distance to the reference plan serves solely for evaluation, never as a training target. Where useful, methods can additionally be situated against established RCPSP benchmarks (such as PSPLIB). Invented performance figures without a sound basis are deliberately avoided.
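The separation can be made explicit in code: the reward function receives only measurable plan properties, while the distance to the reference plan lives in a separate function that the training loop never calls. The metric definitions and weights below are hypothetical examples for the leveling case.

```python
def overload(load, capacity):
    """Total resource-days above capacity."""
    return sum(max(0, v - capacity) for v in load)


def unevenness(load):
    """Sum of squared daily loads; lower is more even for a fixed total."""
    return sum(v * v for v in load)


def reward(load_before, load_after, capacity, w_over=1.0, w_even=0.1):
    """Improvement in objective plan quality caused by one action.

    The reference plan is deliberately not a parameter.
    """
    return (
        w_over * (overload(load_before, capacity) - overload(load_after, capacity))
        + w_even * (unevenness(load_before) - unevenness(load_after))
    )


def distance_to_reference(start, reference):
    """Diagnostic only: total absolute start-date deviation in days."""
    return sum(abs(start[a] - reference[a]) for a in reference)


before = [3, 3, 3, 0, 0, 0, 0, 0, 0]   # compressed plan, daily load
after = [1, 1, 1, 1, 1, 1, 1, 1, 1]    # leveled plan, daily load
print(reward(before, after, capacity=1) > 0)  # True

current = {"A": 0, "B": 0, "C": 0}
reference = {"A": 0, "B": 3, "C": 6}
print(distance_to_reference(current, reference))  # 9
```

A plan that differs from the reference but is equally even and conflict-free earns the same reward, which is the point of keeping the two measures apart.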
Practical benefit for users
The research stays close to product practice. Possible benefits:
- better decision support in large project plans,
- faster detection of unrealistic baselines,
- more well-founded suggestions under resource scarcity,
- potential for later automation and scenario analysis,
- extension of the existing planning logic – no black-box replacement.
The topics tie in directly with the core functions of Rillsoft: resource planning, capacity planning and multi-project management.
Scientific context
The approach ties in with established methods of process planning and machine learning. The following works provide the methodological context; they are not evidence for results of our own, which have not yet been achieved:
- Kolisch, R.; Sprecher, A. (1997): PSPLIB – A project scheduling problem library. OR Software – ORSEP Operations Research Software Exchange Program. European Journal of Operational Research. The work is a central reference point for RCPSP benchmarks.
- Hartmann, S.; Briskorn, D. (2022): An updated survey of variants and extensions of the resource-constrained project scheduling problem. European Journal of Operational Research, 297(1), 1–14. The survey situates classic and extended RCPSP variants.
- Zhang, C.; Song, W.; Cao, Z.; Zhang, J.; Tan, P. S.; Xu, C. (2020): Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning. Advances in Neural Information Processing Systems 33, NeurIPS 2020. The work stands for graph- and RL-based scheduling methods.
- Mao, H.; Schwarzkopf, M.; Venkatakrishnan, S. B.; Meng, Z.; Alizadeh, M. (2019): Learning Scheduling Algorithms for Data Processing Clusters. Proceedings of ACM SIGCOMM 2019, 270–288. The work shows reinforcement learning for scheduling under dependencies in complex systems.
- Chen, X.; Tian, Y. (2019): Learning to Perform Local Rewriting for Combinatorial Optimization. Advances in Neural Information Processing Systems 32, NeurIPS 2019. The work is relevant for the paradigm of iteratively improving existing solutions.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. (2017): Proximal Policy Optimization Algorithms. arXiv:1707.06347. This publication describes the PPO method investigated here within the actor-critic family.
Research status and outlook
This research overview is directional and non-binding. It describes evaluation and development work, not guaranteed product features or release dates.
With this, Rillsoft pursues a robust, step-by-step path to smarter resource and schedule optimization, building on a proven planning core and a tried-and-tested data infrastructure. You can find more about the overall direction of further development in the Rillsoft Roadmap. Are you missing a feature, or would you like to suggest a direction for development? Get in touch.
