High-end parallel systems present a tremendous research challenge on how to best allocate their resources to match dynamic workload characteristics and user habits that are often unique to each system. Although thoroughly investigated, job scheduling for production systems remains an inexact science, requiring significant experience and intuition from system administrators to properly configure batch schedulers. State-of-the-art schedulers provide many parameters for their configuration, but tuning these to optimize performance and to appropriately respond to the continuously varying characteristics of the workloads can be very difficult — the effects of different parameters and their interactions are often unintuitive.
In this paper, we introduce a new and general methodology for automating the difficult process of job scheduler parameterization. Our proposed methodology is based on online simulations of a model of the actual system to provide on-the-fly suggestions to the scheduler for automated parameter adjustment. Detailed performance comparisons via simulation using actual supercomputing traces from the Parallel Workloads Archive indicate that this self-adaptive parameterization via online simulation consistently outperforms other workload-aware methods for scheduler parameterization. This methodology is unique, flexible, and practical in that it requires no a priori knowledge of the workload, it works well even in the presence of poor user runtime estimates, and it can be used to address any system statistic of interest.
Barry Lawson and Evgenia Smirni. Self-Adaptive Scheduler Parameterization. Technical paper (TR-05-01). Math and Computer Science Technical Report Series. Richmond, Virginia: Department of Mathematics and Computer Science, University of Richmond, November, 2005.