GGrantIndex
← Search

AF: Small: Communication-Aware Algorithms for Dynamic Allocation of Heterogeneous Resources

$599,386FY2024CSENSF

Northeastern University, Boston MA

Investigators

Abstract

Modern computer infrastructure systems need to manage vast heterogeneous resources and run computations that are complex, distributed, and dynamic. Often, the distributed nature of the computations demands efficient communication; dynamic resources and computations demand online solutions without full knowledge of the future. For instance, scalable artificial intelligence (AI) requires the effective mapping of dynamic neural network computations into networks of computing devices. This project concerns the design of efficient and effective algorithms for scheduling large-scale computations in distributed infrastructure such as cloud systems and datacenter networks. The expected outcomes of the project are solutions for more effective processing of large AI tasks and resource allocation policies in cloud computing systems, with improved performance for business operations and mission-critical systems. The integrated educational component of the project includes training undergraduate and doctoral students in infrastructure algorithms, curriculum development, and outreach to engage high school students and inspire them to explore careers in math and computing. This project has two major thrusts. The first concerns the scheduling of precedence-constrained jobs and computation graphs in distributed networks and reconfigurable machines. This is motivated by the fact that as computational workloads get larger and more complex, it is often necessary to distribute many communicating jobs across a large network of devices. These devices may have different speeds, different computing capabilities, and restrictions on which jobs they can execute due to resource and security concerns. The second thrust concerns the online migration of computations and servers in a distributed system in response to dynamic requests. One motivation comes from data intensive applications that generate significant network traffic; to enable efficient communication among processes dispersed across many clusters, distributed systems are increasingly reconfigurable and strategically migrate processes to reduce communication. The presented problems include communication-aware scheduling of precedence-constrained jobs in networks with general delays, scheduling of split table jobs in reconfigurable machines, minimum-stretch embedding of graphs, heterogeneous variants of the classic online k-server problem, and online balanced graph partitioning. The technical approaches include new linear-programming based techniques that address communication and topological constraints, and new methods in online algorithms. This project also explores new frameworks for studying these problems, including learning-augmented algorithms through predictions of processing times and communication needs, reconfigurable architectures, and mobile ad hoc networks with heterogeneous devices. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

View original record on NSF Award Search →