CNS Core: Small: Towards Hybrid Data Center Switching Using Partially Reconfigurable Circuit Switch
Georgia Tech Research Corporation, Atlanta GA
Investigators
Abstract
Fueled by the growth of cloud computing, data center networks (DCN) continue to grow relentlessly in both size and speed. A highly cost-effective approach to this scalability problem, called hybrid DCN architecture, has received considerable research attention in recent years. A hybrid DCN employs two technologies to interconnect racks of computers in data center: a much faster and less expensive circuit switch that is reconfigurable with some performance cost, and a traditional packet switch. The research problem of hybrid switching is to near-optimally schedule the circuit switch, so that it removes as much traffic as possible from the packet switch. Previous work on hybrid switching solves this optimization problem based on a convenient assumption that the circuit switch is not partially reconfigurable. This is however an outdated and unnecessarily restrictive assumption because electronic and optical technologies underlying today's circuit switches can readily support partial reconfiguration in the following sense: Only the input ports affected by the reconfiguration need to pay a reconfiguration delay, while unaffected input ports can continue to transmit data during the reconfiguration. Allowing partial reconfiguration can significantly increase the throughput and reduce system delay, thus significantly improving data center operation. However, it also leads to a host of challenging research questions that will be tackled in this project. This project will engage students through integrated classroom curriculum and research training that span multiple disciplines, and includes outreach efforts to underrepresented minorities. The PI will work closely with leading networking and systems providers to facilitate technology transfer. Under the assumption of partial reconfiguration, the non-preemptive scheduling of the circuit switch in a hybrid switching system can be modeled as an Open Shop Scheduling Problem (OSSP), referred to as switching OSSP. Switching OSSP has opened up a new research direction teeming with open problems, because many algorithmic results (approximation ratio, NP-hardness, etc.) on general OSSP no longer apply. This project will investigate the following five research tasks on switching OSSP. First, it will study the approximation ratios of BFF (Best Fit First) under various traffic workloads, and design deterministic or randomized algorithms that can, with high probability, avoid or mitigate the worst-case scenarios. Second, the project will investigate research challenges that arise when a DCN is interconnected by a giant virtual switch comprised of a network of interconnected bufferless circuit switches. Third, it will study partially reconfigurable hybrid and optical switching when each rack has multiple transmitters/receivers, which appears to be a new and challenging problem that is fundamentally different than anything found in the literature. Fourth, for a data center with 100 or more racks, BFF still takes tens of milliseconds to compute, which is roughly an order of magnitude longer than the ideal level of latency (no more than a few milliseconds) desired by cloud providers. The reseearchers will study how to significantly reduce the computation time of BFF so that it remains fast enough when the DCN size becomes much larger in the future. Last, they will investigate how to accommodate ``last-minute" traffic arrivals during the transmission of the current batch (corresponding to arrivals during a previous scheduling epoch), which if successful, can further improve the delay performance of their solutions. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
View original record on NSF Award Search →