
NeTS: Small: Towards Exposing and Mitigating End-to-End TCP Performance and Fairness Issues in Data Center Networks

$300,000 · FY2012 · CSE · NSF

Purdue University, West Lafayette IN

Abstract

Although TCP was designed to work in any environment, it has a long history of problems when introduced into newer environments (e.g., wireless networks, satellite networks, high-capacity networks) for which it was not explicitly designed and tested. Data center and cloud environments, which have become popular in recent years, are a similar new frontier that challenges TCP. This project will conduct research to expose the end-to-end challenges that TCP faces in modern cloud computing and data center environments and propose solutions to address them. Specifically, it focuses on three major issues. First, in virtualized cloud environments, when multiple VMs share a CPU, the CPU access latency for each VM (i.e., the interval during which a VM waits for the CPU) is on the order of tens to hundreds of milliseconds, which can be orders of magnitude higher than the typical sub-millisecond network RTT. This latency inflates the effective RTT seen by TCP and significantly reduces TCP throughput. Second, in multi-rooted data center networks, under certain conditions, TCP connections sharing a common link exhibit severe unfairness. Finally, data center networks today often use equal-cost multipath routing (ECMP) to split traffic across multiple paths, but ECMP can cause significant load imbalance. Researchers have refrained from suggesting packet-level multipath routing in the past because it can cause packet reordering that may reduce TCP throughput. It is not clear, however, whether this poor interaction with multipath routing exists even under symmetric topologies such as fat-trees or, more generally, multi-rooted tree topologies.

Intellectual Merit: The goal of this project is to comprehensively investigate the three major issues discussed above, propose new solutions to address them, and validate these solutions and hypotheses using real prototype implementations and extensive evaluations. (1) To address TCP's performance deterioration in virtualized cloud environments, the project will explore a new approach called transport function delegation, in which certain TCP functions are delegated to the driver domain or hypervisor. (2) It will conduct extensive experimentation on real testbeds to expose and characterize the unfairness issue, and will propose and evaluate classic solutions (e.g., RED) and a new routing algorithm called equal-length routing to mitigate this problem. (3) It will revisit the conventional wisdom that fine-grained multipath traffic-splitting protocols interact poorly with TCP, in the context of data center networks that have regular topologies such as multi-rooted trees.

Broader Impact: The broader impact of this research comprises the following: (1) It will help improve key aspects of TCP, such as performance and fairness, in data center and cloud environments, which will benefit all data center systems and applications. (2) Results of this research will be transferred to industry. (3) Results of this research will be integrated into courses such as operating systems, computer networks, and cloud computing. The project will also provide training to graduate students, and several Ph.D. theses are expected to come out of this research. (4) It will include participation of under-represented groups (under-represented minorities and women).
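The first issue in the abstract, VM CPU access latency inflating the effective RTT, can be illustrated with the standard window-limited bound on TCP throughput (at most one window of data per round trip). The sketch below is only an illustration under stated assumptions: the function name, the 64 KiB window, the 0.5 ms network RTT, and the 30 ms CPU wait are hypothetical values chosen to match the orders of magnitude mentioned in the abstract, not measurements from the project.

```python
def window_limited_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput when at most one window is sent per RTT.

    Hypothetical helper for illustration; it ignores loss, slow start,
    and link capacity, and only applies the bound window / RTT.
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    bits_per_window = window_bytes * 8
    return bits_per_window / rtt_seconds / 1e6


if __name__ == "__main__":
    window = 64 * 1024      # assumed 64 KiB window
    network_rtt = 0.0005    # assumed 0.5 ms data center RTT
    cpu_wait = 0.030        # assumed 30 ms VM CPU access latency

    baseline = window_limited_throughput_mbps(window, network_rtt)
    inflated = window_limited_throughput_mbps(window, network_rtt + cpu_wait)
    print(f"network RTT only:   {baseline:.1f} Mbit/s")
    print(f"with VM CPU waiting: {inflated:.1f} Mbit/s")
```

Under these assumed numbers the bound drops by roughly the same factor by which the effective RTT grows, which is the effect the abstract attributes to CPU sharing among VMs.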
