
CRII: CSR: Towards Understanding and Mitigating the Impact of Web Robot Traffic on Web Systems

$174,319 · FY2015 · CSE · NSF

Wright State University, Dayton OH

Abstract

This CSR-CRII project responds to the sudden rise of Web robot (a.k.a. Web crawler) traffic on Web systems around the world - from approximately 20% of all requests a decade ago to over 60% today. Because present Web systems' optimizations assume that the traffic they service exhibits human-like patterns, which robot traffic does not, present robot activity on the Web may silently degrade the performance, energy efficiency, and scalability of Web systems. As the Web continues to evolve towards a social platform where individuals upload extemporaneous thoughts and observations that only carry instantaneous value to organizations, and where the Internet of Things concept is expected to introduce millions of devices that collect data from the Web and submit requests to online services automatically, robot traffic will only increase rapidly in volume and intensity. For this reason, it is essential that we understand the impact of Web robot traffic on modern Web systems and devise technologies capable of mitigating its impact on system performance, energy efficiency, and scalability. This effort will synthesize our present understanding of robot traffic with machine learning tools, statistical analysis, and data science methods not previously considered in the context of Web traffic analysis and user behavioral modeling. It will improve our ability to understand the impact of robot traffic on Web systems by: (i) devising automatic methods to classify robots by their functionality and by the demands they impose; and (ii) developing novel robot traffic generators, tailored to a specific profile of robot types, that can test how a system reacts to robot traffic of varying intensity and functional type mixtures. The project will also explore a prototype robot-resilient caching system that could lead to immediate performance payoffs for existing Web systems.
The project will result in preliminary analytical models, empirical results, and prototype analysis software leading to longer-term research endeavors. Recent data from Web systems that provide services across many Web domains are immediately available for the project. The results of the project may transform the way Web systems, from single servers to large clouds, are designed and optimized, mitigating the performance, energy efficiency, and financial costs of servicing robots. Students who work on this project will be strategically recruited to broaden participation. Educational activities will provide students with useful yet infrequently taught skills in traffic analysis and Web systems security, fostering stronger ties between the knowledge engineering and cybersecurity student and research communities.
