
Collaborative Research: Large-scale Database Construction of Firms' Organizational Form, Competition, and Industry Change

$250,000 · FY2016 · SBE · NSF

University of Southern California, Los Angeles, CA

Investigators

Abstract

This project proposes to analyze how organizations and industries change over time by building a large-scale database compiled from the web pages of over 1,000,000 private and public firms, together with a web-based analytical tool that provides efficient access to this database. The database will contain information on the products and services that public and private firms have offered directly to their customers over the last 20 years, as well as links to the US Patent and Trademark Office's patent data for the same period. The resulting database and tool will enable researchers to answer many questions, including: How do industries, competition within industries, and their products change over time? What are the dynamics of product introduction rates for private and public firms, and how did private firms and their product offerings change during the recent financial crisis? What products do public and private firms introduce following increases in patenting activity within an industry? Do the patenting firms introduce the new products, or do non-patenting firms? Which government policy changes were most effective in stimulating the growth of entrepreneurial firms, and in what kinds of markets did these policies work best? What local product market conditions are most conducive to successful entry by entrepreneurial firms, and how do waves of innovation affect product market competition? What economic forces trigger firms to cross the boundary between public and private status? Beyond academics who study innovation and industry change, business decision makers, consumers, and regulators will benefit from the new industry definitions the project will produce. Businesses can use the database and web-based tool to assess the existing market structure around new products, which will support more informed decisions about where and when to commit scarce resources to entering new markets.
By examining the nature of existing competition and market structure among both public and private firms, large and small, entrepreneurs can also better assess the likely success of their new ventures. Regulators, including the Securities and Exchange Commission (SEC) and the Department of Justice (DOJ), can also benefit from refined knowledge of industry structure and product market boundaries.

This project will produce a highly scalable approach for mining historical web data to create comprehensive product-based databases, along with tools to query and analyze these data. The integrated database will be built from firm web pages archived by the Internet Archive's Wayback Machine. Using the text of these pages, the project will classify firms as competitors, and these competitor links will be used to build new industry definitions. The new definitions will be based directly on the product and service descriptions that firms use to communicate with their customers. The project will also produce a publicly available web-based analytical tool built on "Elasticsearch" (www.elasticsearch.com). This tool will allow users to query the database of public and private firms, their competitors, and the products they offer. A user will be able to input a company, or a list of companies, and the tool will return that company's product descriptions and a list of competitors that produce similar products, along with the corresponding similarity scores, which indicate the strength of each competitive link. The tool will also allow a firm's local product market to be visualized as a network. In addition to the hosted online search tool, users will be able to download bulk datasets for their own local processing.
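The abstract does not say how the text of firm web pages is turned into competitor links and similarity scores. As a minimal sketch, assuming a simple bag-of-words representation and cosine similarity (a common choice for text-based industry classification, not a method confirmed by the abstract), the Python below scores hypothetical firm descriptions against one another, lists one firm's competitors above a cutoff, and builds the scored edge list that a product market network view could draw. The firm names, descriptions, stop-word list, and the `threshold=0.2` cutoff are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical product descriptions standing in for text extracted
# from archived firm web pages (illustrative data, not project data).
DESCRIPTIONS = {
    "FirmA": "cloud storage and data backup software for small business",
    "FirmB": "online data backup and cloud storage services",
    "FirmC": "organic coffee beans and espresso machines",
    "FirmD": "espresso machines, coffee grinders and accessories",
}

# Assumed minimal stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to"}


def tokenize(text: str) -> Counter:
    """Lowercase the text and count alphabetic words, skipping stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def competitors(firm: str, descriptions: dict, threshold: float = 0.2) -> list:
    """Return (rival, score) pairs at or above threshold, strongest first."""
    vectors = {name: tokenize(text) for name, text in descriptions.items()}
    target = vectors[firm]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != firm
    ]
    linked = [(other, s) for other, s in scored if s >= threshold]
    return sorted(linked, key=lambda pair: pair[1], reverse=True)


def product_market_network(descriptions: dict, threshold: float = 0.2) -> dict:
    """Undirected edges {(firm_1, firm_2): score} for pairs at or above threshold."""
    vectors = {name: tokenize(text) for name, text in descriptions.items()}
    names = sorted(vectors)
    edges = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine(vectors[first], vectors[second])
            if score >= threshold:
                edges[(first, second)] = score
    return edges


if __name__ == "__main__":
    for rival, score in competitors("FirmA", DESCRIPTIONS):
        print(f"FirmA -> {rival}: {score:.3f}")
    print(product_market_network(DESCRIPTIONS))
```

With this toy data, FirmA's only competitor is FirmB (score about 0.617), and the network has two edges, (FirmA, FirmB) and (FirmC, FirmD). Keeping scored pairwise links, rather than assigning each firm to one fixed industry, gives every firm its own set of rivals with link strengths, which fits the abstract's description of per-company competitor lists and network views; this is an interpretation, not a stated design detail of the project.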
