Scalable Model-Based Inference for Social Networks from Complex Sampling Designs

$212,500 | FY2014 | SBE | NSF

University of California-Los Angeles, Los Angeles, CA

Investigators

Abstract

This project will develop new statistical methodology for the analysis of social networks from hard-to-study populations. This new methodology is expected to result in more robust and statistically valid inferences from data on hard-to-study populations. Traditional approaches to surveys do not work well when the population under study is hard to find. Typical reasons are that the individuals within the population are hard to identify within a larger population, or the population is stigmatized and individuals therefore are less likely to participate in the survey. In these cases, traditional methods are very expensive to apply. Examples of such populations are unregulated workers, the self-employed, new migrants, the homeless, and injection drug users. For such populations, surveyors are starting to employ methods that use the network of social relations among the population to facilitate participation in the survey. While these methods are an effective way to collect data, it is a challenge to draw scientifically valid conclusions from data collected in this way. This new methodology will be applied to a popular survey approach that is used, for example, in public health departments across the globe to estimate rates of HIV and other diseases. The methods and software developed as part of this project have the potential to impact the disease rates estimated by public health units and the policy decisions based on them. Data about social networks reflect both emerging social structures and the lens through which they are observed. There is a dearth of statistical methodology for the collection and analysis of network data that enables an understanding of the implications of these social structures. This project will further the development of a general model-based framework for the analysis of social networks in situations where the network is partially unobserved.
The research will develop scalable, composite, likelihood-based inference for a class of models known as exponential-family random network models (ERNM). This work will increase the range of applicability of these models. The investigators will develop an ERNM-based likelihood model for respondent-driven sampling (RDS) and for new, richer designs, such as privatized network sampling. The model will be validated via Monte Carlo simulation studies over a range of network sizes. Secondary analyses of existing RDS datasets will also be conducted. Overall, the research will provide a basis for scientific estimation in situations where the social relations and/or individual characteristics are not evident due to either the sampling design or non-response mechanisms. This will allow for the analysis of data collected using new sampling designs that use social relationships to improve statistical efficiency and robustness.
