EAGER: Investigating linguistic dimensions in cross-domain authorship analysis
University of Alabama at Birmingham, Birmingham, AL
Abstract
The ability to analyze a text and determine its author with high confidence, a task known as Authorship Attribution (AA), can help build a case against an online abuser, establish the trustworthiness of a document, and support this country's fight against terrorism through the analysis of online communities of interest. This EArly Grant for Exploratory Research investigates new approaches to AA in two cross-domain settings: one in which the topic of the test documents differs from that of the training data, and one in which their genre differs. Rather than the single feature vector representation that is standard in text classification, the study adopts a framework in which an author's writeprint is represented as a set of linguistic dimensions. The goal is to understand how each dimension changes when an author writes in a new domain. The findings from this exploratory research will show whether it is feasible to build new text representations and approaches for text classification problems that involve large domain shifts between training and test data and for which decomposing the representation into linguistic dimensions is appropriate. The results will also inform longer-term research projects aimed at more challenging cross-domain settings.
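To make the idea of a writeprint split into linguistic dimensions more concrete, the following is a minimal sketch under stated assumptions. The abstract does not name the dimensions or the way change across domains is measured, so this example assumes three illustrative dimensions (character trigrams, a small function-word list, and punctuation) and uses cosine similarity between an author's source-domain and target-domain profiles, computed separately for each dimension. The names `writeprint`, `dimension_shift`, and `DIMENSIONS` are hypothetical and are not taken from the project.

```python
from collections import Counter
import math
import re

# Hypothetical function-word list; the project's actual dimensions are not specified.
FUNCTION_WORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it", "was"}


def char_ngrams(text, n=3):
    """Count lowercase character n-grams (assumed stylistic dimension)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def function_words(text):
    """Count occurrences of words from the hypothetical function-word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t in FUNCTION_WORDS)


def punctuation(text):
    """Count punctuation characters (assumed stylistic dimension)."""
    return Counter(ch for ch in text if ch in ".,;:!?-'\"()")


# Each dimension gets its own extractor, so a writeprint is a set of profiles
# rather than one flat feature vector.
DIMENSIONS = {
    "char_3grams": char_ngrams,
    "function_words": function_words,
    "punctuation": punctuation,
}


def writeprint(texts):
    """Build one feature Counter per linguistic dimension for an author's texts."""
    profile = {name: Counter() for name in DIMENSIONS}
    for text in texts:
        for name, extract in DIMENSIONS.items():
            profile[name].update(extract(text))
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dimension_shift(source_texts, target_texts):
    """Per-dimension similarity of one author's writing across two domains."""
    src = writeprint(source_texts)
    tgt = writeprint(target_texts)
    return {name: cosine(src[name], tgt[name]) for name in DIMENSIONS}


if __name__ == "__main__":
    emails = ["Thanks for the update. I will send it to the team today."]
    essays = ["In this essay, the argument is that language use shifts with genre."]
    print(dimension_shift(emails, essays))
```

Keeping one score per dimension, instead of a single similarity over a concatenated vector, is what lets an analysis of this kind ask which dimensions stay stable and which drift when the topic or genre changes; a real study would use far richer dimensions and corpora than this toy example.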