CAREER: Authorship Analysis in Cross-Domain Settings
University of Alabama at Birmingham, Birmingham, AL
Abstract
Authorship Analysis (AA) is the task of extracting characteristics from written documents that can help determine the authorship of a document, generate a profile of its author, or identify cases of plagiarism. AA can be used for historical purposes, to settle disputes over the original creator of a given document, and to build a prosecution case against an online abuser. Most previous work in AA assumes the availability of samples of known authorship that closely match the domain of the documents of interest. Such a strong assumption limits the applicability of AA approaches. This program addresses this key outstanding challenge by designing robust frameworks for scenarios with different degrees of domain shift: cross-topic, cross-genre, and cross-modality (text vs. transcribed speech). The project leverages the large amounts of freely available text representative of each cross-domain setting to learn general lexical and syntactic distributional correspondences. These correspondences are then used to map out-of-domain texts to a representation that is closer to the target domain. Direct contributions of this research include new approaches to extract cross-domain prior knowledge and embed it into AA models in the form of distributional trajectories, and a solid understanding of the influence of topic, genre, and modality on feature engineering for AA that will also benefit other text processing tasks.

This research will contribute directly to the field of forensic linguistics, which is of major relevance to national security. The PI will design an advanced seminar on computational approaches to forensic linguistics and will expand her ongoing educational and outreach activities for underrepresented groups in the STEM disciplines. The PI will also provide the graduate students involved in the program with opportunities for international visits to key research labs, which will enrich their training and broaden their professional networks.
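The abstract does not say how the distributional correspondences are learned or applied. Purely as an illustration, the sketch below assumes a simplified pivot-based scheme loosely inspired by structural correspondence learning (SCL), omitting SCL's pivot predictors and SVD step. Features frequent in both domains serve as pivots. Unlabeled text from both domains gives every token a co-occurrence profile over those pivots. Documents are then projected into the shared pivot space before nearest-centroid attribution. The tokenizer and all function names (`choose_pivots`, `learn_correspondences`, `project`, `attribute`) are hypothetical; this is not the project's method.

```python
from collections import Counter
from math import sqrt


def tokens(text):
    # Hypothetical feature extractor: lowercased whitespace tokens.
    # A real AA system would use richer lexical/syntactic features.
    return text.lower().split()


def choose_pivots(source_docs, target_docs, k):
    # Pivots: features that are frequent in BOTH domains, ranked by the
    # smaller of their two document frequencies (ties broken alphabetically).
    src = Counter(t for d in source_docs for t in set(tokens(d)))
    tgt = Counter(t for d in target_docs for t in set(tokens(d)))
    shared = {t: min(src[t], tgt[t]) for t in src if t in tgt}
    return sorted(shared, key=lambda t: (-shared[t], t))[:k]


def learn_correspondences(unlabeled_docs, pivots):
    # For each token, count the unlabeled documents in which it co-occurs
    # with each pivot, giving it a profile expressed over shared pivots.
    index = {p: i for i, p in enumerate(pivots)}
    profiles = {}
    for doc in unlabeled_docs:
        toks = set(tokens(doc))
        present = [index[t] for t in toks if t in index]
        for t in toks:
            vec = profiles.setdefault(t, [0.0] * len(pivots))
            for i in present:
                vec[i] += 1.0
    return profiles


def project(doc, profiles, n_pivots):
    # Map a document into pivot space: sum the L2-normalized profiles of its
    # tokens (unknown tokens are skipped), then L2-normalize the total.
    out = [0.0] * n_pivots
    for t in tokens(doc):
        vec = profiles.get(t)
        if vec is None:
            continue
        norm = sqrt(sum(x * x for x in vec)) or 1.0
        for i, x in enumerate(vec):
            out[i] += x / norm
    norm = sqrt(sum(x * x for x in out)) or 1.0
    return [x / norm for x in out]


def cosine(a, b):
    na = sqrt(sum(x * x for x in a)) or 1.0
    nb = sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def attribute(test_doc, train, profiles, n_pivots):
    # Nearest-centroid attribution in pivot space.
    # train: (author, document) pairs from the source domain only.
    centroids = {}
    for author, doc in train:
        acc = centroids.setdefault(author, [0.0] * n_pivots)
        for i, x in enumerate(project(doc, profiles, n_pivots)):
            acc[i] += x
    query = project(test_doc, profiles, n_pivots)
    return max(centroids, key=lambda a: cosine(query, centroids[a]))


if __name__ == "__main__":
    # Toy cross-topic data: known-author samples are about animals,
    # the documents to attribute are about finance.
    source = ["the cat sat on the mat", "a dog ran in a park"]
    target = ["the stock sat on the index", "a bond ran in a market"]
    pivots = choose_pivots(source, target, k=4)
    profiles = learn_correspondences(source + target, pivots)
    train = [("A", source[0]), ("B", source[1])]
    for doc in target:
        print(doc, "->", attribute(doc, train, profiles, len(pivots)))
```

Run as a script, the toy example attributes the first finance sentence to A and the second to B, even though neither author's known sample contains "stock" or "bond". Those words are represented only through their co-occurrence with the shared pivots. With real data, the pivot selection, features, and classifier would all need to be far richer.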