Preliminary Notes on Authorial Obfuscation

This is a brief outline of some papers and notes I’ve collected on authorial obfuscation that I would like to flesh out into a post at some point in the future. I keep getting into a paper writing mindset toward my blogposts when these are supposed to be for fun. I also got tired of looking at this draft and I think this might be useful to someone as is.

I started thinking about this topic because I was looking at what people have accomplished since the Anonymouth Project. The software for Anonymouth has severe bitrot at this point but several people have come up with more modern approaches in subsequent papers.

Obfuscation in this area has a long history of techniques, though in my opinion most of them are vast overkill for the problem space. Obfuscating a text while retaining its intended meaning seems, on its face, like it should be a trivial problem to solve.

The complexity arises when researchers try to force the obfuscation process to produce a document that is as well written as the one that went in. Scrambling a text is trivial, but producing a text that is both obfuscated and a grammatically correct rendering of the original is a herculean task.

The most popular technique that is in practical use for this kind of thing is translator chaining. This is referred to in the academic literature as “round trip machine translation”.

This technique consists of filtering text through a series of different translators from one random language to another, and then returning the translation back to the original language. Another variant of this process leaves the text in a different language than it began in.
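The chaining loop itself is small enough to sketch. This is only an illustration under some assumptions of mine: the `Translator` signature, the `build_route` and `chain_translate` helpers, and the stub backends are all hypothetical names, and no real translation service is called. A real backend (hosted API or offline model) would have to be wrapped to fit the same three-argument shape.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical backend shape: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def build_route(origin: str, pivots: List[str], hops: int,
                rng: Optional[random.Random] = None) -> List[str]:
    """Pick `hops` distinct pivot languages at random, then return to `origin`."""
    rng = rng or random.Random()
    return [origin] + rng.sample(pivots, hops) + [origin]


def chain_translate(text: str, route: List[str],
                    backends: Dict[str, Translator],
                    order: List[str]) -> str:
    """Send `text` along `route`, rotating through the named backends per hop."""
    for i, (src, dst) in enumerate(zip(route, route[1:])):
        backend = backends[order[i % len(order)]]
        text = backend(text, src, dst)
    return text


def make_stub(name: str) -> Translator:
    """Stand-in backend: it only tags the text so each hop is visible."""
    def translate(text: str, src: str, dst: str) -> str:
        return f"{text} [{name}:{src}>{dst}]"
    return translate


# Demo with stub backends only; swap in real wrappers to do actual translation.
backends = {"a": make_stub("a"), "b": make_stub("b")}
route = build_route("en", ["de", "ja", "fi", "tr"], hops=3)
print(route)
print(chain_translate("hello", route, backends, order=["a", "b"]))
```

To get the variant that leaves the text in a different language, drop the final `origin` entry from the route. Rotating backends per hop is one way to avoid handing the whole chain to a single provider.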

A fun project idea would be a webapp that hooks into various translator APIs to automate this process for people doing this with large amounts of text. Doing the translator chaining by hand using web interfaces is very tedious. A tool that lets you swap between several different APIs and translator services would definitely be a useful public service.

It would also be possible to build a tool like this on top of offline translation software, which would reduce the digital footprint involved. With hosted services there is an obvious risk that the text being translated is saved, and later recovered, by the provider of whichever API(s) were used for the chaining.

Because of this, the project would need to offer several alternative services hosted in different legal jurisdictions. Different actors would choose the jurisdictions that fit their respective threat models. If they were in China, for example, they might use the Yandex Translate API or the Google Translate API rather than Baidu for their secret message.

The chaining method has been studied several times, and was introduced in Rao et al. Their proposal in this paper only involved testing single round trips between languages (e.g. English → Chinese → English). Later research showed that such a low number of rounds is not at all effective for document anonymization purposes.

Many papers that followed the original Anonymouth paper construct diverse solutions which seem at first glance to be excessively complex. Many of the subsequent techniques involve complex NLP or machine learning.

A much simpler way to make the Rao et al. method work is simply to add more translations into the chain. This can create an incredibly obfuscatory effect. The problem with this solution should be clear to anyone.

When you chain too many translators together the information content of the original message is disturbed significantly. To fix this you would have to take the chained text and potentially introduce several minor modifications back into it until the original meaning is restored somewhat.

Adding information back in is a huge problem with the technique, because every modification you make after the translator chain reduces the entropy that you just went to terrible lengths to inject into the text. This is the fine line that most of the approaches here have to walk.

It is easy to take a text and obfuscate it until it is no longer readable and claim that it is now anonymous. Anonymous gibberish without meaning isn’t relevant to anyone’s interest. The question is where you draw the line between obfuscation and anonymity while still retaining a cohesive sense of meaning.

I would like to come back to this post and write up my thoughts on the newer techniques and approaches that people have created. For now I don’t have the time though, so here is a partial bibliography I would use in such a post, with some relevant papers and abstracts from this small subfield.

A Girl Has No Name: Automated Authorship Obfuscation using Mutant-X (2019)

We propose a genetic algorithm based random search framework called Mutant-X which can automatically obfuscate text to successfully evade attribution while keeping the semantics of the obfuscated text similar to the original text.

Specifically, Mutant-X sequentially makes changes in the text using mutation and crossover techniques while being guided by a fitness function that takes into account both attribution probability and semantic relevance. While Mutant-X requires black-box knowledge of the adversary’s classifier, it does not require any additional training data and also works on documents of any length.

We evaluate Mutant-X against a variety of authorship attribution methods on two different text corpora. Our results show that Mutant-X can decrease the accuracy of state-of-the-art authorship attribution methods by as much as 64% while preserving the semantics much better than existing automated authorship obfuscation approaches.

While Mutant-X advances the state-of-the-art in automated authorship obfuscation, we find that it does not generalize to a stronger threat model where the adversary uses a different attribution classifier than what Mutant-X assumes. Our findings warrant the need for future research to improve the generalizability (or transferability) of automated authorship obfuscation approaches.

Anonymouth Revamped: Getting Closer to Stylometric Anonymity (2013)

The alterations we have made to Anonymouth aim to solve or mitigate the aforementioned usability and reliability issues. The first change made was to bring features together into words to remove and words to add.

However, simply providing users with two lists of words does little to help them to change the ways they are expressing their ideas—which is important. To this end, we have added two-way translations to Anonymouth’s repertoire.

Two-way translations are applied to each sentence using 15 different languages. The “translations” (the new English versions), like the list of words to add/remove, are sorted by Anonymity Index (AI) to determine which words/sentences will contribute to the document’s anonymity. The AI is a summation, across all features in a word/sentence/document, of the product of the number of occurrences of a feature, the feature’s information gain, and the amount the feature must be changed (the distance it is from its target value).

The user is presented with the sorted list of alternative sentences, and is able to swap a translated version of a sentence in for the original version, the idea being that it is generally feasible to find a “translation” that alters the way a given idea is expressed. After the swap, the user is free to correct any errors the “translation” may have introduced.
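As a reading aid, the Anonymity Index described in this abstract can be written out as a sum of three-way products. This is my own minimal sketch, not code from Anonymouth: the `FeatureScore` container and `anonymity_index` function are hypothetical names, and treating "the amount the feature must be changed" as an absolute distance is an assumption on my part.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FeatureScore:
    occurrences: int   # times the feature appears in the word/sentence/document
    info_gain: float   # the feature's information gain
    current: float     # current value of the feature
    target: float      # value the feature should move toward


def anonymity_index(features: Iterable[FeatureScore]) -> float:
    """Sum over features of occurrences * information gain * distance to target.

    Assumption: the distance is taken as an absolute difference.
    """
    return sum(
        f.occurrences * f.info_gain * abs(f.target - f.current)
        for f in features
    )


# Two made-up features, purely to show the arithmetic.
sentence_features = [
    FeatureScore(occurrences=2, info_gain=0.5, current=1.0, target=3.0),
    FeatureScore(occurrences=1, info_gain=0.25, current=4.0, target=0.0),
]
print(anonymity_index(sentence_features))
```

Each candidate "translation" would get its own score this way, and the list shown to the user is then sorted on that score.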

Heuristic Authorship Obfuscation (2019)

We introduce an approach that (1) models writing style difference as the Jensen-Shannon distance between the character n-gram distributions of texts, and (2) manipulates an author’s subconsciously encoded writing style in a sophisticated manner using heuristic search.

To obfuscate, we explore the huge space of textual variants in order to find a paraphrased version of the to-be-obfuscated text that has a sufficient Jensen-Shannon distance at minimal costs in terms of text quality loss.

We analyze, quantify, and illustrate the rationale of this approach, define paraphrasing operators, derive obfuscation thresholds, and develop an effective obfuscation framework. Our authorship obfuscation approach defeats state-of-the-art verification approaches, including unmasking and compression models, while keeping text changes at a minimum.
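The style distance this abstract relies on is easy to compute by hand. Below is a small sketch of the Jensen-Shannon distance between character n-gram distributions, using only the standard library. The function names, the choice of trigrams, and the base-2 logarithm are my assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from typing import Dict


def char_ngram_dist(text: str, n: int = 3) -> Dict[str, float]:
    """Relative frequency of each character n-gram in `text`."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}


def js_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon distance: square root of the JS divergence (log base 2)."""
    divergence = 0.0
    for gram in set(p) | set(q):
        pi, qi = p.get(gram, 0.0), q.get(gram, 0.0)
        mi = (pi + qi) / 2  # mixture distribution
        if pi > 0:
            divergence += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * math.log2(qi / mi)
    # Clamp tiny negative rounding errors before taking the root.
    return math.sqrt(max(divergence, 0.0))


original = "I would like to come back to this post at some point."
candidate = "At some later point I want to return to this article."
print(js_distance(char_ngram_dist(original), char_ngram_dist(candidate)))
```

With base-2 logarithms the distance runs from 0 for identical distributions to 1 for distributions that share no n-grams, which makes it convenient as a threshold for deciding when a paraphrase is "far enough" from the original.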

Author Obfuscation: Attacking the State of the Art in Authorship Verification (2016)

We report on the first large-scale evaluation of author obfuscation approaches built to attack authorship verification approaches: the impact of 3 obfuscators on the performance of a total of 44 authorship verification approaches has been measured and analyzed.

The best-performing obfuscator successfully impacts the decision-making process of the authorship verifiers on average in about 47% of the cases, causing them to misjudge a given pair of documents as having been written by “different authors” when in fact they would have decided otherwise if one of them had not been automatically obfuscated.

The evaluated obfuscators have been submitted to a shared task on author obfuscation that we organized at the PAN 2016 lab on digital text forensics. We contribute further by surveying the literature on author obfuscation, by collecting and organizing evaluation methodology for this domain, and by introducing performance measures tailored to measuring the impact of author obfuscation on authorship verification.

ER-AE: Differentially-private Text Generation for Authorship Anonymization (2020)

We propose a novel text generation model with the exponential mechanism for authorship anonymization. By augmenting the semantic information through a REINFORCE training reward function, the model can generate differentially-private text that has a close semantic and similar grammatical structure to the original text while removing personal traits of the writing style.

It does not assume any conditioned labels or paralleled text data for training. We evaluate the performance of the proposed model on the real-life peer reviews dataset and the Yelp review dataset. The result suggests that our model outperforms the state-of-the-art on semantic preservation, authorship obfuscation, and stylometric transformation.

Doppelgänger Finder: Taking Stylometry To The Underground (2014)

We consider two scenarios. The first involves text written by an unknown cybercriminal and a set of potential suspects. This is a standard, supervised stylometry problem made more difficult by multilingual forums that mix l33t-speak conversations with data dumps. In the second scenario, you want to feed a forum into an analysis engine and have it output possible doppelgängers, or users with multiple accounts.

While other researchers have explored this problem, we propose a method that produces good results on actual separate accounts, as opposed to data sets created by artificially splitting authors into multiple identities.

For scenario 1, we achieve 77% to 84% accuracy on private messages. For scenario 2, we achieve 94% recall with 90% precision on blogs and 85.18% precision with 82.14% recall for underground forum users.

We demonstrate the utility of our approach with a case study that includes applying our technique to the Carders forum and manual analysis to validate the results, enabling the discovery of previously undetected doppelgänger accounts.

Fighting Authorship Linkability with Crowdsourcing (2014)

In this paper, we start by showing that the problem is actually worse than previously believed. We then explore ways to mitigate authorship linkability in community based reviewing. We first attempt to harness the global power of crowd sourcing by engaging random strangers into the process of rewriting reviews.

As our empirical results (obtained from Amazon Mechanical Turk) clearly demonstrate, crowdsourcing yields impressively sensible reviews that reflect sufficiently different stylometric characteristics such that prior stylometric linkability techniques become largely ineffective.

We also consider using machine translation to automatically rewrite reviews. Contrary to what was previously believed, our results show that translation decreases authorship linkability as the number of intermediate languages grows. Finally, we explore the combination of crowdsourcing and machine translation and report on results.

The Case for Being Average: A Mediocrity Approach to Style Masking and Author Obfuscation (2017)

Here we propose an approach that changes the text, so that it is pushed towards average values for some general stylometric characteristics, thus making the use of these characteristics less discriminative.

The approach consists of three main steps: first, we calculate the values for some popular stylometric metrics that can indicate authorship; then we apply various transformations to the text, so that these metrics are adjusted towards the average level, while preserving the semantics and the soundness of the text; and finally, we add random noise.

This approach turned out to be very efficient, and yielded the best performance on the Author Obfuscation task at the PAN-2016 competition.

Nondescript, my web app, guides writers in achieving stylometric obfuscation in order to preserve anonymity. The app runs simulations of authorship attribution scenarios by analyzing the user’s linguistic features.

In this paper, I will describe the conception of the Nondescript app; discuss related work; and present the results of a user study. Most users in the study were able to anonymize their writing in at least 5 out of 10 authorship attribution scenarios.

Users rated the anonymization process an average of 3.6 out of 5 in terms of ease of use. This work-in-progress project is situated in two domains: privacy technologies and computational linguistics.

Deception in Authorship Attribution (2013)

In this thesis we explore the performance of authorship attribution methods in adversarial settings where authors take measures to hide their identity by changing their writing style and by creating multiple identities.

We show that using a large feature set, it is possible to distinguish regular documents from deceptive documents with high accuracy and present an analysis of linguistic features that can be modified to hide writing style. We show how to adapt regular authorship attribution to difficult datasets such as leaked underground forum and present a method for detecting multiple identities of authors.

We demonstrate the utility of our approach with a case study that includes applying our technique to an underground forum and manual analysis to validate the results, enabling the discovery of previously undetected multiple accounts.

Breaking the Closed-World Assumption in Stylometric Authorship Attribution (2014)

This paper proposes the “classify-verify” method that augments classification with a binary verification step evaluated on stylometric datasets.

This method, which can be generalized to any domain, significantly outperforms traditional classifiers in open-world settings and yields an F1-score of 0.87, comparable to traditional classifiers in closed-world settings.

Moreover, the method successfully detects adversarial documents where authors deliberately change their styles, a problem for which closed-world classifiers fail.

Secure Obfuscation of Authoring Style (2017)

In this paper we first show that existing obfuscation systems are insecure and propose a general approach for constructing obfuscation algorithms, and then instantiate the framework to give an algorithm that semi-automatically modifies an author’s document.

We provide a secure obfuscation scheme that is able to hide an author’s document securely among other authors’ documents in a corpus. As part of our obfuscation algorithm we present a new algorithm for identifying an author’s unique words that would be of independent interest.

We present a security model and use it to analyze our scheme and also the previous schemes. We implement our scheme and give its performances through experiments. We show that our algorithm can be used to obfuscate documents securely and effectively.

Unsupervised Authorship Attribution (2018)

We describe a technique for attributing parts of a written text to a set of unknown authors. Nothing is assumed to be known a priori about the writing styles of potential authors. We use multiple independent clusterings of an input text to identify parts that are similar and dissimilar to one another.

We describe algorithms necessary to combine the multiple clusterings into a meaningful output. We show results of the application of the technique on texts having multiple writing styles.