Detecting unnatural regularities in data at scale
Our approach is to mathematically model as many different characteristics of online behavior as possible, using as many different techniques as possible, so as to construct as high-dimensional a model as possible. This approach reflects the reality that sophisticated adversaries can frequently change many aspects of their behavior to avoid detection, but they can’t change all of them all of the time.
However, the most critical features are those that show evidence of coordination among different, apparently unrelated identities. The presence of such coordination directly implies that something is not above board. Most importantly, it is an objective measure that applies in every context. Our technology has been proven in settings as varied as detecting membership in violent street gangs in Detroit and detecting large corporations that post phony positive employee reviews on career websites.
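As a minimal illustration of one coordination signal, the sketch below counts how often pairs of accounts are active in the same short time window. It assumes a hypothetical event log of `(account_id, unix_timestamp)` pairs and a fixed window width; the function name, parameters and thresholds are illustrative only and are not the method described in this document.

```python
from collections import defaultdict
from itertools import combinations


def co_active_pairs(events, window_seconds=60, min_shared=3):
    """Return {(account_a, account_b): shared_windows} for account pairs that are
    active in the same fixed time window at least `min_shared` times.

    `events` is an iterable of (account_id, unix_timestamp). Hypothetical helper.
    """
    # Map each fixed-width time window to the set of accounts active in it.
    # Note: fixed bucketing can split two near-simultaneous events across a
    # window boundary; sliding or overlapping windows would reduce that effect.
    windows = defaultdict(set)
    for account, ts in events:
        windows[int(ts // window_seconds)].add(account)

    # Count, for each unordered pair of accounts, how many windows they share.
    shared = defaultdict(int)
    for accounts in windows.values():
        for a, b in combinations(sorted(accounts), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


events = [("alice", 0), ("bob", 5), ("alice", 120), ("bob", 130),
          ("alice", 300), ("bob", 310), ("carol", 310)]
print(co_active_pairs(events))  # {('alice', 'bob'): 3}
```

Raw co-occurrence counts are only a starting point: accounts that are all active during peak hours will share many windows by chance, so in practice such counts would need to be compared against a baseline of expected overlap.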
Our model avoids the increasingly insidious trap of trying to arbitrate ground truth. Instead, it focuses on capturing objective artifacts of online behavior, such as patterns of coordination among accounts, suspicious co-temporal characteristics, repetitions of unusually consistent dialog, anomalously constructed social network structures, and so on. In this way, our analytics remain free of the political and other biases that are unavoidably introduced by subjectively annotated training sets, or that slip in through easily gamed signals such as crowd-sourced knowledge.
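One simple way to surface "repetitions of unusually consistent dialog" is near-duplicate text detection. The sketch below compares posts from different accounts using word shingles and Jaccard similarity. It assumes a hypothetical list of `(account_id, text)` posts, and the shingle size and threshold are illustrative choices, not values taken from this document.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word sequences) in `text`."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicate_posts(posts, k=3, threshold=0.6):
    """Return (account_a, account_b, similarity) for pairs of posts written by
    different accounts whose shingle sets have Jaccard similarity >= threshold.

    `posts` is a list of (account_id, text). Hypothetical helper.
    """
    signatures = [(account, shingles(text, k)) for account, text in posts]
    hits = []
    for (a, sig_a), (b, sig_b) in combinations(signatures, 2):
        if a == b:
            continue  # repetition by the same account is not cross-account coordination
        similarity = jaccard(sig_a, sig_b)
        if similarity >= threshold:
            hits.append((a, b, similarity))
    return hits


posts = [
    ("acct1", "This company is a great place to work with amazing benefits"),
    ("acct2", "This company is a great place to work with amazing perks"),
    ("acct3", "The commute was long but the team was friendly"),
]
print(near_duplicate_posts(posts))  # [('acct1', 'acct2', 0.8)]
```

This all-pairs comparison is quadratic in the number of posts; at scale, a technique such as MinHash with locality-sensitive hashing is the usual way to find candidate pairs without comparing every post against every other.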
While our components do use ML where it is appropriate, we do not overuse it. ML is generally not well suited, for example, to rapidly evolving “arms race” scenarios or to those in which data is sparse. We prefer instead to study a problem in the field from a number of different angles, and only then determine the best mathematical expressions of it. The mathematical frameworks we draw on include:
- Lattice theory, including different types of lattices
- Sheaf theory and category theory
- Graph homology and cohomology
We likewise often develop our own algorithms to detect specific phenomena of interest as effectively as possible.
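As a small, concrete instance of the graph homology item above: for an undirected graph viewed as a 1-dimensional complex, the Betti number b0 counts connected components and b1 = |E| − |V| + b0 counts independent cycles. The sketch below computes both with a union-find structure. The document does not say how such invariants are used; applying them to, for example, compare an account network's cycle structure against a baseline is purely an assumption, and `betti_numbers` is a hypothetical helper.

```python
def betti_numbers(nodes, edges):
    """Return (b0, b1) for an undirected graph viewed as a 1-dimensional complex.

    b0 = number of connected components; b1 = |E| - |V| + b0 (independent cycles).
    Duplicate edges, in either orientation, are counted once. Hypothetical helper.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    unique_edges = {tuple(sorted((u, v))) for u, v in edges}
    for u, v in unique_edges:
        # Endpoints missing from `nodes` are added as vertices.
        for endpoint in (u, v):
            parent.setdefault(endpoint, endpoint)
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    b0 = len({find(v) for v in parent})
    b1 = len(unique_edges) - len(parent) + b0
    return b0, b1


# A triangle (one cycle) plus a separate edge: two components, one cycle.
print(betti_numbers(["a", "b", "c", "d", "e"],
                    [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]))  # (2, 1)
```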