Big Data, Low Transparency

Every day the Internet brings the world more information – and less understanding. It is crowded with voices, some genuine, others fabricated, but all with agendas. Many of these agendas are clear – such as brands openly advertising their products – while others are indirect or even covert. The relative proportions of these are changing, and not in favor of the more transparent.

Technologies and platforms change rapidly, but so do the motivations of the human actors behind the messages. User accounts are disposable and can be quickly repurposed. Images and videos can be fabricated, altered, or used out of context to champion almost any agenda. The Internet increasingly excels at bringing the reader an endless list of problems and then presenting answers that are often clear, simple, and wrong. This creates fertile ground for spin, and an environment in which having some of the facts is often worse than having none of them.

This means that, increasingly, each individual piece of information must be understood in a broad and complex context if it is to be understood at all. Exactly what is being said is often less important than who is really behind saying it – and why they are spending time and money to say it.

Businesses, governments, and the public at large will have to change their thinking in order to avoid being fooled by the increasing trickery and subterfuge in the online world. Many billions of dollars will be spent each year to disseminate information whose purpose is less than transparent – and still more will be spent with each passing year. Unfortunately, there is no silver bullet: no single technological approach can be deployed to combat all of this.

Our approach is to mathematically model as many different characteristics of online behavior as possible, with as many techniques as possible, so as to construct as high-dimensional a model as possible. This approach accommodates the reality that purveyors of narratives will frequently change many aspects of their behavior to avoid detection, but they can’t change all of them all of the time.

Our model avoids the increasingly insidious trap of trying to arbitrate ground truth. Instead, it focuses on capturing objective, observable artifacts of online behavior, such as patterns of coordination among accounts, suspicious co-temporal characteristics, unusually consistent narratives, anomalously constructed social network structures, and so on. In this way, our analytics remain free from the political or other biases that are unavoidably introduced by the subjective annotation of training sets, or that slip in through easily gamed sources such as crowd-sourced knowledge.
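
To make one of these signals concrete, here is a minimal sketch of how "suspicious co-temporal characteristics" might be measured. It is an illustration only, not the model described above: the function name co_temporal_pairs, the input format of (account, timestamp) pairs, and the window_seconds and min_shared_windows parameters are all assumptions made for the example. It counts how often two accounts post within the same short time window and reports pairs that do so repeatedly, without ever looking at what was said.

```python
from collections import defaultdict
from itertools import combinations


def co_temporal_pairs(posts, window_seconds=60, min_shared_windows=3):
    """Return account pairs that repeatedly post in the same time window.

    posts: iterable of (account_id, unix_timestamp) tuples (hypothetical format).
    window_seconds, min_shared_windows: illustrative thresholds, not tuned values.
    """
    # Map each fixed-width time bucket to the set of accounts active in it.
    buckets = defaultdict(set)
    for account, timestamp in posts:
        buckets[int(timestamp // window_seconds)].add(account)

    # For every pair of accounts, count the buckets in which both appear.
    shared = defaultdict(int)
    for accounts in buckets.values():
        for a, b in combinations(sorted(accounts), 2):
            shared[(a, b)] += 1

    # Keep only pairs whose overlap meets the threshold.
    return {pair: n for pair, n in shared.items() if n >= min_shared_windows}


if __name__ == "__main__":
    sample = [
        ("a", 0), ("b", 10),      # same window
        ("a", 120), ("b", 130),   # same window
        ("a", 300), ("b", 310),   # same window
        ("c", 1000),              # posts alone
    ]
    print(co_temporal_pairs(sample))
```

A count like this would be only one dimension among many. Fixed buckets also miss posts that fall just either side of a boundary, and busy accounts will overlap by chance, so a real system would presumably compare the observed overlap against a baseline rather than a fixed threshold.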