In a Quid network of news stories, blue wine appeared in an outlying cluster that didn’t at first look like much. But on closer inspection, the trend had a sizable impact in the news and deep social media traction. It is a good reminder that a network’s outlying nodes and clusters can sometimes be as valuable as the central ones.
As a natural language processing tool, Quid specializes in identifying central themes in any news narrative.
In a news network like this one, each node represents a news story and each colored cluster a topic within the larger public narrative.
By using cluster maps, we can apply metrics that help identify aspects of a news narrative, such as:
- Which topics are the most talked about?
- How much of the conversation do they represent?
- How similar are the topics?
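To make the first two questions concrete, here is a minimal sketch of how volume and share of conversation could be computed once every story has a cluster label. The labels and counts are made up for illustration, and `cluster_shares` is a hypothetical helper, not a Quid function.

```python
from collections import Counter

# Hypothetical cluster labels, one per story; not data from the network in this post.
story_clusters = [
    "Flavors & finishes", "Flavors & finishes", "Flavors & finishes",
    "Local wine spots", "Local wine spots",
    "Blue wine",
]

def cluster_shares(labels):
    """Return (cluster, story_count, share_of_total) tuples, largest cluster first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(name, n, n / total) for name, n in counts.most_common()]

for name, n, share in cluster_shares(story_clusters):
    print(f"{name}: {n} stories, {share:.0%} of the conversation")
```

Sorting by volume puts the most talked-about topics first, which also makes the small clusters at the bottom of the list easy to spot.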
While we use clustering to distill major themes in the overall conversation, what do we make of the outlying clusters and nodes?
First, it helps to understand why outliers are separated from the general conversation. An outlying cluster simply means the words used in articles in that cluster differ from those used in the general conversation.
While most of the conversation is composed of larger clusters — which represent a much higher volume of stories — the outlying clusters can represent ideas or topics outside those main themes. Because they don’t share language with stories in the main clusters, they appear as islands away from the mainland of the network. Less has been written about these ideas, so their clusters appear smaller in the network map — in some cases, just one or two stories.
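The island effect can be illustrated with a small sketch. It assumes a simple stand-in for language similarity (Jaccard overlap of the words in each story) and an arbitrary threshold of 0.2; Quid's actual similarity measure and clustering method are not described here, so treat every name and number below as hypothetical.

```python
def jaccard(a, b):
    """Share of distinct words two stories have in common (0 = none, 1 = identical sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def find_components(texts, threshold=0.2):
    """Link stories whose word overlap meets the threshold, then group linked stories."""
    words = [set(t.lower().split()) for t in texts]
    n = len(texts)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(words[i], words[j]) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first walk that collects every story reachable from `start`.
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        components.append(sorted(group))
    return components

# Invented one-line "stories" for illustration only.
stories = [
    "millennial drinkers prefer fruity wine flavors",
    "fruity wine flavors win over millennial drinkers",
    "new drinkers prefer light fruity wine flavors",
    "spanish startup launches bright blue wine",
]
components = find_components(stories)
mainland = max(components, key=len)                     # the largest group of linked stories
islands = [c for c in components if c is not mainland]  # everything else is an outlier
print("mainland:", mainland, "islands:", islands)
```

The blue wine story shares only the word "wine" with the others, so it falls below the threshold and ends up on its own island.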
But outlying clusters shouldn’t be ignored. In fact, they can prove quite useful in predicting upcoming trends.
For example, we ran a network about wine and Millennials, who tend to be novice wine drinkers. We used the following search terms:
"novice wine trend"~20 OR "beginner wine trend"~20 OR "millennial wine trend"~20 OR "beginner wine prefer"~30 OR "millennial wine behavior"~30 OR "millennial wine"~15
In the network above, the main, central topics are Flavors & finishes, Local wine spots, and Editorial coverage.
But we also took a closer look at the outliers to understand wine topics being discussed at the edges of the network.
Isolating these outliers, we saw a whole range of interesting ideas, including Blue wine, Gigglewater (canned wine), Skinny prosecco, and wine being sold on eBay. Each cluster represents some aspect of the conversation about Millennials and beginners and their interaction with wine.
Then in Quid, we took it one step further to identify which topics had the most social resonance.
Using our scatterplot function, we examined the outlying clusters. In this view, each node represents a cluster. Stories in the blue wine cluster had been shared at least 70,000 times via social media.
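A rough sketch of the aggregation behind such a view: roll stories up to one point per cluster, with story volume and total social shares as the two coordinates. The choice of axes is an assumption, and every individual number below is invented; only the 70,000 total for blue wine echoes the figure above.

```python
# Hypothetical per-story records: (cluster, social shares). The individual
# share counts are invented for illustration.
stories = [
    ("Blue wine", 45_000),
    ("Blue wine", 25_000),
    ("Skinny prosecco", 8_000),
    ("Gigglewater (canned wine)", 3_000),
    ("Gigglewater (canned wine)", 1_500),
    ("Wine on eBay", 400),
]

def summarize(records):
    """Return {cluster: (story_count, total_shares)}, one point per cluster."""
    summary = {}
    for cluster, shares in records:
        count, total = summary.get(cluster, (0, 0))
        summary[cluster] = (count + 1, total + shares)
    return summary

# Rank clusters by total shares, highest first.
ranked = sorted(summarize(stories).items(), key=lambda item: item[1][1], reverse=True)
for cluster, (count, total) in ranked:
    print(f"{cluster}: {count} stories, {total:,} shares")
```

Ranking by shares rather than story count is what lets a cluster with only a couple of stories rise to the top.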
To be sure, plenty of outlying nodes and clusters turn out to be peripheral to an analysis or conversation. In those cases, the stories are likely low in volume, truly tangential to what you’re analyzing, and therefore not that useful, so it’s okay to filter or delete them.
But the example above shows that’s not always true. Taking a second look at outlying nodes and clusters can help ensure that your analysis is truly comprehensive, including insights that are outside the mainstream conversation and a bit less obvious.