Hi I’m Gilad

I love data, analysis and visualization. Chief data scientist at betaworks.

Reaction to Brian Solis’s Interest Graphs

I was pointed to a blog post by Brian Solis, The Interest graph on Twitter is Alive – studying Starbucks top followers. Brian’s post defines an “interest graph” as a subset of the social graph around a certain topic. He claims that while social graphs of follower/following relationships were interesting, the interest graph is a step beyond: a focused network that shares “more than just a relationship”. While I’m a strong believer in topical graphs (I’ve been creating them for various analyses over the past couple of years), I find his argument overgeneralized and his terminology around data problematic. There’s a lot of hand-waving, and little acknowledgement of the assumptions and biases of the data that’s coming from public Twitter profiles.

“While we are what we say in our Tweets, our bios also reveal a telling side of who we really are.”

Our tweets represent moments in time in which we displayed interest in a topic, person or thing, while our bios represent our aspirational selves. Plenty of people write “activist” or “blogger” in their bio who in real life wouldn’t be considered either. Additionally, stated location is usually set when the profile is created and never touched again, which makes it obsolete in a highly mobile society. Only about 10% of users share geo-location while tweeting, which leaves us with a lot of guesswork.

Statistical Bias
It is extremely important to keep in mind that Twitter is used by a subset of the population. Of course many users will use the terms ‘geek’, ‘technology’ or ‘social media’ in their bios! There’s substantial statistical bias when looking at Twitter across the board, so as a brand, you must not consider a Twitter audience as representative of your real-life audience, but rather as a slice of it.

Solis throws the term ‘influencers’ around. In one case, he links to this ReSearch.ly page that supposedly points to “Starbucks influencers”. However, all I can see at the top are spam bots, foursquare check-ins, or people who simply mention the word ‘starbucks’ in their tweet. I see no “influencers” in this list, nor do I suspect that any of these posts led others to take more interest in the brand. By generalizing across anyone who posts the term ‘starbucks’, research.ly is contaminating its data. And this is precisely my point of contention with Brian’s post.

Graphs of Influence

Influence is complex. Certainly not binary. Influence is represented by a hodgepodge of human behavior, social dynamics and serendipity. Many experts are trying to define it, and the truth is, there’s no recipe. Solis calls a version of this the ‘brand graph’ – a group of highly connected individuals within a given topic. Apparently the tool looks at users who mention the term ‘starbucks’ and then sees if their followers also mentioned the word ‘starbucks’. Assuming that this represents a transaction of “influence” is naive.

1. How do you account for timing?
2. Do you even look at who’s following whom? Perhaps a user was influenced by another profile who mentioned “starbucks”?
3. Maybe there was no influence at all. There’s a well-known property of social networks called homophily (birds of a feather flock together). We tend to connect with people who are similar to us. Most likely my friends will talk about topics that interest me. That doesn’t mean that I’m influenced by them.
4. Even if we agree that user A mentioned ‘starbucks’ because she saw user B posting about starbucks, why do we automatically assume that this is influence?
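
To make the first two questions concrete, here is a minimal sketch of how much a naive "my follower also said it" count can shrink once timing is taken into account. Everything in it is an assumption for illustration: the user names, the follow edges, the timestamps and the function names are hypothetical, and this is not how ReSearch.ly or any other tool actually computes its lists.

```python
from datetime import datetime

# Hypothetical (follower, followee) edges: the follower sees the followee's tweets.
follows = {("alice", "bob"), ("carol", "bob"), ("bob", "dave")}

# Hypothetical time of each user's first tweet mentioning the term.
first_mention = {
    "alice": datetime(2011, 3, 1, 9, 0),
    "bob": datetime(2011, 3, 1, 12, 0),
    "carol": datetime(2011, 3, 1, 15, 0),
}

def naive_pairs(follows, first_mention):
    """Pairs where both a user and one of their followers mentioned the term."""
    return {(f, u) for f, u in follows
            if f in first_mention and u in first_mention}

def time_ordered_pairs(follows, first_mention):
    """Naive pairs where the followee mentioned the term before the follower did."""
    return {(f, u) for f, u in naive_pairs(follows, first_mention)
            if first_mention[u] < first_mention[f]}

print(naive_pairs(follows, first_mention))
print(time_ordered_pairs(follows, first_mention))
```

In this toy data the naive count finds two pairs, but alice tweeted before bob did, so only the carol/bob pair survives the timing check. Even that surviving pair only shows that exposure was possible. It says nothing about homophily or about whether carol ever saw bob's tweet.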

Before using the term influence, we must understand and acknowledge where our data is coming from, and its statistical bias. We must understand that Twitter is a highly engaging conversational space. And if we’re seeing a conversation about a topic, there doesn’t necessarily have to be a transaction of “influence”. We shouldn’t use that term lightly.

Interest Graph + Social Graph = Magic

While both the social graph and the interest graph are interesting on their own, the real magic happens when we put them together. By overlaying the dynamic topical discussions on top of the social graph, we are able to identify clusters of users engaged in conversation over a topic. By following these topics as they spread (the information cascades), we can start mapping how they move across the network. And by analyzing the structural positioning of users within the graph, we can start to get a sense of their level of influence in creating and sustaining information flows.
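
As a rough sketch of the overlay idea, one simple approach is to restrict the social graph to users who are talking about the topic and then look for connected groups among them. The function name, the edge list and the user set are all hypothetical, and treating follow relationships as undirected is a simplifying assumption. Real analyses would likely use richer clustering and directed edges.

```python
from collections import deque

def topic_clusters(edges, topic_users):
    """Connected components of the social graph restricted to topic participants."""
    # Keep only edges whose two endpoints both discussed the topic.
    adj = {u: set() for u in topic_users}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)

    seen, clusters = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from this user.
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in adj[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        clusters.append(component)
    return clusters

# Hypothetical social ties and the set of users who mentioned the topic.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
topic_users = {"a", "b", "d", "e", "f"}
print(topic_clusters(edges, topic_users))
```

Here user d ends up alone: d is connected to the others only through c, who never discussed the topic. That is the kind of structure neither graph shows on its own.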
