Last week I had the honor to give the opening keynote at Dalhousie University’s symposium on measuring influence on social media. It’s not common to see folks from industry keynoting academic events, so I was surprised when Anatoliy Gruzd from Dal’s Social Media Lab invited me to open the symposium. I think a lot about the topic of influence, and have done a lot of work untangling what can be measured through data. Below is a rough crib of my presentation, along with my slide deck:
The promise of data gives us hope that we can finally quantify the effects of social influence, put a better price tag on certain digital spaces or interactions, and potentially make our ecosystem much more efficient. We can finally attempt to answer questions such as: how are people activated, and what causes them to purchase a product or pass along a piece of information?
Marketers and media alike tend to generate hype around the status affordances plastered all over social network sites. These are metrics, such as the number of followers, mentions, comments and fans, that social network sites use to highlight user status. It is easy to get swept away by these readily available metrics without necessarily knowing what they mean (if you haven’t seen it yet, check out Colbert’s Internet Numbo-Tron 3000 skit, where tracking tweets per minute means… absolutely nothing!).
Influence as an Exposed Metric
I like the following definition of influence in social spaces:
The ability to disproportionately affect the spread of information.
In my work I’m extremely interested in how information spreads. For this reason, I look for points of influence: moments when users get others to pay attention to a piece of information or media. If you’re a consumer brand, the interesting points of influence are cases where one friend gets another to purchase an item. There’s always a desired outcome in the form of an action: spreading information, purchasing an item, watching a TV show, etc.
Yet influence as an exposed metric is problematic for many reasons. We don’t think of providing a simple quantifiable measure for love, hate or trust, yet we expect to do so with influence. Can you tell me how much of your thinking is *innately* yours? What percentage of your thoughts are a direct result of advertising campaigns? What made it into your head because of peers, and what are your original thoughts? Some say that influence has more to do with the unconscious: the ways in which our brain picks up bits of information and assembles them into an opinion or preference.
On top of that, people aren’t necessarily rational in their approach to trust. I may trust someone and continue to be influenced by their recommendations despite past transgressions. Some people bring influence from outside the network: a celebrity, a public figure. How does the fact that they gained influence outside the observed network affect our measurement? I haven’t seen anyone successfully quantify and match the effects of influence across networks. And what about context? I shouldn’t be deemed an influencer on “popcorn” just because my tweet from the theater was retweeted by others (*cough* Klout *cough*).
Social recommendations happen between peers, friends and family members all the time. This is not new. What’s different now is that these moments of influence may be visible to us through the lens of data.
The key to understanding influence is to look at the system as a whole, and think about users and how they’re interconnected rather than trying to identify specific people, or “influencers”. Users serve as information brokers, choosing what to give their attention to. But what drives these choices? And more importantly, can they be predicted?
I’m interested in a broader notion of influence. Not strictly peer to peer, or lists of these so-called “influencers”, but rather the effect on a community. I think of influence in the context of a networked ecosystem. Can we identify network attributes that create a higher likelihood of our desired outcome? Can we figure out points in time when the network comes together in ways that will most likely help a message spread? An obvious but effective attribute is time of day. If your audience is mostly located within a certain geographic region, it’s best to publish content during the day in that time zone, or else the majority of your audience will be asleep. That’s just the start.
Based on recent experiments, Duncan Watts and Peter Dodds claim that going viral has more to do with the receptivity of an audience than with the people doing the sharing, tagging and endorsing. They claim that the role of “influencers” has been overstated:
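To make the time-of-day point concrete, here is a minimal sketch, assuming we know each follower’s UTC offset and that people are awake from 8am to 10pm local time. Both assumptions, and the function names `awake_count_by_utc_hour` and `best_utc_hour`, are illustrative only; this is not how SocialFlow schedules posts.

```python
from collections import Counter

# Assumption for illustration: a follower is "awake" from 08:00 to 21:59 local time.
AWAKE_START, AWAKE_END = 8, 22


def awake_count_by_utc_hour(utc_offsets):
    """Return a 24-item list: how many followers are awake at each UTC hour.

    utc_offsets is a hypothetical input, one whole-hour UTC offset per follower
    (real data would also need half-hour zones and daylight saving time).
    """
    offset_counts = Counter(utc_offsets)
    counts = [0] * 24
    for utc_hour in range(24):
        for offset, n in offset_counts.items():
            local_hour = (utc_hour + offset) % 24
            if AWAKE_START <= local_hour < AWAKE_END:
                counts[utc_hour] += n
    return counts


def best_utc_hour(utc_offsets):
    """Earliest UTC hour at which the largest number of followers is awake."""
    counts = awake_count_by_utc_hour(utc_offsets)
    return max(range(24), key=lambda hour: counts[hour])


# 90 followers in New York (UTC-5, ignoring DST) and 10 in London (UTC+0).
audience = [-5] * 90 + [0] * 10
print(best_utc_hour(audience))  # 13, i.e. 8am in New York and 1pm in London
```

A real audience is spread over many zones, so in practice you would look at the whole `counts` curve rather than a single best hour.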
“highly influential people were more effective than the average person in triggering social epidemics. But their importance was far less than the “overall structure of the network”: what matters far more to an idea, candidate, or product going viral is that the networks of people are easily influenced and networking with others who are easily influenced.”
“Twitter mega-influencers did generate greater cascades, but not regularly. Their “hits” were sporadic and inconsistent, while newer and less influential Twitter users had breakout retweets because of the subject, topic, or timing.”
Sinan Aral, an assistant professor at NYU’s Stern School of Business and an authority on social contagion, studies how identifying susceptible members can help predict influence. The network is chaotic, and what generates large information flows can be sporadic and inconsistent. By focusing on understanding a group of users, how they’re interconnected, when they’re active and which topics “activate” them (what they’re susceptible to), we can start to see patterns emerge.
A Bit about SocialFlow
SocialFlow is a technology startup in New York City that optimizes publishing to Twitter and Facebook for media outlets and brands. Let’s say you’re The Economist, with hundreds of articles published to your website on a daily basis. How do you choose what to post to Twitter/Facebook, and when? There are clear diminishing returns the more you publish to social channels: you see substantially fewer clicks per shared link, and more unsubscribes if you overload people’s feeds with your content. So you have to pick out a few articles and make sure to post them at the right times of day.
This is exactly what we do. We take in a feed of content that could be published to Twitter and Facebook, and based on a whole slew of metrics, we decide which article to post and when to post it. How exactly do we do this, you may ask?
SocialFlow is a data powerhouse. We ingest around 2TB of data per day. We work very closely with Twitter and consume what’s called the public firehose, receiving every publicly posted tweet into our systems in real time. We then have multiple systems that index, track and count various attributes of this data. For example, we care deeply about audiences, so we run a wide array of stats on them (e.g. the followers of a given account).
At SocialFlow we use a number of metrics to try to predict which piece of content is most likely to yield the highest level of response at any given point in time. We look at audience activity: how active is an audience at a given moment, and who in the audience is active? We also look at historical activity: what has activated my audience in the past, and what have folks retweeted before? And in general, we look at what’s happening in the network: is activity peaking out of the ordinary? Are there conversations taking off in unusual ways?
We constantly look at the impact of events on the network. By understanding what’s normal, we can better identify events that deviate from the norm. This gives us the ability to quantify the impact of an event, or its “influence” on the network. In my presentation, I show a number of examples: a major football game, the Aurora, Colorado shooting and Whitney Houston’s death. In each case we see clear deviations from the norm. The football game and Whitney Houston’s death each show a distinct pattern, one typical of a sports match and the other typical of a breaking news event. The Aurora shooting displays a very different curve, for reasons illustrated in this blog post.
If we go back to our definition of influence, it is important for us to understand what the network normally looks like, so that we can identify deviations from the norm. In each case we can quantify the level of influence an event had on the network by comparing it to the norm. Next I highlight event classification. The better we get at classifying an event into one of several bins, the better we understand its attributes: how long a trend will persist, when it will most likely peak, how fast it will decline and how far (geographically) it will spread. We also identify points in time when audiences are in “sync”, focused on a single topic, versus points in time with much more volatility, when many topics are at play.
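As a rough illustration of how those three signals could be combined, here is a toy scoring function. It is not SocialFlow’s model: the multiplicative form, the `trend_boost` value and every name (`Candidate`, `expected_response`, `pick_next`) are assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    topic: str


def topic_affinity(topic, past_engagement):
    """Share of the audience's past engagement (e.g. retweets) that went to this topic."""
    total = sum(past_engagement.values())
    return past_engagement.get(topic, 0) / total if total else 0.0


def expected_response(candidate, active_fraction, past_engagement, trending_topics,
                      trend_boost=1.5):
    """Hypothetical score combining the three signals described above.

    active_fraction: share of the audience active right now (audience activity)
    past_engagement: topic -> past retweet count (historical activity)
    trending_topics: topics currently spiking in the wider network
    trend_boost: an arbitrary multiplier chosen for illustration
    """
    score = active_fraction * topic_affinity(candidate.topic, past_engagement)
    if candidate.topic in trending_topics:
        score *= trend_boost
    return score


def pick_next(candidates, active_fraction, past_engagement, trending_topics):
    """Return the candidate with the highest expected response."""
    return max(candidates, key=lambda c: expected_response(
        c, active_fraction, past_engagement, trending_topics))


queue = [
    Candidate("Debate recap", "politics"),
    Candidate("New phone review", "tech"),
    Candidate("Match preview", "sports"),
]
history = {"politics": 50, "tech": 40, "sports": 10}
print(pick_next(queue, 0.4, history, trending_topics={"tech"}).title)  # New phone review
print(pick_next(queue, 0.4, history, trending_topics=set()).title)     # Debate recap
```

Because `active_fraction` scales every candidate equally, it never changes which article wins in this sketch; it only matters when comparing the same article across different posting times. Here the tech piece wins only because its topic is trending; with no trending topics the politics piece comes out on top.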
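One common, simple way to define “normal” is to compare each time bucket against the same bucket on previous days and flag large z-scores. The post doesn’t say how SocialFlow models its baseline, so treat this as a generic anomaly-detection sketch; the threshold of 3 standard deviations, the five-day baseline, the sample numbers and the function names are all assumptions.

```python
import math
import statistics


def zscore(observed, baseline):
    """How many standard deviations `observed` sits from the baseline mean."""
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # sample standard deviation; needs >= 2 points
    if sd == 0:
        return 0.0 if observed == mean else math.copysign(math.inf, observed - mean)
    return (observed - mean) / sd


def flag_deviations(series, baselines, threshold=3.0):
    """Indices of time buckets whose volume deviates from that bucket's own norm."""
    return [i for i, (x, base) in enumerate(zip(series, baselines))
            if abs(zscore(x, base)) > threshold]


def excess_volume(series, baselines):
    """A crude 'impact' figure: total volume above the normal level for each bucket."""
    return sum(max(0.0, x - statistics.mean(base)) for x, base in zip(series, baselines))


# Hypothetical tweets-per-minute for one topic; each bucket's baseline is
# the same minute on five previous days.
baselines = [[100, 110, 90, 105, 95]] * 4
series = [102, 98, 400, 180]
print(flag_deviations(series, baselines))  # [2, 3]
print(excess_volume(series, baselines))    # 382.0
```

The shape of the flagged stretch (how fast it rises, when it peaks, how slowly it decays) is the kind of signal one could then feed into event classification.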
Networked Audiences and Information Flows
Next we take a look at the shape of an audience. One question I’m very interested in is whether a highly clustered network is more susceptible to the spread of information than a less dense network. In the case of Kony 2012, we identified pre-existing communities among the initial users who heavily shared the video. These different parts of the network “lit up” at the same time, getting the topic trending across different cities simultaneously and generating a snowball effect. This wasn’t simply a viral video that was randomly placed online and spread like wildfire, but rather the effect of a highly organized group and a pre-existing network that was set on spreading the content.
Similarly, we see different events “light up” the part of the network that’s relevant to the context of the event. Coupons and deals light up one part of the population, the political debates another. Each group that lights up is susceptible within that context.
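To put a number on “how clustered” an audience is, the standard measures are the local clustering coefficient (the fraction of a user’s connections who are also connected to each other) and network density. The sketch below computes both in plain Python, treating ties as undirected for simplicity even though Twitter follows are directed; the graphs and function names are hypothetical. It measures clustering only; whether a clustered audience actually spreads information better is the open question above, and this code doesn’t answer it.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from (u, v) pairs (a simplifying assumption;
    follow relationships on Twitter are actually directed)."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def local_clustering(graph, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(graph, n) for n in graph) / len(graph)


def density(graph):
    """Share of all possible edges that actually exist."""
    n = len(graph)
    edges = sum(len(nbrs) for nbrs in graph.values()) / 2
    return 2 * edges / (n * (n - 1)) if n > 1 else 0.0


# Two tight-knit communities joined by a single tie, versus a loose ring.
communities = build_graph([("a", "b"), ("b", "c"), ("a", "c"),
                           ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")])
ring = build_graph([("a", "b"), ("b", "c"), ("c", "d"),
                    ("d", "e"), ("e", "f"), ("f", "a")])
print(round(average_clustering(communities), 3), round(density(communities), 3))  # 0.778 0.467
print(average_clustering(ring), density(ring))  # 0.0 0.4
```

Note that the two graphs have almost the same density but very different clustering, which is why looking at density alone can hide the community structure described in the Kony 2012 example.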
Next I illustrate two examples of information flows. In the first, showing how a hashtag spreads, it is clear that the node with the most followers (a.k.a. the “influencer”) is not the most important node in the flow, but rather the node bridging between the original content creator and this highly followed node. Without this bridge, the information would never have spread, hence the node with the most influence within this specific flow is not necessarily the most highly followed, but rather the best positioned in terms of network and interest.
The second example is the case of @KeithUrbahn and the breaking news about the Osama bin Laden raid. Two users played a very important role in this information cascade by re-contextualizing the information coming from Keith Urbahn and lending it their trust. Both @JakeSherman and @BrianStelter saw Keith’s tweet and wrote that he was a trusted source due to his close connection to Donald Rumsfeld. That kind of context requires a little more digging, but when supplied at the right time it helps the network gain trust, and the information spreads at an incredibly rapid rate.
Instead of focusing on lists of “influencers”, think about the network of users you’re trying to understand. How does it usually behave, and when does it deviate from the norm? Think about audience receptivity: what topics light up your fans or followers? In aggregate there are definite patterns here. Think about the network attributes of your audience: its shape, how clustered its users are, and who its most central users are.
Think about bridges and connectors, those who can help take a piece of information from an interesting source to users with an audience. And always keep in mind the outcome you’re trying to achieve. Whether it’s clicks, web traffic or product purchases, influence should always be mapped to a desired action from a chosen population.
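The bridge in the hashtag example can be made concrete with a simple knock-out test: remove each account from the flow and count how much of the cascade the original creator loses. This is a cruder stand-in for standard brokerage measures such as betweenness centrality, and the toy follower graph and names below are hypothetical, not data from the talk.

```python
from collections import deque


def reach(graph, source, removed=None):
    """Every account the source's post can reach by following edges
    (u -> v means v follows u), optionally pretending one account is absent."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def criticality(graph, source):
    """For each reachable account, how many accounts (itself included) drop
    out of the source's reach if that account is removed from the network."""
    full = reach(graph, source)
    return {node: len(full) - len(reach(graph, source, removed=node))
            for node in full if node != source}


# Hypothetical flow: the creator is followed by a bridge account and one other user;
# the bridge is followed by a highly followed account with five followers of its own.
follows = {
    "creator": ["bridge", "c1"],
    "bridge": ["celeb"],
    "celeb": ["f1", "f2", "f3", "f4", "f5"],
}
most_followed = max(follows, key=lambda n: len(follows[n]))
scores = criticality(follows, "creator")
print(most_followed)                      # celeb
print(max(scores, key=scores.get))        # bridge
print(scores["bridge"], scores["celeb"])  # 7 6
```

The account with the most followers is not the one the cascade depends on most: without the bridge, the highly followed account never sees the post at all.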
Hope this is helpful. Slides below: