#Sandy: Social Media Mapping

Hurricane Sandy was the largest Atlantic hurricane on record, devastating parts of the Caribbean as well as the US East Coast. The loss of human life, displacement of families and damage to homes and businesses on the East Coast were unprecedented. During the storm and its aftermath, social media has been critical in spreading important information and mobilizing relief efforts. Some argue that Twitter was more accessible and far-reaching than any TV network, with over 20 million Tweets posted during the height of the storm.

Folks on Twitter posted a wide range of questions about the storm, and in this post we try to examine as many as we can. We look at prominent hashtags, URLs, devices used and topics shared over the past week around Hurricane Sandy. We show a map of user locations as they lose power in their homes. Finally, we discuss a case where misinformation spread and how incredibly fast the network caught on.

Blackout map based on Tweets (interactive version below)


Data Behind the Vice Presidential Debate

I had the honor of presenting some data from yesterday’s VP debate on Bloomberg TV’s “Money Moves” show. I showed some Twitter visualizations highlighting the different topics at play, as well as people’s perception of who won the debate (hint: a surprising number of people think Ryan beat Biden…). More below:



A Networked Take on Social Influence

Last week I had the honor of giving the opening keynote at Dalhousie University’s symposium on measuring influence on social media. It’s not common to see folks from industry keynoting academic events, so I was shocked when Anatoliy Gruzd from Dal’s Social Media Lab asked if I’d open the symposium. I think a lot about the topic of influence, and have done a lot of work untangling what can be measured through data. Below I’m attaching a rough crib of my presentation, as well as my slide deck:

————————

The promise of data brings us hope that we can finally quantify the effects of social influence, giving us the opportunity to place a better price tag on certain digital spaces or interactions, potentially making our ecosystem much more efficient. We can finally attempt to answer questions such as: how are people activated, and what causes folks to purchase a product or pass along a piece of information?

Marketers and media alike tend to generate hype around status affordances which are plastered all over social network sites. These are metrics such as – number of followers, mentions, comments, fans, and so on – used within social network spaces to highlight user status. It is easy to get swept away by these readily available metrics without necessarily knowing what they mean (if you haven’t seen this yet, check out Colbert’s Internet Numbo-Tron 3000 skit: when tracking tweets per minute means… absolutely nothing!).

 

Influence as an Exposed Metric

I like the following definition of influence in social spaces:

The ability to disproportionately affect the spread of information.

In my work I’m extremely interested in how information spreads. For this reason, I look for points of influence when users get others to be attentive to a piece of information or media. If you’re a consumer brand, interesting points of influence for you are cases where a friend gets another to purchase an item. There’s always a wanted outcome in the form of an action: information spread, purchasing an item, viewing a TV show, etc.

Yet influence as an exposed metric is problematic for many reasons. We don’t think of providing a simple quantifiable measure for love, hate or trust. Yet we expect to do so with influence. Can you tell me how much of your thinking is *innately* yours? What percentage of your thoughts are a direct result of advertising campaigns? What made it into your head because of peers and what are your original thoughts? Some say that influence has more to do with what is unconscious, the ways in which our brain picks up bits of information and assembles them into an opinion or preference.

On top of that, people aren’t necessarily rational in their approach to trust. I may trust someone and continue to be influenced by their recommendations despite past transgressions. Some may bring influence from outside the network – a celebrity, a public figure. How does the fact that they attain influence outside the observed network affect our measurement? I haven’t seen anyone able to quantify and match the effects of influence across networks. And what about context? I shouldn’t be deemed an influencer on “popcorn” just because my tweet from the theater was retweeted by others (*cough* Klout *cough*).

Social recommendations happen between peers, friends and family members all the time. This is not new. What’s different now is that these moments of influence may be visible to us through the lens of data.

 

Networked Influence

The key to understanding influence is to look at the system as a whole, and think about users and how they’re interconnected rather than trying to identify specific people, or “influencers”. Users serve as information brokers, choosing what to give their attention to. But what drives these choices? And more importantly, can they be predicted?

I’m interested in a broader notion of influence. Not strictly peer to peer, or lists of these so-called “influencers”, but rather the effect on a community. I think of influence in the context of a networked ecosystem. Can we identify network attributes that create a higher likelihood for our wanted outcome? Can we figure out points in time when the network comes together in ways that will most likely help a message spread? An obvious but effective attribute is time of day. If your audience is mostly located within a certain geographic region, it’s best to publish content during the day (in that timezone), or else the majority of your audience will be sleeping. That’s just the start.
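To make the time-of-day point concrete, here is a minimal sketch that counts audience activity by local hour. The function name, the sample timestamps and the fixed UTC offset are all assumptions made for illustration; a fixed offset ignores daylight saving time, and real audiences usually span several timezones.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def busiest_local_hours(utc_timestamps, utc_offset_hours, top_n=3):
    """Return the top_n local hours (0-23) with the most audience activity."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    counts = Counter(ts.astimezone(local_tz).hour for ts in utc_timestamps)
    return [hour for hour, _ in counts.most_common(top_n)]


# Hypothetical activity timestamps (UTC) for an audience assumed to sit at UTC-4.
sample = [
    datetime(2012, 10, 1, 13, 5, tzinfo=timezone.utc),
    datetime(2012, 10, 1, 13, 40, tzinfo=timezone.utc),
    datetime(2012, 10, 1, 17, 15, tzinfo=timezone.utc),
    datetime(2012, 10, 1, 17, 45, tzinfo=timezone.utc),
    datetime(2012, 10, 2, 13, 20, tzinfo=timezone.utc),
    datetime(2012, 10, 2, 2, 0, tzinfo=timezone.utc),
]

best_hours = busiest_local_hours(sample, utc_offset_hours=-4, top_n=2)
print(best_hours)  # local hours, busiest first
```

With this toy sample, most activity lands at 9am local time, so that would be the first slot to try.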

Based on recent experiments, Duncan Watts and Peter Dodds claim that going viral has more to do with the receptivity of an audience than with the people doing the sharing, tagging and endorsing. They claim that the role of “influencers” has been overstated:

“highly influential people were more effective than the average person in triggering social epidemics.  But their importance was far less than the “overall structure of the network”:  what matters far more to an idea, candidate, or product going viral is that the networks of people are easily influenced and networking with others who are easily influenced.”

“Twitter mega-influencers did generate greater cascades, but not regularly.  Their ”hits” were sporadic and inconsistent, while newer and less influential Twitter users had breakout retweets because of the subject, topic, or timing.”

Sinan Aral, an assistant professor at NYU’s Stern School of Business and an authority on social contagion, studies the ability to identify susceptible members as a way to predict influence. The network is chaotic, and what generates large information flows can be sporadic and inconsistent. By focusing on understanding a group of users, how they’re interconnected, when they’re active and what topics “activate” them (what they’re susceptible to), we can start seeing patterns emerge.

 

A Bit about SocialFlow

SocialFlow is a technology startup in New York City that optimizes publishing to Twitter and Facebook for media outlets and brands. Let’s say you’re The Economist, and you have hundreds of articles published to your website on a daily basis. How do you choose what to post to Twitter/Facebook and when to do that? It is clear that there are diminishing returns the more you publish to social channels: you see substantially fewer clicks per shared link, and more unsubscribes, if you overload people’s feeds with your content. So you have to pick out a few articles and make sure to post them at certain times of the day.

This is exactly what we do. We take in a feed of content that could be published to Twitter and Facebook, and based on a whole slew of metrics, we decide which article to post and when to post it. How exactly do we do this, you may ask?

SocialFlow is a data powerhouse. We ingest around 2TB of data per day. We work very closely with Twitter and consume what’s called the public firehose, receiving every publicly posted tweet into our systems in realtime. Then we have multiple systems that index, track and count various attributes of this data. For example, we care deeply about audiences, so we run a wide array of stats on audiences (e.g. followers of a given account).

 

The Data

At SocialFlow we use a number of metrics to try and predict which piece of content is most likely to yield the highest level of responses at any given point in time. We look at audience activity: how active is an audience at any given point in time, and who from the audience is active? We also look at historical activity: what has activated my audience in the past? What have folks retweeted in the past? And in general, what’s happening in the network: is it peaking out of the ordinary? Are there conversations that are taking off in unusual ways?

We constantly look at the impact of events on the network. By understanding what’s normal, we can better identify events that deviate from the norm. This gives us the ability to quantify the impact of an event, or its “influence” on the network. In my presentation, I present a number of examples: a major football game, the Aurora, Colorado shooting and Whitney Houston’s death. In each case we see clear deviations from the norm, and identify a unique pattern: one representing a typical sports match, another a typical breaking news event. The Aurora shooting displays a very different curve, due to consequences illustrated by this blog post.

If we go back to our definition of influence, it is important for us to understand what the network normally looks like, so that we can identify deviations from the norm. In each case we can quantify the level of influence an event had on the network by comparing it to the norm. Next I highlight event classification. The better we get at classifying an event into one of multiple bins, the better we understand its attributes: how long a trend will persist, when it will most likely peak, how fast it will decline and how far (geographically) it will spread. We identify points in time where audiences are in “sync”, focused on a single topic, versus points in time where there’s much more volatility and many topics are at play.
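One simple way to put a number on "deviation from the norm" is a z-score against a baseline period. This is only an illustrative sketch, not a description of SocialFlow's systems; the function names, the threshold of 3 and the counts are all made up for the example.

```python
from statistics import mean, stdev


def deviation_scores(baseline, observed):
    """Z-score of each observed count relative to the baseline period."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has no variance; z-scores are undefined")
    return [(x - mu) / sigma for x in observed]


def flag_events(baseline, observed, threshold=3.0):
    """Indices of observations more than `threshold` deviations above normal."""
    scores = deviation_scores(baseline, observed)
    return [i for i, z in enumerate(scores) if z > threshold]


# Hypothetical tweets-per-minute counts: an ordinary period, then an event.
baseline_counts = [100, 110, 90, 105, 95, 100]
event_counts = [102, 98, 450, 300, 120]

flagged = flag_events(baseline_counts, event_counts)
print(flagged)  # positions in event_counts that stand out from the norm
```

The size of the score gives a rough measure of how strongly an event moved the network, and the shape of the scores over time is the kind of curve one could feed into event classification.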
 

Networked Audiences and Information Flows

Next we take a look at the shape of an audience. One question that I’m very interested in is whether a highly clustered network is more susceptible to the spread of information than a network that is less dense. In the case of Kony 2012 we identified pre-existing communities amongst the initial users who heavily shared the video. These different parts of the network “lit up” at the same time, getting the topic trending across different cities at once and generating a snowball effect. This wasn’t simply a viral video that was randomly placed online and spread like wildfire, but rather the effect of a highly organized group and a pre-existing network that was set on spreading the content.
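For readers who want to measure how clustered an audience is, one common yardstick is the average local clustering coefficient: for each user, the fraction of pairs of their connections that are also connected to each other. The sketch below assumes an undirected friendship graph stored as a dictionary of neighbour sets; the two toy graphs are invented for illustration.

```python
from itertools import combinations


def average_clustering(adjacency):
    """Mean local clustering coefficient of an undirected graph.

    adjacency maps each node to the set of its neighbours.
    Nodes with fewer than two neighbours contribute 0.
    """
    total = 0.0
    for node, neighbours in adjacency.items():
        k = len(neighbours)
        if k < 2:
            continue
        # Count pairs of neighbours that are themselves connected.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adjacency)


# Toy graphs: a tight triangle of friends versus a hub with three followers.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}

print(average_clustering(triangle), average_clustering(star))
```

The triangle scores as fully clustered and the star as not clustered at all, which is the contrast between a pre-existing community and an audience that only shares one account in common.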

Similarly, we see different events “light up” the part of the network that’s relevant to the context of the event. Coupons and deals light up one part of the population, while the political debates light up another. Each group that’s lit up is susceptible within that context.

Next I illustrate two examples of information flows. In the first, showing how a hashtag spreads, it is clear that the node with the most followers (a.k.a. the “influencer”) is not the most important node in the flow, but rather the node bridging between the original content creator and this highly followed node. Without this bridge, the information would never have spread, hence the node with the most influence within this specific flow is not necessarily the most highly followed, but rather the best positioned in terms of network and interest.
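The "bridge" idea can be checked mechanically: remove a node and see whether the information can still get from the original creator to the highly followed account. The sketch below does this with a plain breadth-first search. The graph, the node names and the helper functions are hypothetical, and this is just one way to find such nodes (betweenness centrality is another common choice).

```python
from collections import deque


def reachable(graph, start, goal, removed=None):
    """True if goal can be reached from start without passing through `removed`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def critical_bridges(graph, source, target):
    """Nodes whose removal cuts every path from source to target."""
    candidates = set(graph) | {n for outs in graph.values() for n in outs}
    candidates -= {source, target}
    return sorted(
        n for n in candidates if not reachable(graph, source, target, removed=n)
    )


# Hypothetical flow: an edge u -> v means v picked the hashtag up from u.
flow = {
    "creator": ["friend", "bridge"],
    "bridge": ["celebrity"],
    "celebrity": ["fan1", "fan2"],
}

bridges = critical_bridges(flow, "creator", "celebrity")
print(bridges)
```

In this toy flow the account named "bridge" has no large following of its own, yet nothing reaches the celebrity without it.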

The second example is the case of @KeithUrbahn in the breaking news about the Osama Bin-Laden raid. Two users played a very important role in this information cascade by re-contextualizing the information coming from Keith Urbahn and lending it their trust. Both @JakeSherman and @BrianStelter saw Keith’s tweet, and wrote that he is a trusted source due to his close connection to Donald Rumsfeld. This is information that requires a little more digging but, when surfaced at the right time, helps the network gain trust, so that information spreads at an incredibly rapid rate.

 

Finally

Instead of focusing on lists of “influencers”, think about the network of users that you’re trying to understand. How does it usually behave, and when does it deviate from the norm? Think about audience receptivity: what topics light up your fans or followers? In aggregate there are definite patterns here. Think about the network attributes of your audience: its shape, how clustered users are and who your most central users are.

Think about bridges and connectors, those that can help take a piece of information from an interesting source to users with an audience. And always keep in mind the outcome that you’re trying to attain. Whether clicks, web traffic or product purchases, influence should always be mapped to a wanted action from a chosen population.

Hope this is helpful. Slides below:

 

Social Media and the Presidential Debate

Earlier this week I was invited to participate in Bloomberg TV’s Market Makers to talk about data from last week’s presidential debate. The segment was shot the morning after the debate. Even with such short notice, we managed to show a few interesting views of the data:

1. Even though Romney is said to have won the debate, when you look at social data, #Obama2012 appears much more prominent and central. This might be happening because there are more users on Twitter rooting for Obama. Or perhaps this reflects a much more organized campaign, using a single hashtag for all of their communications.

2. We can clearly identify two different topic spaces amongst the Republicans – one is Romney’s campaign, and the other, Tea Party / #tcot. The conversation around Romney is much more fragmented than the conversation around Obama.

3. We observe three dominant clusters of users from Ohio discussing the debates. There was a clear political cluster, a media cluster, and surprisingly, a significant cluster of users from Ohio State University.

Video of the interview and the graphs are embedded below:



Some more information about the graphs:

First, I highlighted a simple graph showing the different curves that represent each of the prominent debate hashtags. Obviously #debates was substantially larger than #Obama2012, #Romney2012 and even #BigBird. That said, the fact that the other hashtags didn’t spike as much doesn’t mean they were not dominant within the discussion online.

Next I presented a network graph that maps out prominent hashtags and user mentions during the first presidential debate. It is clustered by modularity, which means that hashtags/user mentions that appeared together at higher than usual levels share the same color.
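For the curious, modularity is a score for how well a grouping of nodes matches the graph: it compares the weight of edges inside each group to what you would expect if edges were placed at random. The sketch below only scores a given grouping; it does not search for the best one, which is what clustering tools do. The node names, edge weights and grouping are placeholders, not data from the debate.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of a weighted undirected graph.

    edges: list of (u, v, weight) tuples, each undirected edge listed once.
    community: dict mapping each node to its community label.
    """
    m = sum(w for _, _, w in edges)  # total edge weight
    inside = defaultdict(float)      # edge weight inside each community
    degree = defaultdict(float)      # summed node degree per community
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Placeholder co-occurrence graph: two tight triangles joined by one edge.
edges = [
    ("#a", "#b", 1), ("#b", "#c", 1), ("#a", "#c", 1),
    ("#d", "#e", 1), ("#e", "#f", 1), ("#d", "#f", 1),
    ("#c", "#d", 1),
]
two_groups = {"#a": 0, "#b": 0, "#c": 0, "#d": 1, "#e": 1, "#f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is why the two triangles would end up in different colors.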

Here’s a zoomed in version:

And here’s #bigbird / #pbs:

The next graph maps out the friend/follow relationships between a segment of users who were discussing the #debates on Twitter. In this case, we see users from Ohio, or those affiliated with Ohio, and how they’re interconnected. Again, the graph is clustered by modularity, where three distinct clusters emerge.

The first (yellow, top right) seems to be politicos from Ohio, including @JohnKasich (governor), @johnboehner (Ohio congressional rep and speaker of the house) and @robportman (Ohio senator). The second (purple, middle right) consists of Twitter handles that represent local media in Cleveland and across Ohio, such as @clevelanddotcom and @WEWS. The third dominant cluster (green, bottom right) consists of users from Ohio State University, who formed a significant part of the Ohioans discussing the debate.

Thoughts, ideas or suggestions? Find me on Twitter – @gilgul

Weibo, China’s Twitter, Abuzz with Sentiment Over Liu Xiang’s Olympic Fail

Liu Xiang’s 110m hurdles race was one of the most anticipated Olympic races for audiences across Mainland China. Shanghai-born Liu Xiang has emerged as one of China’s most visible cultural icons, being the first Chinese athlete to achieve the “triple crown” of athletics: World Record Holder, World Champion and Olympic Champion. During the 2008 Beijing Olympics Liu had to withdraw from the competition at the last moment due to a previously unrevealed injury. In this past week’s hurdles race Liu pulled his Achilles tendon while taking off. He attempted to jump over the first hurdle but crashed straight into it. He then hopped the full 110 meter stretch, helped by fellow athletes.

This drew numerous reactions online, and was heavily covered across western media outlets. However, only a few covered sentiment and reactions coming from Mainland China. This article posted on China Hush highlights some of the positive support coming from Chinese users, yet we wanted to see a more complete view of the story.

In the following post, we analyzed over 150k Weibo reactions to Liu Xiang’s race. We identified dominant words and user sentiment posted in reaction to the Chinese Olympian’s failure to complete the race. We see an uproar of support, but at the same time a wide range of critiques coming out of Weibo users in China. For the first time we have an ability to quantify and analyze sentiment from such a large population within China; a peek into the sentiment of China’s public sphere.


Big Data for Breaking News: Lessons from #Aurora, Colorado

On July 20th we were glued to our computer screens obsessively tracking information coming out of Aurora, Colorado about the terrible shooting that took place. When breaking news events occur, we try to grab as much data as we can, to see what we can learn about the event. One of the simplest plots that we create is hashtag/word usage over time. When we looked at the #Aurora hashtag, we saw a really odd shape:

Why odd? Breaking news, especially of this sort, tends to look quite different. As news spreads about an event, conversation tends to spike and organically fall (in mathematical terms, the discrete observed frequencies at various time intervals follow a Zipfian distribution). In this case, the #Aurora hashtag started being used right after the shooting took place (7am UTC), and slowly rose as the US East Coast woke up to the news (12pm UTC). Now notice the massive spike that happens around 6pm UTC. It is more than five times the height of the average beforehand, rapidly rising to 1,000 tweets per minute!
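The "five times the average beforehand" comparison is simple arithmetic: divide the spike by the mean of the window leading up to it. A small sketch, where the function name and the per-minute counts are invented and are not the actual #Aurora data:

```python
def spike_ratio(counts, spike_index, window):
    """Ratio of counts[spike_index] to the mean of the `window` points before it."""
    prior = counts[max(0, spike_index - window):spike_index]
    if not prior:
        raise ValueError("no observations before the spike")
    baseline = sum(prior) / len(prior)
    if baseline == 0:
        raise ValueError("baseline is zero; ratio is undefined")
    return counts[spike_index] / baseline


# Hypothetical tweets-per-minute samples ending in a sudden jump.
tweets_per_minute = [150, 180, 200, 190, 180, 1000]

ratio = spike_ratio(tweets_per_minute, spike_index=5, window=5)
print(round(ratio, 2))
```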


PDF12 Keynote: Networked Power

Last week I gave a keynote at the Personal Democracy Forum (#PDF12) in New York City - Networked Power: what we learn from data. PDF is an incredible gathering of some of the smartest folks working on understanding the idea of Personal Democracy, where every citizen is a full participant. In my presentation, I focused on the networked characteristic of our media ecosystem and the new form of power that networks attain. I described ways in which we at SocialFlow analyze networked audiences, mapping out attributes such as activity levels, topical interest, engagement and their evolving friendship-based shape. Some key points from the presentation:

  1. My Network != Your Network: networks of followers and audiences differ substantially in time of day, activity, engagement and shape.
  2. Overarching generalizations will most likely be misleading.
  3. We need to better understand networked dynamics such as information flows, audience intersections and the effect of algorithmic curation.
  4. Networked spaces are NOT a pure meritocracy: certain positions are advantageous.

Analyzing UNICEF’s #SahelNow Campaign

A couple of weeks ago we were pleased to spend some time with the folks from UNICEF, analyzing and discussing their #SahelNow campaign. The campaign is focused on drawing attention towards the food crisis unfolding in the Sahel region in West and Central Africa. The campaign’s goal is to rush food, nutrition and other emergency relief to help children in the region. There is an urgent need for help from the public, and #SahelNow is an attempt to alert the world about this looming crisis.  SocialFlow supports the effort to enlist people around the world to help to sound the alarm.

The campaign has seen a substantial rise in references, including participation from a number of major celebrities. The SocialFlow research team helped UNICEF analyze and understand hashtag usage across Twitter by looking at a few different aspects of the data:

  1. Time Series Data: by mapping out levels of hashtag references we could identify prominent points in time where the conversation was spiking out of the ordinary
  2. Phrase Co-occurrence: we generated a network graph view of all the related concepts referenced in Tweets along with the hashtag (concepts = phrases, other hashtags, users)
  3. Friendship Graph: we extracted the underlying network of relationships amongst those users who referenced the #SahelNow hashtag, in effect identifying dense clusters of users who were actively promoting the cause in their region.
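As a rough illustration of the co-occurrence step, the sketch below turns tweets into weighted edges between concepts that appear together with a given hashtag. It is a simplification, not the pipeline we ran: it only picks up hashtags and user mentions (phrase extraction is left out), and the regular expression, function name and sample tweets are made up for the example. The resulting edge list is the kind of input a graph tool can load.

```python
import re
from collections import Counter
from itertools import combinations

# Matches hashtags and @mentions only; phrases are not handled in this sketch.
TOKEN = re.compile(r"[#@]\w+")


def cooccurrence_edges(tweets, anchor):
    """Count how often two concepts appear in the same tweet as `anchor`."""
    edges = Counter()
    anchor = anchor.lower()
    for text in tweets:
        concepts = sorted({t.lower() for t in TOKEN.findall(text)})
        if anchor not in concepts:
            continue
        for a, b in combinations(concepts, 2):
            edges[(a, b)] += 1
    return edges


# Invented sample tweets.
sample_tweets = [
    "Help children in the Sahel #SahelNow #foodcrisis @UNICEF",
    "#SahelNow needs attention #foodcrisis",
    "Unrelated tweet #lunch",
]

edges = cooccurrence_edges(sample_tweets, "#SahelNow")
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```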

Here’s a video highlighting some of the data manipulation we ran using Gephi:

And attached below are a number of screenshots we took during the analysis:

Image 1: Whole network graph of concepts related to the #SahelNow hashtag (blue = user mentions, light green = hashtags, dark green = phrases)

Image 2: Hovering over the #foodcrisis hashtag shows all of the related concepts, effectively other hashtags, user names or phrases referenced with #foodcrisis in the same tweet.

Image 3: Relationship graph showing connections between users who posted to the hashtag. Note the dense clusters that emerge highlighting different regional and topical communities that reference the campaign.

Image 4: Zoom-in view of the Bahraini group – a dense cluster of mostly Bahraini users sharing the #SahelNow hashtag on Twitter.

Questions? Feel free to ping me on Twitter | @gilgul

The Tweet Comes Back! (עוד חוזר הציוץ)

This article was recently published on Globes, the leading Israeli business/market paper, covering our research on information flows on Twitter during the Tunisian and Egyptian revolutions. Tal Schneider (@talschneider), the article’s author, was especially interested in our findings around homophily: journalists tend to retweet other journalists, bloggers retweet bloggers, etc. Tal does a great job explaining the shift happening in news consumption, with many journalists’ personal accounts becoming significantly more successful than those of their respective media outlets.

Now the question is whether, and when, Twitter will finally hit Israel. Now that there’s Hebrew support, language isn’t as big of an issue. Perhaps with the upcoming resurgence of the social justice movement?