The Tweet Comes Back! (Hebrew title: “The Tweet Returns Once More”)

This article was recently published in Globes, the leading Israeli business and markets newspaper, covering our research on information flows on Twitter during the Tunisian and Egyptian revolutions. Tal Schneider (@talschneider), the article’s author, was especially interested in our findings around homophily: journalists tend to retweet other journalists, bloggers retweet bloggers, and so on. Tal does a great job explaining the shift happening in news consumption, with many journalists’ personal accounts becoming significantly more successful than the accounts of their respective media outlets.

Now the question is whether, and when, Twitter will finally take off in Israel. With Hebrew support in place, language is no longer as big of an issue. Perhaps with the upcoming resurgence of the social justice movement?

A Tale of Three Rumors

This is a cross post for the Truthiness in Social Media Symposium which took place at Harvard University, March 6th 2012.

The more we use social media, the more seasoned we become at assessing the trustworthiness of the information we come across. With rumors constantly flying around, celebrities are often mistakenly reported dead, while every little move Apple makes triggers an onslaught of buzz around new product features.

This post details three rumors, each with its own path, source, evolution and outcome. One makes it far, cascading through networks of users, fans and followers who decide to amplify and pass on the message, another makes it far but is found to be false, while the third quickly dies. What can we learn from their differences? How can we improve our ability to recognize a false piece of information in realtime?

 

Adding Context, Gaining Trust

This is a classic example of a viral information flow: 1) user provides an important piece of information to a hungry crowd 2) it spreads like wildfire 3) the information turns out to be true.

On the evening of May 1st, 2011, Keith Urbahn broke the news about Osama Bin Laden’s death, beating the official White House announcement by a full hour. He wasn’t the first to speculate about Bin Laden’s death, but he was the one who gained the most trust from the network. Within a minute of @KeithUrbahn’s original post, it was validated and placed in context by two important players. Politico’s Jake Sherman wrote:

and Brian Stelter of the New York Times added:

Both played a critical role in the flow, adding context about the source and vouching for it to their own audiences. A few others had speculated before Keith Urbahn, yet none did so with a trustworthy tone, and none drew trust from the network.

 

The Aspired Truth

On November 17th 2011, The New York City NBC Twitter handle (@NBCNewYork) posted the following tweet:

Some context: this came at the height of #N17, Occupy Wall Street’s global day of action, two days after the NYPD evicted the Zuccotti Park occupiers, and a symbolic two months after the first occupation (Sept 17th). The environment was incredibly heated and everyone expected violence to erupt. It was just a matter of when, who and how.

Within 5 minutes of NBC New York’s post, @HuffPostMedia, @CKanal (social media editor at CNBC) and @BrianStelter (NYTimes) reposted the message. @Skidder, the director of editorial operations at Gawker Media, added a confirmation a few minutes later, and the main @NBCNews account (158k followers at the time, roughly eight times as many as @NBCNewYork) posted it as well. Coming from so many reputable sources, the report drew an onslaught of retweets, along with many angry voices spewing “I told you so” and “this is what democracy looks like” messages.

The correction came a few minutes later from the NYPD:

Immediately afterwards, both @NBCNewYork and @NBCNews posted corrections. But by then, the original report was already at the peak of its cascade, and the published correction reached a far smaller audience. People are much more likely to retweet what they want to be true, what reflects their aspirations and values.

Does misinformation always spread further than the correction? Not necessarily; I’ve seen it go either way. But I can safely say that the more sensational a story, the more likely it is to travel far. Often it is the story about the misinformation that spreads, rather than the false information itself (for example, the false Steve Jobs death tweet that cost Shira Lazar her CBS gig).

 

Lacking the Right Network

This is a very different case where information was fed to the network, but didn’t spread. Effectively nobody listened until the press came in.

Many claim that Aja Dior (@AjaDiorNavy) on Twitter broke the unfortunate news about Whitney Houston’s death. While it is true that @AjaDiorNavy’s tweet appeared on Twitter 42 minutes before the news hit the press, its content spread only to a handful of users. The story did not actually break until the AP and TMZ posted the announcement that came through formal channels.

The graph below shows the information flow during the first hour, as the story was publicized. Nodes represent Twitter users, and the connections between them represent the paths along which information flowed. The larger the node, the more retweets it generated; yellow nodes tweeted earlier. The siloed group at the center left (#1) was the first to mention and respond to the news, but the information stayed confined within that group. (source)
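A graph like this can be approximated from a list of retweet edges. The sketch below is a minimal illustration under stated assumptions: the edges, user names (`first_poster`, `outlet_a`, and so on) and timestamps are all hypothetical, node size is approximated as the number of retweets a user generated, and a "silo" is taken to mean a connected component that never links up with the rest of the network.

```python
from collections import defaultdict

# Hypothetical retweet edges (source, retweeter) and minute of each user's first tweet.
# All names and times are invented for illustration; they are not the study's data.
edges = [
    ("first_poster", "f1"), ("f1", "f2"), ("first_poster", "f3"),
    ("outlet_a", "u1"), ("outlet_a", "u2"), ("outlet_b", "u3"),
    ("u1", "u4"), ("outlet_a", "outlet_b"),
]
first_tweet_minute = {
    "first_poster": 0, "f1": 3, "f2": 9, "f3": 12,
    "outlet_a": 42, "outlet_b": 44, "u1": 43, "u2": 45, "u3": 46, "u4": 48,
}

# Node size: number of retweets each user generated (their out-degree here).
retweets_generated = defaultdict(int)
for source, _ in edges:
    retweets_generated[source] += 1

# Ignore edge direction when looking for isolated clusters.
neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

def connected_components(nodes):
    """Group nodes into clusters linked by at least one retweet path."""
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbors[node] - seen)
        components.append(component)
    return components

components = connected_components(sorted(first_tweet_minute))
# The silo that saw the news first is the component holding the earliest tweet.
earliest_silo = min(components, key=lambda c: min(first_tweet_minute[u] for u in c))
largest_node = max(retweets_generated, key=retweets_generated.get)
```

In this toy data the earliest cluster (`first_poster` and friends) never connects to the outlets' cluster: early, but isolated, which is the pattern the #1 group shows.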

 

This is a classic case of information that had the potential to spread, yet did not.

 

Quantifying Trust

The examples above teach us a few things about information and misinformation flows. The first clearly shows that one does not need a large audience, but rather the right kind of people in place who will provide context and generate trust. The second highlights that attention is limited: once a message is published, especially in the context of a heated struggle, it is difficult to retract. Additionally, it is harder to persuade folks that an event is taking place when they don’t want to believe it is. The last teaches us that even one of the hottest pieces of information will not spread without the right network in place. So what can we do to assess the “truthiness” of information in real time, as events are unfolding?

A hybrid approach may be ideal. We can use algorithmic methods to quickly identify and track emerging events: model specific keywords that tend to show up around breaking news events (think “bomb”, “death”) and identify deviations from the norm. At the same time, it’s important to have humans constantly verifying information sources, partly based on intuition and partly by activating their networks.
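As a minimal sketch of the algorithmic half of that approach, assuming we already have per-minute counts of tweets containing a watched keyword, a rolling z-score can flag deviations from the norm. The function name `spike_minutes`, the 10-minute baseline and the threshold of 3 are illustrative assumptions, not tuned values.

```python
from statistics import mean, pstdev

def spike_minutes(counts, baseline_window=10, threshold=3.0):
    """Return the indices (minutes) whose count sits `threshold` standard deviations
    above the mean of the preceding `baseline_window` minutes."""
    flagged = []
    for i in range(baseline_window, len(counts)):
        baseline = counts[i - baseline_window:i]
        mu, sigma = mean(baseline), pstdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: any increase at all counts as a deviation.
            z = float("inf") if counts[i] > mu else 0.0
        else:
            z = (counts[i] - mu) / sigma
        if z >= threshold:
            flagged.append(i)
    return flagged

# Hypothetical per-minute counts of tweets containing a watched keyword such as "death".
counts = [4, 5, 3, 6, 4, 5, 4, 3, 5, 4, 5, 48, 120, 95]
print(spike_minutes(counts))  # → [11, 12]
```

Note that the trailing baseline absorbs the spike itself, so minute 13 is not flagged even though activity is still high. A real system would likely freeze the baseline once an event is detected, and hand flagged spikes to humans for verification rather than treat them as confirmed news.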

Andy Carvin does a phenomenal job building up a network of informants. He is known for tracking events and rumors early on, consistently leveraging his network to confirm and validate rumors. He teaches his network how to question and verify information, constantly learning which sources can be trusted, adding context where needed and pointing out problematic assumptions. Andy has built a complex mental model of his audience and uses it, along with his network of friends and followers, to verify rumors.

As our networks scale in size and complexity, there’s limited capacity to hold everything in our heads. But with a little help from the right type of data analysis tools, I am certain this can scale.

Big Questions in Journalism 2.0

I had the pleasure of participating in the Hacks/Hackers meetup at the Boston Globe last night, where we continued the never-ending Journalism 2.0 discussion. I’m so used to being surrounded by folks from the tech world that this meetup was quite refreshing, as the participants were mostly non-tech journalists. The meetup began with a presentation by the #TNGG (The Next Great Gen) crew. They ended the presentation with a number of pertinent questions, which led to an active discussion. I didn’t take detailed notes, but will try to summarize the important points.

Personal vs. Professional

It’s becoming harder to draw a clear line between personal opinions and the professional “stance”. The ease and expectation of publishing at all times leaves journalists grappling with the blurring boundary between reporting personal experiences and the formal “news”. At the core of this lies the question of objectivity (or should I say the myth?). News *should* come from a formal, trustworthy source that (supposedly) is unbiased and reports events from an objective viewpoint. With the advent of social media, journalists are expected to constantly post snippets of information as they become available, rather than wait for the long-form story to be published. Being first is valuable in an ecosystem where finding information is frictionless. Having a personality on top of that is becoming an invaluable asset. And what’s a personality without an opinion?

What’s the right level of publicness that a journalist should seek? Can a journalist stay objective when reporting on an event from such a personal angle? The decision to mention one aspect of the evolving story versus another, or the angle from which a photo is taken, all contribute to a personal, subjective view of an event. We all recognize how important and engaging this type of content is, drawing new types of audiences to follow the news. Additionally, the context that a journalist can add to information as a story evolves is incredibly valuable. Yet it all chips away at the ideal that news can be unbiased and objective. What are some best practices that journalists can keep in mind when adjusting to this new level of publicness?

Groups vs. Networks

Your networks are your power. The game is no longer about maintaining a group of subscribers, but rather building up networks of folks who are interested in your content, and will vouch for your brand to their peers. One of the journalists at the meetup suggested that various niche publications have perfectly fine subscription rates, with a set group of folks who pay yearly for their magazine. She claimed that for these types of publications, social media might not be useful, as they have a small following, and are wholly focused on a specific niche (think: The Journal of Biological Chemistry).

Very wrong. While some journals and magazines still maintain solid subscription levels, the average age of those subscribers is on the rise. Someone at the meetup asked: wouldn’t you like to double the size of your subscription base? Social media can help expand audiences based on networks of interest, making it possible to reach folks you could never reach otherwise. Information and recommendations flow through people, and subscriptions are made based on relationships or interests. In an interest-based network such as Twitter, people self-organize around topics: you follow someone who interests you, or track the hashtag of an event that you care about. Making sure your articles reach the right type of users on Twitter, and building up the right type of network over time, will do wonders for your publication.

Then came the question of ownership: who do these networks of followers belong to? They’re acquired while a journalist is employed by a media entity, yet the Twitter account belongs to the journalist. Some journalists in the room stated that their account specifically has their employer as a part of the name (e.g. ‘CNN_dave’) drawing a clear connection to their affiliation. Yet user names can be changed, and when someone leaves a media company for another, it is not too difficult to take that account while changing the affiliation. The younger crowd agreed that folks should be checking their employment contracts to make sure they have full ownership of social media accounts. It is a source of empowerment, tipping the scale towards the employee (this is not only happening in media, but across all industries).

Someone else suggested that it’s in the company’s best interest to have its employees active in social media and owning their accounts. When folks leave the company, hopefully on good terms, they will continue to link back to their previous employer. He believes that letting employees grow their audiences, and letting them leave on good terms, will be a net positive in aggregate over time, as they will keep driving visibility and attention back to the company.

Cons

There are obviously many issues we must overcome with regard to social media. One commonly discussed issue is verification: how do you know the information you’re pointing to is valid? The main suggestion here was: don’t be stupid! Just as one would verify sources when writing a story, the same should be done with social media sources. Some said that they ping folks and make sure to hold a Skype call before using content from an anonymous or pseudonymous source. I suggested that, especially with rapid services such as Twitter, the network tends to figure it out very fast. People will call you out on a false piece of information. What tends to spread and be sensationalized is the story about the misinformation, not the actual piece of false information (Chuck Tanowitz tweeted at me about this below):

Additionally, as much as the media world has gone ga-ga over Twitter, it’s important to recognize its biases, for two main reasons:

  1. Not everyone’s on Twitter. While it is becoming more mainstream, the service is still heavily used by certain types of power users. Facebook is a much more representative sample of the population. As Twitter rapidly becomes the default service journalists and media entities use to look up and publish information, it is important to remember who is using the service (and who’s left behind).
  2. Homophily/Filter Bubble – online (as offline) we tend to organize within our familiar neighborhoods. We choose to follow or friend others based on interest, professional circles, friendship or status. Social spaces feed off these networks and recommend content that our friends tend to like. It’s a feedback loop that keeps us from jumping outside our safe, familiar context.

And then there’s the ego boost. Indeed, getting attention in social media spaces does a great job of feeding our ego.

Be humble. No need to brag.

Recent Media Coverage

Over the past couple of weeks I’ve been honored to have my work covered in a number of awesome publications.

Viral Information Flows / MIT Technology Review

Mike Orcutt of MIT’s Technology Review published a fantastic post, Information’s Social Highways, available in this month’s magazine as well as on their website. Mike got in touch with me over the summer. He wanted to highlight various interesting aspects of information dissemination within social networks, including their visual representations. We threw around a number of ideas and agreed that it’d be fantastic to identify a number of interesting information flows that emerged from Twitter, visualize, and highlight similarities and differences in the way their networks had formed.

A quote from the article:

There is no recipe for virality, says Gilad Lotan, head of R&D for a startup called SocialFlow, which aims to help clients from the Economist to Pepsi more effectively capture attention on Twitter. But the deluges of data that viral tweets generate hold potentially valuable insights into how and why certain things spread beyond their author’s network of regular contacts.

The article compared two very different information flows. The first, providing hot information about the Osama Bin Laden operation, was incredibly fast. Within a few minutes, there were over one thousand users reposting the message, along with prominent journalist accounts. In comparison, the second flow is one initiated by my close friend Deb Chachra. In reaction to the London authorities threatening to shut down Twitter during the riots this summer, she posted the following tweet:

Urban rioting existed before SMS/social media. You know what didn’t? Large-scale community cleanups, spontaneously organized within hours.

Her post went viral, but in a very different manner. Over a period of two and a half days, Deb’s tweet saw sustained growth in the number of folks reposting it. Every few hours, the post would get a boost from someone with a large audience who reposted it, and the cycle continued. While in the previous example the path to an important curator (Brian Stelter) took one minute and no more than one hop, in Deb’s case it took several hours and 11 hops before the message reached Graham Linehan (@Glinner), who has a large audience with which the message resonated.
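Hop counts like these come from reconstructing who passed the message to whom. Because retweets are attributed to the original author, those edges generally have to be inferred (for example, from follower relationships). The sketch below assumes that inference is already done and simply runs a breadth-first search over a hypothetical cascade; `author`, `a` through `d`, `big_account` and the function `hops_to` are placeholders, not the accounts or tooling behind the numbers above.

```python
from collections import deque

# Hypothetical inferred cascade: each user maps to the users who picked the message up from them.
cascade = {
    "author": ["a", "b"],
    "a": ["c"],
    "c": ["d"],
    "d": ["big_account"],
    "b": [],
}

def hops_to(graph, origin, target):
    """Breadth-first search: return (hop_count, path) from origin to target, or None."""
    parent = {origin: None}
    queue = deque([origin])
    while queue:
        user = queue.popleft()
        if user == target:
            # Walk parent pointers back to the origin to recover the route.
            path = []
            while user is not None:
                path.append(user)
                user = parent[user]
            path.reverse()
            return len(path) - 1, path
        for nxt in graph.get(user, []):
            if nxt not in parent:
                parent[nxt] = user
                queue.append(nxt)
    return None
```

Because breadth-first search explores the cascade level by level, the first time it reaches the target it has found the shortest route, which is the natural reading of "N hops".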

Mike wraps up the article, making the case for what we do at SocialFlow:

Being heard isn’t always easy in an age when anyone can become a broadcaster. But analyzing and visualizing such data helps SocialFlow guide customers about how, when, and what they should tweet to have the best chance of disseminating their messages widely.

News as a Process: how journalism works in the age of Twitter / GigaOm

Mathew Ingram published a piece called ‘News as a Process: how journalism works in the age of Twitter’ on GigaOm, covering our IJOC study, “The Revolutions Were Tweeted: information flows during the 2011 Tunisian and Egyptian revolutions”. Mathew highlights one of our key findings on homophily within Twitter’s media ecosystem: journalists tend to retweet other journalists, bloggers tend to retweet other bloggers, and so on. Finally, the article links to the visualization I posted on Global Voices, highlighting GV authors who were central figures in disseminating news about the turn of events during the height of the Tunisian and Egyptian revolutions.
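One simple way to quantify homophily of this kind, offered as an illustrative sketch rather than the method used in the study, is to compare the share of retweets that stay within an actor type against the share expected if retweeters picked sources independently of type. The labels, edges and the `homophily` function below are invented for illustration.

```python
from collections import Counter

# Hypothetical actor-type labels and retweet edges (retweeter, source).
actor_type = {
    "j1": "journalist", "j2": "journalist", "j3": "journalist",
    "b1": "blogger", "b2": "blogger", "o1": "organization",
}
retweets = [("j1", "j2"), ("j2", "j3"), ("j3", "j1"), ("b1", "b2"),
            ("b2", "b1"), ("j1", "b1"), ("o1", "j2"), ("b1", "j3")]

def homophily(edges, labels):
    """Return (observed same-type share, share expected under independent mixing)."""
    n = len(edges)
    observed = sum(labels[a] == labels[b] for a, b in edges) / n
    retweeter_types = Counter(labels[a] for a, _ in edges)
    source_types = Counter(labels[b] for _, b in edges)
    # Chance that a random retweeter type matches a random source type.
    expected = sum(retweeter_types[t] * source_types[t] for t in retweeter_types) / n**2
    return observed, expected

observed, expected = homophily(retweets, actor_type)
```

Here 5 of 8 retweets stay within a type (0.625), against 29/64 (about 0.45) expected by chance; an observed share well above the expected one is the signature of homophily.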

Network of news dissemination during the Tunisian and Egyptian revolutions (green nodes are Global Voices authors)

Quote from the article:

As we look at the way news and information flows in this new world of social networks, and what Andy Carvin has called “random acts of journalism” by those who may not even see themselves as journalists, it’s easy to get distracted by how chaotic the process seems, and how difficult it is to separate the signal from the noise. But more information is better — even if it requires new skills on the part of journalists when it comes to filtering that information — and journalism, as Jay Rosen has pointed out, tends to get better when more people do it.

Visualizing.org

Lastly, the Osama Bin Laden Twitter visualization that I worked on earlier in May 2011 was highlighted as one of Visualizing.org’s visualizations of the year. wo00t! For those of you not familiar with Visualizing.org, it is a fantastic community of creative folks with the goal of making data visualization more accessible to the general public. The site hosts hundreds of datasets, and encourages users to create visualizations through challenges which run on the website.

I’m extremely excited and humbled by the range of awesome coverage!

Now – back to work :)

The Algorithmic Newsroom

I just came back from News Foo, an un-conference for technologists, academics and journalists in Phoenix on the future of news. The following post details my thoughts, heavily inspired by the conversations and sessions I had the privilege to be a part of.

A growing number of algorithms decide which topics deserve people’s attention. Algorithms are taking over the historical raison d’être of news editors, generating top news lists, hot trends and personalized recommendations. Algorithms are perceived as neutral, yet they encode political choices and have cultural values baked in. At a time when audience attention has become a scarce commodity, an algorithm’s ability to command user attention is true power within our media ecosystem. As curatorial power is handed over to automated systems, we must make sure that the public understands the biases at play and that product engineers are optimizing for the desired outcome – an informed public – not just what generates traffic.

Human vs. Algorithm

An algorithm is a finite list of instructions that a machine performs in order to calculate a function. From simple counting operations to complex information sorting, a good algorithm is thought through and well defined, producing the desired output as efficiently as possible. Algorithms are extremely good at scale. They can be used to efficiently classify text across millions of documents, extract images of a certain type, and identify complex correlations between multiple data points. Recommendation systems such as the ones used by Netflix and Amazon employ algorithms that learn about user preferences through their actions and personalize the information presented to every user, a task impossible to complete manually.
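To make "learning preferences from actions" concrete, here is a toy item-to-item co-occurrence recommender. It is a deliberate simplification and an assumption on our part, not how Netflix or Amazon actually build their systems: items that often appear in the same users' histories get recommended to users who have one but not the other. The histories and the `recommend` function are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical viewing histories: user -> set of items they consumed.
histories = {
    "ann": {"A", "B", "C"},
    "bob": {"A", "B"},
    "cat": {"B", "C", "D"},
    "dan": {"A", "D"},
}

# Count how often each ordered pair of items shows up in the same history.
co_counts = Counter()
for items in histories.values():
    for x, y in combinations(sorted(items), 2):
        co_counts[(x, y)] += 1
        co_counts[(y, x)] += 1

def recommend(user, k=2):
    """Score unseen items by how often they co-occur with items the user already has."""
    seen = histories[user]
    scores = Counter()
    for (x, y), count in co_counts.items():
        if x in seen and y not in seen:
            scores[y] += count
    # Highest score first; ties broken alphabetically so results are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

For `bob`, who has seen A and B, item C scores 3 and D scores 2, so `recommend("bob")` returns `["C", "D"]`. Even this tiny rule encodes choices (raw counts, no recency, no popularity correction) that shape what users see.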

Algorithmically curated, personalized recommendations have become popular within digital media spaces. “Most read articles” modules are based on simple math: the top 10 articles in terms of page views. On the other hand, “hottest articles” lists are more ambiguous and vary based on what the organization defines as “hot”. Is it new content? Is it popular? Spiking? How far back is the data being compared? Are there whitelisted or blacklisted topics? What’s hot is an intuitive and very human assessment of an ecosystem, yet hard to express as a mathematical formula, if it can be reproduced at all.
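The difference between the two kinds of lists can be sketched in a few lines. "Most read" is a plain sort by page views; "hot" requires choosing a definition, and here it is assumed to be views decayed exponentially by age with a tunable half-life. The article data, the function names and the half-life values are hypothetical; the point is that a single editorial knob reorders the list.

```python
# Hypothetical article stats: each article has batches of (views, hours_ago).
views = {
    "evergreen-guide": [(500, 72.0), (20, 2.0)],
    "breaking-story": [(150, 1.0)],
    "yesterday-scoop": [(300, 20.0), (50, 3.0)],
}

def most_read(stats, n=10):
    """'Most read': simple math, rank by total page views."""
    totals = {a: sum(v for v, _ in batches) for a, batches in stats.items()}
    return sorted(totals, key=totals.get, reverse=True)[:n]

def hottest(stats, half_life_hours=6.0, n=10):
    """One possible 'hot' score: each view batch loses half its weight every half-life."""
    def score(batches):
        return sum(v * 0.5 ** (age / half_life_hours) for v, age in batches)
    return sorted(stats, key=lambda a: score(stats[a]), reverse=True)[:n]
```

With the default 6-hour half-life the breaking story leads the hot list even though it has the fewest total views; with a 100-hour half-life the hot list collapses back into the most-read ordering.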

Yet humans are still unbeatable for many types of tasks. Journalists and editors drive agendas, made up of qualities that are difficult to determine in a formula: trust, excitement, impression and intuition. Humans aren’t always rational, and may trust a source despite a bad reputation. The intuition that an experienced editor or journalist brings to the table could never be replaced by automated formulas.

Algorithmic Bias vs. Perception of Neutrality

As soon as digital information providers add any form of curation and recommendation mechanisms (a common practice within social network spaces), the technology loses its neutrality. In some ways, “Twitter’s trending topics algorithm acts like a lot of human news editors, who are more interested in the latest news rather than ongoing stories”, says Tarleton Gillespie of Cornell University. Values are coded into the way these systems make recommendations:

  • Twitter’s trending topics highlight novel events rather than events that grow slowly and simmer, making it very hard for events like Occupy Wall Street to trend, compared with events like Kim Kardashian’s wedding or Steve Jobs’s death, which trend easily.
  • Google’s search algorithm was recently adjusted (the 2011 “freshness” update) to favor fresh content, affecting some 35% of all search queries.
  • Facebook is known to promote content on people’s personal “walls” that references brands which are also its ad partners.
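The bias described in the first bullet can be illustrated with a deliberately simplified novelty score that compares the latest hour of mentions with the hour before. Twitter's actual trending algorithm is not public, so the `novelty_score` rule and the hourly counts below are assumptions. A slowly simmering topic with far more total volume still loses to a small topic that suddenly spikes.

```python
def novelty_score(hourly_counts, smoothing=1.0):
    """Ratio of the latest hour to the hour before, smoothed to avoid division by zero."""
    *_, previous, latest = hourly_counts
    return (latest + smoothing) / (previous + smoothing)

# Hypothetical hourly mention counts.
slow_simmer = [1000, 1100, 1210, 1331, 1464]  # large and steadily growing (~10% per hour)
sudden_burst = [5, 6, 4, 5, 600]              # small topic that suddenly spikes

topics = {"slow_simmer": slow_simmer, "sudden_burst": sudden_burst}
ranking = sorted(topics, key=lambda t: novelty_score(topics[t]), reverse=True)
```

The steady topic scores about 1.1 while the burst scores about 100, so the burst ranks first despite having a small fraction of the total volume. Any score built on change rather than level will tend to favor novelty in this way.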

As these systems grow, a single engineer or product designer may not fully understand the logic behind all of the pieces that make up the whole. We’ve seen a number of examples where unintended consequences of algorithmically designed results led to awkward outcomes, such as Amazon’s book about flies priced at $23,698,655.93, or Google’s past ‘Florida update’, which had a catastrophic effect on a large number of websites and drove some SMEs into bankruptcy. Mike Ananny describes how the Android Marketplace recommended the “Sex Offender Search” application to anyone interested in Grindr, a gay dating app. And most recently, there was Siri’s inability to find abortion clinics in New York City.

These are not Google, Apple, Amazon or Twitter conspiracies, but rather the unexpected consequences of algorithmic recommendations being misaligned with people’s value systems and expectations of how the technology should work. The larger the gap between people’s expectations and the algorithmic output, the more user trust will be violated. Liz Strauss eloquently describes why she quit Klout, feeling cheated by an algorithm that constantly changes under her feet. She wanted to trust the algorithm despite her initial doubts, but eventually gave up and quit after multiple algorithm changes.

As designers and builders of these technologies, we need to strike a fine balance: making sure our users understand enough about the choices we encode into our algorithms, but not so much that they can game the system. People’s perception affects trust, and once trust is violated, it is incredibly difficult to win back. There is a misplaced faith in the algorithm, an assumption that it should accurately represent what we think is true.

Ryan Rawson's tweet in response to claims that Twitter is censoring #OWS from trending

While it is clear to technologists that algorithms are biased, the general public perceives them as neutral. Someone at News Foo brought up the famous Rumsfeld quote, adding that it is the unknown unknowns we should be most worried about. When people don’t know that they don’t know how the algorithms that govern their interfaces work, they may get burned, get angry and blame the technology.

Claire Diaz Ortiz leads social innovation at Twitter and is constantly managing the gap between people's expectation of its Trending Topics algorithm

The Augmented Journalist

We need to be thinking about hybrid approaches. On the news production side, how do we use algorithms for scale while relying on journalists and editors for compelling narratives and thoughtful judgment? Algorithmic investigative journalism may hold a treasure trove of possibilities for new types of stories, where journalists use the output of a complex data query to feed their intuitions and draw conclusions from correlations in the data. Tom Lee at Sunlight Labs is doing an amazing job pushing projects that derive insight from big data, while Kris Hammond uses machines to write stories where automation is possible.

On the flip side, we need to make sure the general public has a better understanding of the algorithms at play, the algorithms that feed their attention, without giving away too much of the special sauce. We must come up with the right vocabulary to define editorial workflows, and work with engineers to code them into the algorithms. As danah boyd stressed during the session, it is important to be constantly thinking through what we’re optimizing for. The editor and journalist’s job is to inform the public. Is it possible to design and implement algorithms that optimize for an informed public? How do we even start to quantify a person’s level of “informed-ness”?

Pete Skomoroch posts an important question

Pete Skomoroch raises a similar question. We need to strike the right balance between automated news personalization and curated, editorialized feeds. Advanced chess (or computer-assisted chess) is a relatively new form of chess, wherein each human player uses a computer chess program to help explore the possible results of candidate moves. The human players, despite this computer assistance, are still fully in control of what moves their “team” (of one human and one computer) makes. What would the augmented journalist or editor look like? How can technology and algorithms be used effectively in the newsroom to inform both journalists and the general public?

The conversation should not be focused on humans vs. algorithms, but rather how we utilize algorithms to take our media ecosystem to the next level.

The Anatomy of a Viral Tweet

Here’s the crib of my #140conf NYC talk, given on June 15th at the 92nd Street Y:

I’m here to talk to you about my work on digital audiences, with a focus on information flows. I’m sure that to this crowd I don’t have to stress the potential that social media is unlocking. Whether you’re a brand, knitting circle or just an individual surfing the web, social media is an invaluable medium to seek and disseminate important information in realtime.

We are all part of the emerging information economy, building and using applications that create overflowing streams of information. Social network sites create compelling spaces, where social interactions act as lubricants, accelerating the flow of information. Users are encouraged to respond, add to, consume and redirect content. As information flows by, some may grab a piece when it is most relevant, valuable, entertaining or insightful, and at times, choose to pass it onwards.

Attention = Power

While the threshold to publishing nears zero, attention has become the bottleneck. One cannot demand attention anymore, or expect to have it at certain times of the day. We all need to understand the preferences and behavior of our respective audiences, and adapt our own behavior in order to attract the attention of others. The ability to attract attention is power, and in this 140-character economy, understanding how people manage their attention is incredibly powerful.

Because information spreads through people, networks of friends, fans and followers, by understanding information flows we have the ability to unlock insight about where people place their attention. Some data spread at an unpredictable viral speed, while the majority are only seen by a handful. In order for messages to propagate, people along the way must be attentive: notice them at the right time, and pass them onwards. How this happens is the million dollar question. Here are some examples:

Gaining your Network’s Trust

This is a visualization from a recent study we published about the spread of the tweet on the Osama Bin Laden operation. That Monday morning, the media focused on the story that “Twitter broke the news”. Over an hour before the formal White House announcement, people on Twitter had figured out that it was about Bin Laden. There was much speculation about why the presidential announcement had to take place on a Sunday night: some guessed it was about Gaddafi, others Bin Laden.

It was a single tweet that triggered an incredibly fast information cascade. That tweet, from Keith Urbahn, Donald Rumsfeld’s chief of staff, drew 80 retweets within a minute and over 300 within two. This message spread like wildfire. (See the study for more detail.)
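Velocity figures such as "80 retweets within a minute" are simple windowed counts over retweet timestamps. The sketch below assumes timestamps measured in seconds after the original post; the sample values and both function names are placeholders, not the data or code from the study.

```python
def retweets_within(timestamps, seconds):
    """Count retweets that arrived within `seconds` of the original post."""
    return sum(1 for t in timestamps if 0 <= t <= seconds)

def per_minute(timestamps, minutes):
    """Bucket retweets into consecutive one-minute windows, starting at the post."""
    buckets = [0] * minutes
    for t in timestamps:
        index = int(t // 60)
        if 0 <= index < minutes:
            buckets[index] += 1
    return buckets

# Hypothetical retweet arrival times, in seconds after the original tweet.
sample = [5, 12, 30, 58, 61, 75, 90, 119, 130, 200]
```

For this sample, `retweets_within(sample, 60)` is 4 and `per_minute(sample, 4)` is `[4, 4, 1, 1]`: the per-minute view makes it easy to see whether a cascade is still accelerating or already tailing off.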

Before May 1st, not even the smartest machine learning algorithms could have predicted Keith Urbahn’s likelihood of spreading information on this topic, or his potential to spark an incredibly viral information flow. While politicos “in the know” certainly knew him or knew of him, neither his previous interactions nor the size and nature of his social graph reflected his potential to earn the trust of thousands of people within a matter of minutes.

Tight Knit Network or “Tribe”

Another interesting example of information spread is that of Urban Outfitters vs. the NYC crafters community. In this case, UO ripped off an independent designer’s ‘I heart NYC’ necklace design. The artists put up a blog post, and Amber Karnes published the following post to Twitter:

my boycott urban outfitters tweet

This led to an avalanche of reactions, to the level that her username was trending in LA, Portland, New York, Toronto and then the United States.

In an incredibly insightful post, Amber wrote:

I am not a Twitter celebrity by any means. I barely had over 1,000 followers when the day began and I’m pretty sure about 200 of those are spam-bots. What I do have – and the reason that my call for a boycott on Urban Outfitters spread so fast and wide – is a tribe. A tight knit group of independent artists and crafters that follow me. My cause resounded with them. They spread it, and their friends spread it, and a few big influencers on Twitter spread it, and then it was gone.

Topic + Network + Timing

We see this over and over again: the right social-professional audience, a relevant piece of information and the right timing together lead to an explosion of public affirmation, often unexpected by the author.

Paradox of Social Networks

Networked sociality promises us equal opportunity: easy, frictionless connection to literally anyone across the globe. But what plays out in practice is very different. As James Gleick notes in his seminal book ‘The Information’: “The structure of the social web stands upon a paradox. Everything is close, and everything is far at the same time.”

These small-world networks usually offer around four degrees of separation. Yet even though the distances between people may seem short, finding the route that leads to the outcome we want is extremely difficult. This is why cyberspace can feel not just crowded but lonely. You can drop a stone into a well and never hear a splash. Or you can be met with a flood of water. And while the latter is less common, the more time people spend on social networking sites (SNS), the more often we see it happen.
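The “close yet far” property can be illustrated with the textbook Watts–Strogatz small-world model, swapped in here as an assumption rather than anything measured on Twitter. The sketch below builds a ring lattice, rewires a fraction `p` of its edges at random, and uses breadth-first search to compute the average number of hops between people. A handful of random shortcuts collapses the average separation, even though hop counts alone say nothing about which route actually gets a message heard. The function names and parameters are hypothetical.

```python
import random
from collections import deque

def watts_strogatz(n, k, p, seed=0):
    """Undirected small-world graph as an adjacency dict (k must be even)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Ring lattice: link each node to its k nearest neighbours
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    # Rewire each lattice edge with probability p to a random new endpoint
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            if u in adj[v] and rng.random() < p:
                # Avoid self-loops and duplicate edges
                candidates = [w for w in range(n) if w != v and w not in adj[v]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[v].discard(u)
                    adj[u].discard(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj

def average_separation(adj):
    """Mean shortest-path length in hops over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

With `n=500` and `k=10`, the unrewired lattice (`p=0`) averages about 25 hops, while rewiring with `p=0.1` brings that down to single digits.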

The Promise of Data

Whether you’re interested in socializing or in selling a product, understanding your network’s habits around information consumption and production is imperative for attracting people’s attention and building an engaged audience. We all build mental models in our heads of our invisible audiences: the people who give us attention. But as long as it’s all in our heads, it doesn’t scale. We need to build and use tools that drive insight and help us make sense of all the digital breadcrumbs left by our online audiences.

Psyched to be working at the heart of this.

Goodbye Microsoft, Hello SocialFlow – Betaworks!

Today is my last day at Microsoft.
In two weeks I’m joining SocialFlow, a hot Betaworks startup, as VP R&D. ::bounce::

SocialFlow is working on truly innovative ways to optimize communications through social media. There I’ll be playing with lots and lots of data, working on data analysis and methodology, and identifying interesting patterns that emerge from social streams (Twitter, Facebook, bit.ly…). Additionally, I’ll be designing and building tools that optimize interactions with online audiences in these spaces. I’m a big believer in SocialFlow’s potential impact, and am absolutely psyched to be joining the crew. Good to be in a startup again, and thrilled to be part of the Betaworks family. Some of the smartest people in social data analysis sit in that incubation space, providing droolworthy opportunities to learn and collaborate. (I’ll be writing much more about SocialFlow soon enough.)

With all the excitement, I have to say, leaving Microsoft is not easy. The past two years have been an incredible adventure. As part of the labs, I’ve had quite a unique experience for a Softie: I got a chance to work with teams across the organization. From Microsoft Research to Bing, Office, Xbox, Israel Labs, Online Services and even the new Microsoft Retail Stores, I worked closely with so many passionate, smart people in this company. We worked in small agile teams, with many chances to get my hands dirty and contribute code, even as a Program Manager.

I had numerous opportunities to take part in industry events. I presented a thorough Twitter research paper at HICSS, and displayed interactive visualization work at both TED Active and the Summit Series. I showed our work at the Twitter #140 conference, and MSR’s TechFest. I worked on an embeddable Twitter visualization kit, FUSE Social Gadgets, and helped design and build the interactive displays at the Microsoft Retail Stores. When I look back at the breadth of projects I contributed to over the past two years, I’m extremely thankful.

I feel like I’ve grown in leaps and bounds when I compare present me to pre-Microsoft me. I’m much more focused, with an understanding of team dynamics and of how to prioritize features to get the product out the door as fast as possible. I learned the soft skills of group collaboration, and also how to navigate the politics of a larger organization. Sure, the massive machine gets frustrating at times, but the impact and reach of the company’s products make it all worthwhile.

My biggest frustration was definitely being far away from the mothership. Had I lived in Redmond or Seattle, there’s little chance I would have considered leaving Microsoft. With all of its advanced communication and co-presence technology, it is still extremely challenging to have substantial impact on product at Microsoft when not in Redmond. You’re always calling in, or an afterthought when meetings get shuffled around. I would think twice before joining a large corporation again, unless I’m located at its headquarters.

So a new adventure begins. I’ll be moving down to NYC in mid-March, and have already started looking for apartments in Chelsea or the Village. d is staying in Boston and will be commuting down to NYC for the weekends. Not ideal, but I’m confident we can do it. The upside is that weekends will have to become “work-less”, which is probably a very, very good thing.

I’m deeply thankful to Microsoft for everything. All the opportunities and the wicked smart friends I was lucky to get to know.

Here’s to old friendships and new beginnings.

::gulp::

Reaction to Brian Solis’s Interest Graphs

I was pointed to the following blog post by Brian Solis, “The Interest graph on Twitter is Alive – studying Starbucks top followers”. Brian’s post defines an “interest graph” as a subset of the social graph around a certain topic. He claims that while social graphs of follower/following relationships were interesting, the interest graph is a step beyond: a focused network that shares “more than just a relationship”. While I’m a strong believer in topical graphs (I’ve been creating them for various analyses over the past couple of years), I find his argument overgeneralized and his terminology around data problematic. There’s a lot of hand-waving, and little acknowledgement of the assumptions and biases of data that comes from public Twitter profiles.

“While we are what we say in our Tweets, our bios also reveal a telling side of who we really are.”

Our tweets represent moments in time in which we displayed interest in a topic, person or thing, while our bios represent our aspirational selves. Many people write “activist” or “blogger” in their bio who, in real life, wouldn’t be considered either. Additionally, the stated location is usually set upon profile creation and never touched again, making it obsolete in a highly mobile society. Only about 10% of users share geo-location while tweeting, which leaves us with a lot of guesstimation work.

Statistical Bias
It is extremely important to keep in mind that Twitter is used by a subset of the population. Of course many users will use the terms ‘geek’, ‘technology’ or ‘social media’ in their bios! There’s substantial statistical bias when looking at Twitter across the board, so as a brand, you must not consider your Twitter audience representative of your real-life audience, but rather a slice of it.

Shminfluencers
Solis throws the term ‘influencers’ around. In one case, he links to a ReSearch.ly page that supposedly points to “Starbucks influencers”. However, all I can see at the top are spam bots:

[Image: spam-bot tweets at the top of the ReSearch.ly “Starbucks influencers” list]

or Foursquare check-ins:

[Image: Foursquare check-in tweets mentioning Starbucks]

Or people who simply mention the word ‘starbucks’ in their tweets. I see no “influencers” in this list, nor do I suspect any of these posts led others to take more interest in the brand. By generalizing across anyone who posts the term ‘starbucks’, ReSearch.ly is contaminating its data. And this is precisely my point of contention with Brian’s post.

Graphs of Influence

Influence is complex, and certainly not binary. It is produced by a hodgepodge of human behavior, social dynamics and serendipity. Many experts are trying to define it, and the truth is, there’s no recipe. Solis calls a version of this the ‘brand graph’: a group of highly connected individuals within a given topic. Apparently the tool looks at users who mention the term ‘starbucks’ and then checks whether their followers also mentioned ‘starbucks’. Assuming that this represents a transaction of “influence” is naive.

1. How do you account for timing?
2. Do you even look at who’s following whom? Perhaps a user was influenced by another profile that mentioned “starbucks”?
3. Maybe there was no influence at all. There’s a well-known property of social networks called homophily (birds of a feather flock together): we tend to connect with people who are similar to us. Most likely my friends will talk about topics that interest me. That doesn’t mean I’m influenced by them.
4. Even if we agree that user A mentioned ‘starbucks’ because she saw user B posting about Starbucks, why do we automatically assume that this is influence?
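Questions 1 and 2 can be made concrete. The sketch below contrasts the naive co-mention rule, as the post describes it, with a stricter filter that respects follow direction and timing. It assumes mentions arrive as (user, timestamp) records and follows as (follower, followee) pairs; all names are hypothetical, and, per question 3, even the stricter version cannot tell influence apart from homophily.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    user: str
    timestamp: float  # seconds since an arbitrary epoch

def naive_pairs(mentions, follows):
    """Naive 'brand graph': every follow edge where both ends mentioned the term."""
    mentioned = {m.user for m in mentions}
    return {(a, b) for (a, b) in follows if a in mentioned and b in mentioned}

def exposure_pairs(mentions, follows):
    """Keep (follower, followee) edges only if the followee mentioned the term first.

    This fixes direction and timing, but still cannot separate influence
    from homophily (similar people talking about similar things).
    """
    first = {}
    for m in mentions:
        first[m.user] = min(m.timestamp, first.get(m.user, m.timestamp))
    return {
        (a, b) for (a, b) in follows
        if a in first and b in first and first[b] < first[a]
    }
```

The gap between the two result sets is a quick measure of how much of a naive “influence” graph is driven by mutual follows and ordering alone.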

Before using the term influence, we must understand and acknowledge where our data is coming from, and its statistical bias. We must understand that Twitter is a highly engaging conversational space. And if we’re seeing a conversation about a topic, there doesn’t necessarily have to be a transaction of “influence”. We shouldn’t use that term lightly.

Interest Graph + Social Graph = Magic

While the social graph and the interest graph are each interesting on their own, the real magic happens when we put them together. By overlaying dynamic topical discussions on top of the social graph, we can identify clusters of users engaged in conversation about a topic. By following these information cascades, we can start mapping out how topics spread across the network. And by analyzing users’ structural positions within the graph, we can start to get a sense of their level of influence in creating and sustaining information flows.
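As a minimal sketch of the overlay idea, assuming a set of follow edges and a set of users who posted about a topic, topical clusters can be taken as the connected components of the social graph restricted to those users. Ignoring follow direction is a simplifying assumption, and the function name is hypothetical.

```python
from collections import defaultdict

def topical_clusters(follows, topic_users):
    """Groups of topic posters linked by follow edges, largest first.

    follows: iterable of (follower, followee) pairs (the social graph)
    topic_users: set of users who posted about the topic (the interest layer)
    Direction is ignored, so clusters are weakly connected components.
    """
    adj = defaultdict(set)
    for a, b in follows:
        if a in topic_users and b in topic_users:
            adj[a].add(b)
            adj[b].add(a)
    seen, clusters = set(), []
    for start in topic_users:
        if start in seen:
            continue
        # Depth-first search collects everyone reachable from this user
        stack, component = [start], set()
        seen.add(start)
        while stack:
            v = stack.pop()
            component.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        clusters.append(component)
    return sorted(clusters, key=len, reverse=True)
```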

#Sidibouzid Twitter Hashtag: an analysis of the people spreading the news

There have been numerous articles and discussions on the role Twitter played during the recent Tunisian uprising. An excellent TechCrunch post by Alexia Tsotsis analyzed Twitter traffic over time (using data provided by BackType). According to their report, Tunisia-related Twitter traffic peaked at 28 tweets per second, at 21:27:56 Tunisian time, a couple of hours after the first reports that the Tunisian president had left the country. By the end of the cycle, there were over 196K tweets mentioning Tunisia, and over 103K mentioning #sidibouzid (the province where the protests started).
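Figures like “28 tweets per second” come from bucketing timestamps. A minimal Python sketch, assuming a non-empty list of tweet timestamps already converted to Tunisian local time; this is not BackType’s actual pipeline, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

def peak_rate(timestamps: Iterable[datetime]) -> Tuple[datetime, int]:
    """Return (second, tweet count) for the busiest one-second bucket."""
    # Truncate each timestamp to the whole second it falls in
    per_second = Counter(ts.replace(microsecond=0) for ts in timestamps)
    return max(per_second.items(), key=lambda item: item[1])
```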

While this is great analysis of the content itself, I found little to no analysis of the participants on Twitter. Who are the people who chose to pass on and amplify messages? How did the information spread? Who were the pivotal points that enabled this? By answering some of these questions, can we reach a better understanding of the role that Twitter plays in diffusing information to public attention around the world?

Participating Users

My dataset includes 170,000 tweets, all containing the term ‘#sidibouzid’, posted between Jan. 12th and 19th by some 40,000 different Twitter users. This is not the complete dataset, but what I could grab using the public Twitter APIs. The chart below maps out the distribution of Twitter users who joined the conversation by posting a message with the ‘#sidibouzid’ hashtag. We see a huge spike between Jan. 13th and 14th, reaching almost 12,000 new users at its peak. This is not surprising, given all the other analyses pointing to a huge spike in the “attention” the story received on Jan. 14th, when Ben Ali fled Tunisia.

[Image: chart of first-time #sidibouzid participants over time]
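The first-time-users chart boils down to one timestamp per user: the first time they posted with the hashtag. A minimal sketch, assuming the collected tweets are available as (user, timestamp) pairs; the function name is hypothetical and this is not necessarily how the chart was produced.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

def first_time_users_per_day(tweets: Iterable[Tuple[str, datetime]]) -> Counter:
    """Count users by the calendar day of their first post with the hashtag.

    tweets: (user, timestamp) pairs in any order.
    """
    first_seen = {}
    for user, ts in tweets:
        # Keep the earliest timestamp seen for each user
        if user not in first_seen or ts < first_seen[user]:
            first_seen[user] = ts
    return Counter(ts.date() for ts in first_seen.values())
```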

Participation amongst users (i.e. – number of times users posted a message with the ‘#sidibouzid’ hashtag) follows a power-law distribution:
[Image: chart of per-user participation in #sidibouzid]
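A hedged sketch of how per-user participation counts and a tail exponent might be computed, assuming tweets are available as (user, text) pairs. The exponent uses the textbook continuous maximum-likelihood estimator; discrete counts and the choice of `x_min` deserve more care, and a fitted exponent alone does not prove that the data follow a power law. Function names are hypothetical.

```python
import math
from collections import Counter

def participation(tweets):
    """Count hashtag posts per user; tweets is an iterable of (user, text) pairs."""
    return Counter(user for user, _text in tweets)

def power_law_alpha(values, x_min=1.0):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    alpha = 1 + n / sum(ln(x / x_min)), taken over all values x >= x_min.
    """
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need values above x_min to estimate alpha")
    return 1 + len(tail) / log_sum
```

With `counts = participation(tweets)`, `counts.most_common(10)` yields a volume-ranked top-10 list, and `power_law_alpha(counts.values(), x_min=...)` estimates the tail exponent.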

Top 10 participants in the hashtag (in terms of volume posted) are:

    Dima_Khatib (883) – Arab Journalist, Al Jazeera’s Latin America Correspondent
    ibnkafka (641) – Moroccan lawyer and Twitter enthusiast

Some of these accounts are broadcasting into the ether, like our top participant, griffinworks_3. This profile was only created on January 12th 2011, has since posted around 4,000 tweets, and has acquired only some 100 followers. From my dataset, it looks like this profile got around 20 retweets between Jan. 15th and 18th: little activation, and little audience. The profile also doesn’t follow anyone else. Possibly a bot that auto-forwards content.
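The reasoning above (a young account, thousands of posts, almost no retweets, following no one) can be turned into a rough screening rule. This is a hypothetical heuristic with made-up thresholds, not a validated bot detector; the `Profile` fields and the function name are assumptions.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Profile:
    screen_name: str
    created: date
    statuses: int           # total tweets posted
    friends: int            # accounts this profile follows
    retweets_received: int  # retweets observed in our own dataset

def looks_like_broadcaster(p, as_of, min_posts_per_day=100.0, max_rts_per_post=0.01):
    """Flag profiles that post heavily, follow no one and get almost no retweets.

    The default thresholds are illustrative guesses, not validated cut-offs.
    """
    age_days = max((as_of - p.created).days, 1)
    posts_per_day = p.statuses / age_days
    rts_per_post = p.retweets_received / max(p.statuses, 1)
    return (
        p.friends == 0
        and posts_per_day >= min_posts_per_day
        and rts_per_post <= max_rts_per_post
    )
```

Using roughly the griffinworks_3 numbers and an assumed as-of date, `looks_like_broadcaster(Profile("griffinworks_3", date(2011, 1, 12), 4000, 0, 20), as_of=date(2011, 1, 19))` returns `True`.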

On the other hand, if we look at Dima_Khatib, an Arab journalist with Al Jazeera, we see an extremely active profile (over 9,000 posts) that is quite new to Twitter (created mid-October 2010), but with a large following of almost 5,000, and a high rate of mentions and retweets (over 5,000).

User Bios

Using Wordle to visualize users’ profile information (the “write something about yourself” field), it is quite clear that as the events unfold and spread out to the world, there is a drastic shift in the kinds of people joining the hashtag. The dominant words representing the initial Twitter participants are ‘Tunisian’, ‘journalist’, ‘politics’, ‘activist’, and a variety of French stop words:
[Image: word cloud of bios of early #sidibouzid participants]

Once the topic started trending, the people joining the hashtag are represented by the following words: ‘news’, ‘twitter’, ‘music’, ‘marketing’, ‘media’, ‘student’…
[Image: word cloud of bios of participants who joined after the topic started trending]
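Word clouds like these are driven by simple word frequencies. A minimal sketch, assuming bios are plain strings; the stop-word list is a tiny hypothetical stand-in for proper English and French lists. Splitting users into early and late joiners (for instance by the date of their first hashtag post) is left to the caller.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real analysis needs full English and French lists
STOP_WORDS = {"the", "and", "of", "a", "i", "de", "la", "le", "et", "les", "des", "en"}

def top_bio_words(bios, n=10):
    """Most frequent words across profile bios, the raw input to a word cloud."""
    counts = Counter()
    for bio in bios:
        # [^\W\d_]+ matches runs of letters, including accented ones
        for word in re.findall(r"[^\W\d_]+", bio.lower()):
            if word not in STOP_WORDS and len(word) > 1:
                counts[word] += 1
    return counts.most_common(n)
```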

Geographic Distribution

What can we learn about the spread of this topic by looking at people’s geographic location? If we had a precise indication of every profile’s exact location, this would be fascinating. My assumption is that we would see small discussions happening around the Middle East, France and Morocco in the days before the uprising, with relatives and Tunisian expats in neighboring countries tweeting about the events well before they reached world headlines. Could we actually see how the conversation moves from regional/local to global? And if so, what does that movement look like?

There are three profile attributes that can give us clues about someone’s location: 1) the user-entered ‘location’ field, 2) the user-entered ‘time-zone’ field, and 3) geo-location. The location field has no default. When a user creates a Twitter account, the time zone may be automatically set to the current location (depending on browser and connection); otherwise it receives the default value of ‘Quito’. Tunisia and Paris share the same time zone (CET), so if someone in Tunisia creates a new profile, their time zone may automatically be set to ‘Paris’. This makes it extremely tricky to draw solid conclusions from the time-zone field.

Since only 15% of users enabled geo-location, I chose the location field as the best indicator. Since it has to be entered manually, it may not reflect the user’s current location, especially if they travel, but it at least indicates a solid connection between the user and a country. For this analysis I looked at all profiles that stated a location.

It’s interesting to see how comparatively strong a role Egypt and France play initially:

And then how users in Saudi Arabia, Indonesia, the US and the UK get heavily involved:
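Resolving free-text locations to countries is the fiddly step behind comparisons like these. A minimal sketch, assuming a hand-built lookup table (the `COUNTRY_HINTS` entries below are hypothetical and far from complete). Whole-word matching avoids false hits such as “usa” inside “Jerusalem”, though ambiguous names like Paris, Texas will still be misassigned.

```python
import re
from collections import Counter

# Hypothetical lookup table; a real gazetteer needs far more names and spellings
COUNTRY_HINTS = {
    "tunis": "Tunisia", "tunisia": "Tunisia", "tunisie": "Tunisia",
    "cairo": "Egypt", "egypt": "Egypt",
    "paris": "France", "france": "France",
    "london": "UK", "uk": "UK",
    "new york": "US", "usa": "US",
}

def country_of(location):
    """Best-effort country for a free-text profile location, or None if unknown."""
    text = location.lower()
    for hint, country in COUNTRY_HINTS.items():
        # \b...\b restricts the match to whole words
        if re.search(r"\b" + re.escape(hint) + r"\b", text):
            return country
    return None

def country_shares(locations):
    """Share of profiles per country, among locations that could be resolved."""
    counts = Counter(c for c in map(country_of, locations) if c is not None)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()} if total else {}
```

Running `country_shares` separately on the locations of early and late joiners gives the before/after comparison.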

Social Graph and Connectedness

Knowing how an individual is embedded in the structure of groups within a network may be critical to understanding his/her behavior. For example, some people may act as “bridges” between groups (connectors or “brokers” of information). Others may have all of their relationships within a single group (locals or insiders). Some may be part of a tightly connected and closed elite, while others are completely isolated from this group. Such differences in the ways individuals are embedded in the structure of groups within a network can have profound consequences for the ways these “nodes” receive information or reach an opinion.

This is probably the most interesting part of the analysis, but also the most complex. I used the Twitter API to mine the publicly available relationships between all hashtag participants. There are two important measures that I used to make sense of all this data:

    In-Degree: how many users who participated in the hashtag follow this person. Effectively, how popular/reputable this person is within the group of all participants.
    Clustering Coefficient: measures how densely this person’s “neighborhood” is interconnected. If all your followers and friends are connected to each other, your CC will equal one.
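Both measures are straightforward to compute once the follow relationships are in hand. A minimal sketch, assuming follows are (follower, followee) pairs; treating links as undirected for the clustering coefficient is a simplifying assumption (directed variants exist), and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def in_degree(follows, participants):
    """How many hashtag participants follow each participant."""
    counts = {u: 0 for u in participants}
    for follower, followee in follows:
        if follower in participants and followee in participants:
            counts[followee] += 1
    return counts

def undirected_adjacency(follows):
    """Neighbour sets, ignoring follow direction and self-loops."""
    adj = defaultdict(set)
    for a, b in follows:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def clustering_coefficient(adj, user):
    """Fraction of neighbour pairs that are themselves linked (0.0 if < 2 neighbours)."""
    neighbours = adj.get(user, set())
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for v, w in combinations(neighbours, 2) if w in adj[v])
    return links / (k * (k - 1) / 2)
```

Building the adjacency once and reusing it keeps per-user clustering cheap when scoring thousands of participants.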

I chose three participants so that I could map out their networks and see what we can identify.

ifikra
The graph below represents Sami Ben Gharbia’s (ifikra) network. Sami showed up as one of the most prominent Twitter users on January 13th. He was one of the most central nodes within the group of people who were passionately posting with the ‘#sidibouzid’ hashtag prior to the peak of events. Sami shares a large chunk of his audience with two key users: an Egyptian journalist (mfatta7) and a Channel 4 News foreign affairs correspondent (jrug). This is a mapping of only his first-degree followers and friends:
[Image: first-degree network graph of ifikra]
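“Shares a large chunk of his audience” can be quantified. A minimal sketch, assuming each account’s follower IDs are available as iterables; the Jaccard index used here is an assumed choice of overlap measure, not necessarily what the graphs encode, and the function name is hypothetical.

```python
def shared_audience(followers_a, followers_b):
    """Overlap between two accounts' follower sets.

    Returns (number of shared followers, Jaccard index |A & B| / |A | B|).
    """
    a, b = set(followers_a), set(followers_b)
    union = a | b
    shared = a & b
    return len(shared), (len(shared) / len(union) if union else 0.0)
```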

Dima_Khatib
The following graph represents Twitter user Dima_Khatib’s network. Dima_Khatib was one of the most active participants, posting over 800 messages to the hashtag. Dima is a journalist at Al Jazeera and, as I mentioned previously, is quite new to Twitter (she began tweeting in October ’10). Dima shares a portion of her audience with a fellow Al Jazeera journalist (Mskayyali):
[Image: network graph of Dima_Khatib]

SBZ_news
SBZ_news is a profile that functions as a typical broadcast media outlet, with a very high in-degree yet a very low out-degree (it has many followers and follows almost no one). What’s interesting here is that its community of followers includes a number of key players who themselves have fairly large audiences. This seems to have been an important source of information from the ground in Tunisia.
[Image: network graph of SBZ_news]

What Next?

This post is merely touching the tip of the iceberg. There’s still so much that can be understood by slicing and dicing this data. As we start to grasp the power of Twitter as a worldwide information diffusion network, we must build tools that help analyze the structures that enable information to flow.

Understanding Information Flows: the True Power of Social Media

With all the excitement about Tunisia and the numerous debates on whether this was/is another “Twitter Revolution”, it was the perfect time to dig into Clay Shirky’s recently published piece ‘The Political Power of Social Media’ in Foreign Affairs. I actually like the journal and usually buy a copy, but sadly there’s no copy of the text online, which means the article is not part of the current debate (a shame!). Many agree that the revolution in Tunisia did not happen because of Twitter, nor did Twitter *actually* help much for those fighting in the streets of Tunis. While social media play an important role in easing the flow of information during and after the peak of events, Clay argues that there’s an important and usually overlooked long-term effect that social media have in strengthening public spheres.

In the article, Shirky claims that the US government overestimates the value of access to information, particularly information hosted in the West, and underestimates the value of tools for local coordination. We need to think of social media as long-term tools that can strengthen civil society, and thus the public sphere. Clay argues that a strong public sphere plays a crucial role in social change. For example, communication tools during the Cold War did not cause governments to collapse, but they helped people take power from the state when it was weak. They played a supporting role in social change by strengthening the public sphere. It is imperative for the US to rely on countries’ economic incentives to allow widespread media use: it should promote conditions that appeal to states’ self-interest, rather than to the contentious virtue of freedom, as a way to create or strengthen countries’ public spheres.

Clay describes a fascinating study of political opinion by sociologists Elihu Katz and Paul Lazarsfeld:

in a study of political opinion after the 1948 US presidential elections, sociologists Elihu Katz and Paul Lazarsfeld discovered that mass media alone do not change people’s minds; instead there is a two-step process. Opinions are first transmitted by the media, and then they get echoed by friends, family members, and colleagues. It is in this second, social step that political opinions are formed. This is the step in which the Internet in general, and social media in particular, can make a difference. As with the printing press, the Internet spreads not just media consumption but media production as well – it allows people to privately and publicly articulate and debate a welter of conflicting views.

The fascinating thing about Twitter is that, for the first time, we are able to actually SEE some of these processes happen. We see the first step happen all the time: media outlets and corporations broadcast messages from their accounts. These messages may or may not be picked up by the audience who follows those accounts. But the second step is where things get really interesting. Posts may be picked up and echoed by friends, family members and colleagues, sometimes bounced around so much that the messages go “viral”.
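One rough way to separate the two steps in retweet data, assuming you know who retweeted a post and have the follow graph: a retweeter who follows the original account could have seen it first-hand, while one who does not most likely received it through someone else. Follow edges are only a proxy for exposure, and the function name is hypothetical.

```python
def two_step_split(author, retweeters, follows):
    """Split retweeters into first-step (follow the author) and second-step (don't).

    follows: set of (follower, followee) pairs.
    """
    first_step, second_step = [], []
    for user in retweeters:
        if (user, author) in follows:
            first_step.append(user)
        else:
            second_step.append(user)
    return first_step, second_step
```

The share of second-step retweeters is a crude indicator of how far a message travelled beyond the broadcaster’s own audience.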

This second step, the social flow of ideas and opinions between people, captured in realtime public data, is at the crux of an emerging field that fuses machine learning and statistics with the social sciences. Access to information is important, but understanding information flows is what truly enables in-depth analyses of people’s behavior, and systems that are smarter and substantially more effective. Clay talks about a notion of ‘shared awareness’: people who are part of intertwined networks, posting and consuming each other’s information. Shared awareness binds and strengthens groups, helping millions who are not part of any hierarchical organization spread messages and reach a common understanding. Understanding how people are interconnected not only helps us build better systems, but also helps us get a sense of the strength of a country’s public sphere.

As the web continues to evolve into a dense network of social links, we need to focus on getting a better understanding of networked information flow. Additionally, we must build tools that help us slice and dice massive social graphs of nodes and edges. Whether it’s a breaking news story, a social coupon or a TV show, information flows are the underlying force powering the web and affecting the DNA of our society. I am certain that making sense of them will bring huge rewards.