Over the past couple of weeks I’ve been honored to have my work covered in a number of awesome publications.
Viral Information Flows / MIT Technology Review
Mike Orcutt of MIT’s Technology Review published a fantastic post, Information’s Social Highways, available in this month’s magazine as well as on their website. Mike got in touch with me over the summer. He wanted to highlight various interesting aspects of information dissemination within social networks, including their visual representations. We threw around a number of ideas and agreed that it’d be fantastic to identify a number of interesting information flows that emerged from Twitter, visualize them, and highlight similarities and differences in the way their networks had formed.
There is no recipe for virality, says Gilad Lotan, head of R&D for a startup called SocialFlow, which aims to help clients from the Economist to Pepsi more effectively capture attention on Twitter. But the deluges of data that viral tweets generate hold potentially valuable insights into how and why certain things spread beyond their author’s network of regular contacts.
The article compared two very different information flows. The first, providing hot information about the Osama Bin Laden operation, was incredibly fast. Within a few minutes, there were over one thousand users reposting the message, along with prominent journalist accounts. In comparison, the second flow is one initiated by my close friend Deb Chachra. In reaction to the London authorities threatening to shut down Twitter during the riots this summer, she posted the following tweet:
Urban rioting existed before SMS/social media. You know what didn’t? Large-scale community cleanups, spontaneously organized within hours.
Her post went viral, but in a very different manner. Over a period of two and a half days, Deb’s tweet saw sustained growth in the number of folks reposting it. Every few hours, the post would get a boost from someone with a large audience who reposted it, and the cycle continued. While in the previous example the path to an important curator (Brian Stelter) took one minute and no more than one hop, in Deb’s case it took several hours and 11 hops before the message reached Graham Linehan (@Glinner), who has a large audience with which the message resonated.
Mike wraps up the article, making the case for what we do at SocialFlow:
Being heard isn’t always easy in an age when anyone can become a broadcaster. But analyzing and visualizing such data helps SocialFlow guide customers about how, when, and what they should tweet to have the best chance of disseminating their messages widely.
News as a Process: how journalism works in the age of Twitter / GigaOm
Mathew Ingram published a piece called ‘News as a Process: how journalism works in the age of Twitter’ on GigaOm, covering our IJOC study, “The Revolutions Were Tweeted: information flows during the 2011 Tunisian and Egyptian revolutions”. Mathew highlights one of our key findings on homophily within Twitter’s media ecosystem: journalists tend to retweet other journalists, bloggers tend to retweet other bloggers, and so on. Finally, the article links to the visualization I posted on Global Voices, highlighting GV authors who were central figures in disseminating news about the turn of events during the height of the Tunisian and Egyptian revolutions.
Network of news dissemination during the Tunisian and Egyptian revolutions (green nodes are Global Voices authors)
Lastly, the Osama Bin Laden Twitter visualization that I worked on earlier in May 2011 was highlighted as one of Visualizing.org’s visualizations of the year. wo00t! For those of you not familiar with Visualizing.org, it is a fantastic community of creative folks with the goal of making data visualization more accessible to the general public. The site hosts hundreds of datasets, and encourages users to create visualizations through challenges which run on the website.
I’m extremely excited and humbled by the range of awesome coverage!
I just came back from News Foo, an un-conference for technologists, academics and journalists in Phoenix on the future of news. The following post details my thoughts, heavily inspired by the conversations and sessions I had the privilege to be a part of.
A growing number of algorithms decide which topics get people’s attention. Algorithms are taking over the historical raison d’être of news editors, generating top news lists, hot trends and personalized recommendations. Algorithms are perceived as neutral, yet they encode political choices and have cultural values baked in. At a time when audience attention has become a scarce commodity, an algorithm’s ability to command user attention is true power within our media ecosystem. As curatorial power is handed over to automated systems, we must make sure that the public understands the biases at play and that product engineers are optimizing for the desired outcome (an informed public), not just for what generates traffic.
Human vs. Algorithm
An algorithm is a finite list of instructions that a machine performs in order to calculate a function. From simple counting operations to complex information sorting, a good algorithm is thought through and well defined to give the desired output in the least computationally complex manner. Algorithms are extremely good at scale. They can be used to efficiently classify text from millions of documents within microseconds, extract images of a certain type, and identify complex correlations between multiple data points. Recommendation systems such as the ones used by Netflix and Amazon employ algorithms that learn about user preferences through their actions and personalize the information presented to every user, a task that would be impossible to complete manually.
Algorithmically curated, personalized recommendations have become popular within digital media spaces. “Most read articles” modules are based on simple math: the top 10 articles in terms of page views. On the other hand, “hottest articles” lists are more ambiguous and vary based on what the organization defines as “hot”. Is it new content? Is it popular? Spiking? How far back is the data being compared? Are there whitelisted or blacklisted topics? What’s hot is an intuitive and very human assessment of an ecosystem, yet a mathematically complex formula, if it is possible to reproduce at all.
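To make the contrast concrete, here is a minimal sketch of the two kinds of lists. The `most_read` function is the simple math described above. The `hottest` function is only one hypothetical definition of “hot” (page views discounted by age, with an assumed six-hour half-life), and the article names and numbers are made up for illustration. Every choice in it, from the half-life to the time window, is an editorial decision dressed up as a parameter.

```python
from collections import Counter


def most_read(page_views, n=10):
    """Return the n article ids with the most page views, highest first."""
    return [article for article, _ in Counter(page_views).most_common(n)]


def hottest(views_by_hour, n=10, half_life_hours=6.0):
    """Rank articles by recency-weighted page views (a hypothetical 'hot' score).

    views_by_hour maps article id -> list of (hours_ago, views) pairs.
    Views lose half their weight every half_life_hours.
    """
    scores = {}
    for article, samples in views_by_hour.items():
        scores[article] = sum(
            views * 0.5 ** (hours_ago / half_life_hours)
            for hours_ago, views in samples
        )
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Made-up numbers: one article with a large but old audience, one recent riser.
page_views = {"budget-vote": 9000, "storm-warning": 4000, "recipe": 1500}
views_by_hour = {
    "budget-vote": [(48, 8500), (1, 500)],
    "storm-warning": [(2, 3000), (1, 1000)],
    "recipe": [(30, 1500)],
}

print(most_read(page_views, n=2))
print(hottest(views_by_hour, n=2))
```

With these numbers the two lists disagree about what belongs on top, even though both are computed from the same traffic.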
Yet humans are still unbeatable for many types of tasks. Journalists and editors drive agendas, made up of qualities that are difficult to determine in a formula: trust, excitement, impression and intuition. Humans aren’t always rational, and may trust a source despite a bad reputation. The intuition that an experienced editor or journalist brings to the table could never be replaced by automated formulas.
Algorithmic Bias vs. Perception of Neutrality
As soon as digital information providers add any form of curation and recommendation mechanisms (a common practice within social network spaces), the technology loses its neutrality. In some ways, “Twitter’s trending topics algorithm acts like a lot of human news editors, who are more interested in the latest news rather than ongoing stories”, says Tarleton Gillespie of Cornell University. Values are coded into the way these systems make recommendations:
Twitter’s trending topics highlight novel events rather than events that grow slowly and simmer, thus making it very hard for events like Occupy Wall Street to trend, in comparison to events like Kim Kardashian’s wedding or Steve Jobs’s death, which easily trend.
Google’s search algorithm was recently adjusted (Panda update) to highlight fresh content, affecting some 35% of all search queries.
Facebook is known to promote content that references any brand that is also one of their ad partners on people’s personal “walls”.
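The first example in this list can be illustrated with a toy score. This is not Twitter’s actual trending algorithm, which is not public. It is a sketch that assumes “novelty” means the latest hour’s volume relative to the average of the preceding hours, and the function name and numbers are hypothetical.

```python
def novelty_score(hourly_counts, baseline_hours=24):
    """Ratio of the latest hour's volume to the average of the preceding hours.

    hourly_counts is a non-empty list of tweet counts per hour, oldest first.
    """
    *history, current = hourly_counts[-(baseline_hours + 1):]
    baseline = sum(history) / len(history) if history else 0.0
    return current / (baseline + 1.0)  # +1 avoids division by zero


# A topic with large, sustained volume versus a small topic that suddenly spikes.
steady_topic = [5000] * 25
sudden_topic = [10] * 24 + [2000]

print(round(novelty_score(steady_topic), 2))
print(round(novelty_score(sudden_topic), 2))
```

Under this assumed definition the sustained conversation scores below 1 despite having more than twice the volume, while the sudden spike scores far higher. The value judgment (new beats ongoing) sits inside the formula.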
As these systems grow, a single engineer or product designer may not fully understand the logic behind all of the pieces that make up the whole. We’ve seen a number of examples where unintended consequences of algorithmically designed results led to awkward outcomes, such as Amazon’s $23,698,655.93 book about flies or Google’s past ‘Florida release’, which had a catastrophic effect on a large number of websites, causing SMEs to go bankrupt. Mike Ananny describes how the Android marketplace recommended the “Sex Offender Search” application to anyone interested in Grindr, a gay dating app. And most recently, there was Siri’s inability to find abortion clinics in New York City.
These are not Google, Apple, Amazon or Twitter conspiracies, but rather the unexpected consequences of algorithmic recommendations being misaligned with people’s value systems and expectations of how the technology should work. The larger the gap between people’s expectations and the algorithmic output, the more user trust will be violated. Liz Strauss eloquently describes why she quit Klout, feeling cheated by an algorithm that constantly changes under her feet. She wanted to trust the algorithm, even through initial doubts, but broke down and quit after multiple algorithm changes.
As designers and builders of these technologies, we need to strike a fine balance: our users should understand enough about the choices we encode into our algorithms, but not so much that they can game the system. People’s perception affects trust. And once trust is violated, it is incredibly difficult to gain back. There’s a misplaced faith in the algorithm, an assumption that the algorithm should accurately represent what we think is true.
Ryan Rawson's tweet in response to claims that Twitter is censoring #OWS from trending
While it is clear to technologists that algorithms are biased, the general public perceives them as neutral. Someone at News Foo brought up the famous Rumsfeld quote, adding that it is the unknown unknowns that we should be most worried about. When people don’t know that they don’t know how the algorithms that govern their interfaces work, they may get burned, get angry and blame the technology.
Claire Diaz Ortiz leads social innovation at Twitter and is constantly managing the gap between people's expectation of its Trending Topics algorithm
The Augmented Journalist
We need to be thinking about hybrid approaches. On the news production side, how do we utilize algorithms for scale while using journalists and editors for compelling narratives and thoughtful judgement? Algorithmic Investigative Journalism may hold a treasure trove of possibilities for new types of stories, where journalists will use the output of a complex data query to feed their intuitions and draw conclusions from correlations in the data. Tom Lee at Sunlight Labs is doing an amazing job pushing projects that derive insight from big data, while Kris Hammond uses machines to write stories where automation is possible.
On the flip side, we need to make sure the general public has a better understanding of the algorithms at play, the algorithms that feed their attention, without giving away too much of the special sauce. We must come up with the right vocabulary to define editorial workflows, and work with engineers to code them into the algorithms. As danah boyd stressed during the session, it is important to be constantly thinking through what we’re optimizing for. The editor and journalist’s job is to inform the public. Is it possible to design and implement algorithms that optimize for an informed public? How do we even start to quantify a person’s level of “informed-ness”?
Pete Skomoroch raises a similar question. We need to strike the right balance between automated news personalization and curated, editorialized feeds. Advanced chess (or computer-assisted chess) is a relatively new form of chess, wherein each human player uses a computer chess program to help explore the possible results of candidate moves. The human players, despite this computer assistance, are still fully in control of what moves their “team” (of one human and one computer) make. What would the augmented journalist or editor look like? How can technology and algorithms be used effectively in the newsroom to inform both journalists and the general public?
The conversation should not be focused on humans vs. algorithms, but rather how we utilize algorithms to take our media ecosystem to the next level.
Here’s the crib of my 140#conf NYC talk, given on June 15th at the 92nd St Y:
I’m here to talk to you about my work on digital audiences, with a focus on information flows. I’m sure that to this crowd I don’t have to stress the potential that social media is unlocking. Whether you’re a brand, knitting circle or just an individual surfing the web, social media is an invaluable medium to seek and disseminate important information in realtime.
We are all part of the emerging information economy, building and using applications that create overflowing streams of information. Social network sites create compelling spaces, where social interactions act as lubricants, accelerating the flow of information. Users are encouraged to respond, add to, consume and redirect content. As information flows by, some may grab a piece when it is most relevant, valuable, entertaining or insightful, and at times, choose to pass it onwards.
Attention = Power
While the threshold to publishing nears zero, attention has become the bottleneck. One cannot demand attention anymore, or expect to have it at certain times of the day. We all need to understand the preferences and behavior of our respective audiences, and adapt our own behavior in order to attract the attention of others. The ability to attract attention is power, and in this 140-character economy, understanding how people manage their attention is incredibly powerful.
Because information spreads through people, networks of friends, fans and followers, by understanding information flows we have the ability to unlock insight about where people place their attention. Some data spread at an unpredictable viral speed, while the majority are only seen by a handful. In order for messages to propagate, people along the way must be attentive: notice them at the right time, and pass them onwards. How this happens is the million dollar question. Here are some examples:
Gaining your Network’s Trust
This is a visualization from a recent study we published about the spread of the tweet on the Osama Bin Laden operation. Media that Monday morning was focused on the story that “Twitter broke the news”. Over an hour before the formal White House announcement, people on Twitter had figured out that it was Bin Laden. There was much speculation about why the presidential announcement had to take place on Sunday night. Some guessed it was about Gaddafi, and others, Bin Laden.
It was a single tweet that triggered an incredibly fast information cascade. A single tweet from Keith Urbahn, Donald Rumsfeld’s chief of staff, drew 80 retweets within a minute, and generated over 300 within two. This message spread like wildfire. (see study for more detail)
Before May 1st, not even the smartest of machine learning algorithms could have predicted Keith Urbahn’s likelihood to spread information on this topic, or his potential to spark an incredibly viral information flow. While politicos “in the know” certainly knew him or of him, his previous interactions and the size and nature of his social graph did little to reflect his potential to win the trust of thousands of people within a matter of minutes.
Tight Knit Network or “Tribe”
Another example of an interesting information spread is that of Urban Outfitters vs. the NYC crafters community. In this case, the ‘I heart NYC’ necklace design was ripped off from an independent designer by UO. The artist put up a blog post, and Amber Karnes published the following post to Twitter:
This led to an avalanche of reactions, to the level that her username was trending in LA, Portland, New York, Toronto and then the United States.
I am not a Twitter celebrity by any means. I barely had over 1,000 followers when the day began and I’m pretty sure about 200 of those are spam-bots. What I do have – and the reason that my call for a boycott on Urban Outfitters spread so fast and wide – is a tribe. A tight knit group of independent artists and crafters that follow me. My cause resounded with them. They spread it, and their friends spread it, and a few big influencers on Twitter spread it, and then it was gone.
Topic + Network + Timing
We see this over and over again: the right social-professional networked audience, along with a relevant piece of information, all at the right time, leads to an explosion of public affirmation, often unexpected by the author.
Paradox of Social Networks
Networked sociality promises us equal opportunity: easy, frictionless connection to literally anyone across the globe. But what plays out is in effect very different. As James Gleick notes in his seminal book ‘The Information’: “The structure of the social web stands upon a paradox. Everything is close, and everything is far at the same time.”
These small world networks usually offer 4 degrees of separation. And even though the distances between people may seem short, finding the right route that will provide us with the desired outcome is extremely difficult. This is why cyberspace can feel not just crowded but lonely. You can drop a stone into a well and never hear a splash. But alternatively, you can be received with a flood of water. And while the latter is less common, as people spend more of their time on social network sites, we’re seeing it happen again and again.
The Promise of Data
Whether you’re interested in socializing or in selling a product, understanding your network’s habits around information consumption and production is imperative to attaining people’s attention and building an engaged audience. We all build these mental models in our heads, imagining our invisible audiences: the people who give us attention. But as long as it’s all in our heads, it doesn’t scale. We need to build and use tools that drive insight and help us find effective ways to make sense of all the digital breadcrumbs left by our online audiences.
Today is my last day at Microsoft.
In two weeks I’m joining SocialFlow, a hot Betaworks startup, as VP R&D. ::bounce::
SocialFlow is working on truly innovative ways to optimize communications through social media. There I’ll be playing with lots and lots of data, working on data analysis, methodology and identifying interesting patterns that emerge from social streams (Twitter, Facebook, bit.ly…). Additionally, I’ll be designing and building tools that optimize interactions with online audiences in these spaces. I’m a big believer in SocialFlow’s potential impact, and am absolutely psyched to be joining the crew. Good to be in a startup again, but thrilled to be part of the Betaworks family. Some of the smartest people in social data analysis sit in that incubation space, providing droolworthy opportunities to learn and collaborate. (I’ll be writing much more about SocialFlow soon enough)
With all the excitement, I have to say, leaving Microsoft is not easy. The past two years have been an incredible adventure. As part of the labs, I’ve had quite a unique experience for a Softie. I got a chance to work with teams across the org. From Microsoft Research to Bing, Office, XBOX, Israel Labs, Online Services and even the new Microsoft Retail Stores, I worked closely with so many passionate, smart people in this company. We worked in small agile teams, with many chances to get my hands dirty and contribute code even as a Program Manager.
I had numerous opportunities to take part in industry events. I presented a thorough Twitter research paper at HICSS, and displayed interactive visualization work at both TED Active and the Summit Series. I showed our work at the Twitter #140 conference, and MSR’s TechFest. I worked on an embeddable Twitter visualization kit, FUSE Social Gadgets, and helped design and build the interactive displays at the Microsoft Retail Stores. When I look back at the breadth of projects I contributed to over the past two years, I’m extremely thankful.
I feel like I’ve grown in leaps and bounds when I compare present me to pre-Microsoft me. Much more focused, with an understanding of team dynamics and how to prioritize features to get the product out the door as fast as possible. Learning soft skills of group collaboration, and also managing to play with the politics of the larger org. Sure, the massive machine gets frustrating at times, but the impact and reach of the company’s products makes it all worthwhile.
My biggest frustration was definitely being far away from the mothership. Had I lived in Redmond or Seattle, there’s little chance I would have considered leaving Microsoft. With all of its advanced communication and co-presence technology, it is still extremely challenging to have substantial impact on product at Microsoft when not in Redmond. You’re always calling in, or an afterthought when meetings get shuffled around. I’d think twice before joining a large corporation again, unless I’m located at the headquarters.
So a new adventure begins. I’ll be moving down to NYC mid-March, and have already started looking for apartments in Chelsea or the Village. d is staying in Boston and will be commuting down to NYC for the weekends. Not ideal, but I’m confident we can do it. The upside is that weekends will have to become “work-less”, which is probably a very, very good thing.
I’m deeply thankful to Microsoft for everything. All the opportunities and the wicked smart friends I was lucky to get to know.
I was pointed to the following blog post by Brian Solis, The Interest graph on Twitter is Alive – studying Starbucks top followers. Brian’s post defines an “interest graph” as a subset of the social graph around a certain topic. He claims that while Social Graphs of follower/following relationships were interesting, the interest graph is a step beyond: it is a focused network that shares “more than just a relationship”. While I’m a strong believer in topical graphs (I’ve been creating them for various analyses over the past couple of years), I find his argument overgeneralized and his terminology around data problematic. There’s a lot of hand waving, and little acknowledgement of the assumptions and biases of the data that’s coming from public Twitter profiles.
“While we are what we say in our Tweets, our bios also reveal a telling side of who we really are.”
Our tweets represent moments in time in which we displayed interest in a topic, person or thing, while our bios represent our aspirational selves. There are so many people who write “activist” or “blogger” in their bio who in real life wouldn’t be considered either one of those. Additionally, stated location is usually only set upon profile creation and never touched again, thus making it obsolete in a highly mobile society. Only about 10% of users share geo-location while Tweeting, which leaves us with a lot of guess-timation work.
Statistical Bias
It is extremely important to keep in mind that Twitter is used by a subset of the population. Of course many users will use the terms ‘geek’, ‘technology’ or ‘social media’ in their bios! There’s substantial statistical bias when looking at Twitter across the board, so as a brand, you must not consider a Twitter audience as representative of your real-life audience, but rather as a slice of it.
Shminfluencers
Solis throws the term ‘influencers’ around. In one case, he links to this ReSearch.ly page that supposedly points to “Starbucks influencers”. However all I can see on top are spam bots
or foursquare checkins:
Or people who mention the word ‘starbucks’ in their tweet. I see no “influencers” in this list, nor do I suspect any of these posts led others to take more interest in the brand. By generalizing across anyone who posts the term ‘starbucks’, research.ly is contaminating its data. And this is precisely my point of contention with Brian’s post.
Graphs of Influence
Influence is complex. Certainly not binary. Influence is represented by a hodgepodge of human behavior, social dynamics and serendipity. Many experts are trying to define it, and the truth is, there’s no recipe. Solis calls a version of this the ‘brand graph’, a group of highly connected individuals within a given topic. Apparently the tool looks at users who mention the term ‘starbucks’ and then sees if their followers also mentioned the word ‘starbucks’. Assuming that this represents a transaction of “influence” is naive.
1. How do you account for timing?
2. Do you even look at who’s following whom? Perhaps a user was influenced by another profile who mentioned “starbucks”?
3. Maybe there was no influence at all. There’s a well-known property of social networks called homophily (birds of a feather flock together). We tend to connect with people who are similar to us. Most likely my friends will talk about topics that interest me. That doesn’t mean that I’m influenced by them.
4. Even if we agree that user A mentioned ‘starbucks’ because she saw user B posting about starbucks, why do we automatically assume that this is influence?
Before using the term influence, we must understand and acknowledge where our data is coming from, and its statistical bias. We must understand that Twitter is a highly engaging conversational space. And if we’re seeing a conversation about a topic, there doesn’t necessarily have to be a transaction of “influence”. We shouldn’t use that term lightly.
Interest Graph + Social Graph = Magic
While both the social graph and the interest graphs are interesting on their own, the real magic happens when we put them together. By overlaying the dynamic topical discussions on top of the social graph, we are able to identify clusters of users engaged in conversation over a topic. By following the spread of these topics, or the information cascades, we are able to start mapping out the spread of topics across the network. And by analyzing structural positioning of users (within the graph), we can start to get a sense for their level of influence, in creating and sustaining information flows.
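As a rough illustration of that overlay, the sketch below takes a follow graph and a set of users who discussed a topic, keeps only the follow ties between those users, and groups them into connected clusters. The function name, the data layout and the usernames are all hypothetical, and treating follow ties as undirected is a simplifying assumption rather than a claim about how this should be done at scale.

```python
from collections import deque


def topic_clusters(follows, topic_users):
    """Group users who discussed a topic into clusters connected by follow ties.

    follows: dict mapping user -> set of users they follow (the social graph).
    topic_users: set of users who posted about the topic (the interest graph).
    Follow direction is ignored when grouping.
    """
    # Undirected adjacency restricted to topic participants.
    neighbors = {user: set() for user in topic_users}
    for user in topic_users:
        for followed in follows.get(user, set()):
            if followed in topic_users and followed != user:
                neighbors[user].add(followed)
                neighbors[followed].add(user)

    clusters, seen = [], set()
    for start in sorted(topic_users):
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), set()
        while queue:  # breadth-first walk over one connected component
            user = queue.popleft()
            cluster.add(user)
            for other in neighbors[user] - seen:
                seen.add(other)
                queue.append(other)
        clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)


# Made-up follow graph; "cat" follows people but never mentioned the topic.
follows = {
    "ana": {"ben", "cat"},
    "ben": {"ana"},
    "cat": {"dan"},
    "dan": set(),
    "eve": {"fay"},
    "fay": set(),
}
coffee_talkers = {"ana", "ben", "dan", "eve", "fay"}

clusters = topic_clusters(follows, coffee_talkers)
print(clusters)
```

Note that a cluster here only says that people who talked about the same thing are connected. For the reasons listed above, it says nothing yet about who influenced whom.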
There have been numerous articles and discussions on the role Twitter played during the recent Tunisia uprising. An excellent TechCrunch post by Alexia Tsotsis analyzed Twitter traffic over time (using data provided by backtype). According to their report, Tunisia-related Twitter traffic peaked at 28 tweets per second, at 21:27:56 Tunisian time, a couple of hours after the first reports that the Tunisian president had left the country. At the end of the cycle, total tweets mentioning Tunisia were over 196K. Total tweets mentioning #sidibouzid (the province where the protests started) were over 103K.
While this is great analysis of the content itself, I found little to no analysis of the participants on Twitter. Who are these people that chose to pass on and amplify messages? How did the information spread? Who were the pivotal points that enabled this? By answering some of these questions, can we reach an understanding of the role that Twitter plays in diffusing information to public attention around the world?
Participating Users
My dataset includes 170,000 Tweets all containing the term ‘#sidibouzid’, posted between Jan 12th and 19th by some 40,000 different Twitter users. This is not the complete dataset, but what I could grab using the public Twitter APIs. The chart below maps out the distribution of Twitter users who joined the conversation by posting a message with the ‘#sidibouzid’ hashtag. We see a huge spike between Jan. 13th and 14th, reaching almost 12,000 new users at its peak. This is not surprising, given all the other analyses pointing to a huge spike in “attention” that the story received on Jan. 14th, when Ben Ali fled Tunisia.
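For readers who want to reproduce this kind of chart on their own data, here is a minimal sketch of the counting step. It assumes the tweets have already been collected as (user, timestamp) pairs. The function name, the field layout and the sample rows are hypothetical and are not my actual dataset.

```python
from collections import Counter
from datetime import datetime


def new_users_per_day(tweets):
    """Count users by the day of their first tweet in the dataset.

    tweets: iterable of (user, iso_timestamp) pairs, in any order.
    """
    first_seen = {}
    for user, timestamp in tweets:
        posted = datetime.fromisoformat(timestamp)
        if user not in first_seen or posted < first_seen[user]:
            first_seen[user] = posted
    return Counter(posted.date().isoformat() for posted in first_seen.values())


# Made-up sample rows for illustration only.
tweets = [
    ("user_a", "2011-01-12T09:15:00"),
    ("user_b", "2011-01-13T22:40:00"),
    ("user_a", "2011-01-14T18:05:00"),
    ("user_c", "2011-01-14T19:30:00"),
    ("user_d", "2011-01-14T21:27:56"),
]

joined = new_users_per_day(tweets)
for day, count in sorted(joined.items()):
    print(day, count)
```

A user is counted only once, on the day of their earliest tweet with the hashtag, so repeat posters do not inflate later days.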
Participation amongst users (i.e. – number of times users posted a message with the ‘#sidibouzid’ hashtag) follows a power-law distribution:
Top 10 participants of the Hashtag (in terms of volume posted) are:
Some of these accounts are broadcasting into the ether, like our top participant, griffinworks_3. This profile was only created on January 12th 2011, has since then posted around 4,000 Tweets, and has acquired only some 100 followers. From my dataset, looks like this profile got around 20 ReTweets between Jan. 15th – 18th. Not much activation, nor audience. The profile also doesn’t follow anyone else. Possibly a bot that auto-forwards content.
On the other hand, if we look at Dima_Khatib, an Arab journalist with Al Jazeera, we see an extremely active profile (over 9,000 posts) who is quite new to twitter (created mid October, 2010), but with a high following of almost 5,000, and a high rate of mentions/RTs (over 5,000 times).
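The per-user participation counts and the top-participants ranking discussed in this section come down to simple tallies. The sketch below shows one way to compute them, assuming a flat list with one author name per tweet. The names and counts are invented, and the histogram is only the raw material for checking a heavy-tailed shape, not a statistical test for a power law.

```python
from collections import Counter


def participation(tweet_authors, top_n=10):
    """Return (top participants, histogram of posts per user).

    tweet_authors: iterable with one author name per tweet.
    """
    posts_per_user = Counter(tweet_authors)
    top = posts_per_user.most_common(top_n)
    # histogram[k] = number of users who posted exactly k times
    histogram = Counter(posts_per_user.values())
    return top, histogram


# Made-up data: a couple of heavy posters and several one-time posters.
authors = ["user_a"] * 6 + ["user_b"] * 3 + ["user_c", "user_d", "user_e", "user_f"]

top, histogram = participation(authors, top_n=2)
print(top)
print(sorted(histogram.items()))
```

Plotting the histogram on log-log axes is the usual quick visual check: many users with one or two posts and a long tail of very active accounts.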
User Bios
Using wordle to visualize the users’ profile information (the “write something about yourself” field), it is quite clear that as the events unfold and spread out to the world, there is a drastic shift in the kinds of people who are joining the hashtag. The dominant words representing the initial Twitter participants are ‘Tunisian’, ‘journalist’, ‘politics’, ‘activist’, and a variety of French stop words:
Once the topic started trending, we see the people joining the hashtag represented by the following words: ‘news’, ‘twitter’, ‘music’, ‘marketing’, ‘media’, ‘student’…
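Wordle does the counting for you, but the underlying step is a word-frequency tally over the bios. Here is a minimal sketch that also drops a handful of stop words. The stop-word list is a tiny hypothetical sample (a real one would be much longer, particularly for French), and the bios are invented.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list mixing English and French.
STOP_WORDS = {"the", "a", "and", "of", "in", "i", "le", "la", "les", "de", "et", "un", "une"}


def bio_word_counts(bios, top_n=5):
    """Return the top_n most frequent non-stop words across profile bios."""
    words = []
    for bio in bios:
        # [^\W\d_]+ matches runs of letters, including accented ones.
        for word in re.findall(r"[^\W\d_]+", bio.lower()):
            if word not in STOP_WORDS:
                words.append(word)
    return Counter(words).most_common(top_n)


# Made-up bios for illustration only.
early_bios = [
    "Tunisian journalist and blogger",
    "Activist, journaliste de Tunis",
    "Journalist covering politics in the Maghreb",
]

print(bio_word_counts(early_bios, top_n=3))
```

Running the same tally separately for users who joined before and after the topic trended gives the two word sets being compared here.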
Geographic Distribution
What can we learn about the spread of this topic by looking at people’s geographic location? If we had a precise indication of every profile’s exact location, this would be fascinating. My assumption is that we would see small discussions happening around the Middle East, France and Morocco in the days before the uprising. Relatives and Tunisian expats from neighboring countries would be Tweeting about the events well before they reached world headlines. Could we actually see how the conversation moves from being regional/local into global? And if so, what does that movement look like?
There are three profile attributes that can give us clues about someone’s location: 1) the user-entered ‘location’ field, 2) the ‘time-zone’ field, and 3) geo-location. When a user creates a Twitter account, the time zone may be automatically set based on the current location (depending on browser and connection); otherwise it receives the default value of ‘Quito’. Tunisia and Paris share the same time zone (CET), so if someone in Tunisia creates a new profile, their time zone may automatically be set to ‘Paris’. The location field, by contrast, has no default. All of this makes it extremely tricky to draw solid conclusions from the time-zone field.
Since only 15% of users enabled geo-location, I chose the location field as the best indicator. Since it has to be entered manually, it may not be the most up-to-date location, especially if the user travels, but it at least indicates a solid connection between the user and a country. For this analysis I chose to look at all profiles that stated their location.
It’s interesting to see how comparatively strong a role Egypt and France play initially:
And then how folks from Saudi Arabia, Indonesia, the US and the UK get heavily involved:
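Turning free-text location fields into per-country counts is messy in practice. The sketch below shows only the general shape of that step, using a tiny hypothetical keyword table. It is not the mapping I used, and a real analysis would need a far larger gazetteer or a geocoding service.

```python
from collections import Counter

# Hypothetical keyword hints; substring matching is crude and will misfire on real data.
COUNTRY_HINTS = {
    "tunis": "Tunisia",  # matches both "Tunis" and "Tunisia"
    "cairo": "Egypt",
    "egypt": "Egypt",
    "paris": "France",
    "france": "France",
}


def country_counts(locations):
    """Tally profiles per country based on the free-text location field."""
    counts = Counter()
    for location in locations:
        text = (location or "").strip().lower()
        if not text:
            continue  # skip profiles with no stated location
        country = next(
            (name for hint, name in COUNTRY_HINTS.items() if hint in text),
            "Unknown",
        )
        counts[country] += 1
    return counts


# Made-up location strings, including empty and unparseable ones.
locations = ["Tunis, Tunisia", "Cairo", "", None, "Paris, France", "somewhere online", "EGYPT"]

by_country = country_counts(locations)
print(by_country.most_common())
```

Profiles with an empty location are dropped, matching the choice above to look only at profiles that stated one, while unrecognized strings are kept in an explicit “Unknown” bucket so the size of the gap stays visible.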
Social Graph and Connectedness
Knowing how an individual is embedded in the structure of groups within a network may be critical to understanding his/her behavior. For example, some people may act as “bridges” between groups (connectors or “brokers” of information). Others may have all of their relationships within a single group (locals or insiders). Some may be part of a tightly connected and closed elite, while others are completely isolated from this group. Such differences in the ways that individuals are embedded in the structure of groups within a network can have profound consequences for the ways these “nodes” receive information or reach an opinion.
This is probably the most interesting part of the analysis, but also the most complex. I used the Twitter API to mine the publicly available relationships between all hashtag participants. There are two important measures that I used to make sense of all this data:
In Degree: how many users who participated in the hashtag are following this person. Effectively, how popular/reputable this person is within the group of all those participating.
Clustering Coefficient: measures how tightly inter-connected this person’s “neighborhood” is. If all your followers and friends are friends with each other, your CC will equal one.
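The two measures can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code behind the graphs in this post: the follow graph and account names are made up, and the clustering coefficient here treats two neighbors as connected if either one follows the other (one of several reasonable conventions for a directed graph).

```python
from itertools import combinations

# Hypothetical follow graph: follows[a] is the set of accounts that a follows.
follows = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "cy"},
    "cy": {"ana"},
    "dee": {"ana"},
}


def in_degree(user, follows):
    """Number of participants who follow `user`."""
    return sum(
        1 for other, targets in follows.items()
        if other != user and user in targets
    )


def neighborhood(user, follows):
    """Everyone `user` follows plus everyone who follows `user`."""
    friends = set(follows.get(user, set()))
    followers = {other for other, targets in follows.items() if user in targets}
    return (friends | followers) - {user}


def clustering_coefficient(user, follows):
    """Share of neighbor pairs linked in at least one direction."""
    hood = neighborhood(user, follows)
    if len(hood) < 2:
        return 0.0  # no pairs to compare

    def linked(a, b):
        return b in follows.get(a, set()) or a in follows.get(b, set())

    pairs = list(combinations(sorted(hood), 2))
    return sum(linked(a, b) for a, b in pairs) / len(pairs)


for user in sorted(follows):
    print(user, in_degree(user, follows), round(clustering_coefficient(user, follows), 2))
```

In this toy graph, "ana" has the highest in-degree but a low clustering coefficient, because most of her neighbors are not connected to each other.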
I chose two different participants so that I could map out their network and see what we can identify.
ifikra
The graph below represents Sami Ben Gharbia’s network. Sami showed up as one of the most prominent Twitter users on January 13th. He was one of the most central nodes within the group of people who were passionately posting the ‘#sidibouzid’ hashtag prior to the peak of events. Sami shares a large chunk of his audience with two key users: an Egyptian journalist (mfatta7) and a Channel 4 News foreign affairs correspondent (jrug). This is a mapping of only his first degree followers and friends:
Dima_Khatib
The following graph represents Twitter user Dima_Khatib’s network. Dima_Khatib was one of the most active participants, posting over 800 messages to the hashtag. Dima is a journalist at Al Jazeera, and as I mentioned previously, is quite new to Twitter (began tweeting in October ‘10). Dima shares part of her audience with a fellow Al Jazeera journalist (Mskayyali):
SBZ_news
SBZ_news is a profile that functions as a typical broadcast media outlet, with a very high in-count, yet a very low out-count (it has many followers, and follows almost none). What’s interesting here is that its community of followers includes a number of key players, who themselves have fairly large audiences. This seems to have been an important source of information from the ground in Tunisia.
What Next?
This post is merely touching the tip of the iceberg. There’s still so much that can be understood by slicing and dicing this data. As we start to grasp the power of Twitter as a worldwide information diffusion network, we must build tools that help analyze the structures that enable information to flow.
With all the excitement about Tunisia and the numerous debates on whether this was/is another “Twitter Revolution”, it was the perfect time to dig into Clay Shirky’s recently published piece ‘The Political Power of Social Media’ in the Journal for Foreign Affairs. I actually like the journal and usually buy a copy, but sadly there’s no existing text online, which means the article is not part of the current debate (a shame!). Many agree that the revolution in Tunisia did not happen because of Twitter, nor did Twitter *actually* help much for those fighting in the streets of Tunis. While social media play an important role in easing the flow of information during and after the peak of events, Clay argues that there’s an important and usually overlooked long-term effect that social media have in strengthening public spheres.
In the article, Shirky claims that the US government overestimates the value of access to information, particularly that hosted in the west, and underestimates the value of tools for local coordination. There’s a need to think of social media as long term tools that can strengthen civil society, and thus the public sphere. Clay argues that a strong public sphere plays a crucial role in social change. For example, communication tools during the Cold War did not cause governments to collapse, but they helped the people take power from the state when it was weak. They played a supporting role in social change by strengthening the public sphere. It is imperative for the US to rely on countries’ economic incentives to allow widespread media use. It should work for conditions that appeal to states’ self-interest rather than the contentious virtue of freedom, a way to create or strengthen countries’ public spheres.
Clay describes a fascinating study of political opinion by sociologists Elihu Katz and Paul Lazarsfeld:
in a study of political opinion after the 1948 US presidential elections, sociologists Elihu Katz and Paul Lazarsfeld discovered that mass media alone do not change people’s minds; instead there is a two-step process. Opinions are first transmitted by the media, and then they get echoed by friends, family members, and colleagues. It is in this second, social step that political opinions are formed. This is the step in which the Internet in general, and social media in particular, can make a difference. As with the printing press, the Internet spreads not just media consumption but media production as well – it allows people to privately and publicly articulate and debate a welter of conflicting views.
The fascinating thing about Twitter is that for the first time, we are able to actually SEE some of these psychologically triggered processes happen. We see the described first step happen all the time: media outlets and corporations tend to broadcast messages using their accounts. These messages may or may not be picked up by the general audience who follows their accounts. But the second step is where things get really interesting. Posts may be picked up and echoed by friends, family members and colleagues, sometimes bounced around so much that the messages turn “viral”.
This second step, the social flow of ideas and opinions between people based on realtime public data, is at the crux of an emerging new field that fuses machine learning and statistics with the social sciences. Access to information is important, but understanding information flows is truly powerful: it lets us do in-depth analyses of people’s behavior and create systems that are smarter and substantially more effective. Clay talks about a notion of ‘shared awareness’ – people who are part of intertwined networks, posting and consuming each other’s information. Shared awareness binds and strengthens groups, helping millions who are not part of any hierarchical organization spread messages and reach a common understanding. Understanding how people are inter-connected not only helps us build better systems, but also helps us get a sense for the strength of a country’s public sphere.
As the web continues to evolve into a dense network of social links, we need to focus on getting a better understanding of networked information flow. Additionally we must build tools that will help us slice and dice massive social graphs of nodes and edges. Whether a breaking news story, social coupon or a TV show, information flows are the underlying force powering the web, and affecting the DNA of our society. I am certain that making sense of them will bring huge rewards.
Making Sense of the Ebbs and Flow of Social Data
Below are notes + slides of my presentation at the BRANDSconf. I’d like to acknowledge Hunter Whitney. Portions of this content were based on a discussion and an upcoming article he is writing on this topic (link coming shortly):
I’m extremely passionate about data analysis and design. My work focuses on the intersection of the two. I play with data, and figure out ways to make it more accessible to people. I’m here to talk about why the art of making sense of massive amounts of social data is critical not only for geeks like me, but any professional using Twitter. And my goal is to get YOU all excited about the opportunity that understanding data unveils for us.
Whether you’re a multi-national enterprise, a local deli or a mah-jong meetup, the proliferation of social network services like Twitter has created an expectation that you interact with your customers, users and followers. There’s an expectation to connect rather than broadcast. We’ve been hearing this over and over this morning – you are a brand. And as a brand you are expected to interact with your audience like a person would interact with others. You need to engage in conversations, provide and receive feedback, network, create hype, and do all this in a timely manner.
But how can we be expected to interact with an ever growing and diverse group of people when we can’t really “see” them?
Giving Shape to our Audience
Judith Donath of Harvard’s Berkman Center talks about human signaling and how that translates to digital spaces. I get a variety of signals from merely standing in front of you all – your age, what you’re wearing, how you’re feeling, who’s smiling and who’s already fallen asleep. Being here, with you, part of this event, I have context that helps me understand how best to interact with you all. I’ll happily switch to speaking Hebrew, but obviously that will not be helpful. Even the little bit that I know about you helps me make some useful assumptions – speak English, tune down the analytics/mathematics terms, tune up the user experience/brand jargon.
Social network spaces are fueled by social interactions. Think of people’s interactions online as digital breadcrumbs, trails of connections, likes, thoughts and opinions. By piecing together these crumbs we can start making sense of the people giving us attention on social network sites. We must use as many of the available tools as we can to mine the data about our audience – location, time of day, language, interests. In order to interact with an audience we need to be able to sense it.
There are a variety of tools that give us this opportunity to mine content. This is only the first step. We need to put an emphasis on looking at the connections between people, and not only the content that is being published.
The Social Graph
Social Graph is a term that I’m certain you all will hear more and more as social network spaces become a fundamental component of our lives. A social graph is a dataset that represents people and their inter-connections within a group. Mark Zuckerberg is known for popularizing the term in his description of the value that Facebook Connect brings to websites. Facebook’s social graph is made up of you all who I’m sure have accounts, and all your connections. Additionally, that graph distinguishes between types of connections – whether colleagues, friends or family.
Twitter’s social graph is different. It’s a directed graph, which means that connections have directions: the person you follow does not necessarily follow you back. Twitter’s social graph is fascinating because it maps people’s interests; what people are willing to give their attention to. By understanding people’s interests over time as well as their interconnections, we can reveal valuable points such as (1) bridges: people who connect two distinct communities (2) influencers: those who can get their audience to participate (3) experts: people who specialize in a specific topic (4) hustlers: culture creators.
While it is fairly straightforward to aggregate large datasets, we are still challenged by making sense of graph-based data. These constantly changing graph indexes are massive and may require complex queries in real time: what’s the shortest path between person A and person B, what’s the intersection between groups C and D, or what are the clustering coefficients within group E? Once calculated, these results reflect the intricacies of people’s relationships, shedding light on properties that directly affect their behavior: influence, trust, authority and personal preference.
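Two of these queries are easy to sketch on a small in-memory graph, even if running them against a massive, constantly changing index is the hard part. The example below is only an illustration: the follow graph and account names are hypothetical, the shortest path uses a plain breadth-first search along “follows” edges, and the intersection is taken between two accounts’ follower sets.

```python
from collections import deque

# Hypothetical follow graph: follows[a] is the set of accounts that a follows.
follows = {
    "ana": {"ben"},
    "ben": {"cy"},
    "cy": {"dee"},
    "dee": set(),
    "eli": {"cy", "dee"},
}


def shortest_path(source, target, follows):
    """Breadth-first search along follow edges; returns a list of accounts or None."""
    if source == target:
        return [source]
    previous = {source: None}  # remembers how each account was reached
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in sorted(follows.get(current, ())):
            if nxt in previous:
                continue
            previous[nxt] = current
            if nxt == target:
                # Walk back from the target to the source.
                path = [nxt]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


def followers_of(user, follows):
    """Accounts in the graph that follow `user`."""
    return {other for other, targets in follows.items() if user in targets}


print(shortest_path("ana", "dee", follows))
print(followers_of("cy", follows) & followers_of("dee", follows))
```

Because the graph is directed, a path from A to B does not imply a path from B to A: here "dee" follows nobody, so no path leads from "dee" back to "ana".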
Understanding information flows
In the social web, information spreads through people, networks of friends, fans and followers. Social network sites create compelling spaces where users feel comfortable to hang out, interact, consume, poke and publish. Social interactions lubricate the flow of information within these spaces, creating a plethora of dynamics. These spaces are filled with endless streams of content, encouraging users to participate, add to, consume from and redirect content. As information flows by, users grab content when it is most relevant, valuable, entertaining or insightful, and at times, choose to pass it on.
Because information flows through networks of people, attention has become a scarce commodity. This is truly a game changer. Media companies no longer control people’s attention, but are rather fighting for a smaller section of the pie. True power lies in understanding how information flows and its effect on where people choose to focus their attention. In order for messages to propagate through social networks, people along the way must be attentive to the pieces of information, see them at the right time, and pass them onwards.
Whether you’re interested in socializing or in selling a product, understanding people’s habits around information consumption and production is imperative to attaining people’s attention and building an audience. By leveraging the publicly available data around people’s practices, we can create services that shed light on people’s habits and preferences. Additionally, by mining this data over time, we can infer their value in affecting information flows.
I’ve been following @jeffpulver for a while now and know that he’s quite generous in terms of attention. A great time to catch Jeff is in the morning (wherever he is): when he sends out a ‘good morning’ Tweet, there tend to be reciprocal pings and messages. I also know Jeff is interested in new developments in the Israeli startup scene. If I have any juicy piece of information on that topic, I’d make sure to post it, possibly with a /cc/ to Jeff, and ideally around his morning time. I have a mental model in my head, around Jeff’s practices in consuming and producing content.
We all do this, but can only capture so much in our heads. We need tools that scale and capture our networks as a whole and not just individuals. Remember, it’s not necessarily about the size of an audience or someone’s number of followers, but rather who they are and who they’re connected to.
That all sounds really great, but in practice, representing large graph datasets can easily get out of hand. A graph visualization, however loved by geeks, usually becomes a tangled mass of lines and dots. We must remember that this data is beneficial only if people are able to make sense of it. We need to think about interfaces that will let us play with the data; slice and dice the parts that we deem relevant or interesting. In addition to an intuitive interface, we need controls that will help us dive into and observe patterns or connections that would have otherwise been hidden.
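As a small illustration of turning that kind of mental model into something that scales, here is a sketch that finds the hour of day when an account posts most often. The timestamps are invented, and assuming they are already in the account’s local time is a simplification; a real tool would have to handle time zones and far more signals than posting time.

```python
from collections import Counter
from datetime import datetime

# Hypothetical post timestamps, assumed to be in the account's local time.
timestamps = [
    "2010-11-01 08:05",
    "2010-11-01 08:40",
    "2010-11-02 08:15",
    "2010-11-02 21:30",
]


def most_active_hour(stamps):
    """Return the hour of day (0-23) with the most posts."""
    hours = Counter(
        datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour for stamp in stamps
    )
    return hours.most_common(1)[0][0]


print(most_active_hour(timestamps))
```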
Closing
There are three points I want to make sure you all come out of this talk thinking about:
1) Mine Digital Breadcrumbs – use the existing tools to get a sense for how our audience looks and its segmentation (I’ve made a oneforty kit here)
2) Social Graphs are Extremely Useful – yet complex to aggregate and mine.
3) Understanding Information Flows is Powerful – especially as we’re shifting from broadcast mode to that of engagement
Social network analytics tools may fundamentally change the way we engage with our online audiences. We need to build better tools that do the above mentioned tasks. But I need people like you all to be vocal about your needs and frustrations. As we’re building out these technologies, we want to make sure they are tailored to real needs. We’re only at the start of the journey, and I’m super excited to be a part of it!
Towards the end of Bob Woodward’s Obama’s Wars, there’s a detailed description of an hour-long meeting that the author had with President Obama at the Oval Office. He recounts the scene in such detail that I felt as if I was there in the room. The body language, attitude, charisma and humor.
At the end of their meeting, Bob hands Obama a passage from The Day of Battle by Rick Atkinson, which I found both inspiring and saddening.
…for war was not just a military campaign but also a parable. There were lessons of camaraderie and beauty and inscrutable fate. There were lessons of honor and courage; of compassion and sacrifice. And then there was the saddest lesson to be learned again and again. That war is corrupting. That it corrodes the soul and tarnishes the spirit. That even the excellent and the superior can be defiled. That no heart can remain unstained…
Obama reads this quote, and responds by pointing Bob to his Nobel Peace Prize acceptance speech.
No matter how justified, war promises human tragedy. The soldier’s courage and sacrifice is full of glory, expressing dedication to country, to a cause, to comrades in arms. But war itself is never glorious, and we must never trumpet it as such. So part of our challenge is reconciling these two seemingly irreconcilable truths.
That war is sometimes necessary.
And that war at some level is an expression of human folly.
War used to be such a dominant part of my reality, but now it feels so distant. I’m not wishing for the stress, worry and fear that came along with that. What I am worried about is living in a country where there’s such a lack of concern and connection to where its own soldiers are fighting, or to the major fronts that see daily battles. From reading this book, I’m invigorated by Obama’s seeming concern to gather as much information as possible in order to make the best decision about the continuation of these wars. I see little hope in finding a policy that will not cost the US military many years and high involvement in Afghanistan and Iraq. To make things right, they must work with local communities, build trust and a solid social infrastructure, using counterinsurgency techniques. However, with the general public so disengaged, how the heck are they going to pull it off?
Today I sat through an incredibly frustrating talk by the newly appointed Israeli ambassador to the UN, Meron Reuben. Meron recently replaced Gabriela Shalev, who most likely resigned from her post after serving for the last two years. Meron gave a generic politico talk, spanning the ways in which Israel is helping the UN reach its millennium goals through its innovation in clean energy tech and agriculture. He also noted how tricky Israel’s relationship with the UN has been (historically) and how challenging a role this is (I totally agree) – especially as the ambassador does not have any say in the political agenda, but merely represents the decisions made by the Israeli gov’t in front of the assembly.
The Q&A section was where things got both interesting and frustrating. Meron used the commonly heard Israeli political narrative. I’ll try to map out his main arguments, and then include my point of view.
Uneven Representation
“They have much more representation than Israel in the UN. There’s only one Israel and many Arab states.” This makes it extremely difficult to “be heard”, especially when your enemies repeat the same arguments over and over again. Meron mentioned that he can only speak so much, while the Arab nations have, in aggregate, substantially more time on the stage.
Antisemites are Out to Get Us
Meron uses the same techniques that politicians and the Israeli media know so well. He depicts Israel as the scapegoat, being harassed and bullied, constantly pointed at and given an unjustified amount of attention. He called this a “new form of anti-semitism”, something that, he claims, is quite common in the UN.
Lies! Unlike us, They Don’t Fact Check
Meron claims that it has become hip to point a finger at Israel. “It is the trendy thing to do, especially if you’re part of the political left”, he claims. It is appealing for people to amplify messages that are anti-Israeli, even if they are not true, or fact-checked. When asked about how we can affect people’s perception of Israel, his response was that there’s not much we can do. That Israel is cautious and investigates claims, but by the time results and proof come back, nobody really cares anymore.
My take:
I am extremely weary of the language that Meron uses, which is reflective of the general way that Israeli politicians have been framing political reality in the Middle East. Creating an “us vs. them” narrative and never admitting any mistakes, but rather constantly justifying. Calling out anti-semitism whenever there is critique against the State of Israel is absurd and counterproductive. You can only get away with that so many times before that term loses its value. Perhaps this is what one must do strategically when playing political power games, but it is certainly not convincing me to keep supporting the country that I would like to support.
Truth is, it’s driving me away.
Perception vs. Reality
Israeli politicians and diplomats are so focused on the hard facts that they are absent from the public discourse, and thus lose support worldwide. They need to be actively addressing events as they occur, engaging in conversation, and in effect “fighting” to affect people’s perception in real time. Because once an event occurs, and an opinion is engraved in someone’s mind, it is extremely difficult to change.
I’ve watched this happen too many times. Israel makes a military move that incites worldwide critique. In the first couple of days, the Israeli gov’t heavily controls all communications around the hot zone – always failing to completely stop the flow of information. As in the flotilla incident, and during Operation Cast Lead, only a number of formal military channels release information. People are left to assess the military sources against numerous leaks coming from Palestinians or activists under attack. Why would anyone be rooting for Israel in these cases? Instead of utilizing diplomats and representatives to engage in discussions with the public, Israel blocks all channels, and while it supposedly investigates all claims, the major source of real time information is coming from the other side.
You lose the battle over people’s perception.
And you lose the battle.