Hi I’m Gilad

I love data, analysis and visualization. Chief data scientist at betaworks.

#Sidibouzid Twitter Hashtag: an analysis of the people spreading the news

There have been numerous articles and discussions on the role Twitter played during the recent Tunisia uprising. An excellent TechCrunch post by Alexia Tsotsis analyzed Twitter traffic over time (using data provided by BackType). According to their report, Tunisia-related Twitter traffic peaked at 28 tweets per second, at 21:27:56 Tunisian time, a couple of hours after the first reports that the Tunisian president had left the country. At the end of the cycle, total tweets mentioning Tunisia were over 196K. Total tweets mentioning #sidibouzid (the province where the protests started) were over 103K.

While this is a great analysis of the content itself, I found little to no analysis of the participants on Twitter. Who are the people who chose to pass on and amplify messages? How did the information spread? Who were the pivotal points that enabled this? By answering some of these questions, can we reach an understanding of the role that Twitter plays in diffusing information to public attention around the world?

Participating Users

My dataset includes 170,000 Tweets, all containing the term ‘#sidibouzid’, posted between Jan. 12th and 19th by some 40,000 different Twitter users. This is not the complete dataset, but what I could grab using the public Twitter APIs. The chart below maps out the distribution of Twitter users who joined the conversation by posting a message with the ‘#sidibouzid’ hashtag. We see a huge spike between Jan. 13th and 14th, reaching almost 12,000 new users at its peak. This is not surprising, given all the other analyses pointing to a huge spike in “attention” that the story received on Jan. 14th, when Ben Ali fled Tunisia.

first-time-users
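A minimal sketch of how per-day counts of first-time users could be derived from a list of tweets. The record shape (a screen name paired with a timestamp), the function name, and the sample data are all assumptions for illustration, not the code used to produce the chart:

```python
from collections import Counter
from datetime import datetime


def first_time_users_per_day(tweets):
    """Count, per calendar day, the users whose first hashtag tweet fell on that day.

    `tweets` is an iterable of (screen_name, posted_at) pairs, where
    posted_at is a datetime. This record shape is an assumption.
    """
    first_seen = {}
    for user, posted_at in tweets:
        # Keep only the earliest timestamp seen for each user.
        if user not in first_seen or posted_at < first_seen[user]:
            first_seen[user] = posted_at
    # Bucket each user's first appearance by calendar day.
    return Counter(ts.date() for ts in first_seen.values())


# Made-up sample records, for illustration only.
sample = [
    ("user_a", datetime(2011, 1, 12, 9, 0)),
    ("user_b", datetime(2011, 1, 14, 18, 30)),
    ("user_a", datetime(2011, 1, 14, 20, 0)),
    ("user_c", datetime(2011, 1, 14, 21, 27)),
]
new_users = first_time_users_per_day(sample)
for day in sorted(new_users):
    print(day, new_users[day])
```

Each user is counted once, on the day of their earliest tweet in the dataset, so repeat posters do not inflate later days.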

Participation amongst users (i.e., the number of times each user posted a message with the ‘#sidibouzid’ hashtag) follows a power-law distribution:
participation
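As a rough sketch, the participation counts can be tallied with two passes of a counter: first tweets per user, then how many users share each tweet count. The input shape (one screen name per tweet) and the sample names are assumptions. Plotted on log-log axes, a power-law distribution shows up as a roughly straight line:

```python
from collections import Counter


def participation_distribution(authors):
    """Tally tweets per user, then users per tweet count.

    `authors` is an iterable with one screen name per tweet
    (an assumed input shape).
    """
    tweets_per_user = Counter(authors)
    # users_by_volume[n] = number of users who posted exactly n tweets.
    users_by_volume = Counter(tweets_per_user.values())
    return tweets_per_user, users_by_volume


# Made-up sample, for illustration only.
sample_authors = ["a", "a", "a", "b", "c", "c", "d"]
tweets_per_user, users_by_volume = participation_distribution(sample_authors)

# The heaviest posters come straight out of the first counter.
top_participants = tweets_per_user.most_common(2)
print(top_participants)
print(sorted(users_by_volume.items()))
```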

Top 10 participants of the Hashtag (in terms of volume posted) are:

    Dima_Khatib (883) – Arab Journalist, Al Jazeera’s Latin America Correspondent
    ibnkafka (641) – Moroccan lawyer and Twitter enthusiast

Some of these accounts are broadcasting into the ether, like our top participant, griffinworks_3. This profile was created only on January 12th, 2011, has since posted around 4,000 Tweets, and has acquired only some 100 followers. From my dataset, it looks like this profile got around 20 ReTweets between Jan. 15th and 18th. Not much activation, nor audience. The profile also doesn’t follow anyone else. Possibly a bot that auto-forwards content.

On the other hand, if we look at Dima_Khatib, an Arab journalist with Al Jazeera, we see an extremely active profile (over 9,000 posts) that is quite new to Twitter (created in mid-October 2010), but with a large following of almost 5,000 and a high rate of mentions/RTs (over 5,000 times).

User Bios

Using Wordle to visualize the users’ profile information (the “write something about yourself” field), it is quite clear that as the events unfold and spread out to the world, there is a drastic shift in the kinds of people who are joining the hashtag. The dominant words representing the initial Twitter participants are ‘Tunisian’, ‘journalist’, ‘politics’, ‘activist’, and a variety of French stop words:
wordle0

Once the topic started trending, the people joining the hashtag are represented by the following words: ‘news’, ‘twitter’, ‘music’, ‘marketing’, ‘media’, ‘student’…
wordle2
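A word cloud of this kind is driven by word frequencies, so the same comparison can be sketched by counting words in the bios of early and late joiners separately. The stop-word list, the function name, and the sample bios below are made up for illustration; this is not how Wordle itself is implemented:

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real one would be much longer.
STOP_WORDS = {"a", "and", "the", "of", "in", "i", "de", "la", "le", "et"}


def bio_word_counts(bios, stop_words=STOP_WORDS):
    """Count how often each word appears across a collection of profile bios."""
    counts = Counter()
    for bio in bios:
        # Runs of letters only (Unicode-aware), lowercased.
        words = re.findall(r"[^\W\d_]+", bio.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


# Made-up bios, for illustration only.
early_bios = ["Tunisian journalist and activist", "Journalist, politics"]
late_bios = ["Music and marketing student", "News, media, music"]

early_counts = bio_word_counts(early_bios)
late_counts = bio_word_counts(late_bios)
print(early_counts.most_common(3))
print(late_counts.most_common(3))
```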

Geographic Distribution

What can we learn about the spread of this topic by looking at people’s geographic location? If we had a precise indication of every profile’s exact location, this would be fascinating. My assumption is that we would see small discussions happening around the Middle East, France and Morocco in the days before the uprising. Relatives and Tunisian expats from neighboring countries would be Tweeting about the events well before they reach world headlines. Could we actually see how the conversation moves from being regional/local into global? And if so, what does that movement look like?

There are three profile attributes that can give us clues about someone’s location: 1) the user-entered ‘location’ field, 2) the user-entered ‘time-zone’ field, and 3) geo-location. When a user creates a Twitter account, the time zone may be automatically set to the current location (depending on browser and connection); otherwise it receives the default value of ‘Quito’. The location field, by contrast, has no default. Tunisia and Paris share the same time zone (CET), so if someone in Tunisia creates a new profile, their time zone may automatically be set to ‘Paris’. This makes it extremely tricky to draw solid conclusions from the time-zone field.

Since only 15% of users enabled geo-location, I chose the location field as the best indicator. Since it has to be entered manually, it may not be the most up-to-date location, especially if the user travels, but it at least indicates a solid connection between the user and a country. For this analysis I chose to look at all profiles that stated their location.
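Because the location field is free text, turning it into a country takes some normalization. The sketch below is one simple way to do that, assuming each profile is a dictionary with an optional 'location' key and using a small, hypothetical alias table. Profiles with a blank or unrecognized location are skipped, and the time-zone field is ignored entirely:

```python
from collections import Counter

# Hypothetical alias table; a real one would need to be far larger.
COUNTRY_ALIASES = {
    "tunis": "Tunisia", "tunisia": "Tunisia", "tunisie": "Tunisia",
    "paris": "France", "france": "France",
    "cairo": "Egypt", "egypt": "Egypt",
}


def country_from_location(location):
    """Return a country for a free-text location, or None if blank or unknown."""
    if not location:
        return None
    for token in location.lower().replace(",", " ").split():
        if token in COUNTRY_ALIASES:
            return COUNTRY_ALIASES[token]
    return None


def countries_by_count(profiles):
    """Count profiles per country, using only the stated location field."""
    countries = (country_from_location(p.get("location")) for p in profiles)
    return Counter(c for c in countries if c is not None)


# Made-up profiles, for illustration only.
sample_profiles = [
    {"location": "Tunis, Tunisia"},
    {"location": "Paris"},
    {"location": ""},
    {"location": "Cairo, Egypt"},
    {"time_zone": "Quito"},  # no stated location, so it is skipped
]
by_country = countries_by_count(sample_profiles)
print(by_country)
```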

It’s interesting to see how comparatively strong a role Egypt and France play initially:

And then how folks from Saudi Arabia, Indonesia, the US and the UK get heavily involved:

Social Graph and Connectedness

Knowing how an individual is embedded in the structure of groups within a network may be critical to understanding his/her behavior. For example, some people may act as “bridges” between groups (connectors or “brokers” of information). Others may have all of their relationships within a single group (locals or insiders). Some may be part of a tightly connected and closed elite, while others are completely isolated from this group. Such differences in the ways that individuals are embedded in the structure of groups within a network can have profound consequences for the ways these “nodes” receive information or reach an opinion.

This is probably the most interesting part of the analysis, but also the most complex. I used the Twitter API to mine the publicly available relationships between all hashtag participants. There are two important measures that I used to make sense of all this data:

    In Degree: how many users who participated in the hashtag are following this person. Effectively, how popular/reputable this person is within the group of all those participating.
    Clustering Coefficient: measures how tightly inter-connected this person’s “neighborhood” is. If all your followers and friends are friends with each other, your CC will equal one.
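Both measures can be sketched on a small follow graph. The code below assumes the graph is stored as a dictionary mapping each participant to the set of participants they follow, and it treats a follow in either direction as a connection when computing the clustering coefficient. Both choices are assumptions for illustration; other directed variants of the coefficient exist:

```python
from itertools import combinations


def in_degree(follows, user):
    """Number of hashtag participants who follow `user`."""
    return sum(
        1 for other, followed in follows.items()
        if other != user and user in followed
    )


def clustering_coefficient(follows, user):
    """Fraction of pairs in the user's neighborhood that are themselves connected.

    The neighborhood is everyone the user follows or is followed by.
    A follow in either direction counts as a connection (an assumption).
    """
    def connected(a, b):
        return b in follows.get(a, set()) or a in follows.get(b, set())

    everyone = set(follows) | {u for fs in follows.values() for u in fs}
    neighbors = {u for u in everyone if u != user and connected(user, u)}
    k = len(neighbors)
    if k < 2:
        return 0.0  # no pairs to compare
    linked = sum(1 for a, b in combinations(neighbors, 2) if connected(a, b))
    return linked / (k * (k - 1) / 2)


# Made-up follow graph, for illustration only.
follows = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": set(),
    "d": {"a"},
}
a_in_degree = in_degree(follows, "a")
a_cc = clustering_coefficient(follows, "a")
print(a_in_degree, a_cc)
```

In this toy graph, "a" has three neighbors (b, c, d), and only one of the three possible pairs among them is connected, so the coefficient is one third. A user whose neighbors are all connected to each other scores one.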

I chose three different participants so that I could map out their networks and see what we can identify.

ifikra
The graph below represents Sami Ben Gharbia’s network. Sami showed up as one of the most prominent Twitter users on January 13th. He was one of the most central nodes within the group of people who were passionately posting the ‘#sidibouzid’ hashtag prior to the peak of events. Sami shares a large chunk of his audience with two key users: an Egyptian journalist (mfatta7) and a Channel 4 News foreign affairs correspondent (jrug). This is a mapping of only his first-degree followers and friends:
ifikra

Dima_Khatib
The following graph represents Twitter user Dima_Khatib’s network. Dima_Khatib was one of the most active participants, posting over 800 messages to the hashtag. Dima is a journalist at Al Jazeera and, as I mentioned previously, is quite new to Twitter (began tweeting in October ’10). Dima shares part of her audience with a fellow Al Jazeera journalist (Mskayyali):
Dima_Khatib

SBZ_news
SBZ_news is a profile that functions as a typical broadcast media outlet, with a very high in-count, yet a very low out-count (it has many followers, and follows almost none). What’s interesting here is that its community of followers includes a number of key players, who themselves have a fairly large audience. This seems to have been an important source of information from the ground in Tunisia.
SBZ_news

What Next?

This post merely scratches the surface. There’s still so much that can be understood by slicing and dicing this data. As we start to grasp the power of Twitter as a worldwide information diffusion network, we must build tools that help analyze the structures that enable information to flow.
