A few days ago I published an in-depth analysis of Apple’s iTunes top free chart algorithm, covering boosting, rank manipulation and algorithmic glitching, on medium.com.
Here’s the overview:
On October 29th and December 18th, 2014, something very strange happened to the iTunes top apps chart. Like an earthquake shaking up the region, all app positions in the chart were massively rearranged, with some booted off completely. These two extremely volatile days displayed rank changes that are orders of magnitude higher than the norm: lots of apps moving around, lots of uncertainty.
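To make “orders of magnitude higher than the norm” concrete, here is a minimal sketch of one way such chart volatility could be measured, assuming daily rank snapshots per app. The data, the penalty value for booted apps, and the function name are all hypothetical, not the method used in the full analysis:

```python
# Sketch: quantifying daily chart volatility from {app_id: rank} snapshots.
def chart_volatility(ranks_yesterday, ranks_today, booted_penalty=400):
    """Mean absolute rank change across apps; apps that drop off the
    chart entirely are assigned a fixed penalty position."""
    changes = []
    for app, old_rank in ranks_yesterday.items():
        new_rank = ranks_today.get(app, booted_penalty)
        changes.append(abs(new_rank - old_rank))
    return sum(changes) / len(changes)

# A normal day: small moves.  A "glitch" day: massive rearrangement.
normal = chart_volatility({"a": 1, "b": 2, "c": 3}, {"a": 2, "b": 1, "c": 3})
glitch = chart_volatility({"a": 1, "b": 2, "c": 3}, {"a": 120, "c": 1})
assert glitch > normal
```

Plotting a score like this per day would make the two anomalous days stand out as sharp spikes against an otherwise stable baseline.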
If you build apps for iOS devices, you know that the success of your app is contingent on chart placement. If you use apps on iPhones and iPads, you should realize just how difficult it is for app developers to get you to download their app. Apple deploys an algorithm that identifies the Top Apps across various categories within its iTunes app store. This is effectively a black box. We don’t know exactly how it works, yet many have come to the conclusion that the dominant factor affecting chart placement is the number of downloads within a short period of time.
If a bunch of people all of a sudden download your app, you climb up the charts and, as a result, gain significant visibility, which leads to many more downloads. Some estimate that topping the charts may lead to tens of thousands of downloads per day.
Encoded within the iTunes app store algorithm is the power to make or break an app. If you get on its good side, you do really well, and if not, you lose.
If these volatile days are deliberate, shouldn’t we be informed? There are over 9 million registered developers who have shipped 1.2 million apps into iTunes. Algorithmic glitches on Wall Street can set off hundreds of millions of dollars in losses. What’s the dollar cost to entrepreneurs affected by these iTunes glitches? These are people who pour countless hours and resources into adding value to Apple’s ecosystem. Whether running experiments or A/B tests, shouldn’t Apple show due respect by taking issues like this seriously?
While the app store’s ranking algorithm is opaque, there’s much to be learned by looking at its output over time. In his work on Algorithmic Accountability, Nick Diakopoulos highlights ways to investigate the inner workings of algorithmic systems by tracking inputs and outputs.
Analyzing this type of data gives us a way to hold systems of power accountable, in this case, Apple and its algorithm.
Perhaps Apple is not aware of these glitches? Or maybe my data is flawed? I’ll let you be the judge of that. I did manage to find another person complaining about abnormal chart rank fluctuation around the same time. If you’ve witnessed something similar, please add a note or get in touch.
Read the full piece here.
Had the pleasure to be a part of this BBC Click episode. Check it out:
My interview with Elaine Ellis for the Gnip Blog data stories series went live today. If you’re interested in how we’re thinking about using data within the product development cycle at betaworks, definitely give it a read. [The TL;DR version: best data science gig evah! ]
Here’s a recording of my DataGotham’13 talk:
I had the pleasure to meet BBC’s Matt Danzico last week and chat about measuring the effectiveness of online activism. We (unsurprisingly) chatted about Twitter data, which gives us the ability to identify dense clusters of users who are actively participating in the observed event. In the case of the #equality hashtag, heavily used last week during the height of the marriage equality / prop 8 supreme court hearing, a number of distinct clusters of users emerge. For example, the close-knit group of users illustrated by the light green cluster in the bottom left portion of this graph represents a community of users who are all somehow affiliated with Lady Gaga’s Born This Way foundation. Both the official No on 8 campaign and the Human Rights Campaign twitter handles are central to the network of users posting to the hashtag and the campaign at large. Perez Hilton is also clearly a central figure within this community.
I can’t figure out how to embed the video, but feel free to see it here!
Since this Pew report came out, researchers and journalists in my circles have been trying to untangle what it actually means. One of its interpretations is that Twitter is full of haters. Another reads that Twitter is a mainstream liberal but a conservative wonk (Srsly?). The notion that opinions raised on Twitter are biased, since the population of active users on the network is not representative of the general public, makes a lot of sense. In the report, Pew researchers monitored opinions on Twitter across a number of political events during 2012 using Crimson Hexagon’s sentiment analysis service. At the same time, they ran national polls for sentiment around the chosen events. While they conclude that ‘Twitter reaction to events (is) often at odds with overall public opinion’, it seems like what they actually prove is that Crimson Hexagon’s (CH) sentiment analysis method for Twitter doesn’t reflect public opinion… and therefore is meaningless for assessing public sentiment on Twitter.
These two conclusions are very different. The former suggests that Twitter cannot be used at all to assess opinions, while what I’m suggesting is that language-based statistical models from Tweets in aggregate will not provide meaningful results when evaluating general public sentiment around an event. If context around users is not taken into account, specifically their historical propensity to respond to a topic, as well as their positioning within the network, we lose the ability to gain interesting insight from data coming from social networked spaces. Let me explain.
I spent some time looking into CH’s documentation online, and while I feel like I have a much better handle on what they do, I’m still partly guessing, as much of the meat counts as “proprietary algorithms”. From my understanding, CH gets access to the Twitter firehose; then for every project, a series of keywords and phrases contained within a time period are chosen, and all tweets where the keywords appear are extracted as the observed dataset. All non-English tweets are then filtered out (I’m not sure exactly how this is done, and what happens with mixed-language Tweets and/or slang). The chosen period seems to vary, from several hours to multiple days. For the Pew studies over the past year, it seems the chosen time period varies for every observed event.
In the next step, sentiment-related “assertions” are identified across all tweets. This is most likely done using a pre-existing dictionary of words and phrases that are based on a manual classification (labeled dataset) of tweets from the past. Then a random sample of assertions from the observed event is manually labeled. This is used to train a classifier which then runs across the whole dataset of assertions, breaking them down into 4 bins: positive, negative, neutral (informational) and irrelevant.
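As a rough illustration of how dictionary-based binning of this kind works, here is a minimal sketch. The word lists, labels and tie-breaking rules below are invented for illustration and are not CH’s actual proprietary method:

```python
# Sketch of dictionary-based sentiment binning into four bins:
# positive, negative, neutral (informational) and irrelevant.
POSITIVE = {"love", "great", "awesome", "win"}
NEGATIVE = {"hate", "awful", "terrible", "fail"}

def classify(tweet, topic_keywords):
    words = set(tweet.lower().split())
    if not words & topic_keywords:
        return "irrelevant"           # doesn't mention the topic at all
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"                  # informational / no clear sentiment

topic = {"obama", "election"}
assert classify("I love how Obama handled that", topic) == "positive"
assert classify("the election results are in", topic) == "neutral"
assert classify("great weather today", topic) == "irrelevant"
```

Even this toy version hints at the failure modes discussed below: it has no idea what “sick” means to a given community, and sarcasm sails straight through it.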
I’ve seen this type of methodology work well for long-form text, but have yet to see interesting results come out of Twitter data. Some of my main concerns are outlined below:
1. We cannot expect to assign sentiment correctly 100% of the time. Even humans often disagree about the sentiment of text:
- Twitter is spoken, non-homogeneous language that is constantly evolving. It is very hard to train a classifier to accurately represent a single model for “language” on Twitter.
- Cultural differences (“sick” – positive for some, negative for others).
- Sarcasm and innuendo (“old men and women” – Chomsky’s constructional homonymity) – this is a crucial problem, especially around political content.
- How are hashtags dealt with? Many of them, especially around political events, are words never previously seen by a model.
- How does CH know that a person is situated in the US? They filter out English content, but this doesn’t necessarily mean the person is located in the US.
2. There’s no user context around the tweets:
- What if a user tweets multiple times during an event? This should be taken into account, as it highly skews the results.
- What about retweets? They are an interesting signal reflecting user opinion, but there are many reasons why someone might retweet a message, and repetitive retweets from a single person reflecting effectively a single opinion should not be counted multiple times.
- Why do we think that the general sentiment of an event is simply the sum of sentiment in individual tweets? At the end of the day, the data is coming from people. If we don’t understand who these users are, especially how they’re interconnected and what a meaningful sample would be, the sum of all tweets is meaningless.
3. If a service claims a 97% accuracy rate, especially on a problem that’s not considered solvable, you should be highly suspicious. How do they define accuracy here? Based on how their models classify their trained dataset? We have to be very careful here.
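The user-context concern above, counting each person once rather than each tweet, can be sketched as a simple per-user aggregation step. The data and sentiment labels here are hypothetical:

```python
# Sketch: collapsing each user's tweets (and retweets) into a single
# opinion before aggregating, so prolific users don't skew the tally.
from collections import Counter, defaultdict

def aggregate_by_user(tweets):
    """tweets: list of (user, sentiment) pairs.  Each user contributes
    one vote: their most frequent sentiment during the event."""
    per_user = defaultdict(Counter)
    for user, sentiment in tweets:
        per_user[user][sentiment] += 1
    return Counter(c.most_common(1)[0][0] for c in per_user.values())

tweets = [("alice", "negative")] * 50 + [("bob", "positive"), ("carol", "positive")]
# Raw tally looks overwhelmingly negative; per-user it is 2 positive, 1 negative.
assert aggregate_by_user(tweets) == Counter({"positive": 2, "negative": 1})
```

The point of the toy example: one loud user posting fifty times should not outweigh two people who each spoke once.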
The fact that this is all proprietary technology makes it incredibly difficult to critique. I’ve yet to see Twitter sentiment analysis results that are actually meaningful, highlighting interesting, important and timely insight. Topsy claims that with #Twindex they accurately predicted election results for almost all US states. I haven’t played around with CH or Topsy, as they are quite pricey. But I have spent the past couple of years working on insight from Twitter data, and while it is straightforward to identify extremely positive and extremely negative posts, the majority of content tends to land somewhere in the middle. From my experience, using natural language-based models on Twitter data without any context around the users or the observed event will not bring sufficiently valuable insight.
Is it totally useless? I don’t think so. There’s value in understanding the amount of buzz around an event, especially if there’s enough data about what normal behavior looks like. Yet IMHO, some of the most interesting insight can come from taking a networked approach to analyzing user response to an event.
For example, on election day 2012, we saw over 100,000 users who self-reported their vote, literally tweeting out “I voted for …”. Obviously we know that the network is biased and that there are many more young liberals using Twitter (as noted by Pew’s latest study on the demographics of social media users). We can account for this bias, but additionally, we can start to identify communities of users: teenagers in Florida, moms in Ohio, media professionals across the US. Communities are inherent to the organizational scheme of interest-based social networks such as Twitter. By getting more context on how heavily each community is involved in an observed event, and sampling accordingly, we may be able to significantly improve the way we gauge general public opinion through the lens of Twitter.
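One way to sketch this kind of community-aware reweighting, assuming we could estimate each community’s share of the real population and of the Twitter sample. All numbers and names below are made up for illustration:

```python
# Sketch: post-stratified opinion estimate.  Each community's Twitter
# signal is reweighted by how over/under-represented it is.
def weighted_opinion(votes, population_share, sample_share):
    """votes: {community: fraction voting 'A'} observed on Twitter.
    Reweight each community by population_share / sample_share."""
    total = 0.0
    for community, frac_a in votes.items():
        correction = population_share[community] / sample_share[community]
        total += frac_a * sample_share[community] * correction
    return total

votes = {"young_liberals": 0.8, "everyone_else": 0.4}
sample = {"young_liberals": 0.6, "everyone_else": 0.4}   # Twitter skew
population = {"young_liberals": 0.2, "everyone_else": 0.8}
# Raw Twitter average would be 0.8*0.6 + 0.4*0.4 = 0.64;
# the reweighted estimate is 0.8*0.2 + 0.4*0.8 = 0.48.
assert abs(weighted_opinion(votes, population, sample) - 0.48) < 1e-9
```

The hard part, of course, is not the arithmetic but estimating those community shares, which is exactly where the network structure can help.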
Does this approach better align with the formal polls? I have no idea.
But a network-based approach may help give us important context around event polling – a smarter way to sample user data coming from social networks. In any case, we need to continue to experiment in this space and be much more critical of what we’re told by companies who promise algorithmic accuracy.
Some related links:
- A recent example of community cluster analysis looking at a specific event – the Harlem Shake.
- Watch out for Marshall Kirkpatrick’s new startup – Little Bird - an interesting actor in this space.
- Alex Johnson, Journalist at NBC, blogs about his usage of Crimson Hexagon’s sentiment analysis technology.
This entry is cross-posted on Huffington Post.
If you still have not heard of the Harlem Shake, you must be living in a cave. Much has been written about the rapid and global spread of this catchy internet meme, yet little is understood about how it spread. In the following post, we look at the meme’s emergence through the lens of Twitter data. A series of remixed videos, along with a number of key communities around the world, triggered a rapid escalation, giving the meme widespread global visibility. What can we learn from data? Who were the initial communities behind this mega-trend? Who were some of the trend-setters, and what did the Jamaican techno-DJ scene have to do with this?
The Harlem Shake is a dance style born in New York City more than 30 years ago: “During halftime at street ball games held in Rucker Park, a skinny man known in the neighborhood as Al. B. would entertain the crowd with his own brand of moves, a dance that around Harlem became known as ‘The Al. B.'” Though it started in 1981, the Harlem Shake became mainstream in 2001 when G. Dep featured the dance in his music video “Let’s Get It”.
While mining Twitter data, we saw references to the Harlem Shake (the original dance) quite often prior to it becoming a popular meme. For example, users would post Tweets referencing the dance in the following manner:
There are numerous examples of Tweets using the phrase in a similar context (here are a few examples), many of them using the * character as an emphasis. Kimberly Ellis, a Scholar of American and Africana Studies, claims that this type of language is being referenced via cultural memory. And users are very dramatic, hence they place “action items” in tweets:
When someone tweets, “I just passed my final exams! *harlem shakes*,” it’s the equivalent of saying “I just passed my final exams! Look at me dancing!” As you can see, the Harlem Shake of cultural memory is SO energetic, recalling the visual in a tweet makes it all the more hysterical and another shared, cultural moment for African Americans on Twitter.
While Baauer’s now infamous track was released on Diplo’s Mad Decent label back in August 2012 (posted to YouTube on August 23rd 2012), it only accrued minor visibility for the first few months. Then February hit, and something changed.
The timeline below highlights the very first days as the meme was taking off. In blue, we see references to the 1980s dance *harlem shakes*. Note the diurnal pattern, rising and falling steadily on a daily basis. In contrast, the green curve represents Tweets that use the phrase ‘The Harlem Shake’, many of them linking to one of the first three versions of the meme on YouTube.
On February 2nd, The Sunny Coast Skate (TSCS) group established the form of the meme in a YouTube video they uploaded. On the 5th, PHL_On_NAN posted a remix (v2), gaining 300,000 views within 24 hours and prompting further parodies shortly after. On Feb. 7th, YouTuber hiimrawn uploaded a version titled “Harlem Shake v3 (office edition)” featuring the staff of online video production company Maker Studios. The video became a hit, amassing more than 7.4 million views over the following week and inspiring a number of contributions from well-known Internet companies, including BuzzFeed, CollegeHumor, Vimeo and Facebook.
In a video interview, Vernon Shaw, Channel Development Coordinator at Maker Studios (which produced v3), claims that he spotted the first two versions on Reddit. It was evident that a form was emerging, and after v2 accrued 100k views, it was clear to him that this was the “pre-viral” stage. Vernon credits Reddit with being the first to highlight the remixes, claiming that “you can tell when a trend is about to start by catching it on Reddit first… a day or two ahead of Facebook”.
Here’s a graph that shows retweets during the first week, as the meme was being established. We can identify dominant profiles who helped make the videos visible on Twitter, key information brokers. Each node represents a Twitter user, and the larger a node, the more Retweets that user generated when posting to the meme. The lighter-colored nodes participated earlier, hence we see @baauer, @diplo and @maddecent very early on, posting to Twitter and accruing Retweets. In the bottom right region, we identify influential YouTubers who were key to passing on the meme, such as @kingsleyyy, @KSIOlajidebt, @ConnorFranta, and @Jenna_Marbles. Note the general size of these profiles versus @StephenAtHome (Colbert) or even @YouTube. These influential YouTubers clearly played a far more prominent role in generating buzz across Twitter than significantly larger accounts such as Stephen Colbert’s or YouTube itself.
Next, instead of mapping out Retweets, we look at the social connections amongst users who were posting to the meme. This gives us the ability to identify the underlying communities engaging with the meme at a very early stage. In the following graph each node represents a user that was actively posting and referencing the Harlem Shake meme on Feb 7th or 8th to Twitter. Connections between users reflect follow/friendship relationships. The graph is organized using a force directed algorithm, and colored based on modularity, highlighting dominant clusters – regions in the graph which are much more interconnected. These clusters represent groups of users who tend to have some attribute in common.
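As a rough illustration of what “colored based on modularity” means, here is a minimal sketch that scores a hand-labeled partition of a toy follow graph with Newman’s modularity. The graph and partition are hypothetical, and the actual analysis used a graph-layout tool rather than code like this:

```python
# Sketch: Newman modularity of a partition.  High scores mean edges
# fall mostly inside communities rather than between them.
def modularity(edges, community):
    m = len(edges)
    degree, inside, total = {}, {}, {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
        if community[a] == community[b]:
            inside[community[a]] = inside.get(community[a], 0) + 1
    for node, d in degree.items():
        total[community[node]] = total.get(community[node], 0) + d
    return sum(inside.get(c, 0) / m - (total[c] / (2 * m)) ** 2
               for c in total)

# Two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"),
         ("c", "x")]
two = {"a": 1, "b": 1, "c": 1, "x": 2, "y": 2, "z": 2}
one = {n: 1 for n in "abcxyz"}
assert modularity(edges, two) > modularity(edges, one)
```

Splitting the toy graph along its bridge scores well above lumping everyone together, which is exactly why modularity-based coloring makes the dense clusters in the real graph pop out visually.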
One of the densest clusters includes @baauer, @diplo, @maddecent and other DJs and musicians. They are clearly a core community that posted the meme early on, as we already identified in the previous Retweet graph. In red and green (top right) we see regions of the graph highlighting various YouTube communities. These are users whose dominant web identity is their YouTube page. While many of them have Twitter handles, they all link to their YouTube page as a primary identity, and many describe themselves as ‘YouTubers’. We see a dense Brazilian user community (right), Jamaican rappers (top center-left), Cape Town users (bottom) and users from Paris, France (bottom center-left). In the center are accounts such as BroBible and theBERRY/theCHIVE, which were among the first new-media outlets to identify the meme as interesting.
The purple region in the graph (left side) represents African American Twitter users who are referencing Harlem Shake in its original context. There’s very little density there as it is not really a tight-knit community, but rather a segment of users who are culturally aligned, and are clearly much more interconnected amongst themselves than with other groups.
If we run a similar analysis on the following two days (Feb 9th and 10th) we see different communities emerge, and a much more tightly knit graph structure:
While the same dense cluster of musicians and DJs (in turquoise) still exists, there are substantially more self-identified YouTubers both across the US and the UK. At the same time there’s a significant gamer / machinima cluster that’s also participating, as well as a growing Jamaican contingent, and quite a few Dutch profiles (purple – left). Additionally, we see various celebrity and media accounts who caught on to the meme – @jimmyfallon, @mashable and @huffingtonpost.
By capturing the two snapshots, we can also make sense of the evolution of the meme as it becomes more and more visible. At first, loosely connected communities are separately humored by the videos. Within days, we see major media outlets jump on board, and a much more intertwined landscape. We see different regions of the world light up, and identify communities of important YouTube enthusiasts who effectively get this content to spread.
In this case we see a clear network of influential YouTubers across the US and the UK combined with a dense cluster of musicians and DJs who helped make this meme incredibly visible. We also see how it very quickly spread around the world, with dense contingents in Jamaica, South Africa, Brazil, France and the Netherlands. By comparing two snapshots in time, we literally see the difference between an emergent trend amongst loosely connected interest-based communities and a denser, more connected cluster where digital-media outlets provide significant amplification.
Memes have become a sort of distributed mass spectacle. Culture is being created, remixed and reinforced within social networks, and memes are becoming a mechanism that both captures people’s attention and defines what is “cool” or “trendy”. We see more and more companies and brands try to associate themselves with certain memes as a way to maintain a connection with their audience and gain the cool factor. Pepsi did this with the Harlem Shake and saw an incredibly positive response. As we get better at identifying these trends and trend-setting communities early on, the pressure to participate will rise.
As social networks become globally-intertwined, we’re witnessing a growing number of memes conquer the world at large. These moments are critical points in time, where there are significant levels of attention given towards a specific entity – be it a joke, funny video or a political topic. Piecing together data from social networks can help us identify critical points in time, as well as the underlying communities and trendsetters for the humor-based memes, or the agenda setters for politically-slanted ones.
The analysis is based on 1.9 million Tweets collected between February 1st and 16th, all referencing variations of the phrase ‘harlem shake’.
I had the honor to participate in Harvard Law School’s behavioral economics and social media conference, organized by Cass Sunstein last week. Scholars from across Harvard along with folks from Facebook, Twitter, Microsoft Research and SocialFlow discussed important trends around social media, theory and practice and its potential to help us assess behavioral change. As part of the ‘theory and practice’ session led by Yochai Benkler, I presented alongside Facebook’s Eytan Bakshy and Sharad Goel of MSR.
Nate Matias, research assistant at the MIT Media Lab’s Center for Civic Media put together a comprehensive writeup of the session. Sean Laurence of Boston Startup School put together full audio of the event here. Following is a crib of my presentation on the promise of realtime data from social networks.
I just moved into a larger apartment in New York City and finally have enough space for a piano. So I did what many do, and started obsessively researching the web for used upright pianos. From Craigslist to Google to rental stores, the task is actually quite difficult given the variety in types, sizes and prices.
It didn’t take long before piano ads started following me around the internet. As I consumed the news, I saw ads for Yamaha pianos. When I went to YouTube, ads for Steinway. Even when reading my daily Mashable quota… more piano ads. Following me around as I browsed the web, regardless of what I was doing, making me feel terrible for being that indecisive procrastinator who can’t seem to make up his mind.
My anger at the ads quickly turned into pity. Faust Harrison Pianos were clearly users of the latest in digital marketing strategy wonders, buying against user behavior stored in cookies within people’s browsers. I must’ve clicked on their website at some point, and since then, my browser has a cookie that signals my interest in acquiring a piano. True, the intent is there. But believe it or not, it is not the only thing I think about throughout the day. The last thing I’d want is to be reminded every minute of every day that I still have to make this decision.
As ads attempt to become more “relevant”, either by matching our browsing history or friend associations, they do more harm than good if they don’t understand the user’s context and, more importantly, what someone is willing to be attentive to. Intent used to be the biggest buzzword in search engine conversations. Back in the day, the thought was that if we could identify someone’s intent, we could present them with relevant information. They got that right with my search. But where these ads completely failed was in understanding my context, as well as my personal psychology around purchasing. It’s been over 10 years since Google innovated and changed the world of advertising. Is cookie-based ad targeting *really* the best we can do?
Enter Social Media.
So much has been written, discussed and examined about the shift we’re seeing with the popularity of social networked spaces. The networked nature of these spaces means that our old ways of dealing with audiences have got to change. Power has to be renegotiated, and in many cases doesn’t come top-down, but rather from loosely connected points across the network.
In order for information to spread, people along the way must be attentive and choose to pass the tweet or status update onwards. As the threshold to publishing content nears zero, attention has become a scarce commodity. One cannot demand or even expect someone’s attention at any given point in time. As James Gleick puts it in his seminal book, The Information, “When information is cheap, attention becomes expensive”.
The following plot does a great job at expressing how attention quickly shifts within social networked spaces. The green line represents the number of tweets over time that had the word ‘Superbowl’ in them while the blue, the word ‘power’. This is measured over time across all publicly posted Tweets between February 3rd and 4th. Note the clear switch that happens when the Superdome goes dark. Attention shifts from the game which abruptly stops, to focus on the fact that half of the stadium loses power.
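The underlying computation for a plot like this is simple; here is a sketch that bins tweets by minute for each phrase. The tweets and timestamps below are invented for illustration:

```python
# Sketch: per-minute tweet counts for a phrase, the raw series behind
# an attention-shift plot.
from collections import Counter

def phrase_timeline(tweets, phrase):
    """tweets: list of (minute, text) pairs.  Returns per-minute counts
    of tweets whose text contains the phrase (case-insensitive)."""
    return Counter(minute for minute, text in tweets
                   if phrase in text.lower())

tweets = [
    (0, "Superbowl kickoff!"),
    (1, "what a Superbowl play"),
    (2, "did the power just go out?!"),
    (2, "no power in the Superdome"),
]
assert phrase_timeline(tweets, "superbowl") == Counter({0: 1, 1: 1})
assert phrase_timeline(tweets, "power") == Counter({2: 2})
```

Overlaying the two resulting series is exactly what produces the crossover visible in the plot when the Superdome goes dark.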
What evolved online is the poster-child example of how realtime information can be used to inform marketing campaigns. It took minutes for Oreo to come up with an innovative advertisement in response to the blackout, which got them a significant level of visibility (16k retweets and 6k favorites so far on Twitter alone). Twitter reported that it took just 4 minutes for someone to buy promoted tweets against searches for the phrase “power outage”. Other brands quickly responded as well, catering to the millions of sports fans who were following the chain of events happening in the stadium. Being flexible and shifting the frame to what people were attentive to, the power outage, clearly paid off.
We see these kinds of attention shifts happening all the time, whether affecting a wider region of the network, or a localized audience.
Using information from social networks can help us understand the context switches happening amongst audiences, and generally within populations, in realtime and over time: what people are attentive to and how that changes. In a study, Suman Deb Roy, our summer intern from last year, defined and measured what he called audience volatility: the frequency of change in topics at the focus of an observed group of users. The higher the volatility of an audience, the less focused it is, as there’s a wide array of topics at play. The lower the volatility score, the more focused an audience.
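One plausible way to operationalize such a volatility score, though the study’s exact definition may differ, is the average turnover between consecutive trending-topic snapshots:

```python
# Sketch: volatility as mean fraction of trending topics replaced
# between consecutive snapshots.  Low score = sustained focus.
def volatility(snapshots):
    """snapshots: list of sets of trending topics, in time order."""
    turnovers = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        turnovers.append(len(cur - prev) / len(cur))
    return sum(turnovers) / len(turnovers)

focused = [{"#Kony2012", "#news"}, {"#Kony2012", "#news"}, {"#Kony2012", "#sports"}]
volatile = [{"#a", "#b"}, {"#c", "#d"}, {"#e", "#f"}]
assert volatility(focused) < volatility(volatile)
```

An audience gripped by a single event keeps the same topics trending hour after hour and scores low; a distracted audience churns through topics and scores high.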
For example, when we measured the volatility of Twitter’s trending topics across different cities, we could see clear peaks and troughs. Remember, the higher the graph, the more volatile the trends within that city. What’s fascinating about this plot is the lowest point, marked with an arrow. This happens around the second week of March, 2012, and represents a point of heightened focus across all major cities in the United States.
The lowest point on this graph is the day that Invisible Children launched their #Kony2012 campaign. This is the point of lowest volatility / maximum focus, showing just how good that campaign was at capturing people’s attention in all major cities across the United States.
Why is all this important?
This is the first time that we can clearly identify spikes in user attention, what groups of people are focused on, in realtime, and over time. We don’t have to wait for market research and poll results, but rather we can plug into this information. Additionally, we have a way to quantify these shifts, seeing just how much effect real-world events have on groups of people online, how much focus they choose to devote to said event.
As we get better at understanding user interaction within social networks, we’ll get a more holistic view of what’s going on. While there is still benefit in planning campaigns and taking the time to think through their design, social networked spaces bring with them the hope for a more nuanced understanding of user behavior, intent as well as context.
Maybe soon advertisements will stop following us around the web, and pop up in the right context, at the right time.
Am I too optimistic? Maybe. But I still want to get that piano!
Questions, thoughts? Feel free to ask me on Twitter: @gilgul