The thoughts of a web 2.0 research fellow on all things in the technological sphere that capture his interest.

Sunday 17 February 2008

What's Everyone Twittering About?

Whilst I am not personally a big Twitter fan, I am interested in discovering what people are Twittering about and how the posts differ from other forms of communication. With such thoughts in mind I started my first tentative Twitter steps this evening.

Adapting an open source RSS feed reader I set about downloading the public timeline (http://twitter.com/statuses/public_timeline.rss), for which Twitter has no restrictions on the number of requests that you can send. Whilst the original plan was to download an hour's worth of data for a small pilot investigation, unfortunately I had to stop after about 45 minutes when I received an HTTP 502 status code ('Twitter is down or being upgraded' rather than 'exceeded the rate limit').
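For anyone wanting to try something similar, the polling step can be sketched roughly as follows. This is not the adapted feed reader mentioned above, just a minimal standard-library illustration; the function names, the 30-second pause and the poll limit are all assumptions, and the feed URL is simply the one quoted in this post, which may not resolve.

```python
import time
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

# URL as quoted in the post; it may no longer be available.
FEED_URL = "http://twitter.com/statuses/public_timeline.rss"


def parse_items(rss_data):
    """Return (guid, title) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_data)
    items = []
    for item in root.iter("item"):
        # Fall back to <link> when a feed omits <guid>.
        guid = item.findtext("guid") or item.findtext("link") or ""
        title = item.findtext("title") or ""
        items.append((guid, title))
    return items


def poll(url, seen, pause_seconds=30, max_polls=10):
    """Fetch the feed repeatedly, storing previously unseen items in `seen`."""
    for _ in range(max_polls):
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                data = response.read()
        except urllib.error.HTTPError as err:
            if err.code == 502:
                # Server down or being upgraded: stop rather than hammer it.
                break
            raise
        for guid, title in parse_items(data):
            seen.setdefault(guid, title)
        time.sleep(pause_seconds)
    return seen
```

Calling poll(FEED_URL, {}) would build up a dictionary of posts keyed by guid, so that items repeated between consecutive fetches are only stored once.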

The first post that was downloaded was numbered 723435732 (just after 7pm), whilst the last was numbered 723547592 (about 45 mins later). As the last digit of each ID seems to be superfluous, there were a potential 11,186 posts to be downloaded, of which 6,422 posts were successfully downloaded. Many of the 'missing posts' will have been private, whilst others may have been missed due to delays in sending and receiving the RSS feed.
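The arithmetic behind those figures can be checked with a few lines of Python. The only assumption is the one made above, that the final digit of each ID carries no information and can be dropped; the variable names are mine.

```python
first_id = 723435732   # first post downloaded
last_id = 723547592    # last post downloaded
downloaded = 6422      # posts actually collected

# Drop the final digit of each ID (assumed superfluous) before subtracting.
potential = last_id // 10 - first_id // 10
coverage = downloaded / potential

print(potential)                 # number of IDs in the window
print(f"{coverage:.1%}")         # share of the window that was collected
```

On these numbers a little over half of the potential posts in the window were collected.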

I have not, as yet, had time to do anything more interesting with the collected data than look at the frequency of terms using Text-Stat. So in true informetric style, here is the log-log graph of word frequency in rank-order:
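For readers without Text-Stat, a rank-frequency table of this kind can be approximated as below. This is a sketch of the general technique rather than what Text-Stat does internally; the tokenising rule (lower-cased runs of letters, digits and apostrophes) and the function names are assumptions.

```python
import math
import re
from collections import Counter


def rank_frequency(posts):
    """Return (rank, word, frequency) tuples, most frequent word first."""
    counts = Counter()
    for post in posts:
        # Crude tokeniser: lower-case runs of letters, digits and apostrophes.
        counts.update(re.findall(r"[a-z0-9']+", post.lower()))
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


def log_log_points(ranked):
    """Convert ranked frequencies into (log10 rank, log10 frequency) points."""
    return [(math.log10(rank), math.log10(freq)) for rank, _, freq in ranked]
```

Plotting the output of log_log_points with any charting tool gives the familiar log-log rank-frequency curve.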

Most noticeable in the frequency data is:
-Over 58% of Twitter links are via tinyurl: 'http' appeared 588 times, 'tinyurl' 343 times.
-Twitterers are generally a polite bunch. The more 'popular' swear-words don't appear that often, in 6,422 posts: shit (11), fuck (6), & cunt (zero). Admittedly a large proportion are not in English and there are a few variations on the words, but nonetheless I probably swear more than all these people in my average email.
-And they are not celeb-obsessed: Britney only gets three mentions, whilst there is no mention of Winehouse. Instead they err on the side of the geek: Windows (19), Mac (25), iPhone (20).
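The counts in the list above came from Text-Stat, but the same sort of tally, and the tinyurl percentage, could be reproduced along these lines. The helper name and the whole-word, case-insensitive matching rule are assumptions, and note that the percentage treats every 'http' token as one link.

```python
import re


def count_term(posts, term):
    """Count case-insensitive whole-word occurrences of `term` across posts."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(post)) for post in posts)


# Counts reported in this post.
http_count = 588
tinyurl_count = 343

tinyurl_share = tinyurl_count / http_count
print(f"{tinyurl_share:.1%}")    # share of links that went via tinyurl
```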

As the analysis shows, these are early (childish) days. But hopefully I will have the opportunity, later in the week, to create the tools to investigate the data more thoroughly before downloading a larger sample.


posted by David
