Webometric Thoughts

Sunday 4 April 2010

The 4am Project

Social media is great for bringing together a diverse set of people with similar interests for a particular project. An excellent example has been Karen Strunk's extremely successful 4am Project:

The aim of the 4amproject is to gather a collection of photos from around the world at the magical time of 4am. Everyone can take part and join in! All you need is a camera. We want to see what you see at that moment in time on that one day. What’s your view at 4am?

Obviously, as a man who needs at least eight hours sleep a night, my view is that I should have been fast asleep dreaming of unicorns or some such tosh. However, my girlfriend had other ideas. Although I contemplated sending her out into the night with hundreds of pounds' worth of photographic equipment to confront the last drunken stragglers staggering home from the pubs and clubs of Wolverhampton, I knew everyone would blame me if she ended up mugged or dead in a ditch (however misplaced such blame would be).

It's been about 15 years since I was serious about photography: with multiple lenses, filters, films, and access to a dark-room. 4am didn't strike me as the best time to start again, so I went along purely in the role of observer - with the exception of 'twitpicing' a single photo from the worst camera-phone in the world at 4am:

The world is very different at 4am, and all in all it was a pleasant stroll around Wolverhampton's West-End:

Without a doubt, the most interesting - and least tiring - part of the day has been watching some of the other pictures, posts, and films that have been put online throughout the day.

-Lee Allen's video of other 4am participants in Wolves
-4am Project Flickr Group
...and of course...
-My girlfriend's view of the world at 4am

Labels: 4amproject

posted by David at 19:16 | 1 Comments

Thursday 18 March 2010

Welcome StumbleUpon - and other members of my recent spike

Unsurprisingly, my Webometric Thoughts aren't massively popular. Few people start the day by checking the BBC, the Guardian, and then Webometric Thoughts. However, over the last few days my traffic has gone through the relative roof: from a steady 100 unique visitors a day, it leapt to 602 on Tuesday!

Way beyond the previous high of 262. The reason: For a brief moment I was the TechCrunch pin-up boy, thanks to my (now-very-old) QR code T-shirt - nb. it goes without saying that this rather large company that clears $200,000 a month (according to Wikipedia) didn't bother asking my penniless permission.

What's particularly interesting is that hardly any of the traffic has come directly from TechCrunch: only 112 of the visits over the last three days. Instead, most of the traffic has been a massive surge of visits to my home page from StumbleUpon. I'm not sure why, but nonetheless - Hello StumbleUpon users *waves*

Labels: Stumbleupon, techcrunch

posted by David at 10:57 | 1 Comments

Tuesday 9 March 2010

How bad is Chatroulette?

Everywhere I turn at the moment there seems to be a story about Chatroulette.com. Press a button and you are in a random video chat with a stranger somewhere else in the world. Unsurprisingly it is painted as the latest sign of the world going to hell in a handcart: "Who will protect the children?"

As a particularly unsocial social media researcher, I decided to do a quick quantitative study of first impressions of the people I came across on the site: clothed or naked/obscene, male or female. As I didn't particularly want to engage with anyone, but needed to turn the webcam on to encourage the broadest cross-section, I set it up for Mr Shifter:

Results
Out of 100 webcams in which the subject was identifiable:
-79% were men:
  -5 contained more than one man.
  -11 were obscene.
-10% were women:
  -2 contained more than one woman.
  -1 was obscene.
-2% were mixed-sex groups.
-9% were objects, mostly signs saying "show me you boobs".
In addition, I came across one camera supposedly showing a man who had just hanged himself... I wasn't too sure where to place that one.
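
The counts above can be turned into rates with a few lines of arithmetic. This is a minimal sketch using only the figures reported in the results; obscene counts were only given for the male and female webcams, so the mixed groups and objects are left out of the obscenity rates rather than guessed at.

```python
# Category counts from the results above (100 identifiable webcams).
CATEGORIES = {"men": 79, "women": 10, "mixed groups": 2, "objects": 9}
# Obscene counts were only reported for the male and female webcams.
OBSCENE = {"men": 11, "women": 1}


def summarise(categories, obscene):
    """Return the total, the overall obscene share (%) and the obscene rate (%) within each sex."""
    total = sum(categories.values())
    overall = 100 * sum(obscene.values()) / total
    within = {k: round(100 * v / categories[k], 1) for k, v in obscene.items()}
    return total, overall, within


total, overall, within = summarise(CATEGORIES, OBSCENE)
print(total, overall, within)  # prints: 100 12.0 {'men': 13.9, 'women': 10.0}
```

Roughly one identifiable webcam in eight was obscene, which fits the conclusion that most people were just looking to talk.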

So what did I find out? The world is mostly just looking to talk, there are some weirdos out there, and there was one bloke who wanted to see the monkey dance... and was thrilled when he obliged.

Labels: Chatroulette, Mr Shifter

posted by David at 22:15 | 0 Comments

Academic Search Engine Optimization: An inevitable evil?

The money available for public science is finite, and it is understandable that governments want to get value for public money spent, and to show that value in the form of bibliometric and webometric indicators. Unfortunately, scientists are far from perfect, and the indicators and metrics that are meant to reflect the merits of an academic's work can quickly become the focus of that work.

I've just finished reading Academic Search Engine Optimization (ASEO): Optimizing Scholarly Literature for Google Scholar & Co. (via @research_inform), which gives advice on making sure your journal articles are indexed and highly ranked by academic search engines (e.g., Google Scholar). There are numerous points I disagree with on both an ethical and a practical level:

"...tools that help in selecting the right keywords, Google Trends, Google Insights, Google Adwords"
"Synonyms of important keywords should also be mentioned a few times in the body of your text, so that the article may be found by someone who does not know the common terminology used in the research field."

When I write an academic paper, my primary audience is academics in my specialised field, not the wider public, who are likely to use a different vocabulary and dominate services like Google Trends by their sheer numbers. As an academic reading a paper, I wouldn't appreciate the introduction of inconsistency and ambiguity through synonyms, which in the precise scientific world are necessarily only near-synonyms.

"..to achieve a good ranking in Google Scholar, many citations are essential. Google Scholar seems not to differentiate between self-citations and citations by third parties."

Self-citation has always been rife and needs little encouragement. Later they state that "...any articles you have read that relate to your current research paper should be cited", although surely discretion is important, unless we are going to shoe-horn in crap and further exaggerate the Matthew effect for highly ranked papers.

"...publish the article on the author's home page...an author who does not have a Web page might post the article on an institutional Web page"

Ignoring the curious turn of phrase, the general consensus is that the vast majority of academics should publish in their institutional repository irrespective of whether they have their own web site. The institutional repositories should have the procedures in place to ensure long-term archiving.

"...an article that includes outdated words might be replaced by either updating the existing article or publishing a new version on the author's web site."

As the authors acknowledge "...it may be considered misbehaviour by other researchers." At last we have a point we agree on.

As you have probably guessed from the above criticisms, I thought that the article was a piece of crap. Academic SEO should in no way affect how you write an academic paper, or the subjects you choose to write about. Unfortunately, academic SEO is a topic that is likely to get a lot more attention amongst bad scientists if another practice I recently heard of takes off: paying academics bonuses per article. A colleague told me last week that his former university had a pot of money from which academics were paid €4,000 (split between the authors) for articles published in certain 'quality' journals. It is a small step from there to paying individuals for articles that reach a certain citation threshold, at which point we will have finally dumbed down science.

"Researchers need to think seriously about how to get their articles indexed by academic search engines" - No, they need to think seriously about doing worthwhile research and writing quality publications. If your focus is on SEO then you are in the wrong field.

Labels: Academic SEO, REF, Scientific publishing

posted by David at 14:01 | 4 Comments

Friday 5 March 2010

A quick SPARQL of Dbpedia.org says I'm past it!

I've spent the last couple of days playing around with some of the Linked Data that is increasingly being made available online: data that is published through dereferenceable URIs. One of the most interesting sources is Dbpedia.org, a project that extracts structured data from Wikipedia. Whilst it suffers from a lack of consistency, its crowd-sourced nature potentially offers unique insights into the nature of society (or at least the world as Wikipedia users see it).

Today I downloaded a list of all the pages of people in DBpedia with dates of birth in the 20th century. Requests were sent using the SPARQL query language, with only one month requested at a time, as DBpedia only returns the first 1,000 results for each query.

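PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
# The birth-date predicate is assumed to be dbo:birthDate
SELECT DISTINCT ?page ?dob WHERE {
  ?s foaf:page ?page .
  ?s dbo:birthDate ?dob .
  FILTER (?dob >= "1900-01-01"^^xsd:date && ?dob <= "1900-01-31"^^xsd:date)
} LIMIT 1000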

[run query at dbpedia]
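
Generating one query per month by hand gets tedious over a century, so the query strings can be built programmatically. This is a minimal sketch, assuming the same dbo:birthDate predicate as the query above and covering 1900 to 1999, as that query starts at 1900; sending each string to the DBpedia endpoint is left out.

```python
import calendar
from datetime import date

# Mirrors the SPARQL query above; dbo:birthDate is an assumed predicate.
QUERY_TEMPLATE = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?page ?dob WHERE {{
  ?s foaf:page ?page .
  ?s dbo:birthDate ?dob .
  FILTER (?dob >= "{start}"^^xsd:date && ?dob <= "{end}"^^xsd:date)
}} LIMIT 1000"""


def month_ranges(first_year, last_year):
    """Yield (first_day, last_day) for every month in the given years."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            last_day = calendar.monthrange(year, month)[1]  # handles leap years
            yield date(year, month, 1), date(year, month, last_day)


def monthly_queries(first_year=1900, last_year=1999):
    """Build one query string per month so each stays under the 1,000-row cap."""
    return [
        QUERY_TEMPLATE.format(start=start.isoformat(), end=end.isoformat())
        for start, end in month_ranges(first_year, last_year)
    ]
```

If any month comes back with exactly 1,000 rows it has probably been truncated, and that month would need splitting into smaller date ranges.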

It's not particularly surprising to find that in today's celebrity-obsessed world there are more Wikipedia-famous people born towards the end of the century than at the beginning, and that there are relatively few people under the age of twenty.
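
The end-of-century skew becomes visible once the returned dates of birth are binned by decade. A minimal sketch, assuming the dates arrive as ISO YYYY-MM-DD strings (real DBpedia values can be messier); the sample dates in the usage line are invented.

```python
from collections import Counter


def births_per_decade(dobs):
    """Count ISO dates of birth (YYYY-MM-DD strings) by decade, e.g. 1975-06-01 -> 1970."""
    decades = Counter()
    for dob in dobs:
        year = int(dob[:4])
        decades[year - year % 10] += 1
    return dict(sorted(decades.items()))


print(births_per_decade(["1905-03-01", "1975-06-01", "1979-12-31", "1990-01-01"]))
```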

At 35, it would seem that my best years for getting my own Wikipedia page are behind me, although as I was never counting on my sporting prowess, there is probably still a chance.

The real power of Linked Data comes not from these data sets in isolation, but investigating how they link together...but you have to start somewhere.

Labels: dbpedia, Linked Data, SPARQL

posted by David at 17:33 | 0 Comments

Thursday 4 March 2010

Microscopes and Micrographia

My home office is increasingly turning into a home lab: circuit boards, sensors, switches, wires, wire cutters, soldering iron, even a robot. My latest acquisition is a USB digital microscope with 200x magnification. I've been tempted by the thought of a USB microscope for a while, and whilst there are more powerful microscopes out there, at £29.99 it would have been churlish not to give this one a go.

Unbeknown to Maplin's sales assistants, their sale was made that much easier by the fact that I am currently reading Lisa Jardine's The Curious Life of Robert Hooke: the man whose Micrographia (1665) showed the world at large hidden details it had never seen before, painstakingly drawing by hand the objects he placed under his microscope.

Today the man on the street can pick a USB microscope off the shelf and within minutes share his close-ups of the world. It remains to be seen, however, whether it will encourage a generation of entomologists or navel gazers.

Labels: entomology, Microscope, Robert Hooke

posted by David at 09:52 | 0 Comments

Friday 22 January 2010

Semantic Webometrics - A few thoughts

The other day an academic colleague asked what I was working on at the moment. My answer included semantic webometrics and, unsurprisingly, he wanted some more detail. However, 'working on' would be a bit of an exaggeration; 'have a few ideas but nothing on paper yet' would have been more accurate. So I thought I'd write down some of my rough thoughts on semantic webometrics.

Webometrics
For those who may have stumbled upon this blog from a non-webometric background, Webometrics as defined by Björneborn (2004), and as used by most of the webometrics community, means the:

...study of the quantitative aspects of the construction and use of information resources, structures and technologies on the Web drawing on bibliometric and informetric approaches.

Many of these quantitative studies have focused on hyperlinks. For example, investigating whether there is a correlation between a university's inlinks (a.k.a. backlinks) and a university's research ranking, or whether the interconnectedness of organisations in a region (as seen through interlinking web sites) can give an indication of a region's level of innovation [outrageous self-citation].
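
Because link counts tend to be heavily skewed, a rank-based measure such as Spearman's correlation is often used for this kind of comparison. A minimal pure-Python sketch; the inlink counts and research scores are invented, and a higher research score is assumed to mean better research (correlating against a league-table position, where 1 is best, would flip the sign).

```python
def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the rank vectors (undefined if a list is constant)."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented data for five hypothetical universities.
inlinks = [1200, 450, 3100, 80, 900]
research_scores = [5.1, 3.0, 6.2, 1.5, 4.0]
print(spearman(inlinks, research_scores))
```

The toy data are perfectly rank-correlated, so rho comes out at (floating-point) 1; real link data would be far noisier, which is the problem the next paragraph describes.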

One of the problems with many of these link-analyses is that they include a lot of noise. For example, when counting a university's inlinks you will be counting both those from an academic highlighting a university's quality research, and those from the disgruntled student highlighting his most hated tutor. Traditionally we have tried to understand the extent of this noise through large scale content analysis - the extremely tedious manual classification of web links and web pages.

The semantic web
A semantic web is one where information on the web is structured so that it is meaningful to computers. Well-known examples include the FOAF ontology, which allows people to express their relationships with one another (e.g., the FOAF file of Tim Berners-Lee), and the use of microformats for certain types of structured content, including contact details (as included at www.davidstuart.co.uk) and reviews (which Google now indexes as Rich Snippets). This extra information can be used to reduce the amount of noise and enable more meaningful webometric studies.
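
As an illustration of how machine-readable such markup is, the sketch below pulls a few hCard properties (the microformat used for contact details) out of a page using only Python's standard html.parser. It is a deliberately minimal sketch, not a full microformats parser: the property list is a hypothetical subset, it assumes reasonably well-formed markup, and it ignores hCard features such as the value-class pattern.

```python
from html.parser import HTMLParser

# Hypothetical subset of hCard properties to collect.
HCARD_PROPERTIES = {"fn", "org", "email", "tel"}
# Elements that never get an end tag, so they are not pushed onto the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HCardParser(HTMLParser):
    """Minimal hCard reader: collects the text of a few properties inside class="vcard"."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one (inside_vcard, property_or_None) entry per open element
        self.cards = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        inside = self.stack[-1][0] if self.stack else False
        prop = None
        if "vcard" in classes:
            inside = True
            self.cards.append({})
        elif inside:
            prop = next((c for c in classes if c in HCARD_PROPERTIES), None)
        self.stack.append((inside, prop))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost enclosing property, if any.
        for _, prop in reversed(self.stack):
            if prop:
                card = self.cards[-1]
                card[prop] = card.get(prop, "") + data
                return

    def get_cards(self):
        """Return the cards with the whitespace in each value normalised."""
        return [{k: " ".join(v.split()) for k, v in card.items()} for card in self.cards]
```

Feeding it `<div class="vcard"><span class="fn">Jane Doe</span></div>` (an invented example) and calling `get_cards()` gives one card with an `fn` of "Jane Doe"; text outside a vcard is ignored.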

Semantic webometrics
So when I say semantic webometrics I mean - webometric studies that make use of the additional information included in an increasingly semantic web.

For example, a semantic webometric study of the connection between an institution's inlinks and its research ranking would take into consideration who had placed the links and the attributes they had associated with them. A semantic webometric study of the relationships between organisations would look at the explicit relationships contained in FOAF files as well as the implicit information on web pages.

Conclusions
Unfortunately, there is relatively little semantic information embedded in the majority of web pages and sites, and where it is widespread, e.g., the nofollow link attribute, webometricians have yet to develop the tools to make use of it.
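
As a first step towards using the nofollow attribute, an inlink count can simply keep rel="nofollow" links separate from the rest, since nofollow marks links the page's author does not vouch for (blog comments being the classic case). A minimal sketch using Python's standard html.parser; the target host and sample page are hypothetical, and relative links are skipped because they cannot be inlinks from another site.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCounter(HTMLParser):
    """Count links to a target host (and its subdomains), separating rel="nofollow" links."""

    def __init__(self, target_host):
        super().__init__()
        self.target_host = target_host.lower()
        self.followed = 0
        self.nofollow = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # hostname is None for relative links, so they never match the target.
        host = urlparse(attributes.get("href") or "").hostname or ""
        if host != self.target_host and not host.endswith("." + self.target_host):
            return
        rels = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rels:
            self.nofollow += 1
        else:
            self.followed += 1
```

Feeding each linking page to `LinkCounter("example.ac.uk")` and summing `followed` and `nofollow` across pages gives two inlink counts instead of one, which could then be compared separately against whatever indicator is being studied.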

As such, we need to take an information-centred approach to semantic webometric research rather than a problem-centred one. Whilst the amount is still small, more semantic data is being embedded in the web all the time, and webometricians need to investigate what is available and how they can use it.

Labels: semantics, webometrics

posted by David at 09:37 | 1 Comments