Webometric Thoughts: January 2010

Friday, 22 January 2010

Semantic Webometrics - A few thoughts

The other day an academic colleague asked what I was working on at the moment. My answer included semantic webometrics, and unsurprisingly he wanted some more detail. However, 'working on' would be a bit of an exaggeration; 'have a few ideas but nothing on paper yet' would have been more appropriate. As such, I thought I'd write down some of my rough thoughts on semantic webometrics.

Webometrics
For those who may have stumbled upon this blog from a non-webometric background, Webometrics as defined by Björneborn (2004), and as used by most of the webometrics community, means the:

...study of the quantitative aspects of the construction and use of information resources, structures and technologies on the Web drawing on bibliometric and informetric approaches.

Many of these quantitative studies have focused on hyperlinks. For example, investigating whether there is a correlation between a university's inlinks (a.k.a. backlinks) and a university's research ranking, or whether the interconnectedness of organisations in a region (as seen through interlinking web sites) can give an indication of a region's level of innovation [outrageous self-citation].

One of the problems with many of these link-analyses is that they include a lot of noise. For example, when counting a university's inlinks you will be counting both those from an academic highlighting a university's quality research, and those from the disgruntled student highlighting his most hated tutor. Traditionally we have tried to understand the extent of this noise through large scale content analysis - the extremely tedious manual classification of web links and web pages.

The semantic web
A semantic web is one where information on the web is structured so that it is meaningful to computers. Well known examples of the semantic web include the FOAF ontology, which allows people to express their relationships with one another (e.g., the FOAF of Tim Berners-Lee), and the use of microformats for certain types of structured content, including contact details (as included at www.davidstuart.co.uk) and reviews (which are now indexed by Google as Rich Snippets). This extra information can be used to reduce the amount of noise and enable meaningful webometric studies.

Semantic webometrics
So when I say semantic webometrics I mean webometric studies that make use of the additional information included in an increasingly semantic web.

For example, a semantic webometric study of the connection between an institution's inlinks and research ranking would take into consideration who had placed the links and the attributes that they had associated with them. A semantic webometric study of the relationships between organisations would look at the explicit relationships contained in FOAF files as well as the implicit information on web pages.
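
As a rough illustration of the first example, here is a minimal Python sketch of how inlinks might be tallied by the attributes attached to them, rather than simply counted. It uses only the standard library. The names InlinkCollector and count_inlinks_by_rel, the host example.ac.uk, and the sample page are all hypothetical and made up for illustration; a real study would also need a crawler or search engine data to find the linking pages in the first place.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class InlinkCollector(HTMLParser):
    """Tally the rel values of links that point at a target host (hypothetical helper)."""

    def __init__(self, target_host):
        super().__init__()
        self.target_host = target_host
        self.rel_counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href") or ""
        host = urlparse(href).hostname or ""
        # Keep only links to the target host or one of its subdomains.
        if host != self.target_host and not host.endswith("." + self.target_host):
            return
        # rel may hold several space-separated values, e.g. "colleague met".
        rel_values = (attr_map.get("rel") or "").lower().split()
        if not rel_values:
            self.rel_counts["(no rel)"] += 1
        for value in rel_values:
            self.rel_counts[value] += 1


def count_inlinks_by_rel(html, target_host):
    """Return a Counter of rel values for links in html that point at target_host."""
    collector = InlinkCollector(target_host)
    collector.feed(html)
    return collector.rel_counts


# Made-up page for illustration only; example.ac.uk is a placeholder host.
SAMPLE_PAGE = """
<a href="http://www.example.ac.uk/research">quality research</a>
<a rel="nofollow" href="http://www.example.ac.uk/staff/tutor">most hated tutor</a>
<a rel="colleague met" href="http://people.example.ac.uk/~jane">Jane</a>
<a href="http://elsewhere.example.org/">unrelated</a>
"""

if __name__ == "__main__":
    for rel, count in sorted(count_inlinks_by_rel(SAMPLE_PAGE, "example.ac.uk").items()):
        print(rel, count)
```

A tally like this says nothing about who placed the link, but it does separate links carrying an explicit attribute (nofollow, or a relationship value) from those carrying none, which is the kind of extra signal a plain link count throws away.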

Conclusions
Unfortunately there is relatively little semantic information embedded in the majority of web pages/sites, and where it is widespread, e.g., with the nofollow link attribute, webometricians have yet to develop the tools to make use of it.

As such we need to take an information-centred approach to semantic webometric research rather than a problem-centred approach. Whilst still small, there is an increasing amount of semantic data being embedded in the web all the time; webometricians need to investigate what is available and how they can use it.

Labels: semantics, webometrics

posted by David at 09:37 | 1 Comments

Monday, 4 January 2010

Predictions: What are they good for?

At this time of year (or rather a few weeks ago if they weren't drowning under a pile of work) technology bloggers all around the world make predictions about the coming year, and reflect upon the predictions they made the previous year. Looking back on my previous predictions I can't help but realise how slowly the world of technology moves.

Last year's predictions
1. N97 takes Nokia back to the top of the pile. Unfortunately I have only come across one person with an N97 in the past year; Apple and its apps continue to beguile everyone in their path.
2. Distributed social networks will shrink Facebook traffic. Unfortunately Google Wave launched too late in the year, and with too many problems, for it to make any real impact. But the notion of a distributed system has been well and truly planted in people's minds.
3. Project Kangaroo will hit UK desktops. The legal watching of video online is increasing, with new entrants in the market such as Blinkbox, but unfortunately Project Kangaroo fell foul of the Competition Commission.
4. The general public continue to ignore QR codes. Despite my pessimism QR codes have actually started to creep into some unexpected places. For example, the University of Bath uses them in numerous places, including its library catalogue. Whilst they have become more popular than I imagined, they are still ignored by most of the public.
5. No Google alternative will emerge. Yahoo Search closes up shop, Bing has more money than sense, and Google marches on.

This year's predictions - On a similar theme
1. iPhone + Augmented Reality = Increased Market Share. I hate the iPhone because if you want to install anything on an iPhone you have to check it's OK with Apple first, for which they will take a 30% cut of the price of the app. Unfortunately the centralised app-store is the reason so many people like it. It simplifies the process of downloading new applications, and as we see an increase in glossy augmented reality mobile applications the iPhone will continue to be perceived as the obvious choice.
2. Google Wave takes off. Despite hating Google, I'm backing Google Wave for two reasons: i) We need something better than email, ii) I really want to see an open distributed system. It still has a lot of teething problems, but nothing that can't be overcome.
3. Project Canvas fails. Project Kangaroo failed because of the complaints of Murdoch, and I'm sure Project Canvas will as well, especially if we see a Tory government after the next election.
4. No change in search. Market share will stay the same and no one will embrace the potential of the wisdom of the crowd. Search strikes me as one of the more antiquated areas of the web, with little real innovation occurring. I think things will start to change in 2011, if the semantic web gains a foothold this year.
5. The year of the Semantic Web. After years of talk, I have the feeling that this could be the year when we start to see the semantic web making an impact, both through the opening up of large data sets and the marking up of web pages with microformats. As someone who is fed up with poking and tweeting, I'm looking to the semantic web to inject a bit of life into the web.

As for Twitter, I don't really care. I'm bored of it now.

Labels: 2010 Predictions

posted by David at 12:07 | 0 Comments

Sunday, 3 January 2010

2009 in Books: 47

Whilst I have little doubt that the web is a wonderful thing, I personally waste a lot of time online reading half-formed, half-baked, off-the-cuff opinions. There are a lot of things that are better said in 300 pages than 140 characters. Unfortunately my mindless clicking online leaves far less room for books than I would like. At a minimum I would expect to read 50 books in a year, but (thanks to that ever-encroaching web) 2009 saw me read a mere 47, or rather, finish 47 books; my shelves are littered with half-read books which, if I return to them, I will feel it necessary to start again from the start.

The work related books: 16
'Work' can be stretched to cover a multitude of subjects that I am interested in, from sociology, through the narrative, to Second Life.

Unfortunately some of the work related books are far less enjoyable. Often (although not always) these were the ones that I had offered to review for a journal and therefore have to struggle through to the end.

Whilst some books are always worse than others, without a doubt Knowledge Networks: The Social Software Perspective (Premier Reference Source) was not only the worst book I read this year, but one of the worst publishing efforts I have ever seen.

Other non-fiction: 19
There isn't much of a theme to the rest of my non-fiction, although I possibly got a bit carried away with books about Samuel Johnson.

The one with least merit is The Impulse Factor: Why Some of Us Play it Safe and Others Risk it All; don't even think about buying this book. For the keen-eyed wondering what happened to book number 19: it was HOW TO USE BOOKS. I can only presume that it was the lack of a picture that meant Amazon wouldn't let me add it to a widget.

The Fiction Books: 12
Curiously my fictional reads of 2009 both started and ended with an Adrian Mole, and there is the usual inclusion of personal favourites such as Grisham and Irving. But beyond that it is a curious selection of odds and ends.

Conclusions
Clumped together it looks a slightly bizarre collection, especially the fiction shelves (I believe Mr Majeika was free in a cereal box a previous year), but there again I suppose a lot of people's do. As with every other year I shall resolve to read far more in 2010; maybe I should also resolve to read better books in 2010.

Labels: books

posted by David at 14:36 | 0 Comments
