Flickr API: If you don't want to give us the data, just tell us!
Application Programming Interfaces (APIs) are a brilliant way for researchers (as well as commercial developers) to use the data of the big web organisations in new and innovative ways in a controlled and ethical manner. Whilst there are usually limitations, we find ways of working within the boundaries we are set. What is annoying, however, is finding that a service isn't being particularly honest about those boundaries. This post's wrath is aimed at Flickr's API.
Whilst many API services limit the number of results you can view, this is usually clearly set out in the documentation. For example, most search engines only allow you to view the first thousand results. Flickr, however, lets you keep requesting further pages, only to start sending back repeated pages of results for anything beyond the first 4,500. This can be clearly seen in the two pictures below from the Flickr API Explorer for flickr.photos.search. The first shows a partial screenshot of the results for the ninth page of 500 results for the tag 'web':
The second shows a partial screenshot of the results for the tenth page of 500 results for the tag 'web':
Basically the same results with a different page number.
I wouldn't mind the restrictions if they were clear. It may be stated in the small print somewhere, although I still haven't seen it, but why would you send the same data again and again and claim it as different pages of results? It is still possible to collect all the results by using some of the other arguments, e.g., min and max upload dates; it just means that I had to waste numerous hours collecting data again once the problem came to light. Flickr now owes me one Saturday.
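For anyone facing the same problem, here is a minimal sketch of the upload-date workaround in Python. It assumes the 4,500 ceiling observed above and a hypothetical `count_in_window` function, supplied by you, that reports how many results a search returns for a given upload-date range (for example by reading the total from a flickr.photos.search call made with `min_upload_date` and `max_upload_date`). Windows are treated as half-open intervals of Unix timestamps; how Flickr treats the boundary values is not something I have checked, so expect to de-duplicate by photo ID afterwards.

```python
from typing import Callable, Iterator, Tuple

# Ceiling observed in this post; an assumption, not a documented limit.
RESULT_CAP = 4500


def date_windows(
    start: int,
    end: int,
    count_in_window: Callable[[int, int], int],
    cap: int = RESULT_CAP,
) -> Iterator[Tuple[int, int]]:
    """Yield (min_date, max_date) windows, in ascending order, that each
    hold at most `cap` results according to `count_in_window`.

    A window is halved until it fits under the cap or is only one
    second wide, at which point it is yielded as-is.
    """
    stack = [(start, end)]
    while stack:
        lo, hi = stack.pop()
        if count_in_window(lo, hi) <= cap or hi - lo <= 1:
            yield lo, hi
        else:
            mid = (lo + hi) // 2
            # Push the later half first so the earlier half is handled next.
            stack.append((mid, hi))
            stack.append((lo, mid))
```

Each window that comes back can then be paged through in the usual way, since it should stay under the point where the repeats begin. A one-second window that still exceeds the cap is yielded anyway, so it is worth checking the counts rather than trusting them.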
This serves as a useful reminder to all web researchers: Make sure the API is giving you the data it is claiming to give you.
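One cheap way to do that is to compare the photo IDs on each new page with those already collected. The Python sketch below is only an illustration: `first_repeated_page` and its `threshold` parameter are my own hypothetical names, and it assumes you have already extracted the list of photo IDs from each page of results.

```python
from typing import Iterable, Optional, Sequence


def first_repeated_page(
    pages: Iterable[Sequence[str]],
    threshold: float = 0.9,
) -> Optional[int]:
    """Return the 1-based number of the first page whose IDs mostly
    repeat IDs seen on earlier pages, or None if no page does.

    `threshold` is the fraction of a page's IDs that must already have
    been seen for the page to count as a repeat.
    """
    seen = set()
    for number, ids in enumerate(pages, start=1):
        ids = list(ids)
        if ids:
            overlap = sum(1 for photo_id in ids if photo_id in seen) / len(ids)
            if overlap >= threshold:
                return number
        seen.update(ids)
    return None
```

Running this as the pages arrive would have flagged the tenth page in the example above as a repeat of the ninth, and saved a Saturday.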
1 Comment:
I ran into this and found your page while looking for commentary on it. Then in one of my experiments I noticed that going in through the normal web search interface, if I try to go to any page from 188 on up, I get a page saying no photos were found for that search criteria. 188*24 images per page is 4512...so it's not just the API.
19 June 2009 at 02:08