One of the things that make life easy in Barcelona is its efficient public transportation system. I live in the suburbs, so every morning I take the train to work. When I arrive at Plaza Catalunya, I am almost there, but there is still a good 20-minute walk to the World Trade Center. That would be the case without Bicing, the community bicycle program that lets me pick up a bike with my RFID card, cycle down the Ramblas and drop it off at one of the stations next to the entrance of the WTC. Well, in theory...

Like many good ideas, Bicing is a victim of its own success. In the morning, a hell of a lot of people use Bicing to commute to work, and since quite a lot of people work in the WTC, its two stations, with only 60 spaces, fill up with bikes very early. The solution comes in the form of vans that patrol the city, collecting bikes from full stations and redistributing them to empty ones to balance the flow. Unfortunately, these vans are not fast enough at rush hour, and in the morning it is not uncommon to wait quite some time before the van, awaited like the messiah by a dozen commuters, shows up.

So I came up with the idea of collecting data about the stations I am interested in as a user. This data would, hopefully, help me reliably predict when and where I can be sure to find a free space to park a bike, so that I can adapt and optimize my morning routine (that is, decide which train I should take). On their website, Bicing provides a map of the city showing the stations and the real-time availability of bikes. They use the Google Maps API to build this map, and although the result is quite fancy, everybody seems to agree it is not really usable because it is too small (and really slow). Some users have built alternatives that are more usable, but still not quite what I am looking for. Until Bicing decides to provide an open API, let's scrape some data!

Getting straight to the point, there are two questions to answer:

  1. Where to get the data from?
  2. How to get it?
Once these are answered, it is just a matter of writing a quick script to do the job. Now, here are the answers:
  1. http://bicing.com/localizaciones/localizaciones.php
  2. A regular expression: exml.parseString\('(.*)'\);, and an XML parser
In fact, when one browses the map and clicks a station to see how many bikes are available, the data is not updated in real time; one has to reload the page to get fresh data. That is because all the data is embedded in the page as XML inside a piece of JavaScript, as the argument of a call to this exml.parseString method.
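To make the second answer concrete, here is a minimal Python sketch of the extraction step, assuming the page still embeds the data in a single-line exml.parseString('...') call. The URL and the pattern come from above (the dot is escaped so that it only matches a literal dot); the names fetch_page and extract_xml are hypothetical, the UTF-8 decoding is an assumption about the page's encoding, and this is not the code from the author's branch.

```python
import re
import urllib.request
from typing import Optional

# URL from the article; the page may have changed or gone offline since.
BICING_URL = "http://bicing.com/localizaciones/localizaciones.php"

# The article's pattern, with the dot in "exml.parseString" escaped so it only
# matches a literal dot. Without re.DOTALL, ".*" cannot cross a line break, so
# this assumes the whole XML string sits on one line of the JavaScript.
XML_PATTERN = re.compile(r"exml\.parseString\('(.*)'\);")


def fetch_page(url: str = BICING_URL) -> str:
    """Download the map page (ASSUMPTION: UTF-8 compatible encoding)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_xml(page: str) -> Optional[str]:
    """Return the XML string passed to exml.parseString, or None if absent."""
    match = XML_PATTERN.search(page)
    return match.group(1) if match else None
```

Calling extract_xml(fetch_page()) would then hand the raw XML string to whichever XML parser you prefer.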

I wrote a quick Python script that retrieves the data, parses the XML and populates a list of stations with the available information (name, GPS coordinates, bikes available and free spaces). It is licensed under the GPLv3, well documented, and available as a bzr branch on Launchpad at lp:~osomon/+junk/bicing (you can also browse and download the code at http://bazaar.launchpad.net/~osomon/+junk/bicing/files).
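The parsing step could look roughly like the sketch below. The article does not show the XML layout, so the schema used here (one <station> element per station, with <name>, <lat>, <lng>, <bikes> and <free> children) is a hypothetical stand-in, and the Station class and parse_stations function are illustrative names, not the ones in the bzr branch.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Station:
    name: str
    lat: float   # GPS latitude
    lng: float   # GPS longitude
    bikes: int   # bikes available
    free: int    # free parking spaces


def parse_stations(xml_text: str) -> List[Station]:
    """Build a list of Station objects from the embedded XML.

    HYPOTHETICAL schema: one <station> element per station with <name>,
    <lat>, <lng>, <bikes> and <free> children; rename the tags to match the
    real page. A missing numeric field raises an exception instead of
    silently turning into zero.
    """
    root = ET.fromstring(xml_text)
    stations = []
    for elem in root.iter("station"):
        stations.append(Station(
            name=elem.findtext("name", default="").strip(),
            lat=float(elem.findtext("lat")),
            lng=float(elem.findtext("lng")),
            bikes=int(elem.findtext("bikes")),
            free=int(elem.findtext("free")),
        ))
    return stations
```

Keeping each station as a small typed record makes the later filtering ("only the two WTC stations") a one-line list comprehension.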

Now I need to figure out how to make the most of this data. I am thinking of regularly polling a given set of stations over a given period of time, storing the data and then drawing graphs to better understand it. I will probably publish my findings in a future article, so stay tuned!
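The polling idea could be sketched as follows. get_counts is a hypothetical callback that returns, for every station name, a (bikes available, free spaces) pair, for instance built on top of a scraper like the one described above; poll_stations, the two-minute default interval and the CSV layout are arbitrary illustrative choices, not results.

```python
import csv
import os
import time
from datetime import datetime
from typing import Callable, Dict, Iterable, Tuple

# HYPOTHETICAL callback type: {station_name: (bikes_available, free_spaces)}
CountsFn = Callable[[], Dict[str, Tuple[int, int]]]


def poll_stations(get_counts: CountsFn, watched: Iterable[str], out_path: str,
                  interval_s: float = 120.0, samples: int = 30) -> None:
    """Sample the watched stations `samples` times, `interval_s` seconds apart,
    appending one CSV row (time, station, bikes, free) per station per sample."""
    watched = list(watched)
    # Write a header only when starting a new (or empty) file.
    new_file = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
    with open(out_path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["time", "station", "bikes", "free"])
        for i in range(samples):
            counts = get_counts()
            stamp = datetime.now().isoformat(timespec="seconds")
            for name in watched:
                if name in counts:  # skip stations missing from this sample
                    bikes, free = counts[name]
                    writer.writerow([stamp, name, bikes, free])
            f.flush()  # keep collected rows on disk if the run is interrupted
            if i < samples - 1:
                time.sleep(interval_s)
```

Appending plain CSV rows keeps the collector trivial, and the resulting file can later be loaded into a spreadsheet or plotting tool to draw the graphs.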