scraping

Google Maps’ new !-style embed format

A while ago Google changed the structure of embedded map URLs. The old format used the web-standard key1=value1&key2=value2 style, and you can find a reasonably good description of these parameters here. Unfortunately the new style is terser and much less intelligible, which seems like a step backwards even if these links are mostly hidden under the surface. Either Google wanted them to be less human-readable, or they care enough about saving a few bytes here and there to do this. I can’t find any good explanation of how to parse these links, so here’s my morning’s attempt. Be warned that this is all guesswork based on a limited sample and some experimentation. (more…)

Getting data from the web for research

I gave an internal talk at Nesta recently on ‘getting data from the web’, covering web scraping and open APIs. It’s designed for researchers who might consider using these technologies but don’t know much about them. It is not a technical guide though, so it won’t help much if you want to get straight to business.

It belongs to a growing set of ‘data’-themed skills resources that Nesta is collecting.

ABC’s Q&A and (lack of) political party bias

Q&A is a live panel discussion show, filmed before a studio audience and produced by Australia’s ABC. For British readers, it is virtually identical to the BBC’s Question Time. A few months ago I noticed that all the transcripts are posted online, and I thought this would be an interesting way to analyse the political bias, and representativeness, of the show.

(more…)