<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Online Journalism Blog &#187; Search Results  &#187;  yahoo+pipes</title>
	<atom:link href="http://onlinejournalismblog.com/search/yahoo+pipes/feed/rss2/" rel="self" type="application/rss+xml" />
	<link>http://onlinejournalismblog.com</link>
	<description>A conversation.</description>
	<lastBuildDate>Sat, 11 Feb 2012 12:06:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<cloud domain='onlinejournalismblog.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>Different Speeches? Digital Skills Aren’t just About Coding…</title>
		<link>http://blog.ouseful.info/2012/01/12/different-speeches-digital-skills-arent-just-about-coding/</link>
		<comments>http://blog.ouseful.info/2012/01/12/different-speeches-digital-skills-arent-just-about-coding/#comments</comments>
		<pubDate>Thu, 12 Jan 2012 13:10:44 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[Anything you want]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6805</guid>
		<description><![CDATA[Secretary of State for Education, Michael Gove, gave a speech yesterday on rethinking the ICT curriculum in UK schools. You can read a copy of the speech variously on the Department for Education website, or, err, on the Guardian website. Seeing these two copies of what is apparently the same speech, I started wondering: a) [...]]]></description>
			<content:encoded><![CDATA[<p>Secretary of State for Education, Michael Gove, gave a speech yesterday on rethinking the ICT curriculum in UK schools. You can read a copy of the speech variously  on the <a href="http://www.education.gov.uk/inthenews/speeches/a00201868/michael-gove-speech-at-the-bett-show-2012" onclick="urchinTracker('/outgoing/www.education.gov.uk/inthenews/speeches/a00201868/michael-gove-speech-at-the-bett-show-2012?referer=');">Department for Education website</a>, or, err, on <a href="http://www.guardian.co.uk/education/2012/jan/11/digital-literacy-michael-gove-speech" onclick="urchinTracker('/outgoing/www.guardian.co.uk/education/2012/jan/11/digital-literacy-michael-gove-speech?referer=');">the Guardian website</a>.</p>
<p>Seeing these two copies of what is apparently the same speech, I started wondering:</p>
<p>a) which is the &#8220;best&#8221; source to reference?<br />
b) how come the Guardian doesn&#8217;t add a disclaimer about the provenance of, and link, to the DfE version? [Note the disclaimer in the DfE version - "Please note that the text below may not always reflect the exact words used by the speaker."]<br />
c) is the Guardian version an actual transcript, maybe? That is, does the Guardian reprint the &#8220;exact words&#8221; used by the speaker?</p>
<p>And that made me think I should do a diff&#8230; About which, more below&#8230;</p>
<p>Before that, however, here&#8217;s a quick piece of reflection on how these two things &#8211; the reinvention of the IT curriculum, and the provenance of, and value added to, content published on news and tech industry blog sites &#8211; collide in my mind&#8230;</p>
<p>So for example, I&#8217;ve been pondering what the role of journalism is, lately, in part because I&#8217;m trying to clarify in my own mind what I think the practice and role of <em>data</em> journalism are (maybe I should apply for a <a href="http://www.niemanlab.org/2012/01/announcing-the-nieman-berkman-fellowship-in-journalism-innovation/" onclick="urchinTracker('/outgoing/www.niemanlab.org/2012/01/announcing-the-nieman-berkman-fellowship-in-journalism-innovation/?referer=');">Nieman-Berkman Fellowship in Journalism Innovation</a> to work on this properly?!). It seems to me that &#8220;communication&#8221; is one important part (raising awareness of particular issues, events, or decisions), and holding governments and companies to account is another. (Actually, I think Paul Bradshaw has called me out on that, before, suggesting it was more to do with providing an evidence base through verification and triangulation, as well as comment, against which governments and companies could be held to account (err, I think? As an unjournalist, I don&#8217;t have notes or a verbatim quote against which to check that statement, and I&#8217;m too lazy to email/DM/phone Paul to clarify what he may or may not have said&#8230; (The extent of my checking is typically limited to what I can find on the web or in personal archives&#8230; which appear to be lacking on this point&#8230;)))</p>
<p>Another thing I&#8217;ve been mulling over recently in a couple of contexts relates to the notion of what are variously referred to as digital or information skills.</p>
<p>The first context is &#8220;data journalism&#8221;, and the extent to which data journalists need to be able to do programming (in the sense of identifying the steps in a process that can be automated and how they should be sequenced or organised) versus writing code. (I can&#8217;t write code for toffee, but I can read it well enough to copy, paste and change bits that other people have written. That is, I can appropriate and reuse other people&#8217;s code, but can&#8217;t write it from scratch very well&#8230; Partly because I can&#8217;t ever remember the syntax and low level function names. I can also use tools such as Yahoo Pipes and Google Refine to do coding like things&#8230;) Then there&#8217;s the question of what to call things like <a href="http://blog.ouseful.info/2011/05/19/whose-investor-relations-sites-do-thomson-reuters-host-a-form-of-url-hacking/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/05/19/whose-investor-relations-sites-do-thomson-reuters-host-a-form-of-url-hacking/?referer=');">URL hacking</a> or <a href="http://blog.ouseful.info/2012/01/11/googling-nasties-and-oopses-on-university-and-public-sector-websites/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2012/01/11/googling-nasties-and-oopses-on-university-and-public-sector-websites/?referer=');">(search engine) query building</a>?</p>
<p>The second context is geeky computer techie stuff in schools, the sort of thing covered by Michael Gove&#8217;s speech at the BETT show on the national ICT curriculum (or lack thereof), and which the educational digerati were all over on Twitter yesterday. Over the weekend, houseclearing my way through various &#8220;archives&#8221;, I came across all manner of press clippings from 2000-2005 or so about the activities of the OU Robotics Outreach Group, of which I was a co-founder (the web presence has only recently been shut down, in part because of the retirement of the sys admin on whose server the websites resided). This group ran an annual open meeting every November for several years hosting talks from the educational robotics community in the UK (from primary school to HE level). The group also co-ordinated the <a href="http://rcj.robocup.org/" onclick="urchinTracker('/outgoing/rcj.robocup.org/?referer=');">RoboCup Junior</a> competition in the UK, ran outreach events, developed various support materials and activities for use with Lego Mindstorms, and led the EPSRC/AHRC Creative Robotics Research Network.</p>
<p>At every robotics event, we&#8217;d try to involve kids and/or adults in elements of problem solving, mechanical design, programming (not really coding&#8230;) based around some sort of themed challenge: a robot fashion show, for example, or a treasure hunt (both variants on edge following/line following;-) Or a robot rescue mission, as used in a day long activity in the <a href="http://www3.open.ac.uk/study/undergraduate/course/txr120.htm" onclick="urchinTracker('/outgoing/www3.open.ac.uk/study/undergraduate/course/txr120.htm?referer=');">&#8220;Engineering: An Active Introduction&#8221; (TXR120) OU residential school</a>, or the 3 hour &#8220;Robot Theme Park&#8221; team building activity in the <a href="http://www3.open.ac.uk/study/postgraduate/course/T885.htm" onclick="urchinTracker('/outgoing/www3.open.ac.uk/study/postgraduate/course/T885.htm?referer=');">Masters level &#8220;Team Engineering&#8221; (T885) weekend school</a>. [If you're interested, we may be able to take bookings to run these events at your institution. We can make them work at a variety of difficulty levels from KS3-4 and up;-)]</p>
<p>Given that working at the bits-atoms interface is where a lot of the not-purely-theoretical-or-hardcore-engineering innovation and application development is likely to take place over the next few years, any mandate to drop the &#8220;boring&#8221; Windows training ICT stuff in favour of programming (which I suspect can be taught in not only a really tedious way, but a really confusing and badly delivered way too) is probably Not the Best Plan.</p>
<p>Slightly better, and something  that I know is currently being mooted for reigniting interest in computing, is the <a href="http://www.raspberrypi.org/" onclick="urchinTracker('/outgoing/www.raspberrypi.org/?referer=');">Raspberry Pi</a>, a cheap, self-contained, programmable computer on a board (good for British industry, just like the BBC Micro was&#8230;;-) that allows you to work at the interface between the real world of atoms and the virtual world of bits that exists inside the computer. (See also things like the <a href="http://www.youtube.com/watch?v=Tgn4Ln47lM8" onclick="urchinTracker('/outgoing/www.youtube.com/watch?v=Tgn4Ln47lM8&amp;referer=');">OU Senseboard</a>, as used on the OU course <a href="http://www3.open.ac.uk/study/undergraduate/course/tu100.htm" onclick="urchinTracker('/outgoing/www3.open.ac.uk/study/undergraduate/course/tu100.htm?referer=');">&#8220;My Digital Life&#8221; (TU100)</a>.)</p>
<p>If schools were actually being encouraged to make a financial investment on a par with the level of investment around the introduction of the BBC Micro, back in the day, I&#8217;d suggest a <a href="http://store.makerbot.com/replicator-404.html" onclick="urchinTracker('/outgoing/store.makerbot.com/replicator-404.html?referer=');">3D printer</a> would have more of the wow factor&#8230; (I&#8217;ll doodle more on the rationale behind this in another post&#8230;) The financial climate may not allow for that (but I bet budget will manage to get spent anyway&#8230;) but whatever the case, I think Gove needs to be wary about consigning kids to lessons of coding hell. And maybe take a look at programming in a wider creative context, such as robotics (the word &#8220;robotics&#8221; is one of the reasons why I think it&#8217;s seen as a very specialised, niche subject; we need a better phrase, such as &#8220;Creative Technologies&#8221;, which could combine elements of robotics, games programming, Photoshop, and, yes, PowerPoint too)&#8230; Hmm&#8230; thinks&#8230; the OU has a couple of courses that have just come to the end of their life that between them provide a couple of hundred hours of content and activity on robotics (T184) and games programming (T151), and that we delivered, in part, to 6th formers under the OU&#8217;s Young Applicants in Schools Scheme.</p>
<p>Anyway, that&#8217;s all as maybe&#8230; Because there are plenty of digital skills that let you do coding like things without having to write code. Such as finding out whether there are any differences between the text in the DfE copy of Gove&#8217;s BETT speech, and the Guardian copy.</p>
<p>Copy the text from each page into a separate text file, and save it. (You&#8217;ll need a text editor for that&#8230;) Then, if you haven&#8217;t already got one, find yourself a <em>good</em> text editor. I use TextWrangler on a Mac. (Actually, I think MS Word may have a diff function?)</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6684098463/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6684098463/?referer=');"><img src="http://farm8.staticflickr.com/7148/6684098463_d7803bb4b3.jpg" width="500" height="370" alt="Finding diffs between text docs in TextWrangler" /></a></p>
<p>The differences all tend to be in the characters used for quotation marks (character encodings are one of the things that can make all sorts of programs fall over, or misbehave. Just being aware that they may cause a problem, as well as how and why, would be a great step in improving the baseline level understanding of <a href="http://blog.ouseful.info/2011/10/31/appropriate-it-my-ili2011-presentation/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/10/31/appropriate-it-my-ili2011-presentation/?referer=');">folk IT</a>). Some of the line breaks don&#8217;t quite match up either, but other than that, the text is the same.</p>
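<p>For anyone who would rather script the comparison, here is a minimal sketch in Python using the standard library difflib module. It is not how the comparison above was done (that was a text editor); the two sample strings and the function names are made up for illustration, and normalising curly quotes and line breaks first is just one assumed way of hiding the encoding noise so that any real changes in wording stand out.</p>

```python
import difflib

# Map typographic quote characters to their plain ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
})


def normalise(text):
    """Replace curly quotes and collapse runs of white space and line breaks."""
    return " ".join(text.translate(QUOTE_MAP).split())


def diff_texts(a, b):
    """Return unified-diff lines for two texts, compared sentence by sentence."""
    a_lines = normalise(a).split(". ")
    b_lines = normalise(b).split(". ")
    return list(difflib.unified_diff(a_lines, b_lines, "dfe", "guardian", lineterm=""))


# Hypothetical stand-ins for the two copies of the speech.
dfe_copy = "We need \u201cdigital literacy\u201d.\nIt isn\u2019t optional."
guardian_copy = 'We need "digital literacy". It isn\'t optional.'

# Only quote characters and line breaks differ, so nothing is reported.
print(diff_texts(dfe_copy, guardian_copy))

# An extra sentence in one copy does show up in the diff.
print("\n".join(diff_texts(dfe_copy, guardian_copy + " Really.")))
```

<p>Splitting on a full stop followed by a space is a crude stand-in for sentence detection; it is good enough to show where two near-identical texts part company.</p>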
<p>Now, this may be because Gove was a good little minister and read out the words exactly as they had been prepared. Or it may be the case that the Guardian just reprinted the speech without mentioning provenance, or the disclaimer that he may not actually have read the words of that speech (I have vague memories of an episode of <em>Yes, Minister</em>, here&#8230;;-)</p>
<p>Whatever the case, if you know: a) that it&#8217;s even possible to compare two documents to see if they are different (a handy piece of <em>folk IT</em> knowledge); and b) know a tool that does it (or how to find a tool that does it, or a person that may have a tool that can do it), then you can compare the texts for yourself. And along the way, maybe learn that churnalism, in a variety of forms, is endemic in the media. Or maybe just demonstrate to yourself when the media is acting in a purely comms, rather than journalistic, role?</p>
<p>PS other phrases in the area: &#8220;computational thinking&#8221;. Hear, for example: <a href="http://blog.jonudell.net/2007/06/18/a-conversation-with-jeannette-wing-about-computational-thinking/" onclick="urchinTracker('/outgoing/blog.jonudell.net/2007/06/18/a-conversation-with-jeannette-wing-about-computational-thinking/?referer=');">A conversation with Jeannette Wing about computational thinking</a> </p>
<p>PPS I just remembered &#8211; there&#8217;s a data journalism hook around this story too&#8230; from a tweet exchange last night that I was reminded of by an RT:</p>
<p><tt>josiefraser: RT @grmcall: Of the 28,000 new teachers last year in the UK, 3 had a computer-related degree. Not 3000, just 3.<br />
dlivingstone: @josiefraser Source??? Not found it yet. RT @grmcall: 28000 new UK teachers last year, 3 had a computer-related degree. Not 3000, just 3<br />
josiefraser: That ICT qualification teacher stat RT @grmcall: Source is the Guardian http://www.guardian.co.uk/education/2012/jan/09/computer-studies-in-schools</tt></p>
<p>I did a little digging and found the following document on the General Teaching Council of England website &#8211; <a href="http://www.gtce.org.uk/documents/publicationpdfs/annual_digest_psd110811.pdf" onclick="urchinTracker('/outgoing/www.gtce.org.uk/documents/publicationpdfs/annual_digest_psd110811.pdf?referer=');">Annual digest of statistics 2010–11 &#8211; Profiles of registered teachers in England [PDF]</a> &#8211; that contains demographic stats, amongst others, for UK teachers. But no stats relating to subject areas of degree level qualifications held, which is presumably the data referred to in the tweet. So I&#8217;m thinking: this is partly where the role of data journalist comes in&#8230; They may not be able to verify the numbers by checking independent sources, but they may be able to shed some light on where the numbers came from and how they were arrived at, and maybe even secure their release (albeit as a single point source?)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2012/01/12/different-speeches-digital-skills-arent-just-about-coding/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://itc.conversationsnetwork.org/audio/download/ITC.INNO-JeannetteWing-2007.06-14.mp3" length="13414671" type="audio/mpeg" />
<enclosure url="http://farm8.staticflickr.com/7148/6684098463_d7803bb4b3.jpg" length="" type="" />
		</item>
		<item>
		<title>Finding Common Terms around a Twitter Hashtag</title>
		<link>http://blog.ouseful.info/2011/11/22/finding-common-twrms-around-a-twitter-hashtag/</link>
		<comments>http://blog.ouseful.info/2011/11/22/finding-common-twrms-around-a-twitter-hashtag/#comments</comments>
		<pubDate>Tue, 22 Nov 2011 13:47:33 +0000</pubDate>
		<dc:creator>Tony Hirst</dc:creator>
				<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[Tinkering]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=6494</guid>
		<description><![CDATA[@aendrew sent me a link to a StackExchange question he&#8217;s just raised, in a tweet asking: &#8220;Anyone know how to find what terms surround a Twitter trend/hashtag?&#8221; I&#8217;ve dabbled in this area before, though not addressing this question exactly, using Yahoo Pipes to find what hashtags are being used around a particular search term (Searching [...]]]></description>
			<content:encoded><![CDATA[<p>@aendrew sent me a link to a <a href="http://webapps.stackexchange.com/questions/21227/finding-terms-surrounding-a-trending-hashtag" onclick="urchinTracker('/outgoing/webapps.stackexchange.com/questions/21227/finding-terms-surrounding-a-trending-hashtag?referer=');">StackExchange question</a> he&#8217;s just raised, in a tweet asking: &#8220;Anyone know how to find what terms surround a Twitter trend/hashtag?&#8221;</p>
<p>I&#8217;ve dabbled in this area before, though not addressing this question exactly, using Yahoo Pipes to find what hashtags are being used around a particular search term (<a href="http://blog.ouseful.info/2009/09/23/finding-hashtag-communities/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2009/09/23/finding-hashtag-communities/?referer=');">Searching for Twitter Hashtags and Finding Hashtag Communities</a>) or by members of a particular list (<a href="http://blog.ouseful.info/2009/11/02/whats-happening-now-hashtags-on-twitter-lists/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2009/11/02/whats-happening-now-hashtags-on-twitter-lists/?referer=');">What’s Happening Now: Hashtags on Twitter Lists</a>; that post also links to a pipe that identifies names of people tweeting around a particular search term).</p>
<p>So what would we need a pipe to do that finds terms surrounding a Twitter hashtag?</p>
<p>Firstly, we need to search on the tag to pull back a list of tweets containing that tag. Then we need to split the tweets into atomic elements (i.e. separate words). At this point, it might be useful to count how many times each one occurs, and display the most popular. We might also need to generate a &#8220;stop list&#8221; containing common words we aren&#8217;t really interested in (for example, <em>the</em> or <em>and</em>).</p>
<p>So here&#8217;s a quick hack at a pipe that does just that (<a href="http://pipes.yahoo.com/pipes/pipe.info?_id=bc70b0517a440a21f72ba84627a754d1" onclick="urchinTracker('/outgoing/pipes.yahoo.com/pipes/pipe.info?_id=bc70b0517a440a21f72ba84627a754d1&amp;referer=');">Popular words round a hashtag</a>).</p>
<p>For a start, I&#8217;m going to construct a string tokeniser that just searches for 100 tweets containing a particular search term, and then splits each tweet up into separate words, where words are things that are separated by white space. The pipe output is just a list of all the words from all the tweets that the search returned:</p>
<p><a href="http://pipes.yahoo.com/pipes/pipe.info?_id=9426fd5b7bccbac61f7e7a2e0c3e7544" title="Photo Sharing" onclick="urchinTracker('/outgoing/pipes.yahoo.com/pipes/pipe.info?_id=9426fd5b7bccbac61f7e7a2e0c3e7544&amp;referer=');"><img src="http://farm7.staticflickr.com/6038/6382881075_ac82ab8f96.jpg" width="500" height="361" alt="Twitter string tokeniser" /></a></p>
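<p>The tokenising step can be sketched in a few lines of Python. This is an illustration of the idea rather than a copy of the pipe: the function name and the sample tweets are made up, and fetching the tweets from a search is left out altogether.</p>

```python
def tokenise(tweets):
    """Split each tweet on white space and return one flat list of words."""
    words = []
    for tweet in tweets:
        words.extend(tweet.split())
    return words


# Made-up tweets standing in for a page of search results.
sample_tweets = [
    "Loving the #ddj session on Yahoo Pipes",
    "More #ddj: pipes, feeds and filters",
]
print(tokenise(sample_tweets))
```

<p>Splitting on white space alone leaves punctuation attached to words (so &#8220;#ddj:&#8221; and &#8220;pipes,&#8221; come through as they are), which is worth bearing in mind when counting terms later on.</p>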
<p>You might notice the pipe also allows us to choose which page of results we want&#8230;</p>
<p>We can now use the helper pipe in another pipe. Firstly, let&#8217;s grab the words from a search that returns 200 tweets on the same search term. The helper pipe is called twice, once for the first page of results, once for the second page of results. The wordlists from each search query are then merged by the union block. The Rename block relabels the .content attribute as the .title attribute of each feed item.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6382923941/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6382923941/?referer=');"><img src="http://farm7.staticflickr.com/6235/6382923941_6a2433a56e.jpg" width="500" height="209" alt="Grab 200 tweets and check we have set the title element" /></a></p>
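<p>As a rough Python analogue of the merge-and-relabel step, assuming each feed item is represented as a dictionary: the union and rename functions below are hypothetical stand-ins for the blocks of the same name, and the two page lists are invented sample data.</p>

```python
def union(*wordlists):
    """Concatenate several lists of feed items into one list."""
    merged = []
    for wordlist in wordlists:
        merged.extend(wordlist)
    return merged


def rename(items, old_key, new_key):
    """Return copies of the items with old_key relabelled as new_key."""
    renamed = []
    for item in items:
        item = dict(item)                  # copy, so the input is left alone
        item[new_key] = item.pop(old_key)
        renamed.append(item)
    return renamed


# Hypothetical output of the helper for pages 1 and 2 of the same search.
page1 = [{"content": "pipes"}, {"content": "feeds"}]
page2 = [{"content": "pipes"}]

items = rename(union(page1, page2), "content", "title")
print(items)
```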
<p>The next thing we&#8217;re going to do is identify and count the unique words in the combined wordlist using the Unique block, and then sort the list according to the number of times each word occurs.</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6382932181/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6382932181/?referer=');"><img src="http://farm7.staticflickr.com/6041/6382932181_7afb163e43.jpg" width="500" height="332" alt="Preliminary parsing of a wordlist" /></a></p>
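<p>Counting and sorting is the sort of thing Python does in one line with collections.Counter. The sketch below uses an invented word list; note that it treats &#8220;Pipes&#8221; and &#8220;pipes&#8221; as different words, which may or may not be what you want.</p>

```python
from collections import Counter


def count_words(words):
    """Count each distinct word and list the most frequent first."""
    return Counter(words).most_common()


words = ["pipes", "data", "pipes", "feeds", "data", "pipes"]
print(count_words(words))
```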
<p>The above pipe fragment also filters the wordlist so that only words that contain alphabetic characters and are four or more characters long are allowed through. (The regular expression .{4,} reads: allow any string of four or more ({4,}) characters of any type (.). An expression .{5,7} would say &#8211; allow words through with length 5 to 7 characters.)</p>
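<p>Here is how the same two filters might look with Python&#8217;s re module. The pattern names and sample words are made up for illustration. One caveat that applies to Python, at least: an upper bound such as .{5,7} only limits the length when the whole word has to match, so the sketch shows the difference between searching within a word and matching all of it.</p>

```python
import re

HAS_LETTER = re.compile(r"[A-Za-z]")      # at least one alphabetic character
FOUR_PLUS = re.compile(r".{4,}")          # any four or more characters
FIVE_TO_SEVEN = re.compile(r".{5,7}")     # five to seven characters


def filter_words(words):
    """Keep words that contain a letter and are four or more characters long."""
    return [w for w in words if HAS_LETTER.search(w) and FOUR_PLUS.search(w)]


print(filter_words(["the", "2011", "pipes", "#ddj", "RT"]))

# An upper bound only bites when the whole word has to match.
print(bool(FIVE_TO_SEVEN.search("hashtags")))      # finds a 7 character run inside the word
print(bool(FIVE_TO_SEVEN.fullmatch("hashtags")))   # the whole 8 character word is too long
```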
<p>I&#8217;ve also added a short routine that implements a stop list. The regular expression pattern <strong>(?i)\b(word1|word2|word3)\b</strong> says: ignoring case ((?i)), try to match any of the words word1, word2, word3. (\b denotes a word boundary.) Note that in the filter below, some of the words in my stop list are redundant (the ones with three or fewer characters. Remember, we have already filtered the word list to show only words of length four or more characters.)</p>
<p><a href="http://www.flickr.com/photos/psychemedia/6382948055/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/6382948055/?referer=');"><img src="http://farm7.staticflickr.com/6234/6382948055_1e30fd1902.jpg" width="500" height="116" alt="Stop list" /></a></p>
<p>I also added a user input that allows additional stop terms to be added (they should be pipe (|) separated, with no spaces between them). You can find the pipe <a href="http://pipes.yahoo.com/pipes/pipe.info?_id=bc70b0517a440a21f72ba84627a754d1" onclick="urchinTracker('/outgoing/pipes.yahoo.com/pipes/pipe.info?_id=bc70b0517a440a21f72ba84627a754d1&amp;referer=');">here</a>.</p>
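<p>A stop list built the same way can be sketched in Python as follows. The default stop words here are invented (they are not the list used in the pipe), the function names are hypothetical, and the extra terms are passed in as a pipe-separated string to mirror the user input described above.</p>

```python
import re

# Illustrative defaults only; not the stop list from the pipe itself.
DEFAULT_STOP_WORDS = ["the", "and", "with", "that", "this", "from", "have"]


def build_stop_pattern(extra_terms=""):
    """Compile a case-insensitive, word-bounded pattern of stop words.

    extra_terms is a pipe-separated string with no spaces, e.g. "http|just".
    """
    words = list(DEFAULT_STOP_WORDS)
    words.extend(term for term in extra_terms.split("|") if term)
    alternatives = "|".join(re.escape(word) for word in words)
    return re.compile(r"(?i)\b(" + alternatives + r")\b")


def remove_stop_words(words, extra_terms=""):
    """Drop every word in which the stop-list pattern finds a match."""
    pattern = build_stop_pattern(extra_terms)
    return [w for w in words if not pattern.search(w)]


print(remove_stop_words(["This", "pipes", "from", "hashtag", "just"], "just"))
```

<p>Because the pattern is searched for within each word, a hyphenated term such as &#8220;with-it&#8221; would be dropped too; matching the whole word instead would be the stricter choice.</p>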
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/11/22/finding-common-twrms-around-a-twitter-hashtag/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://farm7.staticflickr.com/6038/6382881075_ac82ab8f96.jpg" length="" type="" />
<enclosure url="http://farm7.staticflickr.com/6235/6382923941_6a2433a56e.jpg" length="" type="" />
<enclosure url="http://farm7.staticflickr.com/6041/6382932181_7afb163e43.jpg" length="" type="" />
<enclosure url="http://farm7.staticflickr.com/6234/6382948055_1e30fd1902.jpg" length="" type="" />
		</item>
		<item>
		<title>Has investigative journalism found its feet online? (part 3)</title>
		<link>http://onlinejournalismblog.com/2011/08/25/has-investigative-journalism-found-its-feet-online-part-3/</link>
		<comments>http://onlinejournalismblog.com/2011/08/25/has-investigative-journalism-found-its-feet-online-part-3/#comments</comments>
		<pubDate>Thu, 25 Aug 2011 06:53:24 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[online journalism]]></category>
		<category><![CDATA[help me investigate]]></category>
		<category><![CDATA[investigative journalism]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=15100</guid>
		<description><![CDATA[Previously this serialised chapter for the forthcoming book Investigative Journalism: Dead or Alive? looked at new business models surrounding investigative journalism and online investigative journalism as a genre. This third and final part looks at how changing supplies of information change the context within which investigative journalism operates. What next for investigative journalism in a world of information overload? But this]]></description>
			<content:encoded><![CDATA[
<p><em>Previously this serialised chapter for the <a href="http://www.arimapublishing.co.uk/bookshopuk/bookinfo/book_184549490" onclick="urchinTracker('/outgoing/www.arimapublishing.co.uk/bookshopuk/bookinfo/book_184549490?referer=');">forthcoming book Investigative Journalism: Dead or Alive?</a> looked at <a href="http://onlinejournalismblog.com/2011/08/23/has-investigative-journalism-found-its-feet-online-part-1/" onclick="urchinTracker('/outgoing/onlinejournalismblog.com/2011/08/23/has-investigative-journalism-found-its-feet-online-part-1/?referer=');">new business models surrounding investigative journalism</a> and <a href="http://onlinejournalismblog.com/2011/08/24/has-investigative-journalism-found-its-feet-online-part-2/" onclick="urchinTracker('/outgoing/onlinejournalismblog.com/2011/08/24/has-investigative-journalism-found-its-feet-online-part-2/?referer=');">online investigative journalism as a genre</a>. This third and final part looks at how changing supplies of information change the context within which investigative journalism operates.</em></p>
<h2>What next for investigative journalism in a world of information overload?</h2>
<p>But this identity crisis does highlight a final, important, question to be asked: in a world where users have direct access to a wealth of information themselves, what is investigative journalism for? I would argue that it comes down to the concept of “uncovering the hidden”, and in exploring this it is useful to draw an analogy with the general journalistic idea of “reporting the new”.</p>
<p>Trainee journalists sometimes see “new” in limited terms – as simply what is happening today. But what is “new” is not limited to that. It can also be what is happening tomorrow, or what happened 30 years ago. It can be something that someone has said about an “old story” days later, or an emerging anger about something that was never seen as “newsworthy” to begin with. The talent of the journalist is to be able to spot that “newness”, and communicate it effectively.</p>
<p>Journalism typically becomes investigative when that newness involves uncovering the hidden – and that can be anything that our audience couldn’t see before: a victim’s story, a buried report, 250,000 cables accessible to 2.5 million people, or even information that is publicly available but has not been connected before. (“The hidden”, like “the new”, is of course a subjective quality, dependent on the talent of a particular journalist for finding something in it – or a way of seeing it – that is newsworthy.)<span id="more-15100"></span></p>
<p>So what if all of the investigative journalist’s material was public: documents, sources (witnesses, experts, victims, actors in the story), and information? The role of the investigative journalist would perhaps be as follows:</p>
<ul>
<li>to make the “hidden” (to their audience) “visible”;</li>
<li>to hold power to account;</li>
<li>to make connections;</li>
<li>to verify;</li>
<li>to test hypotheses &#8211; the why and how of journalism.</li>
</ul>
<p>This doesn’t sound very different to how we see the role now. Of course, in reality, all of the investigative journalist’s material will most likely not be online, so if we leave that thought experiment behind we can add other roles to acknowledge this:</p>
<ul>
<li>to make the invisible visible (i.e. digitising offline material, from paper documents and witness accounts to the “invisible web” of databases);</li>
<li>to make the disconnected connected: publishing information in such a way that others can make further connections with other sources of data;</li>
<li>to identify gaps in information – and fill them.</li>
</ul>
<p>These are all, in fact, “making the hidden visible” in another form, whether they fill those gaps with material that is in the public domain or which only exists in a single witness’s diary.</p>
<h2>Narrative and authority</h2>
<p>The role of a journalist in creating a narrative comes through strongly here: hypotheses are about narratives; making connections is about making narratives. Narratives are important – they help people find their place in a story &#8211; but online investigations can have multiple narratives and different users can find different entry points across those.</p>
<p>The other role that comes through strongly is institutional: holding power to account involves (but does not require) being in a position of power to do so; verification involves (but does not require) the stamp of institutional “due process”.</p>
<p>My own experience with Help Me Investigate suggests that these two roles remain important bases for journalism as a profession: in crowdsourced journalism, “writing the story up” did not particularly appeal to non-journalists (the story was in their minds already) – only journalists wanted to do that. And it took an established media outlet to get official reaction.</p>
<p>This is not to suggest that only journalists can “have impact” as was mentioned at the conference – there are plenty of examples of groundswells of opinion online instigating media coverage: Memogate is perhaps the best known example. But this does not mean we need journalists so much as it means that we need publishers and broadcasters. There is a difference.</p>
<h2>No excuses</h2>
<p>So what does this mean for future investigative journalists online? Firstly, we may have to accept that many parts of investigative journalism will lose their air of mystery: from gathering information to publishing and distributing it, there are now dozens of new opportunities for the aspiring investigator: FoI tools such as WhatDoTheyKnow; free data gathering and interrogation tools such as Yahoo! Pipes and OutWit Hub; leak-hosting sites; and tools to combine and clean data such as Google Refine and SQLite. I see students now able to do work that would baffle many full-time journalists.</p>
<p>That’s no bad thing: distinguishing investigative journalism from other types of reporting was always problematic: “All journalism should be investigative” is a near-cliché because it goes to the heart of what we should be doing as journalists. Now we have the opportunity to act on that sentiment.</p>
<p>Journalists will be – and already are – more collaborative, learning how to work with and across networks. The internet has made it possible to separate the “investigative” from the “journalism”: students, bloggers, activists, and anyone else with a burning question can begin to investigate it. They can raise questions openly with thousands of others online, submit FoI requests at the click of a button or analyse datasets and documents with free tools, regardless of whether or not they are employed as a journalist. The vast majority do not want to be a journalist. What they want are answers.</p>
<p>Their efforts can have value regardless of their job title. The role of investigative journalism – online, and then in print and broadcast – will increasingly be to build on their work: to make it visible; to verify it; to connect it to other information; to hold power to account over it. (If that threatens your <a href="http://en.wikipedia.org/wiki/Romanticism" onclick="urchinTracker('/outgoing/en.wikipedia.org/wiki/Romanticism?referer=');">romantic idea of the “individual genius”</a> and makes you feel somehow less important, then you can take comfort in calling those people mere “sources” and yourself a “proper journalist”. It doesn’t matter what you call it as long as journalism gets done – although you may alienate potential sources by doing so in public.)</p>
<p>Secondly, journalists will need to carefully judge how and when to tell different parts of their story. The medium and channel of presentation is one new element of judgement &#8211; but they will also have to balance publicness against privateness at every stage, and judge how either might improve or speed up their work, and increase its impact and reach. There are no easy answers to these questions.</p>
<p>Finally, aspiring investigative journalists will no longer wait for a job title – or even a job – to begin investigating. At the conference that inspired this book two journalism students were asked how they saw the problems facing investigative journalism. The first felt that institutional restrictions on time or money should not be an excuse for journalists failing to investigate important questions in their own time; the second felt that people no longer needed institutional validation to investigate something: they could publish on a blog and build an audience that way.</p>
<p>These are hugely encouraging sentiments for anyone who worries about the state of modern journalism. Of course, some people will always look for excuses, but thanks to the access to information, documents, sources, collaborators and tools that the internet presents, we have fewer excuses to make. Not only that, journalists and aspiring journalists have the opportunity of a generation: to define the shape of investigative journalism to come.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/08/25/has-investigative-journalism-found-its-feet-online-part-3/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How I hacked my journalism workflow (#jcarn)</title>
		<link>http://onlinejournalismblog.com/2011/06/13/how-i-hacked-my-journalism-workflow-jcarn/</link>
		<comments>http://onlinejournalismblog.com/2011/06/13/how-i-hacked-my-journalism-workflow-jcarn/#comments</comments>
		<pubDate>Mon, 13 Jun 2011 19:30:29 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[faq]]></category>
		<category><![CDATA[mobile journalism]]></category>
		<category><![CDATA[#jcarn]]></category>
		<category><![CDATA[automation]]></category>
		<category><![CDATA[automator]]></category>
		<category><![CDATA[Carnival of Journalism]]></category>
		<category><![CDATA[chrome]]></category>
		<category><![CDATA[delicious]]></category>
		<category><![CDATA[easy youtube downloader]]></category>
		<category><![CDATA[errorzilla]]></category>
		<category><![CDATA[firefox]]></category>
		<category><![CDATA[ifttt]]></category>
		<category><![CDATA[imacros]]></category>
		<category><![CDATA[packrati.us]]></category>
		<category><![CDATA[plugins]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[shortcuts]]></category>
		<category><![CDATA[tineye]]></category>
		<category><![CDATA[transpose]]></category>
		<category><![CDATA[vlookup]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=14722</guid>
		<description><![CDATA[I&#8217;ve been meaning to write a post for some time breaking down all the habits and hacks I&#8217;ve acquired over the years &#8211; so this month&#8217;s Carnival of Journalism question on &#8216;Hacking your journalism workflow&#8217; gave me the perfect nudge. Picking those habits apart is akin to an act of archaeology. What might on the surface look very complicated is simply<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/06/13/how-i-hacked-my-journalism-workflow-jcarn/" onclick="urchinTracker('/outgoing/onlinejournalismblog.com/2011/06/13/how-i-hacked-my-journalism-workflow-jcarn/?referer=');">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;ve been meaning to write a post for some time breaking down all the habits and hacks I&#8217;ve acquired over the years &#8211; so <a title="http://carnivalofjournalism.com/2011/05/11/june-carnival-of-journalism/" rel="nofollow" href="http://carnivalofjournalism.com/2011/05/11/june-carnival-of-journalism/" target="_blank" onclick="urchinTracker('/outgoing/carnivalofjournalism.com/2011/05/11/june-carnival-of-journalism/?referer=');">this month&#8217;s Carnival of Journalism question</a> on &#8216;Hacking your journalism workflow&#8217; gave me the perfect nudge.</p>
<p>Picking those habits apart is akin to an act of archaeology. What might on the surface look very complicated is simply the accumulation of small acts over several years. Those acts range from the habits themselves to creating simple shortcuts and automated systems, and learning from experience. So that&#8217;s how I&#8217;ve broken it down:</p>
<h2>1. Shortcuts</h2>
<p>Shortcuts are such a basic part of my way of working that it&#8217;s easy to forget they&#8217;re there: bookmarks in the browser bar, for example. Or using the Chrome browser because its address bar also acts as a search bar for previous pages.</p>
<p>I realise I use Twitter lists as a shortcut of sorts &#8211; to zoom in on particular groups of people I&#8217;m interested in at a particular time, such as experts in a particular area, or a group of people I&#8217;m working with. Likewise, I use folders in Google Reader to periodically check on a particular field &#8211; such as data journalism &#8211; or group &#8211; such as UK journalists.<span id="more-14722"></span></p>
<p>Getting more specific, when it comes to data journalism tasks I rely on a whole range of tools and shortcuts for cleaning and interrogating datasets: the =TRANSPOSE formula, for example, will swap a spreadsheet&#8217;s rows and columns; =VLOOKUP will copy across data from matching cells; and the free tool Google Refine will quickly identify similar entries (which may have been misspelled).</p>
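<p>For anyone who prefers to see those two spreadsheet shortcuts spelled out, here is a minimal Python sketch of what =TRANSPOSE and an exact-match =VLOOKUP do. The sample table and the function names are invented for illustration; this is not how any spreadsheet implements them.</p>

```python
# Hypothetical sample data; the labels and figures are invented for illustration.
rows = [
    ["Year", "2009", "2010"],
    ["Applicants", "120", "150"],
]


def transpose(table):
    """Swap rows and columns, as =TRANSPOSE does for a rectangular range."""
    return [list(column) for column in zip(*table)]


def vlookup(key, table, col_index):
    """Return the cell in column col_index (1-based) of the first row whose
    first cell equals key, or None when no row matches (exact match only)."""
    for row in table:
        if row[0] == key:
            return row[col_index - 1]
    return None


flipped = transpose(rows)            # each year now sits on its own row
applicants_2010 = vlookup("2010", flipped, 2)
```

<p>After transposing, each year has its own row, so the lookup can use the year as its key.</p>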
<p>On my desktop I rely on plugins for Firefox and Chrome such as Firebug (check a page&#8217;s HTML), OutWit Hub (scrape a page), TinEye (check if an image has been used elsewhere), ErrorZilla (check for cached and older versions of a webpage), and Easy YouTube Downloader (download YouTube videos). Links to these and other useful plugins can be found at <a rel="nofollow" href="http://delicious.com/paulb/firefox" target="_blank" onclick="urchinTracker('/outgoing/delicious.com/paulb/firefox?referer=');">http://delicious.com/paulb/firefox</a></p>
<p>But the most frequently used shortcuts are the bookmarklets that are installed on my mobile phone browser &#8211; &#8216;Read Later&#8217; (Instapaper); &#8216;Bookmark on Delicious&#8217;; &#8216;Tweet with Echofon&#8217;; &#8216;Save on Springpad&#8217; or Evernote; and &#8216;Blog on Tumblr&#8217;. These are made even more powerful through automation.</p>
<h2>2. Automation</h2>
<p>RSS can be a hugely useful technology when it comes to saving time and automating processes &#8211; and Delicious is the king of useful RSS feeds in this respect.</p>
<p>If I want to tweet a useful link as well as bookmark it, for example, I simply add the tag &#8216;t&#8217; &#8211; the RSS feed for which is automatically tweeted to my account by Twitterfeed. If I want to tweet it using the @helpmeinvestig8 account I add the tag &#8216;hmitwt&#8217;. Webpages which I think might be useful to students on the MA in Television and Interactive Content I tag &#8216;tvi&#8217; &#8211; this not only sends them to the @bcumedia_matvic account but also to an email newsletter that students receive (I use Feedburner for this). If I wanted to I could set up a Tumblr blog to automatically pull items from the RSS feed for a particular tag, too. And all of this is triggered by one click, and one tag.</p>
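<p>The one-tag-triggers-everything idea can be sketched as a simple lookup table. This is only an illustration of the routing logic: the destination labels are descriptive strings, not real Twitterfeed or Feedburner settings, and the function name is made up.</p>

```python
# Tag-to-destination rules taken from the paragraph above. The destination
# labels are plain descriptions, not real service endpoints.
ROUTES = {
    "t": ["tweet from personal account"],
    "hmitwt": ["tweet from @helpmeinvestig8"],
    "tvi": ["tweet from @bcumedia_matvic", "student email newsletter"],
}


def destinations(tags):
    """Return every destination triggered by a bookmark's tags, in order,
    without repeating a destination that two tags share."""
    found = []
    for tag in tags:
        for destination in ROUTES.get(tag, []):
            if destination not in found:
                found.append(destination)
    return found


# A bookmark tagged 'tvi' plus an ordinary descriptive tag.
triggered = destinations(["tvi", "journalism"])
```

<p>Tags with no rule attached, such as ordinary descriptive tags, simply trigger nothing.</p>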
<p>The process works the other way: Packrati.us will bookmark any link you tweet in your Delicious account. And Trunk.ly automatically archives both your Delicious bookmarks and tweeted links, providing a backup search engine.</p>
<p>IFTTT (IF This Then That) is a new service which promises some amazing possibilities for automating processes between (currently) 32 different services, including Delicious, Google Reader, stock performances, times and dates, emails, phone calls and any RSS feed. I&#8217;ve been using it to bookmark anything I share on Google Reader, but I&#8217;m on the lookout for other uses.</p>
<p>For other tasks the Firefox plugin iMacros can automate web-based actions so you don&#8217;t have to repeat them, while Automator on the Mac will do the same for computer-based actions. For links to these and IFTTT see <a rel="nofollow" href="http://www.delicious.com/paulb/automation+tools" target="_blank" onclick="urchinTracker('/outgoing/www.delicious.com/paulb/automation+tools?referer=');">http://www.delicious.com/paulb/automation+tools</a></p>
<h2>3. Habits</h2>
<p>For all the above it is ultimately up to you to set balls in motion, and here I think establishing habits is key. In particular, bookmarking is one habit that I find saves me more time than anything else.</p>
<p>Every morning I check my RSS feeds and bookmark items I think may be useful in future. Bookmarking and tagging them builds a resource that I can look to whenever I need to solve a problem, help someone, or write something quickly. So if I decide to write something on data visualisation, I already have an archive of pre-filtered material to refer to. If I need data on health, I already have several health datasets that I&#8217;ve bookmarked and tagged. And if I have a Yahoo! Pipes-related problem, I can check my bookmarks first.</p>
<p>Delicious is the main place that I do this &#8211; but it&#8217;s no longer the only one. My Tumblr blog is essentially a place where I bookmark multimedia and quotes &#8211; so if I need some multimedia or a choice quote, that&#8217;s where I look first.</p>
<p>And blogging itself is a great habit to have: it makes me remember things better, provides a space where I can re-find them, and helps me (or others) identify gaps.</p>
<h2>4. Discipline</h2>
<p>The final journalism hack is my most recent one &#8211; and I think something that more and more online journalists are learning too as they hit information fatigue. It&#8217;s self-discipline.</p>
<p>With so many sources of information, so many things to tweet, blog and bookmark, it&#8217;s easy to lose a morning in following links, tweets and feeds, and replying to emails. Having a clear idea of what you need to achieve on a particular day, and sometimes switching off other signals in order to complete it, is a hard skill to build &#8211; but an important one.</p>
<p>And so I try to only check email three times per day (start, midday and end). At the end of the day emails that require more time to respond go into my &#8216;Starred items&#8217;, and I check those and respond if I can first thing the next day.</p>
<p>I set limits on the time I spend checking RSS feeds, and on the number of blog posts I write.</p>
<p>I email longer webpages, reports and documents to my Kindle address to be read when I&#8217;m travelling.</p>
<p>I use the Springpad app to create &#8216;To Do&#8217; items that I schedule for future days, taking them out of my head so I can focus on the here and now. And at the start of every day I go through these so that nothing is missed.</p>
<p>Then, I make time to switch off, to remove the phone from my hand, the laptop from my desk (it is set to switch itself off at a particular time every night), and sleep.</p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/06/13/how-i-hacked-my-journalism-workflow-jcarn/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Merging Datasets with Common Columns in Google Refine</title>
		<link>http://blog.ouseful.info/2011/05/06/merging-datesets-with-common-columns-in-google-refine/</link>
		<comments>http://blog.ouseful.info/2011/05/06/merging-datesets-with-common-columns-in-google-refine/#comments</comments>
		<pubDate>Fri, 06 May 2011 12:44:32 +0000</pubDate>
		<dc:creator>tonyhirst</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[tony hirst]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=5396</guid>
		<description><![CDATA[It&#8217;s an often encountered situation, but one that can be a pain to address &#8211; merging data from two sources around a common column. Here&#8217;s a way of doing it in Google Refine&#8230; Here are a couple of example datasets to import into separate Google Refine projects if you want to play along, both courtesy [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=5396&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s an often encountered situation, but one that can be a pain to address &#8211; merging data from two sources around a common column. Here&#8217;s a way of doing it in Google Refine&#8230;</p>
<p>Here are a couple of example datasets to import into separate Google Refine projects if you want to play along, both courtesy of the <a href="http://www.guardian.co.uk/news/datablog" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog?referer=');">Guardian data blog</a> (pulled through the Google Spreadsheets to Yahoo pipes proxy <a href="http://blog.ouseful.info/2011/05/04/fragments-gluing-different-data-sources-together-with-google-refine/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/05/04/fragments-gluing-different-data-sources-together-with-google-refine/?referer=');">mentioned here</a>):</p>
<p>- <a href="http://www.guardian.co.uk/news/datablog/2011/mar/25/higher-education-universityfunding" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2011/mar/25/higher-education-universityfunding?referer=');">University fees data</a> (<a href="http://pipes.yahoo.com/pipes/pipe.run?_id=4562a5ec2631ce242ebd25a0756d6381&amp;_render=csv&amp;key=0AonYZs4MzlZbdHVwQlVnd0ZxQkRmQjQ4NzhFSzJ2VVE&amp;q=select+A,B,E" onclick="urchinTracker('/outgoing/pipes.yahoo.com/pipes/pipe.run?_id=4562a5ec2631ce242ebd25a0756d6381_amp_render=csv_amp_key=0AonYZs4MzlZbdHVwQlVnd0ZxQkRmQjQ4NzhFSzJ2VVE_amp_q=select+A_B_E&amp;referer=');">CSV via pipes proxy</a>)</p>
<p>- <a href="http://www.guardian.co.uk/education/datablog/2010/jun/15/university-tables-spreadsheet" onclick="urchinTracker('/outgoing/www.guardian.co.uk/education/datablog/2010/jun/15/university-tables-spreadsheet?referer=');">University HESA stats, 2010</a> (<a href="http://pipes.yahoo.com/pipes/pipe.run?_id=4562a5ec2631ce242ebd25a0756d6381&amp;_render=csv&amp;key=0AonYZs4MzlZbdHB4cHd0eWlZWndDTW93bDNnTmFJS1E&amp;q=select+C,D,E" onclick="urchinTracker('/outgoing/pipes.yahoo.com/pipes/pipe.run?_id=4562a5ec2631ce242ebd25a0756d6381_amp_render=csv_amp_key=0AonYZs4MzlZbdHB4cHd0eWlZWndDTW93bDNnTmFJS1E_amp_q=select+C_D_E&amp;referer=');">CSV via pipes proxy</a>)</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-merge-test.png?w=700&#038;h=462" alt="" width="700" height="462" class="alignnone size-full wp-image-5397" /></p>
<p>We can now merge data from the two projects by creating a new column in one project from the values in an existing column, which are used to index into a similar column in the other project. Looking at the two datasets, both HESA Code and institution/University look like candidates for merging the data. Which should we go with? <strong>I&#8217;d go with the unique identifier (i.e. HESA code in this case) every time&#8230;</strong></p>
<p>First, create a new column:</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-merging-data-from-two-projects-step-1.png?w=700&#038;h=229" alt="" width="700" height="229" class="alignnone size-full wp-image-5398" /></p>
<p>Now do the merge, using the <a href="http://code.google.com/p/google-refine/wiki/GRELOtherFunctions" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/wiki/GRELOtherFunctions?referer=');"><em>cell.cross</em> GREL (Google Refine Expression Language) command</a>. Trivially, and pinching wholesale from the documentation example, we might use the following command to bring in <em>Average Teaching Score</em> data from the second project into the first:</p>
<p><tt>cell.cross("Merge Test B", "HESA code").cells["Average Teaching Score"].value[0]</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2011/05/google-refine-merging-two-projects-step-2.png" onclick="urchinTracker('/outgoing/ouseful.files.wordpress.com/2011/05/google-refine-merging-two-projects-step-2.png?referer=');"><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-merging-two-projects-step-2.png?w=700&#038;h=399" alt="" width="700" height="399" class="alignnone size-full wp-image-5399" /></a></p>
<p>Note that there is a <em>null</em> entry and an error entry. It&#8217;s possible to add a bit of logic to tidy things up a little:</p>
<p><tt>if (value!='null',cell.cross("Merge Test B", "HESA code").cells["Average Teaching Score"].value[0],'')</tt></p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-refined-project-data-merge.png?w=694&#038;h=503" alt="" width="694" height="503" class="alignnone size-full wp-image-5400" /></p>
<p>Here&#8217;s the result:</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-merged-project-data.png?w=700&#038;h=271" alt="" width="700" height="271" class="alignnone size-full wp-image-5401" /></p>
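<p>If it helps to see the logic outside Refine, here is a rough Python analogue of the two expressions above: look up each row&#8217;s key in the other table, take the first match, and fall back to an empty string when the key is blank or nothing matches. The rows, scores and function names are invented for illustration, and this is a sketch of the idea rather than how <em>cell.cross</em> is implemented.</p>

```python
# Hypothetical stand-ins for the two projects; every value below is invented.
merge_test_a = [
    {"HESA code": "0001", "University": "Example University"},
    {"HESA code": "0002", "University": "Sample College"},
    {"HESA code": "", "University": "No Code Institute"},
]
merge_test_b = [
    {"HESA code": "0001", "Average Teaching Score": "71.5"},
]


def cross(value, other_rows, key_column):
    """Return the rows of other_rows whose key_column equals value."""
    return [row for row in other_rows if row[key_column] == value]


def merge_column(rows, other_rows, key_column, wanted_column):
    """Add wanted_column to each row from its first match in other_rows;
    use '' when the key is blank or nothing matches."""
    merged = []
    for row in rows:
        key = row[key_column]
        matches = cross(key, other_rows, key_column) if key else []
        value = matches[0][wanted_column] if matches else ""
        merged.append({**row, wanted_column: value})
    return merged


merged = merge_column(
    merge_test_a, merge_test_b, "HESA code", "Average Teaching Score"
)
scores = [row["Average Teaching Score"] for row in merged]
```

<p>Only the first row finds a partner here; the other two end up with an empty cell rather than an error.</p>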
<p><strong>Coping with not quite matching <em>key</em> columns</strong></p>
<p>Another situation that often arises is that you have two columns that almost but don&#8217;t quite match. For example, this dataset has a different name representation than the above datasets (<a href="http://pipes.yahoo.com/pipes/pipe.run?_id=4562a5ec2631ce242ebd25a0756d6381&amp;_render=csv&amp;gid=4&amp;key=toY6cDW4YyEF3h7xuCISlNw&amp;q=select+A,B,C" onclick="urchinTracker('/outgoing/pipes.yahoo.com/pipes/pipe.run?_id=4562a5ec2631ce242ebd25a0756d6381_amp_render=csv_amp_gid=4_amp_key=toY6cDW4YyEF3h7xuCISlNw_amp_q=select+A_B_C&amp;referer=');">Merge Test C</a>):</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-different-keys.png?w=700&#038;h=303" alt="" width="700" height="303" class="alignnone size-full wp-image-5402" /></p>
<p>There are several text processing tools that we can use to try to help us match columns that differ in well-structured ways:</p>
<p><a href="http://ouseful.files.wordpress.com/2011/05/google-refine-trying-to-make-strings-matchable1.png" onclick="urchinTracker('/outgoing/ouseful.files.wordpress.com/2011/05/google-refine-trying-to-make-strings-matchable1.png?referer=');"><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-trying-to-make-strings-matchable1.png?w=700&#038;h=502" alt="" width="700" height="502" class="alignnone size-full wp-image-5404" /></a></p>
<p>In the above case, where I am creating a new column based on the contents of the <em>Institution</em> column in <em>Merge Test C</em>, I&#8217;m using a couple of string processing tricks&#8230; The GREL expression may look complicated, but if you build it up in a stepwise fashion it makes more sense.</p>
<p>For example, the command <tt>replace(value,"this", "that")</tt> will replace occurrences of &#8220;this&#8221; in the string defined by <em>value</em> with &#8220;that&#8221;. If we replace &#8220;this&#8221; with an empty string (<tt>''</tt>, two single quotes next to each other, or <tt>""</tt>, two double quotes next to each other), we delete it from <em>value</em>: <tt>replace(value,"this", "")</tt></p>
<p>The result of this operation can be embedded in another <em>replace</em> statement: <tt>replace(replace(value,"this", "that"),"that","the other")</tt>. In this case, the first replace will replace occurrences of &#8220;this&#8221; with &#8220;that&#8221;; the result of this operation is passed to the second (outer) <em>replace</em> function, which replaces &#8220;that&#8221; with &#8220;the other&#8221;. Try building up the expression in real time, and see what happens. First use:<br />
<tt>toLowercase(value)</tt><br />
(what happens?); then:<br />
<tt>replace(toLowercase(value),'the','')</tt><br />
and then:<br />
<tt>replace(replace(toLowercase(value),'the',''),'of','')</tt></p>
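<p>The same stepwise build-up can be tried in Python, using <tt>str.lower</tt> and <tt>str.replace</tt> as stand-ins for the GREL functions (an assumption on my part that they behave alike for plain strings). The second example is a made-up name, included to show why deleting short fragments needs care.</p>

```python
value = "The University of Aberdeen"

# Build the expression up one layer at a time, innermost call first.
step1 = value.lower()             # 'the university of aberdeen'
step2 = step1.replace("the", "")  # ' university of aberdeen'
step3 = step2.replace("of", "")   # ' university  aberdeen' (note the double space)

# Plain substring replacement also hits fragments inside longer words.
# 'Northern College' is an invented example: 'northern' contains 'the'.
mangled = "Northern College".lower().replace("the", "")
```

<p>The leftover spaces in <tt>step3</tt> do not matter once the words are split apart and re-ordered, but <tt>mangled</tt> comes out as <tt>norrn college</tt>, which is worth checking for before relying on a derived key.</p>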
<p>The <em>fingerprint()</em> function then separates out the individual words that are left, orders them, and returns the result (<a href="http://code.google.com/p/google-refine/wiki/ClusteringInDepth" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/wiki/ClusteringInDepth?referer=');">more detail</a>). Can you see how this might be used to transform a column that originally contains &#8220;The University of Aberdeen&#8221; to &#8220;aberdeen university&#8221;, which might be a key in another project dataset?</p>
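<p>To make that concrete, here is a simplified Python approximation of a fingerprint-style key: trim, lowercase, drop punctuation, then sort and de-duplicate the words. It is a sketch of the general idea only; Refine&#8217;s own function does more (for example, normalising accented characters), and the function below is my stand-in, not its source.</p>

```python
import string


def fingerprint(value):
    """Simplified fingerprint key: trim, lowercase, strip punctuation,
    then join the sorted, de-duplicated words with single spaces."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


# Delete 'the' and 'of' first, as in the expression above, then fingerprint.
stripped = "The University of Aberdeen".lower().replace("the", "").replace("of", "")
key = fingerprint(stripped)

# Word order and punctuation no longer matter once both sides are keyed.
same_key = fingerprint("University, Aberdeen")
```

<p>Both values end up as <tt>aberdeen university</tt>, so two differently written names can meet on the same key.</p>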
<p>When trying to reconcile data across two different datasets, you may find you need to try to minimise the distance between almost common key columns by creating new columns in each dataset using the above sorts of technique.</p>
<p>Be careful not to create false positive matches though; and also be mindful that not everything will necessarily match up: you may get empty cells when using <em>cell.cross</em> (to mitigate this, filter rows using a crossed column to find ones where there was no match and see if you can correct them by hand). Even if you don&#8217;t completely succeed in crossing data from one project to another, you might manage to automate the crossing of most of the rows, minimising the amount of hand-crafted copying you might have to do to tidy up the real odds and ends&#8230;</p>
<p>So for example, here&#8217;s what I ended up using to create a &#8220;Pure key&#8221; column in <em>Merge Test C</em>:<br />
<tt>fingerprint(replace(replace(replace(toLowercase(value),'the',''),'of',''),'university',''))</tt></p>
<p>And in <em>Merge Test A</em> I create a &#8220;Complementary Key&#8221; column from the <em>University</em> column using <tt>fingerprint(value)</tt></p>
<p>From the <em>Complementary Key</em> column in <em>Merge Test A</em> we call out to <em>Merge Test C</em>: <tt>cell.cross("Merge Test C", "Pure key").cells["UCAS ID"].value[0]</tt></p>
<p><a href="http://ouseful.files.wordpress.com/2011/05/google-refine-clunky-string-match-merge.png" onclick="urchinTracker('/outgoing/ouseful.files.wordpress.com/2011/05/google-refine-clunky-string-match-merge.png?referer=');"><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-clunky-string-match-merge.png?w=700&#038;h=490" alt="" width="700" height="490" class="alignnone size-full wp-image-5405" /></a></p>
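<p>Here is a rough Python rehearsal of the same key-then-cross routine, which also collects the rows that found no partner so they can be fixed by hand. All the names and UCAS IDs are invented, <tt>fingerprint</tt> is a simplified stand-in for the Refine function, and <tt>pure_key</tt> is a hypothetical helper that mirrors the &#8220;Pure key&#8221; expression.</p>

```python
import string


def fingerprint(value):
    """Rough stand-in for Refine's fingerprint(): lowercase, drop
    punctuation, then sort and de-duplicate the words."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def pure_key(institution):
    """Mirror the 'Pure key' expression: delete 'the', 'of' and
    'university' as plain substrings, then fingerprint what is left."""
    text = institution.lower()
    for fragment in ("the", "of", "university"):
        text = text.replace(fragment, "")
    return fingerprint(text)


# Hypothetical rows; the names and UCAS IDs are invented for illustration.
merge_test_a = ["Aberdeen", "Bath Spa", "Example Institute"]
merge_test_c = {
    "The University of Aberdeen": "X01",
    "Bath Spa University": "X02",
}

ids_by_key = {pure_key(name): ucas_id for name, ucas_id in merge_test_c.items()}

matched = {}
unmatched = []
for university in merge_test_a:
    ucas_id = ids_by_key.get(fingerprint(university))
    if ucas_id is None:
        unmatched.append(university)  # candidates for correcting by hand
    else:
        matched[university] = ucas_id
```

<p>Anything left in <tt>unmatched</tt> is the list of odds and ends to tidy up manually.</p>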
<p>Obviously, this approach is far from ideal (there may be more &#8220;correct&#8221; and/or efficient ways of doing this!) and the process described above is admittedly rather clunky. <em>But</em> it does start to reveal some of what&#8217;s involved in bringing data across to one Google Refine project from another using columns that don&#8217;t quite match in the original datasets, although they do (nominally) refer to the same thing. It also provides a useful introductory exercise in some of the really quite powerful text processing commands in Google Refine&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/05/06/merging-datesets-with-common-columns-in-google-refine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-merge-test.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-clunky-string-match-merge.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-different-keys.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-merged-project-data.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-trying-to-make-strings-matchable1.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-refined-project-data-merge.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-merging-two-projects-step-2.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-merging-data-from-two-projects-step-1.png" length="" type="" />
		</item>
		<item>
		<title>Fragments: Glueing Different Data Sources Together With Google Refine</title>
		<link>http://blog.ouseful.info/2011/05/04/fragments-gluing-different-data-sources-together-with-google-refine/</link>
		<comments>http://blog.ouseful.info/2011/05/04/fragments-gluing-different-data-sources-together-with-google-refine/#comments</comments>
		<pubDate>Wed, 04 May 2011 12:13:31 +0000</pubDate>
		<dc:creator>tonyhirst</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[google refine]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[tony hirst]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=5377</guid>
		<description><![CDATA[I&#8217;m working on a new pattern using Google Refine as the hub for a data fusion experiment pulling together data from different sources. I&#8217;m not sure how it&#8217;ll play out in the end, but here are some fragments&#8230;. Grab Data into Google Refine as CSV from a URL (Proxied Google Spreadsheet Query via Yahoo Pipes) [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=5377&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m working on a new pattern using Google Refine as the hub for a data fusion experiment pulling together data from different sources. I&#8217;m not sure how it&#8217;ll play out in the end, but here are some fragments&#8230;.</p>
<p><strong>Grab Data into Google Refine as CSV from a URL (Proxied Google Spreadsheet Query via Yahoo Pipes)</strong></p>
<p>Firstly, getting data into Google Refine&#8230; I had hoped to be able to pull a subset of data from a Google Spreadsheet into Google Refine by importing CSV data obtained from the spreadsheet via a query generated using my Google Spreadsheet/Guardian datastore explorer (see <a href="http://blog.ouseful.info/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2009/05/18/using-google-spreadsheets-as-a-databace-with-the-google-visualisation-api-query-language/?referer=');">Using Google Spreadsheets as a Database with the Google Visualisation API Query Language</a> for more on this). However, it seems that Refine would rather pull in the whole of the spreadsheet (or at least, I think, the whole of the first sheet).</p>
<p>Instead, I had to create a proxy to run the query via a Yahoo Pipe (<a href="http://pipes.yahoo.com/pipes/pipe.info?_id=4562a5ec2631ce242ebd25a0756d6381" onclick="urchinTracker('/outgoing/pipes.yahoo.com/pipes/pipe.info?_id=4562a5ec2631ce242ebd25a0756d6381&amp;referer=');">Google Spreadsheet as a database proxy pipe</a>), which runs the spreadsheet query, gets the data back as CSV, and then relays it forward as JSON:</p>
<p><a href="http://pipes.yahoo.com/pipes/pipe.info?_id=4562a5ec2631ce242ebd25a0756d6381" onclick="urchinTracker('/outgoing/pipes.yahoo.com/pipes/pipe.info?_id=4562a5ec2631ce242ebd25a0756d6381&amp;referer=');"><img src="http://ouseful.files.wordpress.com/2011/05/yahoo-pipe-google-spreadsheet-as-db-proxy.png?w=700&#038;h=517" alt="" width="700" height="517" class="alignnone size-full wp-image-5378" /></a></p>
<p>Here&#8217;s the interface to the pipe &#8211; it requires the Google spreadsheet public key id, the sheet id, and the query&#8230;  The data I&#8217;m using is a spreadsheet maintained by the Guardian datastore containing <a href="http://www.guardian.co.uk/news/datablog/2011/mar/25/higher-education-universityfunding" onclick="urchinTracker('/outgoing/www.guardian.co.uk/news/datablog/2011/mar/25/higher-education-universityfunding?referer=');">UK university fees data</a> (<a href="https://spreadsheets2.google.com/spreadsheet/ccc?hl=en&amp;key=tupBUgwFqBDfB4878EK2vUQ&amp;hl=en#gid=1" onclick="urchinTracker('/outgoing/spreadsheets2.google.com/spreadsheet/ccc?hl=en_amp_key=tupBUgwFqBDfB4878EK2vUQ_amp_hl=en_gid=1&amp;referer=');">spreadsheet</a>).</p>
<p><a href="http://pipes.yahoo.com/pipes/pipe.info?_id=4562a5ec2631ce242ebd25a0756d6381" onclick="urchinTracker('/outgoing/pipes.yahoo.com/pipes/pipe.info?_id=4562a5ec2631ce242ebd25a0756d6381&amp;referer=');"><img src="http://ouseful.files.wordpress.com/2011/05/yahoo-pipe-google-spreadsheet-db-proxy.png?w=700&#038;h=366" alt="" width="700" height="366" class="alignnone size-full wp-image-5379" /></a></p>
<p>You can get the JSON version of the data out directly, or a proxied version of the CSV, <em>as CSV</em> via the <em>More options</em> menu&#8230;</p>
<p>Using the Yahoo Pipes CSV output URL, I <em>can</em> now get a subset of data from a Google Spreadsheet into Google Refine&#8230;</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/pipes-proxy-import-into-google-refine.png?w=700&#038;h=366" alt="" width="700" height="366" class="alignnone size-full wp-image-5380" /></p>
<p>Here&#8217;s the result &#8211; a subset of data as defined by the query:</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-imported-data.png?w=700&#038;h=260" alt="" width="700" height="260" class="alignnone size-full wp-image-5381" /></p>
<p>We can now augment this data with data from another source using Google Refine&#8217;s ability to <a href="http://code.google.com/p/google-refine/wiki/FetchingURLsFromWebServices" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/wiki/FetchingURLsFromWebServices?referer=');">import/fetch data from a URL</a>. In particular, I&#8217;m going to use the Yahoo Pipe described above to grab data from a different spreadsheet and pass it back to Google Refine as a JSON data feed. (Google spreadsheets will publish data as JSON, but the format is a bit clunky&#8230;)</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-generate-column-from-url.png?w=447&#038;h=441" alt="" width="447" height="441" class="alignnone size-full wp-image-5384" /></p>
<p>To test out my query, I&#8217;m going to create a test query in my <a href="http://ouseful.open.ac.uk/datastore/gspreadsheetdb4.php?gsKey=tpxpwtyiYZwCMowl3gNaIKQ#gid=0" onclick="urchinTracker('/outgoing/ouseful.open.ac.uk/datastore/gspreadsheetdb4.php?gsKey=tpxpwtyiYZwCMowl3gNaIKQ_gid=0&amp;referer=');">datastore explorer</a> using the Guardian datastore HESA returns (2010) spreadsheet URL (<em>http://spreadsheets1.google.com/spreadsheet/ccc?hl&amp;key=tpxpwtyiYZwCMowl3gNaIKQ#gid=0</em>) which also has a column containing HESA numbers. (Ultimately, I&#8217;m going to generate a URL that treats the Guardian datastore spreadsheet as a database that lets me get data back from the row with a particular HESA code column value. By using the HESA number column in Google Refine to provide the key, I can generate a URL for each institution that grabs its HESA data from the Datastore HESA spreadsheet.)</p>
<p><a href="http://ouseful.open.ac.uk/datastore/gspreadsheetdb4.php?gsKey=tpxpwtyiYZwCMowl3gNaIKQ#gid=0" onclick="urchinTracker('/outgoing/ouseful.open.ac.uk/datastore/gspreadsheetdb4.php?gsKey=tpxpwtyiYZwCMowl3gNaIKQ_gid=0&amp;referer=');"><img src="http://ouseful.files.wordpress.com/2011/05/datstore-explorer-set-up.png?w=700&#038;h=159" alt="" width="700" height="159" class="alignnone size-full wp-image-5391" /></a></p>
<p>Hit &#8220;Preview Table Headings&#8221;, then scroll down to try out a query:</p>
<p><a href="http://ouseful.open.ac.uk/datastore/gspreadsheetdb4.php?gsKey=tpxpwtyiYZwCMowl3gNaIKQ#gid=0" onclick="urchinTracker('/outgoing/ouseful.open.ac.uk/datastore/gspreadsheetdb4.php?gsKey=tpxpwtyiYZwCMowl3gNaIKQ_gid=0&amp;referer=');"><img src="http://ouseful.files.wordpress.com/2011/05/guardian-datastore-building-up-a-query.png?w=700&#038;h=464" alt="" width="700" height="464" class="alignnone size-full wp-image-5392" /></a></p>
<p>Having tested my query, I can now try the parameters out in the Yahoo pipe. (For example, my query is <em>select D,E,H where D=21</em> and the key is <em>tpxpwtyiYZwCMowl3gNaIKQ</em>; this grabs data from columns <em>D</em>, <em>E</em> and <em>H</em> where the value of <em>D</em> (HESA Code) is 21.) Grab the JSON output URL from the pipe, and use it as the basis for the URL template in Google Refine. Here&#8217;s the JSON output URL I obtained:</p>
<p><em>http://pipes.yahoo.com/pipes/pipe.run?_id=4562a5ec2631ce242ebd25a0756d6381<br />
&amp;_render=json&amp;key=tpxpwtyiYZwCMowl3gNaIKQ<br />
&amp;q=select+D%2CE%2CH+where+D%3D21</em></p>
<p>Remember, the HESA code I experimented with was <em>21</em>, so this is what we want to replace in the URL with the value from the HESA code column in Google Refine&#8230;</p>
<p>Here&#8217;s how we create the URLs built around/keyed by an appropriate HESA code&#8230;</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-add-column-from-url.png?w=696&#038;h=535" alt="" width="696" height="535" class="alignnone size-full wp-image-5385" /></p>
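<p>As a rough illustration of that URL templating step, here is the same construction sketched in JavaScript rather than in Refine's own expression language. The function name <tt>buildPipeUrl</tt> is hypothetical; the pipe id, spreadsheet key and query are the ones from the JSON output URL above, with the HESA code swapped in per row:</p>

```javascript
// Hypothetical helper: rebuild the pipe's JSON output URL for a given HESA code.
function buildPipeUrl(hesaCode) {
  // URLSearchParams form-encodes the values (spaces become "+", "," becomes %2C, "=" becomes %3D).
  const params = new URLSearchParams({
    _id: "4562a5ec2631ce242ebd25a0756d6381",
    _render: "json",
    key: "tpxpwtyiYZwCMowl3gNaIKQ",
    q: "select D,E,H where D=" + hesaCode,
  });
  return "http://pipes.yahoo.com/pipes/pipe.run?" + params.toString();
}

// One URL per row, keyed by the value in that row's HESA code column.
console.log(buildPipeUrl(21));
```

<p>Calling it with <tt>21</tt> reproduces the URL obtained from the pipe; any other HESA code changes only the final number in the query.</p>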
<p>Google Refine does its thing and fetches the data&#8230;</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-augmented-data.png?w=700&#038;h=370" alt="" width="700" height="370" class="alignnone size-full wp-image-5386" /></p>
<p>Now we process the JSON response to generate some meaningful data columns (for more on how to do this, see <a href="http://blog.ouseful.info/2011/04/12/tech-tips-making-sense-of-json-strings-follow-the-structure/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/04/12/tech-tips-making-sense-of-json-strings-follow-the-structure/?referer=');">Tech Tips: Making Sense of JSON Strings – Follow the Structure</a>).</p>
<p>First say we want to create a new column based on the imported JSON data:</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-creating-a-derived-column.png?w=428&#038;h=401" alt="" width="428" height="401" class="alignnone size-full wp-image-5387" /></p>
<p>Then parse the JSON to extract the data field required in the new column.</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-parsing-json.png?w=689&#038;h=504" alt="" width="689" height="504" class="alignnone size-full wp-image-5388" /></p>
<p>For example, from the HESA data we might extract the <em>Expenditure per student / 10</em>:</p>
<p><tt>value.parseJson().value.items[0]["Expenditure per student / 10"]</tt></p>
<p>or the <em>Average Teaching Score</em> (<tt>value.parseJson().value.items[0]["Average Teaching Score"]</tt>):</p>
<p><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-json-parsing.png?w=689&#038;h=505" alt="" width="689" height="505" class="alignnone size-full wp-image-5389" /></p>
<p>And here&#8217;s the result:</p>
<p><a href="http://ouseful.files.wordpress.com/2011/05/google-refine-derived-data.png" onclick="urchinTracker('/outgoing/ouseful.files.wordpress.com/2011/05/google-refine-derived-data.png?referer=');"><img src="http://ouseful.files.wordpress.com/2011/05/google-refine-derived-data.png?w=700&#038;h=310" alt="" width="700" height="310" class="alignnone size-full wp-image-5390" /></a></p>
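<p>The parsing step can also be sketched in JavaScript, which may help if you want to check a path before typing it into Refine. The sample text below is a made-up stand-in for one fetched cell: only the <tt>value.items[0]</tt> nesting and the two field names come from the expressions above, the values are placeholders, and <tt>extractField</tt> is a hypothetical helper name:</p>

```javascript
// Made-up stand-in for the text in one fetched cell; the values are placeholders.
const cellText = JSON.stringify({
  value: {
    items: [
      {
        "Expenditure per student / 10": "placeholder-spend",
        "Average Teaching Score": "placeholder-score",
      },
    ],
  },
});

// Hypothetical helper mirroring value.parseJson().value.items[0][fieldName].
// Returns null when the response has no items or lacks the field.
function extractField(jsonText, fieldName) {
  const firstItem = JSON.parse(jsonText)?.value?.items?.[0];
  if (firstItem === undefined) return null;
  return firstItem[fieldName] ?? null;
}

console.log(extractField(cellText, "Average Teaching Score")); // prints placeholder-score
```

<p>Because the field names contain spaces, they have to be addressed with the square bracket form rather than a dotted path.</p>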
<p>So to recap:</p>
<p>- we use a Yahoo Pipe to query a Google spreadsheet and get a subset of data from it;<br />
- we take the CSV output from the pipe and use it to create a new Google Refine database;<br />
- we note that the data table in Google Refine has a HESA code column; we also note that the Guardian datastore HESA spreadsheet has a HESA code column;<br />
- we realise we can treat the HESA spreadsheet as a database, and further that we can create a query (prototyped in the datastore explorer) as a URL keyed by HESA code;<br />
- we create a new column based on HESA codes from a generated URL that pulls JSON data from a Yahoo pipe that is querying a Google spreadsheet;<br />
- we parse the JSON to give us a couple of new columns.</p>
<p>And there we have it &#8211; a clunky, but workable, route for merging data from two different Google spreadsheets using Google Refine.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/05/04/fragments-gluing-different-data-sources-together-with-google-refine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-creating-a-derived-column.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-augmented-data.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-json-parsing.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-derived-data.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-parsing-json.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-add-column-from-url.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/guardian-datastore-building-up-a-query.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/datstore-explorer-set-up.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-generate-column-from-url.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/google-refine-imported-data.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/pipes-proxy-import-into-google-refine.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/yahoo-pipe-google-spreadsheet-db-proxy.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/05/yahoo-pipe-google-spreadsheet-as-db-proxy.png" length="" type="" />
		</item>
		<item>
		<title>Tech Tips: Making Sense of JSON Strings – Follow the Structure</title>
		<link>http://blog.ouseful.info/2011/04/12/tech-tips-making-sense-of-json-strings-follow-the-structure/</link>
		<comments>http://blog.ouseful.info/2011/04/12/tech-tips-making-sense-of-json-strings-follow-the-structure/#comments</comments>
		<pubDate>Tue, 12 Apr 2011 10:25:27 +0000</pubDate>
		<dc:creator>tonyhirst</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[tony hirst]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=5252</guid>
		<description><![CDATA[Reading through the Online Journalism blog post on Getting full addresses for data from an FOI response (using APIs), the following phrase &#8211; relating to the composition of some Google Refine code to parse a JSON string from the Google geocoding API &#8211; jumped out at me: &#8220;This took a bit of trial and error&#8230;&#8221; [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=5252&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Reading through the Online Journalism blog post on <a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/" onclick="urchinTracker('/outgoing/onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/?referer=');">Getting full addresses for data from an FOI response (using APIs)</a>, the following phrase &#8211; relating to the composition of some Google Refine code to parse a JSON string from the Google geocoding API &#8211; jumped out at me: &#8220;This took a bit of trial and error&#8230;&#8221;</p>
<p><a href="http://onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/" onclick="urchinTracker('/outgoing/onlinejournalismblog.com/2011/03/18/getting-full-addresses-for-school-data-in-an-foi-response/?referer=');"><img src="http://ouseful.files.wordpress.com/2011/04/google-refnie-took-a-bit-of-trial-and-error.png?w=700&#038;h=358" alt="" width="700" height="358" class="alignnone size-full wp-image-5255" /></a></p>
<p>Why? Two reasons&#8230; Firstly, because it demonstrates a &#8220;have a go&#8221; attitude which you absolutely need to have if you&#8217;re going to appropriate technology and turn it to your own purposes. Secondly, because it maybe (or maybe not&#8230;) hints at a missed trick or two&#8230;</p>
<p>So what trick&#8217;s missing?</p>
<p>Here&#8217;s <a href="http://maps.googleapis.com/maps/api/geocode/json?sensor=false&amp;address=mk7%206aa,uk" onclick="urchinTracker('/outgoing/maps.googleapis.com/maps/api/geocode/json?sensor=false_amp_address=mk7_206aa_uk&amp;referer=');">an example</a> of the sort of thing you get back from the Google Geocoder:</p>
<blockquote><p><em>{ "status": "OK", "results": [ { "types": [ "postal_code" ], "formatted_address": "Milton Keynes, Buckinghamshire MK7 6AA, UK", "address_components": [ { "long_name": "MK7 6AA", "short_name": "MK7 6AA", "types": [ "postal_code" ] }, { "long_name": "Milton Keynes", "short_name": "Milton Keynes", "types": [ "locality", "political" ] }, { "long_name": "Buckinghamshire", "short_name": "Buckinghamshire", "types": [ "administrative_area_level_2", "political" ] }, { "long_name": "Milton Keynes", "short_name": "Milton Keynes", "types": [ "administrative_area_level_2", "political" ] }, { "long_name": "United Kingdom", "short_name": "GB", "types": [ "country", "political" ] }, { "long_name": "MK7", "short_name": "MK7", "types": [ "postal_code_prefix", "postal_code" ] } ], "geometry": { "location": { "lat": 52.0249136, "lng": -0.7097474 }, "location_type": "APPROXIMATE", "viewport": { "southwest": { "lat": 52.0193722, "lng": -0.7161451 }, "northeast": { "lat": 52.0300728, "lng": -0.6977000 } }, "bounds": { "southwest": { "lat": 52.0193722, "lng": -0.7161451 }, "northeast": { "lat": 52.0300728, "lng": -0.6977000 } } } } ] }</em></p></blockquote>
<p>The data represents a JavaScript object (JSON = JavaScript Object Notation) and as such has a standard, hierarchical form.</p>
<p>Here&#8217;s another way of writing the <em>same</em> object code, only this time laid out in a way that reveals the structure of the object:</p>
<pre>{
  &quot;status&quot;: &quot;OK&quot;,
  &quot;results&quot;: [ {
    &quot;types&quot;: [ &quot;postal_code&quot; ],
    &quot;formatted_address&quot;: &quot;Milton Keynes, Buckinghamshire MK7 6AA, UK&quot;,
    &quot;address_components&quot;: [ {
      &quot;long_name&quot;: &quot;MK7 6AA&quot;,
      &quot;short_name&quot;: &quot;MK7 6AA&quot;,
      &quot;types&quot;: [ &quot;postal_code&quot; ]
    }, {
      &quot;long_name&quot;: &quot;Milton Keynes&quot;,
      &quot;short_name&quot;: &quot;Milton Keynes&quot;,
      &quot;types&quot;: [ &quot;locality&quot;, &quot;political&quot; ]
    }, {
      &quot;long_name&quot;: &quot;Buckinghamshire&quot;,
      &quot;short_name&quot;: &quot;Buckinghamshire&quot;,
      &quot;types&quot;: [ &quot;administrative_area_level_2&quot;, &quot;political&quot; ]
    }, {
      &quot;long_name&quot;: &quot;Milton Keynes&quot;,
      &quot;short_name&quot;: &quot;Milton Keynes&quot;,
      &quot;types&quot;: [ &quot;administrative_area_level_2&quot;, &quot;political&quot; ]
    }, {
      &quot;long_name&quot;: &quot;United Kingdom&quot;,
      &quot;short_name&quot;: &quot;GB&quot;,
      &quot;types&quot;: [ &quot;country&quot;, &quot;political&quot; ]
    }, {
      &quot;long_name&quot;: &quot;MK7&quot;,
      &quot;short_name&quot;: &quot;MK7&quot;,
      &quot;types&quot;: [ &quot;postal_code_prefix&quot;, &quot;postal_code&quot; ]
    } ],
    &quot;geometry&quot;: {
      &quot;location&quot;: {
        &quot;lat&quot;: 52.0249136,
        &quot;lng&quot;: -0.7097474
      },
      &quot;location_type&quot;: &quot;APPROXIMATE&quot;,
      &quot;viewport&quot;: {
        &quot;southwest&quot;: {
          &quot;lat&quot;: 52.0193722,
          &quot;lng&quot;: -0.7161451
        },
        &quot;northeast&quot;: {
          &quot;lat&quot;: 52.0300728,
          &quot;lng&quot;: -0.6977000
        }
      },
      &quot;bounds&quot;: {
        &quot;southwest&quot;: {
          &quot;lat&quot;: 52.0193722,
          &quot;lng&quot;: -0.7161451
        },
        &quot;northeast&quot;: {
          &quot;lat&quot;: 52.0300728,
          &quot;lng&quot;: -0.6977000
        }
      }
    }
  } ]
}</pre>
<h2>Making Sense of the Notation</h2>
<p>At its simplest, the structure has the form: {"attribute":"value"}</p>
<p>If we parse this object into a variable called <em>jsonObject</em>, we can access the value of the attribute as <em>jsonObject.attribute</em> or <em>jsonObject["attribute"]</em>. The first style is called <em>dot notation</em>; the second is bracket notation.</p>
<p>We can add more attribute:value pairs into the object by separating them with commas: <em>a={"attr":"val","attr2":"val2"}</em>  and address them (that is, refer to them) uniquely: <em>a.attr</em>, for example, or <em>a["attr2"]</em>.</p>
<p>Try it out for yourself&#8230; Copy and paste the following into your browser address bar (where the URL goes) and hit return (i.e. &#8220;go to&#8221; that &#8220;location&#8221;):</p>
<p><tt>javascript:a={"attr":"val","attr2":"val2"}; alert(a.attr);alert(a["attr2"])</tt></p>
<p>(As an aside, what might you learn from this? Firstly, you can &#8220;run&#8221; javascript in the browser via the location bar. Secondly, the javascript command <em>alert()</em> pops up an alert box:-)</p>
<p>Note that the value of an attribute might be another object.</p>
<p><em>obj={ attrWithObjectValue: { "childObjAttr":"foo" } }</em></p>
<p>Another thing we can see in the Google geocoder JSON code is the use of square brackets. These define an <em>array</em> (one might also think of it as an ordered list). Items in the list are addressed numerically. So for example, given:</p>
<p><em>arr=[ "item1", "item2", "item3" ]</em></p>
<p>we can locate &#8220;item1&#8221; as <em>arr[0]</em> and &#8220;item3&#8221; as <em>arr[2]</em>. (Note: the index count in the square brackets starts at 0.) Try it in the browser&#8230; (for example, <tt>javascript:list=["apples","bananas","pears"]; alert( list[1] );</tt>).</p>
<p>Arrays can contain objects too:</p>
<p><em>list=[ "item1", {"innerObjectAttr":"innerObjVal"  } ]</em></p>
<p>Can you guess how to get to the <em>innerObjVal</em>? Try this in the browser location bar:</p>
<p><tt>javascript: list=[ "item1", { "innerObjectAttr":"innerObjVal"  } ]; alert( list[1].innerObjectAttr )</tt></p>
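<p>If your browser won't run <tt>javascript:</tt> snippets from the location bar (some strip the prefix when you paste, or block it altogether), the same experiments can be run in the browser's developer console or in Node. This is just the examples above gathered into one script, with <tt>console.log()</tt> standing in for <tt>alert()</tt>:</p>

```javascript
// Attribute:value pairs, addressed with dot notation and bracket notation.
const a = { "attr": "val", "attr2": "val2" };
console.log(a.attr);      // prints val
console.log(a["attr2"]);  // prints val2

// An attribute whose value is another object: chain the names together.
const obj = { attrWithObjectValue: { "childObjAttr": "foo" } };
console.log(obj.attrWithObjectValue.childObjAttr); // prints foo

// Arrays are addressed by position, counting from 0.
const fruit = ["apples", "bananas", "pears"];
console.log(fruit[1]); // prints bananas

// An array can hold objects too: index first, then attribute name.
const list = ["item1", { "innerObjectAttr": "innerObjVal" }];
console.log(list[1].innerObjectAttr); // prints innerObjVal
```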
<h2>Making Life Easier</h2>
<p>Hopefully, you&#8217;ll now have a sense that there&#8217;s structure in a JSON object, and that this structure is what we rely on if we want to cut down on the &#8220;trial and error&#8221; when parsing such things. To make life easier, we can also use &#8220;tree widgets&#8221; to display the hierarchical JSON object in a way that makes it far easier to see how to construct the dotted path that leads to the data value we want.</p>
<p>A tool I have appropriated for previewing JSON objects is <a href="http://pipes.yahoo.com" onclick="urchinTracker('/outgoing/pipes.yahoo.com?referer=');">Yahoo Pipes</a>. Rather than necessarily using Pipes to build anything, I simply make use of it as a JSON viewer, loading JSON into the pipe from a URL via the <em>Fetch Data</em> block, and then previewing the result:</p>
<p><a href="http://pipes.yahoo.com" onclick="urchinTracker('/outgoing/pipes.yahoo.com?referer=');"><img src="http://ouseful.files.wordpress.com/2011/04/yahoo-pipes-as-a-json-previewer1.png?w=700&#038;h=455" alt="" width="700" height="455" class="alignnone size-full wp-image-5259" /></a></p>
<p>Another tool (and one I&#8217;ve just discovered) is an Air application called <a href="http://code.google.com/p/json-pad/" onclick="urchinTracker('/outgoing/code.google.com/p/json-pad/?referer=');">JSON-Pad</a>. You can paste in JSON code, or pull it in from a URL, and then preview it again via a tree widget:</p>
<p><a href="http://code.google.com/p/json-pad/" onclick="urchinTracker('/outgoing/code.google.com/p/json-pad/?referer=');"><img src="http://ouseful.files.wordpress.com/2011/04/json-pad.png?w=652&#038;h=693" alt="" width="652" height="693" class="alignnone size-full wp-image-5257" /></a></p>
<p>Clicking on one of the results in the tree widget provides a crib to the path&#8230;</p>
<h2>Summary</h2>
<p>Getting to grips with writing addresses into JSON objects helps if you have some idea of the structure of a JSON object. Tree viewers make the structure of an object explicit. By walking down the tree to the part of it you want, and &#8220;dotting&#8221; together* the nodes/attributes you select as you do so, you can quickly and easily construct the path you need.</p>
<p>* If the JSON attributes have spaces or non-alphanumeric characters in them, use the <em>obj["attr"]</em> notation rather than the dotted <em>obj.attr</em> notation&#8230;</p>
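<p>As a rough illustration of the &#8220;dotting together&#8221; idea, here is a minimal Python sketch. The sample JSON, its key names and the <em>follow_path</em> helper are all invented for this example; the only point is that each step in the path selects one node of the tree, and that a key containing a space can only be reached with the bracket form.</p>

```python
import json

# Hypothetical JSON string, invented for this example
raw = '{"result": {"items": [{"name": "Paris", "geo data": {"lat": 48.86}}]}}'
obj = json.loads(raw)


def follow_path(node, path):
    """Walk down a parsed JSON object one key (or list index) at a time."""
    for step in path:
        node = node[step]
    return node


# Equivalent to obj.result.items[0].name in dotted notation
name = follow_path(obj, ["result", "items", 0, "name"])

# "geo data" contains a space, so only the bracket form can address it
lat = obj["result"]["items"][0]["geo data"]["lat"]

print(name, lat)
```

<p>Reading the path off a tree viewer amounts to writing down the list of steps passed to <em>follow_path</em>, top to bottom.</p>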
<p>PS Via my feeds today, though something I had bookmarked already, this <a href="http://www.shancarter.com/data_converter/index.html" onclick="urchinTracker('/outgoing/www.shancarter.com/data_converter/index.html?referer=');">Data Converter</a> tool may be helpful in going the other way&#8230; (Disclaimer: I haven&#8217;t tried using it&#8230;)</p>
<p><a href="http://www.shancarter.com/data_converter/index.html" onclick="urchinTracker('/outgoing/www.shancarter.com/data_converter/index.html?referer=');"><img src="http://ouseful.files.wordpress.com/2011/04/data-converter.png?w=700&#038;h=313" alt="" width="700" height="313" class="alignnone size-full wp-image-5260" /></a></p>
<p>If you know of any other related tools, please feel free to post a link to them in the comments:-)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2011/04/12/tech-tips-making-sense-of-json-strings-follow-the-structure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://ouseful.files.wordpress.com/2011/04/yahoo-pipes-as-a-json-previewer1.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/04/json-pad.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/04/data-converter.png" length="" type="" />
<enclosure url="http://ouseful.files.wordpress.com/2011/04/google-refnie-took-a-bit-of-trial-and-error.png" length="" type="" />
		</item>
		<item>
		<title>Data for journalists: understanding XML and RSS</title>
		<link>http://onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/</link>
		<comments>http://onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/#comments</comments>
		<pubDate>Mon, 11 Apr 2011 13:00:27 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[online journalism]]></category>
		<category><![CDATA[f1]]></category>
		<category><![CDATA[feedburner]]></category>
		<category><![CDATA[firebug]]></category>
		<category><![CDATA[MySociety]]></category>
		<category><![CDATA[parliament parser]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[Yahoo! Pipes]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=14110</guid>
		<description><![CDATA[If you are working with data chances are that sooner or later you will come across XML &#8211; or if you don&#8217;t, then, well, you should do. Really. There are some very useful resources in XML format &#8211; and in RSS, which is based on XML &#8211; from ongoing feeds and static reference files to XML that is provided in<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/" onclick="urchinTracker('/outgoing/onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/?referer=');">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<div>
<p>If you are working with data, chances are that sooner or later you will come across XML &#8211; or if you don&#8217;t, then, well, you should do. Really.</p>
<p>There are some very useful resources in XML format &#8211; and in RSS, which is based on XML &#8211; from ongoing feeds and static reference files to XML that is provided in response to a question that you ask. All of that is for future posts &#8211; <em>this post</em> attempts to explain how XML is relevant to journalism, and how it is made up.</p>
<h2>What is XML?</h2>
<p>XML is a language which is used for describing information, which makes it particularly relevant to journalists &#8211; especially when it comes to interrogating large sets of data.</p>
<p>If you wanted to know how many doctors were privately educated, or what the most common score was in the Premiership last season, or which documents were authored by a particular civil servant, then XML may be useful to you.<span id="more-14110"></span></p>
<p>(That said, this post doesn&#8217;t show you how to do any of that &#8211; it is mainly aimed at explaining how XML works so that you can begin to think about those possibilities.)</p>
<p>XML stands for &#8220;eXtensible Markup Language&#8221;. It&#8217;s the &#8216;markup&#8217; bit which is key: XML &#8216;marks up&#8217; information as being something in particular: relating to a particular date, for example; or a particular person; or referring to a particular location.</p>
<p>For example, a snippet of XML like this -</p>
<pre>&lt;city&gt;Paris&lt;/city&gt;</pre>
<pre>&lt;country&gt;France&lt;/country&gt;</pre>
<p>- tells you that the &#8216;Paris&#8217; in this instance is a city, rather than a celebrity. And that it&#8217;s in France, not Texas.</p>
<p>That makes it easier for you to filter out information that isn&#8217;t relevant, or combine particular bits of information with data from elsewhere.</p>
<p>For example, if an XML file contains information on authors, you can filter out all but those by the person you&#8217;re interested in; if it contains publication dates, you can use that to plot associated content on a timeline.</p>
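<p>To make the author-filtering idea concrete, here is a small Python sketch using the standard library&#8217;s ElementTree parser. The XML, the element names and the author names are all invented for this example; a real file would have its own structure.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, invented for this example
doc = """
<articles>
  <article><author>Jane Smith</author><title>Budget cuts</title></article>
  <article><author>John Jones</author><title>Cup final</title></article>
  <article><author>Jane Smith</author><title>School places</title></article>
</articles>
"""

root = ET.fromstring(doc)

# Keep only the articles whose author element matches the name we want
titles = [
    article.findtext("title")
    for article in root.findall("article")
    if article.findtext("author") == "Jane Smith"
]

print(titles)
```

<p>Because the markup says which piece of text is the author and which is the title, the filter never has to guess from the wording itself.</p>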
<p>Most usefully, if you have a set of data yourself such as a spreadsheet, you can pull related data from a relevant XML file. If your spreadsheet contains football teams and the XML provides locations, images, and history for each, then you can pull that in to create a fuller picture. If it contains addresses, there are <a href="http://www.uk-postcodes.com/api.php" onclick="urchinTracker('/outgoing/www.uk-postcodes.com/api.php?referer=');">services that will give you XML files with the constituency for those postcodes</a>.</p>
<h2>What is RSS?</h2>
<p>RSS is a whole family of formats which are essentially based on XML &#8211; so they are structured in the same way, containing &#8216;markup&#8217; that might tell you the author, publication date, location or other details about the information it relates to.</p>
<p>There is <a href="http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html" onclick="urchinTracker('/outgoing/www.xml.com/pub/a/2002/12/18/dive-into-xml.html?referer=');">a lot of variation between different versions of RSS</a>, but the main thing for the purposes of this post is that the various versions of RSS, and XML, share a structure which journalists can use if they know how to.</p>
<p>Which version isn&#8217;t particularly important: as long as you understand the principles, you can adapt what you do to suit the document or feed you&#8217;re working with.</p>
<h2>Looking at XML and RSS</h2>
<p>XML documents (for simplicity&#8217;s sake I&#8217;ll mostly just refer to &#8216;XML&#8217; for the rest of this post, although I&#8217;m talking about both XML and RSS) contain two things that are of interest to us: content, and information about the content (&#8216;markup&#8217;).</p>
<p>Information about the content is contained within tags in <a href="http://en.wikipedia.org/wiki/Bracket#Angle_brackets_or_chevrons_.E2.9F.A8_.E2.9F.A9" onclick="urchinTracker('/outgoing/en.wikipedia.org/wiki/Bracket_Angle_brackets_or_chevrons_.E2.9F.A8_.E2.9F.A9?referer=');">angle brackets (also known as chevrons</a>): &#8216;&lt;&#8217; and &#8216;&gt;&#8217;</p>
<p>For example: &lt;name&gt; or &lt;pubDate&gt; (publication date).</p>
<p>The tag is followed by the content itself, and a closing tag that has a forward slash, e.g. &lt;/name&gt; or &lt;/pubDate&gt;, so one line might look like this:</p>
<pre>&lt;name&gt;Paul Bradshaw&lt;/name&gt;</pre>
<p>At this point it&#8217;s useful to have some XML or RSS in front of you. For a random example go to <a href="http://www.scotland.gov.uk/RSS/News/Latest" onclick="urchinTracker('/outgoing/www.scotland.gov.uk/RSS/News/Latest?referer=');">the RSS feed for the Scottish Government News</a>.</p>
<p>To see the code right-click on that page and select <strong>View Source</strong> or similar &#8211; Firefox is worth using if another browser does not work; the <a href="http://getfirebug.com/" onclick="urchinTracker('/outgoing/getfirebug.com/?referer=');">Firebug extension</a> also helps. (Note: if the feed is generated by Feedburner this won&#8217;t work: look for the &#8216;<strong>View Feed XML</strong>&#8216; button in the middle right area or add <strong>?format=xml</strong> to the feed URL).</p>
<p>What you should see will include the following:</p>
<pre>&lt;item&gt;
&lt;title&gt;Manufactured Exports Q4 2010&lt;/title&gt;
&lt;link&gt;http://www.scotland.gov.uk/News/Releases/2011/04/06100351&lt;/link&gt;
&lt;description&gt;A National Statistics publication for Scotland.&lt;/description&gt;
&lt;guid isPermaLink="true"&gt;http://www.scotland.gov.uk/News/Releases/2011/04/06100351&lt;/guid&gt;
&lt;pubDate&gt;Wed, 06 Apr 2011 00:00:00 GMT&lt;/pubDate&gt;
&lt;/item&gt;</pre>
<p>In the RSS feed itself this doesn&#8217;t start until line 14 (the first 13 lines are used to provide information about the feed as a whole, such as the version of RSS, title, copyright etc).</p>
<p>But from line 14 onwards this pattern repeats itself for a number of different &#8216;items&#8217;.</p>
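<p>As you can see, each item has a title, a link, a description, a permalink, and a publication date. These are known as child elements of the item, which is their parent. (The &#8216;root element&#8217; is the single outermost element that contains everything else in the document; in an RSS 2.0 feed, that is &lt;rss&gt;.)</p>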
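<p>The parent and child relationship can be seen by loading the item quoted above into Python&#8217;s standard ElementTree parser. This is only a sketch for inspecting one item; the variable names are my own.</p>

```python
import xml.etree.ElementTree as ET

# The item quoted above, copied from the post
item_xml = """<item>
<title>Manufactured Exports Q4 2010</title>
<link>http://www.scotland.gov.uk/News/Releases/2011/04/06100351</link>
<description>A National Statistics publication for Scotland.</description>
<guid isPermaLink="true">http://www.scotland.gov.uk/News/Releases/2011/04/06100351</guid>
<pubDate>Wed, 06 Apr 2011 00:00:00 GMT</pubDate>
</item>"""

item = ET.fromstring(item_xml)

# Iterating over an element yields its direct children, in document order
children = [child.tag for child in item]
print(children)

# Attributes such as isPermaLink live on the tag, not in its text
print(item.find("guid").get("isPermaLink"))
```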
<p>More journalistic examples can be found at Mercedes GP&#8217;s <a href="http://www3.mercedes-gp.com/cmsmedia/adrivo/xml/championship/championship.xml" onclick="urchinTracker('/outgoing/www3.mercedes-gp.com/cmsmedia/adrivo/xml/championship/championship.xml?referer=');">XML file of the latest F1 Championship Standings</a> (see <a href="http://blog.ouseful.info/2011/04/10/data-liberation-formula-one-press-release-timing-sheets/" onclick="urchinTracker('/outgoing/blog.ouseful.info/2011/04/10/data-liberation-formula-one-press-release-timing-sheets/?referer=');">the PS at the end of Tony Hirst&#8217;s post</a> for an explanation of how this is structured), and <a href="http://ukparse.kforge.net/parlparse/" onclick="urchinTracker('/outgoing/ukparse.kforge.net/parlparse/?referer=');">MySociety&#8217;s Parliament Parser</a>, which provides XML files on all parts of government, from MPs and peers to debates and constituencies, going back over a decade. Look at the <a href="http://ukparse.kforge.net/svn/parlparse/members/ministers.xml" onclick="urchinTracker('/outgoing/ukparse.kforge.net/svn/parlparse/members/ministers.xml?referer=');">Ministers XML file</a> in Firefox and scroll down until you get to the first item tagged &lt;ministerofficegroup&gt;. Within each of those are details on ministerial positions. As the Parliament Parser page explains:</p>
<blockquote><p>&#8220;Each one has a date range, the MP or Lord became a minister at some time on the start day, and stopped being one at some time on the end day. The matchid field is one sample MP or Lord office which that person also held. Alternatively, use the people.xml file to find out which person held the ministerial post.&#8221;</p></blockquote>
<p>You&#8217;ll notice from that quote that some parts of the XML require cross-referencing to provide extra details. That&#8217;s where XML becomes very useful.</p>
<h2>Using it in practice: working with XML in Yahoo! Pipes</h2>
<p>Yahoo! Pipes provides a good introduction to working with data in XML or RSS. You&#8217;ll need to sign up at <a href="http://Pipes.Yahoo.com" onclick="urchinTracker('/outgoing/Pipes.Yahoo.com?referer=');">Pipes.Yahoo.com</a> and click on &#8216;<strong>Create a Pipe</strong>&#8217;.</p>
<p>You&#8217;ll now be editing a new project. In the left-hand column are various &#8216;modules&#8217; you can use. Click on &#8216;<strong>Sources</strong>&#8217; to expand it, and click and drag &#8216;<strong>Fetch Feed</strong>&#8217; onto the graph paper-style canvas.</p>
<div>
<dl>
<dt><a rel="attachment wp-att-14113" href="http://onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/pipes_-editing-_health_rss_filter_/" onclick="urchinTracker('/outgoing/onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/pipes_-editing-_health_rss_filter_/?referer=');"><img src="http://onlinejournalismblog.com/files/2011/04/Pipes_-editing-_Health_RSS_filter_-400x87.jpg" alt="The 'Fetch Feed' module" width="400" height="87" /></a></dt>
<dd>The &#8216;Fetch Feed&#8217; module</dd>
</dl>
</div>
<p>Copy the address of your RSS feed and paste it into the &#8216;Fetch Feed&#8217; box. I&#8217;m using <a href="http://feeds.feedburner.com/info4localallhealthwellbeingandcare?format=xml" onclick="urchinTracker('/outgoing/feeds.feedburner.com/info4localallhealthwellbeingandcare?format=xml&amp;referer=');">this feed</a> of <a href="http://feeds.feedburner.com/info4localallhealthwellbeingandcare" onclick="urchinTracker('/outgoing/feeds.feedburner.com/info4localallhealthwellbeingandcare?referer=');">Health information from the UK government</a>.</p>
<p>If you now click on the module so that it turns orange, you should be able (after a few moments) to see that feed in the Debugger window at the bottom of the screen.</p>
<p>Click on the handle in the middle to pull it up and see more, and click on the arrows on the left to drill down to the &#8216;nested&#8217; data within each item.</p>
<div>
<dl>
<dt><a rel="attachment wp-att-14114" href="http://onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/pipes_-editing-_health_rss_filter_-1/" onclick="urchinTracker('/outgoing/onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/pipes_-editing-_health_rss_filter_-1/?referer=');"><img src="http://onlinejournalismblog.com/files/2011/04/Pipes_-editing-_Health_RSS_filter_-1-400x206.jpg" alt="Drilling down into the data within an RSS feed" width="400" height="206" /></a></dt>
<dd>Drilling down into the data within an RSS feed</dd>
</dl>
</div>
<p>As you drill down you can see elements of data you can filter. In this case, we&#8217;ll use &#8216;<strong>region</strong>&#8216;.</p>
<p>To filter the feed based on this we need the Filter module. On the left hand side click on &#8216;<strong>Operators</strong>&#8216; to expand that, and then drag the &#8216;<strong>Filter</strong>&#8216; module into the canvas.</p>
<p>Now drag a pipe from the circle at the bottom of the &#8216;Fetch Feed&#8217; module to the top of the &#8216;Filter&#8217; module.</p>
<div>
<dl>
<dt><a rel="attachment wp-att-14115" href="http://onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/pipes_-editing-_health_rss_filter_-2/" onclick="urchinTracker('/outgoing/onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/pipes_-editing-_health_rss_filter_-2/?referer=');"><img src="http://onlinejournalismblog.com/files/2011/04/Pipes_-editing-_Health_RSS_filter_-2-300x121.jpg" alt="Drag a pipe from Fetch Feed to Filter" width="300" height="121" /></a></dt>
<dd>Drag a pipe from Fetch Feed to Filter</dd>
</dl>
</div>
<p>Wait a moment for the &#8216;Filter&#8217; module to work out what data the RSS feed contains. Then use the drop down menus so that it reads &#8220;<strong>Permit</strong> items that match <strong>all</strong> of the following&#8221;.</p>
<p>The next box determines which piece of data you will filter on. If you click on the drop-down here you should see all the pieces of data that are associated with each item.</p>
<div>
<dl>
<dt><a rel="attachment wp-att-14116" href="http://onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/pipes_-editing-_health_rss_filter_-3/" onclick="urchinTracker('/outgoing/onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/pipes_-editing-_health_rss_filter_-3/?referer=');"><img src="http://onlinejournalismblog.com/files/2011/04/Pipes_-editing-_Health_RSS_filter_-3-400x202.jpg" alt="Select the data you are filtering on" width="400" height="202" /></a></dt>
<dd>Select the data you are filtering on</dd>
</dl>
</div>
<p>We&#8217;re going to select &#8216;region&#8217;, and say that we only want to permit items where &#8216;region&#8217; contains &#8216;North West&#8217;. If any of these don&#8217;t make any sense, look at the original RSS feed again to see what they contain.</p>
<p>Now drag a final pipe from the bottom of the &#8216;Filter&#8217; module to the top of &#8216;<strong>Pipe output</strong>&#8216; at the bottom of the canvas. If you click on either you should be able to see in the Debugger that now only those items relating specifically to the North West are displayed.</p>
<p>If you wanted to you could now save this and click &#8216;<strong>Run Pipe</strong>&#8216; to see the results. Once you do you should notice options to &#8216;<strong>Get as RSS</strong>&#8216; &#8211; this would allow you to subscribe to this feed yourself or publish it on a website or Twitter account. There&#8217;s also &#8216;Get as JSON&#8217; which is a whole other story &#8211; I&#8217;ll cover JSON in a future post.</p>
<p>You can <a href="http://pipes.yahoo.com/pipes/pipe.info?_id=0e8517f82fb1518ba16ba97e40dea113" onclick="urchinTracker('/outgoing/pipes.yahoo.com/pipes/pipe.info?_id=0e8517f82fb1518ba16ba97e40dea113&amp;referer=');">see this pipe in action &#8211; and clone it yourself &#8211; here</a>.</p>
<p>Oh, and a sidenote: if you wanted to grab an XML file in Yahoo! Pipes rather than an RSS feed, you would use &#8216;Fetch Data&#8217; instead of &#8216;Fetch Feed&#8217;.</p>
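<p>For anyone who prefers to see the same logic written out, the chain of Fetch Feed, Filter and Pipe output can be roughly sketched in Python. This is not how Pipes works internally; it is just the &#8216;permit items where region contains North West&#8217; rule applied by hand. The feed content is invented, the <em>permit</em> helper is my own name, and I am assuming a plain &lt;region&gt; element, whereas the real feed may well use a namespaced one.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the element names mirror the tutorial, the content is invented
feed = """<rss version="2.0"><channel>
<item><title>Clinic opening hours</title><region>North West</region></item>
<item><title>Flu vaccine update</title><region>London</region></item>
<item><title>New GP surgery</title><region>North West; North East</region></item>
</channel></rss>"""

root = ET.fromstring(feed)


def permit(items, field, text):
    """Keep the items whose named child element contains the given text."""
    return [item for item in items if text in (item.findtext(field) or "")]


north_west = permit(root.iter("item"), "region", "North West")
titles = [item.findtext("title") for item in north_west]
print(titles)
```

<p>Note that &#8216;contains&#8217; is a substring test, so an item tagged with several regions still passes as long as &#8216;North West&#8217; is among them.</p>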
<h2>Just the start</h2>
<p>There&#8217;s much more you can do here. Some suggestions for next steps:</p>
<ul>
<li>Try using the <strong>Text Input</strong> module in Yahoo! Pipes, dragging a line from that to where you typed &#8216;North West&#8217;, for example</li>
<li>Try playing with the <a href="http://seogadget.co.uk/playing-around-with-importxml-in-google-spreadsheets/" onclick="urchinTracker('/outgoing/seogadget.co.uk/playing-around-with-importxml-in-google-spreadsheets/?referer=');">importXML formula in Google Spreadsheets</a></li>
<li>Try matching data in a spreadsheet with data from an XML file <a href="http://code.google.com/p/google-refine/wiki/Recipes" onclick="urchinTracker('/outgoing/code.google.com/p/google-refine/wiki/Recipes?referer=');">using Google Refine</a>.</li>
</ul>
<p>Those are for future posts. For now I just want to demonstrate how XML works to add information-about-information which you can then use to search, filter, and combine data.</p>
<p>And it&#8217;s not just an esoteric language that is used by a geeky few as part of their newsgathering: journalists at Sky News, The Guardian and The Financial Times &#8211; to name just a few &#8211; all use this as a routine part of publishing, because it provides a way to dynamically update elements within a larger story without having to update the whole thing from scratch &#8211; for example by updating casualty numbers or new dates on a timeline.</p>
<p>And while I&#8217;m at it, if you have any examples of XML being used in journalism for either newsgathering or publishing, let me know.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/04/11/data-for-journalists-understanding-xml-and-rss/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How to create a Facebook news feed for a journalist (or anything else)</title>
		<link>http://onlinejournalismblog.com/2011/03/28/how-to-create-a-facebook-news-feed-for-a-journalist-or-anything-else/</link>
		<comments>http://onlinejournalismblog.com/2011/03/28/how-to-create-a-facebook-news-feed-for-a-journalist-or-anything-else/#comments</comments>
		<pubDate>Mon, 28 Mar 2011 14:01:45 +0000</pubDate>
		<dc:creator>Paul Bradshaw</dc:creator>
				<category><![CDATA[online journalism]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[facebook pages]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[rss graffiti]]></category>

		<guid isPermaLink="false">http://onlinejournalismblog.com/?p=13464</guid>
		<description><![CDATA[I&#8217;ve been enjoying The Independent&#8217;s individual Facebook feeds for journalists, football teams and other &#8216;entities&#8217; of their news coverage. So much so that I wanted the work of journalists on other news organisations to be brought to me in the same way. But other newspapers are not offering the same functionality, so I thought I&#8217;d do it myself. Here&#8217;s how<br /><span class="read_more"><a href="http://onlinejournalismblog.com/2011/03/28/how-to-create-a-facebook-news-feed-for-a-journalist-or-anything-else/" onclick="urchinTracker('/outgoing/onlinejournalismblog.com/2011/03/28/how-to-create-a-facebook-news-feed-for-a-journalist-or-anything-else/?referer=');">Read more...</a></span>]]></description>
			<content:encoded><![CDATA[
<p><img src="https://img.skitch.com/20110328-gpshtk6bjfwsx8cyibnknx9t85.jpg" alt="James Ball articles Facebook page" width="496" height="302" /></p>
<p>I&#8217;ve been enjoying <a href="http://onlinejournalismblog.com/2011/01/12/the-independents-facebook-revolution/" onclick="urchinTracker('/outgoing/onlinejournalismblog.com/2011/01/12/the-independents-facebook-revolution/?referer=');">The Independent&#8217;s individual Facebook feeds</a> for journalists, football teams and other &#8216;entities&#8217; of their news coverage. So much so that I wanted the work of journalists on other news organisations to be brought to me in the same way.</p>
<p>But other newspapers are not offering the same functionality, so I thought I&#8217;d do it myself. Here&#8217;s how you can do it too:</p>
<h2>Create a Facebook page for the journalist</h2>
<p>Go to the <a href="http://www.facebook.com/pages/" onclick="urchinTracker('/outgoing/www.facebook.com/pages/?referer=');">Facebook Pages page</a> and click &#8216;<strong>Create Page</strong>&#8216; in the upper right corner.<span id="more-13464"></span></p>
<p>You then need to choose one of 6 categories. Pick &#8216;<strong>Artist, band or public figure</strong>&#8216; and choose &#8216;<strong>Journalist</strong>&#8216; from the drop-down menu that then appears. Type their name in the next box, and tick the box agreeing to the terms. Then click &#8216;<strong>Get started</strong>&#8216;.</p>
<p>You&#8217;ll now be presented with your page. You can add an image and make various other customisations. But the main thing we need to do is to set it up so that the page automatically publishes updates whenever the journalist publishes a new article.</p>
<h2>Install an RSS app on that page</h2>
<p>Click &#8216;<strong>Edit page</strong>&#8217; in the upper right corner of the page. The page you are taken to has a series of further options on the left-hand side. Click on &#8216;<strong>Apps</strong>&#8217;.</p>
<p>At the bottom of the Apps page is an option to &#8216;<strong>Browse more applications</strong>&#8216;. Click on this to get to the <a href="https://www.facebook.com/apps/directory.php" onclick="urchinTracker('/outgoing/www.facebook.com/apps/directory.php?referer=');">Apps Directory</a>.</p>
<p>You now need to find an app which will publish an RSS feed to the wall of your Page. Instead of browsing, use the search box to look for &#8216;RSS&#8217;.</p>
<p style="text-align: left">There&#8217;ll be a number of possibilities on the results page. I used <a href="http://apps.facebook.com/rssgraffiti/?fb_page_id=146632022068073" onclick="urchinTracker('/outgoing/apps.facebook.com/rssgraffiti/?fb_page_id=146632022068073&amp;referer=');">RSS Graffiti</a>.</p>
<p><em>Do not click</em> on &#8216;Go to app&#8217; &#8211; instead, in the left hand column should be an option to &#8216;<strong>Add to my page&#8217;</strong>. Click on this, and select your page from the list that then appears. The app should now be added to your page &#8211; but you will still need to customise the settings.</p>
<h2>Find the RSS feed for your journalist &#8211; or create one</h2>
<p>Some news organisations provide individual RSS feeds for every journalist &#8211; try looking on the journalist&#8217;s profile page (if they have one) or one of their articles to see if you can find a feed (also look for an orange RSS icon in the address bar).</p>
<p>If that isn&#8217;t the case, try <a href="http://journalisted.com/" onclick="urchinTracker('/outgoing/journalisted.com/?referer=');">Journalisted</a>, which has RSS feeds for most journalists on national newspapers and broadcasters.</p>
<p>And if they&#8217;re not featured there, you could try using Google News to generate an RSS feed for you (do the search first then look for the RSS feed at the bottom of the page), or even use Yahoo! Pipes to filter articles by a particular journalist from a general news feed (that&#8217;s another tutorial to write).</p>
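<p>As a rough idea of what filtering a general feed by journalist involves, here is a Python sketch. It assumes the feed names each author in a Dublin Core &lt;dc:creator&gt; element, which many WordPress-style feeds do but which is not guaranteed for any particular news site; the feed content and author names are invented.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical general news feed; dc:creator is the Dublin Core author element
feed = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel>
<item><title>Court ruling</title><dc:creator>A. Reporter</dc:creator></item>
<item><title>Data leak</title><dc:creator>B. Writer</dc:creator></item>
</channel></rss>"""

# Map the dc prefix to its namespace URI so findtext() can resolve it
ns = {"dc": "http://purl.org/dc/elements/1.1/"}

root = ET.fromstring(feed)
by_author = [
    item.findtext("title")
    for item in root.iter("item")
    if item.findtext("dc:creator", namespaces=ns) == "B. Writer"
]
print(by_author)
```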
<h2>Edit the page&#8217;s RSS app settings</h2>
<p>Once you&#8217;ve got your RSS feed, copy the address (it will look something like http://journalisted.com/dominic-casciani/rss or http://www.guardian.co.uk/profile/jamesball/rss).</p>
<p>Then go to your application page (either by going back to the Facebook Page you created, clicking on &#8216;Edit Page&#8217;, then &#8216;Apps&#8217; and then &#8216;Go to application&#8217; under the appropriate one &#8211; or by finding the application page again via the <a href="https://www.facebook.com/apps/directory.php" onclick="urchinTracker('/outgoing/www.facebook.com/apps/directory.php?referer=');">Apps Directory</a>).</p>
<p>On the left hand side it will say which of your Facebook pages it has been activated on. Click on the one you need to edit it for &#8211; in the case of RSS Graffiti it will say <strong>Action required: Assign missing permissions</strong>. Click to activate it, and then &#8216;<strong>Allow</strong>&#8216; to give it permission to post to the page&#8217;s wall.</p>
<p>You&#8217;ll be taken back to the app where you can now add the RSS feed you want to publish to this page. In RSS Graffiti&#8217;s case click &#8216;<strong>Add feed</strong>&#8216;.</p>
<p>You can now paste the URL of the RSS feed into the appropriate box, and give it a name if you want. Then click &#8216;<strong>Save</strong>&#8217;.</p>
<h2>Go live</h2>
<p>Now, back on your page click &#8216;<strong>Publish this page&#8217;</strong>. And don&#8217;t forget to &#8216;<strong>Like</strong>&#8216; it so you receive updates from it in your Facebook news feed.</p>
<p>Of course this will work for anything with an RSS feed &#8211; not just journalists. Here&#8217;s the <a href="http://www.facebook.com/pages/Online-Journalism-Blog/102663726486049?sk=wall" onclick="urchinTracker('/outgoing/www.facebook.com/pages/Online-Journalism-Blog/102663726486049?sk=wall&amp;referer=');">Facebook page for the Online Journalism Blog</a>, for example.</p>
<p><em>PS: A note of caution &#8211; Facebook&#8217;s terms say you cannot create a page for a person unless you represent them, so they could take the page down. Equally, the journalist could object themselves &#8211; ask them if they mind first. </em></p>
]]></content:encoded>
			<wfw:commentRss>http://onlinejournalismblog.com/2011/03/28/how-to-create-a-facebook-news-feed-for-a-journalist-or-anything-else/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Discovering Co-location Communities – Twitter Maps of Tweets Near Wherever…</title>
		<link>http://blog.ouseful.info/2010/10/27/discovering-co-location-communities-tweets-near-wherever/</link>
		<comments>http://blog.ouseful.info/2010/10/27/discovering-co-location-communities-tweets-near-wherever/#comments</comments>
		<pubDate>Wed, 27 Oct 2010 13:15:46 +0000</pubDate>
		<dc:creator>tonyhirst</dc:creator>
				<category><![CDATA[data journalism]]></category>
		<category><![CDATA[community detection]]></category>
		<category><![CDATA[location]]></category>
		<category><![CDATA[onlinejournalismblog]]></category>
		<category><![CDATA[sna]]></category>
		<category><![CDATA[Tinkering]]></category>
		<category><![CDATA[tony hirst]]></category>

		<guid isPermaLink="false">http://blog.ouseful.info/?p=4327</guid>
		<description><![CDATA[As privacy erodes further and further, and more and more people start to reveal where they are using location services, how easy is it to identify communities based on location, say, or postcode, rather than hashtag? That is, how easy is it to find people who are colocated in space, rather than topic, as in the [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.ouseful.info&#38;blog=325417&#38;post=4327&#38;subd=ouseful&#38;ref=&#38;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>As privacy erodes further and further, and more and more people start to reveal where they are using location services, how easy is it to identify communities based on location, say, or postcode, rather than hashtag? That is, how easy is it to find people who are colocated in space, rather than topic, as in the hashtag communities? Very easy, it turns out&#8230;</p>
<p>One of the things I&#8217;ve been playing with lately is &#8220;community detection&#8221;, particularly in the context of people who are using a particular hashtag on Twitter. The recipe in that case runs something along the lines of: find a list of twitter user names for people using a particular hashtag, then grab their Twitter friends lists and look to see what community structures result (e.g. look for clusters within the different twitterers). The first part of that recipe is key, and generalisable: <em>find a list of twitter user names</em>&#8230;</p>
<p>So, can we create a list of names based on co-location? Yep &#8211; easy: Twitter search offers a &#8220;near:&#8221; search limit that lets you search in the vicinity of a location.</p>
<p>Here&#8217;s a Yahoo Pipe to demonstrate the concept &#8211; <a href="http://pipes.yahoo.com/pipes/pipe.info?_id=f21fb52dc7deb31f5fffc400c780c38d" onclick="urchinTracker('/outgoing/pipes.yahoo.com/pipes/pipe.info?_id=f21fb52dc7deb31f5fffc400c780c38d&amp;referer=');">Twitter hyperlocal search with map output</a>:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5119980851/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5119980851/?referer=');"><img src="http://farm5.static.flickr.com/4128/5119980851_f360622526.jpg" width="500" height="288" alt="Pipework for twitter hyperlocal search with map output" /></a></p>
<p>[UPDATE: since grabbing that screenshot, I've tweaked the pipe to make it a little more robust...]</p>
<p>And here&#8217;s the result:</p>
<p><a href="http://pipes.yahoo.com/pipes/pipe.info?_id=f21fb52dc7deb31f5fffc400c780c38d" title="Photo Sharing" onclick="urchinTracker('/outgoing/pipes.yahoo.com/pipes/pipe.info?_id=f21fb52dc7deb31f5fffc400c780c38d&amp;referer=');"><img src="http://farm2.static.flickr.com/1184/5120578262_71998ab3db.jpg" width="500" height="472" alt="Twitter local trend" /></a></p>
<p>It&#8217;s easy enough to generate a widget of the result &#8211; just click on the <em>Get as Badge</em> link to get the embeddable widget code, or add the widget direct to a dashboard such as iGoogle:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5119977161/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5119977161/?referer=');"><img src="http://farm2.static.flickr.com/1157/5119977161_86ca8e98b5.jpg" width="399" height="357" alt="Yahoo pipes map badge" /></a></p>
<p>(Note that this pipe also sets the scene for a possible demo of a &#8220;live pipe&#8221;, e.g. one that subscribes to searches via pubsubhubbub, so that whenever a new tweet appears it&#8217;s pushed to the pipe, and that makes the output live, for example by using a webhook.)</p>
<p>You can also grab the KML output of the pipe using a URL of the form:<br />
<em>http://pipes.yahoo.com/pipes/pipe.run?_id=f21fb52dc7deb31f5fffc400c780c38d&amp;_render=kml&amp;distance=1&amp;location=<strong>YOUR+LOCATION+STRING</strong></em><br />
and post it into a Google maps search box&#8230; like <a href="http://maps.google.com/maps?f=q&amp;source=s_q&amp;q=http://pipes.yahoo.com/pipes/pipe.run?_id=f21fb52dc7deb31f5fffc400c780c38d&amp;_render=kml&amp;distance=1.1&amp;location=Cardiff+Bay" onclick="urchinTracker('/outgoing/maps.google.com/maps?f=q_amp_source=s_q_amp_q=http_//pipes.yahoo.com/pipes/pipe.run?_id=f21fb52dc7deb31f5fffc400c780c38d_amp_render=kml_amp_distance=1.1_amp_location=Cardiff+Bay&amp;referer=');">this</a>:</p>
<p><a href="http://maps.google.com/maps?f=q&amp;source=s_q&amp;q=http://pipes.yahoo.com/pipes/pipe.run?_id=f21fb52dc7deb31f5fffc400c780c38d&amp;_render=kml&amp;distance=1.1&amp;location=Cardiff+Bay" title="Photo Sharing" onclick="urchinTracker('/outgoing/maps.google.com/maps?f=q_amp_source=s_q_amp_q=http_//pipes.yahoo.com/pipes/pipe.run?_id=f21fb52dc7deb31f5fffc400c780c38d_amp_render=kml_amp_distance=1.1_amp_location=Cardiff+Bay&amp;referer=');"><img src="http://farm5.static.flickr.com/4035/5121350488_6beb4f9743.jpg" width="500" height="279" alt="Yahoo pipe in google map" /></a></p>
<p>(If you try to refresh the Google map, it may suffer from result caching, in which case you have to cache bust, e.g. by changing the <em>distance</em> value in the pipe URL to 1.0, 1.00, etc&#8230; ;-)</p>
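<p>As a rough sketch of that cache-busting trick (written for Python 3; the helper name <em>pipe_kml_url</em> and its <em>bust</em> argument are made up for illustration, and only the pipe id and URL parameters come from the link above), the KML URL can be built with a padded <em>distance</em> value:</p>

```python
from urllib.parse import urlencode

# Pipe endpoint and id as given in the KML URL above
PIPE_RUN_URL = 'http://pipes.yahoo.com/pipes/pipe.run'
PIPE_ID = 'f21fb52dc7deb31f5fffc400c780c38d'

def pipe_kml_url(location, distance=1, bust=0):
    """Build the pipe's KML URL (hypothetical helper).

    `bust` appends that many trailing zeros to the distance
    (1 -> 1.0 -> 1.00), so the URL text changes while the
    numeric distance stays the same.
    """
    dist = str(distance)
    if bust > 0:
        if '.' not in dist:
            dist += '.'
        dist += '0' * bust
    params = [
        ('_id', PIPE_ID),
        ('_render', 'kml'),
        ('distance', dist),
        ('location', location),
    ]
    # urlencode turns spaces into '+', as in the location string above
    return PIPE_RUN_URL + '?' + urlencode(params)

print(pipe_kml_url('Cardiff Bay', distance=1, bust=2))
```

<p>Bumping <em>bust</em> each time the map looks stale should give Google Maps a URL it has not seen before, assuming the cache is keyed on the literal URL.</p>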
<p>Something else that could be useful for community detection is to search through the localised/co-located tweets for popular hashtags. Whilst we could probably do this in a separate pipe (left as an exercise for the reader), maybe by using a regular expression to extract hashtags and then the unique block filtering on hashtags to count the reoccurrences, here&#8217;s a Python recipe:</p>
<pre>import re
import simplejson, urllib

def getYahooAppID():
  appid='YOUR_YAHOO_APP_ID_HERE'
  return appid

def placemakerGeocodeLatLon(address):
  encaddress=urllib.quote_plus(address)
  appid=getYahooAppID()
  url='http://where.yahooapis.com/geocode?location='+encaddress+'&amp;flags=J&amp;appid='+appid
  data = simplejson.load(urllib.urlopen(url))
  if data['ResultSet']['Found']&gt;0:
    for details in data['ResultSet']['Results']:
      return details['latitude'],details['longitude']
  else:
    return False,False

def twSearchNear(tweeters,tags,num,place='mk7 6aa,uk',term='',dist=1):
  t=int(num/100)
  page=1
  lat,lon=placemakerGeocodeLatLon(place)
  while page&lt;=t:
    url='http://search.twitter.com/search.json?geocode='+str(lat)+'%2C'+str(lon)+'%2C'+str(1.0*dist)+'km&amp;rpp=100&amp;page='+str(page)+'&amp;q=+within%3A'+str(dist)+'km'
    if term!='':
      url+='+'+urllib.quote_plus(term)

    page+=1
    data = simplejson.load(urllib.urlopen(url))
    for i in data['results']:
      # skip old-style (manual) retweets
      if not i['text'].startswith('RT @'):
        u=i['from_user'].strip()
        # count tweets per user
        if u in tweeters:
          tweeters[u]['count']+=1
        else:
          tweeters[u]={}
          tweeters[u]['count']=1
        # count hashtag occurrences
        ttags=re.findall(&quot;#([a-z0-9]+)&quot;, i['text'], re.I)
        for tag in ttags:
          if tag not in tags:
            tags[tag]=1
          else:
            tags[tag]+=1

  return tweeters,tags

''' Usage:
tweeters={}
tags={}
num=100 #number of search results, best as a multiple of 100 up to max 1500
location='PLACE YOU WANT TO SEARCH AROUND'
term='OPTIONAL SEARCH TERM TO NARROW DOWN SEARCH RESULTS'
tweeters,tags=twSearchNear(tweeters,tags,num,location,term)
'''
</pre>
<p>What this code does is:<br />
- use Yahoo placemaker to geocode the address provided;<br />
- search in the vicinity of that area (within <em>dist</em> km of it; the default is 1.0 km);<br />
- identify the unique twitterers, as well as counting the number of times they tweeted in the search results;<br />
- identify the unique tags, as well as counting the number of times they appeared in the search results.</p>
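<p>To turn those two dictionaries into a ranked list, something like the following might do. It is a minimal Python 3 sketch: <em>top_items</em> is a made-up helper name, and it assumes <em>tweeters</em> and <em>tags</em> have the shapes built by the recipe above (user name to <em>{'count': n}</em>, and tag to count).</p>

```python
def top_items(tweeters, tags, n=10):
    """Return the n most frequent tweeters and tags (hypothetical helper).

    tweeters: {user_name: {'count': int}}
    tags:     {tag: int}
    """
    ranked_tweeters = sorted(
        tweeters.items(), key=lambda kv: kv[1]['count'], reverse=True
    )[:n]
    ranked_tags = sorted(
        tags.items(), key=lambda kv: kv[1], reverse=True
    )[:n]
    # Flatten the tweeter records to (name, count) pairs
    return [(u, d['count']) for u, d in ranked_tweeters], ranked_tags

# Toy data, not real search results
example_tweeters = {'user_a': {'count': 1}, 'user_b': {'count': 3}}
example_tags = {'tagone': 2, 'tagtwo': 5}
print(top_items(example_tweeters, example_tags, n=5))
```

<p>Note that the recipe counts tags case-sensitively, so the same tag written in different cases gets separate counts; lowercasing each tag before counting would merge them.</p>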
<p>Here&#8217;s an example output for a search around &#8220;Bath University, UK&#8221;:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5120002145/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5120002145/?referer=');"><img src="http://farm5.static.flickr.com/4066/5120002145_c921b75a8c.jpg" width="192" height="500" /></a></p>
<p>Having got the list of Twitterers (as discovered by a location based search), we can then look at their social connections as in the hashtag community visualisations:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5120046267/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5120046267/?referer=');"><img src="http://farm5.static.flickr.com/4019/5120046267_b041d17409.jpg" width="500" height="317" alt="Community detected around Bath U... Hmm, people there who shouldn't be?!" /></a></p>
<p>And I&#8217;m wondering why the likes of @pstainthorp and @martin_hamilton appear to be in Bath. Is the location search broken, picking up stale data, or is there some other error&#8230;? Or is there maybe a UKOLN event on today, I wonder..?</p>
<p>PS Looking at a search near &#8220;University of Bath&#8221; in the web based Twitter search, it seems that: a) there aren&#8217;t many recent hits; b) the search results pull up tweets going back in time&#8230;</p>
<p>Which suggests to me:<br />
1) the code really should have a time window to filter the tweets by time, e.g. excluding tweets that are more than a day or even an hour old; (it would be so nice if Twitter search API offered a <em>since_time:</em> limit, although I guess it does offer <em>since_id</em>, and the web search does offer <em>since:</em> and <em>until:</em> limits that work on date, and that could be included in the pipe&#8230;)<br />
2) where there aren&#8217;t a lot of current tweets at a location, we can get a profile of that location based on people who passed through it over a period of time?</p>
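<p>The time window in point 1 could be sketched as a post-filter on the search results. This is a Python 3 sketch under an assumption: that each result carries a <em>created_at</em> string in the RFC 2822 style (e.g. &#8220;Wed, 27 Oct 2010 13:15:46 +0000&#8221;). The function name <em>recent_only</em> and its arguments are made up for illustration.</p>

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def recent_only(results, max_age=timedelta(days=1), now=None):
    """Keep only results newer than `max_age` (hypothetical helper).

    Assumes each result has a 'created_at' string in RFC 2822 form
    with an explicit UTC offset such as '+0000'.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    return [
        r for r in results
        if parsedate_to_datetime(r['created_at']) >= cutoff
    ]

# Toy data, not real search results
sample = [
    {'text': 'recent', 'created_at': 'Wed, 27 Oct 2010 13:15:46 +0000'},
    {'text': 'stale', 'created_at': 'Tue, 26 Oct 2010 09:00:00 +0000'},
]
fixed_now = datetime(2010, 10, 27, 14, 0, tzinfo=timezone.utc)
print(recent_only(sample, max_age=timedelta(hours=1), now=fixed_now))
```

<p>Filtering after the fetch means stale tweets still use up the search result quota, which is why a server-side time limit would be the nicer option.</p>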
<p>UPDATE: Problem solved&#8230;</p>
<p>The location search is picking up tweets like this:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5121632148/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5121632148/?referer=');"><img src="http://farm2.static.flickr.com/1178/5121632148_dfd3f583d9.jpg" width="500" height="206" alt="Twitter locations..." /></a></p>
<p>but when you click on the actual tweet link, it&#8217;s something different &#8211; a retweet:</p>
<p><a href="http://www.flickr.com/photos/psychemedia/5121036649/" title="Photo Sharing" onclick="urchinTracker('/outgoing/www.flickr.com/photos/psychemedia/5121036649/?referer=');"><img src="http://farm2.static.flickr.com/1124/5121036649_6cbe89f0bb.jpg" width="500" height="463" alt="Twitter retweets pass through the original location" /></a></p>
<p>So &#8220;official&#8221; Twitter retweets appear to pass through the location data of the original tweet, rather than the person retweeting&#8230; so I guess my script needs to identify official twitter retweets and dump them&#8230; </p>
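<p>One possible shape for that retweet filter is sketched below in Python 3. It rests on an unverified assumption: that an official retweet can be recognised by a <em>retweeted_status</em> key on the result, as on Twitter REST API status objects; the search results may not expose this, in which case some other marker would be needed. The function names are made up for illustration.</p>

```python
def is_retweet(result):
    """Guess whether a search result is a retweet (hypothetical helper).

    ASSUMPTION: official retweets carry a 'retweeted_status' key,
    as REST API status objects do; search results may differ.
    Old-style manual retweets are caught by their 'RT @' prefix.
    """
    if result.get('retweeted_status'):
        return True
    return result.get('text', '').lstrip().startswith('RT @')

def drop_retweets(results):
    """Return only the results that do not look like retweets."""
    return [r for r in results if not is_retweet(r)]

# Toy data, not real search results
sample = [
    {'text': 'An original tweet'},
    {'text': 'RT @someone: a manual retweet'},
    {'text': 'An official retweet', 'retweeted_status': {'id': 1}},
]
print(drop_retweets(sample))
```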
]]></content:encoded>
			<wfw:commentRss>http://blog.ouseful.info/2010/10/27/discovering-co-location-communities-tweets-near-wherever/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://farm5.static.flickr.com/4019/5120046267_b041d17409.jpg" length="" type="" />
<enclosure url="http://farm2.static.flickr.com/1124/5121036649_6cbe89f0bb.jpg" length="" type="" />
<enclosure url="http://farm2.static.flickr.com/1178/5121632148_dfd3f583d9.jpg" length="" type="" />
<enclosure url="http://farm2.static.flickr.com/1157/5119977161_86ca8e98b5.jpg" length="" type="" />
<enclosure url="http://farm5.static.flickr.com/4066/5120002145_c921b75a8c.jpg" length="" type="" />
<enclosure url="http://farm5.static.flickr.com/4035/5121350488_6beb4f9743.jpg" length="" type="" />
<enclosure url="http://farm5.static.flickr.com/4128/5119980851_f360622526.jpg" length="" type="" />
<enclosure url="http://farm2.static.flickr.com/1184/5120578262_71998ab3db.jpg" length="" type="" />
		</item>
	</channel>
</rss>

