Video is here:
http://www.youtube.com/watch?v=58Gzlq4zSDk&feature=player_embedded
Investigative journalism
Last month a local newspaper reported that a big new data center had opened in Salt Lake City with a mystery anchor client. The paper believed the client was Twitter, since the company has said it will open its first off-site data center in Utah at an undisclosed date.
We used Needlebase to look at all the tweets from people on the Twitter list of Twitter staff members and extract the username, message body and location, if exposed. Needlebase scraped the last 1500 tweets in less than 5 minutes. We displayed them on a map and saw that there was just one tweet published in that time from Utah: a Twitter Site Operations Technician who had just left San Francisco to move to Salt Lake City, complaining about Qwest router problems. That wasn't quite confirmation, but it sure felt like a valuable clue and was very easy to come by thanks to Needlebase.
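For readers who'd rather filter than eyeball a map, here is a minimal Python sketch of the same idea: take scraped rows of username, message and location, and keep only the ones whose location mentions a given place. The field names and the sample rows are made up for illustration; this is not Needlebase's export format, and a plain text match is a stand-in for the map view we actually used.

```python
def tweets_from(rows, place):
    """Return rows whose free-text location mentions `place` (case-insensitive)."""
    needle = place.lower()
    # Location is optional ("if exposed"), so treat a missing value as empty text.
    return [row for row in rows if needle in (row.get("location") or "").lower()]


# Hypothetical sample rows, invented for this sketch.
SAMPLE = [
    {"username": "staffer_a", "message": "Lunch in SoMa", "location": "San Francisco, CA"},
    {"username": "staffer_b", "message": "Router trouble again", "location": "Salt Lake City, Utah"},
    {"username": "staffer_c", "message": "Shipping today", "location": None},
]

utah = tweets_from(SAMPLE, "utah")
for row in utah:
    print(row["username"], "-", row["message"])
```

A substring match like this will miss tweets whose location is given only as coordinates or a city name, so it is a rough first pass at best.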
Data Re-Sorting
Last night I found a solution to a long-running issue I've been struggling with. I've got this list of 300 blogs around the web that cover geotechnology (that's a whole other story) and have them all run through Postrank. That service ranks them in order of most to least social media and reader engagement per blog post.
Wouldn't it be great to extract that data over time, to track it and to turn it into blog posts? I think it would. I couldn't figure out how to get all the data out that I wanted though.
Enter Needlebase. Last night I pointed Needle to my Postrank pages for geotech blogs and in minutes it pulled down all the data I wanted. I exported that data as a CSV, uploaded it to Google Docs as a spreadsheet, did a little subtraction and now have the following chart tracking the top 300 geotech blogs on the web. In my handy spreadsheet, I was able to set up a function to show me which blogs jumped or fell in the rankings the most over the previous week. Thanks, Needlebase!
Event Preparation
I've written here about how to use Mechanical Turk to get ready and rock an industry event. Needlebase can prove useful for that as well.
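Going back to the blog rankings in the Data Re-Sorting example for a moment: the "little subtraction" step can also be scripted instead of done in a spreadsheet. Here is a minimal Python sketch, assuming two weekly CSV exports with `blog` and `rank` columns. Those column names and the sample rows are hypothetical, not the real Postrank or Needlebase export layout.

```python
import csv
import io


def load_ranks(csv_text):
    """Map blog name -> integer rank from CSV text with 'blog' and 'rank' columns (assumed layout)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["blog"]: int(row["rank"]) for row in reader}


def rank_changes(last_week, this_week):
    """Return (blog, change) pairs for blogs present in both weeks, biggest movers first.

    change = last week's rank minus this week's rank,
    so a positive number means the blog climbed the list.
    """
    changes = [
        (blog, last_week[blog] - rank)
        for blog, rank in this_week.items()
        if blog in last_week
    ]
    return sorted(changes, key=lambda pair: abs(pair[1]), reverse=True)


# Made-up sample data standing in for two weekly exports.
LAST_WEEK = "blog,rank\nAlpha Maps,1\nBeta Geo,2\nGamma GIS,3\n"
THIS_WEEK = "blog,rank\nGamma GIS,1\nAlpha Maps,2\nBeta Geo,3\n"

movers = rank_changes(load_ranks(LAST_WEEK), load_ranks(THIS_WEEK))
for blog, change in movers:
    print(f"{blog}: {change:+d}")
```

Blogs that appear in only one of the two weeks are skipped here; whether to treat them as new entries or drop-offs is a choice the sketch leaves open.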
The DIY Data Hackers Toolkit
I put Needle in my mind in between two other wonderful tools. On one end of the spectrum is the now Yahoo-acquired Dapper, which anyone can use to build an RSS feed from changes made to any field on any web page. (See: The Glory and Bliss of Screen Scraping and How Yahoo's Latest Acquisition Stole and Broke My Heart)
On the other end of the spectrum is the brand-new Extractiv, a bulk web-crawling and semantic analysis tool that's also remarkably easy to use. Earlier this month I used Extractiv to search across 300 top geotech blogs for all instances of the word "ESRI," all entities mentioned in relation to ESRI and the words used to describe those relations. The service processed 125,000 pages and spit out my results in less than an hour for less than a dollar. That's incredible - it's a game changer.
Needlebase is too. It sits somewhere in between Dapper and Extractiv, I think. These tools are democratizing the ability to extract and work with data from across the web. They are to text processing what blogging was to text publishing.