Mitt Romney’s transition team website went live for a few embarrassing minutes. Of course he had a transition website: you don’t just build one from scratch without a little advance planning. But it would have been interesting to peek at what that website looked like and what his plan might have been. Unfortunately, the folks at Politicalwire.com who caught on to the site were only able to get screen captures of what was there. Basically, they took a picture of their screen while looking at the website.
Screen captures are better than nothing. But they suffer from a number of pesky problems. For a start, you can only see what was on the screen at the time: any content that might have required you to scroll down the page is not visible. Also, you’re limited to the pages they were able to screen cap. Anything beyond those is lost. Finally, as one nuts-and-bolts disadvantage, you can’t copy and paste text from an image. Who wants to transcribe? What is this, 1990? ( #LazyAsHell )
What would be really good is to capture the documents themselves: all the individual pages, with links, text and images, reassembled for viewing elsewhere.
And you can do just that with software variously called “offline browsers” or “web spiders.” The role of an offline browser is basically to harvest an entire website and recreate it on your desktop. Webreaper, SuperBot, HTTrack: there are a host of options, and they all do more or less the same thing. In fact, there are even plugins for Chrome and Firefox as well as apps for iPhone and Android.
Most have friendly interfaces and require very little tech savvy to operate. Just tell your chosen tool the URL you’d like to start on and click “Go,” or whatever. The tool crawls right through the site, capturing each individual page as it goes, then following the links on the page to find more to capture.
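For the curious, here is roughly what that crawl loop looks like under the hood. This is only a minimal sketch in Python using the standard library, not how Webreaper, HTTrack or any other particular tool actually works. The names `LinkParser`, `fetch_page` and `crawl` are made up for illustration, and the sketch assumes you only want pages on the same host as the starting URL.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Download one page and return its HTML as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_page, max_pages=50):
    """Breadth-first crawl of a single host; returns a dict of {url: html}."""
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        pages[url] = html  # "capture" the page
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment
            link, _ = urldefrag(urljoin(url, href))
            # Assumption: stay on the starting host, visit each URL once
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling `crawl()` with a starting URL hands back the raw HTML of every page it reached, up to `max_pages`. The real tools go further: they also pull down images and stylesheets and rewrite the links so the copy works offline, and the better-behaved ones check a site's robots.txt before crawling.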
So, enterprising information-seekers – be they bloggers, journalists or just the frequently-curious – would do well to have a suite of these tools on hand for whenever they might need them. Be cautioned, however, that republishing another site’s content on your own might well run afoul of copyright laws. Still, fair use is a reasonable enough argument for keeping records of government and political websites at minimum.