Recovering a website from search engine caches
As noted earlier, I screwed up this blog by zapping the database without having a working, current backup. I wasn't too bothered, as I'd only made 60 posts over the course of six months, but I was a bit annoyed with myself for not being more careful. The blog consisted of me pontificating about how the USPS sucks and posting links to other blogs where people had posted interesting things or ZipitZ2-related items. Most of it really wasn't that interesting, but there were a few posts and comments that I really wanted to recover, so I started looking for ways to recover webpages from search engine caches.
Pulling a single page from a search engine cache is very easy: just search for the website in the search engine and click the cached link. But I wasn't previously aware of a method of pulling a complete website from a cache automatically. I'm sure someone will tell me that it's possible using wget or cURL (almost anything is possible using a combination of wget, cURL, bash and netcat), but I don't know how. I googled for a while and stumbled over a cool piece of software called Warrick. Warrick is a program written in Perl (figures) that runs on *nix and Windows and recovers as much of a website as it can from the Internet Archive and the Google, Bing and Yahoo caches. It works best within the first few days of a website disappearing, as over time the search engine caches degrade and less and less of the desired data is available.
Using Warrick yesterday I was able to recover pretty much all the posts from the blog, and I'll repost them over the next few weeks. I'm not sure whether I'll post them as new posts with a new date or try to backdate them. If you know how to backdate posts in WordPress I'd love to hear it, as the only method I can think of off the top of my head is direct DB manipulation.
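For the Internet Archive part, at least, there is a scriptable route you can drive with nothing more than wget or cURL. The Wayback Machine exposes a CDX API (at web.archive.org/cdx/search/cdx) that lists the captures it holds for a URL pattern, and adding `id_` after the timestamp in a snapshot URL returns the page as archived, without the Wayback toolbar. The Python below is a minimal sketch of that idea, not how Warrick works: the function names are hypothetical, example.com is a placeholder, and the network fetch is left out so the parsing can be tried offline.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site: str) -> str:
    """Build a CDX API query listing successful captures of every URL under `site`."""
    params = {
        "url": site.rstrip("/") + "/*",  # trailing /* = prefix match on the whole site
        "output": "json",                # JSON rows; the first row is a header
        "fl": "timestamp,original",      # only the fields we need
        "filter": "statuscode:200",      # skip redirects and errors
        "collapse": "urlkey",            # one row per distinct URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(cdx_json: str) -> list[str]:
    """Turn a CDX JSON response into raw-snapshot URLs (id_ drops the Wayback toolbar)."""
    rows = json.loads(cdx_json)
    if not rows:
        return []  # no captures at all
    header, *captures = rows
    ts, orig = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{row[ts]}id_/{row[orig]}" for row in captures]
```

Fetching `cdx_query_url("example.com")` (for instance with `urllib.request.urlopen`) and passing the decoded body to `snapshot_urls` would give a list of URLs to feed to wget. `collapse=urlkey` keeps a single capture per URL; drop it to see every capture and pick the one you want. This only covers the Internet Archive, not the search engine caches Warrick also searches.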
Doh! Error between keyboard and chair
Yesterday I was working on one of my websites, http://camputerslynx.info, which is about an old British 8-bit home micro from the '80s, the Camputers Lynx. Because most of the content is made up of PDFs and zip files, I decided that rather than continue to use handcrafted HTML (it's also been a wiki and a CMS) I'd try a document manager; OpenDocMan was the choice. Everything was going fine with the install until the question "Is this a new installation?" I answered yes and clicked OK. I have a single database holding the tables for several sites, kept apart by table prefixes, which has worked quite well in the past. This time it didn't, because rather than just creating a new series of tables with the odm_ prefix, OpenDocMan decided to delete the existing database and create a new one. DOH!!!!
Now some of you will ask, "Well, didn't you have a backup?" Yes, I did; unfortunately it was two months old (I had been planning on making a new one this week) and it was corrupted.
So I'm having to restart this blog from scratch. A few good things did come out of this c***-up, though.
1. I will make my backups more regularly.
2. I now have a pristine database for WordPress, and my host let me have another database, so I no longer have to use a single database for all my sites.
3. I discovered a great piece of software called Warrick that has enabled me to get most of the text back from the destroyed blog. I have lost a few things, but I got probably 90% back. I'll write up a post about Warrick in a few days.
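On making backups more regularly: the cheapest insurance is a dated dump taken on a schedule and again right before anything risky, like pointing an installer at a shared database. Below is a minimal Python sketch, assuming MySQL (which WordPress uses) and the standard `mysqldump` client on the PATH. The database name, user and backup directory are hypothetical placeholders, and the completeness check is only a heuristic: it relies on mysqldump's default behaviour of ending a finished dump with a `-- Dump completed` comment.

```python
import datetime
import os
from pathlib import Path
from typing import Optional


def backup_command(db_name: str, user: str, backup_dir: str,
                   now: Optional[datetime.datetime] = None) -> list[str]:
    """Return a mysqldump argv that writes a timestamped .sql file for one database."""
    now = now or datetime.datetime.now()
    outfile = Path(backup_dir) / f"{db_name}-{now:%Y%m%d-%H%M%S}.sql"
    return [
        "mysqldump",
        "--single-transaction",   # consistent snapshot of InnoDB tables without locking
        f"--user={user}",
        "--password",             # no value given, so mysqldump prompts for it
        f"--result-file={outfile}",
        db_name,
    ]


def dump_looks_complete(path: Path) -> bool:
    """Heuristic: a dump that finished normally ends with a '-- Dump completed' comment."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - 200))  # only the tail of the file matters
        tail = f.read().decode("utf-8", errors="replace")
    return "-- Dump completed" in tail
```

Something like `subprocess.run(backup_command("wordpress", "blog", "/home/me/backups"), check=True)` run from cron would cover the "regularly" part. A truncated or aborted dump fails the tail check, but the only real proof a backup is good is restoring it somewhere now and then.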