WiseSites: A Web Scraper, Indexer, and Website Generator

1. WiseSites

Several years ago, I built a web scraper, indexer, and website generator that I collectively call WiseSites. It typically processes historical texts, although any type of text works and I have used the system elsewhere. There is a wealth of freely accessible historical knowledge because, under U.S. copyright law, works generally enter the public domain 95 years after publication or 70 years after the author's death, depending on when they were published. Let's take an illustrative look at how a search functions in WiseSites.

2. Demonstration

wise-sites-king-of-egypt.png

Figure 1: Searching for the phrase "King of Egypt"

In the above picture, the phrase "King of Egypt" is typed into the search bar. When the user presses "Enter," every sentence from a text (typically a book) that contains the phrase appears dynamically in a scrollable list of "blocks." Each block names a text and lists the sentences in it that contain the phrase. Each matching sentence also has a "Go to Text" button, a hyperlink to that specific sentence in the original text. This gives you context for the information you are interested in, all in one convenient place.
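The search behavior described above can be sketched in a few lines of Python. This is only an illustration, not the actual WiseSites code: the sample corpus, the naive sentence splitter, the function names, and the link scheme used for "Go to Text" are all assumptions made for the example.

```python
import re

# Made-up sample data standing in for scraped texts (title -> full text).
CORPUS = {
    "Text A": "The King of Egypt sent envoys. They arrived in spring. "
              "Later the king of Egypt died.",
    "Text B": "Rome declared war. No pharaoh is mentioned here.",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def search(corpus, phrase):
    """Group matching sentences into one "block" per text.

    Returns {title: [(sentence_index, sentence), ...]} for every text with
    at least one sentence containing the phrase (case-insensitive).
    """
    needle = phrase.lower()
    blocks = {}
    for title, text in corpus.items():
        hits = [
            (i, sentence)
            for i, sentence in enumerate(split_sentences(text))
            if needle in sentence.lower()
        ]
        if hits:
            blocks[title] = hits
    return blocks


def sentence_anchor(title, index):
    """Hypothetical "Go to Text" link: one page per text, one anchor per sentence."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{slug}.html#s{index}"


if __name__ == "__main__":
    for title, hits in search(CORPUS, "King of Egypt").items():
        print(title)
        for index, sentence in hits:
            print(f"  {sentence}  [{sentence_anchor(title, index)}]")
```

A real system working over whole books would presumably split sentences once at indexing time and look phrases up in a prebuilt index instead of scanning every text on each query, but the shape of the result (blocks of sentences, each with a link back into its text) would be the same.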

The hyperlinks on the left side of the page each lead to a page that works the same way but narrows the search to a particular topic. Let's try this out and see how it works. I like history, so let's take a look at the hyperlink "GrecoRomanWise."

greco-roman-wise.png

Figure 2: Searching for the phrase "war with" in GrecoRomanWise

The only difference from the main WiseSites page is that the top of the page lists the titles of the texts that can be searched. In the above picture, I searched for the phrase "war with." Clicking the first "Go to Text" button under the title "Decline and Fall of the Roman Empire" takes us to the context of that sentence:

greco-roman-wise-context.png

Figure 3: Context

That is just about all of the user-facing functionality; that is the nature of searching text. Google or any other search engine is not that much different. Most of the work on WiseSites went into dealing with relatively large amounts of data. There was a lot of behind-the-scenes work, such as making my own customized scraper, parser, and database… The benefit of this high degree of customization, and what satisfies me the most about WiseSites, is that after a six-year hiatus there is no bit rot!

3. Future

WiseSites used to be publicly available, but I stopped hosting it because most people do not like to read and search. As I find the time, I would like to expand the search capabilities, add more texts to the repository, and find the right market for this system. For example, I might make a mobile app for a select popular topic. I have roughly tested these waters.

wise-sites-phone.png

Figure 4: Searching "emolument" in the U.S. Constitution via a mobile app.