Daines Walowe NGULAMU


A web crawler is a program that traverses the World Wide Web in an orderly manner to collect data based on a search query. Web crawling is therefore the process of finding web pages and downloading them automatically. Because of the sheer volume of the World Wide Web, crawlers have difficulty retrieving relevant, high-quality information that matches the user's search query. The same characteristic means that crawlers may download duplicate and near-duplicate web pages for a given query. Such pages reduce the quality of search indexes, increase storage cost, and distort page ranking. In order to improve the performance of the web crawler, an ontology-based web crawler with a near-duplicate detection system was designed. Because crawling is an open-ended process, the experiment was carried out using secondary data from a sample web site. With the two approaches combined, the ontology-based web crawler retrieves pages relevant to the user's search query, while the near-duplicate detection system eliminates redundant data.

Key Words: Crawlers, Ontology-Based Web Crawler, Near-Duplicate Detection System
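
The combination described in the abstract can be illustrated with a minimal sketch. The abstract does not state which relevance measure or near-duplicate technique was used, so everything below is an assumption made for illustration: the ontology is reduced to a plain set of concept terms, relevance is the fraction of those terms found on a page, and near-duplicates are detected with word shingling and Jaccard similarity. The names shingles, jaccard, relevance and filter_pages, and both thresholds, are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) of a page."""
    words = tokenize(text)
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def relevance(text, ontology_terms):
    """Fraction of ontology terms that occur on the page (assumed measure)."""
    if not ontology_terms:
        return 0.0
    return len(set(tokenize(text)) & ontology_terms) / len(ontology_terms)


def filter_pages(pages, ontology_terms, min_relevance=0.5, dup_threshold=0.8):
    """Keep relevant pages, skipping any page too similar to one already kept.

    pages is an iterable of (url, text) pairs; the thresholds are
    illustrative defaults, not values taken from the paper.
    """
    kept_urls = []
    kept_shingles = []
    for url, text in pages:
        if relevance(text, ontology_terms) < min_relevance:
            continue  # off-topic according to the ontology terms
        page_shingles = shingles(text)
        if any(jaccard(page_shingles, s) >= dup_threshold for s in kept_shingles):
            continue  # near-duplicate of a page that was already kept
        kept_urls.append(url)
        kept_shingles.append(page_shingles)
    return kept_urls


sample_pages = [
    ("a", "Web crawler downloads pages and builds a search index for users"),
    ("b", "Web crawler downloads pages and builds a search index for users today"),
    ("c", "Recipe for baking bread with flour and water"),
    ("d", "An ontology guides the crawler toward pages relevant to the search query"),
]
print(filter_pages(sample_pages, {"crawler", "search"}))
```

In this sample, page "b" differs from page "a" by a single trailing word and page "c" contains none of the ontology terms. Comparing every new page against every kept page is quadratic in the number of pages; a crawl of realistic size would likely need a fingerprinting scheme instead, which this sketch does not attempt.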




