
IterativeCrawler

The IterativeCrawler is similar to the RecursiveCrawler in that it also overrides the crawl method to visit pages. However, it adds functionality that allows it to visit individual pages one at a time, step by step, rather than following all links immediately. Note that the IterativeCrawler should only put valid pages into its pendingPages: those that exist and for which the validPageLink(page) method of Crawler returns true.

Skeleton

protected ArraySet pendingPages
  A list of pages that remain to be visited. As a single page is visited, any valid links found on it are added to this list to be visited later. This list should contain only valid, existing pages which can be visited and have not yet been visited.

public IterativeCrawler()
  Creates an empty crawler.

public void crawl(String pageFileName)
  Master crawl method which starts at the given page and visits all reachable pages. It adds the given page to pendingPages and calls crawlRemaining().

public void crawlRemaining()
  Enters a loop that crawls individual pages until there are no pending pages remaining. During execution, the pendingPages set grows and shrinks but should eventually shrink to size 0.

public void addPendingPage(String pageFileName)
  Adds a page to be visited at a later time by the crawler. It can be called independently to queue pages for later, or from other methods of IterativeCrawler. It is assumed that pageFileName is valid and exists, so it can be visited and parsed.

public int pendingPagesSize()
  Returns the number of pages remaining to visit.

public String pendingPagesString()
  Returns a string with each pending page to visit on its own line.

public void crawlNextPage()
  Crawls a single page, which is retrieved and removed from the list of pending pages. The next page to be crawled should be the last page in the pendingPages set; use the asList() method to get a list view of the set and the remove() method of lists to accomplish this. The page is added to the crawler's foundPages, and any links on it are added either to pendingPages or to skippedPages. The process is very similar to the pseudocode in RecursiveCrawler, but replaces recursive calls with additions to pendingPages.
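The control flow above can be sketched as follows. This is a minimal, self-contained illustration, not the assignment's actual code: the real Crawler base class, ArraySet, and file parsing are replaced here by a hypothetical in-memory page graph (LINKS), and pendingPages/foundPages/skippedPages use standard collections. The shape of crawl, crawlRemaining, and crawlNextPage matches the method descriptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IterativeCrawler {
    // Stand-in for the page files: page name -> links found on that page.
    static final Map<String, List<String>> LINKS = Map.of(
        "A.html", List.of("B.html", "C.html", "missing.html"),
        "B.html", List.of("C.html"),
        "C.html", List.of("A.html"));

    // Pages that remain to be visited (ArraySet stand-in).
    protected List<String> pendingPages = new ArrayList<>();
    protected Set<String> foundPages = new LinkedHashSet<>();
    protected Set<String> skippedPages = new LinkedHashSet<>();

    // Stand-in for Crawler.validPageLink(page): valid if the page exists.
    protected boolean validPageLink(String page) {
        return LINKS.containsKey(page);
    }

    public void addPendingPage(String pageFileName) {
        pendingPages.add(pageFileName);
    }

    public int pendingPagesSize() {
        return pendingPages.size();
    }

    public String pendingPagesString() {
        return String.join("\n", pendingPages);
    }

    // Master crawl: seed pendingPages with the start page, then drain it.
    public void crawl(String pageFileName) {
        addPendingPage(pageFileName);
        crawlRemaining();
    }

    // Loop until no pending pages remain; pendingPages grows and
    // shrinks during execution but eventually reaches size 0.
    public void crawlRemaining() {
        while (pendingPagesSize() > 0) {
            crawlNextPage();
        }
    }

    // Crawl one page: remove the LAST pending page, record it in
    // foundPages, and classify each of its links as pending or skipped.
    public void crawlNextPage() {
        String page = pendingPages.remove(pendingPages.size() - 1);
        foundPages.add(page);
        for (String link : LINKS.get(page)) {
            if (!validPageLink(link)) {
                skippedPages.add(link);
            } else if (!foundPages.contains(link) && !pendingPages.contains(link)) {
                addPendingPage(link);  // instead of a recursive call
            }
        }
    }

    public static void main(String[] args) {
        IterativeCrawler c = new IterativeCrawler();
        c.crawl("A.html");
        System.out.println("found: " + c.foundPages);
        System.out.println("skipped: " + c.skippedPages);
        System.out.println("pending: " + c.pendingPagesSize());
    }
}
```

Note the guard in crawlNextPage: a link is queued only if it is neither already found nor already pending, which keeps pendingPages limited to valid, not-yet-visited pages as required.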