Seed URL

A seed URL in web crawling is a URL from which a web crawler begins to traverse a site. Once a crawler is on a seed URL, it extracts data from the page and looks for all links to additional pages. If a crawler is set to crawl an entire domain, it systematically follows each link on every page, extracting data from each ensuing page. Paths from a seed URL are often influenced by a website's robots.txt file, which states how the site owner would like bots to traverse the site.
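
The traversal described above can be illustrated with a minimal Python sketch. It is not a production crawler: to stay runnable offline it reads pages from a hypothetical in-memory dictionary (PAGES) instead of making HTTP requests, and the robots.txt contents, the example.com URLs, the "demo-bot" user agent, and the function name crawl are all assumptions made for illustration. It starts at the seed URL, follows same-domain links breadth-first, and skips any path that the robots.txt rules disallow.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory site standing in for real HTTP fetches.
PAGES = {
    "https://example.com/": '<a href="/about">About</a> <a href="/private/x">X</a>',
    "https://example.com/about": (
        '<a href="/">Home</a> <a href="/team">Team</a> '
        '<a href="https://other.example/">External</a>'
    ),
    "https://example.com/team": "<p>No links here</p>",
    "https://example.com/private/x": "<p>Hidden</p>",
}

# Hypothetical robots.txt: every bot is asked to stay out of /private/.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, pages=PAGES, robots_txt=ROBOTS_TXT, agent="demo-bot"):
    """Return the URLs visited, in breadth-first order from the seed URL."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    domain = urlparse(seed_url).netloc
    queue = deque([seed_url])
    seen = {seed_url}
    visited = []

    while queue:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # the site owner asked bots to skip this path
        html = pages.get(url)
        if html is None:
            continue  # page not found in the simulated site
        visited.append(url)  # a real crawler would extract data here

        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            # Stay on the seed's domain and never queue a URL twice.
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)

    return visited


if __name__ == "__main__":
    for page in crawl("https://example.com/"):
        print(page)
```

In this sketch the seed URL is the only entry point, so every visited page is one that can be reached from it by following links. The page under /private/ is discovered but never visited because of the Disallow rule, and the link to other.example is ignored because it leaves the seed's domain.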