PHP-spider
An extensible web spider for PHP. Example code:
use VDB\Spider\Spider;
use VDB\Spider\Discoverer\XPathExpressionDiscoverer;

$spider = new Spider('http://www.oschina.net');
Features:
supports two traversal algorithms: breadth-first and depth-first
supports depth limiting and queue size limiting
supports adding custom URI discovery logic, based on XPath, CSS selectors, or plain old PHP
comes with a useful set of URI filters, such as Domain limiting
supports custom URI filters, both prefetch (URI) and postfetch (Resource content)
supports custom request handling logic
comes with a useful set of persistence handlers (memory and file; Redis soon to follow)
supports custom persistence handlers
collects statistics about the crawl for reporting
dispatches useful events, allowing developers to add even more custom behavior
supports a politeness policy
will soon come with many default discoverers: RSS, Atom, RDF, etc.
will soon support multiple queueing mechanisms (file, memcache, redis)
will eventually support distributed spidering with a central queue
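
To make the first few features concrete, here is a minimal sketch in plain PHP of the two traversal orders combined with a depth limit, a queue size limit, and a domain-limiting prefetch filter. It does not use PHP-spider's classes and should not be read as the library's API: the crawl() function, its parameters, and the in-memory $links graph (a stand-in for fetched pages) are all assumptions made for illustration.

```php
<?php
// Illustrative sketch only: this is NOT PHP-spider's API.
// $links is a hypothetical in-memory link graph standing in for fetched pages.

/**
 * Traverse a link graph breadth-first ('bfs') or depth-first ('dfs').
 *
 * @return string[] URIs in the order they were visited
 */
function crawl(array $links, string $seed, string $mode = 'bfs', int $maxDepth = 2, int $maxQueueSize = 100): array
{
    $allowedHost = parse_url($seed, PHP_URL_HOST);
    $queue = [[$seed, 0]];   // pending [uri, depth] pairs
    $seen = [$seed => true]; // URIs that have already been queued
    $queued = 1;             // total URIs ever queued, capped by $maxQueueSize
    $visited = [];

    while ($queue) {
        // Breadth-first takes the oldest entry (FIFO); depth-first takes the newest (LIFO).
        [$uri, $depth] = $mode === 'bfs' ? array_shift($queue) : array_pop($queue);
        $visited[] = $uri;

        if ($depth >= $maxDepth) {
            continue; // depth limit: do not follow links found on this page
        }

        foreach ($links[$uri] ?? [] as $next) {
            if ($queued >= $maxQueueSize) {
                break; // queue size limit reached
            }
            if (isset($seen[$next])) {
                continue; // already queued
            }
            if (parse_url($next, PHP_URL_HOST) !== $allowedHost) {
                continue; // prefetch filter: stay on the seed's domain
            }
            $seen[$next] = true;
            $queue[] = [$next, $depth + 1];
            $queued++;
        }
    }

    return $visited;
}

// Hypothetical link graph used only for this example.
$links = [
    'http://example.com/'  => ['http://example.com/a', 'http://example.com/b', 'http://other.example.org/'],
    'http://example.com/a' => ['http://example.com/a1'],
    'http://example.com/b' => ['http://example.com/b1'],
];

print_r(crawl($links, 'http://example.com/', 'bfs')); // order: /, /a, /b, /a1, /b1
print_r(crawl($links, 'http://example.com/', 'dfs')); // order: /, /b, /b1, /a, /a1
```

The only difference between the two modes is which end of the pending list is consumed. The off-domain link is rejected before it is ever queued, which is what distinguishes a prefetch filter from a postfetch one that inspects the downloaded content.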