

PHP-spider

Published: 2014-12-20


An extensible PHP web spider. Example code:

use VDB\Spider\Spider;
use VDB\Spider\Discoverer\XPathExpressionDiscoverer;

$spider = new Spider('http://www.oschina.net');
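The example imports an XPath-based discoverer. As a rough illustration of what XPath link discovery involves, the sketch below uses only PHP's built-in DOM extension. It is not PHP-spider's implementation: the function name discoverLinks, the sample markup, and the XPath expression are all assumptions made for this example.

```php
<?php
// Hypothetical helper (not part of PHP-spider): return the href values of
// all elements matched by an XPath expression, in document order.
function discoverLinks(string $html, string $xpathExpression): array
{
    $document = new DOMDocument();

    // Real-world markup is rarely valid; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $nodes = $xpath->query($xpathExpression);
    if ($nodes === false) {
        // The expression could not be evaluated.
        return [];
    }

    $links = [];
    foreach ($nodes as $node) {
        if ($node instanceof DOMElement && $node->hasAttribute('href')) {
            $links[] = $node->getAttribute('href');
        }
    }

    // Drop duplicates while keeping first-seen order.
    return array_values(array_unique($links));
}

// Sample markup, invented for this example.
$html = '<html><body>'
    . '<div id="nav"><a href="/news">News</a><a href="/blog">Blog</a></div>'
    . '<a href="/about">About</a>'
    . '</body></html>';

// Only links inside the navigation block are discovered.
$navLinks = discoverLinks($html, '//div[@id="nav"]//a');
```

Restricting the expression to one part of the page, as done here, is a common way to keep a crawl away from footers, login links, and similar noise.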

Features:

supports two traversal algorithms: breadth-first and depth-first
supports depth limiting and queue size limiting
supports adding custom URI discovery logic, based on XPath, CSS selectors, or plain old PHP
comes with a useful set of URI filters, such as domain limiting
supports custom URI filters, both prefetch (URI) and postfetch (Resource content)
supports custom request handling logic
comes with a useful set of persistence handlers (memory and file; Redis soon to follow)
supports custom persistence handlers
collects statistics about the crawl for reporting
dispatches useful events, allowing developers to add even more custom behavior
supports a politeness policy
will soon come with many default discoverers: RSS, Atom, RDF, etc.
will soon support multiple queueing mechanisms (file, memcache, redis)
will eventually support distributed spidering with a central queue
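
To make the first few features concrete, here is a minimal sketch of how breadth-first and depth-first traversal, depth limiting, queue size limiting, and a prefetch domain filter can interact. It runs against an in-memory link graph instead of fetching pages, and it does not use PHP-spider's API: the function crawlOrder, its parameters, and the example.com URIs are all hypothetical.

```php
<?php
// Hypothetical sketch (not PHP-spider's API): return the order in which
// URIs would be visited. $graph maps a URI to the links found on that page.
function crawlOrder(
    array $graph,
    string $seed,
    string $mode,        // 'breadth-first' or 'depth-first'
    int $maxDepth,       // links deeper than this are not followed
    int $maxQueued,      // upper bound on URIs ever queued, seed included
    string $allowedHost  // prefetch filter: only this host is queued
): array {
    $frontier = [[$seed, 0]];
    $seen = [$seed => true];
    $visited = [];

    while ($frontier !== []) {
        // Breadth-first takes the oldest entry, depth-first the newest.
        [$uri, $depth] = $mode === 'breadth-first'
            ? array_shift($frontier)
            : array_pop($frontier);
        $visited[] = $uri;

        if ($depth >= $maxDepth) {
            continue; // depth limit reached: do not follow links from here
        }

        foreach ($graph[$uri] ?? [] as $link) {
            if (count($seen) >= $maxQueued) {
                break; // queue size limit reached
            }
            if (isset($seen[$link])) {
                continue; // already queued
            }
            if (parse_url($link, PHP_URL_HOST) !== $allowedHost) {
                continue; // domain filter rejects the URI before any fetch
            }
            $seen[$link] = true;
            $frontier[] = [$link, $depth + 1];
        }
    }

    return $visited;
}

// Invented link graph standing in for fetched pages.
$graph = [
    'http://example.com/'  => ['http://example.com/a', 'http://example.com/b', 'http://other.test/x'],
    'http://example.com/a' => ['http://example.com/a1'],
    'http://example.com/b' => ['http://example.com/b1'],
];

$bfs = crawlOrder($graph, 'http://example.com/', 'breadth-first', 2, 10, 'example.com');
$dfs = crawlOrder($graph, 'http://example.com/', 'depth-first', 2, 10, 'example.com');
$shallow = crawlOrder($graph, 'http://example.com/', 'breadth-first', 1, 10, 'example.com');
```

With this graph, the breadth-first run visits both first-level pages before any second-level page, while the depth-first run finishes one branch before starting the next. The off-domain link is never queued in either case.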
