
PHP-spider

    Published: 2014-12-20


An extensible PHP web spider. Example code:

use VDB\Spider\Spider;
use VDB\Spider\Discoverer\XPathExpressionDiscoverer;

$spider = new Spider('http://www.oschina.net');

Features:

  • supports two traversal algorithms: breadth-first and depth-first

  • supports depth limiting and queue size limiting

  • supports adding custom URI discovery logic, based on XPath, CSS selectors, or plain old PHP

  • comes with a useful set of URI filters, such as Domain limiting

  • supports custom URI filters, both prefetch (URI) and postfetch (Resource content)

  • supports custom request handling logic

  • comes with a useful set of persistence handlers (memory and file, with Redis soon to follow)

  • supports custom persistence handlers

  • collects statistics about the crawl for reporting

  • dispatches useful events, allowing developers to add even more custom behavior

  • supports a politeness policy

  • will soon come with many default discoverers: RSS, Atom, RDF, etc.

  • will soon support multiple queueing mechanisms (file, memcache, redis)

  • will eventually support distributed spidering with a central queue
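
To make the first few features more concrete, here is a minimal, self-contained sketch of breadth-first versus depth-first traversal with a depth limit, a queue size limit, and a prefetch domain filter. It does not use PHP-spider's API: the `crawl()` function, its parameters, and the in-memory `$links` graph (standing in for fetched pages) are all hypothetical names chosen for illustration, and the library's own implementation may differ.

```php
<?php
// Hypothetical link graph standing in for fetched pages (no network access).
// This is an illustration of the ideas only, not PHP-spider's API.
$links = [
    'http://example.com/'  => ['http://example.com/a', 'http://example.com/b', 'http://other.test/x'],
    'http://example.com/a' => ['http://example.com/c'],
    'http://example.com/b' => ['http://example.com/d'],
    'http://example.com/c' => [],
    'http://example.com/d' => [],
];

/**
 * Traverse $links starting at $seed and return URIs in visit order.
 *
 * $mode      'bfs' (breadth-first) or 'dfs' (depth-first)
 * $maxDepth  links found on pages at this depth are not followed
 * $maxQueued upper bound on the number of URIs ever enqueued (seed included)
 */
function crawl(array $links, string $seed, string $mode, int $maxDepth, int $maxQueued): array
{
    $allowedHost = parse_url($seed, PHP_URL_HOST);
    $frontier = [[$seed, 0]];   // pending [uri, depth] pairs
    $seen = [$seed => true];    // every URI enqueued so far
    $visited = [];

    while ($frontier !== []) {
        // Taking from the front (FIFO) gives breadth-first order;
        // taking from the back (LIFO) gives depth-first order.
        [$uri, $depth] = $mode === 'bfs' ? array_shift($frontier) : array_pop($frontier);
        $visited[] = $uri;

        if ($depth >= $maxDepth) {
            continue; // depth limit: do not discover links on this page
        }
        foreach ($links[$uri] ?? [] as $next) {
            if (isset($seen[$next])) {
                continue; // already enqueued
            }
            if (parse_url($next, PHP_URL_HOST) !== $allowedHost) {
                continue; // prefetch filter: stay on the seed's domain
            }
            if (count($seen) >= $maxQueued) {
                break;    // queue size limit reached
            }
            $seen[$next] = true;
            $frontier[] = [$next, $depth + 1];
        }
    }
    return $visited;
}

$bfs = crawl($links, 'http://example.com/', 'bfs', 2, 10);
$dfs = crawl($links, 'http://example.com/', 'dfs', 2, 10);
```

In this sketch the only difference between the two algorithms is which end of the frontier the next URI is taken from. The off-domain link `http://other.test/x` is rejected before it is ever enqueued, which is the kind of check a prefetch (URI) filter performs; a postfetch filter would instead inspect the fetched content.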


    
 
 
