How to Stop Others from Crawling Your Mini Program Code (CMS Tutorial)

robots.txt - Blocking Crawlers

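robots.txt tells web crawlers which directories of a site they should not access. It is a request rather than an enforcement mechanism: well-behaved crawlers honor it, but nothing forces a crawler to comply. The file uses a line-oriented syntax made up of blank lines, comment lines (starting with #), and rule lines. A rule line has the form Field: value. The common rule lines are User-Agent, Disallow, and Allow.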
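The three kinds of lines can be told apart with a few string operations. The following is a minimal sketch in Python, not a complete parser; the function name classify_line and the returned tuple layout are assumptions made for this illustration.

```python
def classify_line(line: str):
    """Classify one robots.txt line as blank, comment, rule, or invalid.

    Returns a (kind, field, value) tuple. This layout is an assumption
    made for the example, not part of any standard.
    """
    stripped = line.strip()
    if not stripped:
        return ("blank", None, None)
    if stripped.startswith("#"):
        return ("comment", None, stripped[1:].strip())
    # Drop a trailing comment, then split "Field: value" on the first colon.
    body = stripped.split("#", 1)[0]
    field, sep, value = body.partition(":")
    if not sep:
        return ("invalid", None, stripped)
    return ("rule", field.strip().lower(), value.strip())


for sample in ["", "# block the admin area", "Disallow: /admin/ # private"]:
    print(classify_line(sample))
```

Field names are lowercased here because crawlers generally treat them case-insensitively.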

User-Agent lines

User-Agent: robot-name

User-Agent: *

Disallow and Allow lines

Disallow: /path

Disallow: # empty value: nothing is disallowed, so the whole site may be crawled; to block everything, use "Disallow: /"

Allow: /path

Allow: # empty value: imposes no restriction; to allow everything explicitly, use "Allow: /"
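One way to check how a set of rules is interpreted is Python's standard urllib.robotparser module. The sketch below is only an illustration: the helper name is_allowed, the sample rules, and the paths are assumptions. Note that this parser treats an empty Disallow value as "allow everything", while "Disallow: /" blocks the whole site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical rules: baiduspider is kept out of /private/ only,
# every other crawler is kept out of the whole site.
RULES = """\
User-agent: baiduspider
Disallow: /private/

User-agent: *
Disallow: /
"""

print(is_allowed(RULES, "baiduspider", "/private/data.html"))  # False
print(is_allowed(RULES, "baiduspider", "/public/page.html"))   # True
print(is_allowed(RULES, "googlebot", "/public/page.html"))     # False

# An empty Disallow value restricts nothing.
print(is_allowed("User-agent: *\nDisallow:\n", "googlebot", "/any"))  # True
```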

User-Agent names used by search engines

Search engine    User-Agent value
Google           googlebot
Baidu            baiduspider
Yahoo            slurp
MSN              msnbot
Alexa            ia_archiver
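Given such names, a robots.txt that turns specific crawlers away can be generated mechanically. This is a rough sketch under stated assumptions: build_robots_txt is a hypothetical helper, and the /private/ path is only an example.

```python
def build_robots_txt(agents, disallowed_path="/"):
    """Build one User-agent group per crawler, each with one Disallow rule."""
    groups = []
    for agent in agents:
        groups.append(f"User-agent: {agent}\nDisallow: {disallowed_path}")
    # Groups are separated by a blank line.
    return "\n\n".join(groups) + "\n"


# Example: keep two crawlers out of a hypothetical /private/ directory.
print(build_robots_txt(["googlebot", "baiduspider"], "/private/"))
```

The resulting text would be served at the root of the site as /robots.txt.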

Here is a search-engine request I observed while capturing packets on Linux:

# tcpdump -n -nn -A -l -s1024 'tcp port 80'|grep User-Agent

User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +>
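Because robots.txt is only advisory, a server may also want to recognize crawlers from the User-Agent header itself. The following is a minimal sketch, assuming a simple case-insensitive substring match; the names KNOWN_CRAWLERS and identify_crawler are made up for this example. The header can be forged, so a match is a hint rather than proof of identity.

```python
# Crawler tokens to look for; an assumed list for this example.
KNOWN_CRAWLERS = ("googlebot", "baiduspider", "slurp", "msnbot")


def identify_crawler(user_agent: str):
    """Return the first known crawler token found in the header, else None."""
    ua = user_agent.lower()
    for token in KNOWN_CRAWLERS:
        if token in ua:
            return token
    return None


print(identify_crawler("Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(identify_crawler("Mozilla/5.0 (X11; Linux x86_64)"))
```

A request handler could use the result to refuse or rate-limit matching requests.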


Feel free to share. When reposting, please credit the source: 内存溢出
