Building the request headers around the User-Agent lets a crawler disguise itself, but some sites inspect the request headers, so it is better to send a random header with every request. Rotating the User-Agent frequently helps avoid triggering the corresponding anti-scraping mechanisms, and the fake-useragent library provides a ready-made solution for this kind of Python crawler disguise.
Install it with pip:
pip install fake-useragent
Import the library:
import os
import fake_useragent

# Older versions of fake-useragent (0.1.x) accepted these options to work
# around cache-server / SSL issues; they are kept here for reference only.
# ua = fake_useragent.UserAgent(use_cache_server=False)
# ua = fake_useragent.UserAgent(cache=False)
# ua = fake_useragent.UserAgent(verify_ssl=False)
Then add the following code:
# Data file from: https://fake-useragent.herokuapp.com/browsers/0.1.11
def get_header():
    # Load the UA database from a local JSON file instead of the remote server
    location = os.getcwd() + '/data/fake_useragent.json'
    ua = fake_useragent.UserAgent(path=location)
    # To force a specific browser, return one of these instead:
    # return ua.chrome
    # return ua.ie
    # return ua.firefox
    # return ua.opera
    # return ua.safari
    return ua.random  # a random User-Agent string

for i in range(10):
    print(get_header())
Sample output:
Mozilla/5.0 (X11; CrOS i686 4319.74.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1944.0 Safari/537.36
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36
Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10
Mozilla/5.0 (X11; NetBSD) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.17 Safari/537.36
Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1464.0 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36
Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1464.0 Safari/537.36
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2117.157 Safari/537.36
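In a real crawler the random User-Agent is attached to every outgoing request. Below is a minimal sketch using the requests library and the get_header() function defined above; the URL https://httpbin.org/headers is only a placeholder for whatever site you are actually scraping.

import requests

def fetch(url):
    # Each call picks a fresh random User-Agent for this request
    headers = {'User-Agent': get_header()}
    response = requests.get(url, headers=headers, timeout=10)
    return response

resp = fetch('https://httpbin.org/headers')  # placeholder URL
print(resp.status_code)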