Mining Keywords from Search Engine Suggestion Dropdowns

Author: Ginson  Category: Python, SEO  Published: 2017-08-17 10:18

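The importance of keyword mining for SEO goes without saying. The method I currently use most is the API of Baidu's promotion (paid search) tool: it is efficient, and it returns complete data for each keyword.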

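The suggestion dropdown of a search engine is another good source. Its advantage over other channels is timeliness: keywords whose popularity is rising show up there quickly, so you can target them ahead of time.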

The code below mines dropdown suggestion keywords from Baidu, Sogou, and 360.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

'''
Script that collects dropdown suggestion keywords from Baidu, 360, and Sogou
'''

import json
import re
import sys
import time

import requests

def get_sugs(word):
    # Query each engine exactly once and merge the results;
    # set() drops suggestions returned by more than one engine.
    sugs = []
    for fetch in (baidu_sugs, so_sugs, sogou_sugs):
        sugs += fetch(word)
    return set(sugs)

def log_failure(word, engine):
    # Record the failed keyword and engine (tab-separated) so it can be retried later
    with open('采集失败关键词.txt', 'a+', encoding='utf-8') as f:
        f.write('%s\t%s\n' % (word, engine))

def baidu_sugs(word):
    try:
        # Passing params lets requests URL-encode the keyword
        r = requests.get('http://suggestion.baidu.com/su',
                         params={'wd': word, 'sugmode': 3, 'json': 1}, timeout=5)
        # Strip the window.baidu.sug(...) wrapper, then replace \' (not a valid JSON escape)
        body = re.sub(r'window\.baidu\.sug\((.*)\);', r'\1', r.text)
        r_js = json.loads(re.sub(r"\\'", ' ', body))
    except Exception as e:
        print(e)
        log_failure(word, 'baidu')
        return []
    else:
        return r_js['s']

def sogou_sugs(word):
    try:
        r = requests.get('https://www.sogou.com/suggnew/ajajjson',
                         params={'key': word, 'type': 'web'}, timeout=5)
        # Strip the window.sogou.sug(...,-1) wrapper; the payload is a JSON array
        r_js = json.loads(re.sub(r'window\.sogou\.sug\((.*),-1\);', r'\1', r.text))
    except Exception as e:
        print(e)
        log_failure(word, 'sogou')
        return []
    else:
        # Index 1 of the array holds the suggestion list
        return r_js[1]

def so_sugs(word):
    try:
        r = requests.get('https://sug.so.360.cn/suggest',
                         params={'callback': 'suggest_so', 'encodein': 'utf-8',
                                 'encodeout': 'utf-8', 'format': 'json',
                                 'fields': 'word', 'word': word}, timeout=5)
        # Strip the suggest_so(...) JSONP wrapper
        r_js = json.loads(re.sub(r'suggest_so\((.*)\);', r'\1', r.text))
    except Exception as e:
        print(e)
        log_failure(word, '360')
        return []
    else:
        return [x['word'] for x in r_js['result']]


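if __name__ == '__main__':
    # The first command-line argument is the seed keyword file, one keyword per line
    # (assumed to be UTF-8). Results are appended as "seed<TAB>suggestion".
    with open(sys.argv[1], 'r', encoding='utf-8') as seeds, \
         open('下拉框关键词.txt', 'a+', encoding='utf-8') as out:
        for seed_word in seeds:
            seed_word = seed_word.strip()
            if not seed_word:
                continue  # skip blank lines
            for sug in get_sugs(seed_word):
                out.write('%s\t%s\n' % (seed_word, sug))
            # time.sleep(0.5)  # uncomment to pause between seed keywords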
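
All three fetchers rely on the same step: the endpoint returns JSON wrapped in a JavaScript callback, and a regex strips that wrapper before json.loads. Here is a minimal offline sketch of that step. The helper name unwrap_jsonp and the sample payloads are hypothetical; the payloads are shaped only after the fields the script reads ('s' for Baidu, 'result' and 'word' for 360), and real responses may look different.

```python
import json
import re

def unwrap_jsonp(text, callback):
    """Return the parsed JSON inside callback(...); (hypothetical helper)."""
    pattern = re.escape(callback) + r'\((.*)\);?\s*$'
    match = re.search(pattern, text, flags=re.S)
    if match is None:
        raise ValueError('no %s(...) wrapper found' % callback)
    return json.loads(match.group(1))

# Made-up sample payloads, not captured responses
baidu_sample = 'window.baidu.sug({"q":"seo","s":["seo tools","seo tutorial"]});'
so_sample = 'suggest_so({"result":[{"word":"seo tools"},{"word":"seo basics"}]});'

baidu_words = unwrap_jsonp(baidu_sample, 'window.baidu.sug')['s']
so_words = [x['word'] for x in unwrap_jsonp(so_sample, 'suggest_so')['result']]

# Merge and de-duplicate, as get_sugs does with set()
merged = sorted(set(baidu_words + so_words))
print(merged)
```

Raising an error when the wrapper is missing makes a changed response format visible instead of passing unparsed text on to json.loads.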

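The fetchers append every failed lookup to 采集失败关键词.txt as "keyword<TAB>engine". One way to use that log is to group the failed keywords by engine so they can be retried later. This is a rough sketch that assumes that log format; the function name group_failures is hypothetical and the sample lines are made up.

```python
from collections import defaultdict

def group_failures(lines):
    """Group 'keyword<TAB>engine' lines into {engine: [keywords]} (hypothetical helper)."""
    failed = defaultdict(list)
    for line in lines:
        line = line.rstrip('\n')
        if '\t' not in line:
            continue  # skip blank or malformed lines
        word, engine = line.rsplit('\t', 1)
        if word not in failed[engine]:
            failed[engine].append(word)  # keep first occurrence only
    return dict(failed)

# Made-up log lines in the format the script writes
sample_log = ['seo\tbaidu\n', 'seo\tsogou\n', 'python\tbaidu\n', 'seo\tbaidu\n', '\n']
retry_plan = group_failures(sample_log)
print(retry_plan)
```

With the real log, passing the open file object to group_failures should give the same structure, and each keyword can then be handed back to the matching fetcher, for example baidu_sugs(word).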