A Scraper That Needs No Programming - Scraping JD.com Reviews

April 22, 2020
Notes

[Original] When reposting, please credit the author Johnthegreat and link to this article.

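In e-commerce, customer reviews of a product are important data, but what if you cannot write code? There is a Chrome extension that handles simple data scraping without a single line of code. Here is a sample of the scraped data: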

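As you can see, the source URL, reviewer, review text, time, and product color have all been captured. So what tools does this take? Just two:

1. The Chrome browser;

2. The extension: Web Scraper

Extension download link: //chromecj.com/productivity/2018-05/942.html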
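For readers who later want to summarize the exported data, here is a minimal Python sketch. It is only an illustration and is not needed for the scrape itself. It assumes the export is a CSV file whose columns match the selector ids used in the sitemap later in this post (`user`, `comments`, `time`, `color`), plus the `web-scraper-order` and `web-scraper-start-url` columns that Web Scraper is assumed to add. The sample rows and the helper name `count_by_color` are invented for the example.

```python
import csv
import io
from collections import Counter

# Stand-in for an exported file; these rows are invented for illustration.
SAMPLE_CSV = """web-scraper-order,web-scraper-start-url,user,comments,time,color
1-1,//item.jd.com/100000680365.html#comment,user_a,Works well,2020-04-01 10:00,Black
1-2,//item.jd.com/100000680365.html#comment,user_b,Fast delivery,2020-04-02 11:30,White
1-3,//item.jd.com/100000680365.html#comment,user_c,Good value,2020-04-03 09:15,Black
"""


def count_by_color(csv_file):
    """Count reviews per product color from a CSV file-like object."""
    reader = csv.DictReader(csv_file)
    # Skip rows where the color cell is missing or empty.
    return Counter(row["color"].strip() for row in reader if row.get("color"))


counts = count_by_color(io.StringIO(SAMPLE_CSV))
for color, n in counts.most_common():
    print(color, n)
```

To run it on a real export, replace the `io.StringIO(SAMPLE_CSV)` argument with an opened file, for example `open("jdreview.csv", newline="", encoding="utf-8")`, where the file name is again only a placeholder.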

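Finally, if you would like to try it yourself, here is the full procedure used for this scrape:

1. First, copy the sitemap below. You will not have to write any code, but copying this configuration makes it easier to get started. Afterwards you can customize it and choose your own selectors in the extension, still without writing code.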

{
    "_id": "jdreview",
    "startUrl": [
        "//item.jd.com/100000680365.html#comment"
    ],
    "selectors": [
        {
            "id": "user",
            "type": "SelectorText",
            "selector": "div.user-info",
            "parentSelectors": [
                "main"
            ],
            "multiple": false,
            "regex": "",
            "delay": 0
        },
        {
            "id": "comments",
            "type": "SelectorText",
            "selector": "div.comment-column > p.comment-con",
            "parentSelectors": [
                "main"
            ],
            "multiple": false,
            "regex": "",
            "delay": 0
        },
        {
            "id": "time",
            "type": "SelectorText",
            "selector": "div.comment-message:nth-of-type(5) span:nth-of-type(4), div.order-info span:nth-of-type(4)",
            "parentSelectors": [
                "main"
            ],
            "multiple": false,
            "regex": "",
            "delay": 0
        },
        {
            "id": "color",
            "type": "SelectorText",
            "selector": "div.order-info span:nth-of-type(1)",
            "parentSelectors": [
                "main"
            ],
            "multiple": false,
            "regex": "",
            "delay": 0
        },
        {
            "id": "main",
            "type": "SelectorElementClick",
            "selector": "div.comment-item",
            "parentSelectors": [
                "_root"
            ],
            "multiple": true,
            "delay": 10000,
            "clickElementSelector": "div.com-table-footer a.ui-pager-next",
            "clickType": "clickMore",
            "discardInitialElements": false,
            "clickElementUniquenessType": "uniqueHTMLText"
        }
    ]
}
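
For readers who do want to inspect or adapt the sitemap outside the extension, the following is a minimal Python sketch. It is not part of Web Scraper. The helper names `selector_tree` and `retarget` are hypothetical, the product ID `123456789` is made up, and the embedded JSON is a trimmed copy of the sitemap above that keeps only the fields the sketch reads. It shows how `parentSelectors` nests the four text selectors under the `main` element-click selector, and how the same sitemap could be pointed at another product by swapping the numeric ID in `startUrl`.

```python
import copy
import json
import re

# Trimmed copy of the sitemap above: only the fields this sketch reads.
SITEMAP_JSON = """
{
  "_id": "jdreview",
  "startUrl": ["//item.jd.com/100000680365.html#comment"],
  "selectors": [
    {"id": "user", "type": "SelectorText", "parentSelectors": ["main"]},
    {"id": "comments", "type": "SelectorText", "parentSelectors": ["main"]},
    {"id": "time", "type": "SelectorText", "parentSelectors": ["main"]},
    {"id": "color", "type": "SelectorText", "parentSelectors": ["main"]},
    {"id": "main", "type": "SelectorElementClick", "parentSelectors": ["_root"]}
  ]
}
"""


def selector_tree(sitemap):
    """Map each parent id to the ids of the selectors nested under it."""
    tree = {}
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            tree.setdefault(parent, []).append(sel["id"])
    return tree


def retarget(sitemap, product_id, new_name):
    """Return a copy of the sitemap aimed at another product ID."""
    clone = copy.deepcopy(sitemap)
    clone["_id"] = new_name
    # Replace only the digits directly before ".html"; the rest of the URL is kept.
    clone["startUrl"] = [
        re.sub(r"\d+(?=\.html)", str(product_id), url) for url in clone["startUrl"]
    ]
    return clone


sitemap = json.loads(SITEMAP_JSON)
tree = selector_tree(sitemap)
print(tree["_root"])  # selectors applied to the page itself
print(tree["main"])   # selectors applied inside each matched comment item

# 123456789 is a made-up product ID used purely for illustration.
other = retarget(sitemap, 123456789, "jdreview-other")
print(json.dumps(other["startUrl"]))
```

With the full sitemap text in place of the trimmed copy, `json.dumps(other, indent=4)` produces JSON in the same shape as the block above.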

2. Next, open Chrome, press Ctrl+Shift+I on any page, and find the Web Scraper tab in the developer tools panel that opens, as shown below: