Cheerio: getting 403 when scraping

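I tested the code several times, using a specific link at each step, and it worked fine. I don't know whether there is some mechanism blocking the requests (this is my first time trying scraping/cheerio).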

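Now when I run it, every request (507 in total) gets a 403 error, so I just stopped Node. I was really hopeful, because I did get the initial links, but when I try to run profileParse on them it breaks :(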
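One plausible cause of blanket 403s is that firing all 507 profile requests at once trips the site's rate limiting. The sketch below is an assumption, not something the site documents. It runs the requests one after another with a pause between them. The helper names `mapSequentially` and `sleep` and the delay value are hypothetical.

```javascript
// Hypothetical helper: resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call the async `fn` on each item strictly one at a time,
// waiting `delayMs` between calls so the server sees a slow, steady request rate.
// Results come back in the same order as `items`.
async function mapSequentially(items, fn, delayMs) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    if (i > 0) await sleep(delayMs); // pause between requests, not before the first one
    results.push(await fn(items[i], i));
  }
  return results;
}
```

In retailX.js this would replace `Promise.all(profileUrls.map(...))` with something like `return mapSequentially(profileUrls, profileParse, 1000)`. The one-second delay is a guess to tune, not a known limit of the site.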

Here is my retailX.js:

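```javascript
const rp = require('request-promise');
const cheerio = require('cheerio');
const fs = require('fs');
const profileParse = require('./profileParse');

const writeStream = fs.createWriteStream('post.csv');
const url = 'url im scraping here';

// Quote a value for CSV so commas or quotes inside a name don't break the row
const csvField = (value) => `"${String(value).replace(/"/g, '""')}"`;

// header row
writeStream.write('Name,URL\n');

rp({
  url: url,
  headers: {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
  }
  // no `json: true`: the page is HTML, and that option also sends `Accept: application/json`
})
  .then(html => {
    const $ = cheerio.load(html);
    const links = $('td > a[class="exhibitorName"]');
    const profileUrls = [];

    // Only link 456 while testing; start at 0 and loop to links.length to scrape them all
    for (let i = 456; i < 457 && i < links.length; i++) {
      profileUrls.push('first portion of link here' + links.eq(i).attr('href'));
    }

    // Note: Promise.all starts every profile request at the same time
    return Promise.all(profileUrls.map(profileUrl => profileParse(profileUrl)));
  })
  .then(profiles => {
    // Promise.all resolves to an array: write one CSV row per successfully parsed profile
    for (const profile of profiles) {
      if (profile) writeStream.write(`${csvField(profile.name)},${csvField(profile.url)}\n`);
    }
    writeStream.end();
    console.log(profiles, 'scraping done');
  })
  .catch(err => {
    // log the actual error instead of only a generic message
    console.log(err, 'THERE IS AN ERROR');
  });
```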

And here is profileParse.js, the second function, which scrapes each profile link:

```javascript
const rp = require('request-promise');
const cheerio = require('cheerio');

const profileParse = (url) => {
  return rp({
    url: url,
    headers: {
      'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
    }
    // no `json: true`: the profile page is HTML, not JSON
  })
    .then(html => {
      const $ = cheerio.load(html);
      return {
        name: $('div[class="panel-body"] > h1').text().trim(),
        url: $('span[class="BoothContactUrl"] > a').text().trim()
      };
    })
    .catch(err => {
      // request-promise rejects non-2xx responses (such as 403) with err.statusCode set.
      // Log it and resolve to null so one failed profile doesn't reject the whole Promise.all.
      console.log(err.statusCode || err.message, url, 'THERE IS AN ERROR');
      return null;
    });
};

module.exports = profileParse;
```
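If the 403s are temporary rate-limit responses rather than a permanent block, which retrying will not fix, the `rp(...)` call could be retried with exponential backoff. The sketch below assumes request-promise's behaviour of rejecting non-2xx responses with an error that carries `statusCode`. `withRetry`, its options, and the status list are hypothetical.

```javascript
// Hypothetical helper: resolve after `ms` milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call `fn`; if it rejects with an error whose `statusCode`
// is in `retryOn`, wait and try again, doubling the delay each time.
// Any other error, or running out of retries, rethrows the last error.
async function withRetry(fn, { retries = 3, baseDelayMs = 1000, retryOn = [403, 429, 503] } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = Boolean(err) && retryOn.includes(err.statusCode);
      if (!retryable || attempt >= retries) throw err;
      await wait(baseDelayMs * 2 ** attempt); // 1x, 2x, 4x, ... the base delay
    }
  }
}
```

Because `profileParse` catches its own errors, the retry would wrap the request itself, for example `return withRetry(() => rp({ url: url, headers: {...} })).then(...)`, not the `profileParse` call.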