A friend needs, for work, to collect the follower counts of competitors' FB and IG fan pages, plus the number of posts they publish each day, every day (or weekly) for data analysis,
but the FB and IG fan pages that need to be tracked add up to more than 200 (really crazy, and this is a real case).
Clicking into them one by one by hand, copying and pasting into Excel,
and repeating that two hundred times: just thinking about it made him collapse (and it did, every time), so he asked me how these steps could be automated.
FB pages cannot be queried with jQuery, and IG requires you to log in first.
For these two reasons, FB and IG block ordinary crawler programs, so the crawling has to run in a real browser.
That is why this project uses selenium-webdriver.
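As a rough illustration of the real-browser approach, here is a minimal sketch, not the project's actual code. It assumes Chrome plus a matching ChromeDriver are available, it does not handle the IG login step, and the CSS selector you pass in is a placeholder you would have to find yourself, since FB and IG markup changes often. `parseCount` and `readCountFromPage` are hypothetical helper names, and the count formats it understands (commas, K/M/B suffixes, and 萬) are assumptions about how the pages display numbers.

```javascript
// Hypothetical helpers; not taken from the FB_IG_crawler repository.

// Assumed display formats: "12,345", "1.2K", "3.4M", "1.2萬" (萬 = 10,000).
const MULTIPLIERS = { k: 1e3, m: 1e6, b: 1e9, "萬": 1e4 };

// Turn a displayed count such as "1.2K followers" into a number (1200).
// Returns null when the text contains no digits.
function parseCount(text) {
  const cleaned = String(text).replace(/,/g, "");
  // The unit must not be followed by another letter,
  // so "500 members" is read as 500, not 500 million.
  const match = cleaned.match(/(\d+(?:\.\d+)?)\s*([kmb萬](?![a-z]))?/i);
  if (!match) return null;
  const unit = match[2] ? match[2].toLowerCase() : "";
  return Math.round(parseFloat(match[1]) * (MULTIPLIERS[unit] || 1));
}

// Open a page in a real Chrome window and read one count from it.
// selenium-webdriver is required lazily so parseCount works without it installed.
async function readCountFromPage(url, cssSelector, timeoutMs = 10000) {
  const { Builder, By, until } = require("selenium-webdriver");
  const driver = await new Builder().forBrowser("chrome").build();
  try {
    await driver.get(url);
    const element = await driver.wait(
      until.elementLocated(By.css(cssSelector)),
      timeoutMs
    );
    return parseCount(await element.getText());
  } finally {
    // Always close the browser, even if the element never appears.
    await driver.quit();
  }
}
```

For example, `parseCount("1.2K followers")` returns 1200.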
FB & IG crawler approach
Google Sheets operation approach
Scheduling logic
NVM: Node.js version management
Windows: Node.js download page, Windows official introduction
On Mac, it is recommended to install Homebrew first:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
After Homebrew is installed, install nvm, then Node.js 12.6:
brew install nvm
nvm install 12.6
Git: version control for the program. Windows download page, Mac download page, official introduction
Yarn: makes installing packages more convenient. Windows download page, Mac download page, official introduction
VS Code: the recommended IDE for writing the program. Download page
You also need to download ChromeDriver. Its version must match your Chrome version. Put the driver in the root directory of this project.
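Two quick environment checks can be sketched with plain Node.js. Both are hypothetical helpers, not part of the project. `satisfiesMinimum` compares a version string against the 12.6 installed above (treating 12.6 as a minimum is an assumption, since the post only says to install 12.6). `driverMatchesChrome` compares the major version numbers in the text printed by `chromedriver --version` and by Chrome's version output, because ChromeDriver releases are tied to a Chrome major version.

```javascript
// Hypothetical environment checks; not part of the FB_IG_crawler repository.

// True if `version` (e.g. process.version, "v12.6.0") is at least `minimum` ("12.6").
function satisfiesMinimum(version, minimum) {
  const parse = (v) =>
    String(v).replace(/^v/, "").split(".").map((n) => parseInt(n, 10) || 0);
  const have = parse(version);
  const need = parse(minimum);
  for (let i = 0; i < Math.max(have.length, need.length); i++) {
    const a = have[i] || 0;
    const b = need[i] || 0;
    if (a !== b) return a > b;
  }
  return true; // all parts are equal
}

// Pull the major version out of text like "Google Chrome 85.0.4183.83"
// or "ChromeDriver 85.0.4183.87 (...)".
function majorVersion(text) {
  const match = String(text).match(/(\d+)\.\d+/);
  return match ? parseInt(match[1], 10) : null;
}

// ChromeDriver must share Chrome's major version.
function driverMatchesChrome(chromeVersionText, driverVersionText) {
  const chrome = majorVersion(chromeVersionText);
  return chrome !== null && chrome === majorVersion(driverVersionText);
}
```

Usage: `satisfiesMinimum(process.version, "12.6")`, or pass the two `--version` outputs to `driverMatchesChrome`.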
git clone https://github.com/dean9703111/FB_IG_crawler.git
cd FB_IG_crawler
yarn
Make a copy of .env.example and rename it to .env, then fill in the SPREADSHEET_ID of the target spreadsheet you want to write to, along with your own FB and IG account credentials.
Copy ex_fb.json and ex_ig.json in the json folder, rename them to fb.json and ig.json, and change each title and url to the fan pages you want to crawl.
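Before the first run, it can help to confirm both config steps were done. This is a minimal sketch using only Node built-ins: `parseEnv` is a deliberately tiny .env reader (the project itself may load .env differently, for example with the dotenv package), `missingKeys` reports required keys that are absent or empty, and `loadPageList` assumes fb.json and ig.json are JSON arrays of `{ "title": ..., "url": ... }` objects, which is a guess based on the field names above. All three function names are hypothetical, and the FB/IG credential key names are left out because the post does not name them.

```javascript
// Hypothetical config checks; not part of the FB_IG_crawler repository.

// Minimal KEY=value reader: skips blank lines and # comments, strips matching quotes.
function parseEnv(text) {
  const env = {};
  for (const rawLine of String(text).split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    if (/^(["']).*\1$/.test(value)) value = value.slice(1, -1);
    env[key] = value;
  }
  return env;
}

// Keys that are absent or empty, e.g. ["SPREADSHEET_ID"].
function missingKeys(env, required) {
  return required.filter((key) => !env[key]);
}

// Assumed shape: [{ "title": "...", "url": "https://..." }, ...]
function loadPageList(jsonText) {
  const pages = JSON.parse(jsonText);
  if (!Array.isArray(pages)) throw new Error("expected a JSON array of pages");
  pages.forEach((page, i) => {
    if (!page || typeof page.title !== "string" || !page.title) {
      throw new Error(`entry ${i}: missing title`);
    }
    try {
      new URL(page.url); // throws on malformed URLs
    } catch (err) {
      throw new Error(`entry ${i}: invalid url ${JSON.stringify(page.url)}`);
    }
  });
  return pages;
}
```

Usage: `missingKeys(parseEnv(fs.readFileSync(".env", "utf8")), ["SPREADSHEET_ID"])` should return an empty array once .env is filled in.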
Follow the Google Sheets API tutorial to complete the application:
choose any name, select Desktop App as the type, then click “DOWNLOAD CLIENT CONFIGURATION”.
After downloading, put the new credentials file into this project's “google_key” folder,
then run the command:
node index.js
A URL will then appear; open it to get an authorization code.
After you copy and paste the code back, the program can write the data into Google Sheets.
Basically, you can just follow the screenshots; the English on the pages can be ignored.
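Once authorization works, writing to the sheet comes down to the Sheets API's `spreadsheets.values.append` call, whose `values` field is an array of rows. The sketch below assumes a row layout of its own rather than the project's format: the hypothetical `toSheetRows` turns assumed `{ title, followers }` results into one row per page, and the hypothetical `appendRows` shows the call through the `googleapis` package, given an already-authorized OAuth2 client. The range you pass (such as `Sheet1!A1`) is a placeholder.

```javascript
// Hypothetical sketch; the row layout [date, title, followers] is an assumption.

// Build the 2-D `values` array the Sheets API expects: one inner array per row.
// Dates are formatted in UTC as YYYY-MM-DD.
function toSheetRows(results, date) {
  const day = date.toISOString().slice(0, 10);
  return results.map((r) => [day, r.title, r.followers]);
}

// Append rows after the existing data in `range`.
// `auth` is an authorized OAuth2 client, e.g. from the authorization flow above.
// googleapis is required lazily so toSheetRows works without it installed.
async function appendRows(auth, spreadsheetId, range, values) {
  const { google } = require("googleapis");
  const sheets = google.sheets({ version: "v4", auth });
  const res = await sheets.spreadsheets.values.append({
    spreadsheetId,
    range,
    valueInputOption: "USER_ENTERED",
    requestBody: { values },
  });
  return res.data.updates; // summary of the cells and rows that were written
}
```

`USER_ENTERED` lets Sheets parse the date string as a date, as if it were typed in by hand.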
If you want it to run automatically every day, install the forever package globally (and remember to set USE_CRON=true in .env):
npm install forever -g
Next, run this command in the project directory; the program will keep running in the background:
forever start index.js
List the schedules currently being executed
forever list
Stop all currently running schedules
forever stopall
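forever keeps the process running in the background; presumably the daily timing itself is handled inside the program when USE_CRON=true. The project's own scheduler is not shown here, so this is a stand-in sketch using plain `setTimeout`: `msUntilNextRun` computes the delay to the next local HH:MM, and `scheduleDaily` runs a task and then re-arms itself. Both names are hypothetical.

```javascript
// Hypothetical stand-in for the project's scheduler, using only setTimeout.

// Milliseconds from `now` until the next local time hour:minute.
// If that time has already passed today (or is exactly now), target tomorrow.
function msUntilNextRun(now, hour, minute) {
  const next = new Date(now.getTime());
  next.setHours(hour, minute, 0, 0);
  if (next <= now) next.setDate(next.getDate() + 1);
  return next - now;
}

// Run `task` every day at hour:minute (local time), re-arming after each run.
// Returns the first timer only; later runs get new timers.
function scheduleDaily(hour, minute, task) {
  return setTimeout(async () => {
    try {
      await task();
    } catch (err) {
      console.error("scheduled task failed:", err);
    }
    scheduleDaily(hour, minute, task); // next delay is computed from the current time
  }, msUntilNextRun(new Date(), hour, minute));
}
```

Usage, with a hypothetical `runCrawler` function and run time: `if (process.env.USE_CRON === "true") scheduleDaily(9, 0, runCrawler);`. Recomputing the delay after every run keeps the schedule from drifting.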
Apply for a LINE Notify token at this website.
This blog has other posts on LINE Notify applications!
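LINE Notify is called with a simple HTTP POST to `https://notify-api.line.me/api/notify`, passing the token as a Bearer header and the text as a form-encoded `message` field. Below is a minimal sketch using Node's built-in `https` module. Where the token is stored (for example a .env entry) and when a message is sent are left to you, and `buildLineNotifyRequest` / `sendLineNotify` are hypothetical names.

```javascript
// Hypothetical LINE Notify helpers using only Node built-ins.
const https = require("https");
const querystring = require("querystring");

// Build request options and a form-encoded body for one notification.
function buildLineNotifyRequest(token, message) {
  const body = querystring.stringify({ message });
  return {
    options: {
      hostname: "notify-api.line.me",
      path: "/api/notify",
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": Buffer.byteLength(body),
      },
    },
    body,
  };
}

// Send the notification; resolves with the HTTP status code (200 on success).
function sendLineNotify(token, message) {
  const { options, body } = buildLineNotifyRequest(token, message);
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      res.resume(); // discard the response body so "end" fires
      res.on("end", () => resolve(res.statusCode));
    });
    req.on("error", reject);
    req.end(body);
  });
}
```

Usage: `await sendLineNotify(token, "Crawl finished")`, where `token` is the one you applied for.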
GaxiosError: Insufficient Permission: this error appears when the Google Sheets permissions are insufficient.
Chrome not reachable: this error appears when the Chrome version does not match the ChromeDriver version.
Selenium WebDriver error references: reference resource 1 and reference resource 2
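To make these two failures easier to recognize in logs, a hypothetical `hintFor` helper can map error messages to their fixes. The Insufficient Permission hint reflects a common cause that is an assumption here, not something the post states: a token authorized with a read-only Sheets scope, fixed by requesting the read-write spreadsheets scope and deleting the saved token so a new authorization code is requested.

```javascript
// Hypothetical helper: map known error messages to troubleshooting hints.
const ERROR_HINTS = [
  {
    pattern: /Insufficient Permission/i,
    hint:
      "Google Sheets permission is insufficient: request the read-write " +
      "spreadsheets scope and delete the saved token, then re-authorize.",
  },
  {
    pattern: /chrome not reachable/i,
    hint:
      "Chrome and ChromeDriver may not match: download the ChromeDriver " +
      "for your Chrome version into the project root.",
  },
];

// Accepts an Error or a string; returns a hint, or null if nothing matches.
function hintFor(error) {
  const message = error && error.message ? error.message : String(error);
  const found = ERROR_HINTS.find(({ pattern }) => pattern.test(message));
  return found ? found.hint : null;
}
```

Usage: wrap the crawler's entry point in `try { ... } catch (err) { console.error(hintFor(err) || err); }`.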
Author: dean9703111
GitHub: https://github.com/dean9703111/FB_IG_crawler