1676405460
This crawler automates the following step:
# upload pdf to googledrive, store data and notify via email
python script/spider.py -c config/prod.cfg -u googledrive -s firebase -n gmail
# download all format
python script/spider.py --config config/prod.cfg --all
# download only one format: pdf|epub|mobi
python script/spider.py --config config/prod.cfg --type pdf
# download also additional material: source code (if exists) and book cover
python script/spider.py --config config/prod.cfg -t pdf --extras
# equivalent (default is pdf)
python script/spider.py -c config/prod.cfg -e
# download and then upload to Google Drive (given the download url anyone can download it)
python script/spider.py -c config/prod.cfg -t epub --upload googledrive
python script/spider.py --config config/prod.cfg --all --extras --upload googledrive
# download and then upload to OneDrive (given the download url anyone can download it)
python script/spider.py -c config/prod.cfg -t epub --upload onedrive
python script/spider.py --config config/prod.cfg --all --extras --upload onedrive
# download and notify: gmail|ifttt|join|pushover
python script/spider.py -c config/prod.cfg --notify gmail
# only claim book (no downloads):
python script/spider.py -c config/prod.cfg --notify gmail --claimOnly
Before you start you should
python --version
git clone https://github.com/niqdev/packtpub-crawler.git
pip install -r requirements.txt
(see also virtualenv)cp config/prod_example.cfg config/prod.cfg
[credential]
credential.email=PACKTPUB_EMAIL
credential.password=PACKTPUB_PASSWORD
Now you should be able to claim and download your first eBook
python script/spider.py --config config/prod.cfg
From the documentation, Google Drive API requires OAuth2.0 for authentication, so to upload files you should:
config/client_secrets.json
[googledrive]
...
googledrive.client_secrets=config/client_secrets.json
googledrive.gmail=GOOGLE_DRIVE@gmail.com
Now you should be able to upload your eBook to Google Drive
python script/spider.py --config config/prod.cfg --upload googledrive
Only the first time you will be prompted to login in a browser which has javascript enabled (no text-based browser) to generate config/auth_token.json
. You should also copy and paste in the config the FOLDER_ID, otherwise every time a new folder with the same name will be created.
[googledrive]
...
googledrive.default_folder=packtpub
googledrive.upload_folder=FOLDER_ID
Documentation: OAuth, Quickstart, example and permissions
From the documentation, OneDrive API requires OAuth2.0 for authentication, so to upload files you should:
[onedrive]
...
onedrive.client_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
onedrive.client_secret=XxXxXxXxXxXxXxXxXxXxXxX
Now you should be able to upload your eBook to OneDrive
python script/spider.py --config config/prod.cfg --upload onedrive
Only the first time you will be prompted to login in a browser which has javascript enabled (no text-based browser) to generate config/session.onedrive.pickle
.
[onedrive]
...
onedrive.folder=packtpub
Documentation: Registration, Python API
To upload your eBook via scp
on a remote server update the configs
[scp]
scp.host=SCP_HOST
scp.user=SCP_USER
scp.password=SCP_PASSWORD
scp.path=SCP_UPLOAD_PATH
Now you should be able to upload your eBook
python script/spider.py --config config/prod.cfg --upload scp
Note:
scp.path
on the remote server must exists in advance--upload scp
is incompatible with --store
and --notify
Create a new Firebase project, copy the database secret from your settings
https://console.firebase.google.com/project/PROJECT_NAME/settings/database
and update the configs
[firebase]
firebase.database_secret=DATABASE_SECRET
firebase.url=https://PROJECT_NAME.firebaseio.com
Now you should be able to store your eBook details on Firebase
python script/spider.py --config config/prod.cfg --upload googledrive --store firebase
To send a notification via email using Gmail you should:
[gmail]
...
gmail.username=EMAIL_USERNAME@gmail.com
gmail.password=EMAIL_PASSWORD
gmail.from=FROM_EMAIL@gmail.com
gmail.to=TO_EMAIL_1@gmail.com,TO_EMAIL_2@gmail.com
Now you should be able to notify your accounts
python script/spider.py --config config/prod.cfg --notify gmail
[ifttt]
ifttt.event_name=packtpub-crawler
ifttt.key=IFTTT_MAKER_KEY
Now you should be able to trigger the applet
python script/spider.py --config config/prod.cfg --notify ifttt
Value mappings:
[join]
join.device_ids=DEVICE_IDS_COMMA_SEPARATED_OR_GROUP_NAME
join.api_key=API_KEY
Now you should be able to trigger the event
python script/spider.py --config config/prod.cfg --notify join
[pushover]
pushover.user_key=PUSHOVER_USER_KEY
pushover.api_key=PUSHOVER_API_KEY
Create a new branch
git checkout -b heroku-scheduler
Update the .gitignore
and commit your changes
# remove config/prod.cfg config/client_secrets.json config/auth_token.json # add dev/ config/dev.cfg config/prod_example.cfg
Create, config and deploy the scheduler
heroku login # create a new app heroku create APP_NAME --region eu # or if you already have an existing app heroku git:remote -a APP_NAME # deploy your app git push -u heroku heroku-scheduler:master heroku ps:scale clock=1 # useful commands heroku ps heroku logs --ps clock.1 heroku logs --tail heroku run bash
Update script/scheduler.py
with your own preferences.
More info about Heroku Scheduler, Clock Processes, Add-on and APScheduler
Build your image
docker build -t niqdev/packtpub-crawler:2.4.0 .
Run manually
docker run \
--rm \
--name my-packtpub-crawler \
niqdev/packtpub-crawler:2.4.0 \
python script/spider.py --config config/prod.cfg
Run scheduled crawler in background
docker run \
--detach \
--name my-packtpub-crawler \
niqdev/packtpub-crawler:2.4.0
# useful commands
docker exec -i -t my-packtpub-crawler bash
docker logs -f my-packtpub-crawler
Alternatively you can pull from Docker Hub this fork
docker pull kuchy/packtpub-crawler
Add this to your crontab to run the job daily at 9 AM:
crontab -e
00 09 * * * cd PATH_TO_PROJECT/packtpub-crawler && /usr/bin/python script/spider.py --config config/prod.cfg >> /tmp/packtpub.log 2>&1
Create two files in /etc/systemd/system:
[Unit]
Description=run packtpub-crawler
[Service]
User=USER_THAT_SHOULD_RUN_THE_SCRIPT
ExecStart=/usr/bin/python2.7 PATH_TO_PROJECT/packtpub-crawler/script/spider.py -c config/prod.cfg
[Install]
WantedBy=multi-user.target
[Unit]
Description=Runs packtpub-crawler every day at 7
[Timer]
OnBootSec=10min
OnActiveSec=1s
OnCalendar=*-*-* 07:00:00
Unit=packtpub_crawler.service
Persistent=true
[Install]
WantedBy=multi-user.target
Enable the script with sudo systemctl enable packtpub_crawler.timer
. You can test the service with sudo systemctl start packtpub_crawler.timer
and see the output with sudo journalctl -u packtpub_crawler.service -f
.
The script downloads also the free ebooks from the weekly packtpub newsletter. The URL is generated by a Google Apps Script which parses all the mails. You can get the code here, if you want to see the actual script, please clone the spreadsheet and go to Tools > Script editor...
.
To use your own source, modify in the config
url.bookFromNewsletter=https://goo.gl/kUciut
The URL should point to a file containing only the URL (no semicolons, HTML, JSON, etc).
You can also clone the spreadsheet to use your own Gmail account. Subscribe to the newsletter (on the bottom of the page) and create a filter to tag your mails accordingly.
Install paramiko with sudo -H pip install paramiko --ignore-installed
Install missing dependencies as described here
# install pip + setuptools
curl https://bootstrap.pypa.io/get-pip.py | python -
# upgrade pip
pip install -U pip
# install virtualenv globally
sudo pip install virtualenv
# create virtualenv
virtualenv env
# activate virtualenv
source env/bin/activate
# verify virtualenv
which python
python --version
# deactivate virtualenv
deactivate
Run a simple static server with
node dev/server.js
and test the crawler with
python script/spider.py --dev --config config/dev.cfg --all
This project is just a Proof of Concept and not intended for any illegal usage. I'm not responsible for any damage or abuse, use it at your own risk.
Author: Niqdev
Source Code: https://github.com/niqdev/packtpub-crawler
License: MIT license
#firebase #heroku #docker #onedrive #googledrive
1676405460
This crawler automates the following step:
# upload pdf to googledrive, store data and notify via email
python script/spider.py -c config/prod.cfg -u googledrive -s firebase -n gmail
# download all format
python script/spider.py --config config/prod.cfg --all
# download only one format: pdf|epub|mobi
python script/spider.py --config config/prod.cfg --type pdf
# download also additional material: source code (if exists) and book cover
python script/spider.py --config config/prod.cfg -t pdf --extras
# equivalent (default is pdf)
python script/spider.py -c config/prod.cfg -e
# download and then upload to Google Drive (given the download url anyone can download it)
python script/spider.py -c config/prod.cfg -t epub --upload googledrive
python script/spider.py --config config/prod.cfg --all --extras --upload googledrive
# download and then upload to OneDrive (given the download url anyone can download it)
python script/spider.py -c config/prod.cfg -t epub --upload onedrive
python script/spider.py --config config/prod.cfg --all --extras --upload onedrive
# download and notify: gmail|ifttt|join|pushover
python script/spider.py -c config/prod.cfg --notify gmail
# only claim book (no downloads):
python script/spider.py -c config/prod.cfg --notify gmail --claimOnly
Before you start you should
python --version
git clone https://github.com/niqdev/packtpub-crawler.git
pip install -r requirements.txt
(see also virtualenv)cp config/prod_example.cfg config/prod.cfg
[credential]
credential.email=PACKTPUB_EMAIL
credential.password=PACKTPUB_PASSWORD
Now you should be able to claim and download your first eBook
python script/spider.py --config config/prod.cfg
From the documentation, Google Drive API requires OAuth2.0 for authentication, so to upload files you should:
config/client_secrets.json
[googledrive]
...
googledrive.client_secrets=config/client_secrets.json
googledrive.gmail=GOOGLE_DRIVE@gmail.com
Now you should be able to upload your eBook to Google Drive
python script/spider.py --config config/prod.cfg --upload googledrive
Only the first time you will be prompted to login in a browser which has javascript enabled (no text-based browser) to generate config/auth_token.json
. You should also copy and paste in the config the FOLDER_ID, otherwise every time a new folder with the same name will be created.
[googledrive]
...
googledrive.default_folder=packtpub
googledrive.upload_folder=FOLDER_ID
Documentation: OAuth, Quickstart, example and permissions
From the documentation, OneDrive API requires OAuth2.0 for authentication, so to upload files you should:
[onedrive]
...
onedrive.client_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
onedrive.client_secret=XxXxXxXxXxXxXxXxXxXxXxX
Now you should be able to upload your eBook to OneDrive
python script/spider.py --config config/prod.cfg --upload onedrive
Only the first time you will be prompted to login in a browser which has javascript enabled (no text-based browser) to generate config/session.onedrive.pickle
.
[onedrive]
...
onedrive.folder=packtpub
Documentation: Registration, Python API
To upload your eBook via scp
on a remote server update the configs
[scp]
scp.host=SCP_HOST
scp.user=SCP_USER
scp.password=SCP_PASSWORD
scp.path=SCP_UPLOAD_PATH
Now you should be able to upload your eBook
python script/spider.py --config config/prod.cfg --upload scp
Note:
scp.path
on the remote server must exists in advance--upload scp
is incompatible with --store
and --notify
Create a new Firebase project, copy the database secret from your settings
https://console.firebase.google.com/project/PROJECT_NAME/settings/database
and update the configs
[firebase]
firebase.database_secret=DATABASE_SECRET
firebase.url=https://PROJECT_NAME.firebaseio.com
Now you should be able to store your eBook details on Firebase
python script/spider.py --config config/prod.cfg --upload googledrive --store firebase
To send a notification via email using Gmail you should:
[gmail]
...
gmail.username=EMAIL_USERNAME@gmail.com
gmail.password=EMAIL_PASSWORD
gmail.from=FROM_EMAIL@gmail.com
gmail.to=TO_EMAIL_1@gmail.com,TO_EMAIL_2@gmail.com
Now you should be able to notify your accounts
python script/spider.py --config config/prod.cfg --notify gmail
[ifttt]
ifttt.event_name=packtpub-crawler
ifttt.key=IFTTT_MAKER_KEY
Now you should be able to trigger the applet
python script/spider.py --config config/prod.cfg --notify ifttt
Value mappings:
[join]
join.device_ids=DEVICE_IDS_COMMA_SEPARATED_OR_GROUP_NAME
join.api_key=API_KEY
Now you should be able to trigger the event
python script/spider.py --config config/prod.cfg --notify join
[pushover]
pushover.user_key=PUSHOVER_USER_KEY
pushover.api_key=PUSHOVER_API_KEY
Create a new branch
git checkout -b heroku-scheduler
Update the .gitignore
and commit your changes
# remove config/prod.cfg config/client_secrets.json config/auth_token.json # add dev/ config/dev.cfg config/prod_example.cfg
Create, config and deploy the scheduler
heroku login # create a new app heroku create APP_NAME --region eu # or if you already have an existing app heroku git:remote -a APP_NAME # deploy your app git push -u heroku heroku-scheduler:master heroku ps:scale clock=1 # useful commands heroku ps heroku logs --ps clock.1 heroku logs --tail heroku run bash
Update script/scheduler.py
with your own preferences.
More info about Heroku Scheduler, Clock Processes, Add-on and APScheduler
Build your image
docker build -t niqdev/packtpub-crawler:2.4.0 .
Run manually
docker run \
--rm \
--name my-packtpub-crawler \
niqdev/packtpub-crawler:2.4.0 \
python script/spider.py --config config/prod.cfg
Run scheduled crawler in background
docker run \
--detach \
--name my-packtpub-crawler \
niqdev/packtpub-crawler:2.4.0
# useful commands
docker exec -i -t my-packtpub-crawler bash
docker logs -f my-packtpub-crawler
Alternatively you can pull from Docker Hub this fork
docker pull kuchy/packtpub-crawler
Add this to your crontab to run the job daily at 9 AM:
crontab -e
00 09 * * * cd PATH_TO_PROJECT/packtpub-crawler && /usr/bin/python script/spider.py --config config/prod.cfg >> /tmp/packtpub.log 2>&1
Create two files in /etc/systemd/system:
[Unit]
Description=run packtpub-crawler
[Service]
User=USER_THAT_SHOULD_RUN_THE_SCRIPT
ExecStart=/usr/bin/python2.7 PATH_TO_PROJECT/packtpub-crawler/script/spider.py -c config/prod.cfg
[Install]
WantedBy=multi-user.target
[Unit]
Description=Runs packtpub-crawler every day at 7
[Timer]
OnBootSec=10min
OnActiveSec=1s
OnCalendar=*-*-* 07:00:00
Unit=packtpub_crawler.service
Persistent=true
[Install]
WantedBy=multi-user.target
Enable the script with sudo systemctl enable packtpub_crawler.timer
. You can test the service with sudo systemctl start packtpub_crawler.timer
and see the output with sudo journalctl -u packtpub_crawler.service -f
.
The script downloads also the free ebooks from the weekly packtpub newsletter. The URL is generated by a Google Apps Script which parses all the mails. You can get the code here, if you want to see the actual script, please clone the spreadsheet and go to Tools > Script editor...
.
To use your own source, modify in the config
url.bookFromNewsletter=https://goo.gl/kUciut
The URL should point to a file containing only the URL (no semicolons, HTML, JSON, etc).
You can also clone the spreadsheet to use your own Gmail account. Subscribe to the newsletter (on the bottom of the page) and create a filter to tag your mails accordingly.
Install paramiko with sudo -H pip install paramiko --ignore-installed
Install missing dependencies as described here
# install pip + setuptools
curl https://bootstrap.pypa.io/get-pip.py | python -
# upgrade pip
pip install -U pip
# install virtualenv globally
sudo pip install virtualenv
# create virtualenv
virtualenv env
# activate virtualenv
source env/bin/activate
# verify virtualenv
which python
python --version
# deactivate virtualenv
deactivate
Run a simple static server with
node dev/server.js
and test the crawler with
python script/spider.py --dev --config config/dev.cfg --all
This project is just a Proof of Concept and not intended for any illegal usage. I'm not responsible for any damage or abuse, use it at your own risk.
Author: Niqdev
Source Code: https://github.com/niqdev/packtpub-crawler
License: MIT license
1625302026
Orbit Edge is a daily deals app development company that has 10+ years of experience in daily deals app development services. Our robust, informative, and easy-to-use daily deals app delivers a best-in-class user experience.
#daily deals app development #daily deals app development services #daily deals website development #best daily deals apps #hire daily deals app developers
1621779092
Orbit Edge is a daily deals app development company that has 10+ years of experience in daily deals app development services. Our robust, informative, and easy-to-use daily deals app delivers a best-in-class user experience.
#daily deals app development #daily deals website development #daily deals app development services #hire daily deals app developers #best daily deals apps
1619790540
http://blogs.rediff.com/bellagarvin/2021/04/30/top-5-daily-deals-app-development-companies-2021-22/
In case, if you are finding a new path in the field of daily deals app and searching for the most reliable daily deals app development companies then you are on the right path. Let me share the list of the top 5 daily deals app development companies that can develop customized apps and websites where online customers can easily browse through the coupons and promo codes to make a cost-effective deal.
#daily deals app development #daily deals mobile app development #daily deals app development services #daily deals app development company #best daily deals apps
1594369800
SQL stands for Structured Query Language. SQL is a scripting language expected to store, control, and inquiry information put away in social databases. The main manifestation of SQL showed up in 1974, when a gathering in IBM built up the principal model of a social database. The primary business social database was discharged by Relational Software later turning out to be Oracle.
Models for SQL exist. In any case, the SQL that can be utilized on every last one of the major RDBMS today is in various flavors. This is because of two reasons:
1. The SQL order standard is genuinely intricate, and it isn’t handy to actualize the whole standard.
2. Every database seller needs an approach to separate its item from others.
Right now, contrasts are noted where fitting.
#programming books #beginning sql pdf #commands sql #download free sql full book pdf #introduction to sql pdf #introduction to sql ppt #introduction to sql #practical sql pdf #sql commands pdf with examples free download #sql commands #sql free bool download #sql guide #sql language #sql pdf #sql ppt #sql programming language #sql tutorial for beginners #sql tutorial pdf #sql #structured query language pdf #structured query language ppt #structured query language