This crawler automates the following step:
# upload pdf to googledrive, store data and notify via email
python script/spider.py -c config/prod.cfg -u googledrive -s firebase -n gmail
# download all format
python script/spider.py --config config/prod.cfg --all

# download only one format: pdf|epub|mobi
python script/spider.py --config config/prod.cfg --type pdf

# download also additional material: source code (if exists) and book cover
python script/spider.py --config config/prod.cfg -t pdf --extras
# equivalent (default is pdf)
python script/spider.py -c config/prod.cfg -e

# download and then upload to Google Drive (given the download url anyone can download it)
python script/spider.py -c config/prod.cfg -t epub --upload googledrive
python script/spider.py --config config/prod.cfg --all --extras --upload googledrive

# download and then upload to OneDrive (given the download url anyone can download it)
python script/spider.py -c config/prod.cfg -t epub --upload onedrive
python script/spider.py --config config/prod.cfg --all --extras --upload onedrive

# download and notify: gmail|ifttt|join|pushover
python script/spider.py -c config/prod.cfg --notify gmail

# only claim book (no downloads):
python script/spider.py -c config/prod.cfg --notify gmail --claimOnly
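The flags used in the commands above can be summarised as an `argparse` parser. This is only a sketch reconstructed from the usage examples, not the project's actual parser: the help texts, the `required=True` on `--config`, and the `choices` lists are assumptions based on the options this README mentions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="spider.py")
    parser.add_argument("-c", "--config", required=True, help="path to the .cfg file")
    parser.add_argument("-t", "--type", choices=["pdf", "epub", "mobi"], default="pdf",
                        help="single format to download (default: pdf)")
    parser.add_argument("--all", action="store_true", help="download every format")
    parser.add_argument("-e", "--extras", action="store_true",
                        help="also download source code and book cover")
    parser.add_argument("-u", "--upload", choices=["googledrive", "onedrive", "scp"])
    parser.add_argument("-s", "--store", choices=["firebase"])
    parser.add_argument("-n", "--notify", choices=["gmail", "ifttt", "join", "pushover"])
    parser.add_argument("--claimOnly", action="store_true",
                        help="claim the book without downloading it")
    return parser


# Same arguments as the first example command.
args = build_parser().parse_args(
    ["-c", "config/prod.cfg", "-u", "googledrive", "-s", "firebase", "-n", "gmail"]
)
print(args.type, args.upload, args.store, args.notify)
```

Because `--type` defaults to `pdf`, omitting it behaves like the "equivalent (default is pdf)" example.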
Before you start you should
git clone https://github.com/niqdev/packtpub-crawler.git
pip install -r requirements.txt (see also virtualenv)
cp config/prod_example.cfg config/prod.cfg
[credential]
credential.email=PACKTPUB_EMAIL
credential.password=PACKTPUB_PASSWORD
Now you should be able to claim and download your first eBook
python script/spider.py --config config/prod.cfg
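For reference, a `[credential]` section like the one above can be read with the standard `configparser` module. This is a minimal sketch, not how the crawler necessarily loads its config; `read_credentials` is a hypothetical helper.

```python
import configparser

SAMPLE = """
[credential]
credential.email=PACKTPUB_EMAIL
credential.password=PACKTPUB_PASSWORD
"""


def read_credentials(text):
    """Return (email, password) from the [credential] section of a config string."""
    # interpolation=None keeps a literal '%' in a password from being
    # treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["credential"]
    return section["credential.email"], section["credential.password"]


email, password = read_credentials(SAMPLE)
print(email)
```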
From the documentation, Google Drive API requires OAuth2.0 for authentication, so to upload files you should:
[googledrive]
...
googledrive.client_secrets=config/client_secrets.json
googledrive.gmail=GOOGLE_DRIVE@gmail.com
Now you should be able to upload your eBook to Google Drive
python script/spider.py --config config/prod.cfg --upload googledrive
config/auth_token.json. You should also copy the FOLDER_ID into the config; otherwise a new folder with the same name will be created on every run.
[googledrive]
...
googledrive.default_folder=packtpub
googledrive.upload_folder=FOLDER_ID
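Google Drive allows several folders to share one name, which is why a missing FOLDER_ID leads to duplicates. The decision can be sketched as follows; `resolve_upload_folder` and the `create_folder` callable are hypothetical and stand in for whatever the crawler really does.

```python
def resolve_upload_folder(config, create_folder):
    """Return the Drive folder ID to upload into.

    `config` is a plain dict of the [googledrive] options; `create_folder` is a
    hypothetical callable that creates a folder by name and returns its new ID.
    """
    folder_id = config.get("googledrive.upload_folder", "").strip()
    if folder_id:
        return folder_id  # reuse the folder that already exists
    # No ID configured: Drive creates another folder, even if the name is taken.
    return create_folder(config["googledrive.default_folder"])
```

Passing the ID of a folder created on the first run makes every later run reuse it.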
Documentation: OAuth, Quickstart, example and permissions
From the documentation, OneDrive API requires OAuth2.0 for authentication, so to upload files you should:
[onedrive]
...
onedrive.client_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
onedrive.client_secret=XxXxXxXxXxXxXxXxXxXxXxX
Now you should be able to upload your eBook to OneDrive
python script/spider.py --config config/prod.cfg --upload onedrive
[onedrive]
...
onedrive.folder=packtpub
Documentation: Registration, Python API
To upload your eBook via scp on a remote server, update the configs
[scp]
scp.host=SCP_HOST
scp.user=SCP_USER
scp.password=SCP_PASSWORD
scp.path=SCP_UPLOAD_PATH
Now you should be able to upload your eBook
python script/spider.py --config config/prod.cfg --upload scp
scp.path on the remote server must exist in advance
--upload scp is incompatible with
Create a new Firebase project, copy the database secret from your settings
and update the configs
[firebase]
firebase.database_secret=DATABASE_SECRET
firebase.url=https://PROJECT_NAME.firebaseio.com
Now you should be able to store your eBook details on Firebase
python script/spider.py --config config/prod.cfg --upload googledrive --store firebase
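The two `[firebase]` options map onto the Firebase Realtime Database REST API, where a database secret is passed as the `auth` query parameter and a POST to `<path>.json` appends a record. The sketch below only builds the request and does not send it. The `books` path and the shape of the stored record are assumptions, not taken from the crawler.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_store_request(base_url, database_secret, book, path="books"):
    """Build (but do not send) a POST that appends `book` under `path`.

    `path` is a hypothetical location; the crawler may store data elsewhere.
    """
    query = urlencode({"auth": database_secret})
    url = "{}/{}.json?{}".format(base_url.rstrip("/"), path, query)
    body = json.dumps(book).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


req = build_store_request(
    "https://PROJECT_NAME.firebaseio.com", "DATABASE_SECRET", {"title": "Example"}
)
print(req.full_url)
```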
To send a notification via email using Gmail you should:
[gmail]
...
gmail.username=EMAIL_USERNAME@gmail.com
gmail.password=EMAIL_PASSWORD
gmail.from=FROM_EMAIL@gmail.com
gmail.to=TO_EMAIL_1@gmail.com,TO_EMAIL_2@gmail.com
Now you should be able to notify your accounts
python script/spider.py --config config/prod.cfg --notify gmail
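As an illustration of how the `[gmail]` options fit together, the sketch below builds a message for the comma-separated `gmail.to` list and sends it over Gmail's SMTP endpoint with STARTTLS. The function names, subject, and body are hypothetical, and the crawler's own mail code may differ.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipients_csv, subject, body):
    """Create an email addressed to every comma-separated recipient."""
    recipients = [r.strip() for r in recipients_csv.split(",") if r.strip()]
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg, username, password, host="smtp.gmail.com", port=587):
    """Send `msg` via SMTP with STARTTLS (not called in this sketch)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)


msg = build_message(
    "FROM_EMAIL@gmail.com",
    "TO_EMAIL_1@gmail.com,TO_EMAIL_2@gmail.com",
    "New free eBook claimed",
    "The daily eBook has been downloaded.",
)
print(msg["To"])
```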
[ifttt]
ifttt.event_name=packtpub-crawler
ifttt.key=IFTTT_MAKER_KEY
Now you should be able to trigger the applet
python script/spider.py --config config/prod.cfg --notify ifttt
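The event name and key combine into an IFTTT Webhooks trigger URL, and the JSON body may carry up to three values (`value1` to `value3`). The sketch below builds such a request without sending it. Which book fields the crawler puts in each value is an assumption.

```python
import json
from urllib.request import Request


def build_ifttt_request(event_name, key, title, description="", url=""):
    """Build (but do not send) a Webhooks trigger request.

    The mapping of title/description/url to value1..value3 is hypothetical.
    """
    endpoint = "https://maker.ifttt.com/trigger/{}/with/key/{}".format(event_name, key)
    payload = {"value1": title, "value2": description, "value3": url}
    return Request(endpoint, data=json.dumps(payload).encode("utf-8"),
                   headers={"Content-Type": "application/json"}, method="POST")


ifttt_req = build_ifttt_request("packtpub-crawler", "IFTTT_MAKER_KEY", "Example title")
print(ifttt_req.full_url)
```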
[join]
join.device_ids=DEVICE_IDS_COMMA_SEPARATED_OR_GROUP_NAME
join.api_key=API_KEY
Now you should be able to trigger the event
python script/spider.py --config config/prod.cfg --notify join
[pushover]
pushover.user_key=PUSHOVER_USER_KEY
pushover.api_key=PUSHOVER_API_KEY
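In the Pushover Message API the two keys above correspond to the form fields `user` and `token`, posted to `https://api.pushover.net/1/messages.json` together with a `message`. A minimal sketch that builds the request without sending it follows; the helper name and the message text are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_pushover_request(user_key, api_key, message, title=None):
    """Build (but do not send) a form-encoded Pushover message request."""
    fields = {"token": api_key, "user": user_key, "message": message}
    if title:
        fields["title"] = title
    return Request("https://api.pushover.net/1/messages.json",
                   data=urlencode(fields).encode("utf-8"), method="POST")


pushover_req = build_pushover_request(
    "PUSHOVER_USER_KEY", "PUSHOVER_API_KEY", "New free eBook claimed"
)
print(pushover_req.data.decode("utf-8"))
```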
Create a new branch
git checkout -b heroku-scheduler
Update .gitignore and commit your changes
# remove
config/prod.cfg
config/client_secrets.json
config/auth_token.json
# add
dev/
config/dev.cfg
config/prod_example.cfg
Create, config and deploy the scheduler
heroku login
# create a new app
heroku create APP_NAME --region eu
# or if you already have an existing app
heroku git:remote -a APP_NAME
# deploy your app
git push -u heroku heroku-scheduler:master
heroku ps:scale clock=1
# useful commands
heroku ps
heroku logs --ps clock.1
heroku logs --tail
heroku run bash
Update script/scheduler.py with your own preferences.
More info about Heroku Scheduler, Clock Processes, Add-on and APScheduler
Build your image
docker build -t niqdev/packtpub-crawler:2.4.0 .
docker run \
  --rm \
  --name my-packtpub-crawler \
  niqdev/packtpub-crawler:2.4.0 \
  python script/spider.py --config config/prod.cfg
Run scheduled crawler in background
docker run \
  --detach \
  --name my-packtpub-crawler \
  niqdev/packtpub-crawler:2.4.0

# useful commands
docker exec -i -t my-packtpub-crawler bash
docker logs -f my-packtpub-crawler
Alternatively you can pull from Docker Hub this fork
docker pull kuchy/packtpub-crawler
Add this to your crontab to run the job daily at 9 AM:
crontab -e

00 09 * * * cd PATH_TO_PROJECT/packtpub-crawler && /usr/bin/python script/spider.py --config config/prod.cfg >> /tmp/packtpub.log 2>&1
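The `00 09 * * *` schedule means minute 0 of hour 9 on every day. To check when such a job would fire next, the rule can be sketched in plain Python; `next_daily_run` is a hypothetical helper, not part of the project.

```python
from datetime import datetime, timedelta


def next_daily_run(now, hour=9, minute=0):
    """Return the next datetime at hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed, use tomorrow
    return candidate


print(next_daily_run(datetime(2020, 1, 1, 10, 30)))  # prints 2020-01-02 09:00:00
```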
Create two files in /etc/systemd/system:
[Unit]
Description=run packtpub-crawler

[Service]
User=USER_THAT_SHOULD_RUN_THE_SCRIPT
ExecStart=/usr/bin/python2.7 PATH_TO_PROJECT/packtpub-crawler/script/spider.py -c config/prod.cfg

[Install]
WantedBy=multi-user.target
[Unit]
Description=Runs packtpub-crawler every day at 7

[Timer]
OnBootSec=10min
OnActiveSec=1s
OnCalendar=*-*-* 07:00:00
Unit=packtpub_crawler.service
Persistent=true

[Install]
WantedBy=multi-user.target
Enable the script with sudo systemctl enable packtpub_crawler.timer. You can test the service with sudo systemctl start packtpub_crawler.timer and see the output with sudo journalctl -u packtpub_crawler.service -f.
The script also downloads the free eBooks from the weekly packtpub newsletter. The URL is generated by a Google Apps Script that parses all the mails. You can get the code here; to see the actual script, clone the spreadsheet and go to Tools > Script editor...
To use your own source, modify in the config
The URL should point to a file containing only the URL (no semicolons, HTML, JSON, etc).
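To check a custom source before pointing the crawler at it, the "only the URL" rule can be sketched as a small validator. `parse_url_file` is a hypothetical helper, and the exact checks the crawler applies (if any) are not documented here.

```python
from urllib.parse import urlparse


def parse_url_file(text):
    """Return the single URL contained in `text`, or raise ValueError."""
    candidate = text.strip()
    # Anything with inner whitespace (HTML, JSON, several lines) is rejected.
    if not candidate or any(ch.isspace() for ch in candidate):
        raise ValueError("expected exactly one URL and nothing else")
    if candidate.endswith(";"):
        raise ValueError("trailing semicolon is not allowed")
    parts = urlparse(candidate)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("not an http(s) URL: {!r}".format(candidate))
    return candidate


print(parse_url_file("https://example.com/free-book\n"))
```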
You can also clone the spreadsheet to use your own Gmail account. Subscribe to the newsletter (on the bottom of the page) and create a filter to tag your mails accordingly.
Install paramiko with
sudo -H pip install paramiko --ignore-installed
Install missing dependencies as described here
# install pip + setuptools
curl https://bootstrap.pypa.io/get-pip.py | python -
# upgrade pip
pip install -U pip
# install virtualenv globally
sudo pip install virtualenv
# create virtualenv
virtualenv env
# activate virtualenv
source env/bin/activate
# verify virtualenv
which python
python --version
# deactivate virtualenv
deactivate
Run a simple static server with
and test the crawler with
python script/spider.py --dev --config config/dev.cfg --all
This project is just a Proof of Concept and not intended for any illegal usage. I'm not responsible for any damage or abuse, use it at your own risk.
Source Code: https://github.com/niqdev/packtpub-crawler
License: MIT license
In this step-by-step tutorial, learn the top 10 best Microsoft OneDrive tips & tricks. For example, backup your camera photos and videos, scan documents in high quality, set expiring share links, get increased security with the personal vault, access version history, and many more.
0:14 What is OneDrive & plans
1:26 Phone camera upload / backup
3:34 Scan documents
5:07 Share & collaborate
7:28 Expire share link
7:55 Password protected share link
8:32 Personal vault
9:51 Version history
10:44 Keep file on device & OneDrive
11:28 Embed files on web sites
13:00 Restore OneDrive
Microsoft has suspended 18 Azure Active Directory applications that were being leveraged for command-and-control (C2) infrastructure by what it says is a Chinese nation-state actor.
While Microsoft services like Azure Active Directory (AD) – its cloud-based identity and access management service – are popular among enterprises, cybercriminals are also swooping in on these services to enhance the weaponization of their malware payload, attempt to gain command and control all the way to the server, and obfuscate detection. One such threat group recently spotted leveraging these cloud services and open source tools is what Microsoft calls Gadolinium, a Chinese nation-state activity group that has been compromising targets for nearly a decade.
After compromising victim devices, Gadolinium was setting up AD accounts to receive commands from and send stolen data to its C2 server. But beyond that, the threat group has also stored stolen data in Microsoft’s file hosting service and synchronization service, OneDrive; launched attacks using the open source PowershellEmpire toolkit and used GitHub to host commands.
“Gadolinium has been experimenting with using cloud services to deliver their attacks to increase both operation speed and scale for years,” said Ben Koehl and Joe Hannon, with Microsoft’s Threat Intelligence Center, in a report posted Thursday.
Gadolinium is also known as APT40, which researchers like FireEye have assessed with “moderate confidence” is a state sponsored espionage actor attributed to China. While previously Gadolinium has targeted worldwide maritime and health industries, Microsoft said recently it has observed newly expanded targeting for the threat group to include the Asia Pacific region and other targets in higher education and regional government organizations.
In mid-April 2020 the threat actors were detected sending spear-phishing emails with malicious attachments, with lures relating to the COVID-19 pandemic. When opened, the attached PowerPoint file (20200423-sitrep-92-covid-19.ppt), would drop a file, doc1.dotm, which then has two payloads that run in succession.
The first payload turns off a type check (DisableActivitySurrogateSelectorTypeCheck), while the second loads an embedded .NET binary that downloads a .png image file.
“The .png is actually PowerShell which downloads and uploads fake png files using the Microsoft Graph API to https://graph.microsoft.com/v1.0/drive/root:/onlinework/contact/$($ID)_1.png:/content where $ID is the ID of the malware,” said researchers.
Behind the scenes, these attacks relied on a bundle of Microsoft services and open source tooling – which Microsoft said has been a steady trend in recent years for several nation-state activity groups migrating to open source tooling.
OneDrive to Google Cloud Storage
Transfer files from Microsoft OneDrive to Google Cloud Storage.
People who want to transfer files from Microsoft OneDrive to Google Cloud Storage
OneDrive gives free storage for a limited time. For example, if you buy a new Samsung phone, OneDrive gives you an additional 100 GB of free storage for 2 years, and Samsung Gallery backs up your camera photos to OneDrive. For 2 years, everything is awesome with that 100 GB of free storage. But when it expires, you are left with only 15 GB, and Samsung Gallery and Microsoft bombard you with alerts to upgrade your account.
You are left with 3 choices:
I wanted to transfer my files to Google Cloud Storage, because it is cheap and accessible for archival needs.
You should provide necessary data as environment variables. You can provide environment variables in a
Access token to be able to access OneDrive API. You can get it by following:
Access Token, click that tab, your access token will appear.
Simply copy that token and paste it into the .env file as
Name of the Google Cloud Storage bucket. Make sure that there is a bucket of that name, if not, go to Google Cloud Console and create a bucket of that name.
Id of your OneDrive drive. You can get it by following:
Relative path of the folder that contains your files. For example, if you want to transfer files located in OneDrive->Personal Files->Photos->Exciting Day, then you should provide ONE_DRIVE_FOLDER_RELATIVE_PATH=Personal Files/Photos/Exciting Day as an environment variable.
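To show how a drive ID and a relative folder path combine, the sketch below builds a Microsoft Graph URL that lists a folder's children using path-based addressing. It is written in Python purely for illustration. The project itself is run with npm, and whether it calls this exact endpoint is an assumption.

```python
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def folder_children_url(drive_id, relative_path):
    """Return the Graph URL that lists the items inside `relative_path`."""
    # quote() keeps "/" as a separator and percent-encodes spaces.
    path = quote(relative_path.strip("/"))
    return "{}/drives/{}/root:/{}:/children".format(GRAPH_ROOT, drive_id, path)


print(folder_children_url("DRIVE_ID", "Personal Files/Photos/Exciting Day"))
```

The request itself would carry the access token in an `Authorization: Bearer` header.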
If you run the script on a Compute Engine instance in the same project as the bucket, you won't need to authenticate. If you are outside Google Cloud, visit https://cloud.google.com/docs/authentication/getting-started and follow the instructions to authenticate with Google Cloud.
npm run build
npm run start