Wanda  Huel

Wanda Huel

1602982800

How to Scrape Tweets and create Dataset using Twint without Twitter API

Outline:

  1. Introduction
  2. Setup/Installation
  3. Pre-script
  4. Script
  5. into Production
  6. Use Cases
  7. Example
  8. Conclusion
  9. Connect

Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter’s API.

In this article, I’ll describe how I created a huge dataset of tweets scraped from an entire country.

Note: This article will prepare you for a production level script.

Installation

Git:

git clone https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt

Pip:

pip3 install twint

or

pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint

Pre-Script:

  1. Search for the cities list of the country (this could be the name of places in a city or state) on google. Like: Cities list of Pakistan
  2. After downloading the cities list, clean the data. And if it is not cleaned, you can also obtain this data from Wikipedia.
  3. After cleaning it, assign the list in the ‘all_cities’ variable script as below

#twitter #twint #machine-learning #scraping #dataset

What is GEEK

Buddha Community

How to Scrape Tweets and create Dataset using Twint without Twitter API
Easter  Deckow

Easter Deckow

1655630160

PyTumblr: A Python Tumblr API v2 Client

PyTumblr

Installation

Install via pip:

$ pip install pytumblr

Install from source:

$ git clone https://github.com/tumblr/pytumblr.git
$ cd pytumblr
$ python setup.py install

Usage

Create a client

A pytumblr.TumblrRestClient is the object you'll make all of your calls to the Tumblr API through. Creating one is this easy:

client = pytumblr.TumblrRestClient(
    '<consumer_key>',
    '<consumer_secret>',
    '<oauth_token>',
    '<oauth_secret>',
)

client.info() # Grabs the current user information

Two easy ways to get your credentials to are:

  1. The built-in interactive_console.py tool (if you already have a consumer key & secret)
  2. The Tumblr API console at https://api.tumblr.com/console
  3. Get sample login code at https://api.tumblr.com/console/calls/user/info

Supported Methods

User Methods

client.info() # get information about the authenticating user
client.dashboard() # get the dashboard for the authenticating user
client.likes() # get the likes for the authenticating user
client.following() # get the blogs followed by the authenticating user

client.follow('codingjester.tumblr.com') # follow a blog
client.unfollow('codingjester.tumblr.com') # unfollow a blog

client.like(id, reblogkey) # like a post
client.unlike(id, reblogkey) # unlike a post

Blog Methods

client.blog_info(blogName) # get information about a blog
client.posts(blogName, **params) # get posts for a blog
client.avatar(blogName) # get the avatar for a blog
client.blog_likes(blogName) # get the likes on a blog
client.followers(blogName) # get the followers of a blog
client.blog_following(blogName) # get the publicly exposed blogs that [blogName] follows
client.queue(blogName) # get the queue for a given blog
client.submission(blogName) # get the submissions for a given blog

Post Methods

Creating posts

PyTumblr lets you create all of the various types that Tumblr supports. When using these types there are a few defaults that are able to be used with any post type.

The default supported types are described below.

  • state - a string, the state of the post. Supported types are published, draft, queue, private
  • tags - a list, a list of strings that you want tagged on the post. eg: ["testing", "magic", "1"]
  • tweet - a string, the string of the customized tweet you want. eg: "Man I love my mega awesome post!"
  • date - a string, the customized GMT that you want
  • format - a string, the format that your post is in. Support types are html or markdown
  • slug - a string, the slug for the url of the post you want

We'll show examples throughout of these default examples while showcasing all the specific post types.

Creating a photo post

Creating a photo post supports a bunch of different options plus the described default options * caption - a string, the user supplied caption * link - a string, the "click-through" url for the photo * source - a string, the url for the photo you want to use (use this or the data parameter) * data - a list or string, a list of filepaths or a single file path for multipart file upload

#Creates a photo post using a source URL
client.create_photo(blogName, state="published", tags=["testing", "ok"],
                    source="https://68.media.tumblr.com/b965fbb2e501610a29d80ffb6fb3e1ad/tumblr_n55vdeTse11rn1906o1_500.jpg")

#Creates a photo post using a local filepath
client.create_photo(blogName, state="queue", tags=["testing", "ok"],
                    tweet="Woah this is an incredible sweet post [URL]",
                    data="/Users/johnb/path/to/my/image.jpg")

#Creates a photoset post using several local filepaths
client.create_photo(blogName, state="draft", tags=["jb is cool"], format="markdown",
                    data=["/Users/johnb/path/to/my/image.jpg", "/Users/johnb/Pictures/kittens.jpg"],
                    caption="## Mega sweet kittens")

Creating a text post

Creating a text post supports the same options as default and just a two other parameters * title - a string, the optional title for the post. Supports markdown or html * body - a string, the body of the of the post. Supports markdown or html

#Creating a text post
client.create_text(blogName, state="published", slug="testing-text-posts", title="Testing", body="testing1 2 3 4")

Creating a quote post

Creating a quote post supports the same options as default and two other parameter * quote - a string, the full text of the qote. Supports markdown or html * source - a string, the cited source. HTML supported

#Creating a quote post
client.create_quote(blogName, state="queue", quote="I am the Walrus", source="Ringo")

Creating a link post

  • title - a string, the title of post that you want. Supports HTML entities.
  • url - a string, the url that you want to create a link post for.
  • description - a string, the desciption of the link that you have
#Create a link post
client.create_link(blogName, title="I like to search things, you should too.", url="https://duckduckgo.com",
                   description="Search is pretty cool when a duck does it.")

Creating a chat post

Creating a chat post supports the same options as default and two other parameters * title - a string, the title of the chat post * conversation - a string, the text of the conversation/chat, with diablog labels (no html)

#Create a chat post
chat = """John: Testing can be fun!
Renee: Testing is tedious and so are you.
John: Aw.
"""
client.create_chat(blogName, title="Renee just doesn't understand.", conversation=chat, tags=["renee", "testing"])

Creating an audio post

Creating an audio post allows for all default options and a has 3 other parameters. The only thing to keep in mind while dealing with audio posts is to make sure that you use the external_url parameter or data. You cannot use both at the same time. * caption - a string, the caption for your post * external_url - a string, the url of the site that hosts the audio file * data - a string, the filepath of the audio file you want to upload to Tumblr

#Creating an audio file
client.create_audio(blogName, caption="Rock out.", data="/Users/johnb/Music/my/new/sweet/album.mp3")

#lets use soundcloud!
client.create_audio(blogName, caption="Mega rock out.", external_url="https://soundcloud.com/skrillex/sets/recess")

Creating a video post

Creating a video post allows for all default options and has three other options. Like the other post types, it has some restrictions. You cannot use the embed and data parameters at the same time. * caption - a string, the caption for your post * embed - a string, the HTML embed code for the video * data - a string, the path of the file you want to upload

#Creating an upload from YouTube
client.create_video(blogName, caption="Jon Snow. Mega ridiculous sword.",
                    embed="http://www.youtube.com/watch?v=40pUYLacrj4")

#Creating a video post from local file
client.create_video(blogName, caption="testing", data="/Users/johnb/testing/ok/blah.mov")

Editing a post

Updating a post requires you knowing what type a post you're updating. You'll be able to supply to the post any of the options given above for updates.

client.edit_post(blogName, id=post_id, type="text", title="Updated")
client.edit_post(blogName, id=post_id, type="photo", data="/Users/johnb/mega/awesome.jpg")

Reblogging a Post

Reblogging a post just requires knowing the post id and the reblog key, which is supplied in the JSON of any post object.

client.reblog(blogName, id=125356, reblog_key="reblog_key")

Deleting a post

Deleting just requires that you own the post and have the post id

client.delete_post(blogName, 123456) # Deletes your post :(

A note on tags: When passing tags, as params, please pass them as a list (not a comma-separated string):

client.create_text(blogName, tags=['hello', 'world'], ...)

Getting notes for a post

In order to get the notes for a post, you need to have the post id and the blog that it is on.

data = client.notes(blogName, id='123456')

The results include a timestamp you can use to make future calls.

data = client.notes(blogName, id='123456', before_timestamp=data["_links"]["next"]["query_params"]["before_timestamp"])

Tagged Methods

# get posts with a given tag
client.tagged(tag, **params)

Using the interactive console

This client comes with a nice interactive console to run you through the OAuth process, grab your tokens (and store them for future use).

You'll need pyyaml installed to run it, but then it's just:

$ python interactive-console.py

and away you go! Tokens are stored in ~/.tumblr and are also shared by other Tumblr API clients like the Ruby client.

Running tests

The tests (and coverage reports) are run with nose, like this:

python setup.py test

Author: tumblr
Source Code: https://github.com/tumblr/pytumblr
License: Apache-2.0 license

#python #api 

Chloe  Butler

Chloe Butler

1667425440

Pdf2gerb: Perl Script Converts PDF Files to Gerber format

pdf2gerb

Perl script converts PDF files to Gerber format

Pdf2Gerb generates Gerber 274X photoplotting and Excellon drill files from PDFs of a PCB. Up to three PDFs are used: the top copper layer, the bottom copper layer (for 2-sided PCBs), and an optional silk screen layer. The PDFs can be created directly from any PDF drawing software, or a PDF print driver can be used to capture the Print output if the drawing software does not directly support output to PDF.

The general workflow is as follows:

  1. Design the PCB using your favorite CAD or drawing software.
  2. Print the top and bottom copper and top silk screen layers to a PDF file.
  3. Run Pdf2Gerb on the PDFs to create Gerber and Excellon files.
  4. Use a Gerber viewer to double-check the output against the original PCB design.
  5. Make adjustments as needed.
  6. Submit the files to a PCB manufacturer.

Please note that Pdf2Gerb does NOT perform DRC (Design Rule Checks), as these will vary according to individual PCB manufacturer conventions and capabilities. Also note that Pdf2Gerb is not perfect, so the output files must always be checked before submitting them. As of version 1.6, Pdf2Gerb supports most PCB elements, such as round and square pads, round holes, traces, SMD pads, ground planes, no-fill areas, and panelization. However, because it interprets the graphical output of a Print function, there are limitations in what it can recognize (or there may be bugs).

See docs/Pdf2Gerb.pdf for install/setup, config, usage, and other info.


pdf2gerb_cfg.pm

#Pdf2Gerb config settings:
#Put this file in same folder/directory as pdf2gerb.pl itself (global settings),
#or copy to another folder/directory with PDFs if you want PCB-specific settings.
#There is only one user of this file, so we don't need a custom package or namespace.
#NOTE: all constants defined in here will be added to main namespace.
#package pdf2gerb_cfg;

use strict; #trap undef vars (easier debug)
use warnings; #other useful info (easier debug)


##############################################################################################
#configurable settings:
#change values here instead of in main pfg2gerb.pl file

use constant WANT_COLORS => ($^O !~ m/Win/); #ANSI colors no worky on Windows? this must be set < first DebugPrint() call

#just a little warning; set realistic expectations:
#DebugPrint("${\(CYAN)}Pdf2Gerb.pl ${\(VERSION)}, $^O O/S\n${\(YELLOW)}${\(BOLD)}${\(ITALIC)}This is EXPERIMENTAL software.  \nGerber files MAY CONTAIN ERRORS.  Please CHECK them before fabrication!${\(RESET)}", 0); #if WANT_DEBUG

use constant METRIC => FALSE; #set to TRUE for metric units (only affect final numbers in output files, not internal arithmetic)
use constant APERTURE_LIMIT => 0; #34; #max #apertures to use; generate warnings if too many apertures are used (0 to not check)
use constant DRILL_FMT => '2.4'; #'2.3'; #'2.4' is the default for PCB fab; change to '2.3' for CNC

use constant WANT_DEBUG => 0; #10; #level of debug wanted; higher == more, lower == less, 0 == none
use constant GERBER_DEBUG => 0; #level of debug to include in Gerber file; DON'T USE FOR FABRICATION
use constant WANT_STREAMS => FALSE; #TRUE; #save decompressed streams to files (for debug)
use constant WANT_ALLINPUT => FALSE; #TRUE; #save entire input stream (for debug ONLY)

#DebugPrint(sprintf("${\(CYAN)}DEBUG: stdout %d, gerber %d, want streams? %d, all input? %d, O/S: $^O, Perl: $]${\(RESET)}\n", WANT_DEBUG, GERBER_DEBUG, WANT_STREAMS, WANT_ALLINPUT), 1);
#DebugPrint(sprintf("max int = %d, min int = %d\n", MAXINT, MININT), 1); 

#define standard trace and pad sizes to reduce scaling or PDF rendering errors:
#This avoids weird aperture settings and replaces them with more standardized values.
#(I'm not sure how photoplotters handle strange sizes).
#Fewer choices here gives more accurate mapping in the final Gerber files.
#units are in inches
use constant TOOL_SIZES => #add more as desired
(
#round or square pads (> 0) and drills (< 0):
    .010, -.001,  #tiny pads for SMD; dummy drill size (too small for practical use, but needed so StandardTool will use this entry)
    .031, -.014,  #used for vias
    .041, -.020,  #smallest non-filled plated hole
    .051, -.025,
    .056, -.029,  #useful for IC pins
    .070, -.033,
    .075, -.040,  #heavier leads
#    .090, -.043,  #NOTE: 600 dpi is not high enough resolution to reliably distinguish between .043" and .046", so choose 1 of the 2 here
    .100, -.046,
    .115, -.052,
    .130, -.061,
    .140, -.067,
    .150, -.079,
    .175, -.088,
    .190, -.093,
    .200, -.100,
    .220, -.110,
    .160, -.125,  #useful for mounting holes
#some additional pad sizes without holes (repeat a previous hole size if you just want the pad size):
    .090, -.040,  #want a .090 pad option, but use dummy hole size
    .065, -.040, #.065 x .065 rect pad
    .035, -.040, #.035 x .065 rect pad
#traces:
    .001,  #too thin for real traces; use only for board outlines
    .006,  #minimum real trace width; mainly used for text
    .008,  #mainly used for mid-sized text, not traces
    .010,  #minimum recommended trace width for low-current signals
    .012,
    .015,  #moderate low-voltage current
    .020,  #heavier trace for power, ground (even if a lighter one is adequate)
    .025,
    .030,  #heavy-current traces; be careful with these ones!
    .040,
    .050,
    .060,
    .080,
    .100,
    .120,
);
#Areas larger than the values below will be filled with parallel lines:
#This cuts down on the number of aperture sizes used.
#Set to 0 to always use an aperture or drill, regardless of size.
use constant { MAX_APERTURE => max((TOOL_SIZES)) + .004, MAX_DRILL => -min((TOOL_SIZES)) + .004 }; #max aperture and drill sizes (plus a little tolerance)
#DebugPrint(sprintf("using %d standard tool sizes: %s, max aper %.3f, max drill %.3f\n", scalar((TOOL_SIZES)), join(", ", (TOOL_SIZES)), MAX_APERTURE, MAX_DRILL), 1);

#NOTE: Compare the PDF to the original CAD file to check the accuracy of the PDF rendering and parsing!
#for example, the CAD software I used generated the following circles for holes:
#CAD hole size:   parsed PDF diameter:      error:
#  .014                .016                +.002
#  .020                .02267              +.00267
#  .025                .026                +.001
#  .029                .03167              +.00267
#  .033                .036                +.003
#  .040                .04267              +.00267
#This was usually ~ .002" - .003" too big compared to the hole as displayed in the CAD software.
#To compensate for PDF rendering errors (either during CAD Print function or PDF parsing logic), adjust the values below as needed.
#units are pixels; for example, a value of 2.4 at 600 dpi = .0004 inch, 2 at 600 dpi = .0033"
use constant
{
    HOLE_ADJUST => -0.004 * 600, #-2.6, #holes seemed to be slightly oversized (by .002" - .004"), so shrink them a little
    RNDPAD_ADJUST => -0.003 * 600, #-2, #-2.4, #round pads seemed to be slightly oversized, so shrink them a little
    SQRPAD_ADJUST => +0.001 * 600, #+.5, #square pads are sometimes too small by .00067, so bump them up a little
    RECTPAD_ADJUST => 0, #(pixels) rectangular pads seem to be okay? (not tested much)
    TRACE_ADJUST => 0, #(pixels) traces seemed to be okay?
    REDUCE_TOLERANCE => .001, #(inches) allow this much variation when reducing circles and rects
};

#Also, my CAD's Print function or the PDF print driver I used was a little off for circles, so define some additional adjustment values here:
#Values are added to X/Y coordinates; units are pixels; for example, a value of 1 at 600 dpi would be ~= .002 inch
use constant
{
    CIRCLE_ADJUST_MINX => 0,
    CIRCLE_ADJUST_MINY => -0.001 * 600, #-1, #circles were a little too high, so nudge them a little lower
    CIRCLE_ADJUST_MAXX => +0.001 * 600, #+1, #circles were a little too far to the left, so nudge them a little to the right
    CIRCLE_ADJUST_MAXY => 0,
    SUBST_CIRCLE_CLIPRECT => FALSE, #generate circle and substitute for clip rects (to compensate for the way some CAD software draws circles)
    WANT_CLIPRECT => TRUE, #FALSE, #AI doesn't need clip rect at all? should be on normally?
    RECT_COMPLETION => FALSE, #TRUE, #fill in 4th side of rect when 3 sides found
};

#allow .012 clearance around pads for solder mask:
#This value effectively adjusts pad sizes in the TOOL_SIZES list above (only for solder mask layers).
use constant SOLDER_MARGIN => +.012; #units are inches

#line join/cap styles:
use constant
{
    CAP_NONE => 0, #butt (none); line is exact length
    CAP_ROUND => 1, #round cap/join; line overhangs by a semi-circle at either end
    CAP_SQUARE => 2, #square cap/join; line overhangs by a half square on either end
    CAP_OVERRIDE => FALSE, #cap style overrides drawing logic
};
    
#number of elements in each shape type:
use constant
{
    RECT_SHAPELEN => 6, #x0, y0, x1, y1, count, "rect" (start, end corners)
    LINE_SHAPELEN => 6, #x0, y0, x1, y1, count, "line" (line seg)
    CURVE_SHAPELEN => 10, #xstart, ystart, x0, y0, x1, y1, xend, yend, count, "curve" (bezier 2 points)
    CIRCLE_SHAPELEN => 5, #x, y, 5, count, "circle" (center + radius)
};
#const my %SHAPELEN =
#Readonly my %SHAPELEN =>
our %SHAPELEN =
(
    rect => RECT_SHAPELEN,
    line => LINE_SHAPELEN,
    curve => CURVE_SHAPELEN,
    circle => CIRCLE_SHAPELEN,
);

#panelization:
#This will repeat the entire body the number of times indicated along the X or Y axes (files grow accordingly).
#Display elements that overhang PCB boundary can be squashed or left as-is (typically text or other silk screen markings).
#Set "overhangs" TRUE to allow overhangs, FALSE to truncate them.
#xpad and ypad allow margins to be added around outer edge of panelized PCB.
use constant PANELIZE => {'x' => 1, 'y' => 1, 'xpad' => 0, 'ypad' => 0, 'overhangs' => TRUE}; #number of times to repeat in X and Y directions

# Set this to 1 if you need TurboCAD support.
#$turboCAD = FALSE; #is this still needed as an option?

#CIRCAD pad generation uses an appropriate aperture, then moves it (stroke) "a little" - we use this to find pads and distinguish them from PCB holes. 
use constant PAD_STROKE => 0.3; #0.0005 * 600; #units are pixels
#convert very short traces to pads or holes:
use constant TRACE_MINLEN => .001; #units are inches
#use constant ALWAYS_XY => TRUE; #FALSE; #force XY even if X or Y doesn't change; NOTE: needs to be TRUE for all pads to show in FlatCAM and ViewPlot
use constant REMOVE_POLARITY => FALSE; #TRUE; #set to remove subtractive (negative) polarity; NOTE: must be FALSE for ground planes

#PDF uses "points", each point = 1/72 inch
#combined with a PDF scale factor of .12, this gives 600 dpi resolution (1/72 * .12 = 600 dpi)
use constant INCHES_PER_POINT => 1/72; #0.0138888889; #multiply point-size by this to get inches

# The precision used when computing a bezier curve. Higher numbers are more precise but slower (and generate larger files).
#$bezierPrecision = 100;
use constant BEZIER_PRECISION => 36; #100; #use const; reduced for faster rendering (mainly used for silk screen and thermal pads)

# Ground planes and silk screen or larger copper rectangles or circles are filled line-by-line using this resolution.
use constant FILL_WIDTH => .01; #fill at most 0.01 inch at a time

# The max number of characters to read into memory
use constant MAX_BYTES => 10 * M; #bumped up to 10 MB, use const

use constant DUP_DRILL1 => TRUE; #FALSE; #kludge: ViewPlot doesn't load drill files that are too small so duplicate first tool

my $runtime = time(); #Time::HiRes::gettimeofday(); #measure my execution time

print STDERR "Loaded config settings from '${\(__FILE__)}'.\n";
1; #last value must be truthful to indicate successful load


#############################################################################################
#junk/experiment:

#use Package::Constants;
#use Exporter qw(import); #https://perldoc.perl.org/Exporter.html

#my $caller = "pdf2gerb::";

#sub cfg
#{
#    my $proto = shift;
#    my $class = ref($proto) || $proto;
#    my $settings =
#    {
#        $WANT_DEBUG => 990, #10; #level of debug wanted; higher == more, lower == less, 0 == none
#    };
#    bless($settings, $class);
#    return $settings;
#}

#use constant HELLO => "hi there2"; #"main::HELLO" => "hi there";
#use constant GOODBYE => 14; #"main::GOODBYE" => 12;

#print STDERR "read cfg file\n";

#our @EXPORT_OK = Package::Constants->list(__PACKAGE__); #https://www.perlmonks.org/?node_id=1072691; NOTE: "_OK" skips short/common names

#print STDERR scalar(@EXPORT_OK) . " consts exported:\n";
#foreach(@EXPORT_OK) { print STDERR "$_\n"; }
#my $val = main::thing("xyz");
#print STDERR "caller gave me $val\n";
#foreach my $arg (@ARGV) { print STDERR "arg $arg\n"; }

Download Details:

Author: swannman
Source Code: https://github.com/swannman/pdf2gerb

License: GPL-3.0 license

#perl 

Wanda  Huel

Wanda Huel

1602982800

How to Scrape Tweets and create Dataset using Twint without Twitter API

Outline:

  1. Introduction
  2. Setup/Installation
  3. Pre-script
  4. Script
  5. into Production
  6. Use Cases
  7. Example
  8. Conclusion
  9. Connect

Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter’s API.

In this article, I’ll describe how I created a huge dataset of tweets scraped from an entire country.

Note: This article will prepare you for a production level script.

Installation

Git:

git clone https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt

Pip:

pip3 install twint

or

pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint

Pre-Script:

  1. Search for the cities list of the country (this could be the name of places in a city or state) on google. Like: Cities list of Pakistan
  2. After downloading the cities list, clean the data. And if it is not cleaned, you can also obtain this data from Wikipedia.
  3. After cleaning it, assign the list in the ‘all_cities’ variable script as below

#twitter #twint #machine-learning #scraping #dataset

Sival Alethea

Sival Alethea

1624302000

APIs for Beginners - How to use an API (Full Course / Tutorial)

What is an API? Learn all about APIs (Application Programming Interfaces) in this full tutorial for beginners. You will learn what APIs do, why APIs exist, and the many benefits of APIs. APIs are used all the time in programming and web development so it is important to understand how to use them.

You will also get hands-on experience with a few popular web APIs. As long as you know the absolute basics of coding and the web, you’ll have no problem following along.
⭐️ Unit 1 - What is an API
⌨️ Video 1 - Welcome (0:00:00)
⌨️ Video 2 - Defining Interface (0:03:57)
⌨️ Video 3 - Defining API (0:07:51)
⌨️ Video 4 - Remote APIs (0:12:55)
⌨️ Video 5 - How the web works (0:17:04)
⌨️ Video 6 - RESTful API Constraint Scavenger Hunt (0:22:00)

⭐️ Unit 2 - Exploring APIs
⌨️ Video 1 - Exploring an API online (0:27:36)
⌨️ Video 2 - Using an API from the command line (0:44:30)
⌨️ Video 3 - Using Postman to explore APIs (0:53:56)
⌨️ Video 4 - Please please Mr. Postman (1:03:33)
⌨️ Video 5 - Using Helper Libraries (JavaScript) (1:14:41)
⌨️ Video 6 - Using Helper Libraries (Python) (1:24:40)

⭐️ Unit 3 - Using APIs
⌨️ Video 1 - Introducing the project (1:34:18)
⌨️ Video 2 - Flask app (1:36:07)
⌨️ Video 3 - Dealing with API Limits (1:50:00)
⌨️ Video 4 - JavaScript Single Page Application (1:54:27)
⌨️ Video 5 - Moar JavaScript and Recap (2:07:53)
⌨️ Video 6 - Review (2:18:03)
📺 The video in this post was made by freeCodeCamp.org
The origin of the article: https://www.youtube.com/watch?v=GZvSYJDk-us&list=PLWKjhJtqVAblfum5WiQblKPwIbqYXkDoC&index=5
🔥 If you’re a beginner. I believe the article below will be useful to you ☞ What You Should Know Before Investing in Cryptocurrency - For Beginner
⭐ ⭐ ⭐The project is of interest to the community. Join to Get free ‘GEEK coin’ (GEEKCASH coin)!
☞ **-----CLICK HERE-----**⭐ ⭐ ⭐
Thanks for visiting and watching! Please don’t forget to leave a like, comment and share!

#apis #apis for beginners #how to use an api #apis for beginners - how to use an api #application programming interfaces #learn all about apis

Top 10 API Security Threats Every API Team Should Know

As more and more data is exposed via APIs either as API-first companies or for the explosion of single page apps/JAMStack, API security can no longer be an afterthought. The hard part about APIs is that it provides direct access to large amounts of data while bypassing browser precautions. Instead of worrying about SQL injection and XSS issues, you should be concerned about the bad actor who was able to paginate through all your customer records and their data.

Typical prevention mechanisms like Captchas and browser fingerprinting won’t work since APIs by design need to handle a very large number of API accesses even by a single customer. So where do you start? The first thing is to put yourself in the shoes of a hacker and then instrument your APIs to detect and block common attacks along with unknown unknowns for zero-day exploits. Some of these are on the OWASP Security API list, but not all.

Insecure pagination and resource limits

Most APIs provide access to resources that are lists of entities such as /users or /widgets. A client such as a browser would typically filter and paginate through this list to limit the number items returned to a client like so:

First Call: GET /items?skip=0&take=10 
Second Call: GET /items?skip=10&take=10

However, if that entity has any PII or other information, then a hacker could scrape that endpoint to get a dump of all entities in your database. This could be most dangerous if those entities accidently exposed PII or other sensitive information, but could also be dangerous in providing competitors or others with adoption and usage stats for your business or provide scammers with a way to get large email lists. See how Venmo data was scraped

A naive protection mechanism would be to check the take count and throw an error if greater than 100 or 1000. The problem with this is two-fold:

  1. For data APIs, legitimate customers may need to fetch and sync a large number of records such as via cron jobs. Artificially small pagination limits can force your API to be very chatty decreasing overall throughput. Max limits are to ensure memory and scalability requirements are met (and prevent certain DDoS attacks), not to guarantee security.
  2. This offers zero protection to a hacker that writes a simple script that sleeps a random delay between repeated accesses.
skip = 0
while True:    response = requests.post('https://api.acmeinc.com/widgets?take=10&skip=' + skip),                      headers={'Authorization': 'Bearer' + ' ' + sys.argv[1]})    print("Fetched 10 items")    sleep(randint(100,1000))    skip += 10

How to secure against pagination attacks

To secure against pagination attacks, you should track how many items of a single resource are accessed within a certain time period for each user or API key rather than just at the request level. By tracking API resource access at the user level, you can block a user or API key once they hit a threshold such as “touched 1,000,000 items in a one hour period”. This is dependent on your API use case and can even be dependent on their subscription with you. Like a Captcha, this can slow down the speed that a hacker can exploit your API, like a Captcha if they have to create a new user account manually to create a new API key.

Insecure API key generation

Most APIs are protected by some sort of API key or JWT (JSON Web Token). This provides a natural way to track and protect your API as API security tools can detect abnormal API behavior and block access to an API key automatically. However, hackers will want to outsmart these mechanisms by generating and using a large pool of API keys from a large number of users just like a web hacker would use a large pool of IP addresses to circumvent DDoS protection.

How to secure against API key pools

The easiest way to secure against these types of attacks is by requiring a human to sign up for your service and generate API keys. Bot traffic can be prevented with things like Captcha and 2-Factor Authentication. Unless there is a legitimate business case, new users who sign up for your service should not have the ability to generate API keys programmatically. Instead, only trusted customers should have the ability to generate API keys programmatically. Go one step further and ensure any anomaly detection for abnormal behavior is done at the user and account level, not just for each API key.

Accidental key exposure

APIs are used in a way that increases the probability credentials are leaked:

  1. APIs are expected to be accessed over indefinite time periods, which increases the probability that a hacker obtains a valid API key that’s not expired. You save that API key in a server environment variable and forget about it. This is a drastic contrast to a user logging into an interactive website where the session expires after a short duration.
  2. The consumer of an API has direct access to the credentials such as when debugging via Postman or CURL. It only takes a single developer to accidently copy/pastes the CURL command containing the API key into a public forum like in GitHub Issues or Stack Overflow.
  3. API keys are usually bearer tokens without requiring any other identifying information. APIs cannot leverage things like one-time use tokens or 2-factor authentication.

If a key is exposed due to user error, one may think you as the API provider has any blame. However, security is all about reducing surface area and risk. Treat your customer data as if it’s your own and help them by adding guards that prevent accidental key exposure.

How to prevent accidental key exposure

The easiest way to prevent key exposure is by leveraging two tokens rather than one. A refresh token is stored as an environment variable and can only be used to generate short lived access tokens. Unlike the refresh token, these short lived tokens can access the resources, but are time limited such as in hours or days.

The customer will store the refresh token with other API keys. Then your SDK will generate access tokens on SDK init or when the last access token expires. If a CURL command gets pasted into a GitHub issue, then a hacker would need to use it within hours reducing the attack vector (unless it was the actual refresh token which is low probability)

Exposure to DDoS attacks

APIs open up entirely new business models where customers can access your API platform programmatically. However, this can make DDoS protection tricky. Most DDoS protection is designed to absorb and reject a large number of requests from bad actors during DDoS attacks but still need to let the good ones through. This requires fingerprinting the HTTP requests to check against what looks like bot traffic. This is much harder for API products as all traffic looks like bot traffic and is not coming from a browser where things like cookies are present.

Stopping DDoS attacks

The magical part about APIs is almost every access requires an API Key. If a request doesn’t have an API key, you can automatically reject it which is lightweight on your servers (Ensure authentication is short circuited very early before later middleware like request JSON parsing). So then how do you handle authenticated requests? The easiest is to leverage rate limit counters for each API key such as to handle X requests per minute and reject those above the threshold with a 429 HTTP response. There are a variety of algorithms to do this such as leaky bucket and fixed window counters.

Incorrect server security

APIs are no different than web servers when it comes to good server hygiene. Data can be leaked due to misconfigured SSL certificate or allowing non-HTTPS traffic. For modern applications, there is very little reason to accept non-HTTPS requests, but a customer could mistakenly issue a non HTTP request from their application or CURL exposing the API key. APIs do not have the protection of a browser so things like HSTS or redirect to HTTPS offer no protection.

How to ensure proper SSL

Test your SSL implementation over at Qualys SSL Test or similar tool. You should also block all non-HTTP requests which can be done within your load balancer. You should also remove any HTTP headers scrub any error messages that leak implementation details. If your API is used only by your own apps or can only be accessed server-side, then review Authoritative guide to Cross-Origin Resource Sharing for REST APIs

Incorrect caching headers

APIs provide access to dynamic data that’s scoped to each API key. Any caching implementation should have the ability to scope to an API key to prevent cross-pollution. Even if you don’t cache anything in your infrastructure, you could expose your customers to security holes. If a customer with a proxy server was using multiple API keys such as one for development and one for production, then they could see cross-pollinated data.

#api management #api security #api best practices #api providers #security analytics #api management policies #api access tokens #api access #api security risks #api access keys