Geziyor is a blazing fast web crawling and web scraping framework. It can be used to crawl websites and extract structured data from them. Geziyor is useful for a wide range of purposes such as data mining, monitoring and automated testing.
See scraper Options for all custom settings.
We highly recommend using Geziyor with Go modules.
This example extracts all quotes from quotes.toscrape.com and exports them to a JSON file.
package main

import (
	"github.com/PuerkitoBio/goquery"
	"github.com/geziyor/geziyor"
	"github.com/geziyor/geziyor/client"
	"github.com/geziyor/geziyor/export"
)

func main() {
	geziyor.NewGeziyor(&geziyor.Options{
		StartURLs: []string{"http://quotes.toscrape.com/"},
		ParseFunc: quotesParse,
		Exporters: []export.Exporter{&export.JSON{}},
	}).Start()
}

// quotesParse extracts every quote on the page and follows the "next" link.
func quotesParse(g *geziyor.Geziyor, r *client.Response) {
	r.HTMLDoc.Find("div.quote").Each(func(i int, s *goquery.Selection) {
		g.Exports <- map[string]interface{}{
			"text":   s.Find("span.text").Text(),
			"author": s.Find("small.author").Text(),
		}
	})
	if href, ok := r.HTMLDoc.Find("li.next > a").Attr("href"); ok {
		g.Get(r.JoinURL(href), quotesParse)
	}
}
See tests for more usage examples.
go get -u github.com/geziyor/geziyor
If you want to make JS-rendered requests, make sure you have Chrome installed.
NOTE: macOS limits the maximum number of open file descriptors. If you want to make more than 256 concurrent requests, you need to increase that limit.
Initial requests start with the StartURLs []string field in Options. Geziyor makes concurrent requests to those URLs. After reading a response, ParseFunc func(g *Geziyor, r *Response) is called.
geziyor.NewGeziyor(&geziyor.Options{
	StartURLs: []string{"http://api.ipify.org"},
	ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
		fmt.Println(string(r.Body))
	},
}).Start()
If you want to create the first requests manually, set StartRequestsFunc; StartURLs won't be used when you create requests manually. You can make requests using Geziyor's methods:
geziyor.NewGeziyor(&geziyor.Options{
	StartRequestsFunc: func(g *geziyor.Geziyor) {
		g.Get("https://httpbin.org/anything", g.Opt.ParseFunc)
		g.Head("https://httpbin.org/anything", g.Opt.ParseFunc)
	},
	ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
		fmt.Println(string(r.Body))
	},
}).Start()
JS-rendered requests can be made with the GetRendered method. By default, Geziyor starts a Chrome browser using the locally installed Chrome binary. Set the BrowserEndpoint option to use a different Chrome instance, such as "ws://localhost:3000".
geziyor.NewGeziyor(&geziyor.Options{
	StartRequestsFunc: func(g *geziyor.Geziyor) {
		g.GetRendered("https://httpbin.org/anything", g.Opt.ParseFunc)
	},
	ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
		fmt.Println(string(r.Body))
	},
	//BrowserEndpoint: "ws://localhost:3000",
}).Start()
We can extract HTML elements using response.HTMLDoc. HTMLDoc is Goquery's Document. HTMLDoc is available on the Response if the response is HTML and can be parsed by Go's built-in HTML parser. If the response isn't HTML, response.HTMLDoc is nil.
geziyor.NewGeziyor(&geziyor.Options{
	StartURLs: []string{"http://quotes.toscrape.com/"},
	ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
		r.HTMLDoc.Find("div.quote").Each(func(_ int, s *goquery.Selection) {
			log.Println(s.Find("span.text").Text(), s.Find("small.author").Text())
		})
	},
}).Start()
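Since response.HTMLDoc is nil for non-HTML responses, it's worth guarding before using it when a crawl can hit mixed content. A minimal sketch of such a guard:

ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
	// HTMLDoc is nil when the response isn't parseable HTML (e.g. JSON, images).
	if r.HTMLDoc == nil {
		return
	}
	log.Println(r.HTMLDoc.Find("title").Text())
},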
You can export data automatically using exporters. Just send data to the Geziyor.Exports channel. See the export package for the available exporters.
geziyor.NewGeziyor(&geziyor.Options{
	StartURLs: []string{"http://quotes.toscrape.com/"},
	ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
		r.HTMLDoc.Find("div.quote").Each(func(_ int, s *goquery.Selection) {
			g.Exports <- map[string]interface{}{
				"text":   s.Find("span.text").Text(),
				"author": s.Find("small.author").Text(),
			}
		})
	},
	Exporters: []export.Exporter{&export.JSON{}},
}).Start()
You can create custom requests with client.NewRequest and use them with Geziyor.Do(request, callback):
geziyor.NewGeziyor(&geziyor.Options{
	StartRequestsFunc: func(g *geziyor.Geziyor) {
		req, _ := client.NewRequest("GET", "https://httpbin.org/anything", nil)
		req.Meta["key"] = "value"
		g.Do(req, g.Opt.ParseFunc)
	},
	ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
		fmt.Println("This is our data from request: ", r.Request.Meta["key"])
	},
}).Start()
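Assuming client.NewRequest mirrors Go's http.NewRequest(method, url, body), as its usage above suggests, the same pattern covers POST requests with a body. A hedged sketch (the JSON payload and header are illustrative, not part of Geziyor's docs):

geziyor.NewGeziyor(&geziyor.Options{
	StartRequestsFunc: func(g *geziyor.Geziyor) {
		// Assumption: client.NewRequest accepts an io.Reader body like http.NewRequest.
		body := strings.NewReader(`{"query": "quotes"}`)
		req, err := client.NewRequest("POST", "https://httpbin.org/post", body)
		if err != nil {
			return
		}
		req.Header.Set("Content-Type", "application/json")
		g.Do(req, g.Opt.ParseFunc)
	},
	ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
		fmt.Println(string(r.Body))
	},
}).Start()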
If you want to use a single proxy for all your requests, just set the HTTP_PROXY and HTTPS_PROXY environment variables, and Geziyor will use them.
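Assuming Geziyor's HTTP client honors Go's standard proxy environment variables, as stated above, you can also set them programmatically before starting a crawl. A minimal sketch (proxy.example.com:8080 is a placeholder address, and parseFunc is assumed to be defined elsewhere):

// Placeholder proxy address; replace with your own.
os.Setenv("HTTP_PROXY", "http://proxy.example.com:8080")
os.Setenv("HTTPS_PROXY", "http://proxy.example.com:8080")

geziyor.NewGeziyor(&geziyor.Options{
	StartURLs: []string{"http://httpbin.org/anything"},
	ParseFunc: parseFunc,
}).Start()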
You can also rotate proxies per request by setting the ProxyFunc option to client.RoundRobinProxy, or to any custom proxy selection function you want; see client/proxy.go for how to implement one (a sketch follows the example below). Proxies can be HTTP, HTTPS, or SOCKS5.
Note: if you use the http scheme for a proxy, it will be used for HTTP requests but not for HTTPS requests.
geziyor.NewGeziyor(&geziyor.Options{
	StartURLs: []string{"http://httpbin.org/anything"},
	ParseFunc: parseFunc,
	ProxyFunc: client.RoundRobinProxy("http://some-http-proxy.com", "https://some-https-proxy.com", "socks5://some-socks5-proxy.com"),
}).Start()
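A minimal sketch of a custom proxy selection function, assuming ProxyFunc uses the standard http.Transport.Proxy signature, func(*http.Request) (*url.URL, error), which is what client.RoundRobinProxy suggests; check client/proxy.go for the actual contract:

// RandomProxy picks a random proxy per request (illustrative, not part of Geziyor).
// Needs "math/rand", "net/http", and "net/url" imports.
func RandomProxy(proxies ...string) func(*http.Request) (*url.URL, error) {
	parsed := make([]*url.URL, 0, len(proxies))
	for _, p := range proxies {
		if u, err := url.Parse(p); err == nil {
			parsed = append(parsed, u)
		}
	}
	return func(*http.Request) (*url.URL, error) {
		if len(parsed) == 0 {
			return nil, nil // nil means a direct connection, no proxy
		}
		return parsed[rand.Intn(len(parsed))], nil
	}
}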
8,748 requests per second on a MacBook Pro 15" (2016).
See the tests for this benchmark function:
>> go test -run none -bench Requests -benchtime 10s
goos: darwin
goarch: amd64
pkg: github.com/geziyor/geziyor
BenchmarkRequests-8 200000 108710 ns/op
PASS
ok github.com/geziyor/geziyor 22.861s
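For reference, a hedged sketch of what such a benchmark can look like; the real function lives in the repository's tests, and the local test server here is an assumption so the sketch stays self-contained:

// Hypothetical benchmark shape; see the repository's tests for the real one.
func BenchmarkRequests(b *testing.B) {
	// Local server so the benchmark doesn't depend on the network.
	ts := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Write([]byte("hello"))
	}))
	defer ts.Close()

	geziyor.NewGeziyor(&geziyor.Options{
		StartRequestsFunc: func(g *geziyor.Geziyor) {
			for i := 0; i < b.N; i++ {
				g.Get(ts.URL, g.Opt.ParseFunc)
			}
		},
		ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {},
	}).Start()
}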
Author: geziyor
Source Code: https://github.com/geziyor/geziyor
License: MPL-2.0 license
Web development frameworks are a powerful answer for businesses that want a distinctive web app, as they provide the tools and libraries developers rely on.
Most businesses strive for standout web applications that perform better and drive more traffic to the site. Such apps are essential because competition in the digital world is fierce.
Developers use the libraries and templates that frameworks provide to build interactive, user-friendly web applications. Frameworks also help them increase the efficiency, performance, and productivity of web development work.
Before getting deeper into it, let's take a quick glance at the facts and figures below, which will help you understand the utility of frameworks.
As per Statista, 35.9% of developers used React in 2020.
25.1% of developers used the Angular framework worldwide.
According to SimilarTech, 2,935 websites use the Spring framework, which is most popular in the News and Media domain.
What is a Framework?
A framework is a set of tools that paves the way for web developers to create rich, interactive web apps. It comprises libraries, templates, and specific software tools, and it lets developers build applications without rewriting the same code over and over.
There are two categories of frameworks: the back-end framework, known as the server side, and the front-end framework, known as the client side.
The back-end framework powers the part of a web application that you cannot see, and it communicates with the front end. The front end, on the other hand, is the part of the web that users see and interact with.
As an example: what you see in the app is the front end, and the communication you make through it is handled by the back end.
Hence, depending on your web application requirements, you can hire web developers from a leading web development company in India. In no time, you will be among those reaping the benefits of using web development frameworks for their applications.
#web-development-frameworks #web-frameworks #top-web-frameworks #best-web-development-frameworks
In the last few years, web scraping has been one of my frequent day-to-day tasks. I wondered whether I could make it smart and automatic to save a lot of time. So I made AutoScraper!
The project code is available on Github.
This project is made for automatic web scraping, to make scraping easy. It takes a URL or the HTML content of a web page, plus a list of sample data we want to scrape from that page. This data can be text, a URL, or any HTML tag value on the page. It learns the scraping rules and returns similar elements. You can then use this learned object with new URLs to get similar content or the exact same elements from those pages.
Install the latest version from the git repository using pip:
$ pip install git+https://github.com/alirezamika/autoscraper.git
Getting similar results
Say we want to fetch all related post titles on a Stack Overflow page:
from autoscraper import AutoScraper
url = 'https://stackoverflow.com/questions/2081586/web-scraping-with-python'
## We can add one or multiple candidates here.
## You can also put urls here to retrieve urls.
wanted_list = ["How to call an external command?"]
scraper = AutoScraper()
result = scraper.build(url, wanted_list)
print(result)
#python #web-scraping #web-crawling #data-scraping #website-scraping #open-source #repositories-on-github #web-development
go-web-framework-benchmark
This benchmark suite aims to compare the performance of Go web frameworks. It is inspired by Go HTTP Router Benchmark, but the two differ: Go HTTP Router Benchmark compares the performance of routers, while this suite compares whole HTTP request processing.
Last Test Updated: 2020-05
test environment
When I investigated the performance of Go web frameworks, I found Go HTTP Router Benchmark, created by Julien Schmidt. He also developed a high-performance HTTP router: httprouter. I thought I had the full performance picture until I wrote a piece of code to mock real business logic:
api.Get("/rest/hello", func(c *XXXXX.Context) {
	sleepTime, _ := strconv.Atoi(os.Args[1]) // e.g. 10ms
	if sleepTime > 0 {
		time.Sleep(time.Duration(sleepTime) * time.Millisecond)
	}
	c.Text("Hello world")
})
When I used the above code to test those web frameworks, the time spent on route selection turned out not to matter much in overall HTTP request processing, even though route-selection performance varies widely between frameworks.
So I created this project to compare the performance of web frameworks across connection handling, route selection, and handler processing. It mocks business logic and can set a specific processing time.
You can get some interesting results if you use it for testing.
When you test a web framework, this test suite starts a simple HTTP server implemented with that framework. It is a real HTTP server and contains only one GET route: "/hello".
When the server handles this route, it sleeps n milliseconds in the handler to mock business logic.
It contains a test.sh script that runs those tests automatically, using wrk as the load generator.
The first test case mocks 0 ms, 10 ms, 100 ms, and 500 ms of processing time in handlers, with 5,000 concurrent clients.
Latency is the real processing time measured at the web servers; smaller is better.
Allocs is the heap allocated by the web servers while the test runs, in MB; smaller is better.
(Charts in the repository show the results with HTTP pipelining enabled, and, for 30 ms processing time, the results for 100, 1,000, and 5,000 clients with and without pipelining.)
You should install this package first if you want to run this test.
go get github.com/smallnest/go-web-framework-benchmark
It takes a while to download and install the large number of dependencies. Once that command completes, you can run:
cd $GOPATH/src/github.com/smallnest/go-web-framework-benchmark
go build -o gowebbenchmark *.go
./test.sh
It will generate test results in processtime.csv and concurrency.csv. You can modify test.sh to run your own test cases.
./test-latency.sh
./test-latency-nonkeepalive.sh
./test-pipelining.sh
……
web_frameworks=( "default" "ace" "beego" "bone" "denco" "echov1" "echov2standard" "echov2fasthttp" "fasthttp-raw" "fasthttprouter" "fasthttp-routing" "gin" "gocraftWeb" "goji" "gojiv2" "gojsonrest" "gorestful" "gorilla" "httprouter" "httptreemux" "lars" "lion" "macaron" "martini" "pat" "r2router" "tango" "tiger" "traffic" "violetear" "vulcan")
……
./test-all.sh
You can run the shell script plot.sh in the testresults directory; it generates all images in its parent directory.
New Go web frameworks are welcome: follow the steps below and send a pull request.
Please add your web framework alphabetically.
Only stable web frameworks are tested; some libraries are no longer maintained and have been removed from the test code.
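For a rough idea of what each framework's server in this suite boils down to, here is a hedged sketch using only net/http; the function name is illustrative, and the repository's actual registration mechanism may differ:

// Hypothetical shape of a benchmark server entry; needs "net/http" and "time".
func serveDefault(addr string, processTime time.Duration) error {
	mux := http.NewServeMux()
	mux.HandleFunc("/hello", func(w http.ResponseWriter, _ *http.Request) {
		if processTime > 0 {
			time.Sleep(processTime) // mock business logic
		}
		w.Write([]byte("Hello world"))
	})
	return http.ListenAndServe(addr, mux)
}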
Author: Smallnest
Source Code: https://github.com/smallnest/go-web-framework-benchmark
License: Apache-2.0 license
Golang is one of the most powerful and popular tools for writing APIs and web frameworks. Google's Go, also known as Golang, compiles to fast-running native code. It powers a range of programming projects and attracts engineers and specialists from many sectors, largely because developers have found Go easy to use. It is often the go-to choice for web and mobile app development, ranking among the most popular web programming languages.
Top 3 Golang web frameworks in 2021:
1. Martini: Martini is a low-profile framework with a small community, but it is known for unique capabilities such as injecting various data sets and working with handlers of different types. It is very active, with more than twenty plug-ins, which may also explain the need for add-ons. It covers the basic techniques of routing, request handling, and common middleware tricks.
2. Buffalo: Buffalo is known for fast application development. It covers the complete process of starting a project from scratch and provides end-to-end facilities for back-end web development. Buffalo ships with a dev command that lets you watch changes take effect in front of you and rebuilds your binary automatically. It is really an ecosystem for building web applications.
3. Gorilla: Gorilla is the largest and longest-running Go web framework. It scales from small to large use cases for any user. It also has the biggest English-speaking community and comes with robust WebSocket support, so you can attach REST endpoints in the way a third-party service like Pusher provides.
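As a quick illustration of Gorilla's style, a minimal sketch using the gorilla/mux router; the route and handler are invented for the example:

package main

import (
	"fmt"
	"net/http"

	"github.com/gorilla/mux"
)

func main() {
	r := mux.NewRouter()
	// A REST-style endpoint with a path variable.
	r.HandleFunc("/users/{id}", func(w http.ResponseWriter, req *http.Request) {
		fmt.Fprintf(w, "user %s", mux.Vars(req)["id"])
	}).Methods("GET")
	http.ListenAndServe(":8080", r)
}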
So, these are some web frameworks you can use with the Go language. Each framework has unique strengths found only in it, but all of them are excellent. If your developer is searching for one, this is where you can find the best.
#top 3 golang web frameworks in 2021 #golang #framework #web-service #web #web-development