ZincSearch is a search engine that does full text indexing. It is a lightweight alternative to Elasticsearch and runs using a fraction of the resources. It uses bluge as the underlying indexing library.
It is very simple and easy to operate, as opposed to Elasticsearch, which requires a couple dozen knobs to understand and tune; you can get ZincSearch up and running in two minutes.
It is a drop-in replacement for Elasticsearch if you are just ingesting data using APIs and searching using Kibana (Kibana is not supported with Zinc; Zinc provides its own UI).
Check the below video for a quick demo of Zinc.
Playground Server
You can try ZincSearch without installing it, using the details below:
Server | https://playground.dev.zincsearch.com |
User ID | admin |
Password | Complexpass#123 |
Note: Do not store sensitive data on this server, as it is available to everyone on the internet. Data on this server is also cleaned up regularly.
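For example, you can index a document and then search for it from the command line against the playground. The endpoint paths below follow the ZincSearch API as I remember it (its own ingestion API plus the Elasticsearch-compatible search prefix), so treat this as a sketch and check the Quickstart docs if a call fails:

```bash
# Index a single document into an index named "books" (the index is created on first write)
curl -u 'admin:Complexpass#123' \
  -X POST "https://playground.dev.zincsearch.com/api/books/_doc" \
  -H "Content-Type: application/json" \
  -d '{"title": "Pride and Prejudice", "author": "Jane Austen"}'

# Search the "books" index through the Elasticsearch-compatible endpoint
curl -u 'admin:Complexpass#123' \
  -X POST "https://playground.dev.zincsearch.com/es/books/_search" \
  -H "Content-Type: application/json" \
  -d '{"query": {"match": {"author": "austen"}}}'
```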
Why ZincSearch
While Elasticsearch is a very good product, it is complex, requires lots of resources, and is more than a decade old. I built Zinc to make it easier for folks to use full-text search indexing without doing a lot of work.
Features:
How to get support
Easiest way to get support is to join the Slack channel.
Roadmap items:
Public roadmap is available at https://github.com/orgs/zinclabs/projects/3/views/1
Please create an issue if you would like something to be added to the roadmap.
Screenshots
Getting started
Check Quickstart
Releases
ZincSearch currently has most of its API contracts frozen. Its data format may still change as we improve things. ZincSearch is currently in beta; the data format should become highly stable when we move to GA (version 1).
Editions
Feature | Zinc | Zinc Cloud |
---|---|---|
Ideal use case | App search | Logs and Events (Immutable Data) |
Storage | Disk | Object (S3), GCS, Azure blob coming soon |
Preferred Use case | App search | Log / event search |
Max data supported | 100s of GBs | Petabyte scale |
High availability | Will be available soon | Yes |
Open source | Yes | Yes, ZincObserve |
ES API compatibility | Search and Ingestion | Ingestion only |
GUI | Basic | Advanced for log search |
Cost | Free (self hosting may cost money based on size) | Generous free tier. 1 TB ingest / month free. |
Get started | Quick start | ![]() |
❗Note: If your use case is log search (app and security logs) rather than app search (implementing a search feature in your application or website), then you should check out the zinclabs/zincobserve project, which is built specifically for the observability use case.
Author: Zinclabs
Source Code: https://github.com/zinclabs/zincsearch
License: View license
Magda is a data catalog system that will provide a single place where all of an organization's data can be catalogued, enriched, searched, tracked and prioritized - whether big or small, internally or externally sourced, available as files, databases or APIs. Magda is designed specifically around the concept of federation - providing a single view across all data of interest to a user, regardless of where the data is stored or where it was sourced from. The system is able to quickly crawl external data sources, track changes, make automatic enhancements and make notifications when changes occur, giving data users a one-stop shop to discover all the data that's available to them.
Magda is under active development by a small team - we often have to prioritise between making the open-source side of the project more robust and adding features to our own deployments, which can mean newer features aren't documented well, or require specific configuration to work. If you run into problems using Magda, we're always happy to help on GitHub Discussions.
Magda has been used in production for over a year by data.gov.au, and is relatively mature for that use case.
Over the past 18 months, our focus has been to develop Magda into a more general-purpose data catalogue for use within organisations. If you want to use it as a data catalog, please do, but expect some rough edges! If you'd like to contribute to the project with issues or PRs, we'd love to receive them.
Our current roadmap is available at https://magda.io/docs/roadmap
Magda is built around a collection of microservices that are distributed as docker containers. This was done to provide easy extensibility - Magda can be customised by simply adding new services using any technology as docker images, and integrating them with the rest of the system via stable HTTP APIs. Using Helm and Kubernetes for orchestration means that configuration of a customised Magda instance can be stored and tracked as plain text, and instances with identical configuration can be quickly and easily reproduced.
If you are interested in the architecture details of Magda, you might want to have a look at this doc.
Magda revolves around the Registry - an unopinionated datastore built on top of Postgres. The Registry stores records as a set of JSON documents called aspects. For instance, a dataset is represented as a record with a number of aspects - a basic one that records the name, description and so on as well as more esoteric ones that might not be present for every dataset, like temporal coverage or determined data quality. Likewise, distributions (the actual data files, or URLs linking to them) are also modelled as records, with their own sets of aspects covering both basic metadata once again, as well as more specific aspects like whether the URL to the file worked when last tested.
Most importantly, aspects are able to be declared dynamically by other services by simply making a call with a name, description and JSON schema. This means that if you have a requirement to store extra information about a dataset or distribution you can easily do so by declaring your own aspect. Because the system isn't opinionated about what a record is beyond a set of aspects, you can also use this to add new entities to the system that link together - for instance, we've used this to store projects with a name and description that link to a number of datasets.
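As a rough illustration of what such a call could look like (the endpoint path, auth header, and payload fields below are assumptions about the registry's HTTP API rather than something copied from the Magda docs, so verify them against the API documentation for your version), declaring an aspect is a single POST of a name and JSON schema:

```bash
# Hypothetical example: declare a custom "reviewer-notes" aspect so records can carry it.
# The path, auth header and field names are assumptions -- check your Magda version's registry API docs.
curl -X POST "http://<your-magda-gateway>/api/v0/registry/aspects" \
  -H "Content-Type: application/json" \
  -H "X-Magda-Session: $MAGDA_ADMIN_JWT" \
  -d '{
        "id": "reviewer-notes",
        "name": "Reviewer notes",
        "jsonSchema": {
          "$schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "properties": { "notes": { "type": "string" } }
        }
      }'
```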
Connectors go out to external datasources and copy their metadata into the Registry, so that they can be searched and have other aspects attached to them. A connector is simply a docker-based microservice that is invoked as a job. It scans the target datasource (usually an open-data portal), then completes and shuts down. We have connectors for a number of existing open data formats, otherwise you can easily write and run your own.
A minion is a service that listens for new records or changes to existing records, performs some kind of operation and then writes the result back to the registry. For instance, we have a broken link minion that listens for changes to distributions, retrieves the URLs described, records whether they were able to be accessed successfully and then writes that back to the registry in its own aspect.
Other aspects exist that are written to by many minions - for instance, we have a "quality" aspect that contains a number of different quality ratings from different sources, which are averaged out and used by search.
Datasets and distributions in the registry are ingested into an ElasticSearch cluster, which indexes a few core aspects of each and exposes an API.
Magda provides a user interface, which is served from its own microservice and consumes the APIs. We're planning to make the UI itself extensible with plugins at some point in the future.
If you just want to install a local testing version, installing Magda using Helm is relatively easy (you can use minikube to create a local k8s test cluster):
# Add Magda Helm Chart Repo:
helm repo add magda-io https://charts.magda.io
# create a namespace "magda" in your cluster
kubectl create namespace magda
# install Magda version v2.2.0 to namespace "magda", turn off the openfaas function and expose the service via LoadBalancer
helm upgrade --namespace magda --install --version 2.2.0 --timeout 9999s --set magda-core.gateway.service.type=LoadBalancer magda magda-io/magda
You can find out the load balancer IP and access it:
echo $(kubectl get svc --namespace magda gateway --template "{{ range (index .status.loadBalancer.ingress 0) }}{{ . }}{{ end }}")
If you are interested in playing more, you might find useful docs from here. Particularly:
You might also want to have a look at this tutorial repo:
https://github.com/magda-io/magda-brown-bag
Or find out more at https://magda.io/docs/building-and-running if you are interested in development or want to play with the code.
Start a discussion at https://github.com/magda-io/magda/discussions. There's not a lot on there yet, but we monitor it closely :).
Email us at contact@magda.io.
Great! Take a look at https://github.com/magda-io/magda/blob/master/.github/CONTRIBUTING.md :).
More documentation can be found in the docs/docs/ folder.
Author: Magda-io
Source Code: https://github.com/magda-io/magda
License: Apache-2.0 license
Welcome to our Elasticsearch Tutorial! This video series covers the Elastic Stack (ELK Stack): an introduction to Elasticsearch, installing Elasticsearch and Kibana on macOS, the Elasticsearch REST API, and searching in Elasticsearch. We also cover Logstash and stashing your first event, then dive into the basics of search in Elasticsearch, query and filter context, and writing your own queries, including compound queries and full-text queries. Finally, we show how to use Elasticsearch and Elasticsearch DSL with the Python programming language. This tutorial is perfect for anyone who wants to learn about Elasticsearch and the Elastic Stack.
Subscribe: https://www.youtube.com/@ProgrammingKnowledge/featured
Rails 7 App with Preinstalled Tools is Ready in Minutes!
Setting up a typical Rails environment from scratch is usually difficult and time-consuming.
Now, if you have Ruby and Docker, you can have a working Rails environment in about 5 minutes without any manual effort.
Logotype | Description | Why it was added |
---|---|---|
![]() | Docker | Helps to keep all required services in containers. To have fast and predictable installation process in minutes |
![]() | PostgreSQL | Most popular relational database |
![]() | Ruby 3.2 | Most recent version of Ruby |
![]() | Rails 7 | Most recent version of Rails |
![]() | gem "config" | Configuration management tool |
![]() | Elasticsearch | The world’s leading Search engine |
![]() | Chewy | Ruby Connector to Elasticsearch |
![]() | Redis | In-memory data store. For caching and as a dependency of Sidekiq |
![]() | Sidekiq | Job scheduler and async task executor. Can be used as a standalone tool or as an ActiveJob backend |
![]() | Import Maps | Rails' recommended way to process JavaScript |
![]() | Puma | Application web server used to launch the Rails app |
What I'm going to add...
Logotype | Description | Why it was added |
---|---|---|
![]() | Kaminari | Pagination solution |
![]() | Devise | Authentication solution for Rails |
![]() | Devise | Login with Facebook and Google |
![]() | Devise and Action Mailer | Sending emails for account confirmations |
![]() | Letter Opener | Email previewer for development |
![]() | whenever | Linux Cron based periodical tasks |
![]() | RSpec | Testing Framework for Rails |
![]() | Rubocop | Ruby static code analyzer (a.k.a. linter) and formatter. |
All trademarks, logos and brand names are the property of their respective owners.
On your host you have:
ONE!
git clone https://github.com/the-teacher/rails7-startkit.git
TWO!
cd rails7-startkit
THREE!
bin/setup
You will see something like this:
1. Launching PgSQL container
2. Launching ElasticSearch Container
3. Launching Rails container
4. Installing Gems. Please Wait
5. Create DB. Migrate DB. Create Seeds
6. Launching Redis Container
7. Indexing Article Model
8. Launching Rails App with Puma
9. Launching Sidekiq
10. Visit: http://localhost:3000
Index Page of the Project
bin/ commands
From the root of the project:
Command | Description |
---|---|
bin/setup | Download images, run containers, initialize data, launch all processes. |
bin/open | Get in Rails Container (`rails` by default) |
bin/open rails | Get in Rails Container |
bin/open psql | Get in PgSQL Container |
bin/open redis | Get in Redis Container |
bin/open elastic | Get in ElasticSearch Container |
bin/status | To see running containers and launched services |
bin/start | Start everything if it is stopped |
bin/stop | Stop processes in Rails container |
bin/stop_all | Stop everything if it is running |
bin/index | Run search engine indexing
bin/reset | Reset data of services in the ./db folder
For demonstration, education and maintenance purposes I use the following approach:
Data is kept in the ./db folder, in UPPERCASED subfolders:
./db
├── ELASTIC
├── PGSQL
└── REDIS
Configuration files are kept in ./config and are _UNDERSCORED and UPPERCASED:
./config
├── _CONFIG.yml
├── _PUMA.rb
└── _SIDEKIQ.yml
Initializers are kept in ./config/initializers and are _UNDERSCORED and UPPERCASED:
./config/initializers/
├── _CHEWY.rb
├── _CONFIG.rb
├── _REDIS.rb
└── _SIDEKIQ.rb
To own files and run Rails inside the container I use the user:group lucky:lucky (7777:7777).
If you would like to run the project in a Linux environment, create the group lucky (7777) and the user lucky (7777), or run the project with the RUN_AS=7777:7777 option.
From the root of the project:
bin/open rails
Now you are in the Rails container and you can do everything as usual
RAILS_ENV=test rake db:create
rake test
What is an idea of this project?
For many years Rails has given you the freedom to choose your development tools: different databases, different paginators, different search engines, different delayed-job solutions.
That is great, but every time you have to choose something and install it from scratch.
I have made my choices about many solutions and tools.
I want to install my minimal pack of tools now and reuse my StartKit every time I start a new project.
With Docker I can roll out my minimal application with all required preinstalled tools in minutes, not in hours or in days.
Why did you create this project?
I hadn't worked with Rails for the last 4 or 5 years. I wanted to learn new approaches and techniques. I found that there is still no simple way to set up a blank app with the most popular tools.
So why not make my own playground?
How do you choose technologies for the StartKit?
I use tools that I like or want to learn.
I use tools that I think are the most popular ones.
It looks good for development. What about production?
I'm not a DevOps engineer, but I have a vision of how to deploy this code to production.
Right now it is not documented; that is in my plans.
Author: the-teacher
Source Code: https://github.com/the-teacher/rails7-startkit
License: MIT
Hello Readers!! In this blog, we will see how we can send GitHub commit and PR logs to Elasticsearch using a custom script. We will use a bash script that sends GitHub logs to Elasticsearch: it creates an index in Elasticsearch and pushes the logs there.
After sending the logs to Elasticsearch, we can visualize the following GitHub events in Kibana:
1. GitHub User: Users will be responsible for performing actions in a GitHub repository like commits and pull requests.
2. GitHub Repository: Source Code Management system on which users will perform actions.
3. GitHub Action: Continuous integration and continuous delivery (CI/CD) platform which will run each time a GitHub user commits a change or opens a pull request.
4. Bash Script: The custom script is written in bash for shipping GitHub logs to Elasticsearch.
5. ElasticSearch: Stores all of the logs in the index created.
6. Kibana: Web interface for searching and visualizing logs.
1. GitHub users will make commits and raise pull requests to the GitHub repository. Here is my GitHub repository which I have created for this blog.
https://github.com/NaincyKumariKnoldus/Github_logs
2. Create two GitHub Actions workflows in this repository. These workflows will be triggered by the events performed by the GitHub user.
GitHub Actions workflow file triggered on commit (push) events:
commit_workflow.yml:
# The name of the workflow
name: CI
#environment variables
env:
GITHUB_REF_NAME: $GITHUB_REF_NAME
ES_URL: ${{ secrets.ES_URL }}
# Controls when the workflow will run
on: [push]
#A job is a set of steps in a workflow
jobs:
send-push-events:
name: Push Logs to ES
#The job will run on the latest version of an Ubuntu Linux runner.
runs-on: ubuntu-latest
steps:
#This is an action that checks out your repository onto the runner, allowing you to run scripts
- uses: actions/checkout@v2
#The run keyword tells the job to execute a command on the runner
- run: ./git_commit.sh
GitHub Actions workflow file triggered on pull request events:
pr_workflow.yml:
name: CI
env:
GITHUB_REF_NAME: $GITHUB_REF_NAME
ES_URL: ${{ secrets.ES_URL }}
on: [pull_request]
jobs:
send-pull-events:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- run: ./git_pr.sh
3. Create two files inside your GitHub repository to hold the bash scripts. Below are the bash scripts that ship GitHub logs to Elasticsearch; they are executed by the GitHub Actions workflows mentioned above.
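Note that the workflows invoke the scripts directly (./git_commit.sh and ./git_pr.sh), so the files have to be committed with the executable bit set. Assuming the scripts sit at the repository root, something like this works:

```bash
chmod +x git_commit.sh git_pr.sh
# If your filesystem doesn't preserve the mode (e.g. on Windows), tell git explicitly:
git update-index --chmod=+x git_commit.sh git_pr.sh
git commit -m "Mark log-shipping scripts as executable"
```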
git_commit.sh will get triggered by GitHub action workflow file commit_workflow.yml:
#!/bin/bash
# get github commits
getCommitResponse=$(
curl -s \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"https://api.github.com/repos/NaincyKumariKnoldus/Github_logs/commits?sha=$GITHUB_REF_NAME&per_page=100&page=1"
)
# get commit SHA
commitSHA=$(echo "$getCommitResponse" |
jq '.[].sha' |
tr -d '"')
# get the loop count based on number of commits
loopCount=$(echo "$commitSHA" |
wc -w)
echo "loopcount= $loopCount"
# get data from ES
getEsCommitSHA=$(curl -H "Content-Type: application/json" -X GET "$ES_URL/github_commit/_search?pretty" -d '{
"size": 10000,
"query": {
"wildcard": {
"commit_sha": {
"value": "*"
}}}}' |
jq '.hits.hits[]._source.commit_sha' |
tr -d '"')
# store ES commit sha in a temp file
echo $getEsCommitSHA | tr " " "\n" > sha_es.txt
# looping through each commit detail
for ((count = 0; count < $loopCount; count++)); do
# get commitSHA
commitSHA=$(echo "$getCommitResponse" |
jq --argjson count "$count" '.[$count].sha' |
tr -d '"')
# match result for previous existing commit on ES
matchRes=$(grep -o $commitSHA sha_es.txt)
echo $matchRes | tr " " "\n" >> match.txt
# filtering and pushing unmatched commit sha details to ES
if [ -z $matchRes ]; then
echo "Unmatched SHA: $commitSHA"
echo $commitSHA | tr " " "\n" >> unmatch.txt
# get author name
authorName=$(echo "$getCommitResponse" |
jq --argjson count "$count" '.[$count].commit.author.name' |
tr -d '"')
# get commit message
commitMessage=$(echo "$getCommitResponse" |
jq --argjson count "$count" '.[$count].commit.message' |
tr -d '"')
# get commit html url
commitHtmlUrl=$(echo "$getCommitResponse" |
jq --argjson count "$count" '.[$count].html_url' |
tr -d '"')
# get commit time
commitTime=$(echo "$getCommitResponse" |
jq --argjson count "$count" '.[$count].commit.author.date' |
tr -d '"')
# send data to es
curl -X POST "$ES_URL/github_commit/commit" \
-H "Content-Type: application/json" \
-d "{ \"commit_sha\" : \"$commitSHA\",
\"branch_name\" : \"$GITHUB_REF_NAME\",
\"author_name\" : \"$authorName\",
\"commit_message\" : \"$commitMessage\",
\"commit_html_url\" : \"$commitHtmlUrl\",
\"commit_time\" : \"$commitTime\" }"
fi
done
# removing temporary file
rm -rf sha_es.txt
rm -rf match.txt
rm -rf unmatch.txt
git_pr.sh will get triggered by GitHub action workflow file pr_workflow.yml:
#!/bin/bash
# get github PR details
getPrResponse=$(curl -s \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"https://api.github.com/repos/NaincyKumariKnoldus/Github_logs/pulls?state=all&per_page=100&page=1")
# get number of PR
totalPR=$(echo "$getPrResponse" |
jq '.[].number' |
tr -d '"')
# get the loop count based on number of PRs
loopCount=$(echo "$totalPR" |
wc -w)
echo "loopcount= $loopCount"
# get data from ES
getEsPR=$(curl -H "Content-Type: application/json" -X GET "$ES_URL/github_pr/_search?pretty" -d '{
"size": 10000,
"query": {
"wildcard": {
"pr_number": {
"value": "*"
}}}}' |
jq '.hits.hits[]._source.pr_number' |
tr -d '"')
# store ES PR number in a temp file
echo $getEsPR | tr " " "\n" > sha_es.txt
# looping through each PR detail
for ((count = 0; count < $loopCount; count++)); do
# get PR_number
totalPR=$(echo "$getPrResponse" |
jq --argjson count "$count" '.[$count].number' |
tr -d '"')
# looping through each PR detail
matchRes=$(grep -o $totalPR sha_es.txt)
echo $matchRes | tr " " "\n" >>match.txt
# filtering and pushing unmatched PR number details to ES
if [ -z $matchRes ]; then
# get PR html url
PrHtmlUrl=$(echo "$getPrResponse" |
jq --argjson count "$count" '.[$count].html_url' |
tr -d '"')
# get PR Body
PrBody=$(echo "$getPrResponse" |
jq --argjson count "$count" '.[$count].body' |
tr -d '"')
# get PR Number
PrNumber=$(echo "$getPrResponse" |
jq --argjson count "$count" '.[$count].number' |
tr -d '"')
# get PR Title
PrTitle=$(echo "$getPrResponse" |
jq --argjson count "$count" '.[$count].title' |
tr -d '"')
# get PR state
PrState=$(echo "$getPrResponse" |
jq --argjson count "$count" '.[$count].state' |
tr -d '"')
# get PR created at
PrCreatedAt=$(echo "$getPrResponse" |
jq --argjson count "$count" '.[$count].created_at' |
tr -d '"')
# get PR closed at
PrCloseAt=$(echo "$getPrResponse" |
jq --argjson count "$count" '.[$count].closed_at' |
tr -d '"')
# get PR merged at
PrMergedAt=$(echo "$getPrResponse" |
jq --argjson count "$count" '.[$count].merged_at' |
tr -d '"')
# get base branch name
PrBaseBranch=$(echo "$getPrResponse" |
jq --argjson count "$count" '.[$count].base.ref' |
tr -d '"')
# get source branch name
PrSourceBranch=$(echo "$getPrResponse" |
jq --argjson count "$count" '.[$count].head.ref' |
tr -d '"')
# send data to es
curl -X POST "$ES_URL/github_pr/pull_request" \
-H "Content-Type: application/json" \
-d "{ \"pr_number\" : \"$PrNumber\",
\"pr_url\" : \"$PrHtmlUrl\",
\"pr_title\" : \"$PrTitle\",
\"pr_body\" : \"$PrBody\",
\"pr_base_branch\" : \"$PrBaseBranch\",
\"pr_source_branch\" : \"$PrSourceBranch\",
\"pr_state\" : \"$PrState\",
\"pr_creation_time\" : \"$PrCreatedAt\",
\"pr_closed_time\" : \"$PrCloseAt\",
\"pr_merge_at\" : \"$PrMergedAt\"}"
fi
done
# removing temporary file
rm -rf sha_es.txt
rm -rf match.txt
rm -rf unmatch.txt
4. Now push a commit to the GitHub repository. After the commit, the push-triggered GitHub Action will run and send the commit logs to Elasticsearch.
Go to your Elasticsearch instance to check the GitHub commit logs there.
We are now getting GitHub commits in Elasticsearch.
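If you prefer the terminal to the Elasticsearch UI, you can also spot-check the github_commit index the script writes to (with ES_URL pointing at your cluster, as in the scripts):

```bash
# List a few of the documents pushed by git_commit.sh
curl -s -H "Content-Type: application/json" \
  -X GET "$ES_URL/github_commit/_search?pretty" \
  -d '{"size": 5, "query": {"match_all": {}}}'
```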
5. Now raise a pull request in your GitHub repository. This runs the pull-request-triggered GitHub Action, which executes the bash script that pushes the pull request logs to Elasticsearch.
The GitHub Action was executed on the pull request:
Now go to Elasticsearch and you will find the pull request logs there.
6. We can also visualize these logs in Kibana.
GitHub commit logs in Kibana:
GitHub pull request logs in Kibana:
This is how we can analyze our GitHub logs in Elasticsearch and Kibana using a custom script.
We are all done now!!
Thank you for sticking with me to the end. In this blog, we have learned how to send GitHub commit and PR logs to Elasticsearch using a custom script. It is really quick and simple. If you like this blog, please share it and show your appreciation with a thumbs-up, and don't forget to leave suggestions on how I can improve future blogs to suit your needs.
Original article source at: https://blog.knoldus.com/
pfSense/OPNsense + Elastic Stack
pfelk is a highly customizable open-source tool for ingesting and visualizing your firewall traffic with the full power of Elasticsearch, Logstash and Kibana.
ingest and enrich your pfSense/OPNsense firewall traffic logs by leveraging Logstash
search your indexed data in near real time with the full power of Elasticsearch
visualize your network traffic with interactive dashboards, maps, and graphs in Kibana
Supported entries include:
pfelk aims to replace the vanilla pfSense/OPNsense web UI with extended search and visualization features. You can deploy this solution via ansible-playbook, docker-compose, bash script, or manually.
$ ansible-playbook -i hosts --ask-become deploy-stack.yml
$ docker-compose up
$ wget https://raw.githubusercontent.com/pfelk/pfelk/main/etc/pfelk/scripts/pfelk-installer.sh
$ chmod +x pfelk-installer.sh
$ sudo ./pfelk-installer.sh
This is the experimental public roadmap for the pfelk project.
Please refer to the CONTRIBUTING file. Collectively we can enhance and improve this product. Issues, feature requests, PRs, and documentation contributions are encouraged and welcomed!
https://docs.elastic.co/en/integrations/pfsense
Author: pfelk
Source Code: https://github.com/pfelk/pfelk
License: View license
Hello everyone! Today in this blog, we will learn how to back up and restore Elasticsearch using snapshots. Before diving in, let's first brush up on the basics of the topic.
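The commands in the original post are embedded as images and aren't reproduced here, so the snippet below is only a minimal sketch of the backup step, assuming a filesystem snapshot repository named my_backup (its location must be listed under path.repo in elasticsearch.yml) and an index named my_index:

```bash
# Register a filesystem snapshot repository named "my_backup"
curl -X PUT "localhost:9200/_snapshot/my_backup" \
  -H "Content-Type: application/json" \
  -d '{"type": "fs", "settings": {"location": "/mnt/es_backups"}}'

# Take a snapshot of the index and wait for it to complete
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true" \
  -H "Content-Type: application/json" \
  -d '{"indices": "my_index"}'
```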
You should see output confirming that the snapshot was created successfully.
Now that we have successfully backed up our indices, let's make sure we can retrieve the data if it gets lost. First, let's delete our data using the following command:
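Sticking with the index name assumed in the sketch above:

```bash
# Delete the index to simulate data loss
curl -X DELETE "localhost:9200/my_index"
```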
Now, if you’ll check, all the data must have been gone. So, let us try to restore our data using the snapshots we created.
The above command will successfully restore all the lost or deleted data.
That’s it for now. I hope this article was useful to you. Please feel free to drop any comments, questions, or suggestions.
Original article source at: https://blog.knoldus.com/
How to Configure Elasticsearch and Speed Up WordPress. In this guide you are going to learn how to install Elasticsearch, configure it for your WordPress site, and optimize search queries using the ElasticPress WordPress plugin.
If your site handles a lot of search queries, you should consider using a dedicated search engine. Elasticsearch is a full-text search engine that indexes your data and searches it very quickly.
This setup was tested on Google Cloud and AWS, so you can use this guide to set up Elasticsearch on any VPS, any other cloud, or a dedicated server.
SSH to your EC2 Instance and perform the steps listed below.
Elasticsearch runs on port 9200, so it is necessary to open this port for the setup to work.
Go to your Security group and create a rule to allow connections from your IP address on this port.
If you have configured UFW on your server, you need to add a rule for this too.
sudo ufw allow from IP_ADDRESS to any port 9200
Make sure to update IP_ADDRESS with your server's public IP.
Java is necessary to install ElasticSearch. Install Java JDK using the following command.
sudo apt install openjdk-8-jdk
Use the update-alternatives command to get the installation path of your Java version.
sudo update-alternatives --config java
OpenJDK 8 is located at /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
Copy the installation path of your default version and add it to the JAVA_HOME environment variable.
sudo nano /etc/environment
At the end of this file, add the following line with your installation path. For the OpenJDK 8 path shown above, the variable will be as follows.
JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java"
Hit Ctrl+X, followed by Y and Enter, to save and exit the nano editor.
Now the JAVA_HOME environment variable is set and available for all users.
Reload to apply changes.
source /etc/environment
To verify the environment variable of Java
echo $JAVA_HOME
You will get the installation path you just set.
Now Java is successfully installed and you can install Elasticsearch.
Import ElasticSearch repository’s GPG key.
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
Add the repository to the sources list of your Ubuntu server or system.
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
Update the package list and install ElasticSearch.
sudo apt update
sudo apt install elasticsearch
Once Elasticsearch is installed, you can restrict port 9200 from outside access by editing the elasticsearch.yml file: uncomment network.host and replace its value with localhost.
sudo nano /etc/elasticsearch/elasticsearch.yml
So it looks like this:
network.host: localhost
Hit Ctrl+X, followed by Y and Enter, to save the file and exit.
Now start and enable Elasticsearch on server boot.
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch
Now make sure your Elasticsearch service is running.
sudo systemctl status elasticsearch
Test your installation by sending an HTTP request.
curl -X GET "localhost:9200"
You will get a response with name, cluster_name, cluster_uuid, version.
Log in to your WordPress admin, go to Plugins >> Add New, search for ElasticPress, then install and activate it.
Once activated, go to ElasticPress >> Settings, add the Elasticsearch URL (http://localhost:9200), and save the settings.
Once the settings are saved, click the sync icon at the top right, near the gear icon, to sync the content. Next, you can view the index health of your setup.
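If you want to double-check from the server that ElasticPress actually created its indices after the sync, you can list them with Elasticsearch's cat API (a quick sanity check that isn't part of the original guide):

```bash
curl -X GET "localhost:9200/_cat/indices?v"
```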
Now you have configured Elasticsearch on your WordPress website to speed up search queries.
Now you have learned how to install and configure Elasticsearch for your WordPress website.
Thanks for your time. If you face any problem or any feedback, please leave a comment below.
Original article source at: https://www.cloudbooklet.com/
How to Install Elasticsearch on Ubuntu 22.04 with SSL. Elasticsearch 8 is a powerful, scalable, real-time distributed search and analytics engine. Here you will learn how to configure SSL for your Elasticsearch installation with an Nginx reverse proxy on Ubuntu 22.04.
You will create a subdomain for your Elasticsearch service and install a free Let's Encrypt SSL certificate using Certbot.
This setup was tested on Google Cloud Platform running Ubuntu 22.04 LTS, so the guide will work just as well on other cloud providers like AWS and Azure, or on any VPS or dedicated server.
Start by updating the server software packages to the latest version available.
sudo apt update
sudo apt upgrade
Make sure you use a sub-domain to access your Elasticsearch installation.
Go to your DNS management section and create a new A record with the name you wish for your subdomain (for example, search) and the value of your server IP address.
So your subdomain will look similar to the one below. If you wish to configure your main domain instead, you can do that too.
search.yourdomain.com
Java is already included with the Elasticsearch package, so you don't need to install Java manually. Learn more about installing Java on Ubuntu 22.04.
Here we will install Elasticsearch 8.
Start by importing Elasticsearch repository’s GPG key.
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
Add the repository to the sources list of your Ubuntu server or system.
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
Update the package list and install ElasticSearch.
sudo apt update
sudo apt install elasticsearch
Once the installation is complete, you will see the generated superuser password; note it down and keep it secure.
------------------- Security autoconfiguration information ----------------------
Authentication and authorization are enabled.
TLS for the transport and HTTP layers is enabled and configured.
The generated password for the elastic built-in superuser is : houbJ1uivo5b=aVYYPa5
If this node should join an existing cluster, you can reconfigure this with
'/usr/share/elasticsearch/bin/elasticsearch-reconfigure-node --enrollment-token <token-here>'
after creating an enrollment token on your existing cluster.
You can complete the following actions at any time:
Reset the password of the elastic built-in superuser with
'/usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic'.
Generate an enrollment token for Kibana instances with
'/usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana'.
Generate an enrollment token for Elasticsearch nodes with
'/usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s node'.
---------------------------------------------------------------------------------
The Elasticsearch service is not started automatically after installation. Execute the commands below to configure it to start automatically using systemd.
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service
Once Elasticsearch is installed, you can restrict port 9200 from outside access by editing the elasticsearch.yml file: uncomment network.host and replace its value with your internal IP, any other IP, or localhost.
sudo nano /etc/elasticsearch/elasticsearch.yml
So it looks like this:
network.host: INTERNAL_IP
You can also use localhost as the host, or any other IP address you wish.
Hit Ctrl+X, followed by Y and Enter, to save the file and exit.
Now start and enable Elasticsearch on server boot.
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch
Now make sure your Elasticsearch service is running.
sudo systemctl status elasticsearch
Test your installation by sending an HTTPS request with the CA certificate attached, using the command below.
Take note of the password you received earlier; you will need it when prompted.
sudo su
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic https://INTERNAL_IP:9200
Enter the password when prompted.
You will receive a response as shown below.
{
"name" : "elasticsearch-vm",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "vGrj3z4rQEWRBUdd9IhZWA",
"version" : {
"number" : "8.2.2",
"build_flavor" : "default",
"build_type" : "deb",
"build_hash" : "9876968ef3c745186b94fdabd4483e01499224ef",
"build_date" : "2022-05-25T15:47:06.259735307Z",
"build_snapshot" : false,
"lucene_version" : "9.1.0",
"minimum_wire_compatibility_version" : "7.17.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "You Know, for Search"
}
Now it’s time to install and configure Nginx. Execute the below command to install Nginx.
sudo apt install nginx
Now you can configure an Nginx reverse proxy for your Elasticsearch.
Remove default configurations
sudo rm /etc/nginx/sites-available/default
sudo rm /etc/nginx/sites-enabled/default
Create a new Nginx configuration file.
sudo nano /etc/nginx/sites-available/search.conf
Paste the following.
Note: You need to use the exact same IP or localhost that you used as the host in the Elasticsearch configuration.
server {
listen [::]:80;
listen 80;
server_name search.yourdomain.com;
location / {
proxy_pass http://INTERNAL_IP:9200;
proxy_redirect off;
proxy_read_timeout 90;
proxy_connect_timeout 90;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
    }
}
Save and exit the file.
Enable your configuration by creating a symbolic link.
sudo ln -s /etc/nginx/sites-available/search.conf /etc/nginx/sites-enabled/search.conf
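Before requesting the certificate, it's worth validating the new configuration and reloading Nginx (a small check the guide doesn't mention explicitly; Certbot will also reload Nginx for you later):

```bash
sudo nginx -t
sudo systemctl reload nginx
```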
HTTPS is a protocol for secure communication between a server (instance) and a client (web browser). Thanks to Let's Encrypt, which provides free SSL certificates, HTTPS has been widely adopted and also builds trust with your audience.
sudo apt install python3-certbot-nginx
Now that Certbot by Let's Encrypt is installed on Ubuntu 22.04, run this command to obtain your certificate.
sudo certbot --nginx --agree-tos --no-eff-email --redirect -m youremail@email.com -d search.domainname.com
This command installs the free SSL certificate, configures redirection to HTTPS, and restarts the Nginx server.
Certificates provided by Let's Encrypt are valid for 90 days only, so they need to be renewed regularly. Let's test the renewal feature using the following command.
sudo certbot renew --dry-run
This command performs a dry run of the certificate renewal to verify that automatic renewal will work.
Now you have learned how to install Elasticsearch 8 and secure it with a free Let's Encrypt SSL certificate on Ubuntu 22.04.
Thanks for your time. If you face any problem or any feedback, please leave a comment below.
Original article source at: https://www.cloudbooklet.com/
In this tutorial, we'll look at how to integrate Django REST Framework (DRF) with Elasticsearch. We'll use Django to model our data and DRF to serialize and serve it. Finally, we'll index the data with Elasticsearch and make it searchable.
Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. It's known for its simple RESTful APIs, distributed nature, speed, and scalability. Elasticsearch is the central component of the Elastic Stack (also known as the ELK Stack), a set of free and open tools for data ingestion, enrichment, storage, analysis, and visualization.
Its use cases include:
To learn more about Elasticsearch check out What is Elasticsearch? from the official documentation.
Before working with Elasticsearch, we should get familiar with the basic Elasticsearch concepts. These are listed from biggest to smallest:
The Elasticsearch cluster has the following structure:
Curious how relational database concepts relate to Elasticsearch concepts?
Relational Database | Elasticsearch |
---|---|
Cluster | Cluster |
RDBMS Instance | Node |
Table | Index |
Row | Document |
Column | Field |
Review Mapping concepts across SQL and Elasticsearch for more on how concepts in SQL and Elasticsearch relate to one another.
With regards to full-text search, Elasticsearch and PostgreSQL both have their advantages and disadvantages. When choosing between them you should consider speed, query complexity, and budget.
PostgreSQL advantages:
Elasticsearch advantages:
If you're working on a simple project where speed isn't important you should opt for PostgreSQL. If performance is important and you want to write complex lookups opt for Elasticsearch.
For more on full-text search with Django and Postgres, check out the Basic and Full-text Search with Django and Postgres article.
We'll be building a simple blog application. Our project will consist of multiple models, which will be serialized and served via Django REST Framework. After integrating Elasticsearch, we'll create an endpoint that will allow us to look up different authors, categories, and articles.
To keep our code clean and modular, we'll split our project into the following two apps:
- blog - for our Django models, serializers, and ViewSets
- search - for Elasticsearch documents, indexes, and queries
Start by creating a new directory and setting up a new Django project:
$ mkdir django-drf-elasticsearch && cd django-drf-elasticsearch
$ python3.9 -m venv env
$ source env/bin/activate
(env)$ pip install django==3.2.6
(env)$ django-admin.py startproject core .
After that, create a new app called blog:
(env)$ python manage.py startapp blog
Register the app in core/settings.py under INSTALLED_APPS:
# core/settings.py
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'blog.apps.BlogConfig', # new
]
Next, create Category and Article models in blog/models.py:
# blog/models.py
from django.contrib.auth.models import User
from django.db import models
class Category(models.Model):
name = models.CharField(max_length=32)
description = models.TextField(null=True, blank=True)
class Meta:
verbose_name_plural = 'categories'
def __str__(self):
return f'{self.name}'
ARTICLE_TYPES = [
('UN', 'Unspecified'),
('TU', 'Tutorial'),
('RS', 'Research'),
('RW', 'Review'),
]
class Article(models.Model):
title = models.CharField(max_length=256)
author = models.ForeignKey(to=User, on_delete=models.CASCADE)
type = models.CharField(max_length=2, choices=ARTICLE_TYPES, default='UN')
categories = models.ManyToManyField(to=Category, blank=True, related_name='categories')
content = models.TextField()
created_datetime = models.DateTimeField(auto_now_add=True)
updated_datetime = models.DateTimeField(auto_now=True)
def __str__(self):
return f'{self.author}: {self.title} ({self.created_datetime.date()})'
Notes:
- Category represents an article category -- i.e., programming, Linux, testing.
- Article represents an individual article. Each article can have multiple categories. Articles have a specific type -- Tutorial, Research, Review, or Unspecified.
Make migrations and then apply them:
(env)$ python manage.py makemigrations
(env)$ python manage.py migrate
Register the models in blog/admin.py:
# blog/admin.py
from django.contrib import admin
from blog.models import Category, Article
admin.site.register(Category)
admin.site.register(Article)
Before moving to the next step, we need some data to work with. I've created a simple command we can use to populate the database.
Create a new folder in "blog" called "management", and then inside that folder create another folder called "commands". Inside of the "commands" folder, create a new file called populate_db.py.
management
└── commands
└── populate_db.py
Copy the file contents from populate_db.py and paste it inside your populate_db.py.
Run the following command to populate the DB:
(env)$ python manage.py populate_db
If everything went well, you should see a "Successfully populated the database." message in the console, and there should be a few articles in your database.
Now let's install djangorestframework using pip:
(env)$ pip install djangorestframework==3.12.4
Register it in our settings.py like so:
# core/settings.py
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'blog.apps.BlogConfig',
'rest_framework', # new
]
Add the following settings:
# core/settings.py
REST_FRAMEWORK = {
'DEFAULT_PAGINATION_CLASS': 'rest_framework.pagination.LimitOffsetPagination',
'PAGE_SIZE': 25
}
We'll need these settings to implement pagination.
To serialize our Django models, we need to create a serializer for each of them. The easiest way to create serializers that depend on Django models is by using the ModelSerializer class.
blog/serializers.py:
# blog/serializers.py
from django.contrib.auth.models import User
from rest_framework import serializers
from blog.models import Article, Category
class UserSerializer(serializers.ModelSerializer):
class Meta:
model = User
fields = ('id', 'username', 'first_name', 'last_name')
class CategorySerializer(serializers.ModelSerializer):
class Meta:
model = Category
fields = '__all__'
class ArticleSerializer(serializers.ModelSerializer):
author = UserSerializer()
categories = CategorySerializer(many=True)
class Meta:
model = Article
fields = '__all__'
Notes:
- UserSerializer and CategorySerializer are fairly simple: we just provided the fields we want serialized.
- For ArticleSerializer, we needed to take care of the relationships to make sure they also get serialized. This is why we provided UserSerializer and CategorySerializer.
Let's create a ViewSet for each of our models in blog/views.py:
# blog/views.py
from django.contrib.auth.models import User
from rest_framework import viewsets
from blog.models import Category, Article
from blog.serializers import CategorySerializer, ArticleSerializer, UserSerializer
class UserViewSet(viewsets.ModelViewSet):
serializer_class = UserSerializer
queryset = User.objects.all()
class CategoryViewSet(viewsets.ModelViewSet):
serializer_class = CategorySerializer
queryset = Category.objects.all()
class ArticleViewSet(viewsets.ModelViewSet):
serializer_class = ArticleSerializer
queryset = Article.objects.all()
In this block of code, we created the ViewSets by providing the serializer_class and queryset for each ViewSet.
Create the app-level URLs for the ViewSets:
# blog/urls.py
from django.urls import path, include
from rest_framework import routers
from blog.views import UserViewSet, CategoryViewSet, ArticleViewSet
router = routers.DefaultRouter()
router.register(r'user', UserViewSet)
router.register(r'category', CategoryViewSet)
router.register(r'article', ArticleViewSet)
urlpatterns = [
path('', include(router.urls)),
]
Then, wire up the app URLs to the project URLs:
# core/urls.py
from django.contrib import admin
from django.urls import path, include
urlpatterns = [
path('blog/', include('blog.urls')),
path('admin/', admin.site.urls),
]
Our app now has the following URLs:
- /blog/user/ lists all users
- /blog/user/<USER_ID>/ fetches a specific user
- /blog/category/ lists all categories
- /blog/category/<CATEGORY_ID>/ fetches a specific category
- /blog/article/ lists all articles
- /blog/article/<ARTICLE_ID>/ fetches a specific article
Now that we've registered the URLs, we can test the endpoints to see if everything works correctly.
Run the development server:
(env)$ python manage.py runserver
Then, in your browser of choice, navigate to http://127.0.0.1:8000/blog/article/. The response should look something like this:
{
"count": 4,
"next": null,
"previous": null,
"results": [
{
"id": 1,
"author": {
"id": 3,
"username": "jess_",
"first_name": "Jess",
"last_name": "Brown"
},
"categories": [
{
"id": 2,
"name": "SEO optimization",
"description": null
}
],
"title": "How to improve your Google rating?",
"type": "TU",
"content": "Firstly, add the correct SEO tags...",
"created_datetime": "2021-08-12T17:34:31.271610Z",
"updated_datetime": "2021-08-12T17:34:31.322165Z"
},
{
"id": 2,
"author": {
"id": 4,
"username": "johnny",
"first_name": "Johnny",
"last_name": "Davis"
},
"categories": [
{
"id": 4,
"name": "Programming",
"description": null
}
],
"title": "Installing latest version of Ubuntu",
"type": "TU",
"content": "In this tutorial, we'll take a look at how to setup the latest version of Ubuntu. Ubuntu (/ʊˈbʊntuː/ is a Linux distribution based on Debian and composed mostly of free and open-source software. Ubuntu is officially released in three editions: Desktop, Server, and Core for Internet of things devices and robots.",
"created_datetime": "2021-08-12T17:34:31.540628Z",
"updated_datetime": "2021-08-12T17:34:31.592555Z"
},
...
]
}
Manually test the other endpoints as well.
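You can also exercise the endpoints from a terminal; for example, with the development server still running:

```bash
curl http://127.0.0.1:8000/blog/user/
curl http://127.0.0.1:8000/blog/category/
curl http://127.0.0.1:8000/blog/article/1/
```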
Start by installing and running Elasticsearch in the background.
Need help getting Elasticsearch up and running? Check out the Installing Elasticsearch guide. If you're familiar with Docker, you can simply run the following command to pull the official image and spin up a container with Elasticsearch running:
$ docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.14.0
To integrate Elasticsearch with Django, we need to install the following packages:
Install:
(env)$ pip install elasticsearch==7.14.0
(env)$ pip install elasticsearch-dsl==7.4.0
(env)$ pip install django-elasticsearch-dsl==7.2.0
Start a new app called search, which will hold our Elasticsearch documents, indexes, and queries:
(env)$ python manage.py startapp search
Register search and django_elasticsearch_dsl in core/settings.py under INSTALLED_APPS:
# core/settings.py
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'django_elasticsearch_dsl', # new
'blog.apps.BlogConfig',
'search.apps.SearchConfig', # new
'rest_framework',
]
Now we need to let Django know where Elasticsearch is running. We do that by adding the following to our core/settings.py file:
# core/settings.py
# Elasticsearch
# https://django-elasticsearch-dsl.readthedocs.io/en/latest/settings.html
ELASTICSEARCH_DSL = {
'default': {
'hosts': 'localhost:9200'
},
}
If your Elasticsearch is running on a different port, make sure to change the above settings accordingly.
We can test if Django can connect to the Elasticsearch by starting our server:
(env)$ python manage.py runserver
If your Django server fails, Elasticsearch is probably not working correctly.
Before creating the documents, we need to make sure all the data is going to get saved in the proper format. We're using CharField(max_length=2) for our article type, which by itself doesn't make much sense. This is why we'll transform it to human-readable text.
We'll achieve this by adding a type_to_string() method inside our model like so:
# blog/models.py
class Article(models.Model):
title = models.CharField(max_length=256)
author = models.ForeignKey(to=User, on_delete=models.CASCADE)
type = models.CharField(max_length=2, choices=ARTICLE_TYPES, default='UN')
categories = models.ManyToManyField(to=Category, blank=True, related_name='categories')
content = models.TextField()
created_datetime = models.DateTimeField(auto_now_add=True)
updated_datetime = models.DateTimeField(auto_now=True)
# new
def type_to_string(self):
if self.type == 'UN':
return 'Unspecified'
elif self.type == 'TU':
return 'Tutorial'
elif self.type == 'RS':
return 'Research'
elif self.type == 'RW':
return 'Review'
def __str__(self):
return f'{self.author}: {self.title} ({self.created_datetime.date()})'
Without type_to_string(), our model would be serialized like this:
{
"title": "This is my article.",
"type": "TU",
...
}
After implementing type_to_string(), our model is serialized like this:
{
"title": "This is my article.",
"type": "Tutorial",
...
}
Now let's create the documents. Each document needs to have Index and Django classes. In the Index class, we need to provide the index name and Elasticsearch index settings. In the Django class, we tell the document which Django model to associate it to and provide the fields we want to be indexed.
blog/documents.py:
# blog/documents.py
from django.contrib.auth.models import User
from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry
from blog.models import Category, Article
@registry.register_document
class UserDocument(Document):
class Index:
name = 'users'
settings = {
'number_of_shards': 1,
'number_of_replicas': 0,
}
class Django:
model = User
fields = [
'id',
'first_name',
'last_name',
'username',
]
@registry.register_document
class CategoryDocument(Document):
id = fields.IntegerField()
class Index:
name = 'categories'
settings = {
'number_of_shards': 1,
'number_of_replicas': 0,
}
class Django:
model = Category
fields = [
'name',
'description',
]
@registry.register_document
class ArticleDocument(Document):
author = fields.ObjectField(properties={
'id': fields.IntegerField(),
'first_name': fields.TextField(),
'last_name': fields.TextField(),
'username': fields.TextField(),
})
categories = fields.ObjectField(properties={
'id': fields.IntegerField(),
'name': fields.TextField(),
'description': fields.TextField(),
})
type = fields.TextField(attr='type_to_string')
class Index:
name = 'articles'
settings = {
'number_of_shards': 1,
'number_of_replicas': 0,
}
class Django:
model = Article
fields = [
'title',
'content',
'created_datetime',
'updated_datetime',
]
Notes:
- We added a type attribute to the ArticleDocument.
- Because the Article model is in a many-to-many (M:N) relationship with Category and a many-to-one (N:1) relationship with User, we needed to take care of the relationships. We did that by adding ObjectField attributes.
To create and populate the Elasticsearch index and mapping, use the search_index command:
(env)$ python manage.py search_index --rebuild
Deleting index 'users'
Deleting index 'categories'
Deleting index 'articles'
Creating index 'users'
Creating index 'categories'
Creating index 'articles'
Indexing 3 'User' objects
Indexing 4 'Article' objects
Indexing 4 'Category' objects
You need to run this command every time you change your index settings.
django-elasticsearch-dsl created the appropriate database signals so that your Elasticsearch storage gets updated every time an instance of a model is created, deleted, or edited.
Before creating the appropriate views, let's look at how Elasticsearch queries work.
We first have to obtain the Search instance. We do that by calling search() on our Document like so:
from blog.documents import ArticleDocument
search = ArticleDocument.search()
Feel free to run these queries within the Django shell.
Once we have the Search instance, we can pass queries to the query() method and fetch the response:
from elasticsearch_dsl import Q
from blog.documents import ArticleDocument
# Looks up all the articles that contain `How to` in the title.
query = 'How to'
q = Q(
'multi_match',
query=query,
fields=[
'title'
])
search = ArticleDocument.search().query(q)
response = search.execute()
# print all the hits
for hit in search:
print(hit.title)
We can also combine multiple Q statements like so:
from elasticsearch_dsl import Q
from blog.documents import ArticleDocument
"""
Looks up all the articles that:
1) Contain 'language' in the 'title'
2) Don't contain 'ruby' or 'javascript' in the 'title'
3) And contain the query either in the 'title' or 'description'
"""
query = 'programming'
q = Q(
'bool',
must=[
Q('match', title='language'),
],
must_not=[
Q('match', title='ruby'),
Q('match', title='javascript'),
],
should=[
Q('match', title=query),
Q('match', description=query),
],
minimum_should_match=1)
search = ArticleDocument.search().query(q)
response = search.execute()
# print all the hits
for hit in search:
print(hit.title)
Another important thing when working with Elasticsearch queries is fuzziness. Fuzzy queries are queries that allow us to handle typos. They use the Levenshtein Distance Algorithm which calculates the distance between the result in our database and the query.
Let's look at an example.
By running the following query we won't get any results, because the user misspelled 'django'.
from elasticsearch_dsl import Q
from blog.documents import ArticleDocument
query = 'djengo' # notice the typo
q = Q(
'multi_match',
query=query,
fields=[
'title'
])
search = ArticleDocument.search().query(q)
response = search.execute()
# print all the hits
for hit in search:
print(hit.title)
If we enable fuzziness like so:
from elasticsearch_dsl import Q
from blog.documents import ArticleDocument
query = 'djengo' # notice the typo
q = Q(
'multi_match',
query=query,
fields=[
'title'
],
fuzziness='auto')
search = ArticleDocument.search().query(q)
response = search.execute()
# print all the hits
for hit in search:
print(hit.title)
The user will get the correct result.
The difference between a full-text search and exact match is that full-text search runs an analyzer on the text before it gets indexed to Elasticsearch. The text gets broken down into different tokens, which are transformed to their root form (e.g., reading -> read). These tokens then get saved into the Inverted Index. Because of that, full-text search yields more results, but takes longer to process.
Elasticsearch has a number of additional features. To get familiar with the API, try implementing:
You can see all the Elasticsearch Search APIs here.
With that, let's create some views. To make our code more DRY we can use the following abstract class in search/views.py:
# search/views.py
import abc
from django.http import HttpResponse
from elasticsearch_dsl import Q
from rest_framework.pagination import LimitOffsetPagination
from rest_framework.views import APIView
class PaginatedElasticSearchAPIView(APIView, LimitOffsetPagination):
serializer_class = None
document_class = None
@abc.abstractmethod
def generate_q_expression(self, query):
"""This method should be overridden
and return a Q() expression."""
def get(self, request, query):
try:
q = self.generate_q_expression(query)
search = self.document_class.search().query(q)
response = search.execute()
print(f'Found {response.hits.total.value} hit(s) for query: "{query}"')
results = self.paginate_queryset(response, request, view=self)
serializer = self.serializer_class(results, many=True)
return self.get_paginated_response(serializer.data)
except Exception as e:
return HttpResponse(e, status=500)
Notes:
To use the class, provide your serializer_class and document_class, and override generate_q_expression().
The class does nothing other than run the generate_q_expression() query, fetch the response, paginate it, and return serialized data.
All the views should now inherit from PaginatedElasticSearchAPIView:
# search/views.py
import abc
from django.http import HttpResponse
from elasticsearch_dsl import Q
from rest_framework.pagination import LimitOffsetPagination
from rest_framework.views import APIView
from blog.documents import ArticleDocument, UserDocument, CategoryDocument
from blog.serializers import ArticleSerializer, UserSerializer, CategorySerializer
class PaginatedElasticSearchAPIView(APIView, LimitOffsetPagination):
serializer_class = None
document_class = None
@abc.abstractmethod
def generate_q_expression(self, query):
"""This method should be overridden
and return a Q() expression."""
def get(self, request, query):
try:
q = self.generate_q_expression(query)
search = self.document_class.search().query(q)
response = search.execute()
print(f'Found {response.hits.total.value} hit(s) for query: "{query}"')
results = self.paginate_queryset(response, request, view=self)
serializer = self.serializer_class(results, many=True)
return self.get_paginated_response(serializer.data)
except Exception as e:
return HttpResponse(e, status=500)
# views
class SearchUsers(PaginatedElasticSearchAPIView):
serializer_class = UserSerializer
document_class = UserDocument
def generate_q_expression(self, query):
return Q('bool',
should=[
Q('match', username=query),
Q('match', first_name=query),
Q('match', last_name=query),
], minimum_should_match=1)
class SearchCategories(PaginatedElasticSearchAPIView):
serializer_class = CategorySerializer
document_class = CategoryDocument
def generate_q_expression(self, query):
return Q(
'multi_match', query=query,
fields=[
'name',
'description',
], fuzziness='auto')
class SearchArticles(PaginatedElasticSearchAPIView):
serializer_class = ArticleSerializer
document_class = ArticleDocument
def generate_q_expression(self, query):
return Q(
'multi_match', query=query,
fields=[
'title',
'author',
'type',
'content'
], fuzziness='auto')
Lastly, let's create the URLs for our views:
# search/urls.py
from django.urls import path
from search.views import SearchArticles, SearchCategories, SearchUsers
urlpatterns = [
path('user/<str:query>/', SearchUsers.as_view()),
path('category/<str:query>/', SearchCategories.as_view()),
path('article/<str:query>/', SearchArticles.as_view()),
]
Then, wire up the app URLs to the project URLs:
# core/urls.py
from django.contrib import admin
from django.urls import path, include
urlpatterns = [
path('blog/', include('blog.urls')),
path('search/', include('search.urls')), # new
path('admin/', admin.site.urls),
]
Our web application is done. We can test our search endpoints by visiting the following URLs:
URL | Description |
---|---|
http://127.0.0.1:8000/search/user/mike/ | Returns user 'mike13' |
http://127.0.0.1:8000/search/user/jess_/ | Returns user 'jess_' |
http://127.0.0.1:8000/search/category/seo/ | Returns category 'SEO optimization' |
http://127.0.0.1:8000/search/category/progreming/ | Returns category 'Programming' |
http://127.0.0.1:8000/search/article/linux/ | Returns article 'Installing the latest version of Ubuntu' |
http://127.0.0.1:8000/search/article/java/ | Returns article 'Which programming language is the best?' |
Notice the typo with the fourth request. We spelled 'progreming', but still got the correct result thanks to fuzziness.
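Because the views inherit from LimitOffsetPagination, the endpoints also accept limit and offset query parameters. Here's a quick way to exercise them from a script, sketched with the requests library (assumes the development server is running locally and the sample data from the tutorial is loaded; the 'title' field comes from ArticleSerializer):
import requests

# Hypothetical smoke test against the local development server.
base = 'http://127.0.0.1:8000/search'
resp = requests.get(f'{base}/article/linux/', params={'limit': 5, 'offset': 0})
resp.raise_for_status()
payload = resp.json()
print(payload['count'])             # total number of hits
for article in payload['results']:  # current page of serialized articles
    print(article['title'])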
The path we took isn't the only way to integrate Django with Elasticsearch. There are a few other libraries you might want to check out as well.
In this tutorial, you learned the basics of working with Django REST Framework and Elasticsearch. You now know how to integrate them, create Elasticsearch documents and queries, and serve the data via a RESTful API.
Before launching your project in production, consider using one of the managed Elasticsearch services like Elastic Cloud, Amazon Elasticsearch Service, or Elastic on Azure. The cost of using a managed service will be higher than managing your own cluster, but they provide all of the infrastructure required for deploying, securing, and running Elasticsearch clusters. Plus, they'll handle version updates, regular backups, and scaling.
Grab the code from django-drf-elasticsearch repo on GitHub.
Original article source at: https://testdriven.io/
1668052161
What is CORS
CORS stands for Cross-Origin Resource Sharing.
It is a browser mechanism that relaxes the same-origin policy in a controlled way, letting a server declare which other origins may access its resources.
What is same-origin policy (SOP)
The same-origin policy (SOP) is a web security mechanism built into web browsers that influences how websites can access one another. The concept of the same-origin policy was introduced by Netscape Navigator 2.02 in 1995.
Without SOP, a malicious website or web application could access another without restrictions. That would allow attackers to easily steal sensitive information from other websites or even perform actions on other sites without user consent.
SOP does not need to be turned on – it is automatically enabled in every browser that supports it.
The SOP mechanism was designed to protect against attacks such as cross-site request forgery (CSRF), which try to take advantage of requests made across differing origins.
What is Origin
Two URLs have the same origin if the protocol, port (if specified), and host are the same for both.
E.g., http://x.y.com/z/page.html and http://x.y.com/a/other.html have the same origin, while https://x.y.com/z/page.html (different protocol) and http://api.y.com/page.html (different host) do not.
Why we Backend developers need CORS if its implemented at browser level?
Though we are just the backend developers working on backend application but we do have Front end application too .
Now Backend and Front end urls are usually
http://localhost:8080/rest/api
http://localhost:4200/xyz
We can clearly see difference in origin. Hence when ever Front end tries to call ur rest APi, It fails due to SOP policy saying No “Access-Control-Allow-Origin” header is present on requested resource. Origin 4200 is not allowed access.
Solution is CORS.
Use @CrossOrigin annotation or send Access-Control-Allow-Origin: http://localhost:4200 in your header to allow it
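The @CrossOrigin fix above is Spring-specific. As an illustration of the same response header in plain Django (relevant to the Django project earlier in this post; a minimal, hypothetical sketch only, since in practice a middleware package such as django-cors-headers usually handles this):
from django.http import JsonResponse

def some_api_view(request):
    # Hypothetical endpoint; the CORS-relevant part is the header set below.
    response = JsonResponse({'status': 'ok'})
    # Tell the browser that the frontend origin may read this response.
    response['Access-Control-Allow-Origin'] = 'http://localhost:4200'
    return response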
#cors #springboot #elasticsearch #interviewquestions
1667366760
All documentation for Elastica can be found under Elastica.io. If you have questions, don't hesitate to ask them on Stack Overflow and add the Tag "Elastica" or in our Gitter channel. All library issues should go to the issue tracker from GitHub.
This release is compatible with all Elasticsearch 7.0 releases and onwards.
The testsuite is run against the most recent minor version of Elasticsearch, currently 7.14.1.
Contributions are always welcome. For details on how to contribute, check the CONTRIBUTING file.
This project tries to follow Elasticsearch in terms of End of Life and maintenance since 5.x. It is generally recommended to use the latest point release of the relevant branch.
Unmaintained versions:
Author: ruflin
Source Code: https://github.com/ruflin/Elastica
License: MIT license
1666234687
From real-time search and event management to sophisticated analytics and logging at scale, Elasticsearch has a great number of uses. The Getting Started with Elasticsearch course will help you learn the basics of Elasticsearch. If you already have knowledge of relational databases and are eager to learn Elasticsearch, then this course is for you. You will end your journey as an Elasticsearch Padawan.
You will begin learning Elasticsearch with a gentle introduction where you set up your environment and launch your Elasticsearch node for the first time. After that, we will dive into Create/Read/Update/Delete operations, where you will grasp the basics of Elasticsearch. All lectures are up to date with Elasticsearch 2.0.
What you’ll learn:
Are there any course requirements or prerequisites?
Who this course is for:
#elasticsearch #programming
1665036660
This is the official PHP client for Elasticsearch.
Using this client assumes that you have an Elasticsearch server installed and running.
You can install the client in your PHP project using composer:
composer require elasticsearch/elasticsearch
After the installation you can connect to Elasticsearch using the ClientBuilder class. For instance, if your Elasticsearch is running on localhost:9200 you can use the following code:
use Elastic\Elasticsearch\ClientBuilder;
$client = ClientBuilder::create()
->setHosts(['localhost:9200'])
->build();
// Info API
$response = $client->info();
echo $response['version']['number']; // 8.0.0
The $response is an object of the Elastic\Elasticsearch\Response\Elasticsearch class that implements ElasticsearchInterface, the PSR-7 ResponseInterface, and ArrayAccess.
This means the $response is a PSR-7 object:
echo $response->getStatusCode(); // 200
echo (string) $response->getBody(); // Response body in JSON
and also an "array", meaning you can access the response body as an associative array, as follows:
echo $response['version']['number']; // 8.0.0
var_dump($response->asArray()); // response body content as array
Moreover, you can access the response body as object, string or bool:
echo $response->version->number; // 8.0.0
var_dump($response->asObject()); // response body content as object
var_dump($response->asString()); // response body as string (JSON)
var_dump($response->asBool()); // true if HTTP response code between 200 and 300
Elasticsearch 8.0 offers security by default, which means it uses TLS to protect the communication between client and server.
In order to configure elasticsearch-php to connect to Elasticsearch 8.0 we need the certificate authority (CA) file.
You can install Elasticsearch in different ways; for instance, using Docker you need to execute the following command:
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.0.1
Once you have the docker image installed, you can execute Elasticsearch, for instance using a single-node cluster configuration, as follows:
docker network create elastic
docker run --name es01 --net elastic -p 9200:9200 -p 9300:9300 -it docker.elastic.co/elasticsearch/elasticsearch:8.0.1
This command creates an elastic Docker network and starts Elasticsearch on port 9200 (the default).
When you run the Docker image, a password is generated for the elastic user and printed to the terminal (you might need to scroll back a bit to view it). Copy it, since we will need it to connect to Elasticsearch.
Now that Elasticsearch is running we can get the http_ca.crt certificate file. We need to copy it from the Docker instance using the following command:
docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .
Once we have the http_ca.crt certificate and the password copied during the start of Elasticsearch, we can use them to connect with elasticsearch-php as follows:
$client = ClientBuilder::create()
->setHosts(['https://localhost:9200'])
->setBasicAuthentication('elastic', 'password copied during Elasticsearch start')
->setCABundle('path/to/http_ca.crt')
->build();
For more information about the Docker configuration of Elasticsearch you can read the official documentation here.
You can use Elastic Cloud as the server with elasticsearch-php. Elastic Cloud is the PaaS solution offered by Elastic.
For connecting to Elastic Cloud you just need the Cloud ID and the API key.
You can get the Cloud ID from the My deployment page of your dashboard. You can generate an API key in the Management page under the Security section. When you click on the Create API key button you can choose a name and set the other options (for example, restrict privileges, expire after time, and so on). After this step you will get the API key in the API keys page.
IMPORTANT: you need to copy and store the API key in a secure place, since you will not be able to view it again in Elastic Cloud.
Once you have collected the Cloud ID and the API key, you can use elasticsearch-php to connect to your Elastic Cloud instance, as follows:
$client = ClientBuilder::create()
->setElasticCloudId('insert here the Cloud ID')
->setApiKey('insert here the API key')
->build();
The elasticsearch-php client offers 400+ endpoints for interacting with Elasticsearch. A list of all these endpoints is available in the official documentation of the Elasticsearch APIs.
Here we cover the basic operations that you can perform with the client: index, search, and delete.
You can store (index) a JSON document in Elasticsearch using the following code:
use Elastic\Elasticsearch\Exception\ClientResponseException;
use Elastic\Elasticsearch\Exception\ServerResponseException;
$params = [
'index' => 'my_index',
'body' => [ 'testField' => 'abc']
];
try {
$response = $client->index($params);
} catch (ClientResponseException $e) {
// manage the 4xx error
} catch (ServerResponseException $e) {
// manage the 5xx error
} catch (Exception $e) {
// eg. network error like NoNodeAvailableException
}
print_r($response->asArray()); // response body content as array
Elasticsearch stores the {"testField":"abc"} JSON document in the my_index index. The ID of the document is created automatically by Elasticsearch and stored in $response['_id']. If you want to specify an ID for the document, you need to store it in $params['id'].
You can manage errors using ClientResponseException and ServerResponseException. The PSR-7 response is available using $e->getResponse(), and the HTTP status code is available using $e->getCode().
Elasticsearch provides many different ways to search documents. The simplest search that you can perform is a match query, as follows:
$params = [
'index' => 'my_index',
'body' => [
'query' => [
'match' => [
'testField' => 'abc'
]
]
]
];
$response = $client->search($params);
printf("Total docs: %d\n", $response['hits']['total']['value']);
printf("Max score : %.4f\n", $response['hits']['max_score']);
printf("Took : %d ms\n", $response['took']);
print_r($response['hits']['hits']); // documents
Using Elasticsearch you can perform many different kinds of search queries; for more information we suggest reading the official documentation reported here.
You can delete a document by specifying the index name and the ID of the document, as follows:
use Elastic\Elasticsearch\Exception\ClientResponseException;
try {
$response = $client->delete([
'index' => 'my_index',
'id' => 'my_id'
]);
} catch (ClientResponseException $e) {
if ($e->getCode() === 404) {
// the document does not exist
}
}
if ($response['acknowledge'] === 1) {
// the document has been deleted
}
For more information about the Elasticsearch REST API you can read the official documentation here.
This client is versioned and released alongside Elasticsearch server.
To guarantee compatibility, use the most recent version of this library within the major version of the corresponding Elasticsearch release.
For example, for Elasticsearch 7.16, use 7.16 of this library or above, but not 8.0.
The 8.0.0 version of elasticsearch-php contains a new implementation compared with 7.x. It supports PSR-7 for HTTP messages and PSR-18 for HTTP client communications.
We tried to reduce the BC breaks as much as possible with 7.x, but there are some (big) differences:
The client namespace has changed: everything now lives under Elastic\Elasticsearch.
The Exception model has changed and uses the namespace Elastic\Elasticsearch\Exception. All the exceptions extend the ElasticsearchException interface, as in 7.x.
ConnectionPool has been renamed to NodePool, since the "connection" naming was ambiguous (the objects are nodes/hosts).
You can have a look at the BREAKING_CHANGES file for more information.
If you need to mock the Elasticsearch client you just need to mock a PSR-18 HTTP Client.
For instance, you can use the php-http/mock-client as follows:
use Elastic\Elasticsearch\ClientBuilder;
use Elastic\Elasticsearch\Response\Elasticsearch;
use Http\Mock\Client;
use Nyholm\Psr7\Response;
$mock = new Client(); // This is the mock client
$client = ClientBuilder::create()
->setHttpClient($mock)
->build();
// This is a PSR-7 response
$response = new Response(
200,
[Elasticsearch::HEADER_CHECK => Elasticsearch::PRODUCT_NAME],
'This is the body!'
);
$mock->addResponse($response);
$result = $client->info(); // Just calling an Elasticsearch endpoint
echo $result->asString(); // This is the body!
We are using ClientBuilder::setHttpClient() to set the mock client. You can specify the response that you want to have using the addResponse($response) function. As you can see, the $response is a PSR-7 response object. In this example we used the Nyholm\Psr7\Response object from the nyholm/psr7 project. If you are using PHPUnit you can even mock the ResponseInterface as follows:
$response = $this->createMock('Psr\Http\Message\ResponseInterface');
Notice: we added a special header in the HTTP response. This is the product check header, and it is required to guarantee that elasticsearch-php is communicating with an Elasticsearch server 8.0+.
For more information you can read the Mock client section of PHP-HTTP documentation.
If something is not working as expected, please open an issue.
You can checkout the Elastic community discuss forums.
We welcome contributors to the project. Before you begin, some useful info...
If you want to send a PR for version 8.0, please use the 8.0 branch; for 8.1, use the 8.1 branch, and so on.
Do not send a PR to master unless you want to contribute to the development version of the client (master represents the next major version).
Thanks in advance for your contribution! :heart:
Author: Elastic
Source Code: https://github.com/elastic/elasticsearch-php
License: MIT license
1662694447
In today's post we will learn about 7 Favorite Node.js ElasticSearch Query Builder Libraries.
Elasticsearch query body builder is a query DSL (domain-specific language) or client that provides an API layer over raw Elasticsearch queries. It makes full-text search data querying and complex data aggregation easier, more convenient, and cleaner in terms of syntax.
An elasticsearch query body builder
npm install bodybuilder --save
var bodybuilder = require('bodybuilder')
var body = bodybuilder().query('match', 'message', 'this is a test')
body.build() // Build 2.x or greater DSL (default)
body.build('v1') // Build 1.x DSL
For each elasticsearch query body, create an instance of bodybuilder, apply the desired query/filter/aggregation clauses, and call build to retrieve the built query body.
Simplification of Elasticsearch interactions
npm i --save es-alchemy
Model definitions contain the fields of a model and their types. They restrict how an index can be put together.
Example: address.json
{
"fields": {
"id": "uuid",
"street": "string",
"city": "string",
"country": "string",
"centre": "point",
"area": "shape",
"timezone": "string"
}
}
Preferably, a models folder contains a JSON file for each model. An example can be found in the test folder.
Fields that can be used and how they get mapped in Opensearch can be found here.
A Node.js implementation of the elasticsearch Query DSL
npm install elastic-builder --save
const esb = require('elastic-builder'); // the builder
const requestBody = esb.requestBodySearch()
.query(esb.matchQuery('message', 'this is a test'));
// OR
const requestBody = new esb.RequestBodySearch().query(
new esb.MatchQuery('message', 'this is a test')
);
requestBody.toJSON(); // or print to console - esb.prettyPrint(requestBody)
{
"query": {
"match": {
"message": "this is a test"
}
}
}
For each class MyClass, a utility function myClass has been provided which constructs the object for us without the need for the new keyword.
Geospatial queries used by the pelias api
$ npm install pelias-query
Variables are used as placeholders in order to pre-build queries before we know the final values which will be provided by the user.
note: Variables can only be JavaScript primitive types: string, numeric, or boolean, plus array. No objects allowed.
var query = require('pelias-query');
// create a new variable store
var vs = new query.Vars();
// set a variable
vs.var('input:name', 'hackney city farm');
// or
vs.var('input:name').set('hackney city farm');
// get a variable
var a = vs.var('input:name');
// get the primitive value of a variable
var a = vs.var('input:name');
a.get(); // hackney city farm
a.toString(); // hackney city farm
a.valueOf(); // hackney city farm
a.toJSON(); // hackney city farm
// check if a variable has been set
vs.isset('input:name'); // true
vs.isset('foo'); // false
// bulk set many variables
vs.set({
'boundary:rect:top': 1,
'boundary:rect:right': 2,
'boundary:rect:bottom': 2,
'boundary:rect:left': 1
});
// export variables for debugging
var dict = vs.export();
console.log( dict );
Simple query builder for elasticsearch
Example
var ESQ = require('esq');
var esq = new ESQ();
esq.query('bool', ['must'], { match: { user: 'kimchy' } });
esq.query('bool', 'minimum_should_match', 1);
var query = esq.getQuery();
Generates
{
"bool": {
"must": [
{
"match": {
"user": "kimchy"
}
}
],
"minimum_should_match": 1
}
}
<script src="esq.js"></script>
<script>
var esq = new ESQ();
esq.query('bool', ['must'], { match: { user: 'kimchy' } });
var query = esq.getQuery();
</script>
Like Mongoose but for Elasticsearch. Define models, preform CRUD operations, and build advanced search queries.
If you currently have the npm elasticsearch package installed, you can remove it; if you still need it, you can access the underlying client from this library.
$ npm install elasticsearch-odm
You'll find the API is intuitive if you've used Mongoose or Waterline.
Example (no schema):
var elasticsearch = require('elasticsearch-odm');
var Car = elasticsearch.model('Car');
var car = new Car({
type: 'Ford', color: 'Black'
});
elasticsearch.connect('my-index').then(function(){
// be sure to call connect before bootstrapping your app.
car.save().then(function(document){
console.log(document);
});
});
Example (using a schema):
var elasticsearch = require('elasticsearch-odm');
var carSchema = new elasticsearch.Schema({
type: String,
color: {type: String, required: true}
});
var Car = elasticsearch.model('Car', carSchema);
Query builder for elasticsearch (Node.js / Javascript)
$ npm install equery
var Query = require('equery');
var query = new Query();
var result = query.toJSON();
query
.sort('followers:desc')
.toJSON();
Thank you for following this article.
Introduction into the JavaScript Elasticsearch Client