Monty Boehm

ZincSearch: A Search Engine That Does Full Text Indexing

ZincSearch

ZincSearch is a search engine that does full text indexing. It is a lightweight alternative to Elasticsearch and runs using a fraction of the resources. It uses bluge as the underlying indexing library.

It is very simple and easy to operate, as opposed to Elasticsearch, which requires a couple dozen knobs to understand and tune. You can get ZincSearch up and running in 2 minutes.

It is a drop-in replacement for Elasticsearch if you are just ingesting data using APIs and searching using Kibana (Kibana is not supported with Zinc; Zinc provides its own UI).

Check the below video for a quick demo of Zinc.

Zinc Youtube

Playground Server

You can try ZincSearch without installing it, using the details below:

Server: https://playground.dev.zincsearch.com
User ID: admin
Password: Complexpass#123

Note: Do not store sensitive data on this server as it is available to everyone on the internet. Data on this server is also cleaned regularly.
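As a quick, hedged illustration of the API-based ingestion and Elasticsearch-compatible search mentioned above, here is a minimal sketch against the playground server (the "books" index and the document are made up, and the endpoint paths follow ZincSearch's documentation at the time of writing, so double-check them against the current docs):

# Ingest a sample document; the index is created on the fly since Zinc is schema-less
curl -u admin:Complexpass#123 -X POST "https://playground.dev.zincsearch.com/api/books/_doc" \
  -H "Content-Type: application/json" \
  -d '{"title": "The Go Programming Language", "year": 2015}'

# Search it back through the Elasticsearch-compatible search endpoint
curl -u admin:Complexpass#123 -X POST "https://playground.dev.zincsearch.com/es/books/_search" \
  -H "Content-Type: application/json" \
  -d '{"query": {"match": {"title": "go"}}}'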

Why ZincSearch

While Elasticsearch is a very good product, it is complex, requires a lot of resources, and is more than a decade old. I built Zinc to make it easier for folks to use full-text search indexing without doing a lot of work.

Features:

  1. Provides full text indexing capability
  2. Single binary for installation and running. Binaries available under releases for multiple platforms.
  3. Web UI for querying data written in Vue
  4. Compatibility with Elasticsearch APIs for ingestion of data (single record and bulk API)
  5. Out of the box authentication
  6. Schema less - No need to define schema upfront and different documents in the same index can have different fields.
  7. Index storage on disk (default), S3 or MinIO (deprecated)
  8. Aggregation support

How to get support

The easiest way to get support is to join the Slack channel.

Roadmap items:

Public roadmap is available at https://github.com/orgs/zinclabs/projects/3/views/1

Please create an issue if you would like something to be added to the roadmap.

Screenshots

Search screen

Search screen

User management screen

Users screen

Getting started

Quickstart

Check Quickstart

Releases

ZincSearch currently has most of its API contracts frozen. Its data format may still change as we improve things. Currently ZincSearch is in beta. The data format should become highly stable when we move to GA (version 1).

Editions

Feature | Zinc | Zinc Cloud
Ideal use case | App search | Logs and events (immutable data)
Storage | Disk | Object (S3), GCS, Azure blob coming soon
Preferred use case | App search | Log / event search
Max data supported | 100s of GBs | Petabyte scale
High availability | Will be available soon | Yes
Open source | Yes | Yes, ZincObserve
ES API compatibility | Search and ingestion | Ingestion only
GUI | Basic | Advanced for log search
Cost | Free (self hosting may cost money based on size) | Generous free tier. 1 TB ingest / month free.
Get started | Quick start | Sign up

❗Note: If your use case is log search (app and security logs) rather than app search (implementing a search feature in your application or website), you should check out the zinclabs/zincobserve project, which is built specifically for the observability use case.


Download Details:

Author: Zinclabs
Source Code: https://github.com/zinclabs/zincsearch 
License: View license

#go #golang #search #elasticsearch #vuejs 

Royce Reinger

Federated, Open-source Data Catalog for All Your Big Data & Small Data

Magda

Magda is a data catalog system that will provide a single place where all of an organization's data can be catalogued, enriched, searched, tracked and prioritized - whether big or small, internally or externally sourced, available as files, databases or APIs. Magda is designed specifically around the concept of federation - providing a single view across all data of interest to a user, regardless of where the data is stored or where it was sourced from. The system is able to quickly crawl external data sources, track changes, make automatic enhancements and make notifications when changes occur, giving data users a one-stop shop to discover all the data that's available to them.

Magda Search Demo

Current Status

Magda is under active development by a small team - we often have to prioritise between making the open-source side of the project more robust and adding features to our own deployments, which can mean newer features aren't documented well, or require specific configuration to work. If you run into problems using Magda, we're always happy to help on GitHub Discussions.

As an open data search engine

Magda has been used in production by data.gov.au for over a year, and is relatively mature for this use case.

As a data catalogue

Over the past 18 months, our focus has been to develop Magda into a more general-purpose data catalogue for use within organisations. If you want to use it as a data catalog, please do, but expect some rough edges! If you'd like to contribute to the project with issues or PRs, we love to receive them.

Features

  • Powerful and scalable search based on ElasticSearch
  • Quick and reliable aggregation of external sources of datasets
  • An unopinionated central store of metadata, able to cater for most metadata schemas
  • Federated authentication via passport.js - log in via Google, Facebook, WSFed, AAF, CKAN, and easily create new providers.
  • Based on Kubernetes for cloud agnosticism - deployable to nearly any cloud, on-premises, or on a local machine.
  • Easy (as long as you know Kubernetes) installation and upgrades
  • Extensions are based on adding new docker images to the cluster, and hence can be developed in any language

Currently Under Development

  • A heavily automated, quick and easy to use data cataloguing process intended to produce high-quality metadata for discovery
  • A robust, policy-based authorization system built on Open Policy Agent - write flexible policies to restrict access to datasets and have them work across the system, including by restricting search results to what you're allowed to see.
  • Storage of datasets

Our current roadmap is available at https://magda.io/docs/roadmap

Architecture

Magda is built around a collection of microservices that are distributed as docker containers. This was done to provide easy extensibility - Magda can be customised by simply adding new services using any technology as docker images, and integrating them with the rest of the system via stable HTTP APIs. Using Helm and Kubernetes for orchestration means that configuration of a customised Magda instance can be stored and tracked as plain text, and instances with identical configuration can be quickly and easily reproduced.

Magda Architecture Diagram

If you are interested in the architecture details of Magda, you might want to have a look at this doc.

Registry

Magda revolves around the Registry - an unopinionated datastore built on top of Postgres. The Registry stores records as a set of JSON documents called aspects. For instance, a dataset is represented as a record with a number of aspects - a basic one that records the name, description and so on as well as more esoteric ones that might not be present for every dataset, like temporal coverage or determined data quality. Likewise, distributions (the actual data files, or URLs linking to them) are also modelled as records, with their own sets of aspects covering both basic metadata once again, as well as more specific aspects like whether the URL to the file worked when last tested.

Most importantly, aspects are able to be declared dynamically by other services by simply making a call with a name, description and JSON schema. This means that if you have a requirement to store extra information about a dataset or distribution you can easily do so by declaring your own aspect. Because the system isn't opinionated about what a record is beyond a set of aspects, you can also use this to add new entities to the system that link together - for instance, we've used this to store projects with a name and description that link to a number of datasets.

Connectors

Connectors go out to external datasources and copy their metadata into the Registry, so that they can be searched and have other aspects attached to them. A connector is simply a docker-based microservice that is invoked as a job. It scans the target datasource (usually an open-data portal), then completes and shuts down. We have connectors for a number of existing open data formats, otherwise you can easily write and run your own.

Minions

A minion is a service that listens for new records or changes to existing records, performs some kind of operation and then writes the result back to the registry. For instance, we have a broken link minion that listens for changes to distributions, retrieves the URLs described, records whether they were able to be accessed successfully and then writes that back to the registry in its own aspect.

Other aspects exist that are written to by many minions - for instance, we have a "quality" aspect that contains a number of different quality ratings from different sources, which are averaged out and used by search.

Search

Datasets and distributions in the registry are ingested into an ElasticSearch cluster, which indexes a few core aspects of each and exposes an API.

User Interface

Magda provides a user interface, which is served from its own microservice and consumes the APIs. We're planning to make the UI itself extensible with plugins at some point in the future.

To try the latest version (with prebuilt images)

If you just want to install a local testing version, installing Magda using Helm is relatively easy (you can use minikube to create a local k8s test cluster):

# Add Magda Helm Chart Repo:
helm repo add magda-io https://charts.magda.io

# create a namespace "magda" in your cluster
kubectl create namespace magda

# install Magda version 2.2.0 into the "magda" namespace and expose the gateway service via LoadBalancer
helm upgrade --namespace magda --install --version 2.2.0 --timeout 9999s --set magda-core.gateway.service.type=LoadBalancer magda magda-io/magda

You can find out the load balancer IP and access it:

echo $(kubectl get svc --namespace magda gateway --template "{{ range (index .status.loadBalancer.ingress 0) }}{{ . }}{{ end }}")
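It can take a few minutes for all pods to come up; a quick way to watch progress is plain kubectl (nothing Magda-specific):

# Watch the Magda pods until they are all Running and Ready
kubectl get pods --namespace magda --watch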

If you are interested in playing more, you might find the docs useful.

You might also want to have a look at this tutorial repo:

https://github.com/magda-io/magda-brown-bag

Or find out more at https://magda.io/docs/building-and-running if you are interested in development or want to play with the code.

To build and run from source

https://magda.io/docs/building-and-running

To get help with developing or running Magda

Start a discussion at https://github.com/magda-io/magda/discussions. There's not a lot on there yet, but we monitor it closely :).

Want to get help deploying it into your organisation?

Email us at contact@magda.io.

Want to contribute?

Great! Take a look at https://github.com/magda-io/magda/blob/master/.github/CONTRIBUTING.md :).

Documentation links

More documents can be found in the docs/docs/ folder.

Download Details:

Author: Magda-io
Source Code: https://github.com/magda-io/magda 
License: Apache-2.0 license

#machinelearning #nodejs #kubernetes #elasticsearch 

Code Geek

Elasticsearch Tutorial | Kibana, Logstash & Elasticsearch

Welcome to our Elasticsearch tutorial! This video series covers all aspects of the Elastic Stack (ELK Stack): an introduction to Elasticsearch, installing Elasticsearch and Kibana on macOS, the Elasticsearch REST API, and searching Elasticsearch. We also cover Logstash and stashing your first event, then dive into the basics of search in Elasticsearch, query and filter context, and how to write your own queries, including compound and full-text queries. Finally, we cover using Elasticsearch and Elasticsearch DSL with the Python programming language. This tutorial is perfect for anyone who wants to learn about Elasticsearch and the Elastic Stack.

Subscribe: https://www.youtube.com/@ProgrammingKnowledge/featured 

#elasticsearch  

Hunter Krajcik

Rails7-startkit: Launch App in minutes!

Rails 7. Start Kit

Rails 7 App with Preinstalled Tools is Ready in Minutes!

Why?

It is usually difficult and time-consuming to set up a typical Rails environment from scratch.

Now, if you have Ruby and Docker, you can have a working Rails environment in about 5 minutes without any manual effort.

What is under the hood?

Tool | Why it was added
Docker | Helps to keep all required services in containers, for a fast and predictable installation process in minutes
PostgreSQL | Most popular relational database
Ruby 3.2 | Most recent version of Ruby
Rails 7 | Most recent version of Rails
gem "config" | Configuration management tool
Elasticsearch | The world's leading search engine
Chewy | Ruby connector to Elasticsearch
Redis | In-memory data store. For caching and as a dependency of Sidekiq
Sidekiq | Job scheduler and async tasks executor. Can be used as a standalone tool or as an ActiveJob backend
Import Maps | Rails' recommended way to process JavaScript
Puma | Application web server. To launch the Rails app

What I'm going to add...

Tool | Why it will be added
Kaminari | Pagination solution
Devise | Authentication solution for Rails
Devise | Login with Facebook and Google
Devise and Action Mailer | Sending emails for account confirmations
Letter Opener | Email previewer for development
whenever | Linux cron based periodical tasks
RSpec | Testing framework for Rails
Rubocop | Ruby static code analyzer (a.k.a. linter) and formatter

All trademarks, logos and brand names are the property of their respective owners.

Prerequisites

On your host you have:

  • Ruby 2+
  • Docker
  • Git

How to start?

ONE!

git clone https://github.com/the-teacher/rails7-startkit.git

TWO!

cd rails7-startkit

THREE!

bin/setup

You will see something like this:

1. Launching PgSQL container
2. Launching ElasticSearch Container
3. Launching Rails container
4. Installing Gems. Please Wait
5. Create DB. Migrate DB. Create Seeds
6. Launching Redis Container
7. Indexing Article Model
8. Launching Rails App with Puma
9. Launching Sidekiq
10. Visit: http://localhost:3000

Index page of the project

bin/ commands

From the root of the project

Command | Description
bin/setup | Download images, run containers, initialize data, launch all processes
bin/open | Get into the Rails container (`rails` by default)
bin/open rails | Get into the Rails container
bin/open psql | Get into the PgSQL container
bin/open redis | Get into the Redis container
bin/open elastic | Get into the ElasticSearch container
bin/status | See running containers and launched services
bin/start | Start everything if it is stopped
bin/stop | Stop processes in the Rails container
bin/stop_all | Stop everything if it is running
bin/index | Run search engine indexation
bin/reset | Reset data of services in the ./db folder

Conventions and Agreements

For demonstration, education, and maintenance purposes I use the following approach:

Data

  • All services' data related folders are placed in ./db
  • All folders are UPPERCASED
./db
├── ELASTIC
├── PGSQL
└── REDIS

Configuration Files

  • All services' configurations are placed in ./config
  • All configs are _UNDERSCORED and UPPERCASED
./config
├── _CONFIG.yml
├── _PUMA.rb
└── _SIDEKIQ.yml

Initializers

  • All services' initializers are placed in ./config/initializers
  • All files are _UNDERSCORED and UPPERCASED
./config/initializers/
├── _CHEWY.rb
├── _CONFIG.rb
├── _REDIS.rb
└── _SIDEKIQ.rb

Rails user

As the user that owns files and runs Rails inside the container, I use

user:group => lucky:lucky => 7777:7777

If you would like to run the project in a Linux environment, then:

  • create group lucky (7777) and user lucky (7777)
  • run the project with the RUN_AS=7777:7777 option (see the sketch below)
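As a sketch of what that could look like on Linux (how bin/setup consumes RUN_AS is an assumption here, so check the project's scripts for the exact invocation):

# Create the "lucky" group and user with GID/UID 7777 to match the container user
sudo groupadd --gid 7777 lucky
sudo useradd --uid 7777 --gid 7777 --create-home lucky

# Launch the kit with the matching UID:GID mapping (hypothetical invocation)
RUN_AS=7777:7777 bin/setup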

How to Run Tests

From the root of the project

  bin/open rails

Now you are in the Rails container and you can do everything as usual

  RAILS_ENV=test rake db:create
  rake test

Questions and Answers

What is the idea of this project?

For many years Rails has given you the freedom to choose your development tools: different databases, different paginators, different search engines, different delayed job solutions.

It is great. But every time you need to choose something and install it from scratch.

I think I have made my choices about many solutions and tools.

I want to install my minimal pack of tools now and reuse my StartKit every time I start a new project.

With Docker I can roll out my minimal application with all required preinstalled tools in minutes, not hours or days.

Why did you create this project?

I haven't worked with Rails for the last 4 or 5 years. I wanted to learn new approaches and techniques. I found that there is still no simple way to set up a blank app with the most popular tools.

So, why not make my own playground?

How do you choose technologies for the StartKit?

I use tools that I like or want to learn.

I use tools that I think are the most popular ones.

It looks good for development. What about production?

I'm not a DevOps engineer, but I have a vision of how to deploy this code to production.

Right now it is not documented. It is in my plans.

TODO

Download Details:

Author: the-teacher
Source Code: https://github.com/the-teacher/rails7-startkit 
License: MIT

#ruby #rails #docker #redis #elasticsearch #sphinx 

Gordon Murray

Send Github Commits and PR Logs to ElasticSearch using A Custom Script

Hello Readers! In this blog, we will see how we can send GitHub commits and PR logs to Elasticsearch using a custom script. We will use a bash script that sends GitHub logs to Elasticsearch; it creates an index in Elasticsearch and pushes the logs there.

After sending logs to Elasticsearch, we can visualize the following GitHub events in Kibana:

  • Track commit details made to the GitHub repository
  • Track events related to PRs in the GitHub repository over time
  • Analyze relevant information related to the GitHub repository

workflow

1. GitHub User: Users will be responsible for performing actions in a GitHub repository like commits and pull requests.

2. GitHub Repository: Source Code Management system on which users will perform actions.

3. GitHub Action: Continuous integration and continuous delivery (CI/CD) platform that runs each time a GitHub user commits a change or opens a pull request.

4. Bash Script: The custom script is written in bash for shipping GitHub logs to Elasticsearch.

5. ElasticSearch: Stores all of the logs in the index created.

6. Kibana: Web interface for searching and visualizing logs.

Steps for sending logs to Elasticsearch using bash script: 

1. GitHub users will make commits and raise pull requests to the GitHub repository. Here is my GitHub repository which I have created for this blog.

https://github.com/NaincyKumariKnoldus/Github_logs

github repo

2. Create two GitHub Actions workflows in this repository. These workflows will be triggered by the events performed by the GitHub user.

github actions

GitHub Actions workflow file that triggers on commit events:

commit_workflow.yml:

# The name of the workflow
name: CI
#environment variables
env:
    GITHUB_REF_NAME: $GITHUB_REF_NAME
    ES_URL: ${{ secrets.ES_URL }}
 
# Controls when the workflow will run
on: [push]
#A job is a set of steps in a workflow
jobs:
    send-push-events:
        name: Push Logs to ES
        #The job will run on the latest version of an Ubuntu Linux runner.
        runs-on: ubuntu-latest
        steps:
           #This is an action that checks out your repository onto the runner, allowing you to run scripts
           - uses: actions/checkout@v2
           #The run keyword tells the job to execute a command on the runner
           - run: ./git_commit.sh

GitHub Actions workflow file that triggers on pull request events:

pr_workflow.yml:

name: CI
 
env:
  GITHUB_REF_NAME: $GITHUB_REF_NAME
  ES_URL: ${{ secrets.ES_URL }}
 
on: [pull_request]
jobs:
  send-pull-events:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./git_pr.sh
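Both workflows read the Elasticsearch endpoint from a repository secret named ES_URL, so that secret has to exist before the jobs can push anything. You can add it in the repository settings, or, if you use the GitHub CLI, with something like the following (the URL is a placeholder):

# Store the Elasticsearch base URL (including credentials if required) as a repository secret
gh secret set ES_URL --body "https://elastic:changeme@your-elasticsearch-host:9200"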

3. Create two files inside your GitHub repository to hold the bash scripts. The following bash scripts ship GitHub logs to Elasticsearch; they are executed by the GitHub Actions workflows mentioned above.
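Because the workflows execute the scripts directly (./git_commit.sh and ./git_pr.sh), make sure both files are committed with the executable bit set, for example:

# Make the scripts executable and commit them
chmod +x git_commit.sh git_pr.sh
git add git_commit.sh git_pr.sh
git commit -m "Add scripts that ship GitHub logs to Elasticsearch"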

git_commit.sh is triggered by the workflow file commit_workflow.yml:

#!/bin/bash

# get github commits
getCommitResponse=$(
   curl -s \
      -H "Accept: application/vnd.github+json" \
      -H "X-GitHub-Api-Version: 2022-11-28" \
      "https://api.github.com/repos/NaincyKumariKnoldus/Github_logs/commits?sha=$GITHUB_REF_NAME&per_page=100&page=1"
)

# get commit SHA
commitSHA=$(echo "$getCommitResponse" |
   jq '.[].sha' |
   tr -d '"')

# get the loop count based on number of commits
loopCount=$(echo "$commitSHA" |
   wc -w)
echo "loopcount= $loopCount"

# get data from ES
getEsCommitSHA=$(curl -H "Content-Type: application/json" -X GET "$ES_URL/github_commit/_search?pretty" -d '{
                  "size": 10000,                                                                  
                  "query": {
                     "wildcard": {
                           "commit_sha": {
                              "value": "*"
                           }}}}' |
                  jq '.hits.hits[]._source.commit_sha' |
                  tr -d '"')

# store ES commit sha in a temp file
echo $getEsCommitSHA | tr " " "\n" > sha_es.txt

# looping through each commit detail
for ((count = 0; count < $loopCount; count++)); do
   
   # get commitSHA
   commitSHA=$(echo "$getCommitResponse" |
      jq --argjson count "$count" '.[$count].sha' |
      tr -d '"')

   # match result for previous existing commit on ES
   matchRes=$(grep -o $commitSHA sha_es.txt)
   echo $matchRes | tr " " "\n" >> match.txt

   # filtering and pushing unmatched commit sha details to ES
   if [ -z $matchRes ]; then
      echo "Unmatched SHA: $commitSHA"
      echo $commitSHA | tr " " "\n" >> unmatch.txt
      
      # get author name
      authorName=$(echo "$getCommitResponse" |
         jq --argjson count "$count" '.[$count].commit.author.name' |
         tr -d '"')

      # get commit message
      commitMessage=$(echo "$getCommitResponse" |
         jq --argjson count "$count" '.[$count].commit.message' |
         tr -d '"')

      # get commit html url
      commitHtmlUrl=$(echo "$getCommitResponse" |
         jq --argjson count "$count" '.[$count].html_url' |
         tr -d '"')

      # get commit time
      commitTime=$(echo "$getCommitResponse" |
         jq --argjson count "$count" '.[$count].commit.author.date' |
         tr -d '"')

      # send data to es
      curl -X POST "$ES_URL/github_commit/commit" \
         -H "Content-Type: application/json" \
         -d "{ \"commit_sha\" : \"$commitSHA\",
            \"branch_name\" : \"$GITHUB_REF_NAME\",
            \"author_name\" : \"$authorName\",
            \"commit_message\" : \"$commitMessage\",
            \"commit_html_url\" : \"$commitHtmlUrl\",
            \"commit_time\" : \"$commitTime\" }"
   fi
done

# removing temporary file
rm -rf sha_es.txt
rm -rf match.txt
rm -rf unmatch.txt

git_pr.sh is triggered by the workflow file pr_workflow.yml:

#!/bin/bash

# get github PR details
getPrResponse=$(curl -s \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "https://api.github.com/repos/NaincyKumariKnoldus/Github_logs/pulls?state=all&per_page=100&page=1")

# get number of PR
totalPR=$(echo "$getPrResponse" |
  jq '.[].number' |
  tr -d '"')

# get the loop count based on number of PRs
loopCount=$(echo "$totalPR" |
  wc -w)
echo "loopcount= $loopCount"

# get data from ES
getEsPR=$(curl -H "Content-Type: application/json" -X GET "$ES_URL/github_pr/_search?pretty" -d '{
                  "size": 10000,                                                                  
                  "query": {
                     "wildcard": {
                           "pr_number": {
                              "value": "*"
                           }}}}' |
                  jq '.hits.hits[]._source.pr_number' |
                  tr -d '"')

# store ES PR number in a temp file
echo $getEsPR | tr " " "\n" > sha_es.txt

# looping through each PR detail
for ((count = 0; count < $loopCount; count++)); do

  # get PR_number
  totalPR=$(echo "$getPrResponse" |
    jq --argjson count "$count" '.[$count].number' |
    tr -d '"')
  
  # looping through each PR detail
  matchRes=$(grep -o $totalPR sha_es.txt)
  echo $matchRes | tr " " "\n" >>match.txt

  # filtering and pushing unmatched PR number details to ES
  if [ -z $matchRes ]; then
    # get PR html url
    PrHtmlUrl=$(echo "$getPrResponse" |
      jq --argjson count "$count" '.[$count].html_url' |
      tr -d '"')

    # get PR Body
    PrBody=$(echo "$getPrResponse" |
      jq --argjson count "$count" '.[$count].body' |
      tr -d '"')

    # get PR Number
    PrNumber=$(echo "$getPrResponse" |
      jq --argjson count "$count" '.[$count].number' |
      tr -d '"')

    # get PR Title
    PrTitle=$(echo "$getPrResponse" |
      jq --argjson count "$count" '.[$count].title' |
      tr -d '"')

    # get PR state
    PrState=$(echo "$getPrResponse" |
      jq --argjson count "$count" '.[$count].state' |
      tr -d '"')

    # get PR created at
    PrCreatedAt=$(echo "$getPrResponse" |
      jq --argjson count "$count" '.[$count].created_at' |
      tr -d '"')

    # get PR closed at
    PrCloseAt=$(echo "$getPrResponse" |
      jq --argjson count "$count" '.[$count].closed_at' |
      tr -d '"')

    # get PR merged at
    PrMergedAt=$(echo "$getPrResponse" |
      jq --argjson count "$count" '.[$count].merged_at' |
      tr -d '"')

    # get base branch name
    PrBaseBranch=$(echo "$getPrResponse" |
      jq --argjson count "$count" '.[$count].base.ref' |
      tr -d '"')

    # get source branch name
    PrSourceBranch=$(echo "$getPrResponse" |
      jq --argjson count "$count" '.[$count].head.ref' |
      tr -d '"')

    # send data to es
    curl -X POST "$ES_URL/github_pr/pull_request" \
      -H "Content-Type: application/json" \
      -d "{ \"pr_number\" : \"$PrNumber\",
            \"pr_url\" : \"$PrHtmlUrl\",
            \"pr_title\" : \"$PrTitle\",
            \"pr_body\" : \"$PrBody\",
            \"pr_base_branch\" : \"$PrBaseBranch\",
            \"pr_source_branch\" : \"$PrSourceBranch\",
            \"pr_state\" : \"$PrState\",
            \"pr_creation_time\" : \"$PrCreatedAt\",
            \"pr_closed_time\" : \"$PrCloseAt\",
            \"pr_merge_at\" : \"$PrMergedAt\"}"
  fi
done

# removing temporary file
rm -rf sha_es.txt
rm -rf match.txt
rm -rf unmatch.txt

4. Now push a commit to the GitHub repository. After the commit, the GitHub Actions workflow for push events will run and send the commit logs to Elasticsearch.

commit action

Go to Elasticsearch to see the GitHub commit logs there.

es_data

We are now getting GitHub commits here.

5. Now raise a pull request in your GitHub repository. This runs the GitHub Actions workflow for pull request events, which triggers the bash script that pushes pull request logs to Elasticsearch.

pull

GitHub action got executed on the pull request:

github action

Now, move to elasticsearch and you will find pull request logs there.

es_pull data

6. We can also visualize these logs in Kibana.

GitHub commit logs in Kibana:

kibana data

GitHub pull request logs in Kibana:

kibana

This is how we can analyze our GitHub logs in Elasticsearch and Kibana using the custom script.

We are all done now!!

Conclusion:

Thank you for sticking to the end. In this blog, we have learned how we can send GitHub commits and PR logs to Elasticsearch using a custom script. This is really very quick and simple. If you like this blog, please share my blog and show your appreciation by giving thumbs-ups, and don’t forget to give me suggestions on how I can improve my future blogs that can suit your needs.

Original article source at: https://blog.knoldus.com/

#script #github #elasticsearch #log 

Nigel Uys

Pfelk: PfSense/OPNsense + Elastic Stack

Elastic Integration

pfSense/OPNsense + Elastic Stack

pfelk dashboard

Prerequisites

  • Ubuntu Server v18.04+ or Debian Server 9+ (stretch and buster tested)
  • pfSense v2.4.4+ or OPNsense 19.7.4+
  • Minimum of 8 GB of RAM, but 32 GB recommended (WiKi Reference)
  • Setting up remote logging (WiKi Reference)

pfelk is a highly customizable open-source tool for ingesting and visualizing your firewall traffic with the full power of Elasticsearch, Logstash and Kibana.

Key features:

ingest and enrich your pfSense/OPNsense firewall traffic logs by leveraging Logstash

search your indexed data in near-real-time with the full power of Elasticsearch

visualize your network traffic with interactive dashboards, maps, and graphs in Kibana

Supported entries include:

  • pfSense/OPNSense setups
  • TCP/UDP/ICMP protocols
  • DHCP message types with dashboard (dhcpdv4)
  • IPv4/IPv6 mapping
  • pfSense CARP data
  • openVPN log parsing
  • Unbound DNS Resolver with dashboard and Kibana SIEM compliance
  • Suricata IDS with dashboard and Kibana SIEM compliance
  • Snort IDS with dashboard and Kibana SIEM compliance
  • Squid with dashboard and Kibana SIEM compliance
  • HAProxy with dashboard
  • Captive Portal with dashboard
  • NGINX with dashboard

pfelk aims to replace the vanilla pfSense/OPNsense web UI with extended search and visualization features. You can deploy this solution via ansible-playbook, docker-compose, bash script, or manually.

How pfelk works?

  • How pfelk works

Quick start

Installation

ansible-playbook

  • Clone the ansible-pfelk repository
  • $ ansible-playbook -i hosts --ask-become deploy-stack.yml

docker-compose

  • Clone the docker-pfelk repository
  • Setup MaxMind
  • $ docker-compose up
  • YouTube Guide

script installation method

  • Download installer script from pfelk repository
  • $ wget https://raw.githubusercontent.com/pfelk/pfelk/main/etc/pfelk/scripts/pfelk-installer.sh
  • Make script executable
  • $ chmod +x pfelk-installer.sh
  • Run installer script
  • $ sudo ./pfelk-installer.sh
  • Configure Security here
  • Finish Configuring here
  • YouTube Guide

manual installation method

Roadmap

This is the experimental public roadmap for the pfelk project.

See the roadmap »

Comparison to similar solutions

Comparisons »

Contributing

Please refer to the CONTRIBUTING file. Collectively we can enhance and improve this product. Issues, feature requests, PRs, and documentation contributions are encouraged and welcome!


https://docs.elastic.co/en/integrations/pfsense


Download Details:

Author: pfelk
Source Code: https://github.com/pfelk/pfelk 
License: View license

#ansible #visualization #docker #elasticsearch 

Gordon Matlala

Backup and Restore Elasticsearch using Snapshots

Introduction

Hello everyone! Today in this blog, we will learn how to backup and restore Elasticsearch using snapshots. Before diving in, let’s first brush up on the basics of the topic.

Elasticsearch at a glance

  • It is a search and analytics engine
  • It is based on NoSQL technology
  • It exposes REST API instead of CLI to perform various operations
  • It is a combination of different nodes such as data, master, ingest, and client connected together.

Backup strategy at Elasticsearch

  • Elasticsearch uses snapshots
  • A snapshot is a backup taken from a running Elasticsearch cluster
  • Repositories are used to store snapshots
  • You must register a repository before you perform snapshot and restore operations
  • Repositories can be either local or remote
  • Different types of repositories supported by Elasticsearch are as follows:
    • Windows shares using Microsoft UNC path
    • NFS on Linux
    • Directory on Single Node Cluster
    • AWS S3
    • Azure Cloud
    • HDFS for Hadoop

Demo

  • First, we should have elasticsearch up and running. To check the status, use the command –
    • sudo systemctl status elasticsearch

You should be seeing the following output –

elasticsearch

  • Next, we’ll make a directory where we’ll be storing all our snapshots.
    • mkdir elasticsearch-backup
  • We need to make sure that the service elasticsearch can write into this directory. To give write permissions to the directory, use the command –
    • sudo chown -R elasticsearch:elasticsearch elasticsearch-backup
  • We need to give the path of our directory to Elasticsearch, so we add it to the /etc/elasticsearch/elasticsearch.yml file, as sketched below.
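A minimal sketch of that change, assuming the elasticsearch-backup directory was created in your home directory (adjust the path to wherever you created it):

# Register the backup directory as an allowed snapshot location
echo 'path.repo: ["/home/ubuntu/elasticsearch-backup"]' | sudo tee -a /etc/elasticsearch/elasticsearch.yml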

addpath

  • Restart the service using the following command –
    • sudo systemctl restart elasticsearch.service
  • Now, we need to register the snapshot repository and take our first snapshot.
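The original post shows these steps as screenshots; below is a minimal sketch of the equivalent requests, using the repository and snapshot names that appear in the restore command later (elasticsearch-backup and first-snapshot) and the directory registered in path.repo:

# Register a filesystem snapshot repository named "elasticsearch-backup"
curl -X PUT "localhost:9200/_snapshot/elasticsearch-backup" \
  -H "Content-Type: application/json" \
  -d '{"type": "fs", "settings": {"location": "/home/ubuntu/elasticsearch-backup"}}'

# Take a snapshot of all indices and wait for it to complete
curl -X PUT "localhost:9200/_snapshot/elasticsearch-backup/first-snapshot?wait_for_completion=true"

# Verify the snapshot and the indices it contains
curl -X GET "localhost:9200/_snapshot/elasticsearch-backup/first-snapshot?pretty"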


Restore

Now that we have successfully taken a backup of our indices, let us make sure we can retrieve the data if it gets lost. So, let us first delete our data.
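The delete command itself is not shown in the original post; deleting an index generally looks like the following, where my_index is a placeholder for whichever index you snapshotted:

# Permanently delete an index (recoverable here only because we have a snapshot)
curl -X DELETE "localhost:9200/my_index"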

Now, if you’ll check, all the data must have been gone. So, let us try to restore our data using the snapshots we created.

  • curl -X POST "localhost:9200/_snapshot/elasticsearch-backup/first-snapshot/_restore?wait_for_completion=true"

The above command will successfully restore all the lost or deleted data.

That’s it for now. I hope this article was useful to you. Please feel free to drop any comments, questions, or suggestions.

Original article source at: https://blog.knoldus.com/

#backup #elasticsearch #snapshot 

Oral Brekke

Configure ElasticSearch and Speed Up WordPress

How to Configure ElasticSearch and Speed Up WordPress. In this guide you are going to learn how to install Elasticsearch, configure it with your WordPress site, and optimize search queries using the ElasticPress WordPress plugin.

If your site handles a lot of search queries, you should consider using a dedicated search engine. Elasticsearch is a full-text search engine which indexes your data and searches it very quickly.

This setup is tested on Google Cloud and AWS, so you can use this guide to set up Elasticsearch on any VPS, other cloud, or dedicated server.

Prerequisites for AWS

  1. A running EC2 Instance. Learn how to create an AWS EC2 instance.
  2. An Elastic IP assigned to your EC2 Instance.
  3. Set up and configure Route 53 and point your domain to AWS.
  4. Successful SSH connection to your EC2 Instance.
  5. Make sure your machine has a minimum of 3 GB RAM.

SSH to your EC2 Instance and perform the steps listed below.

Configure Firewall

Elasticsearch runs on port 9200, so it is necessary to open this port for the setup to work.

Go to your Security group and create a rule to allow connections from your IP address on this port.

If you have configured UFW on your server, you need to add a rule for this too.

sudo ufw allow from IP_ADDRESS to any port 9200

Make sure to replace IP_ADDRESS with the IP address you are connecting from.

Install Java

Java is necessary to install ElasticSearch. Install Java JDK using the following command.

sudo apt install openjdk-8-jdk

Configure Java Environment Variable

Use the update-alternatives command to get the installation path of your Java version.

sudo update-alternatives --config java

OpenJDK 8 is located at /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java

Copy the installation path of your default version and add it in the JAVA_HOME environment variable.

sudo nano /etc/environment

At the end of this file, add the following line with your installation path. Note that JAVA_HOME should point to the installation directory rather than the java binary, so for the OpenJDK 8 package installed above it looks like this.

JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"

Hit Ctrl+X followed by Y and Enter to save and exit the nano editor.

Now JAVA_HOME environment variable is set and available for all users.

Reload to apply changes.

source /etc/environment

To verify the environment variable of Java

echo $JAVA_HOME

You will get the installation path you just set.

Now Java is successfully installed and you can install Elasticsearch.

Install ElasticSearch

Import ElasticSearch repository’s GPG key.

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Add the repository to the sources list of your Ubuntu server or system.

echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

Update the package list and install ElasticSearch.

sudo apt update
sudo apt install elasticsearch

Once Elasticsearch is installed you can restrict port 9200 from outside access by editing the elasticsearch.yml file: uncomment network.host and replace the value with localhost.

sudo nano /etc/elasticsearch/elasticsearch.yml 

So it looks like this:

network.host: localhost

Hit Ctrl+X followed by Y and Enter to save the file and exit.

Now start and enable Elasticsearch on server boot.

sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch

Now make sure your Elasticsearch service is running.

sudo systemctl status elasticsearch

Test your installation by sending an HTTP request.

curl -X GET "localhost:9200"

You will get a response with name, cluster_name, cluster_uuid, version.

Configure WordPress

Log in to your WordPress admin and go to Plugins >> Add New, search for ElasticPress, then install and activate it.

Once activated, go to ElasticPress >> Settings, add the Elasticsearch URL (http://localhost:9200), and save the settings.

Once the settings are saved, click the sync icon at the top right near the gear icon to sync the content. Next you can view the index health of your setup.
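If WP-CLI is available on the server, the same sync can also be started from the command line; the exact subcommand depends on your ElasticPress version (older releases use index, newer ones use sync), for example:

# Build the ElasticPress index and sync all WordPress content into Elasticsearch
wp elasticpress sync --setup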

Now you have configured Elasticsearch on your WordPress website to speed up search queries.


Conclusion

Now you have learned how to install and configure Elasticsearch for your WordPress website.

Thanks for your time. If you face any problem or any feedback, please leave a comment below.

Original article source at: https://www.cloudbooklet.com/

#wordpress #elasticsearch 

Bongani Ngema

Install Elasticsearch on Ubuntu 22.04 with SSL

How to Install Elasticsearch on Ubuntu 22.04 with SSL. Elasticsearch 8 is a powerful, scalable, real-time distributed search and data analytics engine. Here you will learn how to configure SSL for your Elasticsearch installation with an Nginx reverse proxy on Ubuntu 22.04.

You will create a subdomain for your Elasticsearch service and install free Let’s Encrypt SSL certificate using Certbot.

This setup is tested on Google Cloud Platform running Ubuntu 22.04 LTS, so this guide will also work on other cloud service providers like AWS and Azure, as well as on any VPS or dedicated server.

Prerequisites

  • A server with minimum 2GB RAM and 2vCPU
  • A user with sudo privileges.

Initial Server Setup

Start by updating the server software packages to the latest version available.

sudo apt update 
sudo apt upgrade

Configure Sub-Domain

Make sure you use a sub-domain to access your Elasticsearch installation.

Go to your DNS management section and create a new A record with the name you wish for your subdomain (for example, search) and the value of your server IP address.

So your sub-domain will look similar to the one below. If you wish to configure your main domain instead, you can do that too.

search.yourdomain.com

Step 1: Install ElasticSearch

Java is already included with the Elasticsearch package, so you don't need to install Java manually. Learn more about installing Java on Ubuntu 22.04.

Here we will install Elasticsearch 8.

Start by importing Elasticsearch repository’s GPG key.

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg

Add the repository to the sources list of your Ubuntu server or system.

echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

Update the package list and install ElasticSearch.

sudo apt update
sudo apt install elasticsearch

Once the installation is completed you will receive the superuser password; please note it down and keep it secure.

------------------- Security autoconfiguration information ----------------------

Authentication and authorization are enabled.
TLS for the transport and HTTP layers is enabled and configured.

The generated password for the elastic built-in superuser is : houbJ1uivo5b=aVYYPa5

If this node should join an existing cluster, you can reconfigure this with
'/usr/share/elasticsearch/bin/elasticsearch-reconfigure-node --enrollment-token <token-here>'
after creating an enrollment token on your existing cluster.

You can complete the following actions at any time:

Reset the password of the elastic built-in superuser with 
'/usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic'.

Generate an enrollment token for Kibana instances with 
 '/usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s kibana'.

Generate an enrollment token for Elasticsearch nodes with 
'/usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s node'.

---------------------------------------------------------------------------------

The Elasticsearch service is not started automatically upon installation; you need to execute the commands below to configure the Elasticsearch service to start automatically using systemd.

sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service

Once Elasticsearch is installed you can restrict port 9200 from outside access by editing the elasticsearch.yml file: uncomment network.host and replace the value with your internal IP, another IP, or localhost.

sudo nano /etc/elasticsearch/elasticsearch.yml 

So it looks like this:

network.host: INTERNAL_IP

You can also use localhost as host or any IP address you wish.

Hit Ctrl+X followed by Y and Enter to save the file and exit.

Now start and enable Elasticsearch on server boot.

sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch

Now make sure your Elasticsearch service is running.

sudo systemctl status elasticsearch

Step 2: Verify if Elasticsearch works

Test your installation by sending an HTTPS request, attaching the CA certificate, using the command below.

Take note of the password you received earlier; you will need it when prompted.

sudo su
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic https://INTERNAL_IP:9200

Enter the password when prompted.

You will receive a response as shown below.

{
  "name" : "elasticsearch-vm",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "vGrj3z4rQEWRBUdd9IhZWA",
  "version" : {
    "number" : "8.2.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "9876968ef3c745186b94fdabd4483e01499224ef",
    "build_date" : "2022-05-25T15:47:06.259735307Z",
    "build_snapshot" : false,
    "lucene_version" : "9.1.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

Step 3: Install and Configure Nginx for Elasticsearch

Now it’s time to install and configure Nginx. Execute the below command to install Nginx.

sudo apt install nginx

Now you can configure an Nginx reverse proxy for your Elasticsearch.

Remove default configurations

sudo rm /etc/nginx/sites-available/default
sudo rm /etc/nginx/sites-enabled/default

Create a new Nginx configuration file.

sudo nano /etc/nginx/sites-available/search.conf

Paste the following.

Note: You need to use exactly the same IP or localhost that you used as the host in the Elasticsearch configuration.

server {
     listen [::]:80;
     listen 80;

     server_name search.yourdomain.com;

location / {
     proxy_pass http://INTERNAL_IP:9200;
     proxy_redirect off;
     proxy_read_timeout    90;
     proxy_connect_timeout 90;
     proxy_set_header  X-Real-IP  $remote_addr;
     proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
     proxy_set_header  Host $http_host;
}
}

Save and exit the file.

Enable your configuration by creating a symbolic link.

sudo ln -s /etc/nginx/sites-available/search.conf /etc/nginx/sites-enabled/search.conf
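Before requesting the certificate, it is worth validating the configuration and reloading Nginx so the new server block is live:

# Check the Nginx configuration and reload it if the syntax test passes
sudo nginx -t && sudo systemctl reload nginx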

Step 4: Install Let’s Encrypt SSL

HTTPS is a protocol for secure communication between a server (instance) and a client (web browser). With the introduction of Let's Encrypt, which provides free SSL certificates, HTTPS has been widely adopted and also builds trust with your audience.

sudo apt install python3-certbot-nginx

Now that we have installed Certbot by Let's Encrypt for Ubuntu 22.04, run this command to receive your certificates.

sudo certbot --nginx --agree-tos --no-eff-email --redirect -m youremail@email.com -d search.domainname.com

This command will install the free SSL certificate, configure redirection to HTTPS, and restart the Nginx server.

Step 5: Renewing SSL Certificate

Certificates provided by Let's Encrypt are valid for 90 days only, so you need to renew them regularly. Let's test the renewal feature using the following command.

sudo certbot renew --dry-run

This command performs a dry run of the renewal process to confirm that automatic renewal will work.


Conclusion

Now you have learned how to install Elasticsearch 8 and secure it with a free Let's Encrypt SSL certificate on Ubuntu 22.04.

Thanks for your time. If you face any problem or any feedback, please leave a comment below.

Original article source at: https://www.cloudbooklet.com/

#elasticsearch #ubuntu #ssl 

Sheldon Grant

How to integrate Django REST Framework with Elasticsearch

In this tutorial, we'll look at how to integrate Django REST Framework (DRF) with Elasticsearch. We'll use Django to model our data and DRF to serialize and serve it. Finally, we'll index the data with Elasticsearch and make it searchable.

What is Elasticsearch?

Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. It's known for its simple RESTful APIs, distributed nature, speed, and scalability. Elasticsearch is the central component of the Elastic Stack (also known as the ELK Stack), a set of free and open tools for data ingestion, enrichment, storage, analysis, and visualization.

Its use cases include:

  1. Site search and application search
  2. Monitoring and visualizing your system metrics
  3. Security and business analytics
  4. Logging and log analysis

To learn more about Elasticsearch check out What is Elasticsearch? from the official documentation.

Elasticsearch Structure and Concepts

Before working with Elasticsearch, we should get familiar with the basic Elasticsearch concepts. These are listed from biggest to smallest:

  1. Cluster is a collection of one or more nodes.
  2. Node is a single server instance that runs Elasticsearch. While communicating with the cluster, it:
    1. Stores and indexes your data
    2. Provides search
  3. Index is used to store the documents in dedicated data structures corresponding to the data type of fields (akin to a SQL database). Each index has one or more shards and replicas.
  4. Type is a collection of documents, which have something in common (akin to a SQL table).
  5. Shard is an Apache Lucene index. It's used to split indices and keep large amounts of data manageable.
  6. Replica is a fail-safe mechanism and basically a copy of your index's shard.
  7. Document is a basic unit of information that can be indexed (akin to a SQL row). It's expressed in JSON, which is a ubiquitous internet data interchange format.
  8. Field is the smallest individual unit of data in Elasticsearch (akin to a SQL column).

The Elasticsearch cluster has the following structure:

Elasticsearch cluster structure

Curious how relational database concepts relate to Elasticsearch concepts?

Relational Database | Elasticsearch
Cluster | Cluster
RDBMS Instance | Node
Table | Index
Row | Document
Column | Field

Review Mapping concepts across SQL and Elasticsearch for more on how concepts in SQL and Elasticsearch relate to one another.

Elasticsearch vs PostgreSQL Full-text Search

With regards to full-text search, Elasticsearch and PostgreSQL both have their advantages and disadvantages. When choosing between them you should consider speed, query complexity, and budget.

PostgreSQL advantages:

  1. Django support
  2. Faster and easier to set up
  3. Doesn't require maintenance

Elasticsearch advantages:

  1. Optimized just for searching
  2. Elasticsearch is faster (especially as the number of records increases)
  3. Supports different query types (Leaf, Compound, Fuzzy, Regexp, to name a few)

If you're working on a simple project where speed isn't important you should opt for PostgreSQL. If performance is important and you want to write complex lookups opt for Elasticsearch.

For more on full-text search with Django and Postgres, check out the Basic and Full-text Search with Django and Postgres article.

Project Setup

We'll be building a simple blog application. Our project will consist of multiple models, which will be serialized and served via Django REST Framework. After integrating Elasticsearch, we'll create an endpoint that will allow us to look up different authors, categories, and articles.

To keep our code clean and modular, we'll split our project into the following two apps:

  1. blog - for our Django models, serializers, and ViewSets
  2. search - for Elasticsearch documents, indexes, and queries

Start by creating a new directory and setting up a new Django project:

$ mkdir django-drf-elasticsearch && cd django-drf-elasticsearch
$ python3.9 -m venv env
$ source env/bin/activate

(env)$ pip install django==3.2.6
(env)$ django-admin.py startproject core .

After that, create a new app called blog:

(env)$ python manage.py startapp blog

Register the app in core/settings.py under INSTALLED_APPS:

# core/settings.py

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'blog.apps.BlogConfig', # new
]

Database Models

Next, create Category and Article models in blog/models.py:

# blog/models.py

from django.contrib.auth.models import User
from django.db import models


class Category(models.Model):
    name = models.CharField(max_length=32)
    description = models.TextField(null=True, blank=True)

    class Meta:
        verbose_name_plural = 'categories'

    def __str__(self):
        return f'{self.name}'


ARTICLE_TYPES = [
    ('UN', 'Unspecified'),
    ('TU', 'Tutorial'),
    ('RS', 'Research'),
    ('RW', 'Review'),
]


class Article(models.Model):
    title = models.CharField(max_length=256)
    author = models.ForeignKey(to=User, on_delete=models.CASCADE)
    type = models.CharField(max_length=2, choices=ARTICLE_TYPES, default='UN')
    categories = models.ManyToManyField(to=Category, blank=True, related_name='categories')
    content = models.TextField()
    created_datetime = models.DateTimeField(auto_now_add=True)
    updated_datetime = models.DateTimeField(auto_now=True)

    def __str__(self):
        return f'{self.author}: {self.title} ({self.created_datetime.date()})'

Notes:

  1. Category represents an article category -- i.e, programming, Linux, testing.
  2. Article represents an individual article. Each article can have multiple categories. Articles have a specific type -- Tutorial, Research, Review, or Unspecified.
  3. Authors are represented by the default Django user model.

Run Migrations

Make migrations and then apply them:

(env)$ python manage.py makemigrations
(env)$ python manage.py migrate

Register the models in blog/admin.py:

# blog/admin.py

from django.contrib import admin

from blog.models import Category, Article


admin.site.register(Category)
admin.site.register(Article)

Populate the Database

Before moving to the next step, we need some data to work with. I've created a simple command we can use to populate the database.

Create a new folder in "blog" called "management", and then inside that folder create another folder called "commands". Inside of the "commands" folder, create a new file called populate_db.py.

management
└── commands
    └── populate_db.py

Copy the file contents from populate_db.py and paste it inside your populate_db.py.

Run the following command to populate the DB:

(env)$ python manage.py populate_db

If everything went well you should see a Successfully populated the database. message in the console and there should be a few articles in your database.

Django REST Framework

Now let's install djangorestframework using pip:

(env)$ pip install djangorestframework==3.12.4

Register it in our settings.py like so:

# core/settings.py

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'blog.apps.BlogConfig',
    'rest_framework', # new
]

Add the following settings:

# core/settings.py

REST_FRAMEWORK = {
    'DEFAULT_PAGINATION_CLASS': 'rest_framework.pagination.LimitOffsetPagination',
    'PAGE_SIZE': 25
}

We'll need these settings to implement pagination.

Create Serializers

To serialize our Django models, we need to create a serializer for each of them. The easiest way to create serializers that depend on Django models is by using the ModelSerializer class.

blog/serializers.py:

# blog/serializers.py

from django.contrib.auth.models import User
from rest_framework import serializers

from blog.models import Article, Category


class UserSerializer(serializers.ModelSerializer):
    class Meta:
        model = User
        fields = ('id', 'username', 'first_name', 'last_name')


class CategorySerializer(serializers.ModelSerializer):
    class Meta:
        model = Category
        fields = '__all__'


class ArticleSerializer(serializers.ModelSerializer):
    author = UserSerializer()
    categories = CategorySerializer(many=True)

    class Meta:
        model = Article
        fields = '__all__'

Notes:

  1. UserSerializer and CategorySerializer are fairly simple: We just provided the fields we want serialized.
  2. In the ArticleSerializer, we needed to take care of the relationships to make sure they also get serialized. This is why we provided UserSerializer and CategorySerializer.

Want to learn more about DRF serializers? Check out Effectively Using Django REST Framework Serializers.

Create ViewSets

Let's create a ViewSet for each of our models in blog/views.py:

# blog/views.py

from django.contrib.auth.models import User
from rest_framework import viewsets

from blog.models import Category, Article
from blog.serializers import CategorySerializer, ArticleSerializer, UserSerializer


class UserViewSet(viewsets.ModelViewSet):
    serializer_class = UserSerializer
    queryset = User.objects.all()


class CategoryViewSet(viewsets.ModelViewSet):
    serializer_class = CategorySerializer
    queryset = Category.objects.all()


class ArticleViewSet(viewsets.ModelViewSet):
    serializer_class = ArticleSerializer
    queryset = Article.objects.all()

In this block of code, we created the ViewSets by providing the serializer_class and queryset for each ViewSet.

Define URLs

Create the app-level URLs for the ViewSets:

# blog/urls.py

from django.urls import path, include
from rest_framework import routers

from blog.views import UserViewSet, CategoryViewSet, ArticleViewSet

router = routers.DefaultRouter()
router.register(r'user', UserViewSet)
router.register(r'category', CategoryViewSet)
router.register(r'article', ArticleViewSet)

urlpatterns = [
    path('', include(router.urls)),
]

Then, wire up the app URLs to the project URLs:

# core/urls.py

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('blog/', include('blog.urls')),
    path('admin/', admin.site.urls),
]

Our app now has the following URLs:

  1. /blog/user/ lists all users
  2. /blog/user/<USER_ID>/ fetches a specific user
  3. /blog/category/ lists all categories
  4. /blog/category/<CATEGORY_ID>/ fetches a specific category
  5. /blog/article/ lists all articles
  6. /blog/article/<ARTICLE_ID>/ fetches a specific article

Testing

Now that we've registered the URLs, we can test the endpoints to see if everything works correctly.

Run the development server:

(env)$ python manage.py runserver

Then, in your browser of choice, navigate to http://127.0.0.1:8000/blog/article/. The response should look something like this:

{
    "count": 4,
    "next": null,
    "previous": null,
    "results": [
        {
            "id": 1,
            "author": {
                "id": 3,
                "username": "jess_",
                "first_name": "Jess",
                "last_name": "Brown"
            },
            "categories": [
                {
                    "id": 2,
                    "name": "SEO optimization",
                    "description": null
                }
            ],
            "title": "How to improve your Google rating?",
            "type": "TU",
            "content": "Firstly, add the correct SEO tags...",
            "created_datetime": "2021-08-12T17:34:31.271610Z",
            "updated_datetime": "2021-08-12T17:34:31.322165Z"
        },
        {
            "id": 2,
            "author": {
                "id": 4,
                "username": "johnny",
                "first_name": "Johnny",
                "last_name": "Davis"
            },
            "categories": [
                {
                    "id": 4,
                    "name": "Programming",
                    "description": null
                }
            ],
            "title": "Installing latest version of Ubuntu",
            "type": "TU",
            "content": "In this tutorial, we'll take a look at how to setup the latest version of Ubuntu. Ubuntu (/ʊˈbʊntuː/ is a Linux distribution based on Debian and composed mostly of free and open-source software. Ubuntu is officially released in three editions: Desktop, Server, and Core for Internet of things devices and robots.",
            "created_datetime": "2021-08-12T17:34:31.540628Z",
            "updated_datetime": "2021-08-12T17:34:31.592555Z"
        },
        ...
    ]
}

Manually test the other endpoints as well.

Elasticsearch Setup

Start by installing and running Elasticsearch in the background.

Need help getting Elasticsearch up and running? Check out the Installing Elasticsearch guide. If you're familiar with Docker, you can simply run the following command to pull the official image and spin up a container with Elasticsearch running:

$ docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.14.0
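
Before continuing, you can verify that Elasticsearch is reachable by querying its root endpoint (assuming the default port 9200); it should return a small JSON document with the cluster name and version:

$ curl http://localhost:9200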

To integrate Elasticsearch with Django, we need to install the following packages:

  1. elasticsearch - official low-level Python client for Elasticsearch
  2. elasticsearch-dsl-py - high-level library for writing and running queries against Elasticsearch
  3. django-elasticsearch-dsl - wrapper around elasticsearch-dsl-py that allows indexing Django models in Elasticsearch

Install:

(env)$ pip install elasticsearch==7.14.0
(env)$ pip install elasticsearch-dsl==7.4.0
(env)$ pip install django-elasticsearch-dsl==7.2.0

Start a new app called search, which will hold our Elasticsearch documents, indexes, and queries:

(env)$ python manage.py startapp search

Register the search and django_elasticsearch_dsl in core/settings.py under INSTALLED_APPS:

# core/settings.py

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'django_elasticsearch_dsl', # new
    'blog.apps.BlogConfig',
    'search.apps.SearchConfig', # new
    'rest_framework',
]

Now we need to let Django know where Elasticsearch is running. We do that by adding the following to our core/settings.py file:

# core/settings.py

# Elasticsearch
# https://django-elasticsearch-dsl.readthedocs.io/en/latest/settings.html

ELASTICSEARCH_DSL = {
    'default': {
        'hosts': 'localhost:9200'
    },
}

If your Elasticsearch is running on a different port, make sure to change the above settings accordingly.

We can test whether Django can connect to Elasticsearch by starting our server:

(env)$ python manage.py runserver

If your Django server fails, Elasticsearch is probably not working correctly.

Creating Documents

Before creating the documents, we need to make sure all the data is going to get saved in the proper format. We're using CharField(max_length=2) for our article type, which by itself doesn't make much sense. This is why we'll transform it to human-readable text.

We'll achieve this by adding a type_to_string() method inside our model like so:

# blog/models.py

class Article(models.Model):
    title = models.CharField(max_length=256)
    author = models.ForeignKey(to=User, on_delete=models.CASCADE)
    type = models.CharField(max_length=2, choices=ARTICLE_TYPES, default='UN')
    categories = models.ManyToManyField(to=Category, blank=True, related_name='categories')
    content = models.TextField()
    created_datetime = models.DateTimeField(auto_now_add=True)
    updated_datetime = models.DateTimeField(auto_now=True)

    # new
    def type_to_string(self):
        if self.type == 'UN':
            return 'Unspecified'
        elif self.type == 'TU':
            return 'Tutorial'
        elif self.type == 'RS':
            return 'Research'
        elif self.type == 'RW':
            return 'Review'

    def __str__(self):
        return f'{self.author}: {self.title} ({self.created_datetime.date()})'

Without type_to_string() our model would be serialized like this:

{
    "title": "This is my article.",
    "type": "TU",
    ...
}

After implementing type_to_string() our model is serialized like this:

{
    "title": "This is my article.",
    "type": "Tutorial",
    ...
}

Now let's create the documents. Each document needs to have an Index and Django class. In the Index class, we need to provide the index name and Elasticsearch index settings. In the Django class, we tell the document which Django model to associate it to and provide the fields we want to be indexed.

blog/documents.py:

# blog/documents.py

from django.contrib.auth.models import User
from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry

from blog.models import Category, Article


@registry.register_document
class UserDocument(Document):
    class Index:
        name = 'users'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = User
        fields = [
            'id',
            'first_name',
            'last_name',
            'username',
        ]


@registry.register_document
class CategoryDocument(Document):
    id = fields.IntegerField()

    class Index:
        name = 'categories'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = Category
        fields = [
            'name',
            'description',
        ]


@registry.register_document
class ArticleDocument(Document):
    author = fields.ObjectField(properties={
        'id': fields.IntegerField(),
        'first_name': fields.TextField(),
        'last_name': fields.TextField(),
        'username': fields.TextField(),
    })
    categories = fields.ObjectField(properties={
        'id': fields.IntegerField(),
        'name': fields.TextField(),
        'description': fields.TextField(),
    })
    type = fields.TextField(attr='type_to_string')

    class Index:
        name = 'articles'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
        }

    class Django:
        model = Article
        fields = [
            'title',
            'content',
            'created_datetime',
            'updated_datetime',
        ]

Notes:

  1. In order to transform the article type, we added the type attribute to the ArticleDocument.
  2. Because our Article model is in a many-to-many (M:N) relationship with Category and a many-to-one (N:1) relationship with User, we needed to take care of those relationships. We did that by adding ObjectField attributes.

Populate Elasticsearch

To create and populate the Elasticsearch index and mapping, use the search_index command:

(env)$ python manage.py search_index --rebuild

Deleting index 'users'
Deleting index 'categories'
Deleting index 'articles'
Creating index 'users'
Creating index 'categories'
Creating index 'articles'
Indexing 3 'User' objects
Indexing 4 'Article' objects
Indexing 4 'Category' objects

You need to run this command every time you change your index settings.

django-elasticsearch-dsl created the appropriate database signals so that your Elasticsearch storage gets updated every time an instance of a model is created, deleted, or edited.
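
For example, creating a new article from the Django shell should show up in the articles index without re-running search_index. Here's a quick sketch, assuming the sample data from populate_db exists (Elasticsearch refreshes its indices roughly once per second, so the new hit may take a moment to appear):

from django.contrib.auth.models import User
from blog.models import Article
from blog.documents import ArticleDocument

author = User.objects.first()
Article.objects.create(
    title='Signals keep Elasticsearch in sync',
    author=author,
    type='TU',
    content='This document is indexed automatically by django-elasticsearch-dsl.',
)

# The registered signal receivers have already pushed the new article to Elasticsearch:
print(ArticleDocument.search().query('match', title='signals').count())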

Elasticsearch Queries

Before creating the appropriate views, let's look at how Elasticsearch queries work.

We first have to obtain the Search instance. We do that by calling search() on our Document like so:

from blog.documents import ArticleDocument

search = ArticleDocument.search()

Feel free to run these queries within the Django shell (python manage.py shell).

Once we have the Search instance we can pass queries to the query() method and fetch the response:

from elasticsearch_dsl import Q
from blog.documents import ArticleDocument


# Looks up all the articles that contain `How to` in the title.
query = 'How to'
q = Q(
     'multi_match',
     query=query,
     fields=[
         'title'
     ])
search = ArticleDocument.search().query(q)
response = search.execute()

# print all the hits
for hit in search:
    print(hit.title)

We can also combine multiple Q statements like so:

from elasticsearch_dsl import Q
from blog.documents import ArticleDocument

"""
Looks up all the articles that:
1) Contain 'language' in the 'title'
2) Don't contain 'ruby' or 'javascript' in the 'title'
3) And contain the query either in the 'title' or 'description'
"""
query = 'programming'
q = Q(
     'bool',
     must=[
         Q('match', title='language'),
     ],
     must_not=[
         Q('match', title='ruby'),
         Q('match', title='javascript'),
     ],
     should=[
         Q('match', title=query),
         Q('match', description=query),
     ],
     minimum_should_match=1)
search = ArticleDocument.search().query(q)
response = search.execute()

# print all the hits
for hit in search:
    print(hit.title)

Another important feature when working with Elasticsearch queries is fuzziness. Fuzzy queries allow us to handle typos. They use the Levenshtein distance algorithm, which calculates the edit distance between the terms stored in the index and the query.

Let's look at an example.

By running the following query we won't get any results, because the user misspelled 'django'.

from elasticsearch_dsl import Q
from blog.documents import ArticleDocument

query = 'djengo'  # notice the typo
q = Q(
     'multi_match',
     query=query,
     fields=[
         'title'
     ])
search = ArticleDocument.search().query(q)
response = search.execute()

# print all the hits
for hit in search:
    print(hit.title)

If we enable fuzziness like so:

from elasticsearch_dsl import Q
from blog.documents import ArticleDocument

query = 'djengo'  # notice the typo
q = Q(
     'multi_match',
     query=query,
     fields=[
         'title'
     ],
     fuzziness='auto')
search = ArticleDocument.search().query(q)
response = search.execute()

# print all the hits
for hit in search:
    print(hit.title)

The user will get the correct result.

The difference between a full-text search and exact match is that full-text search runs an analyzer on the text before it gets indexed to Elasticsearch. The text gets broken down into different tokens, which are transformed to their root form (e.g., reading -> read). These tokens then get saved into the Inverted Index. Because of that, full-text search yields more results, but takes longer to process.
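
To see the difference in practice, you can compare an analyzed match query with a non-analyzed term query against the same field. This is a minimal sketch using the ArticleDocument defined earlier; the exact hit counts depend on your sample data:

from elasticsearch_dsl import Q
from blog.documents import ArticleDocument

# Full-text search: the query string is analyzed, so case and word order don't matter.
match_response = ArticleDocument.search().query(Q('match', title='ubuntu installing')).execute()

# Exact match: the term query is not analyzed, so it only matches a stored token exactly.
term_response = ArticleDocument.search().query(Q('term', title='Installing latest version of Ubuntu')).execute()

print(len(match_response.hits))  # finds the Ubuntu article
print(len(term_response.hits))   # 0, because the title was tokenized and lowercased at index time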

Elasticsearch has a number of additional features. To get familiar with the API, try implementing:

  1. Your own analyzer.
  2. Completion suggester - when a user queries 'j' your app should suggest 'johnny' or 'jess_'.
  3. Highlighting - highlight the matched terms in your search results (a minimal sketch follows below).

You can see all the Elasticsearch Search APIs here.
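
As a starting point for the highlighting item above, here's a minimal sketch using elasticsearch-dsl's highlight() helper; by default, the matched fragments are wrapped in <em> tags:

from elasticsearch_dsl import Q
from blog.documents import ArticleDocument

search = (
    ArticleDocument.search()
    .query(Q('match', title='ubuntu'))
    .highlight('title')
)
response = search.execute()

for hit in response:
    # hit.meta.highlight holds the highlighted fragments for each requested field
    print(hit.meta.highlight.title)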

Search Views

With that, let's create some views. To make our code more DRY we can use the following abstract class in search/views.py:

# search/views.py

import abc

from django.http import HttpResponse
from elasticsearch_dsl import Q
from rest_framework.pagination import LimitOffsetPagination
from rest_framework.views import APIView


class PaginatedElasticSearchAPIView(APIView, LimitOffsetPagination):
    serializer_class = None
    document_class = None

    @abc.abstractmethod
    def generate_q_expression(self, query):
        """This method should be overridden
        and return a Q() expression."""

    def get(self, request, query):
        try:
            q = self.generate_q_expression(query)
            search = self.document_class.search().query(q)
            response = search.execute()

            print(f'Found {response.hits.total.value} hit(s) for query: "{query}"')

            results = self.paginate_queryset(response, request, view=self)
            serializer = self.serializer_class(results, many=True)
            return self.get_paginated_response(serializer.data)
        except Exception as e:
            return HttpResponse(e, status=500)

Notes:

  1. To use the class, we have to provide our serializer_class and document_class and override generate_q_expression().
  2. The class does nothing else than run the generate_q_expression() query, fetch the response, paginate it, and return serialized data.

All the views should now inherit from PaginatedElasticSearchAPIView:

# search/views.py

import abc

from django.http import HttpResponse
from elasticsearch_dsl import Q
from rest_framework.pagination import LimitOffsetPagination
from rest_framework.views import APIView

from blog.documents import ArticleDocument, UserDocument, CategoryDocument
from blog.serializers import ArticleSerializer, UserSerializer, CategorySerializer


class PaginatedElasticSearchAPIView(APIView, LimitOffsetPagination):
    serializer_class = None
    document_class = None

    @abc.abstractmethod
    def generate_q_expression(self, query):
        """This method should be overridden
        and return a Q() expression."""

    def get(self, request, query):
        try:
            q = self.generate_q_expression(query)
            search = self.document_class.search().query(q)
            response = search.execute()

            print(f'Found {response.hits.total.value} hit(s) for query: "{query}"')

            results = self.paginate_queryset(response, request, view=self)
            serializer = self.serializer_class(results, many=True)
            return self.get_paginated_response(serializer.data)
        except Exception as e:
            return HttpResponse(e, status=500)


# views


class SearchUsers(PaginatedElasticSearchAPIView):
    serializer_class = UserSerializer
    document_class = UserDocument

    def generate_q_expression(self, query):
        return Q('bool',
                 should=[
                     Q('match', username=query),
                     Q('match', first_name=query),
                     Q('match', last_name=query),
                 ], minimum_should_match=1)


class SearchCategories(PaginatedElasticSearchAPIView):
    serializer_class = CategorySerializer
    document_class = CategoryDocument

    def generate_q_expression(self, query):
        return Q(
                'multi_match', query=query,
                fields=[
                    'name',
                    'description',
                ], fuzziness='auto')


class SearchArticles(PaginatedElasticSearchAPIView):
    serializer_class = ArticleSerializer
    document_class = ArticleDocument

    def generate_q_expression(self, query):
        return Q(
                'multi_match', query=query,
                fields=[
                    'title',
                    'author',
                    'type',
                    'content'
                ], fuzziness='auto')

Define URLs

Lastly, let's create the URLs for our views:

# search/urls.py

from django.urls import path

from search.views import SearchArticles, SearchCategories, SearchUsers

urlpatterns = [
    path('user/<str:query>/', SearchUsers.as_view()),
    path('category/<str:query>/', SearchCategories.as_view()),
    path('article/<str:query>/', SearchArticles.as_view()),
]

Then, wire up the app URLs to the project URLs:

# core/urls.py

from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('blog/', include('blog.urls')),
    path('search/', include('search.urls')), # new
    path('admin/', admin.site.urls),
]

Testing

Our web application is done. We can test our search endpoints by visiting the following URLs:

URL                                                   Description
http://127.0.0.1:8000/search/user/mike/               Returns user 'mike13'
http://127.0.0.1:8000/search/user/jess_/              Returns user 'jess_'
http://127.0.0.1:8000/search/category/seo/            Returns category 'SEO optimization'
http://127.0.0.1:8000/search/category/progreming/     Returns category 'Programming'
http://127.0.0.1:8000/search/article/linux/           Returns article 'Installing the latest version of Ubuntu'
http://127.0.0.1:8000/search/article/java/            Returns article 'Which programming language is the best?'

Notice the typo with the fourth request. We spelled 'progreming', but still got the correct result thanks to fuzziness.

Alternative Libraries

The path we took isn't the only way to integrate Django with Elasticsearch. There are a few other libraries you might want to check out:

  1. django-elasticsearch-dsl-drf is a wrapper around Elasticsearch and Django REST Framework. It provides views, serializers, filter backends, pagination, and more. It works well, but it might be overkill for smaller projects. I'd recommend using it if you need advanced Elasticsearch features.
  2. Haystack is a wrapper for a number of search backends, like Elasticsearch, Solr, and Whoosh. It allows you to write your search code once and reuse it with different search backends. It works great for implementing a simple search box. Because Haystack is another abstraction layer, there's more overhead involved, so you shouldn't use it if performance is really important or if you're working with large amounts of data. It also requires some configuration.
  3. Haystack for Django REST Framework is a small library which tries to simplify the integration of Haystack with Django REST Framework. At the time of writing, the project is a bit outdated and its documentation is poorly written. I've spent a decent amount of time trying to get it to work with no luck.

Conclusion

In this tutorial, you learned the basics of working with Django REST Framework and Elasticsearch. You now know how to integrate them, create Elasticsearch documents and queries, and serve the data via a RESTful API.

Before launching your project in production, consider using one of the managed Elasticsearch services like Elastic Cloud, Amazon Elasticsearch Service, or Elastic on Azure. The cost of using a managed service will be higher than managing your own cluster, but they provide all of the infrastructure required for deploying, securing, and running Elasticsearch clusters. Plus, they'll handle version updates, regular backups, and scaling.

Grab the code from django-drf-elasticsearch repo on GitHub.

Original article source at: https://testdriven.io/

#django #rest #framework #elasticsearch 

How to integrate Django REST Framework with Elasticsearch

Implement CORS in Spring Boot Application

This CORS interview questions and answers article covers the CORS policy and how to implement it in a Spring Boot application. It also explains why CORS is important and how it works.

What is CORS
CORS stands for Cross-Origin Resource Sharing.

It is a controlled relaxation of the same-origin policy: it lets a server declare which other origins are allowed to access its resources.

What is same-origin policy (SOP)

The same-origin policy (SOP) is a web security mechanism built into web browsers that influences how websites can access one another. The concept was introduced with Netscape Navigator 2.02 in 1995.

Without SOP, a malicious website or web application could access another without restrictions. That would allow attackers to easily steal sensitive information from other websites or even perform actions on other sites without user consent.

SOP does not need to be turned on – it is automatically enabled in every browser that supports it.

The SOP mechanism was designed to protect against attacks such as cross-site request forgery (CSRF), which attempt to take advantage of the trust between different origins.

What is Origin

Two URLs have the same origin if the protocol, port (if specified), and host are the same for both.

For example: http://x.y.com/z/page.html
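
Compared against that URL, a few illustrative cases (the path is not part of the origin):

  • http://x.y.com/other/page.html - same origin (same protocol, host, and port)
  • https://x.y.com/z/page.html - different origin (different protocol)
  • http://x.y.com:8080/z/page.html - different origin (different port)
  • http://a.y.com/z/page.html - different origin (different host)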

Why do backend developers need CORS if it's enforced at the browser level?

Although we work on the backend application, there is usually a frontend application too.

The backend and frontend URLs typically look like this:
http://localhost:8080/rest/api
http://localhost:4200/xyz


The difference in origin is clear. So whenever the frontend tries to call the REST API, the request fails because of the same-origin policy, with an error such as: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:4200' is not allowed access.

Solution is CORS.

Use the @CrossOrigin annotation on your controller, or send the Access-Control-Allow-Origin: http://localhost:4200 header in the response, to allow it.


#cors #springboot #elasticsearch  #interviewquestions 

Implement CORS in Spring Boot Application

Elastica: A PHP Client for Elasticsearch

Elastica: elasticsearch PHP Client

All documentation for Elastica can be found under Elastica.io. If you have questions, don't hesitate to ask them on Stack Overflow and add the Tag "Elastica" or in our Gitter channel. All library issues should go to the issue tracker from GitHub.

Compatibility

This release is compatible with all Elasticsearch 7.0 releases and onwards.

The test suite is run against the most recent minor version of Elasticsearch, currently 7.14.1.

Contributing

Contributions are always welcome. For details on how to contribute, check the CONTRIBUTING file.

Versions & Dependencies

This project tries to follow Elasticsearch in terms of End of Life and maintenance since 5.x. It is generally recommended to use the latest point release of the relevant branch.

Elastica branch    Elasticsearch    elasticsearch-php    PHP
7.x                7.x              ^7.0                 ^7.2 || ^8.0
6.x                6.x              ^6.0                 ^7.0 || ^8.0

Unmaintained versions:

Elastica version    Elasticsearch    elasticsearch-php    PHP
5.x                 5.x              ^5.0                 >=5.6
3.x                 2.4.0            no                   >=5.4
2.x                 1.7.2            no                   >=5.3.3

Download Details:

Author: ruflin
Source Code: https://github.com/ruflin/Elastica 
License: MIT license

#php #elasticsearch #hacktoberfest 

Elastica: A PHP Client for Elasticsearch
Sean Robertson

Sean Robertson

1666234687

Learn Elasticsearch from Scratch

Getting Started with Elasticsearch course will help you learn the basics of Elasticsearch from scratch. 

From real-time search and event management to sophisticated analytics and logging at scale, Elasticsearch has a great number of uses. The Getting Started with Elasticsearch course will help you learn the basics of Elasticsearch. If you already have knowledge of relational databases and are eager to learn Elasticsearch, then this course is for you. You will end your journey as an Elasticsearch Padawan.

You will begin learning Elasticsearch with a gentle introduction where you can setup your environment and launch your node of Elasticsearch for the first time. After that, we will dive into Create/Read/Update/Delete operations where you will grasp basics of Elasticsearch. All lectures are up to date with Elasticsearch 2.0.

What you’ll learn:

  •        Install Elasticsearch and start a new node
  •        Install Head and Marvel plugins
  •        Do create, read, update and delete operations on Elasticsearch
  •        Become an Elasticsearch padawan

Are there any course requirements or prerequisites?

  •        Beginner level knowledge in relational databases is needed

Who this course is for:

  •        The Getting Started with Elasticsearch course is for everyone motivated to learn the basics of Elasticsearch. The only skill you will need is a basic understanding of relational databases.
  •        No computer science degree or programming knowledge is needed.

#elasticsearch #programming 

Learn Elasticsearch from Scratch

Elasticsearch-php: Official PHP Client for Elasticsearch

Elasticsearch PHP client 

This is the official PHP client for Elasticsearch.


Getting started 🐣

Using this client assumes that you have an Elasticsearch server installed and running.

You can install the client in your PHP project using composer:

composer require elasticsearch/elasticsearch

After the installation you can connect to Elasticsearch using the ClientBuilder class. For instance, if your Elasticsearch is running on localhost:9200 you can use the following code:


use Elastic\Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()
    ->setHosts(['localhost:9200'])
    ->build();

// Info API
$response = $client->info();

echo $response['version']['number']; // 8.0.0

The $response is an object of Elastic\Elasticsearch\Response\Elasticsearch class that implements ElasticsearchInterface, PSR-7 ResponseInterface and ArrayAccess.

This means the $response is a PSR-7 object:

echo $response->getStatusCode(); // 200
echo (string) $response->getBody(); // Response body in JSON

and also an "array", meaning you can access the response body as an associative array, as follows:

echo $response['version']['number']; // 8.0.0

var_dump($response->asArray());  // response body content as array

Moreover, you can access the response body as object, string or bool:

echo $response->version->number; // 8.0.0

var_dump($response->asObject()); // response body content as object
var_dump($response->asString()); // response body as string (JSON)
var_dump($response->asBool());   // true if HTTP response code between 200 and 300

Configuration

Elasticsearch 8.0 offers security by default, which means it uses TLS to protect the communication between client and server.

In order to configure elasticsearch-php to connect to Elasticsearch 8.0, we need the certificate authority (CA) file.

You can install Elasticsearch in different ways; for instance, using Docker you need to execute the following command:

docker pull docker.elastic.co/elasticsearch/elasticsearch:8.0.1

Once you have the docker image installed, you can execute Elasticsearch, for instance using a single-node cluster configuration, as follows:

docker network create elastic
docker run --name es01 --net elastic -p 9200:9200 -p 9300:9300 -it docker.elastic.co/elasticsearch/elasticsearch:8.0.1

This command creates an elastic Docker network and starts Elasticsearch on port 9200 (the default).

When you run the docker image, a password is generated for the elastic user and printed to the terminal (you might need to scroll back a bit in the terminal to view it). You have to copy it, since we will need it to connect to Elasticsearch.

Now that Elasticsearch is running, we can get the http_ca.crt certificate file. We need to copy it from the docker instance, using the following command:

docker cp es01:/usr/share/elasticsearch/config/certs/http_ca.crt .

Once we have the http_ca.crt certificate and the password copied during the start of Elasticsearch, we can use them to connect with elasticsearch-php as follows:

$client = ClientBuilder::create()
    ->setHosts(['https://localhost:9200'])
    ->setBasicAuthentication('elastic', 'password copied during Elasticsearch start')
    ->setCABundle('path/to/http_ca.crt')
    ->build();

For more information about the Docker configuration of Elasticsearch you can read the official documentation here.

Use Elastic Cloud

You can use Elastic Cloud as server with elasticsearch-php. Elastic Cloud is the PaaS solution offered by Elastic.

For connecting to Elastic Cloud you just need the Cloud ID and the API key.

You can get the Cloud ID from the My deployment page of your dashboard (see the red rectangle reported in the screenshot).

Cloud ID

You can generate an API key in the Management page under the section Security.

Security

When you click on the Create API key button you can choose a name and set the other options (for example, restrict privileges, expire after time, and so on).

Choose an API name

After this step you will get the API key in the API keys page.

API key

IMPORTANT: you need to copy and store the API key in a secure place, since you will not be able to view it again in Elastic Cloud.

Once you have collected the Cloud ID and the API key, you can use elasticsearch-php to connect to your Elastic Cloud instance, as follows:

$client = ClientBuilder::create()
    ->setElasticCloudId('insert here the Cloud ID')
    ->setApiKey('insert here the API key')
    ->build();

Usage

The elasticsearch-php client offers 400+ endpoints for interacting with Elasticsearch. A list of all these endpoints is available in the official documentation of Elasticsearch APIs.

Here we report the basic operations that you can perform with the client: index, search, and delete.

Index a document

You can store (index) a JSON document in Elasticsearch using the following code:

use Elastic\Elasticsearch\Exception\ClientResponseException;
use Elastic\Elasticsearch\Exception\ServerResponseException;

$params = [
    'index' => 'my_index',
    'body'  => [ 'testField' => 'abc']
];

try {
  $response = $client->index($params);
} catch (ClientResponseException $e) {
  // manage the 4xx error
} catch (ServerResponseException $e) {
  // manage the 5xx error
} catch (Exception $e) {
  // eg. network error like NoNodeAvailableException
}

print_r($response->asArray());  // response body content as array

Elasticsearch stores the {"testField":"abc"} JSON document in the my_index index. The ID of the document is created automatically by Elasticsearch and stored in the $response['_id'] field. If you want to specify an ID for the document, you need to store it in $params['id'].

You can manage errors using ClientResponseException and ServerResponseException. The PSR-7 response is available using $e->getResponse() and the HTTP status code is available using $e->getCode().

Search a document

Elasticsearch provides many different ways to search documents. The simplest search that you can perform is a match query, as follows:

$params = [
    'index' => 'my_index',
    'body'  => [
        'query' => [
            'match' => [
                'testField' => 'abc'
            ]
        ]
    ]
];
$response = $client->search($params);

printf("Total docs: %d\n", $response['hits']['total']['value']);
printf("Max score : %.4f\n", $response['hits']['max_score']);
printf("Took      : %d ms\n", $response['took']);

print_r($response['hits']['hits']); // documents

Using Elasticsearch you can perform many different kinds of search query; for more information, we suggest reading the official documentation here.

Delete a document

You can delete a document by specifying the index name and the ID of the document, as follows:

use Elastic\Elasticsearch\Exception\ClientResponseException;

try {
    $response = $client->delete([
        'index' => 'my_index',
        'id' => 'my_id'
    ]);
} catch (ClientResponseException $e) {
    if ($e->getCode() === 404) {
        // the document does not exist
    }
}
if ($response['result'] === 'deleted') {
    // the document has been deleted
}

For more information about the Elasticsearch REST API you can read the official documentation here.

Versioning

This client is versioned and released alongside Elasticsearch server.

To guarantee compatibility, use the most recent version of this library within the major version of the corresponding Elasticsearch release.

For example, for Elasticsearch 7.16, use 7.16 of this library or above, but not 8.0.

Backward Incompatible Changes :boom:

The 8.0.0 version of elasticsearch-php contains a new implementation compared with 7.x. It supports PSR-7 for HTTP messages and PSR-18 for HTTP client communications.

We tried to reduce the BC breaks as much as possible with 7.x but there are some (big) differences:

  • we changed the namespace, now everything is under Elastic\Elasticsearch
  • we used the elastic-transport-php library for HTTP communications;
  • we changed the Exception model, using the namespace Elastic\Elasticsearch\Exception. All the exceptions extend the ElasticsearchException interface, as in 7.x
  • we changed the response type of each endpoint using an Elasticsearch response class. This class wraps a PSR-7 response, allowing access to the response body as an array or object. This means you can access the API response as in 7.x, no BC break here! :angel:
  • we renamed ConnectionPool to NodePool. The "connection" naming was ambiguous, since the objects are nodes (hosts)

You can have a look at the BREAKING_CHANGES file for more information.

Mock the Elasticsearch client

If you need to mock the Elasticsearch client you just need to mock a PSR-18 HTTP Client.

For instance, you can use the php-http/mock-client as follows:

use Elastic\Elasticsearch\ClientBuilder;
use Elastic\Elasticsearch\Response\Elasticsearch;
use Http\Mock\Client;
use Nyholm\Psr7\Response;

$mock = new Client(); // This is the mock client

$client = ClientBuilder::create()
    ->setHttpClient($mock)
    ->build();

// This is a PSR-7 response
$response = new Response(
    200, 
    [Elasticsearch::HEADER_CHECK => Elasticsearch::PRODUCT_NAME],
    'This is the body!'
);
$mock->addResponse($response);

$result = $client->info(); // Just calling an Elasticsearch endpoint

echo $result->asString(); // This is the body!

We are using the ClientBuilder::setHttpClient() to set the mock client. You can specify the response that you want to have using the addResponse($response) function. As you can see the $response is a PSR-7 response object. In this example we used the Nyholm\Psr7\Response object from the nyholm/psr7 project. If you are using PHPUnit you can even mock the ResponseInterface as follows:

$response = $this->createMock('Psr\Http\Message\ResponseInterface');

Notice: we added a special header in the HTTP response. This is the product check header, and it is required to guarantee that elasticsearch-php is communicating with an Elasticsearch server 8.0+.

For more information you can read the Mock client section of PHP-HTTP documentation.

FAQ 🔮

Where do I report issues with the client?

If something is not working as expected, please open an issue.

Where else can I go to get help?

You can checkout the Elastic community discuss forums.

Contribute 🚀

We welcome contributors to the project. Before you begin, some useful info...

  • If you want to contribute to this project you need to subscribe to a Contributor Agreement.
  • Before opening a pull request, please create an issue to discuss the scope of your proposal.
  • If you want to send a PR for version 8.0 please use the 8.0 branch, for 8.1 use the 8.1 branch and so on.
  • Never send PR to master unless you want to contribute to the development version of the client (master represents the next major version).
  • Each PR should include a unit test using PHPUnit. If you are not familiar with PHPUnit you can have a look at the reference.

Thanks in advance for your contribution! :heart:

Download Details:

Author: Elastic
Source Code: https://github.com/elastic/elasticsearch-php 
License: MIT license

#php #elasticsearch #client 

Elasticsearch-php: Official PHP Client for Elasticsearch

7 Favorite Node.js ElasticSearch Query Builder Libraries

In today's post we will learn about 7 Favorite Node.js ElasticSearch Query Builder Libraries.

An Elasticsearch query body builder is a query DSL (domain-specific language) client that provides an API layer over raw Elasticsearch queries. It makes full-text search queries and complex data aggregations easier, more convenient, and cleaner in terms of syntax.

1 - Bodybuilder

An elasticsearch query body builder 

Install

npm install bodybuilder --save

Usage

var bodybuilder = require('bodybuilder')
var body = bodybuilder().query('match', 'message', 'this is a test')
body.build() // Build 2.x or greater DSL (default)
body.build('v1') // Build 1.x DSL

For each elasticsearch query body, create an instance of bodybuilder, apply the desired query/filter/aggregation clauses, and call build to retrieve the built query body.

2 - ES-alchemy

Simplification of Elasticsearch interactions

Install

npm i --save es-alchemy

Model and Index Definitions

Models

Model definitions contain the fields of a model and their types. They restrict how an index can be put together.

Example: address.json

{
  "fields": {
    "id": "uuid",
    "street": "string",
    "city": "string",
    "country": "string",
    "centre": "point",
    "area": "shape",
    "timezone": "string"
  }
}

Preferably, a models folder contains a JSON file for each model. An example can be found in the test folder.

The fields that can be used, and how they get mapped in OpenSearch, can be found here.

View on Github

3 - Elastic-builder

A Node.js implementation of the elasticsearch Query DSL

Install

npm install elastic-builder --save

Usage

const esb = require('elastic-builder'); // the builder

const requestBody = esb.requestBodySearch()
  .query(esb.matchQuery('message', 'this is a test'));

// OR

const requestBody = new esb.RequestBodySearch().query(
  new esb.MatchQuery('message', 'this is a test')
);

requestBody.toJSON(); // or print to console - esb.prettyPrint(requestBody)
{
  "query": {
    "match": {
      "message": "this is a test"
    }
  }
}

For each class, MyClass, a utility function myClass has been provided which constructs the object for us without the need for the new keyword.

View on Github

4 - Pelias Query

Geospatial queries used by the pelias api

Installation

$ npm install pelias-query

Variables

Variables are used as placeholders in order to pre-build queries before we know the final values which will be provided by the user.

note: Variables can only be Javascript primitive types: string or numeric or boolean, plus array. No objects allowed.

VariableStore API

var query = require('pelias-query');

// create a new variable store
var vs = new query.Vars();

// set a variable
vs.var('input:name', 'hackney city farm');

// or
vs.var('input:name').set('hackney city farm');

// get a variable
var a = vs.var('input:name');

// get the primitive value of a variable
var a = vs.var('input:name');
a.get(); // hackney city farm
a.toString(); // hackney city farm
a.valueOf(); // hackney city farm
a.toJSON(); // hackney city farm

// check if a variable has been set
vs.isset('input:name'); // true
vs.isset('foo'); // false

// bulk set many variables
vs.set({
  'boundary:rect:top': 1,
  'boundary:rect:right': 2,
  'boundary:rect:bottom': 2,
  'boundary:rect:left': 1
});

// export variables for debugging
var dict = vs.export();
console.log( dict );

View on Github

5 - ESQ (Elasticsearch Query)

Simple query builder for elasticsearch

Quick Example

Example

var ESQ = require('esq');
var esq = new ESQ();

esq.query('bool', ['must'], { match: { user: 'kimchy' } });
esq.query('bool', 'minimum_should_match', 1);

var query = esq.getQuery();

Generates

{
  "bool": {
    "must": [
      {
        "match": {
          "user": "kimchy"
        }
      }
    ],
    "minimum_should_match": 1
  }
}

In the browser

<script src="esq.js"></script>
<script>
  var esq = new ESQ();
  esq.query('bool', ['must'], { match: { user: 'kimchy' } });
  var query = esq.getQuery();
</script>

View on Github

6 - Elasticsearch-odm

Like Mongoose but for Elasticsearch. Define models, perform CRUD operations, and build advanced search queries.

Installation

If you currently have the npm elasticsearch package installed, you can remove it and access the underlying client from this library if you still need it.

$ npm install elasticsearch-odm

Quick Start

You'll find the API is intuitive if you've used Mongoose or Waterline.

Example (no schema):

var elasticsearch = require('elasticsearch-odm');
var Car = elasticsearch.model('Car');
var car = new Car({
  type: 'Ford', color: 'Black'
});
elasticsearch.connect('my-index').then(function(){
  // be sure to call connect before bootstrapping your app.
  car.save().then(function(document){
    console.log(document);
  });
});

Example (using a schema):

var elasticsearch = require('elasticsearch-odm');
var carSchema = new elasticsearch.Schema({
  type: String,
  color: {type: String, required: true}
});
var Car = elasticsearch.model('Car', carSchema);

View on Github

7 - Equery

Query builder for elasticsearch (Node.js / Javascript)

Installation

$ npm install equery

Usage

Building a query

var Query = require('equery');

var query = new Query();

query.toJSON

var result = query.toJSON();

query.sort

query
    .sort('followers:desc')
    .toJSON();

View on Github

Thank you for following this article. 

Related videos:

Introduction into the JavaScript Elasticsearch Client

#node #elasticsearch #query #builder 

7 Favorite Node.js ElasticSearch Query Builder Libraries