Gerev: Google-like Search Engine for Your Organization

Search focused on devs

Devs are the best early adopters: they pick up new technology first and help spread it to their non-technical peers. That's why gerev is focused on making a product devs adore and love ❤️

Made for devs 👨‍💻

For finding internal pages fast ⚡️

Troubleshoot Issues 🐛

For finding code snippets and code examples 🧑‍💻
Coming Soon...


  •  Slack
  •  Confluence
  •  Google Drive (Docs, .docx, .pptx)
  •  Confluence Cloud - by @bryan-pakulski 🙏
  •  Bookstack - by @flifloo 🙏
  •  RocketChat (In PR 🙏)
  •  Gitlab Issues (In PR 🙏)
  •  Mattermost (In PR 🙏)
  •  Zendesk (In PR 🙏)
  •  Notion (In Progress... 🙏)
  •  Microsoft Teams
  •  Sharepoint
  •  Jira

🙏 - by the community

Natural Language

Enables searching using natural language, such as "How to do X", "How to connect to Y", or "Do we support Z".

Getting Started

  1. Install Nvidia for docker
  2. Run docker

Nvidia for docker

Install nvidia container toolkit on the host machine.

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L | sudo apt-key add - \
   && curl -s -L$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

sudo apt-get install -y nvidia-docker2

sudo systemctl restart docker

Run docker

Then run the docker container like so:

Nvidia hardware

docker run --gpus all -p 80:80 -v ~/.gerev/storage:/opt/storage gerev/gerev

CPU only (no GPU)

docker run -p 80:80 -v ~/.gerev/storage:/opt/storage gerev/gerev

Run from source


  • gerev is also popular with some big names. 😉

Find any conversation, doc, or internal page in seconds ⏲️⚡️
Join 100+ devs by hosting your own gerev instance, become a hero within your org! 💪

Join Discord for early access code!

Join here!

Download Details:

Author: GerevAI
Source Code: 
License: AGPL-3.0 license

#python #docker #searchengine #ai #bert 

Nat  Grady


Google Search Engine and OpenAI’s ChatGPT

The world of technology has come a long way since the first search engine was introduced in the early 90s. Today, there are numerous search engines available, with Google being the most popular one. But Google’s dominance in the search engine market is not the only thing that makes it stand out. It has been constantly evolving to provide its users with the best possible experience.

On the other hand, OpenAI’s ChatGPT is a language model that uses deep learning to generate human-like responses to natural language inputs. It is not a search engine, but rather an AI language model that can be used for various applications, including chatbots and virtual assistants. Although both Google search engine and ChatGPT have different uses, they both have AI at their core, making them powerful tools in the technology landscape.

In this article, we will discuss the differences between Google search engine and OpenAI’s ChatGPT, and how they are being used in the technology world.

Google Search Engine: A Brief Overview

Google search engine is one of the most popular search engines in the world, with billions of searches being made every day. It is known for its simplicity and speed, and its ability to provide relevant results. Google’s search engine uses a complex algorithm to determine the relevance of web pages and rank them accordingly. The algorithm takes into account various factors, including keywords, relevance, and the popularity of the website. The more relevant the results, the higher they appear on the search engine results page (SERP).
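As a rough illustration of how several signals can be blended into one ranking, the sketch below scores pages by keyword overlap and a popularity value. It is a toy under stated assumptions, not Google's algorithm: the Page type, the popularity field, and the 0.3 weight are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    popularity: float  # hypothetical 0..1 signal, e.g. derived from inbound links


def score(page: Page, query: str, popularity_weight: float = 0.3) -> float:
    """Blend keyword overlap with popularity into one relevance score."""
    query_terms = set(query.lower().split())
    page_terms = set(page.text.lower().split())
    if not query_terms:
        return 0.0
    keyword_match = len(query_terms & page_terms) / len(query_terms)
    return (1 - popularity_weight) * keyword_match + popularity_weight * page.popularity


def rank(pages: list[Page], query: str) -> list[Page]:
    """Return pages ordered from most to least relevant."""
    return sorted(pages, key=lambda p: score(p, query), reverse=True)


pages = [
    Page("a.example", "python search engine tutorial", popularity=0.2),
    Page("b.example", "cooking recipes for beginners", popularity=0.9),
    Page("c.example", "build a search engine in python", popularity=0.6),
]
ranked = rank(pages, "python search engine")
print([p.url for p in ranked])
```

The popular but off-topic page ends up last because keyword match carries most of the weight; raising popularity_weight would shift that balance.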

Google search engine has been constantly evolving over the years to provide users with the best possible experience. One of the major changes it has undergone is the incorporation of AI into its search algorithm. This has enabled Google to provide users with more relevant and personalized results. For example, Google’s AI can understand the context of a search query and provide results that are relevant to the user’s needs. This has revolutionized the way people use search engines and has made it easier to find the information they are looking for.

OpenAI’s ChatGPT: An Introduction

OpenAI’s ChatGPT is a language model that uses deep learning to generate human-like responses to natural language inputs. It is not a search engine, but rather an AI language model that can be used for various applications, including chatbots and virtual assistants. The AI model has been trained on vast amounts of data, making it capable of generating responses to a wide range of questions and inputs.

ChatGPT has been designed to provide users with a more personalized and natural experience when interacting with chatbots or virtual assistants. Unlike traditional chatbots, which use pre-defined responses, ChatGPT generates responses in real-time based on the input it receives. This makes it possible for ChatGPT to provide human-like responses to users, making their experience more natural and engaging.

Google Search Engine vs ChatGPT: Key Differences

Although both Google search engine and ChatGPT have AI at their core, they are different in several ways. Here are some of the key differences between the two:

Points | Type | Google Search Engine vs ChatGPT
1 | Purpose | Google Search Engine is designed to provide relevant information to users in response to a specific search query. ChatGPT is designed to answer natural language questions and provide conversational assistance to users.
2 | Functionality | Google Search Engine searches the web and indexes web pages, images, videos, and other online content to provide relevant results to users. ChatGPT uses advanced AI and NLP technology to understand user queries and provide relevant answers.
3 | User Interaction | Google Search Engine requires users to type in a search query and then presents relevant results. ChatGPT interacts with users through natural language conversation and provides real-time responses.
4 | Search Quality | Google Search Engine uses complex algorithms and ranking factors to provide the most relevant results to users. ChatGPT uses advanced NLP and AI technology to understand user questions and provide relevant answers.
5 | User Experience | Google Search Engine provides a text-based interface for users to search and find information. ChatGPT provides a conversational interface for users to interact and receive information.
6 | Relevance | Google Search Engine uses complex algorithms to determine the relevance of search results to the user’s query. ChatGPT uses advanced AI and NLP technology to understand user questions and provide relevant answers.
7 | Accuracy | Google Search Engine strives to provide accurate and up-to-date information to users. ChatGPT also uses advanced AI and NLP technology to provide accurate and relevant answers to users.
8 | Search Speed | Google Search Engine provides fast and efficient search results, with millions of pages indexed and search results generated in milliseconds. ChatGPT also provides real-time responses to users, with advanced AI and NLP technology used to quickly process user queries.
9 | Personalization | Google Search Engine provides personalized search results based on a user’s search history, location, and other factors. ChatGPT also provides personalized responses to users based on their past interactions and preferences.
10 | User Data | Google Search Engine collects and stores user data, including search history and location data, for personalized search results and ad targeting. ChatGPT also collects and stores user data for personalized responses and to improve the accuracy of its AI and NLP technology.
11 | Privacy | Google Search Engine has privacy policies in place to protect user data, but users’ search history and other information is still collected and stored. ChatGPT also has privacy policies in place to protect user data, but users’ conversation history and other information is still collected and stored.
12 | Data Sources | Google Search Engine searches the entire web and indexes web pages, images, videos, and other online content to provide relevant results. ChatGPT uses a vast database of information, including structured and unstructured data, to provide relevant answers to users.
13 | Search Types | Google Search Engine supports a wide range of search types, including web search, image search, video search, and more. ChatGPT is primarily designed to answer natural language questions and provide conversational assistance.
14 | Search Results | Google Search Engine provides a wide range of search results, including web pages, images, videos, news articles, and more. ChatGPT provides specific answers to user questions, often in the form of text or numerical data.
15 | User Feedback | Google Search Engine provides users with the ability to provide feedback on search results and improve the accuracy of search results. ChatGPT also provides users with the ability to provide feedback and improve the accuracy of its AI and

Wrap it up!

What is the difference between Google Search Engine and ChatGPT?

Google Search Engine is a web-based search tool that allows users to search for information on the Internet. It uses algorithms to search for and retrieve relevant information from billions of web pages and other sources. ChatGPT, on the other hand, is an AI-powered language model that provides users with relevant information and responses to their questions and queries in natural language. While both are designed to provide users with information, Google Search Engine relies on web-based sources while ChatGPT uses its own advanced AI technology to generate responses.

Can ChatGPT replace Google Search Engine?

ChatGPT is not designed to replace Google Search Engine. While both can provide users with information, Google Search Engine is more comprehensive and covers a wider range of information sources, including web pages, news articles, images, and more. ChatGPT is best suited for specific, conversational queries and can provide users with a more interactive and personalised experience.

How accurate is ChatGPT compared to Google Search Engine?

Both ChatGPT and Google Search Engine strive to provide accurate information, but their methods and sources of information can vary. Google Search Engine uses complex algorithms and a vast database of web pages to provide accurate results, while ChatGPT uses advanced AI technology to generate responses. However, the accuracy of ChatGPT’s responses can depend on the quality of the training data and the complexity of the query.

Can ChatGPT provide real-time results like Google Search Engine?

ChatGPT is a conversational AI model that generates responses in real-time. This means that users can receive immediate responses to their queries, making it a quick and convenient way to access information. Google Search Engine also provides real-time results, but the speed of results can vary depending on the complexity of the query and the speed of the user’s internet connection.

Is ChatGPT more secure than Google Search Engine?

Both ChatGPT and Google Search Engine have security measures in place to protect user data and ensure the privacy of users. However, ChatGPT has the advantage of being a closed system, meaning that user data is not shared with third-party sources or used for advertising purposes. Google Search Engine, on the other hand, relies on user data to provide targeted advertising, which may result in the sharing of user data with third-party sources. Ultimately, the security and privacy of both systems will depend on the user’s privacy settings and the security measures implemented by the service provider.

Original article source at:

#chatgpt #google #searchengine 

Royce  Reinger


Datasets: 3,000,000+ Unsplash Images Made Available for Research & ML

The Unsplash Dataset

The Unsplash Dataset is built from the work of 250,000+ contributing global photographers and from data sourced from hundreds of millions of searches across a nearly unlimited number of uses and contexts. Due to the breadth of intent and semantics contained within the Unsplash dataset, it enables new opportunities for research and learning.

The Unsplash Dataset is offered in two datasets:

  • the Lite dataset: available for commercial and noncommercial usage, containing 25k nature-themed Unsplash photos, 25k keywords, and 1M searches
  • the Full dataset: available for noncommercial usage, containing 3M+ high-quality Unsplash photos, 5M keywords, and over 250M searches

As the Unsplash library continues to grow, we’ll release updates to the dataset with new fields and new images, with each subsequent release being semantically versioned.

We welcome any feedback regarding the content of the datasets or their format. With your input, we hope to close the gap between the data we provide and the data that you would like to leverage. You can open an issue to report a problem or to let us know what you would like to see in the next release of the datasets.

For more on the Unsplash Dataset, see our announcement and site.


Lite Dataset

The Lite dataset contains all of the same fields as the Full dataset, but is limited to ~25,000 photos. It can be used for both commercial and non-commercial usage, provided you abide by the terms.

⬇️ Download the Lite dataset [~650MB compressed, ~1.4GB raw]

Full Dataset

The Full dataset is available for non-commercial usage and all uses must abide by the terms. To access it, please request access. The dataset weighs 20 GB compressed (43 GB raw).


See the documentation for a complete list of tables and fields.


You can follow these examples to load the dataset in these common formats:
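As one possible starting point, the sketch below loads tab-separated dataset files using only the Python standard library. It assumes each table is shipped as one or more chunks named like photos.tsv000, photos.tsv001 and so on, each with its own header row; the table names, the file naming pattern, and the helper names load_table and load_dataset are assumptions for illustration, so check the dataset documentation for the actual layout.

```python
import csv
import glob
import os


def load_table(directory: str, name: str) -> list[dict]:
    """Read every '<name>.tsv*' chunk in `directory` into a list of row dicts."""
    rows = []
    for path in sorted(glob.glob(os.path.join(directory, name + ".tsv*"))):
        with open(path, newline="", encoding="utf-8") as handle:
            # QUOTE_NONE keeps literal quote characters inside fields intact.
            reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            rows.extend(reader)
    return rows


def load_dataset(directory: str, names=("photos", "keywords")) -> dict:
    """Load several tables at once, keyed by table name (names are assumed)."""
    return {name: load_table(directory, name) for name in names}


if __name__ == "__main__":
    dataset = load_dataset("./")
    for table, rows in dataset.items():
        print(table, len(rows))
```

A table with no matching files simply loads as an empty list, which makes it easy to spot a wrong directory or table name.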

Share your work

We're making this data open and available with the hopes of enabling researchers and developers to discover interesting and useful connections in the data.

We'd love to see what you create, whether that's a research paper, a machine learning model, a blog post, or just an interesting discovery in the data. Send us an email at

If you're using the dataset in a research paper, you can attribute the dataset as Unsplash Lite Dataset 1.2.0 or Unsplash Full Dataset 1.2.0 and link to the permalink

The Unsplash Dataset is made available for research purposes. It cannot be used to redistribute the images contained within. To use the Unsplash library in a product, see the Unsplash API.

Download Details:

Author: Unsplash
Source Code: 

#machinelearning #searchengine #photos #data #research 

Royce  Reinger


Marqo: Tensor Search for Humans


A tensor-based search and analytics engine that seamlessly integrates with applications and websites. Marqo allows developers to turbocharge search functionality with the latest machine learning models, in 3 lines of code.


Try the demo | View the code 

✨ Core Features

⚡ Performance

  • Embeddings stored in in-memory HNSW indexes, achieving cutting edge search speeds.
  • Scale to hundred-million document indexes with horizontal index sharding.
  • Async and non-blocking data upload and search.

🤖 Machine Learning

  • Use the latest machine learning models from PyTorch, Huggingface, OpenAI and more.
  • Start with a pre-configured model or bring your own.
  • Built in ONNX support and conversion for faster inference and higher throughput.
  • CPU and GPU support.

☁️ Cloud-native

  • Fast deployment using Docker.
  • Run Marqo multi-az and high availability.

🌌 End-to-end

  • Build search and analytics on multiple unstructured data types such as text, image, code, video.
  • Filter search results using Marqo’s query DSL.
  •  Store unstructured data and semi-structured metadata together in documents, using a range of supported datatypes like bools, ints and keywords.

🍱 Managed cloud

  •  Scale Marqo at the click of a button and run it at million-document scale with high performance, including performant management of in-memory HNSW indexes.
  • Multi-az, accelerated inference.
  • Marqo cloud ☁️ is in beta. If you’re interested, apply here.

Learn more about Marqo

📗 Quick start | Build your first application with Marqo in under 5 minutes.
🔍 What is tensor search? | A beginner's guide to the fundamentals of Marqo and tensor search.
🖼 Marqo for image data | Building text-to-image search in Marqo in 5 lines of code.
📚 Marqo for text | Building a multilingual database in Marqo.
🔮 Integrating Marqo with GPT | Making GPT a subject matter expert by using Marqo as a knowledge base.
🎨 Marqo for Creative AI | Combining stable diffusion with semantic search to generate and categorise 100k images of hotdogs.
🦾 Features | Marqo's core features.

Getting started

Marqo requires Docker. To install Docker, go to the official Docker website. Ensure that Docker has at least 8GB memory and 50GB storage.

Use docker to run Marqo (Mac users with M-series chips will need to go here):

docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:latest
  1. Install the Marqo client:
pip install marqo
  2. Start indexing and searching! Let's look at a simple example below:
import marqo

mq = marqo.Client(url='http://localhost:8882')

# add_documents() takes a list of dicts; the index is created if it does not exist
mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    },
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                       "mobility, life support, and communications for astronauts",
        "_id": "article_591"
    }
])

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?", searchable_attributes=["Title", "Description"]
)
  • mq is the client that wraps the marqo API
  • add_documents() takes a list of documents, represented as python dicts for indexing.
  • add_documents() creates an index with default settings, if one does not already exist.
  • You can optionally set a document's ID with the special _id field. Otherwise, Marqo will generate one.
  • If the index doesn't exist, Marqo will create it. If it exists then Marqo will add the documents to the index.
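The indexing behaviour described in these bullets can be mimicked with a small in-memory stand-in. This is only a sketch of the semantics (create the index on first use, generate an ID when _id is absent, replace a document when its _id already exists); ToyClient is a hypothetical class written for this example and is not part of the marqo package or a picture of how Marqo stores data.

```python
import uuid


class ToyClient:
    """In-memory stand-in that mimics the indexing behaviour described above."""

    def __init__(self):
        self.indexes = {}  # index name -> {document id -> document}

    def add_documents(self, index_name, documents):
        # Create the index on first use, as the bullets describe.
        index = self.indexes.setdefault(index_name, {})
        ids = []
        for doc in documents:
            # Use the caller's _id if present, otherwise generate one.
            doc_id = doc.get("_id") or str(uuid.uuid4())
            index[doc_id] = {**doc, "_id": doc_id}
            ids.append(doc_id)
        return ids


toy = ToyClient()
ids = toy.add_documents("my-first-index", [
    {"Title": "The Travels of Marco Polo"},
    {"Title": "Extravehicular Mobility Unit (EMU)", "_id": "article_591"},
])
# Re-adding with the same _id replaces the stored document instead of duplicating it.
toy.add_documents("my-first-index", [{"Title": "EMU (updated)", "_id": "article_591"}])
print(len(toy.indexes["my-first-index"]))  # prints 2
```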

Let's have a look at the results:

# let's print out the results:
import pprint
pprint.pprint(results)

{
    'hits': [
        {
            'Title': 'Extravehicular Mobility Unit (EMU)',
            'Description': 'The EMU is a spacesuit that provides environmental protection, mobility, life support, and '
                           'communications for astronauts',
            '_highlights': {
                'Description': 'The EMU is a spacesuit that provides environmental protection, '
                               'mobility, life support, and communications for astronauts'
            },
            '_id': 'article_591',
            '_score': 0.61938936
        },
        {
            'Title': 'The Travels of Marco Polo',
            'Description': "A 13th-century travelogue describing Polo's travels",
            '_highlights': {'Title': 'The Travels of Marco Polo'},
            '_id': 'e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a',
            '_score': 0.60237324
        }
    ],
    'limit': 10,
    'processingTimeMs': 49,
    'query': 'What is the best outfit to wear on the moon?'
}
  • Each hit corresponds to a document that matched the search query.
  • They are ordered from most to least matching.
  • limit is the maximum number of hits to be returned. This can be set as a parameter during search.
  • Each hit has a _highlights field. This was the part of the document that matched the query the best.
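Because the response is a plain dictionary, it can be post-processed with ordinary Python. The sketch below walks a response shaped like the one above and keeps the ID, score, and highlighted field names of each hit above a score threshold. The helper summarize_hits and the min_score cut-off are hypothetical additions for illustration, and the sample dictionary is an abbreviated copy of the output shown earlier.

```python
def summarize_hits(results: dict, min_score: float = 0.0) -> list[tuple]:
    """Return (_id, _score, highlighted field names) for each hit, in response order."""
    summary = []
    for hit in results.get("hits", []):
        if hit["_score"] >= min_score:
            summary.append((hit["_id"], hit["_score"], sorted(hit.get("_highlights", {}))))
    return summary


sample = {
    "hits": [
        {"Title": "Extravehicular Mobility Unit (EMU)", "_id": "article_591",
         "_highlights": {"Description": "The EMU is a spacesuit ..."}, "_score": 0.61938936},
        {"Title": "The Travels of Marco Polo", "_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a",
         "_highlights": {"Title": "The Travels of Marco Polo"}, "_score": 0.60237324},
    ],
    "limit": 10,
}
for doc_id, score, fields in summarize_hits(sample, min_score=0.61):
    print(doc_id, score, fields)
```

Since hits already arrive ordered from most to least matching, the helper preserves that order rather than re-sorting.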

Other basic operations

Get document

Retrieve a document by ID.

result = mq.index("my-first-index").get_document(document_id="article_591")

Note that calling add_documents again with the same _id will cause the document to be updated.

Get index stats

Get information about an index.

results = mq.index("my-first-index").get_stats()

Lexical search

Perform a keyword search.

result = mq.index("my-first-index").search('marco polo', search_method=marqo.SearchMethods.LEXICAL)

Search specific fields

Using the default tensor search method.

result = mq.index("my-first-index").search('adventure', searchable_attributes=['Title'])
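To make the difference between lexical and tensor search more concrete, here is a toy comparison. Lexical matching looks for shared keywords, while tensor search compares embedding vectors, here by cosine similarity. The three-dimensional vectors are hand-made stand-ins for what a real model would produce, and nothing in this sketch reflects how Marqo is implemented internally.

```python
import math


def lexical_match(query: str, text: str) -> bool:
    """Keyword search: true only if the texts share at least one token."""
    return bool(set(query.lower().split()) & set(text.lower().split()))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hand-made 3-d "embeddings" (assumption: a real model would produce these).
docs = {
    "The Travels of Marco Polo": [0.9, 0.1, 0.0],
    "Extravehicular Mobility Unit (EMU)": [0.1, 0.9, 0.2],
}
query_text, query_vec = "adventure", [0.8, 0.2, 0.1]

lexical_hits = [title for title in docs if lexical_match(query_text, title)]
best = max(docs, key=lambda title: cosine(query_vec, docs[title]))
print(lexical_hits)  # no title contains the word "adventure"
print(best)
```

The query shares no keyword with either title, so the lexical pass finds nothing, while the vector comparison still picks the travelogue as the closest match.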

Delete documents

Delete documents.

results = mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])

Delete index

Delete an index.

results = mq.index("my-first-index").delete()

Multi modal and cross modal search

To power image and text search, Marqo allows users to plug and play with CLIP models from HuggingFace. Note that if you do not configure multi modal search, image urls will be treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below:

settings = {
    "treat_urls_and_pointers_as_images": True,  # allows us to find an image file and index it
    # add a "model" entry here naming the CLIP model to use
}
response = mq.create_index("my-multimodal-index", **settings)

Images can then be added within documents as follows. You can use urls from the internet (for example S3) or from the disk of the machine:

response = mq.index("my-multimodal-index").add_documents([{
    "My Image": "",
    "Description": "The hippopotamus, also called the common hippopotamus or river hippopotamus, is a large semiaquatic mammal native to sub-Saharan Africa",
    "_id": "hippo-facts"
}])

You can then search using text as usual. Both text and image fields will be searched:

results = mq.index("my-multimodal-index").search('animal')

Setting searchable_attributes to the image field ['My Image'] ensures only images are searched in this index:

results = mq.index("my-multimodal-index").search('animal', searchable_attributes=['My Image'])

Searching using an image

Searching using an image can be achieved by providing the image link.

results = mq.index("my-multimodal-index").search('')


The full documentation for Marqo can be found here


Note that you should not run other applications on Marqo's Opensearch cluster as Marqo automatically changes and adapts the settings on the cluster.

M series Mac users

Marqo does not yet support the docker-in-docker backend configuration for the arm64 architecture. This means that if you have an M series Mac, you will also need to run marqo's backend, marqo-os, locally.

To run Marqo on an M series Mac, follow the next steps.

In one terminal run the following command to start opensearch:

docker rm -f marqo-os; docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" marqoai/marqo-os:0.0.3-arm

In another terminal run the following command to launch Marqo:

docker rm -f marqo; docker run --name marqo --privileged \
    -p 8882:8882 --add-host host.docker.internal:host-gateway \
    -e "OPENSEARCH_URL=https://localhost:9200" \
    marqoai/marqo:latest


Marqo is a community project with the goal of making tensor search accessible to the wider developer community. We are glad that you are interested in helping out! Please read this to get started.

Dev set up

Create a virtual env python -m venv ./venv.

Activate the virtual environment source ./venv/bin/activate.

Install requirements from the requirements file: pip install -r requirements.txt.

Run tests by running the tox file. CD into this dir and then run "tox".

If you update dependencies, make sure to delete the .tox dir and rerun.

Merge instructions:

Run the full test suite (by using the command tox in this dir).

Create a pull request with an attached github issue.


This readme is available in the following translations:

Download Details:

Author: marqo-ai
Source Code: 
License: Apache-2.0 license

#searchengine #machinelearning #deeplearning #transform #python 

Hermann  Frami


Para: Multitenant Backend Server for Building Web, Mobile Apps Rapidly


A scalable, multitenant backend for the cloud.

Para is a scalable, multitenant backend server/framework for object persistence and retrieval. It helps you build and prototype applications faster by taking care of backend operations. It can be a part of your JVM-based application or it can be deployed as standalone, multitenant API server with multiple applications and clients connecting to it.

The name "pára" means "steam" in Bulgarian. And just like steam is used to power stuff, you can use Para to power your mobile or web application backend.


  • RESTful JSON API secured with Amazon's Signature V4 algorithm
  • Database-agnostic, designed for scalable data stores (DynamoDB, Cassandra, MongoDB, etc.)
  • Full-text search (Lucene, Elasticsearch)
  • Distributed and local object cache (Hazelcast, Caffeine)
  • Multitenancy - each app has its own table, index and cache
  • Webhooks with signed payloads
  • Flexible security based on Spring Security (LDAP, SAML, social login, CSRF protection, etc.)
  • Stateless client authentication with JSON Web Tokens (JWT)
  • Simple but effective resource permissions for client access control
  • Robust constraint validation mechanism based on JSR-303 and Hibernate Validator
  • Per-object control of persistence, index and cache operations
  • Support for optimistic locking and transactions (implemented by each DAO natively)
  • Advanced serialization and deserialization capabilities (Jackson)
  • Full metrics for monitoring and diagnostics (Dropwizard)
  • Modular design powered by Google Guice and support for plugins
  • I18n utilities for translating language packs and working with currencies
  • Standalone executable JAR with embedded Jetty
  • Para Web Console - admin user interface
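For readers unfamiliar with Signature V4, the core of the scheme is a signing key derived by chaining HMAC-SHA256 over the date, region, and service name, which is then used to sign a canonical "string to sign". The sketch below shows only that key derivation and the final signing step. The secret, date, region, and the service name "para" are placeholder assumptions, and building the canonical request and string to sign is left out, so treat this as an outline of the algorithm and not as a working Para client.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive a Signature V4 signing key by chaining HMAC-SHA256 over the scope."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature of an already-built string to sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder values; a real request uses the app's secret key and the current date.
key = derive_signing_key("my-secret", "20240101", "us-east-1", "para")
signature = sign(key, "example string to sign")
print(signature)
```

Because the key is scoped to a date, region, and service, a leaked signature cannot be replayed against a different scope.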


+----------------------------------------------------------+
|                  ____  ___ _ ____ ___ _                  |
|                 / __ \/ __` / ___/ __` /                 |
|                / /_/ / /_/ / /  / /_/ /                  |
|               / .___/\__,_/_/   \__,_/     +-------------+
|              /_/                           | Persistence |
+-------------------+  +-----------------+   +-------------+
|      REST API     |  |     Search      |---|    Cache    |
+---------+---------+--+--------+--------+   +------+------+
          |                     |                   |
+---------+---------+  +--------+--------+   +------+------+
|  Signed Requests  |  |  Search Index   |   |  Data Store |
|  and JWT Tokens   |  |      (Any)      |   |    (Any)    |
+----+---------^----+  +-----------------+   +-------------+
     |         |
+----v---------+-------------------------------------------+
| Clients: JavaScript, PHP, Java, C#, Android, iOS, et al. |
+----------------------------------------------------------+

Quick Start

  1. Download the latest executable JAR
  2. Create a configuration file named application.conf in the same directory as the JAR package.
  3. Start Para with java -jar -Dconfig.file=./application.conf para-*.jar
  4. Install Para CLI with npm install -g para-cli
  5. Create a new dedicated app for your project and save the access keys:
# run setup and set endpoint to either 'http://localhost:8080' or ''
# the keys for the root app are inside application.conf
$ para-cli setup
$ para-cli new-app "myapp" --name "My App"

Alternatively, you can use the Para Web Console to manage data, or integrate Para directly into your project with one of the API clients below.


Tagged Docker images for Para are located at erudikaltd/para on Docker Hub. It's highly recommended that you pull only release images like :1.45.1 or :latest_stable because the :latest tag can be broken or unstable. First, create an application.conf file and a data folder and start the Para container:

$ touch application.conf && mkdir data
$ docker run -ti -p 8080:8080 --rm -v $(pwd)/data:/para/data \
  -v $(pwd)/application.conf:/para/application.conf \
  -e JAVA_OPTS="-Dconfig.file=/para/application.conf" erudikaltd/para:latest_stable

Environment variables

  • JAVA_OPTS - Java system properties, e.g. -Dpara.port=8000
  • BOOT_SLEEP - Startup delay, in seconds


To use plugins, create a new Dockerfile-plugins which does a multi-stage build like so:

# change X.Y.Z to the version you want to use
FROM erudikaltd/para:v1.XY.Z-base AS base
FROM erudikaltd/para-search-lucene:1.XY.Z AS search
FROM erudikaltd/para-dao-mongodb:1.XY.Z AS dao
FROM base AS final
COPY --from=search /para/lib/*.jar /para/lib
COPY --from=dao /para/lib/*.jar /para/lib

Then simply run $ docker build -f Dockerfile-plugins -t para-mongo .

Building Para

Para can be compiled with JDK 8+:

To compile it you'll need Maven. Once you have it, just clone and build:

$ git clone && cd para
$ mvn install -DskipTests=true

To generate the executable "fat-jar" run $ mvn package and it will be in ./para-jar/target/para-x.y.z-SNAPSHOT.jar. Two JAR files will be generated in total - the fat one is a bit bigger in size.

To build the base package without plugins (excludes para-dao-sql and para-search-lucene), run:

$ cd para-jar && mvn -Pbase package

To run a local instance of Para for development, use:

$ mvn -Dconfig.file=./application.conf spring-boot:run

Standalone server

You can run Para as a standalone server by downloading the executable JAR and then:

$ java -jar para-X.Y.Z.jar

Then you can browse your objects through the Para Web Console. Simply change the API endpoint to be your local server and connect your access keys. The admin interface is client-side only and your secret key is never sent over the network. Instead, a JWT access token is generated locally and sent to the server on each request.
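The idea of generating a token locally, so that the secret key itself never travels over the network, can be sketched with the standard library alone. The example below builds and verifies an HMAC-SHA256 (HS256) signed JWT. The choice of HS256, the claim names such as appid, and the helper names make_jwt and verify_jwt are assumptions for illustration and may not match what the Para Web Console actually sends.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_jwt(claims: dict, secret: str) -> str:
    """Build an HS256-signed JWT: base64url(header).base64url(payload).signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(digest)


def verify_jwt(token: str, secret: str) -> bool:
    """Recompute the signature with the shared secret and compare in constant time."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(_b64url(expected), signature)


now = int(time.time())
# Hypothetical claims; only the signed token is sent, never the secret itself.
token = make_jwt({"appid": "app:myapp", "iat": now, "exp": now + 3600}, "my-secret-key")
print(verify_jwt(token, "my-secret-key"))  # prints True
```

The server, which also knows the secret, can verify the signature without the client ever transmitting the key.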

Alternatively, you can build a WAR file and deploy it to your favorite servlet container:

$ cd para-war && mvn package

Download JAR

Maven dependency

You can also integrate Para with your project by adding it as a dependency. Para is hosted on Maven Central. Here's the Maven snippet to include in your pom.xml:


For building lightweight client-only applications connecting to Para, include only the client module:


Command-line tool

$ npm install -g para-cli

API clients

Use these client libraries to quickly integrate Para into your project:

Database integrations

Use these DAO implementations to connect to different databases:

  • DynamoDB: AWSDynamoDAO (included in para-server)
  • MongoDB: para-dao-mongodb
  • Cassandra: para-dao-cassandra
  •  SQL (H2/MySQL/SQL Server/PostgreSQL, etc.): para-dao-sql. H2DAO is the default DAO and is part of the SQL plugin (packaged with the JAR file).

Search engine integrations

The Search interface is implemented by:

Cache integrations

The Cache interface is implemented by:

  • Caffeine: default objects are cached locally (included in para-server)
  • Hazelcast: para-cache-hazelcast (distributed)

Queue implementations

The Queue interface is implemented by:

  • AWS SQS: in the AWSQueue class
  • LocalQueue for single-host deployments and local development

Projects using Para

Wishlist / Roadmap

  • Para 2.0 - migration to Quarkus, Java 13+ only, native image
  • GraphQL support

Getting help


  1. Fork this repository and clone the fork to your machine
  2. Create a branch (git checkout -b my-new-feature)
  3. Implement a new feature or fix a bug and add some tests
  4. Commit your changes (git commit -am 'Added a new feature')
  5. Push the branch to your fork on GitHub (git push origin my-new-feature)
  6. Create new Pull Request from your fork

Please try to respect the code style of this project. To check your code, run it through the style checker:

mvn validate

For more information see


Read the Docs


Read more about Para on our blog


We offer hosting and premium support, where you can try Para online with a free developer account. Browse and manage your users and objects, do backups and edit permissions with a few clicks in the web console. By upgrading to a premium account you will be able to scale your projects up and down in seconds and manage multiple apps.

See how Para compares to other open source backend frameworks.

This project is fully funded and supported by Erudika - an independent, bootstrapped company.

Download Details:

Author: Erudika
Source Code: 
License: Apache-2.0 license

#serverless #java #api #searchengine #modular 

Royce Reinger


Vespa: The Open Big Data Serving Engine


The open big data serving engine - Store, search, organize and make machine-learned inferences over big data at serving time.

This is the primary repository for Vespa where all development is happening. New production releases from this repository's master branch are made each weekday from Monday through Thursday.


Use cases such as search, recommendation and personalization need to select a subset of data in a large corpus, evaluate machine-learned models over the selected data, organize and aggregate it and return it, typically in less than 100 milliseconds, all while the data corpus is continuously changing.

This is hard to do, especially with large data sets that need to be distributed over multiple nodes and evaluated in parallel. Vespa is a platform which performs these operations for you with high availability and performance. It has been in development for many years and is used on a number of large internet services and apps which serve hundreds of thousands of queries from Vespa per second.
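The select, score and merge flow described above can be sketched in a few lines. This is only an illustration of the general scatter-gather idea, not Vespa's implementation or API: the shard layout, the `matches` filter and the `score` function are all hypothetical stand-ins for a real query and a real machine-learned model.

```python
import heapq

def search_shard(shard, matches, score, k):
    """Select matching docs on one shard, score them, keep the local top k."""
    selected = (doc for doc in shard if matches(doc))
    return heapq.nlargest(k, ((score(doc), doc["id"]) for doc in selected))

def scatter_gather(shards, matches, score, k):
    """Query every shard (in a real system, in parallel) and merge the partial results."""
    partial = [search_shard(shard, matches, score, k) for shard in shards]
    return heapq.nlargest(k, (hit for hits in partial for hit in hits))

# Hypothetical corpus split over two nodes.
shards = [
    [{"id": 1, "topic": "search", "clicks": 10}, {"id": 2, "topic": "news", "clicks": 99}],
    [{"id": 3, "topic": "search", "clicks": 42}, {"id": 4, "topic": "search", "clicks": 7}],
]

top = scatter_gather(
    shards,
    matches=lambda doc: doc["topic"] == "search",  # stand-in for the query
    score=lambda doc: doc["clicks"],               # stand-in for a ranking model
    k=2,
)
print(top)  # [(42, 3), (10, 1)]
```

Because each shard only returns its local top k, the merge step handles at most k hits per shard regardless of corpus size.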


Run your own Vespa instance: Or deploy your Vespa applications to the cloud service:


  • The application created in the getting started guide is fully functional and production ready, but you may want to add more nodes for redundancy.
  • See developing applications on adding your own Java components to your Vespa application.
  • Vespa APIs is useful for understanding how to interface with Vespa
  • Explore the sample applications
  • Follow the Vespa Blog for feature updates / use cases

Full documentation is at


We welcome contributions! See to learn how to contribute.

If you want to contribute to the documentation, see


You do not need to build Vespa to use it, but if you want to contribute you need to be able to build the code. This section explains how to build and test Vespa. To understand where to make changes, see Some suggested improvements with pointers to code are in

Development environment

C++ and Java building is supported on CentOS Stream 8. The Java source can also be built on any platform having Java 17 and Maven installed. Use the following guide to set up a complete development environment using Docker for building Vespa, running unit tests and running system tests: Vespa development on CentOS Stream 8.

Build Java modules

export MAVEN_OPTS="-Xms128m -Xmx1024m"
./ java
mvn install --threads 1C

Use this if you only need to build the Java modules, otherwise follow the complete development guide above.

Vespa build status: Vespa Build Status

Download Details:

Author: Vespa-engine
Source Code: 
License: Apache-2.0 license

#machinelearning #java #searchengine #bigdata #ai 

Rupert Beatty


TNTSearch: A Fully Featured Full Text Search Engine Written in PHP


TNTSearch is a full-text search (FTS) engine written entirely in PHP. A simple configuration allows you to add an amazing search experience in just minutes. Features include:

  • Fuzzy search
  • Search as you type
  • Geo-search
  • Text classification
  • Stemming
  • Custom tokenizers
  • Bm25 ranking algorithm
  • Boolean search
  • Result highlighting
  • Dynamic index updates (no need to reindex each time)
  • Easily deployable via

We also created some demo pages that show tolerant retrieval with n-grams in action. The package has a bunch of helper functions like Jaro-Winkler and Cosine similarity for distance calculations. It supports stemming for English, Croatian, Arabic, Italian, Russian, Portuguese and Ukrainian. If the built-in stemmers aren't enough, the engine lets you easily plug in any compatible snowball stemmer. Some forks of the package even support Chinese. And please contribute other languages!

Unlike many other engines, the index can be easily updated without doing a reindex or using deltas.


The easiest way to install TNTSearch is via composer:

composer require teamtnt/tntsearch


Before you proceed, make sure your server meets the following requirements:

  • PHP >= 7.1
  • PDO PHP Extension
  • SQLite PHP Extension
  • mbstring PHP Extension


Creating an index

In order to be able to make full text search queries, you have to create an index.


use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;

$tnt->loadConfig([
    'driver'    => 'mysql',
    'host'      => 'localhost',
    'database'  => 'dbname',
    'username'  => 'user',
    'password'  => 'pass',
    'storage'   => '/var/www/tntsearch/examples/',
    'stemmer'   => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class // optional
]);

$indexer = $tnt->createIndex('name.index');
$indexer->query('SELECT id, article FROM articles;');
$indexer->run();

Important: the "storage" setting marks the folder where all of your indexes will be saved, so make sure you have permission to write to this folder. Otherwise you can expect the following exception to be thrown:

  • [PDOException] SQLSTATE[HY000] [14] unable to open database file *

Note: If your primary key is different than id, set it like:

$indexer->setPrimaryKey('article_id');


Making the primary key searchable

By default, the primary key isn't searchable. If you want to make it searchable, simply run:

$indexer->includePrimaryKey();



Searching for a phrase or keyword is trivial:

use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;

$tnt->loadConfig($config);
$tnt->selectIndex("name.index");

$res = $tnt->search("This is a test search", 12);

print_r($res); //returns an array of 12 document ids that best match your query

// to display the results you need an additional query against your application database
// SELECT * FROM articles WHERE id IN $res ORDER BY FIELD(id, $res);

The ORDER BY FIELD clause is important, otherwise the database engine will not return the results in the required order.
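As a rough illustration of that follow-up query, here is a sketch that builds a parameterized statement from a list of ids. It assumes a MySQL driver that uses `%s` placeholders (as PyMySQL and mysqlclient do); the `build_ordered_query` helper is hypothetical, and the `articles` table name is taken from the example above.

```python
def build_ordered_query(ids):
    """Return (sql, params) that fetch rows for `ids` in the same order as `ids`."""
    if not ids:
        raise ValueError("ids must not be empty")
    placeholders = ", ".join(["%s"] * len(ids))
    sql = (
        f"SELECT * FROM articles WHERE id IN ({placeholders}) "
        f"ORDER BY FIELD(id, {placeholders})"
    )
    # The ids are bound twice: once for IN (...), once for FIELD(...).
    return sql, list(ids) + list(ids)

sql, params = build_ordered_query([7, 3, 12])
print(sql)
print(params)  # [7, 3, 12, 7, 3, 12]
```

Without the FIELD clause, MySQL is free to return the three rows in primary-key order (3, 7, 12), which would discard the relevance ranking.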

Boolean Search

use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;

$tnt->loadConfig($config);
$tnt->selectIndex("name.index");

//this will return all documents that have romeo in it but not juliet
$res = $tnt->searchBoolean("romeo -juliet");

//returns all documents that have romeo or hamlet in it
$res = $tnt->searchBoolean("romeo or hamlet");

//returns all documents that have either romeo AND juliet or prince AND hamlet
$res = $tnt->searchBoolean("(romeo juliet) or (prince hamlet)");
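The three queries above map naturally onto set operations over an inverted index. The sketch below uses a hypothetical hand-made index (term to document ids) to show the semantics only; it is not how TNTSearch parses or evaluates boolean expressions.

```python
# Hypothetical inverted index: term -> ids of documents containing it.
index = {
    "romeo":  {1, 2, 3},
    "juliet": {1, 4},
    "hamlet": {5, 6},
    "prince": {3, 5},
}

def docs(term):
    """Document ids containing `term` (empty set for unknown terms)."""
    return index.get(term, set())

# "romeo -juliet": documents with romeo but not juliet (set difference).
romeo_not_juliet = docs("romeo") - docs("juliet")

# "romeo or hamlet": documents with either term (set union).
romeo_or_hamlet = docs("romeo") | docs("hamlet")

# "(romeo juliet) or (prince hamlet)": AND inside each group, OR between groups.
grouped = (docs("romeo") & docs("juliet")) | (docs("prince") & docs("hamlet"))

print(sorted(romeo_not_juliet))  # [2, 3]
print(sorted(romeo_or_hamlet))   # [1, 2, 3, 5, 6]
print(sorted(grouped))           # [1, 5]
```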

Fuzzy Search

The fuzziness can be tweaked by setting the following member variables:

public $fuzzy_prefix_length  = 2;
public $fuzzy_max_expansions = 50;
public $fuzzy_distance       = 2; //represents the Levenshtein distance;

use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;

$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
$tnt->fuzziness = true;

//when the fuzziness flag is set to true, the keyword juleit will return
//documents that match the word juliet, the default Levenshtein distance is 2
$res = $tnt->search("juleit");
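To see why "juleit" still finds "juliet", it helps to compute the Levenshtein distance by hand. The sketch below is a plain dynamic-programming implementation plus a hypothetical `fuzzy_candidates` helper. It assumes the prefix length means the first characters must match exactly, which is a common convention but not confirmed here for TNTSearch.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

def fuzzy_candidates(word, vocabulary, max_distance=2, prefix_length=2):
    """Terms sharing the first `prefix_length` chars and within `max_distance` edits."""
    prefix = word[:prefix_length]
    return [
        term for term in vocabulary
        if term.startswith(prefix) and levenshtein(word, term) <= max_distance
    ]

vocabulary = ["juliet", "julius", "romeo", "jungle"]
print(levenshtein("juleit", "juliet"))         # 2
print(fuzzy_candidates("juleit", vocabulary))  # ['juliet']
```

The swapped letters count as two substitutions, so the typo sits exactly at the default distance of 2.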

Updating the index

Once you have created an index, you don't need to reindex it each time you make some changes to your document collection. TNTSearch supports dynamic index updates.

use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;

$tnt->loadConfig($config);
$tnt->selectIndex("name.index");

$index = $tnt->getIndex();

//to insert a new document to the index
$index->insert(['id' => '11', 'title' => 'new title', 'article' => 'new article']);

//to update an existing document
$index->update(11, ['id' => '11', 'title' => 'updated title', 'article' => 'updated article']);

//to delete the document from index
$index->delete(12);
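Conceptually, dynamic updates work because an inverted index can be patched one document at a time. The toy `TinyIndex` class below is a hypothetical in-memory sketch of that idea. TNTSearch stores its index in SQLite and its internals differ.

```python
import re
from collections import defaultdict

class TinyIndex:
    """Toy in-memory inverted index supporting per-document updates."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.terms_by_doc = {}            # doc id -> terms, needed for clean deletes

    def insert(self, doc_id, text):
        terms = set(re.findall(r"\w+", text.lower()))
        self.terms_by_doc[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        for term in self.terms_by_doc.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def update(self, doc_id, text):
        # Remove the stale postings first, then index the new text.
        self.delete(doc_id)
        self.insert(doc_id, text)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))

index = TinyIndex()
index.insert(11, "new article")
index.update(11, "updated article")
print(index.search("new"))      # []
print(index.search("updated"))  # [11]
index.delete(11)
print(index.search("article"))  # []
```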

Custom Tokenizer

First, create your own Tokenizer class. It should extend the AbstractTokenizer class, define the word-split $pattern value, and implement TokenizerInterface:

use TeamTNT\TNTSearch\Support\AbstractTokenizer;
use TeamTNT\TNTSearch\Support\TokenizerInterface;

class SomeTokenizer extends AbstractTokenizer implements TokenizerInterface
{
    static protected $pattern = '/[\s,\.]+/';

    public function tokenize($text) {
        return preg_split($this->getPattern(), strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }
}

This tokenizer will split words using spaces, commas and periods.
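For a quick feel of what that pattern does, the same split can be reproduced outside PHP. This sketch mirrors the regular expression and the lowercase-then-split behaviour shown above; the `tokenize` function name is just illustrative.

```python
import re

# Same separator pattern as the PHP example: whitespace, commas and periods.
PATTERN = re.compile(r"[\s,\.]+")

def tokenize(text):
    """Lowercase the text, split on separators, and drop empty tokens."""
    return [token for token in PATTERN.split(text.lower()) if token]

print(tokenize("Romeo, Juliet. And  Hamlet"))  # ['romeo', 'juliet', 'and', 'hamlet']
```

The filter on empty tokens plays the role of PREG_SPLIT_NO_EMPTY, which matters when the text starts or ends with a separator.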

After you have the tokenizer ready, you should pass it to TNTIndexer via the setTokenizer method.

$someTokenizer = new SomeTokenizer;

$indexer = new TNTIndexer;
$indexer->setTokenizer($someTokenizer);

Another way would be to pass the tokenizer via config:

use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;

$tnt->loadConfig([
    'driver'    => 'mysql',
    'host'      => 'localhost',
    'database'  => 'dbname',
    'username'  => 'user',
    'password'  => 'pass',
    'storage'   => '/var/www/tntsearch/examples/',
    'stemmer'   => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class, // optional
    'tokenizer' => \TeamTNT\TNTSearch\Support\SomeTokenizer::class
]);

$indexer = $tnt->createIndex('name.index');
$indexer->query('SELECT id, article FROM articles;');
$indexer->run();

Geo Search


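$candyShopIndexer = new TNTGeoIndexer;
$candyShopIndexer->loadConfig($config);
$candyShopIndexer->createIndex('candyShops.index');
$candyShopIndexer->query('SELECT id, longitude, latitude FROM candy_shops;');
$candyShopIndexer->run();


$currentLocation = [
    'longitude' => 11.576124,
    'latitude'  => 48.137154
];

$distance = 2; //km

$candyShopIndex = new TNTGeoSearch();
$candyShopIndex->loadConfig($config);
$candyShopIndex->selectIndex('candyShops.index');

$candyShops = $candyShopIndex->findNearest($currentLocation, $distance, 10);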
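A "nearest within N km" lookup like the one above boils down to a great-circle distance filter followed by a sort. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. It is an assumption-laden illustration: the `find_nearest` helper and the sample shops are hypothetical, and TNTSearch's own geo index may compute distances differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two {'latitude', 'longitude'} points."""
    lat1, lon1 = math.radians(a["latitude"]), math.radians(a["longitude"])
    lat2, lon2 = math.radians(b["latitude"]), math.radians(b["longitude"])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def find_nearest(origin, places, max_km, limit):
    """Ids of places within `max_km` of `origin`, closest first, at most `limit`."""
    hits = sorted(
        (haversine_km(origin, place), place["id"])
        for place in places
    )
    return [place_id for distance, place_id in hits if distance <= max_km][:limit]

current_location = {"longitude": 11.576124, "latitude": 48.137154}
candy_shops = [  # hypothetical rows from the candy_shops table
    {"id": 1, "longitude": 11.576124, "latitude": 48.146154},  # about 1 km north
    {"id": 2, "longitude": 11.576124, "latitude": 48.237154},  # about 11 km north
    {"id": 3, "longitude": 11.576124, "latitude": 48.141154},  # about 0.4 km north
]

print(find_nearest(current_location, candy_shops, max_km=2, limit=10))  # [3, 1]
```

A linear scan like this is fine for a few thousand rows; a real geo index avoids checking every point.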


use TeamTNT\TNTSearch\Classifier\TNTClassifier;

$classifier = new TNTClassifier();
$classifier->learn("A great game", "Sports");
$classifier->learn("The election was over", "Not sports");
$classifier->learn("Very clean match", "Sports");
$classifier->learn("A clean but forgettable game", "Sports");

$guess = $classifier->predict("It was a close election");
var_dump($guess['label']); //returns "Not sports"
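The prediction above can be reproduced with a small naive Bayes model. This sketch assumes a multinomial model with add-one smoothing and a simple word tokenizer; the library's exact tokenization, stemming and smoothing may differ, and the `NaiveBayes` class is purely illustrative.

```python
import math
import re
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of training docs
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocabulary = set()

    @staticmethod
    def _tokens(text):
        return re.findall(r"\w+", text.lower())

    def learn(self, text, label):
        tokens = self._tokens(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocabulary.update(tokens)

    def predict(self, text):
        tokens = self._tokens(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Work in log space to avoid underflow on long texts.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

classifier = NaiveBayes()
classifier.learn("A great game", "Sports")
classifier.learn("The election was over", "Not sports")
classifier.learn("Very clean match", "Sports")
classifier.learn("A clean but forgettable game", "Sports")

print(classifier.predict("It was a close election"))  # Not sports
```

Even though three of the four training examples are "Sports", the words "was" and "election" only appear in the other class, which outweighs the prior.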

Saving the classifier

$classifier->save('sports.cls');

Loading the classifier

$classifier = new TNTClassifier();
$classifier->load('sports.cls');




Premium products

If you're using TNT Search and finding it useful, take a look at our premium analytics tool:

Support us on Open Collective


You're free to use this package, but if it makes it to your production environment, we would highly appreciate you sending us a PS4 game of your choice. This way you support us to further develop and add new features.

Our address is: TNT Studio, Sv. Mateja 19, 10010 Zagreb, Croatia.

We'll publish all received games here

Author: Teamtnt
Source Code: 
License: MIT license

#laravel #search #php #searchengine 


Toshi: A Full-text Search Engine in Rust


What is a Toshi?

Toshi is a three year old Shiba Inu. He is a very good boy and is the official mascot of this project. Toshi personally reviews all code before it is committed to this repository and is dedicated to only accepting the highest quality contributions from his human. He will, though, accept treats for easier code reviews.

Please note that this is far from production ready. Toshi is still under active development; I'm just slow.


Toshi is meant to be a full-text search engine similar to Elasticsearch. Toshi strives to be to Elasticsearch what Tantivy is to Lucene.


Toshi will always target stable Rust and will try its best to never make any use of unsafe Rust. While underlying libraries may make some use of unsafe, Toshi will make a concerted effort to vet these libraries in an effort to be completely free of unsafe Rust usage. I chose this because I felt that for Toshi to actually become an attractive option for people to consider, it would have to be safe, stable and consistent. This is why stable Rust was chosen: for the guarantees and safety it provides. I did not want to go down the rabbit hole of using nightly features to then have issues with their stability later on. Since Toshi is not meant to be a library, I'm perfectly fine with having this requirement because people who would want to use this more than likely will take it off the shelf and not modify it. My motivation was to cater to that use case when building Toshi.

Build Requirements

At this time Toshi should build and work fine on Windows, Mac OS X, and Linux. To build it you are going to need Rust 1.39.0 and Cargo installed. You can get Rust easily from rustup.


There is a default configuration file in config/config.toml:

host = ""
port = 8080
path = "data2/"
writer_memory = 200000000
log_level = "info"
json_parsing_threads = 4
bulk_buffer_size = 10000
auto_commit_duration = 10
experimental = false

[experimental_features]
master = true
nodes = [
]

[merge_policy]
kind = "log"
min_merge_size = 8
min_layer_size = 10_000
level_log_size = 0.75


Host

host = "localhost"

The hostname Toshi will bind to upon start.

Port

port = 8080

The port Toshi will bind to upon start.

Path

path = "data/"

The data path where Toshi will store its data and indices.

Writer Memory

writer_memory = 200000000

The amount of memory (in bytes) Toshi should allocate to commits for new documents.

Log Level

log_level = "info"

The detail level to use for Toshi's logging.

Json Parsing

json_parsing_threads = 4

When Toshi does a bulk ingest of documents it will spin up a number of threads to parse the document's json as it's received. This controls the number of threads spawned to handle this job.

Bulk Buffer

bulk_buffer_size = 10000

This will control the buffer size for parsing documents into an index. It will control the amount of memory a bulk ingest will take up by blocking when the message buffer is filled. If you want to go totally off the rails you can set this to 0 in order to make the buffer unbounded.
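The blocking behaviour described here is ordinary backpressure from a bounded buffer. Toshi is written in Rust and uses its own channel types, so the sketch below is only an analogy using Python's standard `queue.Queue`, which happens to share the convention that a size of 0 means unbounded. The `make_buffer` helper is hypothetical.

```python
import queue

def make_buffer(bulk_buffer_size):
    """A bounded buffer; as with the setting above, 0 means unbounded."""
    return queue.Queue(maxsize=bulk_buffer_size)

# A bounded buffer refuses (or blocks on) new items once it is full.
bounded = make_buffer(2)
bounded.put("doc-1")
bounded.put("doc-2")
try:
    bounded.put_nowait("doc-3")  # a plain put() would block here instead
    overflowed = False
except queue.Full:
    overflowed = True
print(overflowed)  # True

# An unbounded buffer accepts everything, so memory use is limited only by the producer.
unbounded = make_buffer(0)
for i in range(1000):
    unbounded.put_nowait(i)
print(unbounded.qsize())  # 1000
```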

Auto Commit Duration

auto_commit_duration = 10

This controls how often an index will automatically commit documents if there are docs to be committed. Set this to 0 to disable this feature, but you will have to do commits yourself when you submit documents.

Merge Policy

kind = "log"

Tantivy will merge index segments according to the configuration outlined here. There are 2 options for this. The first is "log", which is the default segment merge behavior. It takes 3 additional values, any of which can be omitted to use Tantivy's default value. The default values are listed below.

min_merge_size = 8
min_layer_size = 10_000
level_log_size = 0.75

In addition there is the "nomerge" option, in which Tantivy will do no merging of segments.

Experimental Settings

experimental = false

[experimental_features]
master = true
nodes = [
]

In general these settings aren't ready for usage yet as they are very unstable or flat out broken. Right now the distribution of Toshi is behind this flag, so if experimental is set to false then all these settings are ignored.

Building and Running

Toshi can be built using cargo build --release. Once Toshi is built you can run ./target/release/toshi from the top level directory to start Toshi according to the configuration in config/config.toml

You should get a startup message like this.

  ______         __   _   ____                 __
 /_  __/__  ___ / /  (_) / __/__ ___ _________/ /
  / / / _ \(_-</ _ \/ / _\ \/ -_) _ `/ __/ __/ _ \
 /_/  \___/___/_//_/_/ /___/\__/\_,_/_/  \__/_//_/
 Such Relevance, Much Index, Many Search, Wow
 INFO  toshi::index > Indexes: []

You can verify Toshi is running with:

curl -X GET http://localhost:8080/

which should return:

{
  "name": "Toshi Search",
  "version": "0.1.1"
}

Once Toshi is running, it's best to check the requests.http file in the root of this project to see some more examples of usage.

Example Queries

Term Query

{ "query": {"term": {"test_text": "document" } }, "limit": 10 }

Fuzzy Term Query

{ "query": {"fuzzy": {"test_text": {"value": "document", "distance": 0, "transposition": false } } }, "limit": 10 }

Phrase Query

{ "query": {"phrase": {"test_text": {"terms": ["test","document"] } } }, "limit": 10 }

Range Query

{ "query": {"range": { "test_i64": { "gte": 2012, "lte": 2015 } } }, "limit": 10 }

Regex Query

{ "query": {"regex": { "test_text": "d[ou]{1}c[k]?ument" } }, "limit": 10 }
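To check which terms a pattern like this would accept, it can be tried against a few candidate words. The sketch below assumes the pattern must match a whole indexed term, which is how automaton-based term regex queries usually behave; that assumption and the sample terms are not taken from the Toshi docs.

```python
import re

pattern = re.compile(r"d[ou]{1}c[k]?ument")

terms = ["document", "ducument", "dockument", "documents", "dcument"]

# fullmatch mimics whole-term matching: "documents" is rejected by the trailing "s".
matching = [term for term in terms if pattern.fullmatch(term)]
print(matching)  # ['document', 'ducument', 'dockument']
```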

Boolean Query

{ "query": {"bool": {"must": [ { "term": { "test_text": "document" } } ], "must_not": [ {"range": {"test_i64": { "gt": 2017 } } } ] } }, "limit": 10 }


To try any of the above queries, send it as the JSON body of a POST request to the index, for example:

curl -X POST http://localhost:8080/test_index -H 'Content-Type: application/json' -d '{ "query": {"term": {"test_text": "document" } }, "limit": 10 }'

Note that limit is optional; 10 is the default value. It's only included here for completeness.
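The same request can be assembled programmatically. This is a minimal sketch using only the Python standard library. The `term_query`, `bool_query` and `build_request` helpers are hypothetical names, and the URL and index name simply repeat the curl example above.

```python
import json
import urllib.request

def term_query(field, value, limit=10):
    """Body for a term query, mirroring the JSON shown above."""
    return {"query": {"term": {field: value}}, "limit": limit}

def bool_query(must, must_not, limit=10):
    """Body for a boolean query built from other query clauses."""
    return {"query": {"bool": {"must": must, "must_not": must_not}}, "limit": limit}

def build_request(index, body, base_url="http://localhost:8080"):
    """Prepare (but do not send) a POST request against an index."""
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_request("test_index", term_query("test_text", "document"))
print(request.full_url)  # http://localhost:8080/test_index

# To send it to a running instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

For example, `bool_query([{"term": {"test_text": "document"}}], [{"range": {"test_i64": {"gt": 2017}}}])` reproduces the boolean query body from the examples.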

Running Tests

cargo test

Download Details:
Author: toshi-search
Source Code:
License: MIT License

#rust  #rustlang  #machinelearing #searchengine 


Zina maier


Skyrocket Your Digital Platforms By Using Binance Smart Chain

The blockchain is the key to everything when it comes to cryptocurrencies and non-fungible tokens. These two platforms have established themselves as the two pillars of the crypto ecosystem, but the foundation of these pillars is laid by the power of blockchain. There are numerous blockchains in the crypto ecosystem, but none has come close to the Binance Smart Chain in terms of blockchain fundamentals. It can rapidly elevate any crypto platform that is built on it.

#BinanceSmartChainDevelopment #BinanceSmartChain #BinanceSmartChainDevelopmentServices #BinanceSmartChainDevelopmentCompany #BinanceSmartChainDevelopmentServices #BinanceChain #etherum #synthetix #searchengine #testnet 

Julie Murray


Hire Best SEO Expert | Best Local SEO Marketing Consultant 2022

Just like the rest of the world, the marketing industry develops at the speed of light. Several decades ago, people couldn't imagine that some products and services would no longer be advertised in magazines, newspapers, and on billboards. Today these marketing channels are rarely used. We live in a digital epoch in which traditional marketing cannot reach the necessary audience. Now, online marketing is the most effective way to promote a business, and predictions say that the demand for it will only grow.

We know the Google Ads platform in & out and can help you expand your paid search service offerings along with helping you add a second revenue stream for your agency. Our team of Google certified experts acts as an extension to your in-house team, serving you with the required expertise and productivity.

List of Top 10 SEO Marketing Consultant Companies:

  1. Auxano Global Services
  2. Bamboo Apps
  3. AccelOne
  4. Slick Development
  5. Oxagile
  6. Smart & Soft
  7. Adexin
  8. IT Plus
  9. Ovolab
  10. Celadon

1. Auxano Global Services

Auxano Global Services is an SEO marketing consultant that offers a comprehensive array of professional search engine optimization services to get your business more visibility in search using only trustworthy, future-proof, white hat SEO techniques. Our SEO experts work closely with our clients to develop personalized SEO strategies that drive long-term profitability. By using a proven, efficient and effective methodology, we are able to create high-quality, measurable results.

2. Bamboo Apps

At Bamboo Apps, we're committed to perfection in the design and simplicity of processes that make mobile development hassle-free and let our clients focus on growing their businesses with beautiful apps. We handle every aspect of the development of native and cross-platform apps: UX/UI design, front-end, and back-end programming. We also apply our extensive expertise in automotive, education & e-learning, insurance, healthcare, and IoT to meet industry-specific requirements.

3. AccelOne

Our company was created by a small team of seasoned professionals with almost 100 years of combined experience in the IT industry who have come together with the common goal of solving software development's biggest challenge: delivering quality solutions on time and on budget. Now, with multi-national teams across the Americas, we are ready to help you with your software consulting and staffing needs.


#seo #seoexperts #topseoexperts  #searchengine #searchengineoptimization  #hireseoexperts  #topseoexpertsforhire

Nadine Daiz


Hire 10+ Best Local SEO Companies in the World 2022

Approximately 93 percent of U.S. consumers search for local businesses online and about 88 percent of SEO local mobile searches result in a store visit or phone call within 24 hours. Local SEO is the process of improving the local search visibility of small and medium-sized businesses (SMBs), brick-and-mortar stores and multiple-location businesses within a geographic area.

Need expert help to unravel what is local SEO and how to boost your local SEO ranking? The best local SEO companies can answer all your local SEO marketing questions and guide you with your local SEO optimization efforts.

List of Best 10 Local SEO companies:

  1. Auxano Global Services
  2. Wezom Mobile
  3. Orangesoft
  4. Altimize
  5. Genium
  6. Rangle
  7. Fusion Informatics
  8. BIO communication agency
  9. CactusSoft
  10. CreativeBox

1. Auxano Global Services

Auxano Global Services is a top local SEO company based in India that provides comprehensive local search engine optimization services to businesses worldwide. We understand that local SEO is essential to brand success. That is why we are here to assist you with your local digital marketing and local search optimization endeavours. To ensure we are on the same page with your team, our local SEO experts create a local SEO checklist that outlines each local SEO strategy included in your package.

2. Wezom Mobile

For over 20 years we have been developing custom IT solutions for medium-sized businesses and corporations. We specialize in Logistics & Supply Chain, and also have extensive experience in real estate and B2B eCommerce.

3. Orangesoft

Orangesoft is a mobile app & web development company from Belarus. We started guiding companies into mobile and web development in 2011 and have successfully completed more than 300 projects ever since. Over the years, we have become a full-cycle software development company delivering highly productive and cost-effective app development solutions across various domains.


#seo #localseo  #localseoexperts  #searchengine #searchengineoptimization #toplocalseoexperts  #hirelocalseoexperts  #localseoexpertsforhire 
