Devs are the best early adopters: they pick up new technology early and help spread it to their non-technical peers. That's why gerev is focused on making a product devs adore and love ❤️
For finding internal pages fast ⚡️
For finding code snippets and code examples 🧑💻
Coming Soon...
🙏 - by the community
Enables searching using natural language, such as "How to do X", "How to connect to Y", or "Do we support Z?"
Getting Started
Install nvidia container toolkit on the host machine.
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
Then run the Docker container like so (with GPU support):
docker run --gpus all -p 80:80 -v ~/.gerev/storage:/opt/storage gerev/gerev
Or, without a GPU:
docker run -p 80:80 -v ~/.gerev/storage:/opt/storage gerev/gerev
See CONTRIBUTING.md
Find any conversation, doc, or internal page in seconds ⏲️⚡️
Join 100+ devs by hosting your own gerev instance, become a hero within your org! 💪
Author: GerevAI
Source Code: https://github.com/GerevAI/gerev
License: AGPL-3.0 license
The world of technology has come a long way since the first search engine was introduced in the early 90s. Today, there are numerous search engines available, with Google being the most popular one. But Google’s dominance in the search engine market is not the only thing that makes it stand out. It has been constantly evolving to provide its users with the best possible experience.
On the other hand, OpenAI’s ChatGPT is a language model that uses deep learning to generate human-like responses to natural language inputs. It is not a search engine, but rather an AI language model that can be used for various applications, including chatbots and virtual assistants. Although both Google search engine and ChatGPT have different uses, they both have AI at their core, making them powerful tools in the technology landscape.
In this article, we will discuss the differences between Google search engine and OpenAI’s ChatGPT, and how they are being used in the technology world.
Google search engine is one of the most popular search engines in the world, with billions of searches being made every day. It is known for its simplicity and speed, and its ability to provide relevant results. Google’s search engine uses a complex algorithm to determine the relevance of web pages and rank them accordingly. The algorithm takes into account various factors, including keywords, relevance, and the popularity of the website. The more relevant the results, the higher they appear on the search engine results page (SERP).
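As a rough sketch of the idea (and emphatically not Google's actual algorithm, which is proprietary), ranking can be modeled as scoring pages by how often query terms appear, weighted by a popularity signal:

```python
# Toy ranking sketch: NOT Google's real algorithm, just the general idea.
# Pages are scored by query-term frequency, weighted by a hypothetical
# popularity score, then sorted for the results page (SERP).

def rank(pages, query):
    terms = query.lower().split()

    def score(page):
        words = page["text"].lower().split()
        term_hits = sum(words.count(t) for t in terms)
        return term_hits * page["popularity"]

    return sorted(pages, key=score, reverse=True)

pages = [
    {"url": "a.example", "text": "python tutorial for beginners", "popularity": 2.0},
    {"url": "b.example", "text": "python python reference", "popularity": 1.0},
    {"url": "c.example", "text": "cooking recipes", "popularity": 9.0},
]
results = rank(pages, "python tutorial")
print([p["url"] for p in results])  # pages matching the query rank first
```

Real ranking combines hundreds of signals; this toy version only illustrates why relevant, popular pages float to the top of the SERP.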
Google search engine has been constantly evolving over the years to provide users with the best possible experience. One of the major changes it has undergone is the incorporation of AI into its search algorithm. This has enabled Google to provide users with more relevant and personalized results. For example, Google’s AI can understand the context of a search query and provide results that are relevant to the user’s needs. This has revolutionized the way people use search engines and has made it easier to find the information they are looking for.
As an AI language model rather than a search engine, ChatGPT has been trained on vast amounts of data, making it capable of generating responses to a wide range of questions and inputs.
ChatGPT has been designed to provide users with a more personalized and natural experience when interacting with chatbots or virtual assistants. Unlike traditional chatbots, which use pre-defined responses, ChatGPT generates responses in real-time based on the input it receives. This makes it possible for ChatGPT to provide human-like responses to users, making their experience more natural and engaging.
Although both Google search engine and ChatGPT have AI at their core, they are different in several ways. Here are some of the key differences between the two:
No. | Aspect | Google Search Engine vs ChatGPT |
---|---|---|
1 | Purpose | Google Search Engine is designed to provide relevant information to users in response to a specific search query. ChatGPT is designed to answer natural language questions and provide conversational assistance to users. |
2 | Functionality | Google Search Engine searches the web and indexes web pages, images, videos, and other online content to provide relevant results to users. ChatGPT uses advanced AI and NLP technology to understand user queries and provide relevant answers. |
3 | User Interaction | Google Search Engine requires users to type in a search query and then presents relevant results. ChatGPT interacts with users through natural language conversation and provides real-time responses. |
4 | Search Quality | Google Search Engine uses complex algorithms and ranking factors to provide the most relevant results to users. ChatGPT uses advanced NLP and AI technology to understand user questions and provide relevant answers. |
5 | User Experience | Google Search Engine provides a text-based interface for users to search and find information. ChatGPT provides a conversational interface for users to interact and receive information. |
6 | Relevance | Google Search Engine uses complex algorithms to determine the relevance of search results to the user’s query. ChatGPT uses advanced AI and NLP technology to understand user questions and provide relevant answers. |
7 | Accuracy | Google Search Engine strives to provide accurate and up-to-date information to users. ChatGPT also uses advanced AI and NLP technology to provide accurate and relevant answers to users. |
8 | Search Speed | Google Search Engine provides fast and efficient search results, with millions of pages indexed and search results generated in milliseconds. ChatGPT also provides real-time responses to users, with advanced AI and NLP technology used to quickly process user queries. |
9 | Personalization | Google Search Engine provides personalized search results based on a user’s search history, location, and other factors. ChatGPT also provides personalized responses to users based on their past interactions and preferences. |
10 | User Data | Google Search Engine collects and stores user data, including search history and location data, for personalized search results and ad targeting. ChatGPT also collects and stores user data for personalized responses and to improve the accuracy of its AI and NLP technology. |
11 | Privacy | Google Search Engine has privacy policies in place to protect user data, but users’ search history and other information is still collected and stored. ChatGPT also has privacy policies in place to protect user data, but users’ conversation history and other information is still collected and stored. |
12 | Data Sources | Google Search Engine searches the entire web and indexes web pages, images, videos, and other online content to provide relevant results. ChatGPT uses a vast database of information, including structured and unstructured data, to provide relevant answers to users. |
13 | Search Types | Google Search Engine supports a wide range of search types, including web search, image search, video search, and more. ChatGPT is primarily designed to answer natural language questions and provide conversational assistance. |
14 | Search Results | Google Search Engine provides a wide range of search results, including web pages, images, videos, news articles, and more. ChatGPT provides specific answers to user questions, often in the form of text or numerical data. |
15 | User Feedback | Google Search Engine provides users with the ability to provide feedback on search results and improve their accuracy. ChatGPT also provides users with the ability to provide feedback and improve the accuracy of its AI and NLP technology. |
Google Search Engine is a web-based search tool that allows users to search for information on the Internet. It uses algorithms to search for and retrieve relevant information from billions of web pages and other sources. ChatGPT, on the other hand, is an AI-powered language model that provides users with relevant information and responses to their questions and queries in natural language. While both are designed to provide users with information, Google Search Engine relies on web-based sources while ChatGPT uses its own advanced AI technology to generate responses.
ChatGPT is not designed to replace Google Search Engine. While both can provide users with information, Google Search Engine is more comprehensive and covers a wider range of information sources, including web pages, news articles, images, and more. ChatGPT is best suited for specific, conversational queries and can provide users with a more interactive and personalized experience.
Both ChatGPT and Google Search Engine strive to provide accurate information, but their methods and sources of information can vary. Google Search Engine uses complex algorithms and a vast database of web pages to provide accurate results, while ChatGPT uses advanced AI technology to generate responses. However, the accuracy of ChatGPT’s responses can depend on the quality of the training data and the complexity of the query.
ChatGPT is a conversational AI model that generates responses in real-time. This means that users can receive immediate responses to their queries, making it a quick and convenient way to access information. Google Search Engine also provides real-time results, but the speed of results can vary depending on the complexity of the query and the speed of the user’s internet connection.
Both ChatGPT and Google Search Engine have security measures in place to protect user data and ensure the privacy of users. However, ChatGPT has the advantage of being a closed system, meaning that user data is not shared with third-party sources or used for advertising purposes. Google Search Engine, on the other hand, relies on user data to provide targeted advertising, which may result in the sharing of user data with third-party sources. Ultimately, the security and privacy of both systems will depend on the user’s privacy settings and the security measures implemented by the service provider.
Original article source at: https://www.cloudbooklet.com/
The Unsplash Dataset is made up of more than 250,000 contributing global photographers and data sourced from hundreds of millions of searches across a nearly unlimited number of uses and contexts. Due to the breadth of intent and semantics contained within the Unsplash dataset, it enables new opportunities for research and learning.
The Unsplash Dataset is offered in two datasets:
As the Unsplash library continues to grow, we’ll release updates to the dataset with new fields and new images, with each subsequent release being semantically versioned.
We welcome any feedback regarding the content of the datasets or their format. With your input, we hope to close the gap between the data we provide and the data that you would like to leverage. You can open an issue to report a problem or to let us know what you would like to see in the next release of the datasets.
For more on the Unsplash Dataset, see our announcement and site.
The Lite dataset contains all of the same fields as the Full dataset, but is limited to ~25,000 photos. It can be used for both commercial and non-commercial usage, provided you abide by the terms.
⬇️ Download the Lite dataset [~650MB compressed, ~1.4GB raw]
The Full dataset is available for non-commercial usage, and all uses must abide by the terms. To access it, go to unsplash.com/data and request access. The dataset weighs 20 GB compressed (43 GB raw).
See the documentation for a complete list of tables and fields.
Examples are provided for loading the dataset in common formats.
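As one hypothetical example, the datasets are distributed as TSV files, so the standard csv module can read them; the file and column names below are illustrative only (check the official documentation for the actual schema):

```python
# Sketch: load an Unsplash-style TSV file with the standard library.
# The filename ("photos.tsv000") and column names are assumptions for
# illustration; consult the dataset documentation for the real schema.
import csv
import io

# A tiny in-memory stand-in for the real TSV file.
sample = "photo_id\tphoto_url\nabc123\thttps://unsplash.com/photos/abc123\n"

with io.StringIO(sample) as f:  # use open("photos.tsv000") for the real file
    rows = list(csv.DictReader(f, delimiter="\t"))

print(rows[0]["photo_id"])
```

For the real dataset you would pass an open file handle instead of the in-memory sample.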
We're making this data open and available with the hopes of enabling researchers and developers to discover interesting and useful connections in the data.
We'd love to see what you create, whether that's a research paper, a machine learning model, a blog post, or just an interesting discovery in the data. Send us an email at data@unsplash.com.
If you're using the dataset in a research paper, you can attribute the dataset as "Unsplash Lite Dataset 1.2.0" or "Unsplash Full Dataset 1.2.0" and link to the permalink unsplash.com/data.
The Unsplash Dataset is made available for research purposes. It cannot be used to redistribute the images contained within. To use the Unsplash library in a product, see the Unsplash API.
Author: Unsplash
Source Code: https://github.com/unsplash/datasets
A tensor-based search and analytics engine that seamlessly integrates with applications and websites. Marqo allows developers to turbocharge search functionality with the latest machine learning models, in 3 lines of code.
⚡ Performance
🤖 Machine Learning
☁️ Cloud-native
🌌 End-to-end
🍱 Managed cloud
📗 Quick start | Build your first application with Marqo in under 5 minutes. |
🔍 What is tensor search? | A beginner's guide to the fundamentals of Marqo and tensor search. |
🖼 Marqo for image data | Building text-to-image search in Marqo in 5 lines of code. |
📚 Marqo for text | Building a multilingual database in Marqo. |
🔮 Integrating Marqo with GPT | Making GPT a subject matter expert by using Marqo as a knowledge base. |
🎨 Marqo for Creative AI | Combining stable diffusion with semantic search to generate and categorise 100k images of hotdogs. |
🦾 Features | Marqo's core features. |
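Under the hood, tensor search boils down to ranking documents by the similarity between their embedding vectors and the query's embedding. A minimal, model-free sketch of that idea (the tiny hand-made vectors below stand in for real model outputs):

```python
# Sketch of the tensor-search idea: rank documents by cosine similarity
# between embedding vectors. Real systems like Marqo use ML models to
# produce the vectors; these hand-made 3-d vectors are placeholders.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

docs = {
    "spacesuit": [0.9, 0.1, 0.0],
    "travelogue": [0.1, 0.8, 0.2],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "outfit for the moon"

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # the semantically closest document comes first
```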
Marqo requires Docker. To install Docker, go to the Docker official website. Ensure that Docker has at least 8GB of memory and 50GB of storage.
Use docker to run Marqo (Mac users with M-series chips will need to go here):
docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:latest
pip install marqo
import marqo
mq = marqo.Client(url='http://localhost:8882')
mq.index("my-first-index").add_documents([
{
"Title": "The Travels of Marco Polo",
"Description": "A 13th-century travelogue describing Polo's travels"
},
{
"Title": "Extravehicular Mobility Unit (EMU)",
"Description": "The EMU is a spacesuit that provides environmental protection, "
"mobility, life support, and communications for astronauts",
"_id": "article_591"
}]
)
results = mq.index("my-first-index").search(
q="What is the best outfit to wear on the moon?", searchable_attributes=["Title", "Description"]
)
- `mq` is the client that wraps the `marqo` API.
- `add_documents()` takes a list of documents, represented as python dicts, for indexing.
- `add_documents()` creates an index with default settings, if one does not already exist.
- Documents can optionally include an `_id` field; otherwise, Marqo will generate one.

Let's have a look at the results:
# let's print out the results:
import pprint
pprint.pprint(results)
{
'hits': [
{
'Title': 'Extravehicular Mobility Unit (EMU)',
'Description': 'The EMU is a spacesuit that provides environmental protection, mobility, life support, and '
'communications for astronauts',
'_highlights': {
'Description': 'The EMU is a spacesuit that provides environmental protection, '
'mobility, life support, and communications for astronauts'
},
'_id': 'article_591',
'_score': 0.61938936
},
{
'Title': 'The Travels of Marco Polo',
'Description': "A 13th-century travelogue describing Polo's travels",
'_highlights': {'Title': 'The Travels of Marco Polo'},
'_id': 'e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a',
'_score': 0.60237324
}
],
'limit': 10,
'processingTimeMs': 49,
'query': 'What is the best outfit to wear on the moon?'
}
- `limit` is the maximum number of hits to be returned. This can be set as a parameter during search.
- Each hit has a `_highlights` field. This was the part of the document that matched the query the best.

Retrieve a document by ID.
result = mq.index("my-first-index").get_document(document_id="article_591")
Note that adding a document with `add_documents` again, using the same `_id`, will cause the document to be updated.
Get information about an index.
results = mq.index("my-first-index").get_stats()
Perform a keyword search.
result = mq.index("my-first-index").search('marco polo', search_method=marqo.SearchMethods.LEXICAL)
Using the default tensor search method.
result = mq.index("my-first-index").search('adventure', searchable_attributes=['Title'])
Delete documents.
results = mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])
Delete an index.
results = mq.index("my-first-index").delete()
To power image and text search, Marqo allows users to plug and play with CLIP models from HuggingFace. Note that if you do not configure multimodal search, image URLs will be treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below:
settings = {
"treat_urls_and_pointers_as_images":True, # allows us to find an image file and index it
"model":"ViT-L/14"
}
response = mq.create_index("my-multimodal-index", **settings)
Images can then be added within documents as follows. You can use URLs from the internet (for example, S3) or from the disk of the machine:
response = mq.index("my-multimodal-index").add_documents([{
"My Image": "https://upload.wikimedia.org/wikipedia/commons/thumb/f/f2/Portrait_Hippopotamus_in_the_water.jpg/440px-Portrait_Hippopotamus_in_the_water.jpg",
"Description": "The hippopotamus, also called the common hippopotamus or river hippopotamus, is a large semiaquatic mammal native to sub-Saharan Africa",
"_id": "hippo-facts"
}])
Setting `searchable_attributes` to the image field (`['My Image']`) ensures only images are searched in this index:
results = mq.index("my-multimodal-index").search('animal', searchable_attributes=['My Image'])
You can then search using text as usual. Both text and image fields will be searched:
results = mq.index("my-multimodal-index").search('animal')
Searching using an image can be achieved by providing the image link.
results = mq.index("my-multimodal-index").search('https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Standing_Hippopotamus_MET_DP248993.jpg/440px-Standing_Hippopotamus_MET_DP248993.jpg')
The full documentation for Marqo can be found here https://docs.marqo.ai/.
Note that you should not run other applications on Marqo's OpenSearch cluster, as Marqo automatically changes and adapts the settings on the cluster.
Marqo does not yet support the docker-in-docker backend configuration for the arm64 architecture. This means that if you have an M series Mac, you will also need to run marqo's backend, marqo-os, locally.
To run Marqo on an M series Mac, follow the next steps.
In one terminal run the following command to start opensearch:
docker rm -f marqo-os; docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" marqoai/marqo-os:0.0.3-arm
In another terminal run the following command to launch Marqo:
docker rm -f marqo; docker run --name marqo --privileged \
-p 8882:8882 --add-host host.docker.internal:host-gateway \
-e "OPENSEARCH_URL=https://localhost:9200" \
marqoai/marqo:latest
Marqo is a community project with the goal of making tensor search accessible to the wider developer community. We are glad that you are interested in helping out! Please read this to get started.
1. Create a virtual env: `python -m venv ./venv`.
2. Activate the virtual environment: `source ./venv/bin/activate`.
3. Install requirements from the requirements file: `pip install -r requirements.txt`.
4. Run tests by running the tox file: cd into this dir and run `tox`.
5. If you update dependencies, make sure to delete the `.tox` dir and rerun.
6. Run the full test suite by using the command `tox` in this dir.
7. Create a pull request with an attached GitHub issue.
This readme is available in the following translations:
Author: marqo-ai
Source Code: https://github.com/marqo-ai/marqo
License: Apache-2.0 license
Para is a scalable, multitenant backend server/framework for object persistence and retrieval. It helps you build and prototype applications faster by taking care of backend operations. It can be a part of your JVM-based application or it can be deployed as standalone, multitenant API server with multiple applications and clients connecting to it.
The name "pára" means "steam" in Bulgarian. And just like steam is used to power stuff, you can use Para to power your mobile or web application backend.
(`DAO` natively)
+----------------------------------------------------------+
| ____ ___ _ ____ ___ _ |
| / __ \/ __` / ___/ __` / |
| / /_/ / /_/ / / / /_/ / |
| / .___/\__,_/_/ \__,_/ +-------------+
| /_/ | Persistence |
+-------------------+ +-----------------+ +-------------+
| REST API | | Search |---| Cache |
+---------+---------+--+--------+--------+---+------+------+
| | |
+---------+---------+ +--------+--------+ +------+------+
| Signed Requests | | Search Index | | Data Store |
| and JWT Tokens | | (Any) | | (Any) |
+----+---------^----+ +-----------------+ +-------------+
| |
+----v---------+-------------------------------------------+
| Clients: JavaScript, PHP, Java, C#, Android, iOS, et al. |
+----------------------------------------------------------+
Create an `application.conf` file in the same directory as the JAR package, then run:

java -jar -Dconfig.file=./application.conf para-*.jar
npm install -g para-cli
# run setup and set endpoint to either 'http://localhost:8080' or 'https://paraio.com'
# the keys for the root app are inside application.conf
$ para-cli setup
$ para-cli new-app "myapp" --name "My App"
Tagged Docker images for Para are located at `erudikaltd/para` on Docker Hub. It's highly recommended that you pull only release images like `:1.45.1` or `:latest_stable`, because the `:latest` tag can be broken or unstable. First, create an `application.conf` file and a `data` folder, then start the Para container:
$ touch application.conf && mkdir data
$ docker run -ti -p 8080:8080 --rm -v $(pwd)/data:/para/data \
-v $(pwd)/application.conf:/para/application.conf \
-e JAVA_OPTS="-Dconfig.file=/para/application.conf" erudikaltd/para:latest_stable
Environment variables:

- `JAVA_OPTS` - Java system properties, e.g. `-Dpara.port=8000`
- `BOOT_SLEEP` - startup delay, in seconds
Plugins
To use plugins, create a new `Dockerfile-plugins` which does a multi-stage build like so:
# change 1.XY.Z to the version you want to use
FROM erudikaltd/para:v1.XY.Z-base AS base
FROM erudikaltd/para-search-lucene:1.XY.Z AS search
FROM erudikaltd/para-dao-mongodb:1.XY.Z AS dao
FROM base AS final
COPY --from=search /para/lib/*.jar /para/lib
COPY --from=dao /para/lib/*.jar /para/lib
Then simply run $ docker build -f Dockerfile-plugins -t para-mongo .
Para can be compiled with JDK 8+:
To compile it you'll need Maven. Once you have it, just clone and build:
$ git clone https://github.com/erudika/para.git && cd para
$ mvn install -DskipTests=true
To generate the executable "fat-jar", run `mvn package`; it will be in `./para-jar/target/para-x.y.z-SNAPSHOT.jar`. Two JAR files will be generated in total - the fat one is a bit bigger in size.
To build the base package without plugins (excludes `para-dao-sql` and `para-search-lucene`), run:
$ cd para-jar && mvn -Pbase package
To run a local instance of Para for development, use:
$ mvn -Dconfig.file=./application.conf spring-boot:run
You can run Para as a standalone server by downloading the executable JAR and then:
$ java -jar para-X.Y.Z.jar
Then you can browse your objects through the Para Web Console at console.paraio.org. Simply change the API endpoint to be your local server and connect your access keys. The admin interface is client-side only and your secret key is never sent over the network. Instead, a JWT access token is generated locally and sent to the server on each request.
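The token flow described above can be illustrated with a short Python sketch (hand-rolled HS256 signing for illustration only; this is not Para's client code, and a real client should use a vetted JWT library):

```python
# Illustrative HS256 JWT construction, similar in spirit to what a Para
# client does locally: the secret key never leaves the machine, only the
# signed token travels with each request. Do not hand-roll JWTs in
# production; use a maintained library instead.
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_jwt(payload: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

# "app:myapp" and the secret are made-up values for the sketch.
token = make_jwt({"appid": "app:myapp"}, "my-secret-key")
print(len(token.split(".")))  # 3: header, payload, and signature
```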
Alternatively, you can build a WAR file and deploy it to your favorite servlet container:
$ cd para-war && mvn package
You can also integrate Para with your project by adding it as a dependency. Para is hosted on Maven Central. Here's the Maven snippet to include in your `pom.xml`:
<dependency>
<groupId>com.erudika</groupId>
<artifactId>para-server</artifactId>
<version>{see_green_version_badge_above}</version>
</dependency>
For building lightweight client-only applications connecting to Para, include only the client module:
<dependency>
<groupId>com.erudika</groupId>
<artifactId>para-client</artifactId>
<version>{see_green_version_badge_above}</version>
</dependency>
$ npm install -g para-cli
Use these client libraries to quickly integrate Para into your project:
Use these `DAO` implementations to connect to different databases:
- `AWSDynamoDAO` (included in `para-server`)
- `H2DAO` is the default `DAO` and it's part of the SQL plugin (packaged with the JAR file)

The `Search` interface is implemented by:
The `Cache` interface is implemented by:

- the default cache (included in `para-server`)

The `Queue` interface is implemented by:
- the `AWSQueue` class
- `LocalQueue` for single-host deployments and local development

Roadmap:

- `2.0` - migration to Quarkus, Java 13+ only, native image

Ask questions on Stack Overflow using the `para` tag.

1. Create your feature branch (`git checkout -b my-new-feature`)
2. Commit your changes (`git commit -am 'Added a new feature'`)
3. Push to the branch (`git push origin my-new-feature`)
)Please try to respect the code style of this project. To check your code, run it through the style checker:
mvn validate
For more information see CONTRIBUTING.md
We offer hosting and premium support at paraio.com, where you can try Para online with a free developer account. Browse and manage your users and objects, do backups, and edit permissions with a few clicks in the web console. By upgrading to a premium account you will be able to scale your projects up and down in seconds and manage multiple apps.
See how Para compares to other open source backend frameworks.
This project is fully funded and supported by Erudika - an independent, bootstrapped company.
Author: Erudika
Source Code: https://github.com/Erudika/para
License: Apache-2.0 license
The open big data serving engine - Store, search, organize and make machine-learned inferences over big data at serving time.
This is the primary repository for Vespa where all development is happening. New production releases from this repository's master branch are made each weekday from Monday through Thursday.
Use cases such as search, recommendation and personalization need to select a subset of data in a large corpus, evaluate machine-learned models over the selected data, organize and aggregate it and return it, typically in less than 100 milliseconds, all while the data corpus is continuously changing.
This is hard to do, especially with large data sets that need to be distributed over multiple nodes and evaluated in parallel. Vespa is a platform which performs these operations for you with high availability and performance. It has been in development for many years and is used on a number of large internet services and apps which serve hundreds of thousands of queries from Vespa per second.
Run your own Vespa instance: https://docs.vespa.ai/en/getting-started.html
Or deploy your Vespa applications to the cloud service: https://cloud.vespa.ai
Full documentation is at https://docs.vespa.ai.
We welcome contributions! See CONTRIBUTING.md to learn how to contribute.
If you want to contribute to the documentation, see https://github.com/vespa-engine/documentation
You do not need to build Vespa to use it, but if you want to contribute you need to be able to build the code. This section explains how to build and test Vespa. To understand where to make changes, see Code-map.md. Some suggested improvements with pointers to code are in TODO.md.
C++ and Java building is supported on CentOS Stream 8. The Java source can also be built on any platform having Java 17 and Maven installed. Use the following guide to set up a complete development environment using Docker for building Vespa, running unit tests and running system tests: Vespa development on CentOS Stream 8.
export MAVEN_OPTS="-Xms128m -Xmx1024m"
./bootstrap.sh java
mvn install --threads 1C
Use this if you only need to build the Java modules, otherwise follow the complete development guide above.
Author: Vespa-engine
Source Code: https://github.com/vespa-engine/vespa
License: Apache-2.0 license
TNTSearch is a full-text search (FTS) engine written entirely in PHP. A simple configuration allows you to add an amazing search experience in just minutes. Features include:
We also created some demo pages that show tolerant retrieval with n-grams in action. The package has a bunch of helper functions like Jaro-Winkler and Cosine similarity for distance calculations. It supports stemming for English, Croatian, Arabic, Italian, Russian, Portuguese and Ukrainian. If the built-in stemmers aren't enough, the engine lets you easily plug in any compatible Snowball stemmer. Some forks of the package even support Chinese. And please contribute other languages!
Unlike many other engines, the index can be easily updated without doing a reindex or using deltas.
The easiest way to install TNTSearch is via composer:
composer require teamtnt/tntsearch
Before you proceed, make sure your server meets the following requirements:
In order to be able to make full text search queries, you have to create an index.
Usage:
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig([
'driver' => 'mysql',
'host' => 'localhost',
'database' => 'dbname',
'username' => 'user',
'password' => 'pass',
'storage' => '/var/www/tntsearch/examples/',
'stemmer' => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class // optional
]);
$indexer = $tnt->createIndex('name.index');
$indexer->query('SELECT id, article FROM articles;');
//$indexer->setLanguage('german');
$indexer->run();
Important: the "storage" setting marks the folder where all of your indexes will be saved, so make sure you have permission to write to this folder; otherwise the indexer will fail with an exception.
Note: If your primary key is different from `id`, set it like:
$indexer->setPrimaryKey('article_id');
By default, the primary key isn't searchable. If you want to make it searchable, simply run:
$indexer->includePrimaryKey();
Searching for a phrase or keyword is trivial:
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
$res = $tnt->search("This is a test search", 12);
print_r($res); //returns an array of 12 document ids that best match your query
// to display the results you need an additional query against your application database
// SELECT * FROM articles WHERE id IN $res ORDER BY FIELD(id, $res);
The ORDER BY FIELD clause is important; otherwise, the database engine will not return the results in the required order.
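ORDER BY FIELD is MySQL-specific; if your database lacks it, the same re-ordering can be done in application code. A small sketch of the idea (in Python for brevity):

```python
# Sketch: re-order rows fetched with "WHERE id IN (...)" to match the
# relevance order returned by the search index. Databases generally do
# not guarantee IN-clause order, so we sort by rank explicitly.
result_ids = [7, 2, 9]                      # ids in relevance order (from the index)
rows = [{"id": 2}, {"id": 9}, {"id": 7}]    # rows as the database returned them

rank = {doc_id: pos for pos, doc_id in enumerate(result_ids)}
ordered = sorted(rows, key=lambda row: rank[row["id"]])
print([row["id"] for row in ordered])  # [7, 2, 9]
```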
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
//this will return all documents that have romeo in it but not juliet
$res = $tnt->searchBoolean("romeo -juliet");
//returns all documents that have romeo or hamlet in it
$res = $tnt->searchBoolean("romeo or hamlet");
//returns all documents that have either romeo AND juliet or prince AND hamlet
$res = $tnt->searchBoolean("(romeo juliet) or (prince hamlet)");
The fuzziness can be tweaked by setting the following member variables:
public $fuzzy_prefix_length = 2;
public $fuzzy_max_expansions = 50;
public $fuzzy_distance = 2; //represents the Levenshtein distance;
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
$tnt->fuzziness = true;
//when the fuzziness flag is set to true, the keyword juleit will return
//documents that match the word juliet, the default Levenshtein distance is 2
$res = $tnt->search("juleit");
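To see why "juleit" falls within the default distance of 2 from "juliet", here is a plain dynamic-programming sketch of the Levenshtein distance in Python (independent of TNTSearch's implementation):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance: prev[j] holds the cost
    # of turning the current prefix of 'a' into b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Swapping "ei" for "ie" costs two substitutions in plain Levenshtein:
print(levenshtein("juleit", "juliet"))  # 2
```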
Once you have created an index, you don't need to reindex it each time you make changes to your document collection. TNTSearch supports dynamic index updates.
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig($config);
$tnt->selectIndex("name.index");
$index = $tnt->getIndex();
//to insert a new document to the index
$index->insert(['id' => '11', 'title' => 'new title', 'article' => 'new article']);
//to update an existing document
$index->update(11, ['id' => '11', 'title' => 'updated title', 'article' => 'updated article']);
//to delete the document from index
$index->delete(12);
First, create your own tokenizer class. It should extend the AbstractTokenizer class, define the word-split $pattern value, and implement TokenizerInterface:
use TeamTNT\TNTSearch\Support\AbstractTokenizer;
use TeamTNT\TNTSearch\Support\TokenizerInterface;
class SomeTokenizer extends AbstractTokenizer implements TokenizerInterface
{
    static protected $pattern = '/[\s,\.]+/';

    public function tokenize($text) {
        return preg_split($this->getPattern(), strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }
}
This tokenizer will split words using spaces, commas and periods.
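The split behavior of that pattern can be sanity-checked with the same regex in Python (the tokenize helper here is just for illustration):

```python
import re

# Mirrors the PHP tokenizer above: lowercase the text, then split on
# runs of whitespace, commas and periods, dropping empty tokens.
PATTERN = re.compile(r"[\s,\.]+")

def tokenize(text):
    return [t for t in PATTERN.split(text.lower()) if t]

print(tokenize("Hello, World. This is TNTSearch"))
# ['hello', 'world', 'this', 'is', 'tntsearch']
```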
After you have the tokenizer ready, you should pass it to TNTIndexer via the setTokenizer method.
$someTokenizer = new SomeTokenizer;
$indexer = new TNTIndexer;
$indexer->setTokenizer($someTokenizer);
Another way would be to pass the tokenizer via config:
use TeamTNT\TNTSearch\TNTSearch;
$tnt = new TNTSearch;
$tnt->loadConfig([
'driver' => 'mysql',
'host' => 'localhost',
'database' => 'dbname',
'username' => 'user',
'password' => 'pass',
'storage' => '/var/www/tntsearch/examples/',
'stemmer' => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class, // optional
'tokenizer' => \TeamTNT\TNTSearch\Support\SomeTokenizer::class
]);
$indexer = $tnt->createIndex('name.index');
$indexer->query('SELECT id, article FROM articles;');
$indexer->run();
$candyShopIndexer = new TNTGeoIndexer;
$candyShopIndexer->loadConfig($config);
$candyShopIndexer->createIndex('candyShops.index');
$candyShopIndexer->query('SELECT id, longitude, latitude FROM candy_shops;');
$candyShopIndexer->run();
$currentLocation = [
'longitude' => 11.576124,
'latitude' => 48.137154
];
$distance = 2; //km
$candyShopIndex = new TNTGeoSearch();
$candyShopIndex->loadConfig($config);
$candyShopIndex->selectIndex('candyShops.index');
$candyShops = $candyShopIndex->findNearest($currentLocation, $distance, 10);
use TeamTNT\TNTSearch\Classifier\TNTClassifier;
$classifier = new TNTClassifier();
$classifier->learn("A great game", "Sports");
$classifier->learn("The election was over", "Not sports");
$classifier->learn("Very clean match", "Sports");
$classifier->learn("A clean but forgettable game", "Sports");
$guess = $classifier->predict("It was a close election");
var_dump($guess['label']); //returns "Not sports"
$classifier->save('sports.cls');
$classifier = new TNTClassifier();
$classifier->load('sports.cls');
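TNTClassifier is, to my understanding, a Naive Bayes text classifier; under that assumption, the core idea can be sketched in a few lines of Python (a toy re-implementation, not TNTSearch's actual code):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def __init__(self):
        self.docs = Counter()               # documents seen per label
        self.words = defaultdict(Counter)   # word counts per label
        self.vocab = set()

    def learn(self, text, label):
        self.docs[label] += 1
        for w in text.lower().split():
            self.words[label][w] += 1
            self.vocab.add(w)

    def predict(self, text):
        total = sum(self.docs.values())
        best, best_score = None, float("-inf")
        for label in self.docs:
            # log P(label) + sum of log P(word | label), Laplace-smoothed
            score = math.log(self.docs[label] / total)
            n = sum(self.words[label].values())
            for w in text.lower().split():
                score += math.log((self.words[label][w] + 1) / (n + len(self.vocab)))
            if score > best_score:
                best, best_score = label, score
        return best

clf = NaiveBayes()
clf.learn("A great game", "Sports")
clf.learn("The election was over", "Not sports")
clf.learn("Very clean match", "Sports")
clf.learn("A clean but forgettable game", "Sports")
print(clf.predict("It was a close election"))  # Not sports
```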
If you're using TNT Search and finding it useful, take a look at our premium analytics tool:
You're free to use this package, but if it makes it to your production environment, we would highly appreciate you sending us a PS4 game of your choice. This way you support us in further developing the package and adding new features.
Our address is: TNT Studio, Sv. Mateja 19, 10010 Zagreb, Croatia.
We'll publish all received games here
Author: Teamtnt
Source Code: https://github.com/teamtnt/tntsearch
License: MIT license
1649703660
Toshi is a three year old Shiba Inu. He is a very good boy and is the official mascot of this project. Toshi personally reviews all code before it is committed to this repository and is dedicated to only accepting the highest quality contributions from his human. He will, though, accept treats for easier code reviews.
Please note that this is far from production-ready; Toshi is still under active development (I'm just slow).
Toshi is meant to be a full-text search engine similar to Elasticsearch. Toshi strives to be to Elasticsearch what Tantivy is to Lucene.
Toshi will always target stable Rust and will try our best never to make any use of unsafe Rust. While underlying libraries may make some use of unsafe, Toshi will make a concerted effort to vet these libraries in an effort to be completely free of unsafe Rust usage. I chose this because I felt that for Toshi to actually become an attractive option for people to consider, it would have to be safe, stable and consistent, and stable Rust was chosen for exactly the guarantees and safety it provides. I did not want to go down the rabbit hole of using nightly features only to have issues with their stability later on. Since Toshi is not meant to be a library, I'm perfectly fine with having this requirement, because the people who want to use it will more than likely take it off the shelf and not modify it. My motivation was to cater to that use case when building Toshi.
At this time Toshi should build and work fine on Windows, macOS, and Linux. As for dependencies, you are going to need Rust 1.39.0 and Cargo installed in order to build. You can get Rust easily from rustup.
There is a default configuration file in config/config.toml:
host = "127.0.0.1"
port = 8080
path = "data2/"
writer_memory = 200000000
log_level = "info"
json_parsing_threads = 4
bulk_buffer_size = 10000
auto_commit_duration = 10
experimental = false
[experimental_features]
master = true
nodes = [
"127.0.0.1:8081"
]
[merge_policy]
kind = "log"
min_merge_size = 8
min_layer_size = 10_000
level_log_size = 0.75
Host
host = "localhost"
The hostname Toshi will bind to upon start.
Port
port = 8080
The port Toshi will bind to upon start.
Path
path = "data/"
The data path where Toshi will store its data and indices.
Writer Memory
writer_memory = 200000000
The amount of memory (in bytes) Toshi should allocate to commits for new documents.
Log Level
log_level = "info"
The detail level to use for Toshi's logging.
Json Parsing
json_parsing_threads = 4
When Toshi does a bulk ingest of documents, it spins up a number of threads to parse the documents' JSON as it is received. This setting controls the number of threads spawned to handle this job.
Bulk Buffer
bulk_buffer_size = 10000
This controls the buffer size for parsing documents into an index, and thereby the amount of memory a bulk ingest will take up: when the message buffer is filled, further submissions block. If you want to go totally off the rails, you can set this to 0 to make the buffer unbounded.
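The blocking described here is the classic bounded-buffer backpressure pattern. A minimal Python sketch of the mechanism (unrelated to Toshi's actual internals):

```python
import queue
import threading

# A bounded buffer: producers block once 'maxsize' items are queued,
# which caps memory use during a bulk ingest. maxsize=0 would make the
# queue unbounded, analogous to bulk_buffer_size = 0.
buf = queue.Queue(maxsize=4)
parsed = []

def consumer():
    while True:
        doc = buf.get()
        if doc is None:        # sentinel: ingest finished
            break
        parsed.append(doc)     # stand-in for JSON parsing / indexing

t = threading.Thread(target=consumer)
t.start()
for i in range(100):
    buf.put(f"doc-{i}")        # blocks whenever the buffer is full
buf.put(None)
t.join()
print(len(parsed))  # 100
```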
Auto Commit Duration
auto_commit_duration = 10
This controls how often an index will automatically commit documents if there are docs to be committed. Set this to 0 to disable this feature, but you will have to do commits yourself when you submit documents.
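A timed auto-commit can be sketched with a toy Python class (illustrative only; the names and structure here are made up, and Toshi's real implementation differs):

```python
import threading
import time

# Toy auto-commit: flush pending docs every 'interval' seconds,
# mimicking auto_commit_duration (an interval of 0 would disable it).
class AutoCommitter:
    def __init__(self, interval):
        self.interval = interval
        self.pending = []
        self.committed = []
        self.lock = threading.Lock()

    def add(self, doc):
        with self.lock:
            self.pending.append(doc)

    def commit(self):
        with self.lock:
            if self.pending:
                self.committed.extend(self.pending)
                self.pending.clear()

    def run_once(self):   # one tick of the background commit loop
        time.sleep(self.interval)
        self.commit()

ac = AutoCommitter(interval=0.01)
ac.add({"id": 1})
ac.run_once()
print(len(ac.committed))  # 1
```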
Merge Policy
[merge_policy]
kind = "log"
Tantivy will merge index segments according to the configuration outlined here. There are two options for this; "log" is the default segment-merge behavior. The log policy takes three additional values, any of which can be omitted to use Tantivy's default. The default values are listed below.
min_merge_size = 8
min_layer_size = 10_000
level_log_size = 0.75
In addition, there is the "nomerge" option, with which Tantivy will do no merging of segments.
Experimental Settings
experimental = false
[experimental_features]
master = true
nodes = [
"127.0.0.1:8081"
]
In general, these settings aren't ready for use yet, as they are very unstable or flat-out broken. Right now the distribution of Toshi is behind this flag, so if experimental is set to false, all of these settings are ignored.
Toshi can be built using cargo build --release. Once Toshi is built, you can run ./target/release/toshi from the top-level directory to start Toshi according to the configuration in config/config.toml.
You should get a startup message like this.
______ __ _ ____ __
/_ __/__ ___ / / (_) / __/__ ___ _________/ /
/ / / _ \(_-</ _ \/ / _\ \/ -_) _ `/ __/ __/ _ \
/_/ \___/___/_//_/_/ /___/\__/\_,_/_/ \__/_//_/
Such Relevance, Much Index, Many Search, Wow
INFO toshi::index > Indexes: []
You can verify Toshi is running with:
curl -X GET http://localhost:8080/
which should return:
{
"name": "Toshi Search",
"version": "0.1.1"
}
Once Toshi is running, it's best to check the requests.http file in the root of this project to see more examples of usage.
Term Query
{ "query": {"term": {"test_text": "document" } }, "limit": 10 }
Fuzzy Term Query
{ "query": {"fuzzy": {"test_text": {"value": "document", "distance": 0, "transposition": false } } }, "limit": 10 }
Phrase Query
{ "query": {"phrase": {"test_text": {"terms": ["test","document"] } } }, "limit": 10 }
Range Query
{ "query": {"range": { "test_i64": { "gte": 2012, "lte": 2015 } } }, "limit": 10 }
Regex Query
{ "query": {"regex": { "test_text": "d[ou]{1}c[k]?ument" } }, "limit": 10 }
Boolean Query
{ "query": {"bool": {"must": [ { "term": { "test_text": "document" } } ], "must_not": [ {"range": {"test_i64": { "gt": 2017 } } } ] } }, "limit": 10 }
Usage
To try any of the above queries, POST them to your index, for example:
curl -X POST http://localhost:8080/test_index -H 'Content-Type: application/json' -d '{ "query": {"term": {"test_text": "document" } }, "limit": 10 }'
Also note that limit is optional; 10 is the default value. It's only included here for completeness.
Run the test suite with:
cargo test
Download Details:
Author: toshi-search
Source Code: https://github.com/toshi-search/Toshi
License: MIT License
1633333872
The blockchain is the key to everything when it comes to cryptocurrencies and non-fungible tokens. These two asset classes have established themselves as the two pillars of the crypto ecosystem, but the foundation of those pillars is laid by the power of blockchain. There are numerous blockchains in the crypto ecosystem, yet few come close to the Binance Smart Chain in terms of blockchain fundamentals: it can elevate any crypto platform built on it at a rapid rate.
#BinanceSmartChainDevelopment #BinanceSmartChain #BinanceSmartChainDevelopmentServices #BinanceSmartChainDevelopmentCompany #BinanceChain #etherum #synthetix #searchengine #testnet
1629369949
Just like the rest of the world, the marketing industry develops with the speed of light. Several decades ago, people couldn’t imagine that some products and services would no longer be advertised with magazines, newspapers, and billboards. Today these marketing channels are rarely used. We live in the digital epoch when traditional marketing cannot reach the necessary audience. Now, online marketing is the most effective way to promote a business, and the predictions say that the demand for it will only grow.
We know the Google Ads platform inside and out and can help you expand your paid search service offerings, while helping you add a second revenue stream for your agency. Our team of Google-certified experts acts as an extension of your in-house team, serving you with the required expertise and productivity.
List of Top 10 SEO Marketing Consultant Companies:
1. Auxano Global Services
Auxano Global Services is a top SEO marketing consultant. We offer a comprehensive array of professional search engine optimization services to get your business more visibility in search, using only trustworthy, future-proof, white-hat SEO techniques. Our SEO experts work closely with our clients to develop personalized SEO strategies that drive long-term profitability. By using a proven, efficient and effective methodology, we are able to create high-quality, measurable results.
2. Bamboo Apps
At Bamboo Apps, we're committed to perfection in the design and simplicity of processes that make mobile development hassle-free and let our clients focus on growing their businesses with beautiful apps. We handle every aspect of the development of native and cross-platform apps: UX/UI design, front-end and back-end programming. We also apply our extensive expertise in automotive, education & e-learning, insurance, healthcare, and IoT to meet industry-specific requirements.
3. AccelOne
Our company was created by a small team of seasoned professionals with almost 100 years of combined experience in the IT industry, who came together with the common goal of solving software development's biggest challenge: delivering quality solutions on time and on budget. Now, with multinational teams across the Americas, we are ready to help you with your software consulting and staffing needs.
#seo #seoexperts #topseoexperts #searchengine #searchengineoptimization #hireseoexperts #topseoexpertsforhire
1629274958
Approximately 93 percent of U.S. consumers search for local businesses online, and about 88 percent of local mobile searches result in a store visit or phone call within 24 hours. Local SEO is the process of improving the local search visibility of small and medium-sized businesses (SMBs), brick-and-mortar stores and multiple-location businesses within a geographic area.
Need expert help to unravel what is local SEO and how to boost your local SEO ranking? The best local SEO companies can answer all your local SEO marketing questions and guide you with your local SEO optimization efforts.
List of Best 10 Local SEO companies:
1. Auxano Global Services
Auxano Global Services is a top local SEO company based in India that provides comprehensive local search engine optimization services to businesses worldwide. We understand that local SEO is essential to brand success. That is why we are here to assist you with your local digital marketing and local search optimization endeavours. Our local SEO experts also walk your team through each step to ensure we are on the same page, and we create a local SEO checklist that outlines each local SEO strategy included in your package.
2. Wezom Mobile
For over 20 years, we have been developing custom IT solutions for medium-sized businesses and corporations. We specialize in Logistics & Supply Chain, and also have extensive experience in Real Estate and B2B eCommerce.
3. Orangesoft
Orangesoft is a mobile app & web development company from Belarus. We started guiding companies into mobile and web development in 2011 and have successfully completed more than 300 projects ever since. Over the years, we have become a full-cycle software development company delivering highly productive and cost-effective app development solutions across various domains.
#seo #localseo #localseoexperts #searchengine #searchengineoptimization #toplocalseoexperts #hirelocalseoexperts #localseoexpertsforhire