Reid Rohan

1664872640

uFuzzy: A Tiny, Efficient Fuzzy Search That Doesn't Suck

μFuzzy

A tiny, efficient, fuzzy search that doesn't suck. This is my fuzzy 🐈. There are many like it, but this one is mine.


Overview

uFuzzy is a fuzzy search library designed to match a relatively short search phrase (needle) against a large list of short-to-medium phrases (haystack). It might be best described as a more forgiving String.indexOf(). Common use cases are list filtering, auto-complete/suggest, and title/name/description/filename/function searches.

In its default configuration, each uFuzzy match must contain all alpha-numeric characters from the needle in the same sequence, so is likely a poor fit for applications like spellcheck or fulltext/document search. However, its speed leaves ample headroom to match out-of-order terms by combining results from all permutations of the needle. When held just right, it can efficiently match against multiple object properties, too.


Features

  • Junk-free, high quality results that are dataset-independent. No need to fine-tune indexing options or boosting params to attain some arbitrary quality score cut-off.
  • Straightforward fuzziness control without surprises.
  • Sorting you can reason about and customize using a simple Array.sort() which gets access to each match's stats/counters. There's no composite, black box "score" to understand.
  • Concise set of options that don't interact in mysterious ways to drastically alter combined behavior.
  • Fast with low resource usage - there's no index to build, so startup is below 1ms with near-zero memory overhead. Searching a three-term phrase in a 162,000 phrase dataset takes 12ms with in-order terms or 50ms with out-of-order terms.
  • Micro, with zero dependencies - currently < 4KB min

uFuzzy demo


Demos

NOTE: The testdata.json file is a diverse 162,000 string/phrase dataset 4MB in size, so first load may be slow due to network transfer. Try refreshing once it's been cached by your browser.

First, uFuzzy in isolation to demonstrate its performance.

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy&search=super%20ma

Now the same comparison page, booted with fuzzysort, QuickScore, and Fuse.js:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,fuzzysort,QuickScore,Fuse&search=super%20ma

Here is the full library list but with a reduced dataset (just hearthstone_750, urls_and_titles_600) to avoid crashing your browser:

https://leeoniya.github.io/uFuzzy/demos/compare.html?lists=hearthstone_750,urls_and_titles_600&search=moo


Installation

Node

npm i @leeoniya/ufuzzy
const uFuzzy = require('@leeoniya/ufuzzy');

Browser

<script src="./dist/uFuzzy.iife.min.js"></script>

Example

let haystack = [
    'puzzle',
    'Super Awesome Thing (now with stuff!)',
    'FileName.js',
    '/feeding/the/catPic.jpg',
];

let needle = 'feed cat';

let opts = {};

let uf = new uFuzzy(opts);

// pre-filter
let idxs = uf.filter(haystack, needle);

// sort/rank only when <= 1,000 items
if (idxs.length <= 1e3) {
  let info = uf.info(idxs, haystack, needle);

  // order is a double-indirection array (a re-order of the passed-in idxs)
  // this allows corresponding info to be grabbed directly by idx, if needed
  let order = uf.sort(info, haystack, needle);

  // render post-filtered & ordered matches
  for (let i = 0; i < order.length; i++) {
    // using info.idx here instead of idxs because uf.info() may have
    // further reduced the initial idxs based on prefix/suffix rules
    console.log(haystack[info.idx[order[i]]]);
  }
}
else {
  // render pre-filtered but unordered matches
  for (let i = 0; i < idxs.length; i++) {
    console.log(haystack[idxs[i]]);
  }
}

How It Works

uFuzzy has two operational modes which differ in matching strategy:

  • intraMode: 0 (default) requires all alpha-numeric characters in the search phrase to exist in the same sequence in all matches. For example, when searching for "cat", this mode is capable of matching the strings below. What is actually matched will depend on additional fuzziness settings.
    • cat
    • coat
    • scratch
    • cantina
    • tractors are late
  • intraMode: 1 allows for a single error in each term of the search phrase, where an error is one of: substitution (replacement), transposition (swap), insertion (addition), or deletion (omission). The search strings with errors below can return matches containing "example". What is actually matched will depend on additional fuzziness settings. In contrast to the previous mode, searching for "example" will never match "extra maple".
    • example - exact
    • examplle - single insertion (addition)
    • exemple - single substitution (replacement)
    • exmaple - single transposition (swap)
    • exmple - single deletion (omission)
    • xamp - partial
    • xmap - partial with transposition
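
The single-error tolerance of intraMode: 1 corresponds to a restricted Damerau-Levenshtein distance of at most one per term. A minimal sketch of that check (illustrative only; this is not how uFuzzy itself is implemented, and the function names are hypothetical):

```javascript
// Restricted Damerau-Levenshtein distance: substitutions, insertions,
// deletions, and adjacent transpositions each count as one edit.
function editDistance(a, b) {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost  // substitution
      );
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1])
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // transposition
    }
  }
  return d[a.length][b.length];
}

const withinOneError = (term, target) => editDistance(term, target) <= 1;

console.log(withinOneError('exemple', 'example')); // true  (substitution)
console.log(withinOneError('exmaple', 'example')); // true  (transposition)
console.log(withinOneError('exmple',  'example')); // true  (deletion)
console.log(withinOneError('exmpl',   'example')); // false (two deletions)
```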

There are 3 phases to a search:

  1. Filter filters the full haystack with a fast RegExp compiled from your needle without doing any extra ops. It returns an array of matched indices in original order.
  2. Info collects more detailed stats about the filtered matches, such as start offsets, fuzz level, prefix/suffix counters, etc. It also gathers substring match positions for range highlighting. Finally, it filters out any matches that don't conform to the desired prefix/suffix rules. To do all this it re-compiles the needle into two more-expensive RegExps that can partition each match. Therefore, it should be run on a reduced subset of the haystack, usually returned by the Filter phase. The uFuzzy demo is gated at <= 1,000 filtered items, before moving ahead with this phase.
  3. Sort does an Array.sort() to determine final result order, utilizing the info object returned from the previous phase. A custom sort function can be provided via a uFuzzy option: {sort: (info, haystack, needle) => idxsOrder}.
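
The Filter phase's needle-to-RegExp compilation can be sketched roughly as follows (a simplified illustration of the idea, not uFuzzy's actual compiled pattern; the helper name and option handling here are hypothetical): intraChars-style inserts are interleaved between each character of a term, and interChars-style gaps are allowed between terms.

```javascript
// Build a forgiving regex from a needle, loosely mirroring the defaults:
// intraChars = '[a-z\\d]' (allowed inserts within a term, gated by intraIns),
// interChars = '.' (anything allowed between terms).
function compileFilter(needle, { intraIns = 0, intraChars = '[a-z\\d]', interChars = '.' } = {}) {
  const terms = needle.toLowerCase().match(/[a-z\d]+/g) ?? [];
  const intra = intraIns > 0 ? `${intraChars}{0,${intraIns}}` : '';
  const termRes = terms.map(t => t.split('').join(intra));
  return new RegExp(termRes.join(`${interChars}*?`), 'i');
}

const re = compileFilter('feed cat'); // roughly /feed.*?cat/i
console.log(re.test('/feeding/the/catPic.jpg')); // true
console.log(re.test('puzzle'));                  // false

// with intraIns: 1, "cat" can also match "cart" (one extra char between c-a-t)
console.log(compileFilter('cat', { intraIns: 1 }).test('cart')); // true
console.log(compileFilter('cat').test('cart'));                  // false
```

The lazy `*?` quantifier between terms keeps each filter pass cheap, which is why this phase can scan the full haystack.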

API

A liberally-commented 100 LoC uFuzzy.d.ts file.


Options

Options with an inter prefix apply to allowances in between search terms, while those with an intra prefix apply to allowances within each search term.

| Option | Description | Default | Examples |
| --- | --- | --- | --- |
| intraMode | How term matching should be performed | 0 | 0 MultiInsert, 1 SingleError. See How It Works |
| intraIns | Max number of extra chars allowed between each char within a term | 0 | Searching "cat"... 0 can match: cat, scat, catch, vacate; 1 also matches: cart, chapter, outcast |
| interIns | Max number of extra chars allowed between terms | Infinity | Searching "where is"... Infinity can match: where is, where have blah wisdom; 5 cannot match: where have blah wisdom |
| intraSub, intraTrn, intraDel | Error types to tolerate within terms (intraMode: 1 only) | 0 | 0 No, 1 Yes |
| intraChars | Partial regexp for allowed insert chars between each char within a term | [a-z\d] | [a-z\d] matches only alpha-numeric (case-insensitive); [\w-] would match alpha-numeric, underscore, and hyphen |
| intraFilt | Callback for excluding results based on term & match | (term, match, index) => true | Do your own thing, maybe: length diff threshold, Levenshtein distance, term offset or content |
| interChars | Partial regexp for allowed chars between terms | . | . matches all chars; [^a-z\d] would only match whitespace and punctuation |
| interLft | Determines allowable term left boundary | 0 | Searching "mania"... 0 any (anywhere): romanian; 1 loose (whitespace, punctuation, alpha-num, case-change transitions): TrackMania, maniac; 2 strict (whitespace, punctuation): maniacally |
| interRgt | Determines allowable term right boundary | 0 | Searching "mania"... 0 any (anywhere): romanian; 1 loose (whitespace, punctuation, alpha-num, case-change transitions): ManiaStar; 2 strict (whitespace, punctuation): mania_foo |
| sort | Custom result sorting function | (info, haystack, needle) => idxsOrder | Default: search sort, prioritizes full term matches and char density; Demo: typeahead sort, prioritizes start offset and match length |
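As a concrete example of the sort option, here is a hypothetical typeahead-style comparator that prefers earlier start offsets, then shorter matched spans, then shorter entries. The info field names used here (idx, start, span) are illustrative assumptions, not uFuzzy's exact info shape:

```javascript
// Hypothetical typeahead-style sort: earlier match start wins, then
// shorter matched span, then shorter haystack entry overall.
// Returns an idxsOrder array, i.e. a re-ordering of info's positions.
function typeaheadSort(info, haystack /*, needle */) {
  const order = info.idx.map((_, i) => i);
  return order.sort((ia, ib) =>
    (info.start[ia] - info.start[ib]) ||
    (info.span[ia] - info.span[ib]) ||
    (haystack[info.idx[ia]].length - haystack[info.idx[ib]].length)
  );
}

// mocked info for three filtered matches of "mania"
const haystack = ['TrackMania', 'maniac', 'romanian'];
const info = { idx: [0, 1, 2], start: [5, 0, 2], span: [5, 5, 5] };

console.log(typeaheadSort(info, haystack).map(i => haystack[info.idx[i]]));
// -> [ 'maniac', 'romanian', 'TrackMania' ]
```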

A biased appraisal of similar work

This assessment is extremely narrow and, of course, biased towards my use cases, text corpus, and my complete expertise in operating my own library. It is highly probable that I'm not taking full advantage of some feature in other libraries that may significantly improve outcomes along some axis; I welcome improvement PRs from anyone with deeper library knowledge than afforded by my hasty 10min skim over any "Basic usage" example and README doc.

Search quality

Can-of-worms #1.

Before we discuss performance, let's talk about search quality, because speed is irrelevant when your results are a strange medley of "Oh yeah!" and "WTF?".

Search quality is very subjective. What constitutes a good top match in a "typeahead / auto-suggest" case can be a poor match in a "search / find-all" scenario. Some solutions optimize for the latter, some for the former. It's common to find knobs that skew the results in either direction, but these are often by-feel and imperfect, being little more than a proxy to producing a single, composite match "score".

Let's take a look at some matches produced by the most popular fuzzy search library, Fuse.js and some others for which match highlighting is implemented in the demo.

Searching for the partial term "twili", we see these results appearing above numerous obvious "twilight" results:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,fuzzysort,QuickScore,Fuse&search=twili

  • twirling
  • The total number of received alerts that were invalid.
  • Tom Clancy's Ghost Recon Wildlands - ASIA Pre-order Standard Uplay Activation
  • theHunter™: Call of the Wild - Bearclaw Lite CB-60

Not only are these poor matches in isolation, but they actually rank higher than literal substrings.

Completing the search term to "twilight" still ranks bizarre results higher:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,fuzzysort,QuickScore,Fuse&search=twilight

  • Magic: The Gathering - Duels of the Planeswalkers Wings of Light Unlock
  • The Wild Eight

Some engines do better with partial prefix matches, at the expense of higher startup/indexing cost:

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,FlexSearch,match-sorter,MiniSearch&search=twili

Here, match-sorter returns 1,384 results, but only the first 40 are relevant. How do we know where the cut-off is?

https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,FlexSearch,match-sorter,MiniSearch&search=super

Performance

Can-of-worms #2.

All benchmarks suck, but this one might suck more than others.

  • I've tried to follow any "best performance" advice when I could find it in each library's docs, but it's a certainty that some stones were left unturned when implementing ~20 different search engines.
  • Despite my best efforts, result quality is still extremely variable between libraries, and even between search terms. In some cases, results are very poor but the library is very fast; in other cases, the results are better, but the library is quite slow. What use is extreme speed when the search quality is sub-par? This is a subjective, nuanced topic that will surely affect how you interpret these numbers. I consider uFuzzy's search quality second-to-none, so my view of most faster libraries is typically one of quality trade-offs I'm happy not to have made. I encourage you to evaluate the results for all benched search phrases manually to decide this for yourself.
  • Many fulltext & document-search libraries compared here are designed to work best with exact terms rather than partial matches (which this benchmark is skewed towards).

Still, something is better than a hand-wavy YMMV/do-it-yourself dismissal and certainly better than nothing.

Benchmark

  • Each benchmark can be run by changing the libs parameter to the desired library name: https://leeoniya.github.io/uFuzzy/demos/compare.html?bench&libs=uFuzzy
  • Results output is suppressed in bench mode to avoid benchmarking the DOM.
  • Measurements are taken in the Performance section of Chrome's DevTools by recording several reloads of the bench page, with forced garbage collection in between. The middle/typical run is used to collect numbers.
  • The search corpus is 162,000 words and phrases, loaded from a 4MB testdata.json.
  • The benchmark types and then deletes, character-by-character (every 100ms) the following search terms, triggering a search for each keypress: test, chest, super ma, mania, puzz, prom rem stor, twil.

To evaluate the results for each library, or to compare several, simply visit the same page with more libs and without bench: https://leeoniya.github.io/uFuzzy/demos/compare.html?libs=uFuzzy,fuzzysort,QuickScore,Fuse&search=super%20ma.

profile example

There are several metrics evaluated:

  • Init time - how long it takes to load the library and build any required index to perform searching.
  • Bench runtime - how long it takes to execute all searches.
  • Memory required - peak JS heap size used during the bench as well as how much is still retained after a forced garbage collection at the end.
  • GC cost - how much time is needed to collect garbage at the end (main thread jank)

| Lib | Stars | Size (min) | Init | Search | Heap (peak) | Retained | GC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| uFuzzy (try) | ★ 0 | 4KB | 0.3ms | 630ms | 25.5MB | 7.5MB | 20ms |
| Fuse.js (try) | ★ 14.8k | 23.5KB | 40ms | 35600ms | 323MB | 14.5MB | 30ms |
| FlexSearch (Light) (try) | ★ 8.9k | 5.9KB | 3600ms | 130ms | 673MB | 316MB | 320ms |
| Lunr.js (try) | ★ 8.2k | 29.4KB | 2000ms | 800ms | 355MB | 121MB | 130ms |
| LyraSearch (try) | ★ 3.3k | | | | | | |
| match-sorter (try) | ★ 3.1k | 7.3KB | 0.03ms | 10000ms | 79MB | 8MB | 30ms |
| fuzzysort (try) | ★ 3k | 5.5KB | 60ms | 1500ms | 176MB | 84MB | 70ms |
| Wade (try) | ★ 3k | 4KB | 840ms | 370ms | 436MB | 42MB | 100ms |
| fuzzysearch (try) | ★ 2.6k | | | | | | |
| js-search (try) | ★ 2k | 17.1KB | 6500ms | 1350ms | 1937MB | 734MB | 1400ms |
| Elasticlunr.js (try) | ★ 1.9k | 18.1KB | 1220ms | 1770ms | 238MB | 70MB | 100ms |
| MiniSearch (try) | ★ 1.5k | 22.4KB | 525ms | 1800ms | 422MB | 64MB | 90ms |
| Fuzzyset (try) | ★ 1.3k | 2.8KB | 3500ms | 1000ms | 660MB | 238MB | 430ms |
| search-index (try) | ★ 1.3k | | | | | | |
| LiquidMetal (try) | ★ 285 | | | | | | |
| ItemJS (try) | ★ 260 | | | | | | |
| FuzzySearch (try) | ★ 184 | | | | | | |
| FuzzySearch2 (try) | ★ 173 | | | | | | |
| QuickScore (try) | ★ 131 | 9.1KB | 40ms | 7500ms | 176MB | 12.8MB | 30ms |
| fzy (try) | ★ 115 | | | | | | |
| fuzzyMatch (try) | ★ 0 | | | | | | |

Download Details:

Author: leeoniya
Source Code: https://github.com/leeoniya/uFuzzy 
License: MIT license

#javascript #search #autocomplete 


Display Search Page, Customize Search Content with Flutter

Features

Display search page, customize search content

Getting started

dependencies:
  show_search_page: <last version>

Usage

Implement a SearchPageWidget and pass it as the second parameter of the top-level showSearchPage method:

showSearchPage(context, SearchPageWidget()); 

Use this package as a library

Depend on it

Run this command:

With Flutter:

 $ flutter pub add show_search_page

This will add a line like this to your package's pubspec.yaml (and run an implicit flutter pub get):

dependencies:
  show_search_page: ^0.0.2

Alternatively, your editor might support flutter pub get. Check the docs for your editor to learn more.

Import it

Now in your Dart code, you can use:

import 'package:show_search_page/show_search_page.dart'; 

Download Details:

Author: qiangjindong

Source Code: https://github.com/qiangjindong/show_search_page

#flutter #android #search 


A Flutter Package That Has an Animated Search Bar

anim_search_app_bar

An animated SearchAppBar Widget, to be used with Flutter.

Website example >> https://animsearchappbar.mustafaturkmen.dev

demo

Usage

It is very easy to use.

add this line to pubspec.yaml


   dependencies:
     anim_search_app_bar: ^1.0.3

import package


   import 'package:anim_search_app_bar/anim_search_app_bar.dart';

const AnimSearchAppBar(
  cancelButtonText: "Cancel",
  hintText: 'Search',
),

Complete Example

class SearchScreen extends StatelessWidget {
  final TextEditingController searchController = TextEditingController();
  
  SearchScreen({Key? key}) : super(key: key);

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: Column(
        children: [
          AnimSearchAppBar(
            cancelButtonText: "Cancel",
            hintText: 'Search for Neighborhood, Street or avenue',
            cSearch: searchController,
            appBar: const PBAppBar(
              title: 'Search',
            ),
          ),
          const Expanded(child: SearchView()),
        ],
      ),
    );
  }
}

Use this package as a library

Depend on it

Run this command:

With Flutter:

 $ flutter pub add anim_search_app_bar

This will add a line like this to your package's pubspec.yaml (and run an implicit flutter pub get):

dependencies:
  anim_search_app_bar: ^1.0.3

Alternatively, your editor might support flutter pub get. Check the docs for your editor to learn more.

Import it

Now in your Dart code, you can use:

import 'package:anim_search_app_bar/anim_search_app_bar.dart'; 

example/lib/main.dart

import 'package:anim_search_app_bar/anim_search_app_bar.dart';
import 'package:flutter/material.dart';

void main() {
  runApp(const MyApp());
}

class MyApp extends StatelessWidget {
  const MyApp({Key? key}) : super(key: key);

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Flutter Demo',
      theme: ThemeData(
        primarySwatch: Colors.blue,
      ),
      home: const MyHomePage(title: 'Flutter Demo Home Page'),
    );
  }
}

class MyHomePage extends StatefulWidget {
  const MyHomePage({Key? key, required this.title}) : super(key: key);

  final String title;

  @override
  State<MyHomePage> createState() => _MyHomePageState();
}

class _MyHomePageState extends State<MyHomePage> {
  int _counter = 0;

  void _incrementCounter() {
    setState(() {
      _counter++;
    });
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: Column(
        children: [
          const AnimSearchAppBar(
            cancelButtonText: "Cancel",
            hintText: 'Search',
          ),
          Expanded(
            child: Column(
              mainAxisAlignment: MainAxisAlignment.center,
              children: <Widget>[
                const Text(
                  'You have pushed the button this many times:',
                ),
                Text(
                  '$_counter',
                  style: Theme.of(context).textTheme.headline4,
                ),
              ],
            ),
          ),
        ],
      ),
      floatingActionButton: FloatingActionButton(
        onPressed: _incrementCounter,
        tooltip: 'Increment',
        child: const Icon(Icons.add),
      ),
    );
  }
} 

Download Details:

Author: Mto292

Source Code: https://github.com/Mto292/anim_search_app_bar

#flutter #search  #bar 

Nigel Uys

1663227420

6 Favorite Libraries for Search and Analytic Databases in Go

In today's post we will learn about 6 Favorite Libraries for Search and Analytic Databases in Go. 

What is an Analytical Database?

Analytical database software specializes in big data management for business applications and services. Analytical databases are optimized to provide quick query response times and advanced analytics. They are also more scalable than traditional databases and are often columnar databases that can efficiently write and read data to and from hard disk storage in order to speed up query times. Analytical database features include column-based storage, in-memory loading of compressed data, and the ability to search data through multiple attributes.

Table of contents:

  • Elastic - Elasticsearch client for Go.
  • Elasticsql - Convert sql to elasticsearch dsl in Go.
  • Elastigo - Elasticsearch client library.
  • Go-elasticsearch - Official Elasticsearch client for Go.
  • Goes - Library to interact with Elasticsearch.
  • Skizze - probabilistic data-structures service and storage.

1 - Elastic: Elasticsearch client for Go.

Elastic is an Elasticsearch client for the Go programming language.

Releases

The release branches (e.g. release-branch.v7) are actively being worked on and can break at any time. If you want to use stable versions of Elastic, please use Go modules.

Here's the version matrix:

| Elasticsearch version | Elastic version | Package URL | Remarks |
| --- | --- | --- | --- |
| 7.x | 7.0 | github.com/olivere/elastic/v7 (source doc) | Use Go modules. |
| 6.x | 6.0 | github.com/olivere/elastic (source doc) | Use a dependency manager (see below). |
| 5.x | 5.0 | gopkg.in/olivere/elastic.v5 (source doc) | Actively maintained. |
| 2.x | 3.0 | gopkg.in/olivere/elastic.v3 (source doc) | Deprecated. Please update. |
| 1.x | 2.0 | gopkg.in/olivere/elastic.v2 (source doc) | Deprecated. Please update. |
| 0.9-1.3 | 1.0 | gopkg.in/olivere/elastic.v1 (source doc) | Deprecated. Please update. |

Example:

You have installed Elasticsearch 7.0.0 and want to use Elastic. As listed above, you should use Elastic 7.0 (code is in release-branch.v7).

To use the required version of Elastic in your application, you should use Go modules to manage dependencies. Make sure to use a version such as 7.0.0 or later.

To use Elastic, import:

import "github.com/olivere/elastic/v7"

Elastic 7.0

Elastic 7.0 targets Elasticsearch 7.x which was released on April 10th 2019.

As always with major version, there are a lot of breaking changes. We will use this as an opportunity to clean up and refactor Elastic, as we already did in earlier (major) releases.

Elastic 6.0

Elastic 6.0 targets Elasticsearch 6.x which was released on 14th November 2017.

Notice that there are a lot of breaking changes in Elasticsearch 6.0 and we used this as an opportunity to clean up and refactor Elastic as we did in the transition from earlier versions of Elastic.

View on Github

2 - Elasticsql: Convert sql to elasticsearch dsl in Go.

Overview

This tool converts SQL to Elasticsearch DSL.

Currently support:

  •  sql and expression
  •  sql or expression
  •  equal(=) support
  •  not equal(!=) support
  •  gt(>) support
  •  gte(>=) support
  •  lt(<) support
  •  lte(<=) support
  •  sql in (eg. id in (1,2,3) ) expression
  •  sql not in (eg. id not in (1,2,3) ) expression
  •  paren bool support (eg. where (a=1 or b=1) and (c=1 or d=1))
  •  sql like expression (currently use match phrase, perhaps will change to wildcard in the future)
  •  sql order by support
  •  sql limit support
  •  sql not like expression
  •  field missing check
  •  support aggregation like count(*), count(field), min(field), max(field), avg(field)
  •  support aggregation like stats(field), extended_stats(field), percentiles(field) which are not standard sql function
  •  null check expression(is null/is not null)
  •  join expression
  •  having support

Usage

go get -u github.com/cch123/elasticsql

Demo :

package main

import (
    "fmt"

    "github.com/cch123/elasticsql"
)

var sql = `
select * from aaa
where a=1 and x = '三个男人'
and create_time between '2015-01-01T00:00:00+0800' and '2016-01-01T00:00:00+0800'
and process_id > 1 order by id desc limit 100,10
`

func main() {
    dsl, esType, _ := elasticsql.Convert(sql)
    fmt.Println(dsl)
    fmt.Println(esType)
}

will produce :

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "a": {
                            "query": "1",
                            "type": "phrase"
                        }
                    }
                },
                {
                    "match": {
                        "x": {
                            "query": "三个男人",
                            "type": "phrase"
                        }
                    }
                },
                {
                    "range": {
                        "create_time": {
                            "from": "2015-01-01T00:00:00+0800",
                            "to": "2016-01-01T00:00:00+0800"
                        }
                    }
                },
                {
                    "range": {
                        "process_id": {
                            "gt": "1"
                        }
                    }
                }
            ]
        }
    },
    "from": 100,
    "size": 10,
    "sort": [
        {
            "id": "desc"
        }
    ]
}

aaa

If your sql contains some keywords, eg. order, timestamp, don't forget to escape these fields as follows:

select * from `order` where `timestamp` = 1 and `desc`.id > 0

View on Github

3 - Elastigo: Elasticsearch client library.

A Go (Golang) based Elasticsearch client, implements core api for Indexing and searching.

NOTE: Based on the great work from Jeremy Shute, Elastigo now supports multiple connections. We attempted to make this backwards compatible, however in the end it wasn't possible, so we tagged the older single connection code as v1.0 and started work on v2.0.

If you want to use v1.0, you can use a tool like GoDep to make that possible. See http://bit.ly/VLG2et for full details.

The godep tool saves the exact version of the dependencies you’re building your project against, which means that upstream modifications in third-party dependencies won’t break your build.

go get github.com/tools/godep

Now, to pull in an existing project with godep:

godep go get github.com/myuser/myproject

When your code compiles in your workspace, ala:

cd $HOME/gopath/src/github.com/myuser/myproject
# hack hack hack
go build ./...

You can freeze your dependencies thusly:

godep save github.com/myuser/myproject
git add Godeps

The godep tool will examine your code to find and save the transitive closure of your dependencies in the current directory, observing their versions. If you want to restore or update these versions, see the documentation for the tool.

Note, in particular, that if your current directory contains a group of binaries or packages, you may save all of them at once:

godep save ./...

To get the Chef based Vagrantfile working, be sure to pull like so::

# This will pull submodules.
git clone --recursive git@github.com:mattbaird/elastigo.git

It's easier to use the ElasticSearch provided Docker image found here: https://github.com/dockerfile/elasticsearch

Non-persistent usage is:

docker run -d -p 9200:9200 -p 9300:9300 dockerfile/elasticsearch

Quick Start with Docker

Make sure docker is installed. If you are running docker on a mac, you must expose ports 9200 and 9300. Shut down docker:

boot2docker stop

and run

for i in {9200..9300}; do
 VBoxManage modifyvm "boot2docker-vm" --natpf1 "tcp-port$i,tcp,,$i,,$i";
 VBoxManage modifyvm "boot2docker-vm" --natpf1 "udp-port$i,udp,,$i,,$i";
done

The following will allow you to get the code, and run the tests against your docker based non-persistent elasticsearch:

docker run -d -p 9200:9200 -p 9300:9300 dockerfile/elasticsearch
git clone git@github.com:mattbaird/elastigo.git
cd elastigo
go get -u ./...
cd lib
go test -v -host localhost -loaddata
cd ..
go test -v ./...

View on Github

4 - Go-elasticsearch: Official Elasticsearch client for Go.

Compatibility

Language clients are forward compatible; meaning that clients support communicating with greater or equal minor versions of Elasticsearch. Elasticsearch language clients are only backwards compatible with default distributions and without guarantees made.

When using Go modules, include the version in the import path, and specify either an explicit version or a branch:

require github.com/elastic/go-elasticsearch/v8 v8.0.0
require github.com/elastic/go-elasticsearch/v7 7.17

It's possible to use multiple versions of the client in a single project:

// go.mod
github.com/elastic/go-elasticsearch/v7 v7.17.0
github.com/elastic/go-elasticsearch/v8 v8.0.0

// main.go
import (
  elasticsearch7 "github.com/elastic/go-elasticsearch/v7"
  elasticsearch8 "github.com/elastic/go-elasticsearch/v8"
)
// ...
es7, _ := elasticsearch7.NewDefaultClient()
es8, _ := elasticsearch8.NewDefaultClient()

The main branch of the client is compatible with the current master branch of Elasticsearch.

Installation

Add the package to your go.mod file:

require github.com/elastic/go-elasticsearch/v8 main

Or, clone the repository:

git clone --branch main https://github.com/elastic/go-elasticsearch.git $GOPATH/src/github.com/elastic/go-elasticsearch

A complete example:

mkdir my-elasticsearch-app && cd my-elasticsearch-app

cat > go.mod <<-END
  module my-elasticsearch-app

  require github.com/elastic/go-elasticsearch/v8 main
END

cat > main.go <<-END
  package main

  import (
    "log"

    "github.com/elastic/go-elasticsearch/v8"
  )

  func main() {
    es, _ := elasticsearch.NewDefaultClient()
    log.Println(elasticsearch.Version)
    log.Println(es.Info())
  }
END

go run main.go

View on Github

5 - Goes: Library to interact with Elasticsearch.

Goes : a library to interact with ElasticSearch

Supported operations

  • index creation
  • index removal
  • simple indexing (document)
  • bulk indexing
  • search
  • get

Example

You will find examples in example_test.go

View on Github

6 - Skizze: Probabilistic data-structures service and storage.

Skizze ([ˈskɪt͡sə]: German for sketch) is a sketch data store to deal with all problems around counting and sketching using probabilistic data-structures.

Unlike a Key-Value store, Skizze does not store values, but rather appends values to defined sketches, allowing one to solve frequency and cardinality queries in near O(1) time, with minimal memory footprint.
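
Skizze itself is written in Go, but the flavor of data structure it builds on can be illustrated with a toy count-min sketch (a hypothetical JavaScript sketch for illustration only, not Skizze's implementation): O(1) frequency updates and queries, with estimates that can overcount on hash collisions but never undercount.

```javascript
// Toy count-min sketch: estimates element frequencies in O(1) per
// update/query using d hash rows of w counters each. Estimates never
// undercount; collisions can only inflate them.
class CountMinSketch {
  constructor(w = 256, d = 4) {
    this.w = w;
    this.rows = Array.from({ length: d }, () => new Uint32Array(w));
  }
  // simple seeded string hash (FNV-1a variant), one seed per row
  hash(value, seed) {
    let h = 2166136261 ^ seed;
    for (const ch of String(value)) {
      h ^= ch.charCodeAt(0);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.w;
  }
  add(value, count = 1) {
    this.rows.forEach((row, i) => { row[this.hash(value, i)] += count; });
  }
  // estimated frequency: minimum counter across all rows
  count(value) {
    return Math.min(...this.rows.map((row, i) => row[this.hash(value, i)]));
  }
}

const cms = new CountMinSketch();
for (const v of ['zod', 'joker', 'grod', 'zod', 'zod', 'grod']) cms.add(v);
console.log(cms.count('zod')); // at least 3; exact barring hash collisions
```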

Current status ==> Alpha (tagged v0.0.2)

Example use cases (queries)

  • How many distinct elements are in the data set (i.e. what is the cardinality of the data set)?
  • What are the most frequent elements (the terms “heavy hitters” and “top-k elements” are also used)?
  • What are the frequencies of the most frequent elements?
  • How many elements belong to the specified range (range query, in SQL it looks like SELECT count(v) WHERE v >= c1 AND v < c2)?
  • Does the data set contain a particular element (membership query)?

How to build and run

make dist
./bin/skizze

Bindings

Two bindings are currently available:

Go

Node.js

Example usage:

Skizze comes with a CLI to help test and explore the server. It can be run via

./bin/skizze-cli

Commands

Create a new Domain (Collection of Sketches):

#CREATE DOM $name $estCardinality $topk
CREATE DOM demostream 10000000 100

Add values to the domain:

#ADD DOM $name $value1, $value2 ....
ADD DOM demostream zod joker grod zod zod grod

Get the cardinality of the domain:

# GET CARD $name
GET CARD demostream

# returns:
# Cardinality: 9

Get the rankings of the domain:

# GET RANK $name
GET RANK demostream

# returns:
# Rank: 1	  Value: zod	  Hits: 3
# Rank: 2	  Value: grod	  Hits: 2
# Rank: 3	  Value: joker	  Hits: 1

View on Github

Thank you for following this article.

#go #golang #search #analytics #database 

Oral Brekke

1662631680

7 Popular Node.js Google Search API Libraries

In today's post we will learn about 7 popular Node.js Google Search API Libraries.


1 - Google-it

Command line Google search and save to JSON

Install

$ npm install --save -g google-it

Example Usage

$ google-it --query="Latvian unicorn"

GIF of google-it

View on Github

2 - Google-search-results-nodejs

Google Search Results Node.JS

Installation

NPM 7+

$ npm install google-search-results-nodejs

Quick start

const SerpApi = require('google-search-results-nodejs')
const search = new SerpApi.GoogleSearch("Your Private Key")
search.json({
 q: "Coffee", 
 location: "Austin, TX"
}, (result) => {
  console.log(result)
})

View on Github

3 - Google-search

Execute a Google search through its API

Install

npm install google-search

Usage

var GoogleSearch = require('google-search');
var googleSearch = new GoogleSearch({
  key: 'YOUR_API_KEY',
  cx: 'YOUR_CX'
});


googleSearch.build({
  q: "",
  start: 5,
  fileType: "pdf",
  gl: "tr", //geolocation,
  lr: "lang_tr",
  num: 10, // Number of search results to return between 1 and 10, inclusive
  siteSearch: "http://kitaplar.ankara.edu.tr/" // Restricts results to URLs from a specified site
}, function(error, response) {
  console.log(response);
});

View on Github

4 - Google-search-results-serpwow

Scrape and parse Google search results in Node.JS

Installation

You can install google-search-results-serpwow with:

$ npm install google-search-results-serpwow

and update with:

$ npm update google-search-results-serpwow

Simple Example

The simplest example, for a standard query "pizza", returns the Google SERP (Search Engine Results Page) data as JSON.

var SerpWow = require('google-search-results-serpwow')

// create the serpwow object, passing in our API key
let serpwow = new SerpWow('API_KEY')

// #1. example using promises & async/await
async function getResult() {

  let result = await serpwow.json({
    q: 'pizza'
  });
  
  // pretty-print the result
  console.log(JSON.stringify(result, 0, 2));

}
getResult();


// #2. example using promise chaining (.then/.catch)
serpwow.json(
  {
    q: 'pizza'
  })
  .then(result => {
    // pretty-print the result
    console.log(JSON.stringify(result, 0, 2));
  })
  .catch(error => {
    // print the error
    console.log(error);
  });

View on Github

5 - Node-reverse-image-search

A free solution for reverse image search using Google

Example:

const reverseImageSearch = require('node-reverse-image-search')

const doSomething = (results) => {
  console.log(results)
}

reverseImageSearch('i.ebayimg.com/00/s/OTAwWDkwMA==/z/3G8AAOSwzoxd80XB/$_83.JPG', doSomething)

View on Github

6 - Node-google-search-trends

Node.js module to fetch localized Google trending searches

Usage

As always, install using:

npm install node-google-search-trends [--save]

The module comes with one exposed function. It takes three parameters - localization, count and callback. Example usage:

var trends = require('node-google-search-trends');
trends('Singapore', 10, function(err, data) {
    if (err) return console.error(err);
    console.log(JSON.stringify(data, null, 2));  // Pretty prints JSON 'data'
});

View on Github

7 - Google-trends

Scrape recent trend words on Google for Node.js

Usage

var trends = require('google-trends')

trends.load(['kr'], function (err, result) {
  console.log(err, JSON.stringify(result))
})
// output
{
  "kr": [
    {
      "title": "지진",
      "link": "http://www.google.co.kr/trends/hottrends?pn=p23#a=20151222-%EC%A7%80%EC%A7%84",
      "ctime": 1450728000,
      "news": {
        "picture": {
          "url": "//t0.gstatic.com/images?q=tbn:ANd9GcTEI1l0ltniQq9PVbDe_u3oHxAk2QHoRM9h54L-FB7USd14CqkjrRSZVQ28fIbNdtNlaEj8DCo",
          "source": "연합뉴스"
        },
        "items": [
          {
            "title": "전북 익산 규모 3.5 <b>지진</b>…서울·부산서도 감지(종합2보)",
            "snippet": "(익산=연합뉴스) 김진방 기자 = 22일 오전 4시30분께 전북 익산 북쪽 8㎞ 지점에서 규모 3.5의 <b>지진이</b> 발생했다고 전주기상지청이 밝혔다. 이번에 발생한 <b>지진은</b> 지난 8월 3일 제주 서귀포시 성산 남동쪽 22㎞ 해역에서 발생한 규모 3.7의 <b>지진에</b> 이어 올 들어&nbsp;...",
            "url": "http://www.yonhapnews.co.kr/bulletin/2015/12/22/0200000000AKR20151222009300055.HTML",
            "source": "연합뉴스"
          },
          {
            "title": "익산서 내륙 최대 규모 <b>지진</b>, 서울서도 싱크대 흔들렸다",
            "snippet": "전북 익산에서 올들어 두번째로 규모가 큰 <b>지진이</b> 발생했다. 내륙에서는 가장 큰 규모의 <b>지진</b>이었다. <b>지진</b>여파는 서울과 강원 등지에까지 전달됐다. 새벽 단잠을 깬 일부 시민들은 휴대폰 메시지 등 SNS를 통해 <b>지진을</b> 알렸고 지인들의 안부를 물었다. 기상청은&nbsp;...",
            "url": "http://news.khan.co.kr/kh_news/khan_art_view.html?artid=201512220836181&code=940100",
            "source": "경향신문"
          }
        ]
      }
    }
    // ...
  ]
}

View on Github

Thank you for following this article. 

#node #google #search 

Nat Grady

1661091840

RStudio Addin to Search CRAN Packages Titles and Descriptions

CRANsearcher

RStudio addin to search CRAN packages titles and descriptions

About

One of the strengths of R is its vast package ecosystem. Indeed, R packages extend from visualization to Bayesian inference and from spatial analyses to pharmacokinetics (https://cran.r-project.org/web/views/). There is probably not an area of quantitative research that isn't represented by at least one R package. At the time of this writing, there are more than 10,000 active CRAN packages. Because of this massive ecosystem, it is important to have tools to search and learn about packages related to your personal R needs. For this reason, we developed an RStudio addin capable of searching available CRAN packages directly within RStudio.

Installation

devtools::install_github("RhoInc/CRANsearcher")

Use

After installation, the addin will be available in your RStudio addin dropdown menu. Simply select "CRANsearcher" from the menu to run the application.

CRANsearcher_addin.gif

Inspiration

We must acknowledge a few great resources that helped inspire ideas behind this addin.

Download Details:

Author: RhoInc
Source Code: https://github.com/RhoInc/CRANsearcher 
License: MIT license

#r #rstudio #search 


PkgSearch: A Julia REPL Utility for Package Discovery

PkgSearch

A Julia utility for package discovery

PkgSearch is a small REPL tool that helps Julia developers find and install packages. It knows about both the official and the unofficial packages available on GitHub. It searches for your keywords within both the package name and, most importantly, within the package's README file.

The search results are provided by a RESTful API, so searches are fast and no data is stored locally. The API's data is refreshed daily.

PkgSearch is not yet an official package, so install using:

Pkg.clone("https://github.com/essenciary/PkgSearch")

It depends on a few other packages, which you may need to install first:

Requests
Sparklines
MetadataTools
Memoize

Searching for packages

To search for packages, use the lookup(keyword1::AbstractString, keyword2::AbstractString, ...) function. Since this is designed as a REPL tool, the search results are displayed in a human-readable format. The function also returns an array of results that can be used programmatically.

using PkgSearch

julia> PkgSearch.lookup("web")

=====================================================
Escher.jl
-----------------------------------------------------
Unofficial package
-----------------------------------------------------
git://github.com/shashi/Escher.jl.git
-----------------------------------------------------
  web server for 2016.** Escher's built-in web server allows you to create interactive
_____________________________________________________


=====================================================
docker-scrapy-crawler
-----------------------------------------------------
Unofficial package
-----------------------------------------------------
git://github.com/iammai/docker-scrapy-crawler.git
-----------------------------------------------------
  web browser for the web ui for monitoring * Go to http://192.168.59.103:6800/ * Go to http://192.168.59.103:6800/jobs
_____________________________________________________


=====================================================
GoogleCharts
-----------------------------------------------------
Official package
-----------------------------------------------------
git://github.com/jverzani/GoogleCharts.jl.git
-----------------------------------------------------
  web page describes this. We don't have a mechanism in place supporting Google's "Column
_____________________________________________________

[ ... output omitted ... ]

10-element Array{Any,1}:
 Dict{AbstractString,Any}("search"=>Dict{AbstractString,Any}("rank"=>0.0951392,"headline"=>"**web** server for 2016.** Escher's built-in **web** server allows you to create interactive"),"links"=>Dict{AbstractString,Any}("self"=>"/api/v1/packages/466"),"attributes"=>Dict{AbstractString,Any}("name"=>"Escher.jl","url"=>"git://github.com/shashi/Escher.jl.git"),"id"=>466,"type"=>"packages")
 Dict{AbstractString,Any}("search"=>Dict{AbstractString,Any}("rank"=>0.094717,"headline"=>"**web** browser for the **web** ui for monitoring\n   * Go to http://192.168.59.103:6800/\n   * Go to http://192.168.59.103:6800/jobs"),"links"=>Dict{AbstractString,Any}("self"=>"/api/v1/packages/1317"),"attributes"=>Dict{AbstractString,Any}("name"=>"docker-scrapy-crawler","url"=>"git://github.com/iammai/docker-scrapy-crawler.git"),"id"=>1317,"type"=>"packages")
 Dict{AbstractString,Any}("search"=>Dict{AbstractString,Any}("rank"=>0.0889769,"headline"=>"**web** page describes this. We don't have a mechanism in\nplace supporting Google's \"Column"),"links"=>Dict{AbstractString,Any}("self"=>"/api/v1/packages/289"),"attributes"=>Dict{AbstractString,Any}("name"=>"GoogleCharts","url"=>"git://github.com/jverzani/GoogleCharts.jl.git"),"id"=>289,"type"=>"packages")
 Dict{AbstractString,Any}("search"=>Dict{AbstractString,Any}("rank"=>0.0827456,"headline"=>"**web** pages from Julia. Pages may be served over the internet and controlled from the browser"),"links"=>Dict{AbstractString,Any}("self"=>"/api/v1/packages/240"),"attributes"=>Dict{AbstractString,Any}("name"=>"Blink.jl","url"=>"git://github.com/JunoLab/Blink.jl.git"),"id"=>240,"type"=>"packages")
 Dict{AbstractString,Any}("search"=>Dict{AbstractString,Any}("rank"=>0.0827456,"headline"=>"**web** browser.  Click\non the *New* button and choose the *Julia* option to start a new\n\"notebook"),"links"=>Dict{AbstractString,Any}("self"=>"/api/v1/packages/288"),"attributes"=>Dict{AbstractString,Any}("name"=>"IJulia.jl","url"=>"git://github.com/JuliaLang/IJulia.jl.git"),"id"=>288,"type"=>"packages")
 Dict{AbstractString,Any}("search"=>Dict{AbstractString,Any}("rank"=>0.0827456,"headline"=>"**web** content, GTK+ 3 Port\n11. libwebkitgtk3-devel (mingw32) - Library for rendering **web** content, GTK+ 3 Port"),"links"=>Dict{AbstractString,Any}("self"=>"/api/v1/packages/939"),"attributes"=>Dict{AbstractString,Any}("name"=>"WinRPM.jl","url"=>"git://github.com/JuliaLang/WinRPM.jl.git"),"id"=>939,"type"=>"packages")
 Dict{AbstractString,Any}("search"=>Dict{AbstractString,Any}("rank"=>0.0827456,"headline"=>"**Web**)\"\n\t  \"Idrisi\"     => \"Idrisi Vector (.vct)\"\n\t  \"WFS\"        => \"OGC WFS (**Web** Feature Service)\"\n\t  \"WMS\"        => \"OGC **Web** Map Service"),"links"=>Dict{AbstractString,Any}("self"=>"/api/v1/packages/1058"),"attributes"=>Dict{AbstractString,Any}("name"=>"RasterIO.jl","url"=>"git://github.com/wkearn/RasterIO.jl.git"),"id"=>1058,"type"=>"packages")
 Dict{AbstractString,Any}("search"=>Dict{AbstractString,Any}("rank"=>0.0759909,"headline"=>"**web** applications in Julia as easy as possible.\n\nIf you have looked at the examples"),"links"=>Dict{AbstractString,Any}("self"=>"/api/v1/packages/492"),"attributes"=>Dict{AbstractString,Any}("name"=>"Pages","url"=>"git://github.com/EricForgy/Pages.jl.git"),"id"=>492,"type"=>"packages")
 Dict{AbstractString,Any}("search"=>Dict{AbstractString,Any}("rank"=>0.0759909,"headline"=>"**web** services some closure. Mux allows you to\ndefine servers in terms of highly modular"),"links"=>Dict{AbstractString,Any}("self"=>"/api/v1/packages/588"),"attributes"=>Dict{AbstractString,Any}("name"=>"Mux.jl","url"=>"git://github.com/JuliaWeb/Mux.jl.git"),"id"=>588,"type"=>"packages")
 Dict{AbstractString,Any}("search"=>Dict{AbstractString,Any}("rank"=>0.0759909,"headline"=>"**Web** Services\n\nThis package provides a native Julia interface to the Amazon **Web** Services API\n\nThe following"),"links"=>Dict{AbstractString,Any}("self"=>"/api/v1/packages/644"),"attributes"=>Dict{AbstractString,Any}("name"=>"AWS","url"=>"git://github.com/JuliaParallel/AWS.jl.git"),"id"=>644,"type"=>"packages")

Getting more results

The results are paginated, with 10 packages per page. To navigate between the results, use the next() and prev() functions. They both take an optional ::Int argument indicating how many pages to jump (the default is 1).

julia> PkgSearch.next()

=====================================================
TextMining.jl
-----------------------------------------------------
Unofficial package
-----------------------------------------------------
git://github.com/SLU-TMI/TextMining.jl.git
-----------------------------------------------------
  Julia**** with the goal of making them fast, generic, and easily usable in **Julia's REPL
_____________________________________________________

[ ... output omitted ... ]

Checking packages details

Once you've found a package that looks interesting, you can learn more about it using details(package_name::AbstractString). As with lookup(), the results are displayed in a human-readable format, and a Dict is returned for use in scripts.

julia> PkgSearch.details("Mocking")

=====================================================
Mocking
-----------------------------------------------------
git://github.com/invenia/Mocking.jl.git
-----------------------------------------------------
Official package
-----------------------------------------------------
Weekly activity:
▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▇▁▃▁▁▁▁▄█▁▁▁▁▁▂▁▁▁▁▂▁▁▁▁▁▁▁▁▁▂▂▁▁▁▁▁
-----------------------------------------------------
     Mocking
    ≡≡≡≡≡≡≡≡≡

  (Image: Build Status) (Image: Build Status) (Image: codecov.io)

  Allows Julia functions to be temporarily modified for testing purposes.

     Usage
    =======

  Using the mend function provides a way to temporarily overwrite a specific method.
  
[ ... output omitted ... ]

Dict{AbstractString,Any} with 4 entries:
  "name"          => "Mocking"
  "participation" => Any[0,0,0,0,0,0,0,0,0,0  …  0,0,0,2,3,0,0,0,0,0]
  "readme"        => "# Mocking\n\n[![Build Status](https://travis-ci.org/invenia/Mocking.jl.svg?branch=master)](https://travis-ci.org/invenia/Mocking.jl)\n[![Build Status](https://ci.appveyor.com/api/projects/status/la041r86v6p5k24x?svg=true)](https://ci.appveyor.com/project/omus/mocking-jl)\n[![codecov.io](http://…
  "url"           => "git://github.com/invenia/Mocking.jl.git"

Installing packages

PkgSearch tells you whether or not a package is official, so you know whether to use Pkg.add() or Pkg.clone(). Alternatively, just run PkgSearch.install(package_name::AbstractString) and PkgSearch will pick the right one for you.

Download Details:

Author: essenciary
Source Code: https://github.com/essenciary/PkgSearch.jl 
License: MIT license

#julia #search 

Nat Grady

1660864500

Pkgsearch: Search R Packages on CRAN

Search and Query CRAN R Packages    

pkgsearch uses R-hub web services that munge CRAN metadata and let you access it through several lenses.

Installation

Install the latest pkgsearch release from CRAN:

install.packages("pkgsearch")

Usage

Search relevant packages

Do you need to find packages solving a particular problem, e.g. “permutation test”?

library("pkgsearch")
library("pillar") # nicer data frame printing
pkg_search("permutation test")
#> - "permutation test" ------------------------------------ 2236 packages in 0.01 seconds -
#>   #     package        version by                      @ title                           
#>   1 100 coin           1.4.2   Torsten Hothorn        4M Conditional Inference Procedu...
#>   2  31 perm           1.0.0.2 Michael P. Fay         4M Exact or Asymptotic Permutati...
#>   3  30 exactRankTests 0.8.34  Torsten Hothorn        4M Exact Distributions for Rank ...
#>   4  29 flip           2.5.0   Livio Finos            3y Multivariate Permutation Tests  
#>   5  22 jmuOutlier     2.2     Steven T. Garren       3y Permutation Tests for Nonpara...
#>   6  19 wPerm          1.0.1   Neil A. Weiss          6y Permutation Tests               
#>   7  16 cpt            1.0.2   Johann Gagnon-Bartsch  3y Classification Permutation Test 
#>   8  16 GlobalDeviance 0.4     Frederike Fuhlbrueck   8y Global Deviance Permutation T...
#>   9  16 permutes       2.3.2   Cesko C. Voeten        2M Permutation Tests for Time Se...
#>  10  16 AUtests        0.99    Arjun Sondhi           1y Approximate Unconditional and...

pkgsearch uses an R-hub web service and a careful ranking that puts popular packages before less frequently used ones.

Do it all clicking

For the search mentioned above, and other points of entry to CRAN metadata, you can use the pkgsearch RStudio addin!

Addin screencast

Select the “CRAN package search” addin from the menu, or start it with pkg_search_addin().

Get package metadata

Do you want to find the dependencies the first versions of testthat had and when each of these versions was released?

cran_package_history("testthat")
#> # A data frame: 35 × 29
#>    Package  Type    Title     Version Author Maintainer Description URL   License LazyData
#>  * <chr>    <chr>   <chr>     <chr>   <chr>  <chr>      <chr>       <chr> <chr>   <chr>   
#>  1 testthat Package Tools fo… 0.1     Hadle… Hadley Wi… Test_that … http… GPL     true    
#>  2 testthat Package Testthat… 0.1.1   Hadle… Hadley Wi… A testing … http… GPL     true    
#>  3 testthat Package Testthat… 0.2     Hadle… Hadley Wi… A testing … http… GPL     true    
#>  4 testthat Package Testthat… 0.3     Hadle… Hadley Wi… A testing … http… GPL     true    
#>  5 testthat Package Testthat… 0.4     Hadle… Hadley Wi… A testing … http… GPL     true    
#>  6 testthat Package Testthat… 0.5     Hadle… Hadley Wi… A testing … http… GPL     true    
#>  7 testthat Package Testthat… 0.6     Hadle… Hadley Wi… A testing … http… GPL     true    
#>  8 testthat Package Testthat… 0.7     Hadle… Hadley Wi… A testing … http… GPL     true    
#>  9 testthat Package Testthat… 0.7.1   Hadle… Hadley Wi… A testing … http… GPL     true    
#> 10 testthat Package Testthat… 0.8     Hadle… Hadley Wi… A testing … http… MIT + … true    
#> # … with 25 more rows, and 19 more variables: Collate <chr>, Packaged <chr>,
#> #   Repository <chr>, `Date/Publication` <chr>, crandb_file_date <chr>, date <chr>,
#> #   dependencies <list>, NeedsCompilation <chr>, Roxygen <chr>, `Authors@R` <chr>,
#> #   BugReports <chr>, RoxygenNote <chr>, VignetteBuilder <chr>, Encoding <chr>,
#> #   MD5sum <chr>, `Config/testthat/edition` <chr>, `Config/testthat/parallel` <chr>,
#> #   `Config/testthat/start-first` <chr>, `Config/Needs/website` <chr>

Discover packages

Do you want to know what packages are trending on CRAN these days? pkgsearch can help!

cran_trending()
#> # A data frame: 100 × 2
#>    package          score                
#>    <chr>            <chr>                
#>  1 spatstat.random  3084.3177189409368600
#>  2 fstcore          2818.8976377952755900
#>  3 collapse         1743.6553713049747700
#>  4 r5r              916.4171779141104300 
#>  5 RcppTOML         614.1354563348462200 
#>  6 maditr           415.9413882397524200 
#>  7 reproj           399.2964304190377700 
#>  8 RGoogleAnalytics 382.0643996388805300 
#>  9 crsmeta          371.6803643925141100 
#> 10 gsalib           352.6695776231031900 
#> # … with 90 more rows
cran_top_downloaded()
#> # A data frame: 100 × 2
#>    package     count 
#>    <chr>       <chr> 
#>  1 ggplot2     537944
#>  2 rlang       510888
#>  3 ragg        417993
#>  4 textshaping 416529
#>  5 dplyr       414192
#>  6 sf          403120
#>  7 jsonlite    401135
#>  8 cli         383482
#>  9 pillar      366216
#> 10 lifecycle   362196
#> # … with 90 more rows

Keep up with CRAN

Are you curious about the latest releases or archivals?

cran_events()
#> CRAN events (events)---------------------------------------------------------------------
#>  . When     Package    Version Title                                                     
#>  + 4 hours  bigsnpr    1.9.10  Analysis of Massive SNP Arrays                            
#>  - 5 hours  report     0.5.0   Automated Reporting of Results and Statistical Models     
#>  - 5 hours  miniCRAN   0.2.14  Create a Mini Version of CRAN Containing Only Selected ...
#>  + 7 hours  RFishBC    0.2.4   Back-Calculation of Fish Length                           
#>  + 7 hours  rEDM       1.10.2  Empirical Dynamic Modeling ('EDM')                        
#>  + 8 hours  rsmatrix   0.2.1   Matrices for Repeat-Sales Price Indexes                   
#>  + 10 hours wallace    1.1.2   A Modular Platform for Reproducible Modeling of Species...
#>  + 10 hours fastmatrix 0.4     Fast Computation of some Matrices Useful in Statistics    
#>  + 10 hours FSAdata    0.3.9   Data to Support Fish Stock Assessment ('FSA') Package     
#>  + 10 hours ggdist     3.1.0   Visualizations of Distributions and Uncertainty

Search features

More details

By default, pkg_search() returns a short summary of the ten best search hits. Their details can be printed using the format = "long" option of pkg_search(), or by calling pkg_search() again without any arguments after a search:

library(pkgsearch)
pkg_search("C++")
#> - "C++" ----------------------------------------------- 11837 packages in 0.006 seconds -
#>   #     package      version  by                    @ title                              
#>   1 100 Rcpp         1.0.8    Dirk Eddelbuettel    1M Seamless R and C++ Integration     
#>   2  35 markdown     1.1      Yihui Xie            3y Render Markdown with the C Libra...
#>   3  32 BH           1.78.0.0 Dirk Eddelbuettel    2M Boost C++ Header Files             
#>   4  18 StanHeaders  2.21.0.7 Ben Goodrich         1y C++ Header Files for Stan          
#>   5  13 RcppProgress 0.4.2    Karl Forner          2y An Interruptible Progress Bar wi...
#>   6  12 cpp11        0.4.2    Romain François      3M A C++11 Interface for R's C Inte...
#>   7  12 covr         3.5.1    Jim Hester           1y Test Coverage for Packages         
#>   8  10 inline       0.3.19   Dirk Eddelbuettel    9M Functions to Inline C, C++, Fort...
#>   9   9 SnowballC    0.7.0    Milan Bouchet-Valat  2y Snowball Stemmers Based on the C...
#>  10   8 RcppThread   2.0.2    Thomas Nagler        7d R-Friendly Threading in C++
pkg_search()
#> - "C++" ----------------------------------------------- 11837 packages in 0.006 seconds -
#> 
#> 1 Rcpp @ 1.0.8                                       Dirk Eddelbuettel, about a month ago
#> --------------
#>   # Seamless R and C++ Integration
#>   The 'Rcpp' package provides R functions as well as C++ classes which offer a
#>   seamless integration of R and C++. Many R data types and objects can be mapped
#>   back and forth to C++ equivalents which facilitates both writing of new code
#>   as well as easier integration of third-party libraries. Documentation about
#>   'Rcpp' is provided by several vignettes included in this package, via the
#>   'Rcpp Gallery' site at <https://gallery.rcpp.org>, the paper by Eddelbuettel
#>   and Francois (2011, <doi:10.18637/jss.v040.i08>), the book by Eddelbuettel
#>   (2013, <doi:10.1007/978-1-4614-6868-4>) and the paper by Eddelbuettel and
#>   Balamuta (2018, <doi:10.1080/00031305.2017.1375990>); see 'citation("Rcpp")'
#>   for details.
#>   http://www.rcpp.org
#>   https://dirk.eddelbuettel.com/code/rcpp.html
#>   https://github.com/RcppCore/Rcpp
#> 
#> 2 markdown @ 1.1                                                   Yihui Xie, 3 years ago
#> ----------------
#>   # Render Markdown with the C Library 'Sundown'
#>   Provides R bindings to the 'Sundown' Markdown rendering library
#>   (<https://github.com/vmg/sundown>). Markdown is a plain-text formatting syntax
#>   that can be converted to 'XHTML' or other formats. See
#>   <http://en.wikipedia.org/wiki/Markdown> for more information about Markdown.
#>   https://github.com/rstudio/markdown
#> 
#> 3 BH @ 1.78.0.0                                           Dirk Eddelbuettel, 2 months ago
#> ---------------
#>   # Boost C++ Header Files
#>   Boost provides free peer-reviewed portable C++ source libraries.  A large part
#>   of Boost is provided as C++ template code which is resolved entirely at
#>   compile-time without linking.  This package aims to provide the most useful
#>   subset of Boost libraries for template use among CRAN packages. By placing
#>   these libraries in this package, we offer a more efficient distribution system
#>   for CRAN as replication of this code in the sources of other packages is
#>   avoided. As of release 1.78.0-0, the following Boost libraries are included:
#>   'accumulators' 'algorithm' 'align' 'any' 'atomic' 'beast' 'bimap' 'bind'
#>   'circular_buffer' 'compute' 'concept' 'config' 'container' 'date_time'
#>   'detail' 'dynamic_bitset' 'exception' 'flyweight' 'foreach' 'functional'
#>   'fusion' 'geometry' 'graph' 'heap' 'icl' 'integer' 'interprocess' 'intrusive'
#>   'io' 'iostreams' 'iterator' 'lambda2' 'math' 'move' 'mp11' 'mpl'
#>   'multiprecision' 'numeric' 'pending' 'phoenix' 'polygon' 'preprocessor'
#>   'process' 'propery_tree' 'random' 'range' 'scope_exit' 'smart_ptr' 'sort'
#>   'spirit' 'tuple' 'type_traits' 'typeof' 'unordered' 'utility' 'uuid'.
#>   https://github.com/eddelbuettel/bh
#>   https://dirk.eddelbuettel.com/code/bh.html
#> 
#> 4 StanHeaders @ 2.21.0.7                                   Ben Goodrich, about a year ago
#> ------------------------
#>   # C++ Header Files for Stan
#>   The C++ header files of the Stan project are provided by this package, but it
#>   contains little R code or documentation. The main reference is the vignette.
#>   There is a shared object containing part of the 'CVODES' library, but its
#>   functionality is not accessible from R. 'StanHeaders' is primarily useful for
#>   developers who want to utilize the 'LinkingTo' directive of their package's
#>   DESCRIPTION file to build on the Stan library without incurring unnecessary
#>   dependencies. The Stan project develops a probabilistic programming language
#>   that implements full or approximate Bayesian statistical inference via Markov
#>   Chain Monte Carlo or 'variational' methods and implements (optionally
#>   penalized) maximum likelihood estimation via optimization. The Stan library
#>   includes an advanced automatic differentiation scheme, 'templated' statistical
#>   and linear algebra functions that can handle the automatically
#>   'differentiable' scalar types (and doubles, 'ints', etc.), and a parser for
#>   the Stan language. The 'rstan' package provides user-facing R functions to
#>   parse, compile, test, estimate, and analyze Stan models.
#>   https://mc-stan.org/
#> 
#> 5 RcppProgress @ 0.4.2                                           Karl Forner, 2 years ago
#> ----------------------
#>   # An Interruptible Progress Bar with OpenMP Support for C++ in R Packages
#>   Allows to display a progress bar in the R console for long running
#>   computations taking place in c++ code, and support for interrupting those
#>   computations even in multithreaded code, typically using OpenMP.
#>   https://github.com/kforner/rcpp_progress
#> 
#> 6 cpp11 @ 0.4.2                                             Romain François, 3 months ago
#> ---------------
#>   # A C++11 Interface for R's C Interface
#>   Provides a header only, C++11 interface to R's C interface.  Compared to other
#>   approaches 'cpp11' strives to be safe against long jumps from the C API as
#>   well as C++ exceptions, conform to normal R function semantics and supports
#>   interaction with 'ALTREP' vectors.
#>   https://cpp11.r-lib.org
#>   https://github.com/r-lib/cpp11
#> 
#> 7 covr @ 3.5.1                                               Jim Hester, about a year ago
#> --------------
#>   # Test Coverage for Packages
#>   Track and report code coverage for your package and (optionally) upload the
#>   results to a coverage service like 'Codecov' <https://codecov.io> or
#>   'Coveralls' <https://coveralls.io>. Code coverage is a measure of the amount
#>   of code being exercised by a set of tests. It is an indirect measure of test
#>   quality and completeness. This package is compatible with any testing
#>   methodology or framework and tracks coverage of both R code and compiled
#>   C/C++/FORTRAN code.
#>   https://covr.r-lib.org
#>   https://github.com/r-lib/covr
#> 
#> 8 inline @ 0.3.19                                         Dirk Eddelbuettel, 9 months ago
#> -----------------
#>   # Functions to Inline C, C++, Fortran Function Calls from R
#>   Functionality to dynamically define R functions and S4 methods with 'inlined'
#>   C, C++ or Fortran code supporting the .C and .Call calling conventions.
#>   https://github.com/eddelbuettel/inline
#>   https://dirk.eddelbuettel.com/code/inline.html
#> 
#> 9 SnowballC @ 0.7.0                                      Milan Bouchet-Valat, 2 years ago
#> -------------------
#>   # Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library
#>   An R interface to the C 'libstemmer' library that implements Porter's word
#>   stemming algorithm for collapsing words to a common root to aid comparison of
#>   vocabulary. Currently supported languages are Danish, Dutch, English, Finnish,
#>   French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian,
#>   Spanish, Swedish and Turkish.
#>   https://github.com/nalimilan/R.TeMiS
#> 
#> 10 RcppThread @ 2.0.2                                           Thomas Nagler, 7 days ago
#> ---------------------
#>   # R-Friendly Threading in C++
#>   Provides a C++11-style thread class and thread pool that can safely be
#>   interrupted from R. See Nagler (2021) <doi:10.18637/jss.v097.c01>.
#>   https://github.com/tnagler/RcppThread

Pagination

The more() function displays the next batch of search hits; batches contain ten packages by default. ps() is a shorter alias of pkg_search():

ps("google")
#> - "google" ---------------------------------------------- 155 packages in 0.005 seconds -
#>   #     package             version by               @ title                             
#>   1 100 googledrive         2.0.0   Jennifer Bryan  7M An Interface to Google Drive      
#>   2  93 googleVis           0.6.11  Markus Gesmann  1M R Interface to Google Charts      
#>   3  93 googleAuthR         2.0.0   Mark Edmondson 16d Authenticate and Create Google ...
#>   4  87 lubridate           1.8.0   Vitalie Spinu   4M Make Dealing with Dates a Littl...
#>   5  83 gargle              1.2.0   Jennifer Bryan  8M Utilities for Working with Goog...
#>   6  59 googleCloudStorageR 0.7.0   Mark Edmondson  2M Interface with Google Cloud Sto...
#>   7  58 googlesheets4       1.0.0   Jennifer Bryan  7M Access Google Sheets using the ...
#>   8  56 gsheet              0.4.5   Max Conway      2y Download Google Sheets Using Ju...
#>   9  51 googlePolylines     0.8.2   David Cooley    1y Encoding Coordinates into 'Goog...
#>  10  47 cld2                1.2.1   Jeroen Ooms     1y Google's Compact Language Detec...
more()
#> - "google" ---------------------------------------------- 155 packages in 0.006 seconds -
#>   #    package          version by                  @ title                              
#>  11 46 bigrquery        1.4.0   Hadley Wickham     6M An Interface to Google's 'BigQue...
#>  12 39 googleAnalyticsR 1.0.1   Mark Edmondson     4M Google Analytics API into R        
#>  13 38 cld3             1.4.2   Jeroen Ooms        7M Google's Compact Language Detect...
#>  14 37 plotKML          0.8.2   Tomislav Hengl     4M Visualization of Spatial and Spa...
#>  15 33 bigQueryR        0.5.0   Mark Edmondson     2y Interface with Google BigQuery w...
#>  16 32 ggmap            3.0.0   ORPHANED           3y Spatial Visualization with ggplot2 
#>  17 31 V8               4.1.0   Jeroen Ooms        7d Embedded JavaScript and WebAssem...
#>  18 30 googlesheets     0.3.0   Jennifer Bryan     4y Manage Google Spreadsheets from R  
#>  19 30 googleway        2.7.6   David Cooley      20d Accesses Google Maps APIs to Ret...
#>  20 27 tensorflow       2.8.0   Tomasz Kalinowski  4d R Interface to 'TensorFlow'

Stemming

The search server stems the words in both the indexed metadata and the search phrase. This means that “colour” and “colours” deliver the exact same result. So do “coloring”, “colored”, etc. (Unless one happens to be an exact package name or match another non-stemmed field.)

ps("colour", size = 3)
#> - "colour" ---------------------------------------------- 270 packages in 0.005 seconds -
#>   #     package    version by              @ title                                       
#>  1  100 crayon     1.4.2   Gábor Csárdi   4M Colored Terminal Output                     
#>  2   62 colorspace 2.0.2   Achim Zeileis  8M A Toolbox for Manipulating and Assessing ...
#>  3   59 viridis    0.6.2   Simon Garnier  4M Colorblind-Friendly Color Maps for R
ps("colours", size = 3)
#> - "colours" --------------------------------------------- 268 packages in 0.006 seconds -
#>   #     package    version by              @ title                                       
#>  1  100 crayon     1.4.2   Gábor Csárdi   4M Colored Terminal Output                     
#>  2   62 colorspace 2.0.2   Achim Zeileis  8M A Toolbox for Manipulating and Assessing ...
#>  3   59 viridis    0.6.2   Simon Garnier  4M Colorblind-Friendly Color Maps for R
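Conceptually, stemming reduces inflected word forms to a common root before both indexing and querying. Real search servers use proper algorithms such as Porter/Snowball; the toy suffix stripper below is only a sketch of the idea, not pkgsearch's actual implementation:

```python
# Crude suffix stripping, for illustration only; real stemmers handle
# far more morphology (Porter, Snowball, etc.).
def crude_stem(word: str) -> str:
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip when a reasonable root remains.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# "colours" and "colour" reduce to the same stem, so they hit the
# same index entry; likewise "coloring" and "colored".
print(crude_stem("colours"))   # colour
print(crude_stem("coloring"))  # color
```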

Ranking

The most important feature of a search engine is the ranking of the results. The best results should be listed first. pkgsearch uses weighted scoring, where a match in the package title gets a higher score than a match in the package description. It also uses the number of reverse dependencies and the number of downloads to weight the scores:

ps("colour")[, c("score", "package", "revdeps", "downloads_last_month")]
#> # A data frame: 10 × 4
#>     score package      revdeps downloads_last_month
#>     <dbl> <chr>          <int>                <int>
#>  1 17254. crayon           346              1020155
#>  2 10681. colorspace       179               621855
#>  3 10096. viridis          160               425759
#>  4  7653. pillar            66              1532564
#>  5  6776. viridisLite       79               735427
#>  6  6480. colourpicker      44                30185
#>  7  4743. shape             34                65178
#>  8  4327. RColorBrewer     569               615252
#>  9  4109. colorRamps        19                 5761
#> 10  3479. ggnewscale        18                 6126
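pkgsearch's exact scoring formula is not shown here; the sketch below only illustrates the general shape of weighted scoring, where a title match carries more weight than a description match and popularity signals (reverse dependencies, downloads) boost the result. All weights are made up for illustration:

```python
import math

# Toy weighted scorer: text relevance times a log-damped popularity
# signal. The weights 3.0/1.0 are arbitrary illustrative choices.
def score(title_hit: bool, desc_hit: bool, revdeps: int, downloads: int) -> float:
    text_score = (3.0 if title_hit else 0.0) + (1.0 if desc_hit else 0.0)
    popularity = math.log1p(revdeps) + math.log1p(downloads)
    return text_score * popularity

# A title match on a popular package (crayon-like numbers) outranks a
# description-only match on a less popular one.
print(score(True, False, 346, 1020155) > score(False, True, 44, 30185))  # True
```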

Preferring Phrases

The search engine prefers matching whole phrases over single words. E.g. the search phrase “permutation test” will rank coin higher than testthat, even though testthat is a much better result for the single word “test”. (In fact, at the time of writing testthat is not even on the first page of results.)

ps("permutation test")
#> - "permutation test" ----------------------------------- 2236 packages in 0.009 seconds -
#>   #     package        version by                      @ title                           
#>   1 100 coin           1.4.2   Torsten Hothorn        4M Conditional Inference Procedu...
#>   2  31 perm           1.0.0.2 Michael P. Fay         4M Exact or Asymptotic Permutati...
#>   3  30 exactRankTests 0.8.34  Torsten Hothorn        4M Exact Distributions for Rank ...
#>   4  29 flip           2.5.0   Livio Finos            3y Multivariate Permutation Tests  
#>   5  22 jmuOutlier     2.2     Steven T. Garren       3y Permutation Tests for Nonpara...
#>   6  19 wPerm          1.0.1   Neil A. Weiss          6y Permutation Tests               
#>   7  16 cpt            1.0.2   Johann Gagnon-Bartsch  3y Classification Permutation Test 
#>   8  16 GlobalDeviance 0.4     Frederike Fuhlbrueck   8y Global Deviance Permutation T...
#>   9  16 permutes       2.3.2   Cesko C. Voeten        2M Permutation Tests for Time Se...
#>  10  16 AUtests        0.99    Arjun Sondhi           1y Approximate Unconditional and...

If the whole phrase does not match, pkgsearch falls back to matching individual words. For example, a match for either word is enough here to get onto the first page of results:

ps("test http")
#> - "test http" ------------------------------------------- 6366 packages in 0.01 seconds -
#>   #     package   version by                  @ title                                    
#>   1 100 httptest  4.1.0   Neal Richardson    5M A Test Environment for HTTP Requests     
#>   2  77 covr      3.5.1   Jim Hester         1y Test Coverage for Packages               
#>   3  35 webfakes  1.1.3   Gábor Csárdi      10M Fake Web Apps for HTTP Testing           
#>   4  15 testthat  3.1.2   Hadley Wickham    24d Unit Testing for R                       
#>   5  14 vcr       1.0.2   Scott Chamberlain  9M Record 'HTTP' Calls to Disk              
#>   6  13 psych     2.1.9   William Revelle    5M Procedures for Psychological, Psychome...
#>   7   9 webmockr  0.8.0   Scott Chamberlain  1y Stubbing and Setting Expectations on '...
#>   8   8 httr      1.4.2   Hadley Wickham     2y Tools for Working with URLs and HTTP     
#>   9   6 bnlearn   4.7     Marco Scutari      5M Bayesian Network Structure Learning, P...
#>  10   5 rmarkdown 2.11    Yihui Xie          5M Dynamic Documents for R

British vs American English

The search engine uses a dictionary to make sure that package metadata and queries given in British and American English yield the same results. E.g. note the spelling of colour/color in the results:

ps("colour")
#> - "colour" ---------------------------------------------- 270 packages in 0.005 seconds -
#>   #     package      version by                 @ title                                  
#>   1 100 crayon       1.4.2   Gábor Csárdi      4M Colored Terminal Output                
#>   2  62 colorspace   2.0.2   Achim Zeileis     8M A Toolbox for Manipulating and Asses...
#>   3  59 viridis      0.6.2   Simon Garnier     4M Colorblind-Friendly Color Maps for R   
#>   4  44 pillar       1.7.0   Kirill Müller    12d Coloured Formatting for Columns        
#>   5  39 viridisLite  0.4.0   Simon Garnier    10M Colorblind-Friendly Color Maps (Lite...
#>   6  38 colourpicker 1.1.1   Dean Attali       4M A Colour Picker Tool for Shiny and f...
#>   7  27 shape        1.4.6   Karline Soetaert  9M Functions for Plotting Graphical Sha...
#>   8  25 RColorBrewer 1.1.2   Erich Neuwirth    7y ColorBrewer Palettes                   
#>   9  24 colorRamps   2.3     Tim Keitt         9y Builds color tables                    
#>  10  20 ggnewscale   0.4.5   Elio Campitelli   1y Multiple Fill and Colour Scales in '...
ps("color")
#> - "color" ----------------------------------------------- 269 packages in 0.009 seconds -
#>   #     package      version by                 @ title                                  
#>   1 100 crayon       1.4.2   Gábor Csárdi      4M Colored Terminal Output                
#>   2  62 colorspace   2.0.2   Achim Zeileis     8M A Toolbox for Manipulating and Asses...
#>   3  59 viridis      0.6.2   Simon Garnier     4M Colorblind-Friendly Color Maps for R   
#>   4  44 pillar       1.7.0   Kirill Müller    12d Coloured Formatting for Columns        
#>   5  39 viridisLite  0.4.0   Simon Garnier    10M Colorblind-Friendly Color Maps (Lite...
#>   6  38 colourpicker 1.1.1   Dean Attali       4M A Colour Picker Tool for Shiny and f...
#>   7  27 shape        1.4.6   Karline Soetaert  9M Functions for Plotting Graphical Sha...
#>   8  25 RColorBrewer 1.1.2   Erich Neuwirth    7y ColorBrewer Palettes                   
#>   9  24 colorRamps   2.3     Tim Keitt         9y Builds color tables                    
#>  10  20 ggnewscale   0.4.5   Elio Campitelli   1y Multiple Fill and Colour Scales in '...
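A common way to implement this is to normalise both the indexed text and the query through a spelling dictionary before matching. The tiny map below is a stand-in for a full British/American dictionary and is not pkgsearch's actual data:

```python
# Minimal illustrative spelling map; a real engine ships a complete
# British->American (or canonical-form) dictionary.
SPELLING_MAP = {"colour": "color", "colours": "colors", "centre": "center"}

def normalise(text: str) -> str:
    # Lowercase and map each word to its canonical spelling.
    return " ".join(SPELLING_MAP.get(w, w) for w in text.lower().split())

# Both spellings collapse to the same search key.
print(normalise("Colour") == normalise("color"))  # True
```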

Ascii Folding

Especially when searching for package maintainer names, it is convenient to use the corresponding ASCII letters for non-ASCII characters in search phrases. E.g. the following two queries yield the same results. Note that case is also ignored.

ps("gabor", size = 5)
#> - "gabor" ----------------------------------------------- 101 packages in 0.005 seconds -
#>   #     package  version by              @ title                                         
#>  1  100 crayon   1.4.2   Gábor Csárdi   4M Colored Terminal Output                       
#>  2   84 cli      3.1.1   Gábor Csárdi  24d Helpers for Developing Command Line Interfaces
#>  3   77 progress 1.2.2   Gábor Csárdi   3y Terminal Progress Bars                        
#>  4   59 zoo      1.8.9   Achim Zeileis  1y S3 Infrastructure for Regular and Irregular...
#>  5   59 fs       1.5.2   Gábor Csárdi   2M Cross-Platform File System Operations Based...
ps("Gábor", size = 5)
#> - "Gábor" ----------------------------------------------- 101 packages in 0.006 seconds -
#>   #     package  version by              @ title                                         
#>  1  100 crayon   1.4.2   Gábor Csárdi   4M Colored Terminal Output                       
#>  2   84 cli      3.1.1   Gábor Csárdi  24d Helpers for Developing Command Line Interfaces
#>  3   77 progress 1.2.2   Gábor Csárdi   3y Terminal Progress Bars                        
#>  4   59 zoo      1.8.9   Achim Zeileis  1y S3 Infrastructure for Regular and Irregular...
#>  5   59 fs       1.5.2   Gábor Csárdi   2M Cross-Platform File System Operations Based...
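ASCII folding of this kind is typically implemented with Unicode decomposition: an accented character decomposes into a base letter plus combining marks, and the marks are dropped. Combined with lowercasing, "Gábor" and "gabor" become identical search keys. A sketch of the idea:

```python
import unicodedata

def fold(text: str) -> str:
    # NFKD splits accented characters into base letter + combining marks;
    # dropping the combining marks leaves the plain ASCII letter.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

print(fold("Gábor"))  # gabor
print(fold("Csárdi") == fold("csardi"))  # True
```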

More info

See the complete documentation.

Download Details:

Author: r-hub
Source Code: https://github.com/r-hub/pkgsearch 
License: View license

#r #search #engine 

Monty Boehm

CellFishing.jl: Fast and Scalable Cell Search tool

CellFishing.jl 🎣

CellFishing.jl (cell finder via hashing) is a tool for finding cells similar to given query cells, based on their transcriptome expression profiles.

Kenta Sato, Koki Tsuyuzaki, Kentaro Shimizu, and Itoshi Nikaido. "CellFishing.jl: an ultrafast and scalable cell search method for single-cell RNA sequencing." Genome Biology, 2019 20:31. https://doi.org/10.1186/s13059-019-1639-x

# Import packages.
using CellFishing
using TableReader

# Load expression profiles of database cells.
# Note: We highly recommend using the Loom format (http://loompy.org/) to
# load expression data, because loading a large matrix from plain text takes
# an extremely long time.
data = readtsv("database.txt")  # use readcsv if your file is comma-separated
cellnames = string.(names(data))
featurenames = string.(data[:,1])
counts = Matrix{Int}(data[:,2:end])

# Select features and create an index (or a database).
features = CellFishing.selectfeatures(counts, featurenames)
database = CellFishing.CellIndex(counts, features, metadata=cellnames)

# Save/load the database to/from a file (optional).
# CellFishing.save("database.cf", database)
# database = CellFishing.load("database.cf")

# Load expression profiles of query cells.
data = readtsv("query.txt")
cellnames = string.(names(data))
featurenames = string.(data[:,1])
counts = Matrix{Int}(data[:,2:end])

# Search the database for similar cells; k cells will be returned per query.
k = 10
neighbors = CellFishing.findneighbors(k, counts, featurenames, database)

# Write the neighboring cells to a file.
open("neighbors.tsv", "w") do file
    println(file, join(["cell"; string.("n", 1:k)], '\t'))
    for j in 1:length(cellnames)
        print(file, cellnames[j])
        for i in 1:k
            print(file, '\t', database.metadata[neighbors.indexes[i,j]])
        end
        println(file)
    end
end
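CellFishing's core idea is locality-sensitive hashing: each expression profile is reduced to a compact bit signature, and candidate neighbors are found by comparing signatures with Hamming distance instead of comparing full profiles. The sketch below illustrates the principle with a few fixed hyperplanes and tiny made-up profiles; the package's actual hashing scheme over thousands of genes is considerably more refined:

```python
# One bit per hyperplane: which side of the plane the profile falls on.
# Similar profiles land on the same side of most planes, so their
# signatures are close in Hamming distance.
PLANES = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, -1, 0)]

def signature(profile):
    return tuple(
        int(sum(p * x for p, x in zip(plane, profile)) > 0) for plane in PLANES
    )

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

cell = (2, 1, 3)
similar = (2.5, 0.5, 3)   # near-identical profile
different = (-1, 4, -2)   # very different profile

sig = signature(cell)
print(hamming(sig, signature(similar)))    # 0
print(hamming(sig, signature(different)))  # 3
```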

Installation

First of all, you need to install a Julia compiler. A recommended way is to download a pre-built binary of Julia. The pre-built binaries for several major platforms are distributed at https://julialang.org/downloads/. Currently, CellFishing.jl supports Julia 1.0 or later.

Then, install CellFishing.jl with the following command:

$ julia -e 'using Pkg; Pkg.add(PackageSpec(url="git://github.com/bicycle1885/CellFishing.jl.git"))'

Alternatively, you can use the add command in the package management mode of Julia:

(v1.0) pkg> add git@github.com:bicycle1885/CellFishing.jl.git

To check the installation, you can try using CellFishing in your REPL:

$ julia
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.0.0 (2018-08-08)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using CellFishing  # load the package
[ Info: Precompiling CellFishing [5ab3512e-c64d-48f6-b1c0-509c1121fdda]

julia>

If you see no error messages, you have successfully installed CellFishing.jl.

To run unit tests, execute the following command:

$ julia -e 'using Pkg; Pkg.test("CellFishing")'

Command-line interface (WIP)

The bin/cellfishing script is a command-line interface to CellFishing.jl.

$ ./bin/cellfishing build Plass2018.dge.loom
Build a search database from Plass2018.dge.loom.
  Loading data ―――――――――――― 13 seconds, 173 milliseconds
  Selecting features ―――――― 1 second, 376 milliseconds
  Creating a database ――――― 16 seconds, 418 milliseconds
  Writing the database ―――― 659 milliseconds
The serialized database is in Plass2018.dge.loom.cf.
$ ./bin/cellfishing search Plass2018.dge.loom.cf Plass2018.dge.loom >neighbors.tsv
Search Plass2018.dge.loom.cf for 10 neighbors.
  Loading the database ―――― 512 milliseconds
  Loading query data ―――――― 12 seconds, 960 milliseconds
  Searching the database ―― 31 seconds, 821 milliseconds
  Writing neighbors ――――――― 64 milliseconds
$ head -5 neighbors.tsv | cut -f1-3
plan1_GACTTTCTCTTC      plan1_GACTTTCTCTTC      h2b_TTTTGCTACGGG
plan1_GTAAGGCGACAN      plan1_GTAAGGCGACAN      gfp_ATTCCTAGCGAT
plan1_TGGCCCAGCTGC      plan1_TGGCCCAGCTGC      plan1_GACTTTCTCTTC
plan1_CTCCTGTAATTT      plan1_CTCCTGTAATTT      plan1_ATCCTCCATTAA
plan1_ATGACGCATAAT      plan1_ATGACGCATAAT      plan1_TACTTGACGGTA

Download Details:

Author: Bicycle1885
Source Code: https://github.com/bicycle1885/CellFishing.jl 
License: MIT license

#julia #search #tool 

Reid Rohan

Lyra: Fast, in-memory, Typo-tolerant, Full-text Search Engine Written

lyra

Installation

You can install Lyra using npm, yarn, pnpm:

npm i @nearform/lyra
yarn add @nearform/lyra
pnpm add @nearform/lyra

Usage

Lyra is quite simple to use. The first thing to do is to create a new database instance and set an indexing schema:

import { create, insert, search, remove } from '@nearform/lyra';

const db = create({
  schema: {
    author: 'string',
    quote: 'string'
  }
});

Lyra will only index string properties, but will allow you to set and store additional data if needed.

Once the db instance is created, you can start adding some documents:

insert(db, {
  quote: 'It is during our darkest moments that we must focus to see the light.',
  author: 'Aristotle'
});

insert(db, {
  quote: 'If you really look closely, most overnight successes took a long time.',
  author: 'Steve Jobs'
});

insert(db, {
  quote: 'If you are not willing to risk the usual, you will have to settle for the ordinary.',
  author: 'Jim Rohn'
});

insert(db, {
  quote: 'You miss 100% of the shots you don\'t take',
  author: 'Wayne Gretzky - Michael Scott'
});

After the data has been inserted, you can finally start to query the database.

const searchResult = search(db, {
  term: 'if',
  properties: '*'
});

In the example above, you search for all documents containing the word if, looking in every schema property (i.e. every index):

{
  elapsed: 99, // elapsed time is in microseconds
  hits: [
    {
      id: 'ckAOPGTA5qLXx0MgNr1Zy',
      quote: 'If you really look closely, most overnight successes took a long time.',
      author: 'Steve Jobs'
    },
    {
      id: 'fyl-_1veP78IO-wszP86Z',
      quote: 'If you are not willing to risk the usual, you will have to settle for the ordinary.',
      author: 'Jim Rohn'
    }
  ],
  count: 2
}
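Conceptually, a full-text engine like Lyra maintains an inverted index: every token in every indexed property maps to the set of documents containing it, so a term lookup is a dictionary access rather than a scan of all documents. The minimal illustration below is not Lyra's actual internals (which use more sophisticated structures), just the underlying idea:

```python
# Three documents mirroring the quotes above, keyed by a doc id.
docs = {
    1: {"author": "Steve Jobs", "quote": "If you really look closely..."},
    2: {"author": "Jim Rohn", "quote": "If you are not willing to risk..."},
    3: {"author": "Wayne Gretzky - Michael Scott", "quote": "You miss the shots"},
}

# Build the inverted index: token -> set of doc ids containing it,
# across every property (author and quote alike).
index = {}
for doc_id, fields in docs.items():
    for value in fields.values():
        for token in value.lower().split():
            index.setdefault(token.strip(".,'"), set()).add(doc_id)

print(sorted(index.get("if", set())))       # [1, 2]
print(sorted(index.get("michael", set())))  # [3]
```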

You can also restrict the lookup to a specific property:

const searchResult = search(db, {
  term: 'Michael',
  properties: ['author']
});

Result:

{
  elapsed: 111,
  hits: [
    {
      id: 'L1tpqQxc0c2djrSN2a6TJ',
      quote: "You miss 100% of the shots you don't take",
      author: 'Wayne Gretzky - Michael Scott'
    }
  ],
  count: 1
}

If needed, you can also delete a given document by using the remove method:

remove(db, 'L1tpqQxc0c2djrSN2a6TJ');

Try the live demo

Download Details:

Author: Nearform
Source Code: https://github.com/nearform/lyra 
License: View license

#typescript #search #datastructures 

Elian Harber

Terminotes: Self Hosted Notes Storage and Search Server

terminotes ✍️

Take notes directly from your terminal, without ever having to leave it

What is terminotes?

Terminotes is an HTTP server that can be deployed to let you store and search notes in a centralised location [accessible from home or the office] directly from your terminal. It is similar to eureka [the Rust notes project], except that eureka saves notes locally on the machine, while Terminotes saves them on the server it is deployed to.

It has a backend server that can be deployed as a single Go binary and invoked over HTTP.

A command-line tool is a work in progress.

What does terminotes use internally?

The HTTP server is built using Echo, while storage is handled by BadgerDB [an embedded key-value store by Dgraph]. The search functionality is provided by Bleve [the search library used in Couchbase].

Who should use terminotes?

You want to take notes directly from your terminal and have them be accessible from anywhere

You want your notes/data to be stored on your own server

API

Unauthenticated routes

GET /api/v1/keys => Generate an API key - notes stored for this key will only be accessible by this key

Authenticated routes

For the following routes, the API key must be sent in the X-Terminote-Auth header

API base prefix is /api/v1/notes

GET / => Get all notes

POST / => Create a new note

GET /:id => Get note by ID

PUT /:id => Update note by ID

DELETE /:id => Delete note by ID

GET /search => Search notes by query param text eg /api/v1/notes/search?text=sushi%20party
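As the example shows, the query text must be URL-encoded ("sushi party" becomes "sushi%20party"). A sketch of building the search request from a client; the base URL and API key below are placeholders, not values from the project:

```python
from urllib.parse import quote

def search_url(base: str, text: str) -> str:
    # quote() percent-encodes the query text, e.g. spaces become %20.
    return f"{base}/api/v1/notes/search?text={quote(text)}"

url = search_url("http://localhost:8080", "sushi party")
print(url)  # http://localhost:8080/api/v1/notes/search?text=sushi%20party

# An actual request would also send the API key header:
#   headers = {"X-Terminote-Auth": "<your-api-key>"}
```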

API Model

Note:

{
    "id": "f460637a-d2e4-4124-930d-97b657410a67",
    "text": "go get groceries",
    "create_time": "2020-09-20T16:27:48.173581-04:00"
}

Response:

{
  "result_code": 200,
  "data": {
    "id": "f460637a-d2e4-4124-930d-97b657410a67",
    "text": "go get groceries",
    "create_time": "2020-09-20T16:27:48.173581-04:00"
  },
  "error_message": ""
}

{
  "result_code": 404,
  "data": null,
  "error_message": "no note found"
}

Build

go build main.go terminote.go

Run

go run main.go terminote.go

or

go build main.go terminote.go
./main

Original article source at: https://gitlab.com/asad-awadia/terminotes 

#go #golang #search #server 

Elian Harber

Riot: Go Open Source, Distributed, Simple and Efficient Search Engine

Riot search

Go Open Source, Distributed, Simple and efficient full text search engine.

简体中文

Features

Efficient indexing and search (1M blog posts / 500MB of data indexed in 28 seconds; 1.65 ms search response time; 19K search QPS)

Supports logical search

Supports Chinese word segmentation (uses the gse word segmentation package; concurrent segmentation at 27 MB/s)

Supports calculating how close keywords appear to one another in the text (token proximity)

Supports BM25 relevance scoring

Supports custom scoring fields and scoring rules

Supports adding to and deleting from the index online

Supports heartbeat

Supports multiple persistent storage backends

Supports distributed indexing and search

Look at Word segmentation rules
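For reference, BM25 (the relevance measure Riot supports) scores a term by combining inverse document frequency with a saturating term-frequency factor normalised by document length. A minimal scorer using the standard formula and the usual k1/b parameters, independent of Riot's own implementation:

```python
import math

def bm25(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    # Smoothed inverse document frequency: rarer terms score higher.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Term frequency saturates; longer-than-average docs are penalised.
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

# A rarer term (lower df) scores higher than a common one, all else equal.
rare = bm25(tf=2, df=5, n_docs=1000, doc_len=100, avg_len=120)
common = bm25(tf=2, df=500, n_docs=1000, doc_len=100, avg_len=120)
print(rare > common)  # True
```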

Requirements

Go version >= 1.8

Dependencies

Riot uses go module or dep to manage dependencies.

Installation/Update

go get -u github.com/go-ego/riot

Build-tools

go get -u github.com/go-ego/re 

re riot

To create a new riot application

$ re riot my-riotapp

re run

To run the application we just created, you can navigate to the application folder and execute:

$ cd my-riotapp && re run

Usage:

Look at an example

package main

import (
    "log"

    "github.com/go-ego/riot"
    "github.com/go-ego/riot/types"
)

var (
    // searcher is coroutine safe
    searcher = riot.Engine{}
)

func main() {
    // Init
    searcher.Init(types.EngineOpts{
        // Using:             4,
        NotUseGse: true,
        })
    defer searcher.Close()

    text := "Google Is Experimenting With Virtual Reality Advertising"
    text1 := `Google accidentally pushed Bluetooth update for Home
    speaker early`
    text2 := `Google is testing another Search results layout with 
    rounded cards, new colors, and the 4 mysterious colored dots again`
    
    // Add the document to the index, docId starts at 1
    searcher.Index("1", types.DocData{Content: text})
    searcher.Index("2", types.DocData{Content: text1}, false)
    searcher.IndexDoc("3", types.DocData{Content: text2}, true)

    // Wait for the index to refresh
    searcher.Flush()
    // engine.FlushIndex()

    // The search output format is found in the types.SearchResp structure
    log.Print(searcher.Search(types.SearchReq{Text:"google testing"}))
}

It is very simple!

Use default engine:

package main

import (
    "log"

    "github.com/go-ego/riot"
    "github.com/go-ego/riot/types"
)

var (
    searcher = riot.New("zh")
)

func main() {
    data := types.DocData{Content: `I wonder how, I wonder why
        , I wonder where they are`}
    data1 := types.DocData{Content: "所以, 你好, 再见"}
    data2 := types.DocData{Content: "没有理由"}

    searcher.Index("1", data)
    searcher.Index("2", data1)
    searcher.Index("3", data2)
    searcher.Flush()

    req := types.SearchReq{Text: "你好"}
    search := searcher.Search(req)
    log.Println("search...", search)
}

Look at more Examples

Look at Store example

Look at Logic search example

Look at Pinyin search example

Look at different dict and language search example

Look at benchmark example

Riot search engine templates, client and dictionaries

Author: Go-ego
Source Code: https://github.com/go-ego/riot 
License: Apache-2.0 license

#go #golang #search 

Rocio O'Keefe

Search_bar_bloc: A Flutter Package for Creating A SearchBar

SearchBar with Bloc

This package provides a SearchBar with Bloc state management.

The purpose of this package is to give an easy way to implement a search bar in a stateless widget.

Features

The composition of the search bar is customizable and more parameters will be handled in the future.

Getting started

If you're not familiar with bloc state management and the flutter_bloc package, here are the resources you need:

Usage

Here is a quick example of how to implement the search bar with a BlocProvider.

  @override
  Widget build(BuildContext context) {
    return BlocProvider(
      create: (_) => SearchBarCubit(),
      child: Column(
        children: [
          const SearchBar(hintText: "Search something..."),
          BlocBuilder<SearchBarCubit, SearchBarState>(
              buildWhen: (previous, current) => previous.content != current.content,
              builder: (context, state) {
                return Text("You are searching : ${state.content}");
              }),
        ],
      ),
    );
  }

Installing

Use this package as a library

Depend on it

Run this command:

With Flutter:

 $ flutter pub add search_bar_bloc

This will add a line like this to your package's pubspec.yaml (and run an implicit flutter pub get):

dependencies:
  search_bar_bloc: ^1.0.1

Alternatively, your editor might support flutter pub get. Check the docs for your editor to learn more.

Import it

Now in your Dart code, you can use:

import 'package:search_bar_bloc/search_bar_bloc.dart';

example/README.md

Bloc Provider Example

Here is a quick example of how to implement the search bar with a BlocProvider.

  @override
  Widget build(BuildContext context) {
    return BlocProvider(
      create: (_) => SearchBarCubit(),
      child: Column(
        children: [
          const SearchBar(hintText: "Search something..."),
          BlocBuilder<SearchBarCubit, SearchBarState>(
              buildWhen: (previous, current) => previous.content != current.content,
              builder: (context, state) {
                return Text("You are searching : ${state.content}");
              }),
        ],
      ),
    );
  }

Bloc Listener Example

Here is a quick example of how to implement the search bar with a BlocListener.

class BlockBuilderExampleWidget extends StatelessWidget {
  const BlockBuilderExampleWidget({Key? key}) : super(key: key);

  @override
  Widget build(BuildContext context) {
    return BlocListener<SearchBarCubit, SearchBarState>(
      listenWhen: (previous, current) => previous.content != current.content,
      listener: (context, state) {
        print("You are searching : ${state.content}");
      },
      child: Container(),
    );
  }

}

Author: B0berman
Source Code: https://github.com/B0berman/search_bar_bloc 
License: BSD-3-Clause license

#flutter #dart #bloc #search 

Royce Reinger

Ransack: Object-based Searching

Introduction

Ransack will help you easily add searching to your Rails application, without any additional dependencies.

There are advanced searching solutions around, like ElasticSearch or Algolia. Ransack will do the job for many Rails websites, without the need to run additional infrastructure or work in a different language. With Ransack you do it all with standard Ruby and ERB.

Ready to move beyond the basics? Use advanced features like i18n and extensive configuration options.

Ransack is supported for Rails 7.0, 6.x on Ruby 2.6.6 and later.

Installation

To install Ransack, add this line to your Gemfile:

gem 'ransack'

Bleeding edge

If you would like to use the latest updates not yet published to RubyGems, use the main branch:

gem 'ransack', :github => 'activerecord-hackery/ransack', :branch => 'main'

Documentation

There is extensive documentation on Ransack; it is a Docusaurus project served as a GitHub Pages site.

Issues tracker

  • Before filing an issue, please read the Contributing Guide.
  • File an issue if a bug is caused by Ransack, is new (has not already been reported), and can be reproduced from the information you provide.
  • Please consider adding a branch with a failing spec describing the problem.
  • Contributions are welcome. :smiley:
  • Please do not use the issue tracker for personal support requests. Stack Overflow or GitHub Discussions is a better place for that where a wider community can help you!

Contributions

To support the project:

  • Consider supporting us via Open Collective
  • Use Ransack in your apps, and let us know if you encounter anything that's broken or missing. A failing spec to demonstrate the issue is awesome. A pull request with passing tests is even better!
  • Before filing an issue or pull request, be sure to read and follow the Contributing Guide.
  • Please use Stack Overflow or GitHub Discussions for questions or discussion not directly related to bug reports, pull requests, or documentation improvements.
  • Spread the word on social media if Ransack's been useful to you. The more people who are using the project, the quicker we can find and fix bugs!

Contributors

Ransack was created by Ernie Miller and is developed and maintained by:

Alumni Maintainers

Author: Activerecord-hackery
Source Code: https://github.com/activerecord-hackery/ransack 
License: MIT license

#ruby #rails #search 

Duyen Hoang

How to Load XML File Data in jQuery UI Autocomplete using AJAX and PHP

jQuery UI autocomplete allows a textbox to display a suggestion list based on the input.

It allows loading the suggestion list via AJAX.

The AJAX response must be in the defined format, otherwise the list will not display.

You can load XML data if it is available.

In this tutorial, I show how you can load XML file data into jQuery UI autocomplete using AJAX and PHP.

1. XML File

Create a new xmlfile.xml file and store it in the project root directory.

<?xml version="1.0" encoding="UTF-8" ?>
<employees>
  <employee>
    <id>1</id>
    <name>Yogesh Singh</name>
    <salary>40000</salary>
    <gender>Male</gender>
  </employee>
  <employee>
    <id>2</id>   
    <name>Sonarika Bhadoria</name>
    <salary>42000</salary>
    <gender>Female</gender>
  </employee>
  <employee>
    <id>3</id>
    <name>Anil Singh</name>
    <salary>36000</salary>
    <gender>Male</gender>
  </employee>
  <employee>
    <id>4</id>
    <name>Mayank Patidar</name>
    <salary>39000</salary>
    <gender>Male</gender>
  </employee>
  <employee>
    <id>5</id>
    <name>Priya</name>
    <salary>23000</salary>
    <gender>Female</gender>
  </employee>
  <employee>
    <id>6</id>
    <name>Ravi Sharma</name>
    <salary>43000</salary>
    <gender>Male</gender>
  </employee>
  <employee>
    <id>7</id>
    <name>Akilesh Sahu</name>
    <salary>39000</salary>
    <gender>Male</gender>
  </employee>
  <employee>
    <id>8</id>
    <name>Rohan pathak</name>
    <salary>42000</salary>
    <gender>Male</gender>
  </employee>
  <employee>
    <id>9</id>
    <name>Madhu sharma</name>
    <salary>30000</salary>
    <gender>Female</gender>
  </employee>
  <employee>
    <id>10</id>
    <name>Sunil singh</name>
    <salary>38000</salary>
    <gender>Male</gender>
  </employee>
</employees>

2. Download and Include

  • Download the jQuery UI and jQuery libraries.
  • Include jquery-ui.css, the jQuery library, and the jquery-ui.min.js script.
<!-- jQuery UI CSS -->
<link rel="stylesheet" href="https://ajax.googleapis.com/ajax/libs/jqueryui/1.12.1/themes/smoothness/jquery-ui.css">

<!-- jQuery --> 
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>

<!-- jQuery UI JS -->
<script src="https://ajax.googleapis.com/ajax/libs/jqueryui/1.12.1/jquery-ui.min.js"></script>

3. HTML

Create two text-type elements.

The first is used to initialize jQuery UI autocomplete, and the second displays the id of the item selected from the suggestion list.

Completed code

<table>
   <tr>
     <td>Search employee</td>
     <td><input type='text' id='search_employee' ></td>
   </tr>

   <tr>
     <td>Selected Employee id</td>
     <td><input type='text' id='selectemployee_id' /></td>
   </tr>

</table>

4. AJAX

Tạo ajaxfile.phptệp để xử lý yêu cầu AJAX.

Gán $_POST['search']bằng $searchTextchữ thường nếu 'search'là POST. Tôi đã gán nó bằng chữ thường để thực hiện tìm kiếm không phân biệt chữ hoa chữ thường vì tôi đang sử dụng truy vấn XPath để chọn dữ liệu XML theo mặc định thực hiện tìm kiếm phân biệt chữ hoa chữ thường.

Tải tệp XML

Tạo Đối tượng của DOMDocument()Lớp và chuyển đường dẫn tệp XML trong load()phương thức.

Chọn dữ liệu tệp XML

Sử dụng DOMXPathđối tượng để chọn dữ liệu tệp XML.

Tạo một đối tượng trong DOMXPathđó vượt qua DOMDocumentđối tượng.

Để sử dụng strtolower()hàm trong truy vấn XPath trước tiên cần đăng ký hàm đó với $xpathđối tượng sử dụng registerNamespace()registerPhpFunctions()các hàm.

Chuẩn bị một truy vấn để chọn employeecác nút $searchTextđược tìm thấy trong namenút.

//employee[contains(php:functionString('strtolower',name), '$searchText')]/*

LƯU Ý - Sửa đổi truy vấn trên theo cấu trúc tệp XML của bạn. Bạn có thể tìm hiểu thêm về cách tạo truy vấn XPath từ đây .

Use contains() to search on the name node. Convert the name node to lowercase using the strtolower() function. To select all child nodes, use /*.

Create $count = 0; and a $response array. $count is used to index the $response array, and $response stores the response to return.

Loop over $results. Initialize the $response array with the node value if $val->tagName equals 'id' or 'name'.

If $val->tagName == 'id', assign $val->nodeValue to $response[$count]['value']. Similarly, if $val->tagName == 'name', assign $val->nodeValue to $response[$count]['label'] and increment the $count variable.

Return the $response array in JSON format.

Completed Code

<?php

// Search Text
$searchText = "";
if(isset($_POST['search'])){
   $searchText = strtolower($_POST['search']);
}

// Load XML File
$doc = new DOMDocument();
$doc->load('xmlfile.xml');
$xpath = new DOMXPath($doc);

// Register PHP function
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('strtolower'); // Allow only the strtolower() PHP function

// Execute the query and select all matching nodes
$results = $xpath->query("//employee[contains(php:functionString('strtolower',name), '$searchText')]/*");

$count = 0;
$response = array();
foreach($results as $val){

   if($val->tagName == 'id'){
      $response[$count]['value'] = $val->nodeValue;
   }
   if($val->tagName == 'name'){
      $response[$count]['label'] = $val->nodeValue;
      $count++;
   }

}

echo json_encode($response);
exit;
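For example, with an XML file containing two matching employee records, the script would return JSON in roughly this shape (the values are illustrative); the value and label keys of each object are what jQuery UI Autocomplete reads:

```json
[
   {"value": "1", "label": "John Smith"},
   {"value": "2", "label": "Jane Doe"}
]
```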

5. jQuery

Initialize jQuery UI Autocomplete on #search_employee.

Use the 'source' option to load the suggestion list by sending an AJAX POST request to the 'ajaxfile.php' file.

Set dataType to 'json' and pass the input value as data. In the success callback, pass the AJAX response data to the response() function.

Use the 'select' option to get the selected suggestion item.

Assign the selected item label (ui.item.label) to #search_employee. Similarly, assign the selected item value (ui.item.value) to #selectemployee_id.

Completed Code

$(document).ready(function(){

 $( "#search_employee" ).autocomplete({
   source: function( request, response ) {

     $.ajax({
       url: "ajaxfile.php",
       type: 'post',
       dataType: "json",
       data: {
          search: request.term
       },
       success: function( data ) {
          response( data );
       }
     });
   },
   select: function (event, ui) {
     $('#search_employee').val(ui.item.label); // display the selected text
     $('#selectemployee_id').val(ui.item.value); // save selected id to input
     return false;
   },
   focus: function(event, ui){
      $( "#search_employee" ).val( ui.item.label );
      $( "#selectemployee_id" ).val( ui.item.value );
      return false;
   },
  }); 
});
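If your server returns raw records rather than ready-made label/value pairs, the data can be mapped on the client before being passed to response(). A minimal sketch (the id and name field names are assumptions based on the XML structure used in this tutorial):

```javascript
// Convert raw {id, name} records into the {label, value} objects
// that jQuery UI Autocomplete expects in its suggestion list.
function toAutocompleteItems(records) {
  return records.map(function (rec) {
    return { label: rec.name, value: rec.id };
  });
}

var items = toAutocompleteItems([
  { id: "1", name: "John Smith" },
  { id: "2", name: "Jane Doe" }
]);
console.log(items[0].label); // "John Smith"
console.log(items[0].value); // "1"
```

Inside the success callback, you would then call response(toAutocompleteItems(data)) instead of response(data).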

6. Demo

View Demo


7. Conclusion

You do not need to modify the jQuery script to load XML file data. Handle the XML file from the PHP script: load the file and search over the XML data.

Return a valid JSON response to the jQuery script.

Source:  https://makitweb.com

#php #xml #jquery 

How to Load XML File Data in jQuery UI Autocomplete Using AJAX and PHP