Dexter  Goodwin

1657178700

inverted-index for Level with Pagination, Sift3/cosine Distance

inverted-index

features

  • pagination
  • facets
  • sift3/cosine distance
  • tf-idf ranking
  • stopword removal
  • stemming
  • diacritics replacement
  • number support

install

npm install [--save/--save-dev] inverted-index

api

var inverted = require('inverted-index')

inverted(db[, options[, getter]])

var level = require('level')('/path/to/my/db')
var sublevel = require('sublevel')

var index = inverted(sublevel(level, 'index'), {
  idf: true,
  stem: true,
  rank: true,
  rank_algorithm: 'cosine',
  facets: true
}, function(id, options, fn){
  level.get(id, options, fn)
})

db

Any level API-compatible instance is accepted.

options

The options shown in the example above are the default configuration.

idf

When idf is set to true, an idf value (as in tf-idf, term frequency–inverse document frequency) is calculated for each indexed token. When querying the index, the terms with the lowest idf are fetched first. Example:

"Julie loves me more than Linda loves me"
[
  {
    "word": "julie",
    "idf": 1.791759469228055
  },
  {
    "word": "linda",
    "idf": 1.791759469228055
  },
  {
    "word": "loves",
    "idf": 1.0986122886681098
  }
]

Notice that "me", "more" and "than" are not indexed, because those are considered stopwords.
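As a rough illustration of "lowest idf first", the token entries can be sorted by ascending idf. This is only a sketch: byAscendingIdf is a hypothetical helper that is not part of inverted-index, and the entries are copied from the example above.

```javascript
// Hypothetical helper (not part of inverted-index): returns a copy of the
// token entries ordered by ascending idf, the fetch order described above.
function byAscendingIdf(tokens) {
  return tokens.slice().sort(function (a, b) {
    return a.idf - b.idf
  })
}

var tokens = [
  { word: 'julie', idf: 1.791759469228055 },
  { word: 'linda', idf: 1.791759469228055 },
  { word: 'loves', idf: 1.0986122886681098 }
]

var ordered = byAscendingIdf(tokens).map(function (token) {
  return token.word
})

// "loves" has the lowest idf, so it comes first
console.log(ordered)
```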

stem

Whether the text should be stemmed or not. When true, the text is stemmed with the Porter stemming algorithm using NaturalNode/natural. Example:

"Fishing is a way of catching cats, he argued in his arguments"

is tokenized into:

["fishing", "is", "a", "way", "of", "catching", "cats", "he", "argued", "in", "his", "arguments"]

and stemmed into:

["fish", "is", "a", "wai", "of", "catch", "cat", "he", "argued", "in", "his", "argum"]

rank

With ranking enabled, query results are ranked with the configured algorithm. Ranking is done AFTER the fetch, so it only considers the fetched result set (which can be partial, depending on how many results match), and it compares the query with the original indexed text, not with the tokens.

So idf determines the order in which tokens are fetched, and ranking then compares the query text with the original text of each token's corresponding document. The "problem" with ranking is that if 100000 tokens match the query tokens, only 100 (configurable in the query options) are fetched for each page, and only THEN is the ranking done. Example:

{
  "1": "Fishing is a way of catching cats, Julie argued in her arguments",
  "2": "Julie loves me more than Linda loves me"
}

querying Julie loves would fetch:

[
  {
    "word": "loves",
    "idf": 1.0986122886681098,
    "id": "2"
  },
  {
    "word": "julie",
    "idf": 1.791759469228055,
    "id": "2"
  },
  {
    "word": "julie",
    "idf": 2.4849066497880004,
    "id": "1"
  }
]

and then rank them:

["2", "1"]
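The ranking step can be sketched as follows. This is an illustrative term-frequency cosine similarity, not the ramitos/cosine implementation; termFrequencies, cosine and rank are hypothetical names, and the tokenization (lowercase, split on non-alphanumerics, no stopword removal or stemming) is an assumption made to keep the sketch short.

```javascript
// Counts how often each lowercase word occurs in a text.
function termFrequencies(text) {
  var counts = Object.create(null)
  text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean).forEach(function (word) {
    counts[word] = (counts[word] || 0) + 1
  })
  return counts
}

// Cosine similarity between two term-frequency maps.
function cosine(a, b) {
  var dot = 0
  var normA = 0
  var normB = 0
  Object.keys(a).forEach(function (word) {
    normA += a[word] * a[word]
    if (b[word]) dot += a[word] * b[word]
  })
  Object.keys(b).forEach(function (word) {
    normB += b[word] * b[word]
  })
  if (!normA || !normB) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Ranks only the ids fetched for this page, highest similarity first.
function rank(query, fetchedIds, texts) {
  var q = termFrequencies(query)
  return fetchedIds.slice().sort(function (x, y) {
    return cosine(q, termFrequencies(texts[y])) - cosine(q, termFrequencies(texts[x]))
  })
}

var texts = {
  '1': 'Fishing is a way of catching cats, Julie argued in her arguments',
  '2': 'Julie loves me more than Linda loves me'
}

var ranked = rank('Julie loves', ['1', '2'], texts)
console.log(ranked)
```

Because only the fetched ids are passed to rank, a document that was not part of the fetched page can never appear in the ranked output, which is the limitation described above.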

rank_algorithm

Only takes effect when rank is set to true. Valid options are cosine and sift3, which use ramitos/cosine and ramitos/sift3 respectively.

No benchmarks have been run on this yet, but sift3 should be faster. Data on that should follow soon.

facets

Enabling facets is useful to query based on types of models. Example:

{
  "1": {
    "text": "Hank Green",
    "facets": ["user"]
  },
  "2": {
    "text": "John Green",
    "facets": ["user"]
  },
  "3": {
    "text": "Johnnie Walker",
    "facets": ["user"]
  },
  "b": {
    "text": "Johnnie Walker",
    "facets": ["brand"]
  }
}

You can then query "Johnnie" with facets ["brand"] and only get:

["b"]

Notice how the result doesn't include the user 3, because it doesn't have the brand facet.
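The facet filter can be pictured with a small in-memory sketch. This is not how inverted-index stores or queries data; queryWithFacets is a hypothetical function, and treating a document as a match when it carries at least one of the requested facets is an assumption.

```javascript
var docs = {
  '1': { text: 'Hank Green', facets: ['user'] },
  '2': { text: 'John Green', facets: ['user'] },
  '3': { text: 'Johnnie Walker', facets: ['user'] },
  'b': { text: 'Johnnie Walker', facets: ['brand'] }
}

// Hypothetical in-memory query: keeps ids whose text contains the word
// and whose facets include at least one of the requested facets.
function queryWithFacets(docs, word, facets) {
  var wanted = word.toLowerCase()
  return Object.keys(docs).filter(function (id) {
    var doc = docs[id]
    var matchesText = doc.text.toLowerCase().split(/\s+/).indexOf(wanted) !== -1
    var matchesFacet = facets.some(function (facet) {
      return doc.facets.indexOf(facet) !== -1
    })
    return matchesText && matchesFacet
  })
}

var brands = queryWithFacets(docs, 'Johnnie', ['brand'])
var users = queryWithFacets(docs, 'Johnnie', ['user'])
console.log(brands, users)
```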

You can also combine facets with ids to provide property-based queries:

{
  "3": {
    "text": "Johnnie Walker johnnie@walker.com",
    "facets": ["user"]
  },
  "3-name": {
    "text": "Johnnie Walker",
    "facets": ["user-name"]
  },
  "3-email": {
    "text": "johnnie@walker.com",
    "facets": ["user-email"]
  }
}

And then query the facets ["user-name"] with the text "johnnie" and get:

["3-name"]

And with that you can just split the results to get the ids.
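A minimal sketch of that split, assuming the composite ids follow the '<id>-<property>' pattern from the example and that the property part contains no hyphen. baseId is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: strips the '-<property>' suffix from a composite id.
// Assumes the property name itself contains no '-'.
function baseId(compositeId) {
  var cut = compositeId.lastIndexOf('-')
  return cut === -1 ? compositeId : compositeId.slice(0, cut)
}

var results = ['3-name']
var ids = results.map(baseId)
console.log(ids)
```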

getter

For ranking results, we need to store the original text. When indexing large amounts of data this can have an impact on disk usage. To prevent that, a function can be passed that receives id, options, and callback as the arguments to fetch the original indexed text for that id.
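A getter with that signature might look like the sketch below. The Map is a hypothetical stand-in for wherever your original texts already live, and the callback is invoked synchronously only to keep the example short.

```javascript
// Hypothetical store of original texts, standing in for your own database.
var originalTexts = new Map([
  ['1', 'john green'],
  ['b', 'Fishing is a way of catching cats, he argued in his arguments']
])

// Getter with the (id, options, callback) signature described above.
// `options` is accepted but unused in this sketch.
function getter(id, options, callback) {
  var key = String(id)
  if (!originalTexts.has(key)) {
    return callback(new Error('no text stored for id ' + key))
  }
  callback(null, originalTexts.get(key))
}

getter(1, {}, function (err, text) {
  console.log(err, text)
})
```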

index(text, id[, facets], callback)

put(text, id[, facets], callback)

link(text, id[, facets], callback)

index.index('john green', 1, ['user'], function(err){
  assert(!err)
})
index.put('Fishing is a way of catching cats, he argued in his arguments', 'b', function(err){
  assert(!err)
})
index.link('Julie loves me more than Linda loves me', '1436ebc684b-c1039c76bdb2b054670f3a1256c98650', ['message'], function(err){
  assert(!err)
})

remove(id, callback)

del(id, callback)

unlink(id, callback)

index.remove(1, function(err){
  assert(!err)
})
index.del('b', function(err){
  assert(!err)
})
index.unlink('1436ebc684b-c1039c76bdb2b054670f3a1256c98650', function(err){
  assert(!err)
})

index.search(query[, facets[, options]], callback)

index.query(query[, facets[, options]], callback)

index.search('Fishing', function(err, result){
  assert(!err)
  assert(result.last)
  assert(result.results)
})
index.query('Green', ['user'], function(err, result){
  assert(!err)
  assert(result.last)
  assert(result.results)
})
index.search('Green', 'user', function(err, result){
  assert(!err)
  assert(result.last)
  assert(result.results)
})
index.search('Green', {
  limit: 100, 
  ttl: 1000 * 60 * 60
}, function(err, result){
  assert(!err)
  assert(result.last)
  assert(result.results)
})
index.search({
  last: '1436ec2e069-bf55e1ed64540b925e13d6bfd21a543c'
}, function(err, result){
  assert(!err)
  assert(result.last)
  assert(result.results)
})

pagination

Every query returns a last parameter. That can be passed to the query/search function to get the next results. When you pass last, you don't need to pass the search query again, because it is saved in the db.

Note that pagination expires in 1h, so if you do a query now, and 2 hours later you want to retrieve the next page, you'll get an error.

The ttl can, however, be tuned in the query options.
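Walking through the pages could be sketched like this. createStubIndex is a hypothetical stand-in that mimics only the documented call shape (search with a query or with { last }, calling back with { last, results }) so that the example runs on its own, and treating an empty results array as the end of the result set is an assumption, not documented behaviour.

```javascript
// Stand-in for an inverted-index instance; it only mimics the call shape.
function createStubIndex(allResults, pageSize) {
  var cursors = {}
  var counter = 0
  return {
    search: function (query, callback) {
      var offset = (query && query.last) ? cursors[query.last] : 0
      var page = allResults.slice(offset, offset + pageSize)
      var last = 'cursor-' + (++counter)
      cursors[last] = offset + page.length
      callback(null, { last: last, results: page })
    }
  }
}

// Collects every page by passing `last` back into search.
// Assumption: an empty `results` array means there are no more pages.
function fetchAll(index, query, callback) {
  var collected = []
  function next(arg) {
    index.search(arg, function (err, result) {
      if (err) return callback(err)
      if (!result.results.length) return callback(null, collected)
      collected = collected.concat(result.results)
      next({ last: result.last })
    })
  }
  next(query)
}

var allIds
fetchAll(createStubIndex(['a', 'b', 'c', 'd', 'e'], 2), 'Green', function (err, ids) {
  allIds = ids
  console.log(err, ids)
})
```

With the real index, each follow-up call has to happen before the pagination ttl expires, otherwise the call returns an error as described above.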

Author: Sergioramos
Source Code: https://github.com/sergioramos/inverted 
License: MIT license

#javascript #index 

Pagination Example In Laravel

In this post I will show you a pagination example in Laravel. Pagination is a very common feature on websites: when we want to display a specific number of records or images per page, we can use pagination.

Laravel provides the paginate method, which automatically takes care of setting the proper limit and offset based on the current page being viewed by the user. Here I will show you how to use the paginate method for pagination in Laravel.

https://websolutionstuff.com/post/pagination-example-in-laravel

#pagination example in laravel #laravel #pagination #paginate method #how to use pagination in laravel #pagination in laravel

Josefa  Corwin

1659736920

Mailboxer: A Rails Gem to Send Messages inside A Web Application

Mailboxer

This project is based on the need for a private message system for ging / social_stream. Instead of creating our core message system heavily dependent on our development, we are trying to implement a generic and potent messaging gem.

After looking for a good gem to use we noticed the lack of messaging gems and functionality in them. Mailboxer tries to fill this void delivering a powerful and flexible message system. It supports the use of conversations with two or more participants, sending notifications to recipients (intended to be used as system notifications “Your picture has new comments”, “John Doe has updated his document”, etc.), and emailing the messageable model (if configured to do so). It has a complete implementation of a Mailbox object for each messageable with inbox, sentbox and trash.

The gem is constantly growing and improving its functionality. As it is used with our parallel development ging / social_stream we are finding and fixing bugs continuously. If you want some functionality not supported yet or marked as TODO, you can create an issue to ask for it. It will be great feedback for us, and we will know what you may find useful in the gem.

Mailboxer was born from the great, but outdated, code from lpsergi / acts_as_messageable.

We are now working to make exhaustive documentation and some wiki pages in order to make it even easier to use the gem to its full potential. Please, give us some time if you find something missing or ask for it. You can also find us on the Gitter room for this repo. Join us there to talk.

Installation

Add to your Gemfile:

gem 'mailboxer'

Then run:

$ bundle install

Run install script:

$ rails g mailboxer:install

And don't forget to migrate your database:

$ rake db:migrate

You can also generate email views:

$ rails g mailboxer:views

Upgrading

If upgrading from 0.11.0 to 0.12.0, run the following generators:

$ rails generate mailboxer:namespacing_compatibility
$ rails generate mailboxer:install -s

Then, migrate your database:

$ rake db:migrate

Requirements & Settings

Emails

We are now adding support for sending emails when a Notification or a Message is sent to one or more recipients. You should modify the mailboxer initializer (config/initializers/mailboxer.rb) to edit these settings:

Mailboxer.setup do |config|
  #Enables or disables email sending for Notifications and Messages
  config.uses_emails = true
  #Configures the default `from` address for the email sent for Messages and Notifications of Mailboxer
  config.default_from = "no-reply@dit.upm.es"
  ...
end

You can change the way in which emails are delivered by specifying a custom implementation of notification and message mailers:

Mailboxer.setup do |config|
  config.notification_mailer = CustomNotificationMailer
  config.message_mailer = CustomMessageMailer
  ...
end

If you have subclassed the Mailboxer::Notification class, you can specify the mailers using a member method:

class NewDocumentNotification < Mailboxer::Notification
  def mailer_class
    NewDocumentNotificationMailer
  end
end

class NewCommentNotification < Mailboxer::Notification
  def mailer_class
    NewDocumentNotificationMailer
  end
end

Otherwise, the mailer class will be determined by appending 'Mailer' to the mailable class name.

User identities

Users must have an identity defined by a name and an email. We must ensure that Messageable models have some specific methods. These methods are:

#Returning any kind of identification you want for the model
def name
  return "You should add method :name in your Messageable model"
end
#Returning the email address of the model if an email should be sent for this object (Message or Notification).
#If no mail has to be sent, return nil.
def mailboxer_email(object)
  #Check if an email should be sent for that object
  #if true
  return "define_email@on_your.model"
  #if false
  #return nil
end

These names are explicit enough to avoid colliding with other methods, but if you need to change them you can do so in the mailboxer initializer (config/initializers/mailboxer.rb). Just add or uncomment the following lines:

Mailboxer.setup do |config|
  # ...
  #Configures the methods needed by mailboxer
  config.email_method = :mailboxer_email
  config.name_method = :name
  config.notify_method = :notify
  # ...
end

You may change whatever you want or need. For example:

config.email_method = :notification_email
config.name_method = :display_name
config.notify_method = :notify_mailboxer

This will use the method notification_email(object) instead of mailboxer_email(object), display_name instead of name, and notify_mailboxer instead of notify.

Using default or custom method names, if your model doesn't implement them, Mailboxer will use dummy methods so as to notify you of missing methods rather than crashing.

Preparing your models

In your model:

class User < ActiveRecord::Base
  acts_as_messageable
end

You are not limited to the User model. You can use Mailboxer in any other model and use it in several different models. If you have ducks and cylons in your application and you want to exchange messages as if they were the same, just add acts_as_messageable to each one and you will be able to send duck-duck, duck-cylon, cylon-duck and cylon-cylon messages. Of course, you can extend it for as many classes as you need.

Example:

class Duck < ActiveRecord::Base
  acts_as_messageable
end
class Cylon < ActiveRecord::Base
  acts_as_messageable
end

Mailboxer API

Warning for version 0.8.0

Version 0.8.0 sees Messageable#read and Messageable#unread renamed to mark_as_(un)read, and Receipt#read and Receipt#unread to is_(un)read. This may break existing applications, but read is a reserved name for Active Record, and the best practice in this case is simply to avoid using it.

How can I send a message?

#alfa wants to send a message to beta
alfa.send_message(beta, "Body", "subject")

How can I read the messages of a conversation?

As a messageable, what you receive are receipts, which are associated with the message itself. You should retrieve your receipts for the conversation and get the message associated with them.

This is done this way because receipts save the information about the relation between messageable and the messages: is it read?, is it trashed?, etc.

#alfa gets the last conversation (chronologically, the first in the inbox)
conversation = alfa.mailbox.inbox.first

#alfa gets its receipts chronologically ordered.
receipts = conversation.receipts_for alfa

#using the receipts (i.e. in the view)
receipts.each do |receipt|
  ...
  message = receipt.message
  unread = receipt.is_unread? #or message.is_unread?(alfa)
  ...
end

How can I reply to a message?

#alfa wants to reply to all in a conversation
#using a receipt
alfa.reply_to_all(receipt, "Reply body")

#using a conversation
alfa.reply_to_conversation(conversation, "Reply body")
#alfa wants to reply to the sender of a message (and ONLY the sender)
#using a receipt
alfa.reply_to_sender(receipt, "Reply body")

How can I delete a message from trash?

#delete conversations forever for one receipt (still in database)
receipt.mark_as_deleted

#you can mark conversation as deleted for one participant
conversation.mark_as_deleted participant

#Mark the object as deleted for messageable
#Object can be:
  #* A Receipt
  #* A Conversation
  #* A Notification
  #* A Message
  #* An array with any of them
alfa.mark_as_deleted conversation

# get available message for specific user
conversation.messages_for(alfa)

How can I retrieve my conversations?

#alfa wants to retrieve all his conversations
alfa.mailbox.conversations

#alfa wants to retrieve his inbox
alfa.mailbox.inbox

#alfa wants to retrieve his sent conversations
alfa.mailbox.sentbox

#alfa wants to retrieve his trashed conversations
alfa.mailbox.trash

How can I paginate conversations?

You can use Kaminari to paginate the conversations as normal. Please make sure you use the latest version, as mailboxer uses select('DISTINCT conversations.*'), which was not respected before Kaminari 0.12.4 according to its changelog. It works correctly on Kaminari 0.13.0.

#Paginating all conversations using :page parameter and 9 per page
conversations = alfa.mailbox.conversations.page(params[:page]).per(9)

#Paginating received conversations using :page parameter and 9 per page
conversations = alfa.mailbox.inbox.page(params[:page]).per(9)

#Paginating sent conversations using :page parameter and 9 per page
conversations = alfa.mailbox.sentbox.page(params[:page]).per(9)

#Paginating trashed conversations using :page parameter and 9 per page
conversations = alfa.mailbox.trash.page(params[:page]).per(9)

You can take a look at the full documentation for Mailboxer on rubydoc.info.

Do you want to test Mailboxer?

Thanks to Roman Kushnir (@RKushnir) you can test Mailboxer with this sample app.

I need a GUI!

If you need a GUI you should take a look at these links:

Contributors


Author: mailboxer
Source code: https://github.com/mailboxer/mailboxer
License: MIT license

#ruby  #ruby-on-rails 

Lawrence  Lesch

1657169100

Level-inverted-index: inverted Index for Levelup

level-inverted-index

Inverted Index for levelup.

Example

With prewired defaults!

//assumes `db` is an existing levelup instance
var InvertedIndex = require('level-inverted-index')
var indexDb = InvertedIndex(db, 'index')

//get list of documents containing 'search' and 'term'
indexDb.query(['search', 'term'], function (err, docs){
  console.log(docs)
})

//stream stubs of documents
indexDb.createQueryStream(['search', 'term'])
  .on('data', console.log)

Configurable!


var indexDb = InvertedIndex(db, 'index',

  //extract words from documents

  function map(key, value, index) {
    //parse, and pull out any bits of text you want,
    //call index with an index and a rank!

    //here we split by anything that is not a letter
    //or a number
    index(value.split(/[^\w\d]+/))
  }, 

  //when querying with createQueryStream,
  //stub converts each doc to a short preview.
  //Possibly, highlighting matches with the query!
  function stub(doc, query) {
    return doc.substring(0, 140) + '...\n'
  })

Insert documents into the database as you normally would.

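var fs = require('fs')

//read as utf8 so the value is a string that the map function can split
fs.readFile('readme.md', 'utf8', function (err, value) {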
  db.put('readme.md', value, function () {
    //and so on!
  })
})

//run an index batch
indexDb.start()
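To make the idea concrete, here is a minimal in-memory sketch of an inverted index. It reuses the same word-splitting regular expression as the map function above, but it is not how level-inverted-index is implemented: createMemoryIndex is a hypothetical name, nothing is persisted to levelup, and requiring every query word to be present (AND semantics) follows the "containing 'search' and 'term'" comment in the example.

```javascript
// Hypothetical in-memory inverted index: word -> Set of document keys.
function createMemoryIndex() {
  var postings = Object.create(null)
  return {
    // Splits the text by anything that is not a letter or a number
    // and records the document key under every word.
    add: function (key, text) {
      text.toLowerCase().split(/[^\w\d]+/).filter(Boolean).forEach(function (word) {
        if (!postings[word]) postings[word] = new Set()
        postings[word].add(key)
      })
    },
    // Returns the keys of documents that contain every query word.
    query: function (words) {
      var sets = words.map(function (word) {
        return postings[word.toLowerCase()] || new Set()
      })
      if (!sets.length) return []
      return Array.from(sets[0]).filter(function (key) {
        return sets.every(function (set) {
          return set.has(key)
        })
      })
    }
  }
}

var memoryIndex = createMemoryIndex()
memoryIndex.add('readme.md', 'How to search for a term')
memoryIndex.add('notes.md', 'Search tips')

var matches = memoryIndex.query(['search', 'term'])
console.log(matches)
```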

TODO

Add realtime updates to level-map-merge and get them in level-inverted-index for free.

https://github.com/dominictarr/level-map-merge/blob/master/index.js#L106-L109

Author: Dominictarr
Source Code: https://github.com/dominictarr/level-inverted-index 
License: MIT license

#javascript #index 

Shawn  Durgan

1598750220

MongoDB Indexes: Deep Dive, Understanding Indexes.

Get a performance boost from indexes by understanding the underlying data structure, how it works and is stored, and how it is loaded into memory, as well as how the query optimizer decides which index to select.

A basic understanding of indexes is required, i.e. what indexes are, the index types, and how to create them: https://docs.mongodb.com/manual/indexes/

  • Data Structure
  • Storage on disk
  • Memory Allocation.

Data Structure

An index on a field or fields is stored in the order we specify, using a B-Tree data structure. Let's see what that means and how it helps.

  • The index is created on the value of the field and references the actual stored document.

Image (not reproduced here), snipped from: MongoDB University

  • Using B-Tree indexes significantly reduces the number of comparisons needed to find a document.

Image (not reproduced here), snipped from: MongoDB University

  • Likewise, in the picture below we can see that with an index (sky blue line) the number of documents examined stays limited even as documents are added, in comparison with no index (collscan).

Image (not reproduced here), snipped from: MongoDB University
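The effect of ordered storage on the number of comparisons can be sketched with a sorted array. Binary search is used here as a simplified stand-in for a B-Tree lookup, and the linear scan stands in for a collection scan; neither function is MongoDB code, and the 1000 integer values are made up for the illustration.

```javascript
// Stand-in for a collection scan: examines values one by one.
function linearScan(values, target) {
  var comparisons = 0
  for (var i = 0; i < values.length; i++) {
    comparisons++
    if (values[i] === target) return { index: i, comparisons: comparisons }
  }
  return { index: -1, comparisons: comparisons }
}

// Simplified stand-in for an index lookup: binary search over sorted values.
// Each probed element counts as one comparison.
function binarySearch(sorted, target) {
  var lo = 0
  var hi = sorted.length - 1
  var comparisons = 0
  while (lo <= hi) {
    var mid = (lo + hi) >> 1
    comparisons++
    if (sorted[mid] === target) return { index: mid, comparisons: comparisons }
    if (sorted[mid] < target) lo = mid + 1
    else hi = mid - 1
  }
  return { index: -1, comparisons: comparisons }
}

var values = []
for (var n = 0; n < 1000; n++) values.push(n)

var scan = linearScan(values, 999)
var lookup = binarySearch(values, 999)
console.log('scan comparisons:', scan.comparisons)
console.log('ordered lookup comparisons:', lookup.comparisons)
```

The scan grows linearly with the number of values, while the ordered lookup grows roughly with the logarithm of that number, which is the shape of the two curves described above.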

Storage On Disk

Let's visualize how indexes are stored on disk. On-disk index storage is managed by the database storage engine itself.

  • Uses index prefix compression: a repeated prefix value is not written again. Let us look at an example to understand what that means.

// createIndex replaces ensureIndex, which has been deprecated since MongoDB 3.0
db.getCollection("movieTicket")
  .createIndex({"showDate": 1, "seatNo": 1, "status": 1});

How the index ({"showDate": 1, "seatNo": 1, "status": 1}) is stored on disk:

Image (not reproduced here): showDate_1_seatNo_1_status_1
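The idea behind prefix compression can be sketched as follows: for sorted keys, store only how many leading characters are shared with the previous key, plus the remaining suffix. This is a simplified illustration, not the storage engine's actual on-disk format; the function names and the sample keys (showDate, seatNo and status joined with "|") are made up for the example.

```javascript
// Number of leading characters two strings have in common.
function sharedPrefixLength(a, b) {
  var limit = Math.min(a.length, b.length)
  var i = 0
  while (i < limit && a[i] === b[i]) i++
  return i
}

// Stores each key as (shared prefix length with the previous key, suffix).
function compress(sortedKeys) {
  var previous = ''
  return sortedKeys.map(function (key) {
    var shared = sharedPrefixLength(previous, key)
    previous = key
    return { shared: shared, suffix: key.slice(shared) }
  })
}

// Rebuilds the full keys from the compressed entries.
function decompress(entries) {
  var previous = ''
  return entries.map(function (entry) {
    previous = previous.slice(0, entry.shared) + entry.suffix
    return previous
  })
}

// Hypothetical index keys, already in sorted order.
var keys = [
  '2020-08-30|A1|booked',
  '2020-08-30|A2|free',
  '2020-08-31|A1|free'
]

var packed = compress(keys)
console.log(packed)
console.log(decompress(packed))
```

Keys that share a long prefix with their predecessor, such as entries for the same showDate, shrink to a short suffix, which is why putting the most repetitive field first in a compound index tends to compress well under this scheme.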

#index #mongodb #indexing #mongo