Cakephp-token-verify: Token Verify plugin for CakePHP 3

Token Verify plugin for CakePHP 3

JWT for email authentication.

Easily issue tokens (JWT) that can be used for email authentication.
No token field is needed in the table.
One-time / URL-safe / secure :+1:

Requirements

  • PHP 7.0+
  • CakePHP 3.0.0+

Installation

composer require mosaxiv/cakephp-token-verify

Example

Reset password

CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY, # Required
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) NOT NULL,
    password VARCHAR(255) NOT NULL,
    created DATETIME,
    modified DATETIME # Required
);
// app/src/Model/Entity/User.php
namespace App\Model\Entity;

use Cake\ORM\Entity;
use Token\Model\Entity\TokenTrait;

class User extends Entity
{
    use TokenTrait;
}
// app/src/Controller/UsersController.php
namespace App\Controller;

use Cake\Routing\Router;
use Token\Util\Token;

class UsersController extends AppController
{

    public function forgotPassword()
    {
        if ($this->request->is('post')) {
            $email = $this->request->getData('email');
            $user = $this->Users->findByEmail($email)->first();
            if ($user) {
                $token = $user->tokenGenerate();
                $url = Router::url(['controller' => 'Users', 'action' => 'resetPassword', $token], true);
                // send email
            }
        }
    }

    public function resetPassword($token)
    {
        // The token embeds the user's id; look the user up and verify the token.
        $user = $this->Users->get(Token::getId($token));
        if (!$user->tokenVerify($token)) {
            throw new \Cake\Network\Exception\NotFoundException();
        }

        if ($this->request->is('post')) {
            $user = $this->Users->patchEntity($user, $this->request->getData());
            if ($this->Users->save($user)) {
                // success
            } else {
                // error
            }
        }
    }
}

Usage

Required database fields

  • id field
  • modified field

By tying the token to the modified field, the JWT can be used as a one-time token:
once the record is updated, the JWT should be discarded.
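
For illustration, a minimal sketch of that one-time behaviour (a hypothetical flow; it assumes the save succeeds and therefore touches the modified column):

// Inside a controller action
$token = $user->tokenGenerate();
$user->tokenVerify($token); // true

// Saving the entity (e.g. after a password change) updates `modified`,
// so the same token no longer verifies.
$user->password = 'new-password';
$this->Users->save($user);
$user->tokenVerify($token); // false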

Token\Model\Entity\TokenTrait

Used in an entity.

tokenGenerate($minits = 10)

// Generate a token (default expiration: 10 minutes)
$token = $entity->tokenGenerate();

// Generate a token that expires in 60 minutes
$token = $entity->tokenGenerate(60);

tokenVerify($token)

$user->tokenVerify($token) // true or false

setTokenData($name, $value)

Note: the data set this way is not encrypted.

$user->setTokenData('test', 'testdata')

Token\Util\Token

Token::getId($token)

Token::getId($token) // id or false

Token::getData($token, $name)

Token::getData($token, 'test') // data or false
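
For illustration, a minimal sketch combining setTokenData() and Token::getData(). The 'redirect' key is a hypothetical example, and it assumes the custom data is attached to the entity before the token is generated:

use Token\Util\Token;

// Attach extra (unencrypted) data, then generate the token.
$user->setTokenData('redirect', '/profile');
$token = $user->tokenGenerate();

Token::getData($token, 'redirect'); // '/profile'
Token::getData($token, 'missing');  // false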

Author: Mosaxiv
Source Code: https://github.com/mosaxiv/cakephp-token-verify 
License: MIT license

#php #cakephp #token 


Hire Dedicated CakePHP Developers | Hire CakePHP Developers USA

The CakePHP framework provides the foundational structure that takes the monotony out of the development process. It is open source and helps teams build web applications rapidly.

Want a web application built in a short time with the CakePHP framework?

Hire dedicated CakePHP developers from the experts at WebClues Infotech and take a step toward business growth. The developers WebClues Infotech offers its clients are highly skilled and experienced in delivering the project requirements its customers specify.

Look no further; share your requirements with us to see our commitment in action.

Share your requirements here https://www.webcluesinfotech.com/contact-us/

Book Free Interview with CakePHP Developers: https://bit.ly/3dDShFg

#hire dedicated cakephp developers #hire cakephp developer #hire dedicated cakephp web developers #hire dedicated cakephp developers or programmer #hire cakephp developer #hire cakephp developer india

How To Customize WordPress Plugins? (4 Easy Ways To Do)

WordPress needs no introduction. It has been around for a long time, and it has given leading web development technologies a tough fight. The main reason behind its remarkable success is that it is highly customizable and SEO-friendly. Other benefits include open-source technology, security, user-friendliness, and the thousands of free plugins it offers.

Speaking of WordPress plugins: a plugin is a piece of software that lets you add more features to a website. Plugins are easy to integrate into your website and don't hamper the site's performance. WordPress, as a leading technology, offers many plugins out of the box.

However, WordPress will not always be able to meet all of your needs, so you may have to customize a WordPress plugin to provide the functionality you want. WordPress plugins are easy to install and customize. You don't have to build the solution from scratch, and that's one of the reasons why small and medium-sized businesses love them. Customization doesn't require a hefty investment or an in-house development team. You can use the core functionality of a plugin and expand it as you like.

In this blog, we talk in depth about plugins and how to customize WordPress plugins to improve the functionality of your web applications.

How Do WordPress Plugins Work?

Developing your own plugin requires some knowledge of how plugins work. That knowledge ensures your customized plugins function well and helps you avoid mistakes that could hamper the experience on your site.

1. Hooks

Plugins operate primarily through hooks. Just as a physical hook attaches one thing to another, a feature or piece of functionality is hooked into your website, where its code interacts with the other components on the site. There are two types of hooks: a. Actions and b. Filters.

A. Action

If you want something to happen at a particular time, you use a WordPress "action" hook. With actions, you can add, change, and improve the functionality of your plugin. An action lets you attach new behaviour that is triggered when users interact with the website.

There are several predefined actions available in WordPress, and custom WordPress plugin development also allows you to define your own actions. This way you can make your plugin behave as you want, passing whatever values the hooked function needs. The add_action function then connects your function to a specific action, as sketched below.
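
A minimal sketch of both sides of an action hook; wp_footer is a core WordPress action, while my_plugin_loaded is a hypothetical custom action name used only for illustration:

// Attach a callback to a core action.
add_action('wp_footer', function () {
    echo '<!-- rendered by my plugin -->';
});

// Attach a callback to a custom action first...
add_action('my_plugin_loaded', function () {
    error_log('my_plugin_loaded fired');
});

// ...then fire it from your plugin code when appropriate.
do_action('my_plugin_loaded');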

B. Filters

Filters are hooks that accept a single variable, or a series of variables, and send it back after modifying it. They allow you to change the content displayed to the user.

You trigger a filter on your website with the apply_filters function, after defining the filter's behaviour in a function of your own. To add a filter hook, you pass the $tag (the filter name) and $value (the value or variable to be filtered); additional arguments can be passed after $value.

Once you have written your filter callback, you register it with the add_filter function. This activates the filter so it runs whenever the corresponding hook is applied; the callback can manipulate the variable and must return it.
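
A minimal sketch of a filter; the_content is a core WordPress filter, while my_plugin_title is a hypothetical filter name used only for illustration:

// A plugin exposes a value through apply_filters() so others can modify it.
function my_plugin_render_title($title) {
    return apply_filters('my_plugin_title', $title);
}

// Register a callback with add_filter(); it receives the value and must return it.
add_filter('my_plugin_title', function ($title) {
    return strtoupper($title);
});

// Core filters work the same way.
add_filter('the_content', function ($content) {
    return $content . '<p>Thanks for reading!</p>';
});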

2. Shortcodes

Shortcodes are a good way to create and display custom functionality to your site's visitors. They are small snippets of code that can be placed in posts and pages, as well as in menus, widgets, and similar areas.

Many plugins use shortcodes, and by creating your very own shortcode you too can customize a WordPress plugin. You create a shortcode with the add_shortcode function: the first argument is the shortcode's name and the second is the callback that produces its output when the shortcode is triggered. The callback receives the shortcode's attributes, enclosed content, and name.
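
A minimal sketch of a custom shortcode; the tag name greeting and the name attribute are hypothetical, chosen only for illustration:

// Register the shortcode: first argument is the tag, second is the output callback.
add_shortcode('greeting', function ($atts, $content = null, $tag = '') {
    // Merge user-supplied attributes with defaults.
    $atts = shortcode_atts(['name' => 'world'], $atts, $tag);
    return 'Hello, ' . esc_html($atts['name']) . '!';
});

// Usage inside a post or page:
// [greeting name="Trevor"]  ->  Hello, Trevor!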

3. Widgets

Besides hooks and shortcodes, you can use widgets to add functionality to the site. You create a WordPress widget by extending the WP_Widget class. Widgets offer a user-friendly, object-oriented design approach, with the widget's functions and values stored in a single entity.
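
A minimal sketch of a widget built by extending WP_Widget; the widget id, title, and greeting text are hypothetical:

class Hello_Widget extends WP_Widget
{
    public function __construct()
    {
        parent::__construct('hello_widget', 'Hello Widget', [
            'description' => 'Displays a short greeting.',
        ]);
    }

    // Front-end output for the widget.
    public function widget($args, $instance)
    {
        echo $args['before_widget'];
        echo '<p>Hello from a custom widget!</p>';
        echo $args['after_widget'];
    }
}

// Register the widget class on the standard widgets_init hook.
add_action('widgets_init', function () {
    register_widget('Hello_Widget');
});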

How To Customize WordPress Plugins?

There are various methods for customizing WordPress plugins. Depending on your need and the degree of customization you wish to make, choose the right option for you. Keep in mind that customization requires some technical knowledge, so find an expert WordPress plugin development company if you lack the knowledge to do it yourself.

1. Hire A Plugin Developer

One of the best ways to customize a WordPress plugin is by hiring a plugin developer. There are many plugin developers listed in the WordPress directory. You can contact them and collaborate with world-class WordPress developers. It is quite easy to find a WordPress plugin developer.

Since this kind of work is small and doesn't pay well or last long, many developers will be unwilling to collaborate, but you will eventually find someone.

2. Creating A Supporting Plugin

If you are looking for added functionality in an existing plugin, go for this option. It is a cheap way to meet your needs, and creating a supporting plugin takes very little time because its scope is limited. Furthermore, you can extend a plugin's current feature set without altering its base code.

However, doing so also requires some technical knowledge, so you may have to hire a WordPress developer.

3. Use Custom Hooks

Use WordPress hooks to integrate another feature into an existing plugin. You can add an action or a filter as needed and improve the functionality of the website.

If the plugin you want to customize exposes a hook, you don't have to do much: write your own plugin that attaches to that hook, and you won't have to build a WordPress plugin from scratch. If the hook is not present in the plugin's code, you can contact a WordPress developer or write the code yourself. It may take some time, but it works; see the sketch after this section.

If you add the hook to the plugin yourself, you will have to re-apply that patch manually each time a new plugin update is released.
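
A minimal sketch of a small companion plugin that attaches to a hook another plugin is assumed to expose; the hook name someplugin_after_save is hypothetical, so check the target plugin's source for the real do_action()/apply_filters() calls:

/**
 * Plugin Name: My Plugin Extension
 */
add_action('someplugin_after_save', function ($post_id) {
    // React to the other plugin's event without touching its base code.
    error_log("someplugin saved post {$post_id}");
});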

4. Override Callbacks

The last way to customize WordPress plugins is to override callbacks. With this method you can alter the core functionality of a WordPress plugin and completely change the way it works with your website. By adding your own custom callbacks in place of the originals, you can create exactly the functionality you desire, as in the sketch below.

We suggest hiring a web developer proficient in WordPress, as this requires a good amount of technical knowledge and an understanding of how the plugin works.
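
A minimal sketch of overriding a plugin's callback; the class, method, hook, and priority are hypothetical, and remove_filter() only works if they match the original add_filter() registration exactly:

add_action('init', function () {
    // Detach the plugin's original callback.
    remove_filter('the_content', ['Some_Plugin', 'render_content'], 10);

    // Attach your replacement with the same hook and priority.
    add_filter('the_content', function ($content) {
        return '<div class="custom-wrapper">' . $content . '</div>';
    }, 10);
});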

Read More

#customize wordpress plugins #how to customize plugins in wordpress #how to customize wordpress plugins #how to edit plugins in wordpress #how to edit wordpress plugins #wordpress plugin customization

Words Counted: A Ruby Natural Language Processor.

WordsCounted

We are all in the gutter, but some of us are looking at the stars.

-- Oscar Wilde

WordsCounted is a Ruby NLP (natural language processor). WordsCounted lets you implement powerful tokenisation strategies with a very flexible tokeniser class.

Are you using WordsCounted to do something interesting? Please tell me about it.

 

Demo

Visit this website for one example of what you can do with WordsCounted.

Features

  • Out of the box, get the following data from any string or readable file, or URL:
    • Token count and unique token count
    • Token densities, frequencies, and lengths
    • Char count and average chars per token
    • The longest tokens and their lengths
    • The most frequent tokens and their frequencies.
  • A flexible way to exclude tokens from the tokeniser. You can pass a string, regexp, symbol, lambda, or an array of any combination of those types for powerful tokenisation strategies.
  • Pass your own regexp rules to the tokeniser if you prefer. The default regexp filters special characters but keeps hyphens and apostrophes. It also plays nicely with diacritics (UTF and unicode characters): Bayrūt is treated as ["Bayrūt"] and not ["Bayr", "ū", "t"], for example.
  • Opens and reads files. Pass in a file path or a url instead of a string.

Installation

Add this line to your application's Gemfile:

gem 'words_counted'

And then execute:

$ bundle

Or install it yourself as:

$ gem install words_counted

Usage

Pass in a string or a file path, and an optional filter and/or regexp.

counter = WordsCounted.count(
  "We are all in the gutter, but some of us are looking at the stars."
)

# Using a file
counter = WordsCounted.from_file("path/or/url/to/my/file.txt")

.count and .from_file are convenience methods that take an input, tokenise it, and return an instance of WordsCounted::Counter initialized with the tokens. The WordsCounted::Tokeniser and WordsCounted::Counter classes can be used alone, however.

API

WordsCounted

WordsCounted.count(input, options = {})

Tokenises input and initializes a WordsCounted::Counter object with the resulting tokens.

counter = WordsCounted.count("Hello Beirut!")

Accepts two options: exclude and regexp. See Excluding tokens from the analyser and Passing in a custom regexp respectively.

WordsCounted.from_file(path, options = {})

Reads and tokenises a file, and initializes a WordsCounted::Counter object with the resulting tokens.

counter = WordsCounted.from_file("hello_beirut.txt")

Accepts the same options as .count.

Tokeniser

The tokeniser allows you to tokenise text in a variety of ways. You can pass in your own rules for tokenisation, and apply a powerful filter with any combination of rules as long as they can boil down into a lambda.

Out of the box the tokeniser includes only alpha chars. Hyphenated tokens and tokens with apostrophes are considered a single token.

#tokenise([pattern: TOKEN_REGEXP, exclude: nil])

tokeniser = WordsCounted::Tokeniser.new("Hello Beirut!").tokenise

# With `exclude`
tokeniser = WordsCounted::Tokeniser.new("Hello Beirut!").tokenise(exclude: "hello")

# With `pattern`
tokeniser = WordsCounted::Tokeniser.new("I <3 Beirut!").tokenise(pattern: /[a-z]/i)

See Excluding tokens from the analyser and Passing in a custom regexp for more information.

Counter

The WordsCounted::Counter class allows you to collect various statistics from an array of tokens.

#token_count

Returns the token count of a given string.

counter.token_count #=> 15

#token_frequency

Returns a sorted (unstable) two-dimensional array where each element is a token and its frequency. The array is sorted by frequency in descending order.

counter.token_frequency

[
  ["the", 2],
  ["are", 2],
  ["we",  1],
  # ...
  ["all", 1]
]

#most_frequent_tokens

Returns a hash where each key-value pair is a token and its frequency.

counter.most_frequent_tokens

{ "are" => 2, "the" => 2 }

#token_lengths

Returns a sorted (unstable) two-dimensional array where each element contains a token and its length. The array is sorted by length in descending order.

counter.token_lengths

[
  ["looking", 7],
  ["gutter",  6],
  ["stars",   5],
  # ...
  ["in",      2]
]

#longest_tokens

Returns a hash where each key-value pair is a token and its length.

counter.longest_tokens

{ "looking" => 7 }

#token_density([ precision: 2 ])

Returns a sorted (unstable) two-dimensional array where each element contains a token and its density as a float, rounded to a precision of two. The array is sorted by density in descending order. It accepts a precision argument, which must be a float.

counter.token_density

[
  ["are",     0.13],
  ["the",     0.13],
  ["but",     0.07 ],
  # ...
  ["we",      0.07 ]
]

#char_count

Returns the char count of tokens.

counter.char_count #=> 76

#average_chars_per_token([ precision: 2 ])

Returns the average char count per token rounded to two decimal places. Accepts a precision argument which defaults to two. Precision must be a float.

counter.average_chars_per_token #=> 4

#uniq_token_count

Returns the number of unique tokens.

counter.uniq_token_count #=> 13

Excluding tokens from the tokeniser

You can exclude anything you want from the input by passing the exclude option. The exclude option accepts a variety of filters and is extremely flexible.

  1. A space-delimited string. The filter will normalise the string.
  2. A regular expression.
  3. A lambda.
  4. A symbol that names a predicate method. For example :odd?.
  5. An array of any combination of the above.
tokeniser =
  WordsCounted::Tokeniser.new(
    "Magnificent! That was magnificent, Trevor."
  )

# Using a string
tokeniser.tokenise(exclude: "was magnificent")
# => ["that", "trevor"]

# Using a regular expression
tokeniser.tokenise(exclude: /trevor/)
# => ["magnificent", "that", "was", "magnificent"]

# Using a lambda
tokeniser.tokenise(exclude: ->(t) { t.length < 4 })
# => ["magnificent", "that", "magnificent", "trevor"]

# Using symbol
tokeniser = WordsCounted::Tokeniser.new("Hello! محمد")
tokeniser.tokenise(exclude: :ascii_only?)
# => ["محمد"]

# Using an array
tokeniser = WordsCounted::Tokeniser.new(
  "Hello! اسماءنا هي محمد، كارولينا، سامي، وداني"
)
tokeniser.tokenise(
  exclude: [:ascii_only?, /محمد/, ->(t) { t.length > 6}, "و"]
)
# => ["هي", "سامي", "وداني"]

Passing in a custom regexp

The default regexp accounts for letters, hyphenated tokens, and apostrophes. This means twenty-one is treated as one token. So is Mohamad's.

/[\p{Alpha}\-']+/

You can pass your own criteria as a Ruby regular expression to split your string as desired.

For example, if you wanted to include numbers, you can override the regular expression:

counter = WordsCounted.count("Numbers 1, 2, and 3", pattern: /[\p{Alnum}\-']+/)
counter.tokens
#=> ["numbers", "1", "2", "and", "3"]

Opening and reading files

Use the from_file method to open files. from_file accepts the same options as .count. The file path can be a URL.

counter = WordsCounted.from_file("url/or/path/to/file.text")

Gotchas

A hyphen used in lieu of an em or en dash will form part of the token. This affects the tokeniser algorithm.

counter = WordsCounted.count("How do you do?-you are well, I see.")
counter.token_frequency

[
  ["do",   2],
  ["how",  1],
  ["you",  1],
  ["-you", 1], # WTF, mate!
  ["are",  1],
  # ...
]

In this example -you and you are separate tokens. Also, the tokeniser does not include numbers by default. Remember that you can pass your own regular expression if the default behaviour does not fit your needs.

A note on case sensitivity

The program will normalise (downcase) all incoming strings for consistency and filters.

Roadmap

Ability to open URLs

def self.from_url
  # open url and send string here after removing html
end

Contributors

See contributors.

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

Author: abitdodgy
Source code: https://github.com/abitdodgy/words_counted
License: MIT license

#ruby  #ruby-on-rails 
