What is Unifi Protocol DAO (UNFI) | What is Unifi Protocol DAO token | What is UNFI token

What is Unifi Protocol DAO (UNFI)?

Unifi Protocol DAO is a group of non-custodial, interoperable, decentralized, multi-chain smart contracts providing the building blocks for DeFi development. The project provides a bridge to connect the economy of Ethereum-based DeFi products to the growing DeFi markets on other blockchains.

uTrade, a decentralized Automated Market Maker (AMM) and token exchange, is the first product built on the Unifi Protocol DAO smart contracts. uTrade is made up of a group of smart contracts called a uPair. Multiple token trades can be executed within one uPair, allowing a user to trade or fund liquidity in uPairs with an unlimited combination of tokens.

The UP token is minted as a representation of the revenue generated by the Unifi Protocol DAO ecosystem. The tokenomics of UP tokens may differ from one blockchain to another to customize the rewards provided and ensure a sustainable marketplace. The UP token has no maximum supply and can be swapped to UNFI at dynamic market rates.

UNFI token holders can earn UNFI rewards by staking and delegating to Community Council Representatives. Representatives and their communities across multiple blockchains form the Unifi Global Governance Model powered by UNFI.

Source: TokenClan

The Fast and Secure Protocol Bridging the DeFi Economy

Unifi Protocol is a group of non-custodial, interoperable, decentralized, and multi-chain smart contracts providing the building blocks for the next generation of DeFi development. Unifi provides a bridge to connect the existing economy of Ethereum-based DeFi products to growing DeFi markets on other blockchains. Unifi utilizes the time-tested SEED Bridge to connect any Ethereum or EVM/NVM-based DeFi products to other blockchains.

The Unlimited Potential Token (UP) is at the core of Unifi’s DeFi system. UP minting occurs in direct proportion to the fees and other revenue generated by the entire Unifi Protocol. This includes designated fees or revenue from all Unifi-developed platforms, as well as from any projects developed by others which utilize Unifi Protocol. The minting process is designed to ensure the redemption value of all UP tokens must go UP every time a new UP token is minted. The UP token creates a unique rewards structure for all holders, protocol users, and liquidity providers.

Unifi Protocol is designed around a decentralized autonomous organization (DAO) governed via a decentralized governance token.

The following are highlights taken from the full Unifi Protocol white paper. The complete white paper is available here: Unifi White Paper

Introducing Unifi Protocol

  • Unifi Protocol unifies the multi-chain DeFi world through the SEED Bridge. This allows Unifi to be a multi-chain and non-custodial decentralized swap and complete DeFi protocol.
  • Unifi Protocol features incentivized liquidity pooling, a fee sharing governance token, loyalty rewards token, lending platform, and cross-chain swap interoperability.
  • Unifi was developed by Sesameseed, the multi-chain staking community committed to bringing their innovative technology solutions to all their represented blockchains.
  • Unifi Protocol adds value to each blockchain upon which it is deployed, allowing developers to build a range of products and services around the smart contract based building blocks of the protocol.

What is uTrade (and how does it relate to Unifi)?

  • uTrade is a trading platform built utilizing Unifi Protocol.
  • uTrade operates as a decentralized, unlimited liquidity market maker, where one token is exchanged for another, utilizing liquidity provided by incentivized liquidity providers.

Why is uTrade better than existing trading platforms?

  • Liquidity providers share a portion of the revenue generated by the entire Unifi Protocol, and are not limited to a share of fees earned solely from a specific trading pair.
  • Even after liquidity is withdrawn, liquidity providers continue to share in fees as long as they hold UP tokens.
  • Traders are refunded a portion of their trading fees in SEED and enjoy all the benefits of holding SEED such as daily compounding. uTrade is the only platform to offer a trading rebate which automatically stakes on multiple blockchains and earns increasing rewards.
  • Enabled by the power of the SEED Bridge, uTrade will allow for the non-custodial trading of tokens from different blockchains.

More information on UP Token

  • Each blockchain where Unifi Protocol operates will have a designated base token in which fees and revenue are collected. For example, these will be ONG on Ontology, ONE on Harmony, and TRX on Tron.
  • The redemption value of UP tokens is the amount of base tokens Unifi Protocol will return in exchange for the UP token.
  • During the UP minting process, an amount of base tokens (normally a percentage) is dedicated to raising the redemption value of all previously minted UP tokens. This amount of base tokens is known as the Power UP Rate (PUR).
  • This redemption value of UP is verifiable on-chain, and increases every time a new UP token is minted.
  • Unifi anticipates the market value of UP could greatly exceed the redemption value of UP, and a rising redemption value could provide an increased positive market price pressure.
  • As Unifi Protocol matures and the community grows, UP tokens will be used to enable decentralized governance. Utilizing UP token as the governance token will allow those who have supported Unifi the most to provide the most input in its governance.
  • UP holders receive a share of fees generated by other platforms created by any developer using Unifi Protocol.
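
The redemption mechanics above can be illustrated with a small worked model. This is only a sketch under stated assumptions: the UpPool struct, its fields, the 5% Power UP Rate, and the starting balances are all hypothetical and are not taken from the Unifi white paper or contracts. It assumes that new UP is minted at the current redemption value against the part of the revenue that is not reserved by the Power UP Rate.

```ruby
# Hypothetical model of UP minting; names and numbers are illustrative only.
UpPool = Struct.new(:backing, :supply) do
  # Base tokens returned for one UP token.
  def redemption_value
    backing / supply
  end

  # Take in `revenue` base tokens. The `power_up_rate` share only raises the
  # backing of existing UP; the remainder mints new UP at the current value.
  def mint(revenue, power_up_rate)
    minted = revenue * (1 - power_up_rate) / redemption_value
    self.backing += revenue
    self.supply += minted
    minted
  end
end

pool = UpPool.new(1000.0, 1000.0)
before = pool.redemption_value
pool.mint(100.0, 0.05)
after = pool.redemption_value

puts format("%.4f -> %.4f", before, after)
```

In this model 100 base tokens arrive while only 95 UP are minted, so the redemption value moves from 1.0 to roughly 1.0046 base tokens per UP.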

What is uLend?

  • uLend is a lending platform built with smart contracts based on Unifi Protocol.
  • uLend will allow UP token holders to preserve their ability to receive a share of further Unifi Protocol revenue, while still utilizing the redemption value of UP. This value can then be used to take advantage of market movements, cross chain trading, margin trading, or any other use the borrower may desire.
  • Unifi Protocol will allow uLend to create loan products to temporarily extract value from other collateralized tokens as well.
  • By utilizing the SEED Bridge, uLend could provide an opportunity to take a collateralized loan on one blockchain, and use the resulting funds on another. For example, a user who held UP token on Ontology could take out a loan and then use those funds to take advantage of the market movement of an Ethereum token.

Thanks for visiting and reading this article! I highly appreciate it. Please share if you liked it!

#blockchain #bitcoin #cryptocurrency #unifi protocol dao #unfi

Lisa joly

1624715818

Unifi Protocol DAO (UNFI) Hit $11.84 WOW..... Price Prediction 2021, $20, $40, $80??

Hey guys, this is the 3rd video I've done on Unifi Protocol DAO (UNFI). This project has surprised me a lot. My Patreon supporters got in at around $3 and it has exceeded my expectations. Keep in mind that seed investors got in at $0.15 and private investors got in at $0.66, so you can see how much they would already have made. This is crazy. We don’t know how high Unifi Protocol DAO (UNFI) could go. The project is only 2 weeks old. Could it hit $20, $40, or $80 in 2021? Let me know your thoughts in the comments below.

📺 The video in this post was made by Crypto expat
The origin of the article: https://www.youtube.com/watch?v=Fk6ny8gvDbA
🔺 DISCLAIMER: The article is for information sharing. The content of this video is solely the opinions of the speaker who is not a licensed financial advisor or registered investment advisor. Not investment advice or legal advice.
Cryptocurrency trading is VERY risky. Make sure you understand these risks and that you are responsible for what you do with your money
Thanks for visiting and watching! Please don’t forget to leave a like, comment and share!

#bitcoin #blockchain #unfi #dao #price prediction #unifi protocol dao (unfi) hit $11.84 wow..... price prediction 2021, $20, $40, $80??

Lisa joly

1624406400

Unifi Protocol Dao STAKING is LIVE 40% APY Fill fast so HURRY!!

Unifi Protocol DAO is a group of non-custodial, interoperable, decentralized, and multi-chain smart contracts providing the building blocks for the next generation of DeFi development. The Unifi Protocol is designed to provide developers with functional DeFi building blocks linking multiple blockchains together into one large DeFi marketplace. uTrade is the working proof-of-concept for harnessing multi-chain and multi-platform liquidity mining to power UNFI, Unifi’s Global Governance Token.

📺 The video in this post was made by Crypto expat
The origin of the article: https://www.youtube.com/watch?v=DmSFKk0-SIY

#bitcoin #blockchain #unifi protocol dao #hurry #unifi protocol dao staking is live 40% apy fill fast so hurry!!

Words Counted: A Ruby Natural Language Processor.

WordsCounted

We are all in the gutter, but some of us are looking at the stars.

-- Oscar Wilde

WordsCounted is a Ruby NLP (natural language processor). WordsCounted lets you implement powerful tokenisation strategies with a very flexible tokeniser class.

Are you using WordsCounted to do something interesting? Please tell me about it.

 

Demo

Visit this website for one example of what you can do with WordsCounted.

Features

  • Out of the box, get the following data from any string or readable file, or URL:
    • Token count and unique token count
    • Token densities, frequencies, and lengths
    • Char count and average chars per token
    • The longest tokens and their lengths
    • The most frequent tokens and their frequencies.
  • A flexible way to exclude tokens from the tokeniser. You can pass a string, regexp, symbol, lambda, or an array of any combination of those types for powerful tokenisation strategies.
  • Pass your own regexp rules to the tokeniser if you prefer. The default regexp filters special characters but keeps hyphens and apostrophes. It also plays nicely with diacritics (UTF and unicode characters): Bayrūt is treated as ["Bayrūt"] and not ["Bayr", "ū", "t"], for example.
  • Opens and reads files. Pass in a file path or a url instead of a string.

Installation

Add this line to your application's Gemfile:

gem 'words_counted'

And then execute:

$ bundle

Or install it yourself as:

$ gem install words_counted

Usage

Pass in a string or a file path, and an optional filter and/or regexp.

counter = WordsCounted.count(
  "We are all in the gutter, but some of us are looking at the stars."
)

# Using a file
counter = WordsCounted.from_file("path/or/url/to/my/file.txt")

.count and .from_file are convenience methods that take an input, tokenise it, and return an instance of WordsCounted::Counter initialized with the tokens. The WordsCounted::Tokeniser and WordsCounted::Counter classes can be used alone, however.

API

WordsCounted

WordsCounted.count(input, options = {})

Tokenises input and initializes a WordsCounted::Counter object with the resulting tokens.

counter = WordsCounted.count("Hello Beirut!")

Accepts two options: exclude and pattern. See Excluding tokens from the tokeniser and Passing in a custom regexp respectively.

WordsCounted.from_file(path, options = {})

Reads and tokenises a file, and initializes a WordsCounted::Counter object with the resulting tokens.

counter = WordsCounted.from_file("hello_beirut.txt")

Accepts the same options as .count.

Tokeniser

The tokeniser allows you to tokenise text in a variety of ways. You can pass in your own rules for tokenisation, and apply a powerful filter with any combination of rules as long as they can boil down into a lambda.

Out of the box the tokeniser includes only alpha chars. Hyphenated tokens and tokens with apostrophes are considered a single token.

#tokenise([pattern: TOKEN_REGEXP, exclude: nil])

# `tokenise` returns the array of tokens
tokens = WordsCounted::Tokeniser.new("Hello Beirut!").tokenise

# With `exclude`
tokens = WordsCounted::Tokeniser.new("Hello Beirut!").tokenise(exclude: "hello")

# With `pattern` (the `+` keeps runs of letters together as one token)
tokens = WordsCounted::Tokeniser.new("I <3 Beirut!").tokenise(pattern: /[a-z]+/i)

See Excluding tokens from the tokeniser and Passing in a custom regexp for more information.

Counter

The WordsCounted::Counter class allows you to collect various statistics from an array of tokens.

#token_count

Returns the token count of a given string.

counter.token_count #=> 15

#token_frequency

Returns a sorted (unstable) two-dimensional array where each element is a token and its frequency. The array is sorted by frequency in descending order.

counter.token_frequency

[
  ["the", 2],
  ["are", 2],
  ["we",  1],
  # ...
  ["all", 1]
]

#most_frequent_tokens

Returns a hash where each key-value pair is a token and its frequency.

counter.most_frequent_tokens

{ "are" => 2, "the" => 2 }
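
To make the two return shapes concrete, here is a minimal plain-Ruby sketch that does not use the gem. It builds the tokens by hand with String#scan and the default pattern quoted in the custom regexp section of this README. The variable names are illustrative, and this is not claimed to be how WordsCounted::Counter is implemented.

```ruby
# Plain-Ruby sketch, not the gem's implementation.
text = "We are all in the gutter, but some of us are looking at the stars."
tokens = text.downcase.scan(/[\p{Alpha}\-']+/)

# Count how often each token occurs.
counts = tokens.each_with_object(Hash.new(0)) { |token, memo| memo[token] += 1 }

# Token/frequency pairs, highest frequency first. sort_by is not stable,
# so pairs with equal frequency come out in no guaranteed order.
frequency = counts.sort_by { |_token, count| -count }

# Keep only the tokens that share the highest frequency.
top = counts.values.max
most_frequent = counts.select { |_token, count| count == top }

p frequency.first(2)
p most_frequent
```

The first structure is an array of pairs, which suits ranked output. The second is a hash, which suits lookups by token.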

#token_lengths

Returns a sorted (unstable) two-dimensional array where each element contains a token and its length. The array is sorted by length in descending order.

counter.token_lengths

[
  ["looking", 7],
  ["gutter",  6],
  ["stars",   5],
  # ...
  ["in",      2]
]

#longest_tokens

Returns a hash where each key-value pair is a token and its length.

counter.longest_tokens

{ "looking" => 7 }
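
The same idea applies to lengths. The sketch below is plain Ruby rather than the gem itself, and it assumes that duplicate tokens are collapsed before measuring. That is a guess made for illustration, not something this README states.

```ruby
# Plain-Ruby sketch, not the gem's implementation.
tokens = %w[we are all in the gutter but some of us are looking at the stars]

# Unique token/length pairs, longest first (assumption: duplicates collapsed).
lengths = tokens.uniq
                .map { |token| [token, token.length] }
                .sort_by { |_token, length| -length }

# Every token that shares the maximum length, as a hash.
max_length = lengths.first.last
longest = lengths.select { |_token, length| length == max_length }.to_h

p lengths.first(3)
p longest
```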

#token_density([ precision: 2 ])

Returns a sorted (unstable) two-dimensional array where each element contains a token and its density as a float, rounded to a precision of two. The array is sorted by density in descending order. It accepts a precision argument, which must be a float.

counter.token_density

[
  ["are",     0.13],
  ["the",     0.13],
  ["but",     0.07],
  # ...
  ["we",      0.07]
]
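
The figures above follow from simple arithmetic: a token's density is its frequency divided by the total token count. Here is a minimal sketch in plain Ruby, assuming that definition. The variable names are illustrative and this is not the gem's own code.

```ruby
# Plain-Ruby sketch of density = frequency / total tokens.
tokens = %w[we are all in the gutter but some of us are looking at the stars]
precision = 2

counts = tokens.each_with_object(Hash.new(0)) { |token, memo| memo[token] += 1 }

density = counts
          .map { |token, count| [token, (count / tokens.length.to_f).round(precision)] }
          .sort_by { |_token, value| -value }

# "are" occurs 2 times out of 15 tokens: 2 / 15.0 rounds to 0.13.
# "we" occurs once: 1 / 15.0 rounds to 0.07.
p density.first(2)
```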

#char_count

Returns the char count of tokens.

counter.char_count #=> 50

#average_chars_per_token([ precision: 2 ])

Returns the average char count per token rounded to two decimal places. Accepts a precision argument which defaults to two. Precision must be a float.

counter.average_chars_per_token #=> 3.33
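
Both statistics can be reproduced by hand on a short input. The sketch below uses plain Ruby on the string "Hello Beirut!" and assumes that the char count is the combined length of all tokens, as the description of char_count suggests.

```ruby
# Plain-Ruby sketch, not the gem's implementation.
tokens = "Hello Beirut!".downcase.scan(/[\p{Alpha}\-']+/)

# Combined length of all tokens: "hello" (5) + "beirut" (6).
char_count = tokens.join.length

# Average characters per token, rounded to two decimal places.
average = (char_count / tokens.length.to_f).round(2)

p char_count # 11
p average    # 5.5
```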

#uniq_token_count

Returns the number of unique tokens.

counter.uniq_token_count #=> 13

Excluding tokens from the tokeniser

You can exclude anything you want from the input by passing the exclude option. The exclude option accepts a variety of filters and is extremely flexible.

  1. A space-delimited string. The filter will normalise the string.
  2. A regular expression.
  3. A lambda.
  4. A symbol that names a predicate method. For example :odd?.
  5. An array of any combination of the above.

tokeniser =
  WordsCounted::Tokeniser.new(
    "Magnificent! That was magnificent, Trevor."
  )

# Using a string
tokeniser.tokenise(exclude: "was magnificent")
# => ["that", "trevor"]

# Using a regular expression
tokeniser.tokenise(exclude: /trevor/)
# => ["magnificent", "that", "was", "magnificent"]

# Using a lambda
tokeniser.tokenise(exclude: ->(t) { t.length < 4 })
# => ["magnificent", "that", "magnificent", "trevor"]

# Using symbol
tokeniser = WordsCounted::Tokeniser.new("Hello! محمد")
tokeniser.tokenise(exclude: :ascii_only?)
# => ["محمد"]

# Using an array
tokeniser = WordsCounted::Tokeniser.new(
  "Hello! اسماءنا هي محمد، كارولينا، سامي، وداني"
)
tokeniser.tokenise(
  exclude: [:ascii_only?, /محمد/, ->(t) { t.length > 6}, "و"]
)
# => ["هي", "سامي", "وداني"]
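
The Tokeniser section says that any combination of rules works as long as it can boil down into a lambda. The sketch below shows one way that reduction could look in plain Ruby. The helper filter_to_lambda is hypothetical and written for this example only. It is not part of the WordsCounted API and may differ from how the gem does it.

```ruby
# Hypothetical helper: reduce any supported filter to a single lambda that
# returns true for tokens that should be excluded.
def filter_to_lambda(filter)
  case filter
  when String
    words = filter.downcase.split # space-delimited, normalised
    ->(token) { words.include?(token) }
  when Regexp
    ->(token) { token.match?(filter) }
  when Symbol
    ->(token) { token.public_send(filter) } # a predicate such as :ascii_only?
  when Proc
    filter
  when Array
    lambdas = filter.map { |item| filter_to_lambda(item) }
    ->(token) { lambdas.any? { |l| l.call(token) } }
  else
    raise ArgumentError, "unsupported filter: #{filter.inspect}"
  end
end

tokens = "Magnificent! That was magnificent, Trevor."
         .downcase.scan(/[\p{Alpha}\-']+/)

exclude = filter_to_lambda(["was", /trevor/, ->(t) { t.length < 5 }])
p tokens.reject(&exclude)
```

Because every filter type ends up as a lambda, an array of mixed filters is just a list of lambdas combined with any?.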

Passing in a custom regexp

The default regexp accounts for letters, hyphenated tokens, and apostrophes. This means twenty-one is treated as one token. So is Mohamad's.

/[\p{Alpha}\-']+/

You can pass your own criteria as a Ruby regular expression to split your string as desired.

For example, if you wanted to include numbers, you can override the regular expression:

counter = WordsCounted.count("Numbers 1, 2, and 3", pattern: /[\p{Alnum}\-']+/)
counter.tokens
#=> ["numbers", "1", "2", "and", "3"]
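
The effect of each pattern can be checked without the gem by using String#scan directly. This is a small sketch; the variable names are made up for the example, and the downcase call stands in for the normalisation described in the note on case sensitivity.

```ruby
# Compare the default pattern with one that also accepts digits.
default_pattern = /[\p{Alpha}\-']+/
with_digits     = /[\p{Alnum}\-']+/

# Hyphenated words and apostrophes stay in one token.
p "twenty-one Mohamad's".scan(default_pattern)

# Digits are dropped by the default pattern and kept by the alternative.
p "Numbers 1, 2, and 3".downcase.scan(default_pattern)
p "Numbers 1, 2, and 3".downcase.scan(with_digits)
```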

Opening and reading files

Use the from_file method to open files. from_file accepts the same options as .count. The file path can be a URL.

counter = WordsCounted.from_file("url/or/path/to/file.text")

Gotchas

A hyphen used in lieu of an em or en dash will form part of the token. This affects the tokeniser algorithm.

counter = WordsCounted.count("How do you do?-you are well, I see.")
counter.token_frequency

[
  ["do",   2],
  ["how",  1],
  ["you",  1],
  ["-you", 1], # WTF, mate!
  ["are",  1],
  # ...
]

In this example -you and you are separate tokens. Also, the tokeniser does not include numbers by default. Remember that you can pass your own regular expression if the default behaviour does not fit your needs.
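
One possible workaround is a pattern that only allows hyphens and apostrophes between letters. The inner_only pattern below is a hypothetical alternative written for this example, not something shipped with the gem, and it is demonstrated with String#scan rather than the gem itself.

```ruby
text = "How do you do?-you are well, I see."

default_pattern = /[\p{Alpha}\-']+/
# Hypothetical alternative: a hyphen or apostrophe must sit between letters.
inner_only = /\p{Alpha}+(?:[\-']\p{Alpha}+)*/

p text.downcase.scan(default_pattern) # includes "-you"
p text.downcase.scan(inner_only)      # "you" appears twice instead

# Hyphenated words and possessives still count as single tokens.
p "twenty-one Mohamad's".scan(inner_only)
```

A pattern like this could presumably be supplied through the pattern option described above. The trade-off is that tokens which intentionally start or end with a hyphen or apostrophe lose that character.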

A note on case sensitivity

The program will normalise (downcase) all incoming strings for consistency and filters.

Roadmap

Ability to open URLs

def self.from_url
  # open url and send string here after removing html
end

Contributors

See contributors.

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

Author: abitdodgy
Source code: https://github.com/abitdodgy/words_counted
License: MIT license

#ruby  #ruby-on-rails 
