What is SubQuery Network?

SubQuery’s aim is to help Polkadot/Substrate projects build better dApps by allowing anyone to reliably find and consume data faster. Our service initially allows users to extract, transform, persist, and query data, and will later let them connect and present that data as well. Our aim is to make this a core piece of infrastructure for the Substrate/Polkadot ecosystem, just as The Graph has become for Ethereum.

SubQuery is here to help you transform and query the world’s data for a Web3 future.

Announcing SubQuery

For the Web 3.0 dream to be realised, it has to be as fast as (if not faster than) centralised networks for the end user.

That’s why we’re incredibly proud to announce SubQuery, an open source project that allows users to run an indexer across their chain to build a dataset that can be queried with GraphQL. This suite of tools includes a command line interface to allow projects to generate their own SubQuery project, defining how the indexer should traverse and aggregate their own network. There’s a SubQuery node package that indexes the network and supports GraphQL queries. With the help of these tools, anyone can create and run queries easily.

SubQuery Delivers Its Open Source SDK Following a Web3 Foundation Grant

The Web3 Foundation Open Grants Program Deliverable

The Web3 Foundation Open Grants Program has enabled us to build SubQuery, an open source project that allows users to run an indexer across their chain to build a dataset that can be queried with GraphQL.

This suite of tools includes @subql/cli, which allows projects to generate their own SubQuery project, defining how the indexer should traverse and aggregate their own network. As part of our proposal, we have provided a basic tutorial that shows users how to use the CLI to index their network, which you can follow here. We’ve also provided more detailed developer documentation for more advanced usage.

Secondly, there’s a SubQuery node package that loads the defined SubQuery project created by the CLI and then indexes the network to a Postgres database. Using Hasura, you can run GraphQL queries right away over indexed tables. With the help of these tools, and the community support material that we’re always improving, anyone can create and run queries easily.
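
To give a sense of what this looks like in practice, below is a minimal sketch of the kind of GraphQL query you could run once a project has been indexed. The entity and field names are placeholders for illustration only, and the exact query shape depends on how the GraphQL layer exposes the indexed tables; a real project’s queryable entities come from its own schema definition.

# Illustrative sketch only: "starterEntities" and its fields are placeholders.
# The exact query shape depends on the project's schema and the GraphQL layer.
query {
  starterEntities(first: 10) {
    nodes {
      id
      field1
    }
  }
}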

You can get started right away by following our example on the SubQuery GitHub repository: https://github.com/OnFinality-io/subql. Additionally, you can find out more by reading our SubQuery docs: https://doc.subquery.network/ or visiting our new website.

We’re incredibly grateful for the support provided by Web3 Foundation to help us carry out this project for the community. Web3 Foundation funds research and development teams building the technology stack of the decentralized web. It was established in Zug, Switzerland by Ethereum co-founder and former chief technology officer Dr. Gavin Wood. Polkadot is the Foundation’s flagship project.

Why Now?

We saw Polkadot’s potential early and right from the start it felt natural to focus our efforts there. The core premise of Polkadot is to create a thriving community of developers, users, and businesses that will tap into its multichain interoperability — that community is going to need a service that allows them to reliably find and consume data quickly.

Polkadot’s unique architecture means that we can focus on one network and then be able to support multiple current and future chains with ease. Even though Polkadot is still under development, we will be there ready to help the next generation of blockchain developers create the next big dApp.

SubQuery Explorer

Since announcing SubQuery to the world just a month ago, the response and feedback we’ve received from the Polkadot community have been overwhelmingly positive. We’ve gained thousands of followers on our channels, and there have been over 1,353 installs from npm. It’s inspiring to receive all the messages of support and to see the engagement from the community. It’s time for us to give something back!

We’ve been working overtime over the last month to release the next major stage of our roadmap for SubQuery. Today we’re announcing the release of the SubQuery Explorer.

SubQuery Explorer is an online hosted service that provides access to published SubQuery projects made by contributors around the world and managed by the SubQuery team. It furthers our mission to support Polkadot developers with infrastructure services by making Polkadot network data even easier to access.

Today, anyone can query and extract Polkadot network data in only minutes and at no cost.

The SubQuery Explorer makes getting started easy. We’ve prebuilt SubQuery projects for two use cases (more about these below) and have indexed each network. We’re hosting these SubQuery nodes online and allowing anyone to query each of them for free. These managed nodes will be monitored and run by the SubQuery team at a performance level that allows production apps to use and rely on them.

You’ll also note that the SubQuery Explorer provides a playground for discovering available data with example queries. You can play around with each SubQuery Graph using this explorer without writing any code. Additionally, we’ve made some small improvements to our documentation to better support developers on their journey to query and analyse the world’s Polkadot data.

Launch Use Cases

Staking Sum SubQuery Project

GitHub project code: https://github.com/subquery/subql-examples/tree/main/sum-reward

You can quickly find the total staking rewards awarded to any account since the beginning of time by querying its address. This SubQuery project indexes the accounts participating in staking bonds on the chain; as indexing continues, it picks up the staking rewards and slashes each account receives and aggregates their sums into a database.
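
For illustration, a query against this project might look something like the sketch below. The entity and field names are our assumptions based on the description above, not the project’s confirmed schema; check the Explorer playground for the actual fields.

# Hypothetical query: entity and field names are illustrative only.
query {
  sumRewards(first: 5, orderBy: TOTAL_REWARD_DESC) {
    nodes {
      id            # the account address
      totalReward   # aggregated staking rewards for this account
      totalSlash    # aggregated slashes for this account
    }
  }
}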

Validator Threshold SubQuery Project

GitHub project code: https://github.com/subquery/subql-examples/tree/main/validator-threshold

You can quickly see the minimum staking amount required for a validator to be elected. This project is an excellent example of implementing state queries in the mapping function. It first finds the active staking era through a state query and records the amount staked with each validator in the current session. It then calculates the minimum staking amount and the total amount staked in this era. Lastly, it records the maximum number of nominators that can be rewarded.
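
A query against this project could look roughly like the following sketch, again with assumed entity and field names rather than the project’s confirmed schema.

# Hypothetical query: entity and field names are illustrative only.
query {
  validatorThresholds(first: 1, orderBy: ERA_DESC) {
    nodes {
      era
      minStake        # minimum stake required for a validator to be elected
      totalStake      # total amount staked in this era
      maxNominators   # maximum number of nominators that can be rewarded
    }
  }
}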

SubQuery Integrates Acala to Aggregate and Serve DeFi Data to Polkadot and Kusama Builders

SubQuery is a data aggregation layer that operates between the layer-1 blockchain (Acala) and the dApp layer. The solution aggregates and organizes data from Acala and other blockchains, serving up well-structured data for developers to use in a wide array of projects. This service allows dApp developers to focus on their core use case and front end, without wasting time on building a custom backend for data processing.

Acala is a firm believer in, and a long-term builder for, the multi-chain future: reducing liquidity fragmentation, increasing composability, and making DeFi accessible to everyone. Acala is a specialized blockchain focused on decentralized finance (DeFi), and has created multiple DeFi primitives, becoming a DeFi hub and infrastructure layer serving the Polkadot and Kusama ecosystems. The team has built products including a multi-collateralized stablecoin (aUSD, the Acala Dollar), an automated market maker (AMM) DEX, and a tokenized staked asset called Liquid DOT (LDOT), and has implemented a bring-your-own-gas feature that allows gas fees to be paid in any supported asset, such as stablecoins. Acala’s parachain plans to play the role of a DeFi hub for Polkadot and a landing pad that aggregates assets and liquidity from a variety of blockchains.

Today, when you access the SubQuery Explorer, you’ll be welcomed with a new Acala SubQuery Project. This SubQuery project dynamically tracks all the extrinsic data created on Acala and can quickly show derived aggregate stats for the following:

  • Historical data on liquidity changes (broken down by provider)
  • Historical data on all cross-chain asset swaps
  • Transfer history

You can play around with the Acala SubQuery Graph using the Explorer without implementing anything in code. Additionally, we’ve documented the types that you can specify in each GraphQL request when analysing Acala’s data.

A future with a live Acala SubQuery Graph

Below is a simple example of how a user can quickly and easily see the previous five transfer events of the ACA token on the Acala Mandala network. You can see here that we use simple GraphQL to sort and retrieve this data for any client. dApps can use this to monitor loan positions and jump on auctions to help liquidate collateral.
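
The example itself is not reproduced here, so the sketch below shows what such a query could look like. The transfers entity and its fields are assumptions for illustration, not the Acala project’s confirmed schema.

# Hypothetical query: fetch the five most recent ACA transfers.
# Entity and field names are illustrative only.
query {
  transfers(
    first: 5
    orderBy: TIMESTAMP_DESC
    filter: { tokenId: { equalTo: "ACA" } }
  ) {
    nodes {
      fromId
      toId
      amount
      timestamp
    }
  }
}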

A slightly more complex example follows, where we inspect a single account and retrieve all token swap events made by it. A portfolio DApp might use this data to create an overview of the holder’s account and token performance, revenue from staking, liquidity provisioning, and expenses on borrowing.
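
That second example is likewise not shown here, so a hedged sketch follows; the swaps entity, its filter, and the placeholder account address are assumptions for illustration only.

# Hypothetical query: all token swap events made by a single account.
# Entity, field names, and the account placeholder are illustrative only.
query {
  swaps(filter: { accountId: { equalTo: "ACCOUNT_ADDRESS" } }) {
    nodes {
      fromTokenId
      toTokenId
      fromAmount
      toAmount
      timestamp
    }
  }
}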

SubQuery — Transform and Query the World’s Data for a Web3 future.

SubQuery Explorer is an online hosted service that provides access to published SubQuery projects made by contributors around the world and managed by the SubQuery team. Its mission is to ease access to Polkadot network data by providing infrastructure services to help developers achieve more.

SubQuery allows every Substrate/Polkadot team to process and query their data. The project is inspired by the growth of data protocols serving the application layer and its aim is to help Polkadot/Substrate projects build better dApps by allowing anyone to reliably find and consume data faster. Today, anyone can query and extract Polkadot network data in only minutes and at no cost.

Learn More

Start querying data on the new SubQuery Explorer

Build your own SubQuery project by following our SubQuery docs

Visit our website

Talk to us on:




WordsCounted: A Ruby Natural Language Processor

WordsCounted

We are all in the gutter, but some of us are looking at the stars.

-- Oscar Wilde

WordsCounted is a Ruby NLP (natural language processor). WordsCounted lets you implement powerful tokenisation strategies with a very flexible tokeniser class.

Are you using WordsCounted to do something interesting? Please tell me about it.

 

Demo

Visit this website for one example of what you can do with WordsCounted.

Features

  • Out of the box, get the following data from any string, readable file, or URL:
    • Token count and unique token count
    • Token densities, frequencies, and lengths
    • Char count and average chars per token
    • The longest tokens and their lengths
    • The most frequent tokens and their frequencies.
  • A flexible way to exclude tokens from the tokeniser. You can pass a string, regexp, symbol, lambda, or an array of any combination of those types for powerful tokenisation strategies.
  • Pass your own regexp rules to the tokeniser if you prefer. The default regexp filters special characters but keeps hyphens and apostrophes. It also plays nicely with diacritics (UTF and unicode characters): Bayrūt is treated as ["Bayrūt"] and not ["Bayr", "ū", "t"], for example.
  • Opens and reads files. Pass in a file path or a url instead of a string.

Installation

Add this line to your application's Gemfile:

gem 'words_counted'

And then execute:

$ bundle

Or install it yourself as:

$ gem install words_counted

Usage

Pass in a string or a file path, and an optional filter and/or regexp.

counter = WordsCounted.count(
  "We are all in the gutter, but some of us are looking at the stars."
)

# Using a file
counter = WordsCounted.from_file("path/or/url/to/my/file.txt")

.count and .from_file are convenience methods that take an input, tokenise it, and return an instance of WordsCounted::Counter initialized with the tokens. The WordsCounted::Tokeniser and WordsCounted::Counter classes can be used alone, however.

API

WordsCounted

WordsCounted.count(input, options = {})

Tokenises input and initializes a WordsCounted::Counter object with the resulting tokens.

counter = WordsCounted.count("Hello Beirut!")

Accepts two options: exclude and regexp. See Excluding tokens from the analyser and Passing in a custom regexp respectively.

WordsCounted.from_file(path, options = {})

Reads and tokenises a file, and initializes a WordsCounted::Counter object with the resulting tokens.

counter = WordsCounted.from_file("hello_beirut.txt")

Accepts the same options as .count.

Tokeniser

The tokeniser allows you to tokenise text in a variety of ways. You can pass in your own rules for tokenisation, and apply a powerful filter with any combination of rules as long as they can boil down into a lambda.

Out of the box the tokeniser includes only alpha chars. Hyphenated tokens and tokens with apostrophes are considered a single token.

#tokenise([pattern: TOKEN_REGEXP, exclude: nil])

tokeniser = WordsCounted::Tokeniser.new("Hello Beirut!").tokenise

# With `exclude`
tokeniser = WordsCounted::Tokeniser.new("Hello Beirut!").tokenise(exclude: "hello")

# With `pattern`
tokeniser = WordsCounted::Tokeniser.new("I <3 Beirut!").tokenise(pattern: /[a-z]/i)

See Excluding tokens from the analyser and Passing in a custom regexp for more information.

Counter

The WordsCounted::Counter class allows you to collect various statistics from an array of tokens.

#token_count

Returns the token count of a given string.

counter.token_count #=> 15

#token_frequency

Returns a sorted (unstable) two-dimensional array where each element is a token and its frequency. The array is sorted by frequency in descending order.

counter.token_frequency

[
  ["the", 2],
  ["are", 2],
  ["we",  1],
  # ...
  ["all", 1]
]

#most_frequent_tokens

Returns a hash where each key-value pair is a token and its frequency.

counter.most_frequent_tokens

{ "are" => 2, "the" => 2 }

#token_lengths

Returns a sorted (unstable) two-dimensional array where each element contains a token and its length. The array is sorted by length in descending order.

counter.token_lengths

[
  ["looking", 7],
  ["gutter",  6],
  ["stars",   5],
  # ...
  ["in",      2]
]

#longest_tokens

Returns a hash where each key-value pair is a token and its length.

counter.longest_tokens

{ "looking" => 7 }

#token_density([ precision: 2 ])

Returns a sorted (unstable) two-dimensional array where each element contains a token and its density as a float, rounded to a precision of two. The array is sorted by density in descending order. It accepts a precision argument, which must be a float.

counter.token_density

[
  ["are",     0.13],
  ["the",     0.13],
  ["but",     0.07 ],
  # ...
  ["we",      0.07 ]
]

#char_count

Returns the char count of tokens.

counter.char_count #=> 76

#average_chars_per_token([ precision: 2 ])

Returns the average char count per token rounded to two decimal places. Accepts a precision argument which defaults to two. Precision must be a float.

counter.average_chars_per_token #=> 4

#uniq_token_count

Returns the number of unique tokens.

counter.uniq_token_count #=> 13

Excluding tokens from the tokeniser

You can exclude anything you want from the input by passing the exclude option. The exclude option accepts a variety of filters and is extremely flexible.

  1. A space-delimited string. The filter will normalise the string.
  2. A regular expression.
  3. A lambda.
  4. A symbol that names a predicate method. For example :odd?.
  5. An array of any combination of the above.

tokeniser =
  WordsCounted::Tokeniser.new(
    "Magnificent! That was magnificent, Trevor."
  )

# Using a string
tokeniser.tokenise(exclude: "was magnificent")
# => ["that", "trevor"]

# Using a regular expression
tokeniser.tokenise(exclude: /trevor/)
# => ["magnificent", "that", "was", "magnificent"]

# Using a lambda
tokeniser.tokenise(exclude: ->(t) { t.length < 4 })
# => ["magnificent", "that", "magnificent", "trevor"]

# Using symbol
tokeniser = WordsCounted::Tokeniser.new("Hello! محمد")
tokeniser.tokenise(exclude: :ascii_only?)
# => ["محمد"]

# Using an array
tokeniser = WordsCounted::Tokeniser.new(
  "Hello! اسماءنا هي محمد، كارولينا، سامي، وداني"
)
tokeniser.tokenise(
  exclude: [:ascii_only?, /محمد/, ->(t) { t.length > 6}, "و"]
)
# => ["هي", "سامي", "وداني"]

Passing in a custom regexp

The default regexp accounts for letters, hyphenated tokens, and apostrophes. This means twenty-one is treated as one token. So is Mohamad's.

/[\p{Alpha}\-']+/

You can pass your own criteria as a Ruby regular expression to split your string as desired.

For example, if you wanted to include numbers, you can override the regular expression:

counter = WordsCounted.count("Numbers 1, 2, and 3", pattern: /[\p{Alnum}\-']+/)
counter.tokens
#=> ["numbers", "1", "2", "and", "3"]

Opening and reading files

Use the from_file method to open files. from_file accepts the same options as .count. The file path can be a URL.

counter = WordsCounted.from_file("url/or/path/to/file.text")

Gotchas

A hyphen used in lieu of an em or en dash will form part of the token. This affects the tokeniser algorithm.

counter = WordsCounted.count("How do you do?-you are well, I see.")
counter.token_frequency

[
  ["do",   2],
  ["how",  1],
  ["you",  1],
  ["-you", 1], # WTF, mate!
  ["are",  1],
  # ...
]

In this example -you and you are separate tokens. Also, the tokeniser does not include numbers by default. Remember that you can pass your own regular expression if the default behaviour does not fit your needs.

A note on case sensitivity

The program will normalise (downcase) all incoming strings for consistency and filtering.

Roadmap

Ability to open URLs

def self.from_url
  # open url and send string here after removing html
end

Contributors

See contributors.

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

Author: abitdodgy
Source code: https://github.com/abitdodgy/words_counted
License: MIT license


