A tensor-based search and analytics engine that seamlessly integrates with applications and websites. Marqo allows developers to turbocharge search functionality with the latest machine learning models, in 3 lines of code.
⚡ Performance
🤖 Machine Learning
☁️ Cloud-native
🌌 End-to-end
🍱 Managed cloud
📗 Quick start | Build your first application with Marqo in under 5 minutes. |
🔍 What is tensor search? | A beginner's guide to the fundamentals of Marqo and tensor search. |
🖼 Marqo for image data | Building text-to-image search in Marqo in 5 lines of code. |
📚 Marqo for text | Building a multilingual database in Marqo. |
🔮 Integrating Marqo with GPT | Making GPT a subject matter expert by using Marqo as a knowledge base. |
🎨 Marqo for Creative AI | Combining stable diffusion with semantic search to generate and categorise 100k images of hotdogs. |
🦾 Features | Marqo's core features. |
Marqo requires Docker. To install Docker, go to the Docker official website. Ensure that Docker has at least 8GB of memory and 50GB of storage.
Use Docker to run Marqo (Mac users with M-series chips should follow the M-series Mac instructions further below instead):
docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:latest
pip install marqo
import marqo
mq = marqo.Client(url='http://localhost:8882')
mq.index("my-first-index").add_documents([
{
"Title": "The Travels of Marco Polo",
"Description": "A 13th-century travelogue describing Polo's travels"
},
{
"Title": "Extravehicular Mobility Unit (EMU)",
"Description": "The EMU is a spacesuit that provides environmental protection, "
"mobility, life support, and communications for astronauts",
"_id": "article_591"
}]
)
results = mq.index("my-first-index").search(
q="What is the best outfit to wear on the moon?", searchable_attributes=["Title", "Description"]
)
mq is the client that wraps the Marqo API.
add_documents() takes a list of documents, represented as Python dicts, for indexing.
add_documents() creates an index with default settings if one does not already exist.
You can optionally set a document's ID with the special _id field. Otherwise, Marqo will generate one.
Let's have a look at the results:
# let's print out the results:
import pprint
pprint.pprint(results)
{
'hits': [
{
'Title': 'Extravehicular Mobility Unit (EMU)',
'Description': 'The EMU is a spacesuit that provides environmental protection, mobility, life support, and '
'communications for astronauts',
'_highlights': {
'Description': 'The EMU is a spacesuit that provides environmental protection, '
'mobility, life support, and communications for astronauts'
},
'_id': 'article_591',
'_score': 0.61938936
},
{
'Title': 'The Travels of Marco Polo',
'Description': "A 13th-century travelogue describing Polo's travels",
'_highlights': {'Title': 'The Travels of Marco Polo'},
'_id': 'e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a',
'_score': 0.60237324
}
],
'limit': 10,
'processingTimeMs': 49,
'query': 'What is the best outfit to wear on the moon?'
}
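Because the response is a plain Python dict (as the pprint output above shows), post-processing the hits needs nothing beyond ordinary dict access. The sketch below works on a hand-copied, trimmed version of that output so it runs without a Marqo server; the helper `summarize_hits` is hypothetical and not part of the Marqo client.

```python
# Trimmed, hand-copied version of the `results` dict printed above,
# so this sketch runs without a Marqo server.
sample_results = {
    "hits": [
        {
            "Title": "Extravehicular Mobility Unit (EMU)",
            "_highlights": {"Description": "The EMU is a spacesuit that provides environmental protection, "
                                           "mobility, life support, and communications for astronauts"},
            "_id": "article_591",
            "_score": 0.61938936,
        },
        {
            "Title": "The Travels of Marco Polo",
            "_highlights": {"Title": "The Travels of Marco Polo"},
            "_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a",
            "_score": 0.60237324,
        },
    ],
    "limit": 10,
    "query": "What is the best outfit to wear on the moon?",
}


def summarize_hits(results):
    """Hypothetical helper: return (_id, _score, highlighted field names) for each hit."""
    return [
        (hit["_id"], hit["_score"], sorted(hit.get("_highlights", {})))
        for hit in results["hits"]
    ]


for doc_id, score, fields in summarize_hits(sample_results):
    print(f"{doc_id}: score={score:.3f}, matched on {fields}")
```

In the sample output the hits arrive with scores in descending order, so the first tuple is the best match.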
limit is the maximum number of hits to be returned. This can be set as a parameter during search.
Each hit has a _highlights field. This is the part of the document that best matched the query.
Retrieve a document by ID.
result = mq.index("my-first-index").get_document(document_id="article_591")
Note that adding a document with add_documents again using the same _id will cause that document to be updated.
Get information about an index.
results = mq.index("my-first-index").get_stats()
Perform a keyword search.
result = mq.index("my-first-index").search('marco polo', search_method=marqo.SearchMethods.LEXICAL)
Perform a search using the default tensor search method.
result = mq.index("my-first-index").search('adventure', searchable_attributes=['Title'])
Delete documents.
results = mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])
Delete an index.
results = mq.index("my-first-index").delete()
To power image and text search, Marqo allows users to plug and play with CLIP models from HuggingFace. Note that if you do not configure multimodal search, image URLs will be treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below:
settings = {
"treat_urls_and_pointers_as_images":True, # allows us to find an image file and index it
"model":"ViT-L/14"
}
response = mq.create_index("my-multimodal-index", **settings)
Images can then be added within documents as follows. You can use URLs from the internet (for example S3) or paths on the disk of the machine:
response = mq.index("my-multimodal-index").add_documents([{
"My Image": "https://upload.wikimedia.org/wikipedia/commons/thumb/f/f2/Portrait_Hippopotamus_in_the_water.jpg/440px-Portrait_Hippopotamus_in_the_water.jpg",
"Description": "The hippopotamus, also called the common hippopotamus or river hippopotamus, is a large semiaquatic mammal native to sub-Saharan Africa",
"_id": "hippo-facts"
}])
You can then search using text as usual. Both text and image fields will be searched:
results = mq.index("my-multimodal-index").search('animal')
Setting searchable_attributes to the image field ['My Image'] ensures only images are searched in this index:
results = mq.index("my-multimodal-index").search('animal', searchable_attributes=['My Image'])
Searching using an image can be achieved by providing the image link.
results = mq.index("my-multimodal-index").search('https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Standing_Hippopotamus_MET_DP248993.jpg/440px-Standing_Hippopotamus_MET_DP248993.jpg')
The full documentation for Marqo can be found at https://docs.marqo.ai/.
Note that you should not run other applications on Marqo's OpenSearch cluster, as Marqo automatically changes and adapts the settings on the cluster.
Marqo does not yet support the docker-in-docker backend configuration for the arm64 architecture. This means that if you have an M-series Mac, you will also need to run Marqo's backend, marqo-os, locally.
To run Marqo on an M-series Mac, follow the next steps.
In one terminal run the following command to start opensearch:
docker rm -f marqo-os; docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" marqoai/marqo-os:0.0.3-arm
In another terminal run the following command to launch Marqo:
docker rm -f marqo; docker run --name marqo --privileged \
-p 8882:8882 --add-host host.docker.internal:host-gateway \
-e "OPENSEARCH_URL=https://localhost:9200" \
marqoai/marqo:latest
Marqo is a community project with the goal of making tensor search accessible to the wider developer community. We are glad that you are interested in helping out! Please read this to get started.
Create a virtual environment: python -m venv ./venv
Activate the virtual environment: source ./venv/bin/activate
Install requirements from the requirements file: pip install -r requirements.txt
Run tests by running the tox file: cd into this directory and then run tox.
If you update dependencies, make sure to delete the .tox directory and rerun.
Run the full test suite (by using the command tox in this directory).
Create a pull request with an attached github issue.
Author: marqo-ai
Source Code: https://github.com/marqo-ai/marqo
License: Apache-2.0 license
A block-based API for NSValueTransformer, with a growing collection of useful examples.
NSValueTransformer, while perhaps obscure to most iOS programmers, remains a staple of OS X development. Before Objective-C APIs got in the habit of flinging block parameters hither and thither with reckless abandon, NSValueTransformer was the go-to way to encapsulate mutation functionality, especially when it came to Bindings.
NSValueTransformer is convenient to use but a pain to set up. To create a value transformer you have to create a subclass, implement a handful of required methods, and register a singleton instance by name.
TransformerKit breathes new life into NSValueTransformer by making value transformers dead-simple to define and register:
NSString * const TTTCapitalizedStringTransformerName = @"TTTCapitalizedStringTransformerName";
[NSValueTransformer registerValueTransformerWithName:TTTCapitalizedStringTransformerName
transformedValueClass:[NSString class]
returningTransformedValueWithBlock:^id(id value) {
return [value capitalizedString];
}];
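The underlying idea, registering a transformation function under a name and looking it up by that name later, can be sketched in Python for readers less familiar with Objective-C. This is only an analogy: the registry and the functions `register_value_transformer` and `value_transformer_for_name` are hypothetical, not TransformerKit or Foundation APIs, and `str.title()` only approximates `capitalizedString`.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping transformer names to transformation functions.
_registry: Dict[str, Callable[[Any], Any]] = {}


def register_value_transformer(name: str, block: Callable[[Any], Any]) -> None:
    """Register `block` under `name`, replacing any existing entry."""
    _registry[name] = block


def value_transformer_for_name(name: str) -> Callable[[Any], Any]:
    """Look up a previously registered transformation (raises KeyError if missing)."""
    return _registry[name]


CAPITALIZED = "TTTCapitalizedStringTransformerName"
register_value_transformer(CAPITALIZED, lambda value: value.title())

print(value_transformer_for_name(CAPITALIZED)("hello world"))
```

The block-based style means the caller supplies only the function; the boilerplate of subclassing and singleton registration lives in one place.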
TransformerKit pairs nicely with InflectorKit and FormatterKit, providing well-designed APIs for manipulating user-facing content.
TransformerKit also contains a growing number of convenient transformers that your apps will love and cherish:
Mattt (@mattt)
Author: Mattt
Source Code: https://github.com/mattt/TransformerKit
License: MIT license
In this tutorial, you’ll learn how to transform the character case of a string — to uppercase, lowercase, and title case — using native JavaScript methods.
JavaScript provides many functions and methods that allow you to manipulate data for different purposes. We’ve recently looked at methods for converting a string to a number and a number to a string or to an ordinal, and for splitting strings. This article will present methods for transforming the character case of a string — which is useful for representing strings in a certain format or for reliable string comparison.
If you need your string in lowercase, you can use the toLowerCase() method available on strings. This method returns the string with all its characters in lowercase.
For example:
const str = 'HeLlO';
console.log(str.toLowerCase()); // "hello"
console.log(str); // "HeLlO"
By using the toLowerCase() method on the str variable, you can retrieve the same string with all the characters in lowercase. Notice that a new string is returned without affecting the value of str.
If you need your string in uppercase, you can use the toUpperCase() method available on strings. This method returns the string with all its characters in uppercase.
For example:
const str = 'HeLlO';
console.log(str.toUpperCase()); // "HELLO"
console.log(str); // "HeLlO"
By using the toUpperCase() method on the str variable, you can retrieve the same string with all the characters in uppercase. Notice that a new string is returned without affecting the value of str.
The most common use case for transforming a string’s case is transforming it to title case. This can be used to display names and headlines.
There are different ways to do this. One way is by using the toUpperCase() method on the first character of the string, then concatenating it to the rest of the string. For example:
const str = 'hello';
console.log(str[0].toUpperCase() + str.substring(1).toLowerCase()); // "Hello"
In this example, you retrieve the first character using the 0 index on the str variable. Then, you transform it to uppercase using the toUpperCase() method. Finally, you retrieve the rest of the string using the substring() method and concatenate it to the first letter. You apply toLowerCase() to the rest of the string to ensure that it’s in lowercase.
This only transforms the first letter of the string to uppercase. However, if you have a sentence, you might want to capitalize the first letter of every word in it. In that case, it’s better to use a function like this:
function toTitleCase (str) {
  if (!str) {
    return '';
  }
  const strArr = str.split(' ').map((word) => {
    // Leave empty strings (produced by consecutive spaces) unchanged
    if (!word) {
      return word;
    }
    return word[0].toUpperCase() + word.substring(1).toLowerCase();
  });
  return strArr.join(' ');
}
const str = 'hello world';
console.log(toTitleCase(str)); // "Hello World"
The toTitleCase() function accepts one parameter, which is the string to transform to title case.
In the function, you first check whether the string is empty (or missing) and, in that case, return an empty string.
Then, you split the string on the space delimiter, which returns an array. After that, you use the map method on the array to apply the transformation you saw in the previous example on each item in the array. This transforms every word to title case.
Finally, you join the items in the array into a string by the same space delimiter and return it.
In many situations, you’ll need to compare strings before executing a block of code. If you can’t control the character case the string is being written in, performing comparison on the string without enforcing any character case can lead to unexpected results.
For example:
const input = document.querySelector('input[type=text]');
if (input.value === 'yes') {
alert('Thank you for agreeing!');
} else {
alert('We still like you anyway');
}
If the user enters Yes instead of yes, the equality condition will fail and the wrong alert will show.
You can resolve this by enforcing a character case on the string:
const input = document.querySelector('input[type=text]');
if (input.value.toLowerCase() === 'yes') {
alert('Thank you for agreeing!');
} else {
alert('We still like you anyway');
}
It’s necessary to learn how to transform the character case of a string in JavaScript. You’ll often need to use it for many use cases, such as displaying the string in a certain format. You can also use it to reliably compare strings.
Enforcing a character case on the strings you’re comparing ensures that you can check whether the content of the strings is equal, regardless of how they’re written.
Original article source at: https://www.sitepoint.com/
Transformation is one of the RDD operations in Spark. Before moving on, let’s first discuss what Spark and RDDs actually are.
Apache Spark is an open-source cluster computing framework. It is designed to process large datasets quickly, including data generated in real time.
Hadoop MapReduce was the foundation upon which Spark was developed. Unlike Hadoop MapReduce, which writes and reads intermediate data to and from disk, Spark was optimized to run in memory. As a result, Spark processes data far more quickly than disk-based alternatives.
The fundamental abstraction of Spark is the RDD (Resilient Distributed Dataset). It is a collection of elements partitioned across the nodes of the cluster so that they can be operated on in parallel.
RDDs can be produced in one of two ways: by parallelizing an existing collection in the driver program, or by referencing a dataset in an external storage system, such as a text file or HDFS.
RDDs support two types of operations: transformations and actions.
A transformation is a function that generates a new RDD from existing RDDs, whereas an action is what we perform when we want to work with the actual dataset. Unlike a transformation, an action does not form a new RDD; it returns a result to the driver program (or writes it to storage).
The role of a transformation in Spark is to create a new dataset from an existing one. Transformations are lazy: they are not carried out right away, but are computed only when an action requires a result to be returned to the driver program. Two of the most commonly used transformations are map() and filter().
The resulting RDD is always distinct from the parent RDD after the transformation. It could be smaller (for example, filter(), distinct(), sample()), bigger (flatMap(), union(), cartesian()), or the same size (for example, map()).
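The lazy behaviour, where transformations are only recorded and an action triggers the actual work, can be illustrated with a toy, single-machine Python class. `LazyDataset` is hypothetical and is not how Spark is implemented (there are no partitions, no cluster, and no lineage-based recovery); it only mimics the deferred execution.

```python
from typing import Callable, Iterable, List


class LazyDataset:
    """Toy stand-in for an RDD: transformations are recorded, actions execute them."""

    def __init__(self, source: Iterable, steps=None):
        self._source = list(source)
        self._steps: List[Callable[[Iterable], Iterable]] = list(steps or [])

    # Transformations: return a new dataset and run nothing yet.
    def map(self, f):
        return LazyDataset(self._source, self._steps + [lambda it: (f(x) for x in it)])

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + [lambda it: (x for x in it if pred(x))])

    def flat_map(self, f):
        return LazyDataset(self._source, self._steps + [lambda it: (y for x in it for y in f(x))])

    # Actions: run the recorded steps and return a result.
    def collect(self):
        data = iter(self._source)
        for step in self._steps:
            data = step(data)
        return list(data)

    def count(self):
        return len(self.collect())


calls = []


def shout(word):
    calls.append(word)  # records when the map function actually runs
    return word.upper()


ds = LazyDataset(["spark is fast", "rdds are lazy"]).flat_map(str.split).map(shout)
print(calls)         # [] : defining the pipeline ran nothing
print(ds.collect())  # the action triggers flat_map and map
```

Calling collect() a second time runs shout again because nothing is cached; Spark likewise recomputes an RDD for each action unless it has been persisted.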
In this section, I will explain a few RDD transformations using a word count example in Scala. Before we start, let’s create an RDD by reading a text file. The text file used here is a dummy dataset; you can use any dataset here.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

val spark:SparkSession = SparkSession.builder()
.master("local[3]")
.appName("SparkByExamples.com")
.getOrCreate()
val sc = spark.sparkContext
val rdd:RDD[String] = sc.textFile("src/main/scala/test.txt")
The flatMap() transformation applies a function to each record and flattens the results into a new RDD. The example below splits each record by space and then flattens the result, so each entry in the resulting RDD contains a single word.
val rdd2 = rdd.flatMap(f=>f.split(" "))
Any complex operation, such as adding or updating a column, is applied using the map() transformation, and the output of map() always has the same number of records as the input.
In our word count example, we are creating a new column and assigning a value of 1 to each word. The result is a pair RDD of key-value pairs, with the keys being words of type String and the values being 1 of type Int; pair RDDs gain extra operations such as reduceByKey() through PairRDDFunctions. I’ve declared the type of the rdd3 variable for your understanding.
val rdd3:RDD[(String,Int)]= rdd2.map(m=>(m,1))
The records in an RDD can be filtered with the filter() transformation. In our illustration, we keep only the words that begin with “a”.
val rdd4 = rdd3.filter(a=> a._1.startsWith("a"))
reduceByKey() merges the values for each key using the supplied function. In our example, the function sums the values, so the resulting RDD contains each unique word paired with the number of times it occurs.
val rdd5 = rdd3.reduceByKey(_ + _)
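To see what each step of the Scala pipeline produces without a Spark installation, the same word count can be mimicked with plain Python lists. This is a single-machine sketch: the sample lines stand in for test.txt and are made up, and `reduce_by_key` is a hypothetical helper, not a Spark API.

```python
# Made-up stand-in for the lines of src/main/scala/test.txt
lines = ["apache spark is fast", "an apple a day", "spark and scala"]

# flatMap: split every line on spaces and flatten into one list of words
rdd2 = [word for line in lines for word in line.split(" ")]

# map: pair each word with the count 1 (same number of records as rdd2)
rdd3 = [(word, 1) for word in rdd2]

# filter: keep only the pairs whose word starts with "a"
rdd4 = [pair for pair in rdd3 if pair[0].startswith("a")]


def reduce_by_key(pairs, func):
    """Merge the values of each key with `func`, like reduceByKey."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


# reduceByKey(_ + _): sum the 1s so each unique word appears once with its count
rdd5 = reduce_by_key(rdd3, lambda a, b: a + b)
print(rdd4)
print(rdd5)
```

Here the output keeps first-appearance order only because Python dicts preserve insertion order; Spark makes no such ordering guarantee for reduceByKey().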
We can obtain the elements from both RDDs in the new RDD using the union() function. The two RDDs must be of the same type in order for this function to work.
For instance, if RDD1’s elements are Spark, Spark, Hadoop, and Flink, and RDD2’s elements are Big data, Spark, and Flink, the resulting rdd1.union(rdd2) will have the following elements: Spark, Spark, Hadoop, Flink, Big data, Spark, and Flink. Duplicates are kept.
val rdd6 = rdd5.union(rdd3)
With the intersection() function, we get only the elements common to both RDDs in a new RDD, without duplicates. The key rule of this function is that the two RDDs should be of the same type.
val rdd7 = rdd5.intersection(rdd3) // both are RDD[(String, Int)], as intersection() requires
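The difference between union() (keeps duplicates) and intersection() (common elements only, deduplicated) can be checked with the element lists from the example above. This plain-Python sketch only models which elements end up in the result; the helper names are hypothetical, and Spark does not guarantee element order.

```python
from collections import Counter

rdd1 = ["Spark", "Spark", "Hadoop", "Flink"]
rdd2 = ["Big data", "Spark", "Flink"]


def union(a, b):
    """Like RDD.union: concatenate both inputs, duplicates included."""
    return a + b


def intersection(a, b):
    """Like RDD.intersection: elements present in both inputs, without duplicates."""
    return sorted(set(a) & set(b))


print(Counter(union(rdd1, rdd2)))
print(intersection(rdd1, rdd2))
```

The union holds all seven elements (Spark three times), while the intersection holds just Flink and Spark once each.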
In this Spark RDD Transformations blog, you have learned different transformation functions and their usage with Scala examples. In the next blog, we will learn about actions.
Happy Learning !!
Original article source at: https://blog.knoldus.com/
Curvelet.jl - The 2D Curvelet Transform
The curvelet transform is a fairly recent image processing technique that is able to easily approximate curves present in images. This package is an implementation of the “Uniform Discrete Curvelet Transform” as described in “Uniform Discrete Curvelet Transform” by Truong T. Nguyen and Hervé Chauris.
Basic usage is as follows:
include("src/Curvelet.jl") # load the module from source (older Julia versions used require)
x = rand(128,128)
X = Curvelet.curveletTransform(x)
y = Curvelet.inverseCurveletTransform(X)
Currently this transform works only for a simple class of inputs: square images with dimensions that are powers of two in length and at least 16x16.
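Because the transform only accepts square inputs whose side is a power of two and at least 16, it can be worth validating an image's shape before calling it. The sketch below is written in Python rather than Julia, and the function name `is_valid_curvelet_shape` is hypothetical; it only encodes the constraint stated above.

```python
def is_valid_curvelet_shape(rows: int, cols: int) -> bool:
    """True if (rows, cols) is square, a power of two per side, and at least 16x16."""
    if rows != cols or rows < 16:
        return False
    # A positive power of two has exactly one bit set.
    return rows & (rows - 1) == 0


for shape in [(128, 128), (16, 16), (8, 8), (100, 100), (128, 64)]:
    print(shape, is_valid_curvelet_shape(*shape))
```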
Author: Fundamental
Source Code: https://github.com/fundamental/Curvelet.jl
License: View license
ShareDB is a realtime database backend based on Operational Transformation (OT) of JSON documents. It is the realtime backend for the DerbyJS web application framework.
For help, questions, discussion and announcements, join the ShareJS mailing list or read the documentation.
Please report any bugs you find to the issue tracker.
The documentation is stored as Markdown files, but sometimes it can be useful to run these locally. The docs are served using Jekyll, and require Ruby >2.4.0 and Bundler:
gem install jekyll bundler
The docs can be built locally and served with live reload:
npm run docs:install
npm run docs:start
The hosted documentation is available at https://share.github.io/sharedb/.
Author: Share
Source Code: https://github.com/share/sharedb
License: View license
CoordinateTransformations
CoordinateTransformations is a Julia package to manage simple or complex networks of coordinate system transformations. Transformations can be easily applied, inverted, composed, and differentiated (both with respect to the input coordinates and with respect to transformation parameters such as rotation angle). Transformations are designed to be light-weight and efficient enough for, e.g., real-time graphical applications, while support for both explicit and automatic differentiation makes it easy to perform optimization and therefore ideal for computer vision applications such as SLAM (simultaneous localization and mapping).
The package provides two main pieces of functionality:
Primarily, an interface for defining Transformations and applying (by calling), inverting (inv()), composing (∘ or compose()) and differentiating (transform_deriv() and transform_deriv_params()) them.
A small set of built-in, composable, primitive transformations for transforming 2D and 3D points (optionally leveraging the StaticArrays and Rotations packages).
Let's translate a 3D point:
using CoordinateTransformations, Rotations, StaticArrays
x = SVector(1.0, 2.0, 3.0) # SVector is provided by StaticArrays.jl
trans = Translation(3.5, 1.5, 0.0)
y = trans(x)
We can either apply different transformations in turn,
rot = LinearMap(RotX(0.3)) # Rotate 0.3 radians about X-axis, from Rotations.jl
z = trans(rot(x))
or build a composed transformation using the ∘ operator (accessible at the REPL by typing \circ then tab):
composed = trans ∘ rot # alternatively, use compose(trans, rot)
composed(x) == z
A composition of a Translation and a LinearMap results in an AffineMap.
We can invert the transformation:
composed_inv = inv(composed)
composed_inv(z) == x
For any transformation, we can shift the origin to a new point using recenter:
rot_around_x = recenter(rot, x)
Now rot_around_x is a rotation around the point x = SVector(1.0, 2.0, 3.0).
Finally, we can construct a matrix describing how the components of z differentiate with respect to the components of x:
∂z_∂x = transform_deriv(composed, x) # In general, the transform may be non-linear, and thus we require the value of x to compute the derivative
Or perhaps we want to know how y will change with respect to changes to the translation parameters:
∂y_∂θ = transform_deriv_params(trans, x)
Transformations are derived from Transformation. As an example, we have Translation{T} <: Transformation. A Translation will accept and translate points in a variety of formats, such as Vector or SVector, but in general your custom-defined Transformations could transform any Julia object.
Transformations can be reversed using inv(trans). They can be chained together using the ∘ operator (trans1 ∘ trans2) or the compose function (compose(trans1, trans2)). In this case, trans2 is applied first to the data, before trans1. Composition may be intelligent, for instance by precomputing a new Translation by summing the elements of two existing Translations, and yet other transformations may compose to the IdentityTransformation. But by default, composition will result in a ComposedTransformation object which simply dispatches to apply the transformations in the correct order.
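The ordering rule (in trans1 ∘ trans2, trans2 runs first) and the shortcut of fusing two translations can be illustrated with a small Python sketch. The classes and `compose` here are hypothetical stand-ins written for this example, not the package's Julia types; points are plain tuples.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Translation:
    offset: Point

    def __call__(self, x: Point) -> Point:
        return tuple(xi + di for xi, di in zip(x, self.offset))


@dataclass(frozen=True)
class Composed:
    outer: Callable[[Point], Point]
    inner: Callable[[Point], Point]

    def __call__(self, x: Point) -> Point:
        # The right-hand (inner) transformation is applied first.
        return self.outer(self.inner(x))


def compose(t1, t2):
    """Mimics `t1 ∘ t2`: fuse two translations, otherwise build a generic composition."""
    if isinstance(t1, Translation) and isinstance(t2, Translation):
        return Translation(tuple(a + b for a, b in zip(t1.offset, t2.offset)))
    return Composed(t1, t2)


def scale2(x: Point) -> Point:
    """A simple linear map: scale every coordinate by 2."""
    return tuple(2 * xi for xi in x)


t = Translation((1.0, 0.0))
print(compose(t, scale2)((1.0, 1.0)))       # scale first, then translate
print(compose(scale2, t)((1.0, 1.0)))       # translate first, then scale
print(compose(t, Translation((0.0, 2.0))))  # two translations fuse into one
```

Swapping the operands gives different points, which is why the package documents the application order explicitly.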
Finally, the matrix describing how differentials propagate through a transform can be calculated with the transform_deriv(trans, x) method. The derivative of the output with respect to the transformation parameters is accessed via transform_deriv_params(trans, x). Users currently have to overload these methods, as no fall-back automatic differentiation is currently included. Alternatively, all the built-in types and transformations are compatible with automatic differentiation techniques, and can be parameterized by DualNumbers' DualNumber or ForwardDiff's Dual.
A small number of 2D and 3D coordinate systems and transformations are included. We also have IdentityTransformation and ComposedTransformation, which allow us to nest together arbitrary transformations to create a complex yet efficient transformation chain.
The package accepts any AbstractVector type for Cartesian coordinates (as well as FixedSizeArrays types in Julia v0.4 only). For speed, we recommend using a statically-sized container such as SVector{N} from StaticArrays.
We do provide a few specialist coordinate types. The Polar(r, θ) type is a 2D polar representation of a point, and similarly in 3D we have defined Spherical(r, θ, ϕ) and Cylindrical(r, θ, z).
Two-dimensional coordinates may be converted using these parameterless (singleton) transformations:
PolarFromCartesian()
CartesianFromPolar()
Three-dimensional coordinates may be converted using these parameterless transformations:
SphericalFromCartesian()
CartesianFromSpherical()
SphericalFromCylindrical()
CylindricalFromSpherical()
CartesianFromCylindrical()
CylindricalFromCartesian()
However, you may find it simpler to use the convenience constructors like Polar(SVector(1.0, 2.0)).
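The 2D conversions above follow the usual polar formulas, r = sqrt(x² + y²) and θ = atan2(y, x). A Python sketch of the round trip, assuming that convention; the function names are hypothetical and only mirror the roles of PolarFromCartesian and CartesianFromPolar.

```python
import math
from typing import Tuple


def polar_from_cartesian(x: float, y: float) -> Tuple[float, float]:
    """Return (r, theta), with theta in radians measured from the +x axis."""
    return math.hypot(x, y), math.atan2(y, x)


def cartesian_from_polar(r: float, theta: float) -> Tuple[float, float]:
    """Inverse of polar_from_cartesian."""
    return r * math.cos(theta), r * math.sin(theta)


r, theta = polar_from_cartesian(1.0, 2.0)
x, y = cartesian_from_polar(r, theta)
print(r, theta, (x, y))
```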
Translations can be applied to Cartesian coordinates in arbitrary dimensions, e.g. by Translation(Δx, Δy) or Translation(Δx, Δy, Δz) in 2D/3D, or by Translation(Δv) in general (with Δv an AbstractVector). Compositions of two Translations will intelligently create a new Translation by adding the translation vectors.
Linear transformations (a.k.a. linear maps), including rotations, can be encapsulated in the LinearMap type, which is a simple wrapper of an AbstractMatrix.
You are able to provide any matrix of your choosing, but your choice of type will have a large effect on speed. For instance, if you know the dimensionality of your points (e.g. 2D or 3D) you might consider a statically sized matrix like SMatrix from StaticArrays.jl. We recommend performing 3D rotations using those from Rotations.jl for their speed and flexibility. Scaling will be efficient with Julia's built-in UniformScaling. Also note that compositions of two LinearMaps will intelligently create a new LinearMap by multiplying the transformation matrices.
An affine map encapsulates a more general set of transformations, defined by the composition of a translation and a linear transformation. An AffineMap is constructed from an AbstractVector translation v and an AbstractMatrix linear transformation M. It will perform the mapping x -> M*x + v, but the order of addition and multiplication will be more obvious (and controllable) if you construct it from a composition of a linear map and a translation, e.g. Translation(v) ∘ LinearMap(M) (or any combination of LinearMap, Translation and AffineMap).
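Since an affine map performs x -> M*x + v, its inverse is x -> M⁻¹(x - v), which is again affine, with matrix M⁻¹ and translation -M⁻¹v. A small 2D check in plain Python without NumPy; the helper names and the rotation example are assumptions for illustration, not the package's implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vec = Tuple[float, float]


def affine(M: Matrix, v: Vec):
    """Return the map x -> M*x + v for a 2x2 matrix M."""
    def apply(x: Vec) -> Vec:
        return (M[0][0] * x[0] + M[0][1] * x[1] + v[0],
                M[1][0] * x[0] + M[1][1] * x[1] + v[1])
    return apply


def inv2(M: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix (raises ZeroDivisionError if M is singular)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def inverse_affine(M: Matrix, v: Vec):
    """x -> M^-1 * (x - v), written as the affine map with matrix M^-1 and offset -M^-1 v."""
    Mi = inv2(M)
    w = affine(Mi, (0.0, 0.0))(v)  # w = M^-1 v
    return affine(Mi, (-w[0], -w[1]))


c, s = math.cos(0.3), math.sin(0.3)
M = [[c, -s], [s, c]]  # rotation by 0.3 rad
v = (3.5, 1.5)
f, f_inv = affine(M, v), inverse_affine(M, v)
p = (1.0, 2.0)
print(f(p), f_inv(f(p)))  # the round trip returns approximately (1.0, 2.0)
```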
The perspective transformation maps real-space coordinates to those on a virtual "screen" with one fewer dimension. For instance, this process is used to render 3D scenes to 2D images in computer generated graphics and games. It is an ideal model of how a pinhole camera operates and is a good approximation of the modern photography process.
The PerspectiveMap() command creates a Transformation to perform the projective mapping. It can be applied individually, but is particularly powerful when composed with an AffineMap containing the position and orientation of the camera in your scene. For example, to transfer points in 3D space to 2D screen_points giving their projected locations on a virtual camera image, you might use the following code:
cam_transform = PerspectiveMap() ∘ inv(AffineMap(cam_rotation, cam_position))
screen_points = map(cam_transform, points)
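Conceptually, a pinhole projection divides by depth: a camera-frame point (x, y, z) lands at (x/z, y/z) on the virtual screen. The Python sketch below assumes that convention and a camera looking along +z; `perspective` and `project_all` are hypothetical stand-ins for PerspectiveMap, they ignore the camera pose and the intrinsic scaling, and, unlike the package may do, they reject points at or behind the camera.

```python
from typing import Iterable, List, Tuple

Point3 = Tuple[float, float, float]


def perspective(p: Point3) -> Tuple[float, float]:
    """Project a camera-frame 3D point onto the z = 1 image plane."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point is at or behind the camera")
    return x / z, y / z


def project_all(points: Iterable[Point3]) -> List[Tuple[float, float]]:
    """Analogue of `map(cam_transform, points)` with an identity camera pose."""
    return [perspective(p) for p in points]


print(project_all([(1.0, 2.0, 2.0), (1.0, 2.0, 4.0)]))
```

The farther of the two points lands closer to the image center, which is the foreshortening a pinhole camera produces.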
There is also a cameramap() convenience function that can create a composed transformation that includes the intrinsic scaling (e.g. focal length and pixel size) and offset (defining which pixel is labeled (0,0)) of an imaging system.
Author: JuliaGeometry
Source Code: https://github.com/JuliaGeometry/CoordinateTransformations.jl
License: View license
⚠️ This gem is now in passive maintenance mode.
Making HTML emails comfortable for the Ruby rockstars
Roadie tries to make sending HTML emails a little less painful by inlining stylesheets and rewriting relative URLs for you inside your emails.
Email clients have bad support for stylesheets, and some of them block stylesheets from downloading. The easiest way to handle this is to work with inline styles (style="..."), but that is error-prone and hard to work with, as you cannot use classes or reuse styling across your HTML.
This gem makes this easier by automatically inlining stylesheets into the document. You give Roadie your CSS, or let it find it by itself from the <link> and <style> tags in the markup, and it will go through all of the selectors, assigning the styles to the matching elements. Careful attention has been put into selectors being applied in the correct order, so it should behave just like in the browser.
"Dynamic" selectors (:hover, :visited, :focus, etc.), or selectors not understood by Nokogiri, will be inlined into a single <style> element for those email clients that support it. This changes specificity a great deal for these rules, so it might not work 100% out of the box. (See more about this below.)
Roadie also rewrites all relative URLs in the email to their absolute counterparts, making images you insert and those referenced in your stylesheets work. No more headaches about how to write the stylesheets while still having them work with emails from your acceptance environments. You can disable this on specific elements using a data-roadie-ignore marker.
Respects !important styles.
Does not overwrite styles from the style attribute of tags.
Keeps :hover, @media { ... } and friends around in a separate <style> element.
Makes link hrefs and img srcs absolute.
Removes data-roadie-ignore markers before finishing the HTML.
Add this gem to your Gemfile as recommended by Rubygems and run bundle install.
gem 'roadie', '~> 4.0'
Your document instance can be configured with several options:
url_options - Dictates how absolute URLs should be built.
keep_uninlinable_css - Set to false to skip CSS that cannot be inlined.
merge_media_queries - Set to false to not group media queries. Some users might prefer to not group rules within media queries because grouping will result in rules getting reordered. For example, the following rules:
@media(max-width: 600px) { .col-6 { display: block; } }
@media(max-width: 400px) { .col-12 { display: inline-block; } }
@media(max-width: 600px) { .col-12 { display: block; } }
will become:
@media(max-width: 600px) { .col-6 { display: block; } .col-12 { display: block; } }
@media(max-width: 400px) { .col-12 { display: inline-block; } }
asset_providers - A list of asset providers that are invoked when CSS files are referenced. See below.
external_asset_providers - A list of asset providers that are invoked when absolute CSS URLs are referenced. See below.
before_transformation - A callback run before transformation starts.
after_transformation - A callback run after transformation is completed.
In order to make URLs absolute you need to first configure the URL options of the document.
html = '... <a href="/about-us">Read more!</a> ...'
document = Roadie::Document.new html
document.url_options = {host: "myapp.com", protocol: "https"}
document.transform
# => "... <a href=\"https://myapp.com/about-us\">Read more!</a> ..."
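The effect of url_options on relative links can be approximated in Python with urllib.parse.urljoin. This is a sketch of the idea only: `absolutize` is a hypothetical helper, it handles a single href value, and it ignores data-roadie-ignore markers and the CSS url() references that Roadie also processes.

```python
from urllib.parse import urljoin


def absolutize(href: str, host: str, protocol: str = "https") -> str:
    """Resolve a (possibly relative) URL against protocol://host/."""
    base = f"{protocol}://{host}/"
    return urljoin(base, href)


print(absolutize("/about-us", host="myapp.com"))
print(absolutize("https://example.org/x", host="myapp.com"))  # absolute URLs are left alone
```

A naive join like this would also turn a placeholder such as |UNSUBSCRIBE_URL| into a path on myapp.com, which is the kind of case the data-roadie-ignore marker exists for.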
The following URLs will be rewritten for you:
a[href] (HTML)
img[src] (HTML)
url() (CSS)
You can disable individual elements by adding a data-roadie-ignore marker on them. CSS will still be inlined on those elements, but URLs will not be rewritten.
<a href="|UNSUBSCRIBE_URL|" data-roadie-ignore>Unsubscribe</a>
By default, style and link elements in the email document's head are processed along with the stylesheets and removed from the head.
You can set a special data-roadie-ignore attribute on style and link tags that you want to ignore (the attribute will be removed, however). This is the place to put things like :hover selectors that you want to keep for email clients that support them.
Style and link elements with media="print" are also ignored.
<head>
<link rel="stylesheet" type="text/css" href="/assets/emails/rock.css"> <!-- Will be inlined with normal providers -->
<link rel="stylesheet" type="text/css" href="http://www.metal.org/metal.css"> <!-- Will be inlined with external providers, *IF* specified; otherwise ignored. -->
<link rel="stylesheet" type="text/css" href="/assets/jazz.css" media="print"> <!-- Will NOT be inlined; print style -->
<link rel="stylesheet" type="text/css" href="/ambient.css" data-roadie-ignore> <!-- Will NOT be inlined; ignored -->
<style></style> <!-- Will be inlined -->
<style data-roadie-ignore></style> <!-- Will NOT be inlined; ignored -->
</head>
Roadie will use the given asset providers to look for the actual CSS that is referenced. If you don't change the default, it will use the Roadie::FilesystemProvider, which looks for stylesheets on the filesystem, relative to the current working directory.
Example:
# /home/user/foo/stylesheets/primary.css
body { color: green; }
# /home/user/foo/script.rb
html = <<-HTML
<html>
<head>
<link rel="stylesheet" type="text/css" href="/stylesheets/primary.css">
</head>
<body>
</body>
</html>
HTML
Dir.pwd # => "/home/user/foo"
document = Roadie::Document.new html
document.transform # =>
# <!DOCTYPE html>
# <html>
# <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head>
# <body style="color:green;"></body>
# </html>
If a referenced stylesheet cannot be found, the #transform method will raise a Roadie::CssNotFound error. If you instead want to ignore missing stylesheets, you can use the NullProvider.
You can write your own providers if you need very specific behavior for your app, or you can use the built-in providers. Providers come in two groups: normal and external. Normal providers handle paths without host information (/style/foo.css) while external providers handle URLs with host information (//example.com/foo.css, localhost:3001/bar.css, and so on).
The default configuration is to not have any external providers configured, which will cause those referenced stylesheets to be ignored. Adding one or more providers for external assets causes all of them to be searched and inlined, so if you only want this to happen to specific stylesheets you need to add ignore markers to every other stylesheet (see above).
Included providers:
FilesystemProvider – Looks for files on the filesystem, relative to the given directory unless otherwise specified.
ProviderList – Wraps a list of other providers and searches them in order. The asset_providers setting is an instance of this. It behaves a lot like an array, so you can push, pop, shift and unshift to it.
NullProvider – Does not actually provide anything; it always finds empty stylesheets. Use this in tests or if you want to ignore stylesheets that cannot be found by your other providers (or if you want to force the other providers to never run).
NetHttpProvider – Downloads stylesheets using Net::HTTP. Can be given a whitelist of hosts to download from.
CachedProvider – Wraps another provider (or ProviderList) and caches responses inside the provided cache store.
PathRewriterProvider – Rewrites the passed path and then passes it on to another provider (or ProviderList).
If you want to search several locations on the filesystem, you can declare that:
document.asset_providers = [
Roadie::FilesystemProvider.new(App.root.join("resources", "stylesheets")),
Roadie::FilesystemProvider.new(App.root.join("system", "uploads", "stylesheets")),
]
NullProvider
If you want to ignore stylesheets that cannot be found instead of crashing, push the NullProvider to the end:
# Don't crash on missing assets
document.asset_providers << Roadie::NullProvider.new
# Don't download assets in tests
document.external_asset_providers.unshift Roadie::NullProvider.new
Note: This will cause the referenced stylesheet to be removed from the source code, so email clients will never see it either.
NetHttpProvider
The NetHttpProvider will download the URLs that it is given using Ruby's standard Net::HTTP library.
You can give it a whitelist of hosts that downloads are allowed from:
document.external_asset_providers << Roadie::NetHttpProvider.new(
whitelist: ["myapp.com", "assets.myapp.com", "cdn.cdnnetwork.co.jp"],
)
document.external_asset_providers << Roadie::NetHttpProvider.new # Allows every host
CachedProvider
You might want to cache the results of providers so that they do not repeat the same work several times. If you are sending several emails quickly from the same process, this might also save a lot of time on parsing the stylesheets if you use in-memory storage such as a hash.
You can wrap any other kind of provider with it, even a ProviderList:
document.external_asset_providers = Roadie::CachedProvider.new(document.external_asset_providers, my_cache)
If you don't pass a cache backend, it will use a normal Hash. The cache store must follow this protocol:
my_cache["key"] = some_stylesheet_instance # => #<Roadie::Stylesheet instance>
my_cache["key"] # => #<Roadie::Stylesheet instance>
my_cache["missing"] # => nil
Warning: The default Hash store will never be cleared, so make sure you don't allow the number of unique asset paths to grow too large in a single run. This is especially important if you run Roadie in a daemon that accepts arbitrary documents, and/or if you use hash digests in your filenames. Making a new instance of CachedProvider will use a new Hash instance.
You can implement your own custom cache store by implementing the [] and []= methods.
class MyRoadieMemcacheStore
  def initialize(memcache)
    @memcache = memcache
  end
  def [](path)
    css = @memcache.read("assets/#{path}/css")
    if css
      name = @memcache.read("assets/#{path}/name") || "cached #{path}"
      Roadie::Stylesheet.new(name, css)
    end
  end
  def []=(path, stylesheet)
    @memcache.write("assets/#{path}/css", stylesheet.to_s)
    @memcache.write("assets/#{path}/name", stylesheet.name)
    stylesheet # You need to return the set Stylesheet
  end
end
document.external_asset_providers = Roadie::CachedProvider.new(
document.external_asset_providers,
MyRoadieMemcacheStore.new(MemcacheClient.instance)
)
If you are using RSpec, you can test your implementation by using the shared examples for the "roadie cache store" role:
require "roadie/rspec"
describe MyRoadieMemcacheStore do
let(:memcache_client) { MemcacheClient.instance }
subject { MyRoadieMemcacheStore.new(memcache_client) }
it_behaves_like "roadie cache store" do
before { memcache_client.clear }
end
end
PathRewriterProvider
With this provider, you can rewrite the paths that are searched in order to more easily support another provider. Examples could include rewriting absolute URLs into something that can be found on the filesystem, or to access internal hosts instead of external ones.
filesystem = Roadie::FilesystemProvider.new("assets")
document.asset_providers << Roadie::PathRewriterProvider.new(filesystem) do |path|
path.sub('stylesheets', 'css').downcase
end
document.external_asset_providers = Roadie::PathRewriterProvider.new(filesystem) do |url|
if url =~ /myapp\.com/
URI.parse(url).path.sub(%r{^/assets}, '')
else
url
end
end
You can also wrap a list, for example to implement external_asset_providers by composing the normal asset_providers:
document.external_asset_providers =
Roadie::PathRewriterProvider.new(document.asset_providers) do |url|
URI.parse(url).path
end
Writing your own provider is also easy. You need to provide:
#find_stylesheet(name), returning either a Roadie::Stylesheet or nil.
#find_stylesheet!(name), returning either a Roadie::Stylesheet or raising Roadie::CssNotFound.
class UserAssetsProvider
def initialize(user_collection)
@user_collection = user_collection
end
def find_stylesheet(name)
if name =~ %r{^/users/(\d+)\.css$}
user = @user_collection.find_user($1)
Roadie::Stylesheet.new("user #{user.id} stylesheet", user.stylesheet)
end
end
def find_stylesheet!(name)
find_stylesheet(name) or
raise Roadie::CssNotFound.new(
css_name: name, message: "does not match a user stylesheet", provider: self
)
end
# Instead of implementing #find_stylesheet!, you could also:
# include Roadie::AssetProvider
# That will give you a default implementation without any error message. If
# you have multiple error cases, it's recommended that you implement
# #find_stylesheet! without #find_stylesheet and raise with an explanatory
# error message.
end
# Try to look for a user stylesheet first, then fall back to normal filesystem lookup.
document.asset_providers = [
UserAssetsProvider.new(app),
Roadie::FilesystemProvider.new('./stylesheets'),
]
You can test for compliance by using the built-in RSpec examples:
require 'spec_helper'
require 'roadie/rspec'
describe MyOwnProvider do
# Will use the default `subject` (MyOwnProvider.new)
it_behaves_like "roadie asset provider", valid_name: "found.css", invalid_name: "does_not_exist.css"
# Extra setup just for these tests:
it_behaves_like "roadie asset provider", valid_name: "found.css", invalid_name: "does_not_exist.css" do
subject { MyOwnProvider.new(...) }
before { stub_dependencies }
end
end
Some CSS is impossible to inline properly. :hover and ::after come to mind. Roadie tries its best to keep these around by injecting them inside a new <style> element in the <head> (or at the beginning of the partial if transforming a partial document).
The problem here is that Roadie cannot possibly adjust the specificity for you, so they will not apply the same way as they did before the styles were inlined.
Another caveat is that a lot of email clients do not support this (which is the entire point of inlining in the first place), so don't put anything important in here. Always handle the case of these selectors not being part of the email.
Inlined styles will have much higher specificity than styles in a <style>. Here's an example:
<style>p:hover { color: blue; }</style>
<p style="color: green;">Hello world</p>
When hovering over this <p>, the color will not change as the color: green rule takes precedence. You can get it to work by adding !important to the :hover rule.
It would be foolish to try to inject !important on every rule automatically, so this is a manual process.
If you'd rather skip this and have the styles that cannot be inlined disappear, you can turn off this feature by setting the keep_uninlinable_css option to false.
document.keep_uninlinable_css = false
Callbacks allow you to do custom work on documents before or after they are transformed. The Nokogiri document tree is passed to the callable along with the Roadie::Document instance:
class TrackNewsletterLinks
  def call(dom, document)
    # Only links that actually have an href can be rewritten
    dom.css("a[href]").each { |link| fix_link(link) }
  end
  def fix_link(link)
    # Append with "&" if the URL already has a query string, otherwise start one with "?"
    divider = (link['href'].include?('?') ? '&' : '?')
    link['href'] = link['href'] + divider + 'source=newsletter'
  end
end
document.before_transformation = ->(dom, document) {
  logger.debug "Inlining document with title #{dom.at_css('head > title')&.text}"
}
document.after_transformation = TrackNewsletterLinks.new
You can configure the underlying HTML/XML engine to output XHTML or HTML (which is the default). One use case for this is that { tokens usually get escaped in HTML mode, which would be a problem if you then pass the resulting HTML on to some other templating engine that uses those tokens (like Handlebars or Mustache).
document.mode = :xhtml
This will also affect the emitted <!DOCTYPE> if transforming a full document. Partial documents do not have a <!DOCTYPE>.
Tested with GitHub CI using:
Let me know if you want any other runtime supported officially.
This project follows Semantic Versioning and has done so since version 1.0.0.
Roadie uses Nokogiri to parse and regenerate the HTML of your email, which means that some unintentional changes might show up.
One example would be that Nokogiri might remove your &nbsp;s in some cases.
Another example is Nokogiri's lack of HTML5 support, so certain new elements might have spaces removed. I recommend you don't use HTML5 in emails anyway because of bad email client support (that includes web mail!).
Roadie uses Nokogiri to parse the HTML of your email, so any C-like problems such as segfaults most likely originate there. The best way to fix this is to first upgrade libxml2 on your system and then reinstall Nokogiri. For instructions on how to do this on most platforms, see Nokogiri's official install guide.
@keyframes?
The CSS Parser used in Roadie does not handle keyframes. I don't think any email clients do either, but if you want to keep on trying you can add them manually to a <style> element (or a separate referenced stylesheet) and tell Roadie not to touch them.
@media queries are reordered, how can I fix this?
Different @media query blocks with the same conditions are merged by default, which will change the order in some cases. You can disable this by setting merge_media_queries to false. (See Install & Usage section above.)
<body> elements that are added?
It sounds like you want to transform a partial document. Maybe you are building partials or template fragments to later place in other documents. Use Document#transform_partial instead of Document#transform in order to treat the HTML as a partial document.
If you add the data-roadie-ignore
attribute on an element, URL rewriting will not be performed on that element. This could be really useful for you if you intend to send the email through some other rendering pipeline that replaces some placeholders/variables.
<a href="/about-us">About us</a>
<a href="|UNSUBSCRIBE_URL|" data-roadie-ignore>Unsubscribe</a>
Note that this will not skip CSS inlining on the element; it will still get the correct styles applied.
If the URL is invalid on purpose, see Can I skip URL rewriting on a specific element? above. Otherwise, you can try to parse it yourself using Ruby's URI class and see if you can figure it out.
require "uri"
URI.parse("https://example.com/best image.jpg") # raises
URI.parse("https://example.com/best%20image.jpg") # Works!
bundle install
rake
Roadie is set up with the assumption that all CSS and HTML passing through it is under your control. It is not recommended to run arbitrary HTML through it with the default settings.
Care has been given to try to secure all file system accesses, but it is never guaranteed that someone cannot access something they should not be able to access.
In order to secure Roadie against file system access, only use your own asset providers that you yourself can secure against your particular environment.
If you have found any security vulnerability, please email me at magnus.bergmark+security@gmail.com to disclose it. For very sensitive issues, please use my public GPG key. You can also encrypt your message with my public key and open an issue if you do not want to email me directly. Thank you.
This gem was previously tied to Rails. It is now framework-agnostic and supports any type of HTML documents. If you want to use it with Rails, check out roadie-rails.
Major contributors to Roadie:
You can see all contributors on GitHub.
(The MIT License)
Copyright (c) 2009-2022 Magnus Bergmark, Jim Neath / Purify, and contributors.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the ‘Software’), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED ‘AS IS’, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Author: Mange
Source code: https://github.com/Mange/roadie
License: MIT license
Using Natural Language Processing, we can make use of the text data available across the internet to generate insights for businesses. To understand this huge amount of data and draw insights from it, we first need to make it usable, and Natural Language Processing helps us do so.
Bag of words is a Natural Language Processing technique of text modelling. In technical terms, we can say that it is a method of feature extraction with text data. This approach is a simple and flexible way of extracting features from documents.
A bag of words is a representation of text that describes the occurrence of words within a document. We just keep track of word counts and disregard the grammatical details and the word order. It is called a “bag” of words because any information about the order or structure of words in the document is discarded. The model is only concerned with whether known words occur in the document, not where in the document.
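To make the "bag" idea concrete, here is a minimal Python sketch that uses the standard library's `collections.Counter` (a multiset) as the bag. The helper name `bag_of_words` and the lower-case, whitespace-only tokenization are assumptions for illustration, not part of any library's BoW API. The point is that two sentences with different word order, and different meaning, produce identical bags.

```python
from collections import Counter

def bag_of_words(sentence):
    """Hypothetical helper: a multiset of lower-cased, whitespace-separated words."""
    return Counter(sentence.lower().split())

bag_a = bag_of_words("The dog bit the man")
bag_b = bag_of_words("The man bit the dog")

# Different meaning, same bag: word order is simply not recorded.
print(bag_a == bag_b)  # True
print(bag_a["the"])    # 2
```

This is also a preview of the technique's main weakness: anything encoded in word order is lost.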
So why bag-of-words? What is wrong with plain text?
One of the biggest problems with text is that it is messy and unstructured, while machine learning algorithms prefer structured, well-defined, fixed-length inputs. The Bag-of-Words technique lets us convert variable-length texts into fixed-length vectors.
Also, at a more granular level, machine learning models work with numerical data rather than textual data. More specifically, the bag-of-words (BoW) technique converts a text into an equivalent vector of numbers.
Let us see an example of how the bag of words technique converts text into vectors:
Sentence 1: “Welcome to Great Learning, Now start learning”
Sentence 2: “Learning is a good practice”
Sentence 1 | Sentence 2 |
Welcome | Learning |
to | is |
Great | a |
Learning | good |
, | practice |
Now | |
start | |
learning | |
Step 1: Go through all the words in the above text and make a list of all of the words in our model vocabulary.
Note that the words ‘Learning’ and ‘learning’ are not the same here because they differ in case, so both appear in the list. Also note that the comma ‘,’ is included in the list.
Because the vocabulary has 12 words, we can use a fixed-length document representation of 12, with one position in the vector to score each word.
The scoring method we use here is to count the occurrences of each word and mark 0 for absence. This count-based scoring is the most commonly used method.
The scoring of sentence 1 would look as follows:
Word | Frequency |
Welcome | 1 |
to | 1 |
Great | 1 |
Learning | 1 |
, | 1 |
Now | 1 |
start | 1 |
learning | 1 |
is | 0 |
a | 0 |
good | 0 |
practice | 0 |
Writing the above frequencies as a vector:
Sentence 1 ➝ [ 1,1,1,1,1,1,1,1,0,0,0,0 ]
Now for sentence 2, the scoring would look like this:
Word | Frequency |
Welcome | 0 |
to | 0 |
Great | 0 |
Learning | 1 |
, | 0 |
Now | 0 |
start | 0 |
learning | 0 |
is | 1 |
a | 1 |
good | 1 |
practice | 1 |
Similarly, writing the above frequencies in vector form:
Sentence 2 ➝ [ 0,0,0,1,0,0,0,0,1,1,1,1 ]
Sentence | Welcome | to | Great | Learning | , | Now | start | learning | is | a | good | practice |
Sentence1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
Sentence2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
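The first example can be reproduced with a short Python sketch. The regular expression used as a tokenizer is an assumption. It was chosen so that case is kept (‘Learning’ and ‘learning’ stay distinct) and the comma becomes a token of its own. `naive_tokens` is a hypothetical helper name. The ‘Learning’ at the start of sentence 2 is counted in the capitalized ‘Learning’ slot (position 4), not in the lowercase ‘learning’ slot.

```python
import re

def naive_tokens(sentence):
    # Keep case and make each punctuation mark its own token,
    # mirroring the first (unpreprocessed) example.
    return re.findall(r"\w+|[^\w\s]", sentence)

sentences = [
    "Welcome to Great Learning, Now start learning",
    "Learning is a good practice",
]
tokenized = [naive_tokens(s) for s in sentences]

# Vocabulary in order of first appearance (dict keys keep insertion order).
vocab = list(dict.fromkeys(tok for toks in tokenized for tok in toks))

# One count per vocabulary position gives every sentence the same length.
vectors = [[toks.count(word) for word in vocab] for toks in tokenized]

print(len(vocab))  # 12
print(vectors[0])  # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(vectors[1])  # [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1]
```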
But is this the best way to build a bag of words? Not really. The words Learning and learning have the same meaning but are counted as two separate words. Also, the comma ‘,’, which does not convey any information, is included in the vocabulary.
Let us make some changes and see how we can use bag of words more effectively.
Sentence 1: “Welcome to Great Learning, Now start learning”
Sentence 2: “Learning is a good practice”
Step 1: Convert the above sentences to lower case, as the case of a word does not hold any useful information here.
Step 2: Remove special characters and stopwords from the text. Stopwords are words that carry little information about the text, such as ‘is’, ‘a’, ‘the’ and many more.
After applying the above steps, the sentences become:
Sentence 1: “welcome great learning now start learning”
Sentence 2: “learning good practice”
Although these sentences no longer read naturally, they still contain most of the information in the originals.
Step 3: Go through all the words in the above text and make a list of all of the words in our model vocabulary.
Now as the vocabulary has only 7 words, we can use a fixed-length document-representation of 7, with one position in the vector to score each word.
The scoring method we use here is the same as in the previous example. For sentence 1, the count of words is as follows:
Word | Frequency |
welcome | 1 |
great | 1 |
learning | 2 |
now | 1 |
start | 1 |
good | 0 |
practice | 0 |
Writing the above frequencies as a vector:
Sentence 1 ➝ [ 1,1,2,1,1,0,0 ]
Now for sentence 2, the scoring would look like this:
Word | Frequency |
welcome | 0 |
great | 0 |
learning | 1 |
now | 0 |
start | 0 |
good | 1 |
practice | 1 |
Similarly, writing the above frequencies in the vector form
Sentence 2 ➝ [ 0,0,1,0,0,1,1 ]
Sentence | welcome | great | learning | now | start | good | practice |
Sentence1 | 1 | 1 | 2 | 1 | 1 | 0 | 0 |
Sentence2 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
The approach used in the second example is the one generally used in the Bag-of-Words technique. Datasets used in machine learning are tremendously large and can have vocabularies of thousands or even millions of words, so preprocessing the text first, which shrinks the vocabulary, is the better way to go.
In the examples above we used every word in the vocabulary to form a vector, which is neither a practical nor the best way to implement the BoW model. In practice, only a subset of the vocabulary, typically the most frequent words, is used to form the vector.
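Here is a minimal sketch of that idea. It assumes the corpus has already been preprocessed as in the second example. It counts every word across the corpus, keeps the `k` most frequent, and then counts only those words in each document. The helper names `top_k_vocab` and `count_vector` are hypothetical. scikit-learn's `CountVectorizer` offers the same idea through its `max_features` parameter.

```python
from collections import Counter

def top_k_vocab(token_lists, k):
    """Keep only the k most frequent words across the whole corpus."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    # Counter.most_common orders equal counts by first appearance.
    return [word for word, _ in counts.most_common(k)]

def count_vector(tokens, vocab):
    """Count each vocabulary word in one tokenized document."""
    return [tokens.count(word) for word in vocab]

corpus = [
    "welcome great learning now start learning".split(),
    "learning good practice".split(),
]
vocab = top_k_vocab(corpus, 3)
print(vocab)                                     # ['learning', 'welcome', 'great']
print([count_vector(t, vocab) for t in corpus])  # [[2, 1, 1], [1, 0, 0]]
```

The vector length is now fixed by `k` rather than by the size of the vocabulary, which is what keeps BoW usable on large corpora.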
In this section, we implement a bag-of-words algorithm in Python. This is a very basic implementation meant to show how the algorithm works, so I would not recommend using it in your project; use the method described in the next section instead.
def vectorize(tokens):
    '''This function takes a list of words in a sentence as input
    and returns a vector the size of filtered_vocab. It puts 0 if a
    vocabulary word is not present in tokens and its count if present.'''
    vector = []
    for w in filtered_vocab:
        vector.append(tokens.count(w))
    return vector
def unique(sequence):
    '''This function returns a list in which the original order is
    kept and no item repeats. Using set() would not preserve the
    original ordering, so it is not used here.'''
    seen = set()
    return [x for x in sequence if not (x in seen or seen.add(x))]
# create a list of stopwords. You can import stopwords from nltk too
stopwords = ["to", "is", "a"]
# list of special characters. You can use regular expressions too
special_char = [",", ":", " ", ";", ".", "?"]
# write the sentences in the corpus; in our case, just two
string1 = "Welcome to Great Learning , Now start learning"
string2 = "Learning is a good practice"
# convert them to lower case
string1 = string1.lower()
string2 = string2.lower()
# split the sentences into tokens
tokens1 = string1.split()
tokens2 = string2.split()
print(tokens1)
print(tokens2)
# create a vocabulary list
vocab = unique(tokens1 + tokens2)
print(vocab)
# filter the vocabulary list
filtered_vocab = []
for w in vocab:
    if w not in stopwords and w not in special_char:
        filtered_vocab.append(w)
print(filtered_vocab)
# convert sentences into vectors
vector1 = vectorize(tokens1)
print(vector1)
vector2 = vectorize(tokens2)
print(vector2)
Output:
We can use the CountVectorizer() class from the scikit-learn library to easily implement the above BoW model in Python.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
sentence_1="This is a good job.I will not miss it for anything"
sentence_2="This is not good at all"
CountVec = CountVectorizer(ngram_range=(1,1), # to use bigrams ngram_range=(2,2)
stop_words='english')
#transform
Count_data = CountVec.fit_transform([sentence_1,sentence_2])
#create dataframe
cv_dataframe=pd.DataFrame(Count_data.toarray(),columns=CountVec.get_feature_names_out())  # get_feature_names() was removed in scikit-learn 1.2
print(cv_dataframe)
Again, the same questions: what are n-grams and why do we use them? Let us understand this with an example:
Sentence 1: “This is a good job. I will not miss it for anything”
Sentence 2: “This is not good at all”
For this example, let us take a vocabulary of only 5 words. The five words are:
So, the respective vectors for these sentences are:
“This is a good job. I will not miss it for anything” = [1,1,1,1,0]
“This is not good at all” = [1,0,0,1,1]
Can you guess what the problem is here? Sentence 2 is a negative sentence and sentence 1 is a positive sentence. Is this reflected in the vectors above in any way? Not at all. So how can we solve this problem? This is where N-grams come to the rescue.
An N-gram is a sequence of N tokens (words): a 2-gram (more commonly called a bigram) is a two-word sequence like “really good”, “not good”, or “your homework”, and a 3-gram (more commonly called a trigram) is a three-word sequence like “not at all” or “turn off light”.
For example, the bigrams in the sentence “This is not good at all” from the previous section are as follows:
“This is”, “is not”, “not good”, “good at”, “at all”
If we use bigrams (a bag of bigrams) instead of single words in the example above, the model can differentiate between sentence 1 and sentence 2, since only sentence 2 contains the bigram “not good”. Bigrams also make tokens more informative: for example, “HSR Layout”, a locality in Bengaluru, is more informative than “HSR” and “layout” on their own.
So a bag-of-bigrams representation can be considerably more powerful than a plain bag of words, and in many cases it proves hard to beat.
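The bigram idea can be sketched in a few lines of Python. The lower-casing and the `[a-z]+` tokenizer are assumptions; they split “job.I” into “job” and “i”. Stop words are deliberately kept, because many common stop-word lists include “not”, and that is the very word that carries the negation here. `ngrams` and `bag_of_ngrams` are hypothetical helper names.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-token sequences of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(sentence, n):
    # Lower-case and keep only alphabetic runs, so "job.I" -> "job", "i".
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(ngrams(tokens, n))

s1 = "This is a good job.I will not miss it for anything"
s2 = "This is not good at all"

print(ngrams(re.findall(r"[a-z]+", s2.lower()), 2))
# ['this is', 'is not', 'not good', 'good at', 'at all']

b1, b2 = bag_of_ngrams(s1, 2), bag_of_ngrams(s2, 2)
print(b1["not good"], b2["not good"])  # 0 1
```

With single words, both sentences contain “good” and “not”. With bigrams, only sentence 2 has “not good”, so the two vectors differ where it matters. In scikit-learn, the same effect comes from the `ngram_range` parameter already used in the code above.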
The scoring method used above takes the count of each word and represents the word in the vector by that count. What does a high count for a word signify?
Does it mean that the word is important for retrieving information about documents? Not necessarily. If a word occurs many times in a document but also in many other documents in the dataset, it may simply be a frequent word rather than a relevant or meaningful one.
One approach is to rescale the frequency of words by how often they appear across all documents, so that scores for words like “the”, which are frequent in every document, are penalized. This approach is called term frequency-inverse document frequency, or TF-IDF for short. TF-IDF is intended to reflect how relevant a term is in a given document. So how is the TF-IDF of a document in a dataset calculated?
TF-IDF for a word in a document is calculated by multiplying two different metrics:
The term frequency (TF) of a word in a document. There are several ways of calculating this frequency, the simplest being a raw count of the times a word appears in a document. There are also ways to adjust the frequency, for example by dividing the raw count of a word by either the length of the document or the raw count of the most frequent word in the document. Using the length of the document, Term Frequency is calculated as
$$\mathrm{TF}(i,j) = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
Where,
$n_{i,j}$ = number of times word $i$ occurs in document $j$
$\sum_k n_{k,j}$ = total number of words in document $j$.
The inverse document frequency (IDF) of the word across a set of documents. This indicates how common or rare a word is in the entire document set. The closer it is to 0, the more common the word. This metric can be calculated by taking the total number of documents, dividing it by the number of documents that contain the word, and taking the logarithm.
So, if the word is very common and appears in many documents, this number will approach 0. Otherwise, it grows larger, reaching the logarithm of the total number of documents for a word that appears in only one document.
Multiplying these two numbers results in the TF-IDF score of a word in a document. The higher the score, the more relevant that word is in that particular document.
To put it in mathematical terms, the IDF of word $i$ is calculated as follows:
$$\mathrm{IDF}(i) = 1 + \log\frac{N}{\mathrm{df}_i}$$
Where
$N$ = total number of documents in the dataset
$\mathrm{df}_i$ = number of documents in which word $i$ occurs
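Also, note that the 1 added in the formula above ensures that terms appearing in every document, whose log ratio is 0, are not suppressed entirely. This is sometimes loosely called IDF smoothing. In scikit-learn, however, smoothing (smooth_idf=True) is a separate step that also adds 1 to both the document count and each document frequency, as if one extra document containing every term had been seen.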
The TF-IDF score of word $i$ in document $j$ is then obtained by
$$\mathrm{TFIDF}(i,j) = \mathrm{TF}(i,j) \times \mathrm{IDF}(i)$$
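The formulas above translate almost line for line into Python. This is a sketch, not a library implementation. It assumes the documents are already tokenized and uses the natural logarithm, since the article does not fix the base. The helper names `tf`, `idf` and `tf_idf` are hypothetical.

```python
import math
from collections import Counter

def tf(tokens):
    """TF(i, j): count of word i in document j divided by the document length."""
    counts = Counter(tokens)
    return {word: c / len(tokens) for word, c in counts.items()}

def idf(docs):
    """IDF(i) = 1 + log(N / df_i), df_i = number of documents containing word i."""
    n_docs = len(docs)
    df = Counter(word for tokens in docs for word in set(tokens))
    return {word: 1 + math.log(n_docs / d) for word, d in df.items()}

def tf_idf(docs):
    """Return one {word: TF * IDF} dict per document."""
    idf_scores = idf(docs)
    return [{w: t * idf_scores[w] for w, t in tf(tokens).items()} for tokens in docs]

docs = [
    "welcome great learning now start learning".split(),
    "learning good practice".split(),
]
scores = tf_idf(docs)
print(round(scores[0]["learning"], 4))  # 0.3333 (in every document, so IDF = 1)
print(round(scores[1]["good"], 4))      # 0.5644 (TF = 1/3, IDF = 1 + ln 2)
```

“learning” appears in both documents, so its log ratio is 0 and only the added 1 keeps it from vanishing. “good” appears in one document and is weighted up accordingly.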
Does this seem too complicated? Don’t worry, this can be attained with just a few lines of code and you don’t even have to remember these scary formulas.
We can use the TfidfVectorizer() class from the scikit-learn library to easily implement the above BoW (TF-IDF) model.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
sentence_1="This is a good job.I will not miss it for anything"
sentence_2="This is not good at all"
#without smooth IDF
print("Without Smoothing:")
#define tf-idf
tf_idf_vec = TfidfVectorizer(use_idf=True,
smooth_idf=False,
ngram_range=(1,1),stop_words='english') # to use only bigrams ngram_range=(2,2)
#transform
tf_idf_data = tf_idf_vec.fit_transform([sentence_1,sentence_2])
#create dataframe
tf_idf_dataframe=pd.DataFrame(tf_idf_data.toarray(),columns=tf_idf_vec.get_feature_names_out())
print(tf_idf_dataframe)
print("\n")
#with smooth
tf_idf_vec_smooth = TfidfVectorizer(use_idf=True,
smooth_idf=True,
ngram_range=(1,1),stop_words='english')
tf_idf_data_smooth = tf_idf_vec_smooth.fit_transform([sentence_1,sentence_2])
print("With Smoothing:")
tf_idf_dataframe_smooth=pd.DataFrame(tf_idf_data_smooth.toarray(),columns=tf_idf_vec_smooth.get_feature_names_out())
print(tf_idf_dataframe_smooth)
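scikit-learn's documentation describes the default TfidfVectorizer weighting in three steps. TF is the raw term count. IDF is ln((1 + n) / (1 + df)) + 1 when smooth_idf=True, or ln(n / df) + 1 when smooth_idf=False. Each row is then scaled to unit length (norm='l2'). The pure-Python sketch below follows that description so the two printed tables can be checked by hand. It skips tokenization and stop-word removal. The function name `sklearn_style_tfidf` and the already-tokenized input documents are hypothetical.

```python
import math
from collections import Counter

def sklearn_style_tfidf(docs, smooth_idf=True):
    """Raw counts x IDF, then L2-normalize each row (mirrors the documented defaults)."""
    vocab = sorted({w for tokens in docs for w in tokens})
    n = len(docs)
    df = Counter(w for tokens in docs for w in set(tokens))
    if smooth_idf:
        idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    else:
        idf = {w: math.log(n / df[w]) + 1 for w in vocab}
    rows = []
    for tokens in docs:
        counts = Counter(tokens)
        row = [counts[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # avoid dividing by zero
        rows.append([x / norm for x in row])
    return vocab, rows

docs = [["good", "job", "miss"], ["good"]]  # illustrative, already tokenized
vocab, rows = sklearn_style_tfidf(docs)
print(vocab)    # ['good', 'job', 'miss']
print(rows[1])  # [1.0, 0.0, 0.0]
```

This also explains why the library's numbers do not match a hand calculation with the length-normalized TF formula given earlier. By default the vectorizer uses raw counts and normalizes each row afterwards.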
Although Bag-of-Words is quite efficient and easy to implement, the technique still has some disadvantages, which are given below:
Original article source at https://www.mygreatlearning.com
ts-transform-json
Inline specific values from a JSON file or the whole JSON blob. For example:
import {version} from 'package.json'
// becomes
var version = '1.0.5'
// OR
import * as packageJson from 'package.json'
// becomes
var packageJson = {"version": "1.0.5", dependencies: {}}
First of all, you need some level of familiarity with the TypeScript Compiler API.
compile.ts & tests should have examples of how this works. The available options are:
Whether you're running this transformer in declaration files (typically specified in afterDeclarations instead of after in the transformer list). This flag will inline types instead of actual values.
Author: longlho
Source Code: https://github.com/longlho/ts-transform-json
License: MIT license
React-Rails
React-Rails is a flexible tool to use React with Rails. The benefits:
Alternatively, get started with Sprockets
Webpacker provides modern JS tooling for Rails. Here are the steps for integrating Webpacker and React-Rails with Rails:
1) Create a new Rails app:
$ rails new my-app
$ cd my-app
2) Add react-rails
to your Gemfile:
gem 'react-rails'
Note: On Rails versions < 6.0, you need to add gem 'webpacker' to your Gemfile in step 2 above.
3) Now run the installers:
Rails 6.x and 5.x:
$ bundle install
$ rails webpacker:install # OR (on rails version < 5.0) rake webpacker:install
$ rails webpacker:install:react # OR (on rails version < 5.0) rake webpacker:install:react
$ rails generate react:install
This gives you:
app/javascript/components/ directory for your React components
ReactRailsUJS setup in app/javascript/packs/application.js
app/javascript/packs/server_rendering.js for server-side rendering
Note: On Rails versions < 6.0, link the JavaScript pack in the Rails view using the javascript_pack_tag helper:
<!-- application.html.erb in Head tag below turbolinks -->
<%= javascript_pack_tag 'application' %>
4) Generate your first component:
$ rails g react:component HelloWorld greeting:string
5) You can also generate your component in a subdirectory:
$ rails g react:component my_subdirectory/HelloWorld greeting:string
Note: Your component is added to app/javascript/components/ by default.
Note: If your component is in a subdirectory you will append the directory path to your erb component call.
Example:
<%= react_component("my_subdirectory/HelloWorld", { greeting: "Hello from react-rails." }) %>
6) Render it in a Rails view:
<!-- erb: paste this in view -->
<%= react_component("HelloWorld", { greeting: "Hello from react-rails." }) %>
7) Let's start the app:
$ rails s
Output: greeting: Hello from react-rails. Inspect the webpage in your browser to see the change in the tag props.
The component name tells react-rails where to load the component. For example:
react_component call | component require |
---|---|
react_component("Item") | require("Item") |
react_component("items/index") | require("items/index") |
react_component("items.Index") | require("items").Index |
react_component("items.Index.Header") | require("items").Index.Header |
This way, you can access top-level, default, or named exports.
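To make the table above concrete, here is a small Python sketch of the lookup rule. It is not react-rails' own code (that lives in the JavaScript react_ujs package): the `require` callable and the dict-based module registry are hypothetical stand-ins for Webpack's require.context, and default-export handling is deliberately omitted.

```python
def get_constructor(class_name, require):
    """Resolve a react_component name the way the table above describes.

    `require` is a hypothetical stand-in for require.context: it takes a
    module path such as "items/index" and returns that module's exports.
    """
    # Only the first dot-separated segment is a module path to require.
    module_path, *property_chain = class_name.split(".")
    component = require(module_path)
    # Each remaining segment is a property of the previous value, so
    # "items.Index.Header" becomes require("items").Index.Header.
    for prop in property_chain:
        component = component[prop]
    return component


# Hypothetical module registry standing in for app/javascript/components/.
modules = {
    "Item": "ItemComponent",
    "items/index": "ItemsIndexComponent",
    "items": {"Index": {"Header": "HeaderComponent"}},
}

print(get_constructor("items.Index.Header", modules.__getitem__))
```

Splitting only on dots is why "items/index" is required as a path while "items.Index" is treated as a named export of "items".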
The require.context inserted into packs/application.js is used to load components. If you want to load components from a different directory, override it by calling ReactRailsUJS.useContext:
var myCustomContext = require.context("custom_components", true)
var ReactRailsUJS = require("react_ujs")
// use `custom_components/` for <%= react_component(...) %> calls
ReactRailsUJS.useContext(myCustomContext)
If require fails to find your component, ReactRailsUJS falls back to the global namespace, described in Use with Asset Pipeline.
React-Rails supports many file extensions, such as .js, .jsx.js, .js.jsx, .es6.js, and .coffee. Sometimes this can cause confusion when matching a react_component call to a file name:
Component File Name | react_component call |
---|---|
app/javascript/components/samplecomponent.js | react_component("samplecomponent") |
app/javascript/components/sample_component.js | react_component("sample_component") |
app/javascript/components/SampleComponent.js | react_component("SampleComponent") |
app/javascript/components/SampleComponent.js.jsx | Has to be renamed to SampleComponent.jsx, then use react_component("SampleComponent") |
If you want to use React-Rails with TypeScript, simply run the installer and add the @types packages:
$ bundle exec rails webpacker:install:typescript
$ yarn add @types/react @types/react-dom
Doing this will allow React-Rails to support the .tsx extension. Additionally, it is recommended to add ts and tsx to the server_renderer_extensions in your application configuration:
config.react.server_renderer_extensions = ["jsx", "js", "tsx", "ts"]
You can use assert_react_component to test that a component renders:
app/views/welcome/index.html.erb
<%= react_component("HelloWorld", { greeting: "Hello from react-rails.", info: { name: "react-rails" } }, { class: "hello-world" }) %>
class WelcomeControllerTest < ActionDispatch::IntegrationTest
test 'assert_react_component' do
get "/welcome"
assert_equal 200, response.status
# assert rendered react component and check the props
assert_react_component "HelloWorld" do |props|
assert_equal "Hello from react-rails.", props[:greeting]
assert_equal "react-rails", props[:info][:name]
assert_select "[class=?]", "hello-world"
end
# or just assert component rendered
assert_react_component "HelloWorld"
end
end
react-rails provides a pre-bundled React.js & a UJS driver to the Rails asset pipeline. Get started by adding the react-rails gem:
gem 'react-rails'
And then install the react generator:
$ rails g react:install
Then restart your development server.
This will:
- add //= require directives to application.js
- add a components/ directory for React components
- add server_rendering.js for server-side rendering

Now, you can create React components in .jsx files:
// app/assets/javascripts/components/post.jsx
window.Post = createReactClass({
render: function() {
return <h1>{this.props.title}</h1>
}
})
// or, equivalent:
class Post extends React.Component {
render() {
return <h1>{this.props.title}</h1>
}
}
Then, you can render those components in views:
<%= react_component("Post", {title: "Hello World"}) %>
Components must be accessible from the top level, but they may be namespaced, for example:
<%= react_component("Comments.NewForm", {post_id: @post.id}) %>
<!-- looks for `window.Comments.NewForm` -->
react-rails uses a transformer class to transform JSX in the asset pipeline. The transformer is initialized once, at boot. You can provide a custom transformer to config.react.jsx_transformer_class. The transformer must implement:
- #initialize(options), where options is the value passed to config.react.jsx_transform_options
- #transform(code_string) to return a string of transformed code

react-rails provides two transformers, React::JSX::BabelTransformer (which uses ruby-babel-transpiler) and React::JSX::JSXTransformer (which uses the deprecated JSXTransformer.js).
To supply additional transform plugins to your JSX Transformer, assign them to config.react.jsx_transform_options.
react-rails uses the Babel version of the babel-source gem.
For example, to use babel-plugin-transform-class-properties:
config.react.jsx_transform_options = {
optional: ['es7.classProperties']
}
//= require react brings React into your project.
By default, React's development version is provided to Rails.env.development. You can override the React build with a config:
# Here are the defaults:
# config/environments/development.rb
MyApp::Application.configure do
config.react.variant = :development
end
# config/environments/production.rb
MyApp::Application.configure do
config.react.variant = :production
end
Be sure to restart your Rails server after changing these files. See VERSIONS.md to learn which version of React.js is included with your react-rails version. In some edge cases you may need to bust the Sprockets cache with rake tmp:clear.
react-rails includes a view helper and an unobtrusive JavaScript driver which work together to put React components on the page.
The view helper (react_component) puts a div on the page with the requested component class & props. For example:
<%= react_component('HelloMessage', name: 'John') %>
<!-- becomes: -->
<div data-react-class="HelloMessage" data-react-props="{&quot;name&quot;:&quot;John&quot;}"></div>
On page load, the react_ujs driver will scan the page and mount components using data-react-class and data-react-props.
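The round trip between the view helper and the UJS driver can be sketched in Python. This is a simplified model, not react-rails code: `render_mount_node` and `MountNodeScanner` are hypothetical names, and the real Ruby helper may escape additional characters. It shows why the props JSON appears with &quot; entities inside the attribute and how a scanner recovers the original props.

```python
import html
import json
from html.parser import HTMLParser


def render_mount_node(component_class, props, tag="div"):
    """Build a mount node like the one react_component emits (simplified)."""
    props_json = json.dumps(props, separators=(",", ":"))
    # html.escape turns the JSON's double quotes into &quot; so the
    # attribute value stays well-formed HTML.
    return (
        f'<{tag} data-react-class="{html.escape(component_class)}" '
        f'data-react-props="{html.escape(props_json)}"></{tag}>'
    )


class MountNodeScanner(HTMLParser):
    """Collect (component class, props) pairs, loosely mimicking react_ujs."""

    def __init__(self):
        super().__init__()
        self.mounts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "data-react-class" in attributes:
            # HTMLParser has already unescaped &quot; back to " here.
            props = json.loads(attributes.get("data-react-props") or "{}")
            self.mounts.append((attributes["data-react-class"], props))


markup = render_mount_node("HelloMessage", {"name": "John"})
scanner = MountNodeScanner()
scanner.feed(markup)
print(markup)
print(scanner.mounts)
```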
The view helper's signature is:
react_component(component_class_name, props={}, html_options={})
- component_class_name is a string which identifies a component. See getConstructor for details.
- props is either:
  - an object that responds to #to_json; or
  - an already-stringified JSON object (see the Jbuilder note below).
- html_options may include:
  - tag: to use an element other than a div to embed data-react-class and data-react-props.
  - prerender: true to render the component on the server.
  - camelize_props to transform a props hash
  - **other: Any other arguments (eg class:, id:) are passed through to content_tag.

react-rails uses a "helper implementation" class to generate the output of the react_component helper. The helper is initialized once per request and used for each react_component call during that request. You can provide a custom helper class to config.react.view_helper_implementation. The class must implement:
- #react_component(name, props = {}, options = {}, &block) to return a string to inject into the Rails view
- #setup(controller_instance), called when the helper is initialized at the start of the request
- #teardown(controller_instance), called at the end of the request

react-rails provides one implementation, React::Rails::ComponentMount.
react-rails's JavaScript is available as "react_ujs" in the asset pipeline or from NPM. It attaches itself to the window as ReactRailsUJS.
Usually, react-rails mounts & unmounts components automatically as described in Event Handling below.
You can also mount & unmount components from <%= react_component(...) %> tags using UJS:
// Mount all components on the page:
ReactRailsUJS.mountComponents()
// Mount components within a selector:
ReactRailsUJS.mountComponents(".my-class")
// Mount components within a specific node:
ReactRailsUJS.mountComponents(specificDOMnode)
// Unmounting works the same way:
ReactRailsUJS.unmountComponents()
ReactRailsUJS.unmountComponents(".my-class")
ReactRailsUJS.unmountComponents(specificDOMnode)
You can use this when the DOM is modified by AJAX calls or modal windows.
ReactRailsUJS checks for various libraries to support their page change events:
- Turbolinks
- pjax
- jQuery

ReactRailsUJS will automatically mount components on <%= react_component(...) %> tags and unmount them when appropriate.
If you need to re-detect events, you can call detectEvents:
// Remove previous event handlers and add new ones:
ReactRailsUJS.detectEvents()
For example, if Turbolinks is loaded after ReactRailsUJS, you'll need to call this again. This function removes previous handlers before adding new ones, so it's safe to call as often as needed.
If Turbolinks is imported via Webpacker (and thus not available globally), ReactRailsUJS will be unable to locate it. To fix this, you can temporarily add it to the global namespace:
// Order is particular. First start Turbolinks:
Turbolinks.start();
// Add Turbolinks to the global namespace:
window.Turbolinks = Turbolinks;
// Remove previous event handlers and add new ones:
ReactRailsUJS.detectEvents();
// (Optional) Clean up global namespace:
delete window.Turbolinks;
getConstructor
Components are loaded with ReactRailsUJS.getConstructor(className). This function has two built-in implementations:
- On the asset pipeline, it looks up className in the global namespace.
- On Webpacker, it requires files and accesses named exports, as described in Get started with Webpacker.

You can override this function to customize the mapping of name-to-constructor. Server-side rendering also uses this function.
You can render React components inside your Rails server with prerender: true:
<%= react_component('HelloMessage', {name: 'John'}, {prerender: true}) %>
<!-- becomes: -->
<div data-react-class="HelloMessage" data-react-props="{&quot;name&quot;:&quot;John&quot;}">
<h1>Hello, John!</h1>
</div>
(It will also be mounted by the UJS on page load.)
Server rendering is powered by ExecJS and subject to some requirements:
- react-rails must load your code. By convention, it uses server_rendering.js, which was created by the install task. This file must include your components and their dependencies (eg, Underscore.js).
- Your code can't reference document or window. Prerender processes don't have access to document or window, so jQuery and some other libs won't work in this environment :(

ExecJS supports many backends. CRuby users will get the best performance from mini_racer.
Server renderers are stored in a pool and reused between requests. Threaded Rubies (eg JRuby) may see a benefit to increasing the pool size beyond the default of 1.
These are the default configurations:
# config/application.rb
# These are the defaults if you don't specify any yourself
module MyApp
class Application < Rails::Application
# Settings for the pool of renderers:
config.react.server_renderer_pool_size ||= 1 # ExecJS doesn't allow more than one on MRI
config.react.server_renderer_timeout ||= 20 # seconds
config.react.server_renderer = React::ServerRendering::BundleRenderer
config.react.server_renderer_options = {
files: ["server_rendering.js"], # files to load for prerendering
replay_console: true, # if true, console.* will be replayed client-side
}
# Changing files matching these dirs/exts will cause the server renderer to reload:
config.react.server_renderer_extensions = ["jsx", "js"]
config.react.server_renderer_directories = ["/app/assets/javascripts", "/app/javascript/"]
end
end
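The pooling behaviour behind server_renderer_pool_size and server_renderer_timeout can be pictured with a minimal Python sketch, assuming a renderer is checked out for each render and returned afterwards. RendererPool and checkout are hypothetical names, not part of react-rails. Because the same renderer object goes back into the pool, any state it accumulated is visible to the next request, which is why the note on stateful backends below matters.

```python
import queue
from contextlib import contextmanager


class RendererPool:
    """A fixed-size pool of renderers that are reused between requests."""

    def __init__(self, factory, size=1, timeout=20.0):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())
        self._timeout = timeout  # seconds to wait for a free renderer

    @contextmanager
    def checkout(self):
        # Blocks until a renderer is free; raises queue.Empty after `timeout`.
        renderer = self._idle.get(timeout=self._timeout)
        try:
            yield renderer
        finally:
            # Return the same (possibly stateful) renderer for later requests.
            self._idle.put(renderer)


pool = RendererPool(factory=object, size=1, timeout=0.05)
with pool.checkout() as renderer:
    print("rendering with", renderer)
```

With a pool size of 1, a second concurrent checkout waits and then times out, which is the situation a larger pool on a threaded Ruby is meant to relieve.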
Some of ExecJS's backends are stateful (eg, mini_racer, therubyracer). This means that any side-effects of a prerender will affect later renders with that renderer.
To manage state, you have a couple of options:
- Make a custom renderer with #before_render / #after_render hooks as described below
- Use per_request_react_rails_prerenderer to manage state for a whole controller action.

To check out a renderer for the duration of a controller action, call the per_request_react_rails_prerenderer helper in the controller class:
class PagesController < ApplicationController
# Use the same React server renderer for the entire request:
per_request_react_rails_prerenderer
end
Then, you can access the ExecJS context directly with react_rails_prerenderer.context:
def show
react_rails_prerenderer # => #<React::ServerRendering::BundleRenderer>
react_rails_prerenderer.context # => #<ExecJS::Context>
# Execute arbitrary JavaScript code
# `self` is the global context
react_rails_prerenderer.context.exec("self.Store.setup()")
render :show
react_rails_prerenderer.context.exec("self.Store.teardown()")
end
react_rails_prerenderer may also be accessed in before- or after-actions.
react-rails depends on a renderer class for rendering components on the server. You can provide a custom renderer class to config.react.server_renderer. The class must implement:
- #initialize(options={}), which accepts the hash from config.react.server_renderer_options
- #render(component_name, props, prerender_options) to return a string of HTML

react-rails provides two renderer classes: React::ServerRendering::ExecJSRenderer and React::ServerRendering::BundleRenderer.
ExecJSRenderer offers two other points for extension:
- #before_render(component_name, props, prerender_options) to return a string of JavaScript to execute before calling React.render
- #after_render(component_name, props, prerender_options) to return a string of JavaScript to execute after calling React.render

Any subclass of ExecJSRenderer may use those hooks (for example, BundleRenderer uses them to handle console.* on the server).
Components can also be server-rendered directly from a controller action with the custom component renderer. For example:
class TodoController < ApplicationController
def index
@todos = Todo.all
render component: 'TodoList', props: { todos: @todos }, tag: 'span', class: 'todo'
end
end
You can also provide the "usual" render arguments: content_type, layout, location and status. By default, your current layout will be used and the component, rather than a view, will be rendered in place of yield. Custom data-* attributes can be passed like data: {remote: true}.
Prerendering is set to true by default, but can be turned off with prerender: false.
You can generate a new component file with:
rails g react:component ComponentName prop1:type prop2:type ...
For example,
rails g react:component Post title:string published:bool published_by:instanceOf{Person}
would generate:
var Post = createReactClass({
propTypes: {
title: PropTypes.string,
published: PropTypes.bool,
publishedBy: PropTypes.instanceOf(Person)
},
render: function() {
return (
<React.Fragment>
Title: {this.props.title}
Published: {this.props.published}
Published By: {this.props.publishedBy}
</React.Fragment>
);
}
});
The generator also accepts options:
- --es6: use class ComponentName extends React.Component
- --coffee: use CoffeeScript

Accepted PropTypes are:
- any, array, bool, element, func, number, object, node, shape, string
- instanceOf takes an optional class name in the form of instanceOf{className}.
- oneOf behaves like an enum, and takes an optional list of strings in the form of 'name:oneOf{one,two,three}'.
- oneOfType takes an optional list of react and custom types in the form of 'model:oneOfType{string,number,OtherType}'.

Note that the arguments for oneOf and oneOfType must be enclosed in single quotes to prevent your terminal from expanding them into an argument list.
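The name:type argument syntax can be illustrated with a hypothetical Python parser. It is not the generator's Ruby code, and the exact JavaScript the real generator emits may differ in quoting or spacing; it only shows how each argument breaks down into a camel-cased prop name and a PropTypes expression. For simplicity, this sketch requires the braces for instanceOf, oneOf and oneOfType even though the generator treats the list as optional.

```python
import re

SIMPLE_TYPES = {"any", "array", "bool", "element", "func",
                "number", "object", "node", "shape", "string"}


def camelize_lower(name):
    """published_by -> publishedBy, as in the generated propTypes above."""
    first, *rest = name.split("_")
    return first + "".join(part[:1].upper() + part[1:] for part in rest)


def parse_prop_arg(arg):
    """Turn an argument like 'status:oneOf{a,b}' into (prop name, PropTypes expr)."""
    name, _, spec = arg.partition(":")
    match = re.fullmatch(r"(\w+)(?:\{([^}]*)\})?", spec)
    if match is None:
        raise ValueError(f"cannot parse prop type {spec!r}")
    kind, inner = match.groups()
    if kind in SIMPLE_TYPES and inner is None:
        expr = f"PropTypes.{kind}"
    elif kind == "instanceOf" and inner:
        expr = f"PropTypes.instanceOf({inner})"
    elif kind == "oneOf" and inner:
        # Enum-like: each comma-separated value becomes a quoted string.
        expr = "PropTypes.oneOf([" + ", ".join(repr(v) for v in inner.split(",")) + "])"
    elif kind == "oneOfType" and inner:
        # Built-in names map to PropTypes.*, anything else is a custom type.
        types = (f"PropTypes.{t}" if t in SIMPLE_TYPES else t for t in inner.split(","))
        expr = "PropTypes.oneOfType([" + ", ".join(types) + "])"
    else:
        raise ValueError(f"unsupported prop type {spec!r}")
    return camelize_lower(name), expr


for arg in ["title:string", "published_by:instanceOf{Person}",
            "status:oneOf{one,two,three}", "model:oneOfType{string,number,OtherType}"]:
    print(parse_prop_arg(arg))
```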
If you use Jbuilder to pass a JSON string to react_component, make sure your JSON is a stringified hash, not an array. This is not the Rails default; you should add the root node yourself. For example:
# BAD: returns a stringified array
json.array!(@messages) do |message|
json.extract! message, :id, :name
json.url message_url(message, format: :json)
end
# GOOD: returns a stringified hash
json.messages(@messages) do |message|
json.extract! message, :id, :name
json.url message_url(message, format: :json)
end
You can configure the camelize_props option:
MyApp::Application.configure do
config.react.camelize_props = true # default false
end
Now, Ruby hashes given to react_component(...) as props will have their keys transformed from underscore- to camel-case, for example:
{ all_todos: @todos, current_status: @status }
# becomes:
{ "allTodos" => @todos, "currentStatus" => @status }
You can also specify this option in react_component:
<%= react_component('HelloMessage', {name: 'John'}, {camelize_props: true}) %>
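The key transformation can be sketched in Python to show what a camelized props hash looks like. This mirrors the all_todos -> allTodos example above but is not the gem's Ruby implementation; the recursion into nested hashes and arrays is an assumption of this sketch, and Rails' own camelize also handles acronyms, which this version does not.

```python
def camelize_key(key):
    """all_todos -> allTodos: lower-camel-case an underscored key."""
    first, *rest = str(key).split("_")
    return first + "".join(part[:1].upper() + part[1:] for part in rest)


def camelize_props(value):
    """Camelize dict keys; recursing into nested dicts and lists is assumed."""
    if isinstance(value, dict):
        return {camelize_key(k): camelize_props(v) for k, v in value.items()}
    if isinstance(value, list):
        return [camelize_props(item) for item in value]
    return value


props = {"all_todos": [{"due_date": "2024-01-01"}], "current_status": "open"}
print(camelize_props(props))
```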
Keep your react_ujs package up to date with yarn upgrade.
React-Rails 2.4.x uses React 16+, which no longer has React Addons. Therefore, the pre-bundled version of React no longer has an addons version. If you still need addons, the 2.3.1+ version of the gem still has them.
If you need to make changes in your components for the pre-bundled React, see the migration docs here:
For the vast majority of cases this will get you most of the migration:
- replace React.Prop with Prop
- add import PropTypes from 'prop-types' (Webpacker only)
- re-run bundle exec rails webpacker:install:react to update npm packages (Webpacker only)

public/packs/manifest.json. Possible causes:
1. You want to set webpacker.yml value of compile to true for your environment
unless you are using the `webpack -w` or the webpack-dev-server.
2. webpack has not yet re-run to reflect updates.
3. You have misconfigured Webpacker's config/webpacker.yml file.
4. Your webpack configuration is not creating a manifest.
or
yarn: error: no such option: --dev
ERROR: [Errno 2] No such file or directory: 'add'
Fix: Ubuntu's cmdtest package provides a conflicting yarn command. Remove it and install Yarn from the official repository:
sudo apt remove cmdtest
sudo apt remove yarn
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
sudo apt-get update && sudo apt-get install yarn
yarn install
ExecJS::ProgramError (identifier 'Set' undefined):
(execjs):1
If you see any variation of this issue, see Using TheRubyRacer
TheRubyRacer hasn't updated LibV8 (which packages V8, the JavaScript engine that powers Node.js) from v3 in 2 years, so newer JavaScript features are unlikely to work.
LibV8 itself is already beyond version 7, so many server-side issues are caused by old JS engines and can be fixed by using an up-to-date one, such as MiniRacer, or TheRubyRhino on JRuby.
Hot Module Replacement is possible with this gem, as it simply passes through to Webpacker. Please open an issue to let us know tips and tricks for it to add to the wiki.
Sample repo that shows HMR working with react-rails: https://github.com/edelgado/react-rails-hmr
One caveat is that currently you cannot Server-Side Render along with HMR.
🎉 Thanks for taking the time to contribute! 🎉
With 5 Million+ downloads of the react-rails Gem and another 2 Million+ downloads of react_ujs on NPM, you're helping the biggest React + Rails community!
By contributing to React-Rails, you agree to abide by the code of conduct.
You can always help by submitting patches or triaging issues, even offering reproduction steps to issues is incredibly helpful!
Please see our Contribution guide for more info.
A source code example utilizing React-Rails: https://github.com/BookOfGreg/react-rails-example-app
Author: Reactjs
Source Code: https://github.com/reactjs/react-rails
License: Apache-2.0 license
1626598080
This tutorial presents line detection with the Hough Transform.
(1) Start with a grayscale input image (a binary image works as well) in which the line or edge pixels have been assigned a value of 255 (or 1 for a binary image), then create a 2D Hough accumulator array.
(2) Loop through the input image to fill the Hough Accumulator array.
(3) Finally, display the original input image and Hough Transform result.
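Steps (1) and (2) can be sketched in plain Python using the standard (ρ, θ) line parameterisation ρ = x cos θ + y sin θ: every edge pixel votes for all the lines that could pass through it, and real lines show up as accumulator cells with many votes. This is an illustrative version, not the linked source code; it assumes background pixels are 0, uses 1° steps for θ in [0, π), and leaves step (3), the display, out.

```python
import math


def hough_accumulator(image, num_thetas=180):
    """Fill a 2D (rho, theta) Hough accumulator from an edge image.

    `image` is a list of rows; background pixels are assumed to be 0 and
    line/edge pixels non-zero (255 for grayscale, 1 for binary).
    """
    height, width = len(image), len(image[0])
    max_rho = math.ceil(math.hypot(height, width))
    thetas = [math.pi * i / num_thetas for i in range(num_thetas)]  # [0, pi)
    # Rows index rho in [-max_rho, max_rho] (offset by max_rho); columns index theta.
    accumulator = [[0] * num_thetas for _ in range(2 * max_rho + 1)]
    for y in range(height):
        for x in range(width):
            if image[y][x] == 0:
                continue
            # Every line through (x, y) satisfies rho = x cos(theta) + y sin(theta),
            # so this pixel casts one vote per theta.
            for t, theta in enumerate(thetas):
                rho = round(x * math.cos(theta) + y * math.sin(theta))
                accumulator[rho + max_rho][t] += 1
    return accumulator, thetas, max_rho


def strongest_line(accumulator, thetas, max_rho):
    """Return (rho, theta, votes) for the accumulator cell with the most votes."""
    votes, r, t = max(
        (count, r, t)
        for r, row in enumerate(accumulator)
        for t, count in enumerate(row)
    )
    return r - max_rho, thetas[t], votes


# A 10x10 image containing a vertical line at x = 3.
image = [[255 if x == 3 else 0 for x in range(10)] for _ in range(10)]
acc, thetas, max_rho = hough_accumulator(image)
print(strongest_line(acc, thetas, max_rho))
```

Because ρ is rounded to whole pixels, a few neighbouring θ values near 0 also collect all 10 votes; real implementations usually apply non-maximum suppression before reporting lines.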
Source Code Link
https://sigmoidtek.com/blogs/tutorials/line-detect-ht
#machine #learning #hough #transform
1624095120
CSS animations can add some polish and shine to a website. They can also be useful to provide users with some visual feedback about the user interface. Although there are some concerns about using CSS animation for critical aspects of a website, especially where it compromises accessibility, animations used carefully can enhance a website in some very appealing ways.
To make use of basic CSS animations, it’s important to understand the concepts of transitions, transforms, and animation in the CSS context. These concepts support animations ranging from simple ones, like gradually changing the color of a button, to complex ones, like moving an object on the screen while simultaneously changing its shape and opacity.
Transitions apply a controlled change to a CSS property, from one value to another. CSS developers can control aspects of the transition, including the property, the duration, the timing function, and the delay.
The basic structure of a transition looks like the following:
div {
transition: <property> <duration> <timing-function> <delay>;
}
Each of these aspects of the transition can be defined individually.
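To show how property, duration, timing function and delay interact, here is a simplified Python model of a colour transition such as `transition: background-color 1s linear 0.5s`. It is not how browsers implement transitions (they handle colour spaces and many more cases); it uses a linear timing function for simplicity, whereas the CSS default timing function is ease.

```python
def eased_progress(elapsed, duration, delay=0.0, timing=lambda p: p):
    """Fraction of the transition completed `elapsed` seconds after the change."""
    if elapsed <= delay:
        return 0.0  # still inside the delay: property keeps its start value
    if duration <= 0:
        return 1.0  # zero-length transition jumps straight to the end value
    progress = min((elapsed - delay) / duration, 1.0)
    return timing(progress)  # timing function maps linear time to eased progress


def transition_color(start_rgb, end_rgb, elapsed, duration, delay=0.0):
    """Interpolate each RGB channel between the start and end colours."""
    p = eased_progress(elapsed, duration, delay)
    return tuple(round(s + (e - s) * p) for s, e in zip(start_rgb, end_rgb))


# Blue to red over 1s, starting after a 0.5s delay.
for t in (0.0, 0.5, 1.0, 1.5):
    print(t, transition_color((0, 0, 255), (255, 0, 0), t, duration=1.0, delay=0.5))
```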
#css #transform #css3 #animation
1602925200
In this article, I will show you how to set up data loaders and transforms in PyTorch. You need the following imports for this exercise:
import torchvision
import torch
import os
import matplotlib.pyplot as plt
import numpy as np
Define a transform that will:
- Resize the image to (256, 256) or any other size
- Convert it to a PyTorch tensor
- Normalize the image by calling torchvision.transforms.Normalize
transform_img = torchvision.transforms.Compose([torchvision.transforms.Resize((256, 256)),
torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize(mean=[0.485],std=[0.229])])
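The per-pixel arithmetic of the last two steps can be checked by hand. The plain-Python sketch below mirrors what ToTensor does to 8-bit images such as MNIST's (scale [0, 255] to [0.0, 1.0]) and what Normalize then computes for each channel, (value - mean) / std, using the mean and std passed above. The function names are illustrative, not torchvision APIs.

```python
MEAN, STD = 0.485, 0.229  # the single-channel values passed to Normalize above


def to_tensor_scale(pixel):
    """ToTensor maps an 8-bit pixel value in [0, 255] to a float in [0.0, 1.0]."""
    return pixel / 255.0


def normalize(value, mean=MEAN, std=STD):
    """Normalize computes (value - mean) / std for each channel."""
    return (value - mean) / std


for pixel in (0, 128, 255):
    print(pixel, round(normalize(to_tensor_scale(pixel)), 4))
```

After normalization, black pixels land around -2.12 and white pixels around 2.25, so the inputs are roughly centred on zero rather than spanning [0, 1].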
Set a directory path. download=True downloads the data into the specified directory, and transform should be set to the transform defined above:
dir_path = r'C:\Users\Asus\pytorch-basics-part2'  # raw string: '\U' would otherwise be parsed as a Unicode escape
dataset_mnist_train = torchvision.datasets.MNIST(dir_path, train=True, transform=transform_img,
target_transform=None, download=True)
You can index this dataset: dataset_mnist_train[i] is a tuple of (image, label), where the image has already been passed through transform_img.
#pytorch #transform #dataload #deep-learning
1600140758
Pandas is an amazing library that contains extensive built-in functions for manipulating data. Among them, transform() is super useful when you are looking to manipulate rows or columns.
In this article, we will cover the following most frequently used Pandas transform() features:
groupby() results
Please check out my Github repo for the source code.
Let’s take a look at DataFrame.transform(func, axis=0):
- func specifies the function used to manipulate the data. It can be a function, a string function name, a list of functions, or a dictionary of axis labels -> functions.
- axis specifies which axis func is applied to: 0 for applying func to each column and 1 for applying func to each row.

Let’s see how transform() works with the help of some examples.
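As a first example, the sketch below uses small made-up DataFrames to show three forms of func: a callable, a dict of column label -> function, and a string function name used with groupby(). The key property is that transform() always returns a result aligned with the original rows, which is what makes it handy for adding per-group statistics as new columns.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, -2, 3], "B": [10, 20, 30]})

# func as a callable: the result keeps the same shape and index as df.
plus_ten = df.transform(lambda x: x + 10)

# func as a dict of column label -> function (a string name or a callable).
mixed = df.transform({"A": "abs", "B": lambda x: x * 2})

# With groupby(), transform() broadcasts each group's aggregate back to every row.
sales = pd.DataFrame({"store": ["a", "a", "b", "b"], "amount": [10, 30, 5, 15]})
sales["store_mean"] = sales.groupby("store")["amount"].transform("mean")
sales["share_of_store"] = sales["amount"] / sales.groupby("store")["amount"].transform("sum")
print(sales)
```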
#pandas #transform #machine-learning #data-science #python