Connor Mills

How do I validate to see if a text field has the value that I typed in?

I'm very new to Python and Selenium, and I think I need to use an assert statement to verify that a text field contains what I typed in via Selenium.

I've searched for an hour and can't find the answer.

Here is my code:

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains as AC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

driver = webdriver.Ie()

driver.get("https://bie.farmersinsurance.com/")

WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div/form/table/tbody/tr[5]/td/table/tbody/tr/td/table/tbody/tr[2]/td/table/tbody/tr[1]/td[1]/input"))).send_keys("tess893")

element = driver.find_element_by_xpath("/html/body/div/form/table/tbody/tr[5]/td/table/tbody/tr/td/table/tbody/tr[2]/td/table/tbody/tr[1]/td[1]/input")
assert element.text == "tess893"

Here are my results:

Traceback (most recent call last):
  File "c:\Users\uswarv41\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\ptvsd_launcher.py", line 45, in <module>
    main(ptvsdArgs)
  File "c:\Users\uswarv41\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\__main__.py", line 265, in main
    wait=args.wait)
  File "c:\Users\uswarv41\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\__main__.py", line 256, in handle_args
    run_main(addr, name, kind, *extra, **kwargs)
  File "c:\Users\uswarv41\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_local.py", line 52, in run_main
    runner(addr, name, kind == 'module', *extra, **kwargs)
  File "c:\Users\uswarv41\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\runner.py", line 32, in run
    set_trace=False)
  File "c:\Users\uswarv41\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 1283, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "c:\Users\uswarv41\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 1290, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "c:\Users\uswarv41\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_vendored\pydevd\_pydev_imps\_pydev_execfile.py", line 25, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "c:\_TMP\Test2.py", line 18, in <module>
    assert element.text == "tess893"
AssertionError

I don't know why I'm getting an AssertionError. Based on what I read, it should not throw any errors.

#python #selenium

Valerio Tana

.text is not the property you are looking for. It returns the visible text between the element's tags.

You want the value of the input element. To get that, I think you'd use element.get_attribute('value')

The assert statement below should work

assert element.get_attribute('value') == "tess893"
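
For example, a minimal sketch combining this with the code from the question (keeping the element returned by the wait, so it only has to be located once):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Ie()
driver.get("https://bie.farmersinsurance.com/")

# Wait until the field is clickable, keep a reference to it, then type into it
field = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "/html/body/div/form/table/tbody/tr[5]/td/table/tbody/tr/td/table/tbody/tr[2]/td/table/tbody/tr[1]/td[1]/input"))
)
field.send_keys("tess893")

# The typed text lives in the value attribute, not in .text
assert field.get_attribute("value") == "tess893"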

Navigating Between DOM Nodes in JavaScript

In the previous chapters you've learnt how to select individual elements on a web page. But there are many occasions where you need to access a child, parent or ancestor element. See the JavaScript DOM nodes chapter to understand the logical relationships between the nodes in a DOM tree.

DOM node provides several properties and methods that allow you to navigate or traverse through the tree structure of the DOM and make changes very easily. In the following section we will learn how to navigate up, down, and sideways in the DOM tree using JavaScript.

Accessing the Child Nodes

You can use the firstChild and lastChild properties of a DOM node to access its first and last direct child node, respectively. If the node doesn't have any children, these properties return null.

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");
console.log(main.firstChild.nodeName); // Prints: #text

var hint = document.getElementById("hint");
console.log(hint.firstChild.nodeName); // Prints: SPAN
</script>

Note: The nodeName is a read-only property that returns the name of the current node as a string. For example, it returns the tag name for element node, #text for text node, #comment for comment node, #document for document node, and so on.

As you can see in the example above, the nodeName of the first child node of the main DIV element returns #text instead of H1. This is because whitespace characters such as spaces, tabs, and newlines are valid characters; they form #text nodes and become part of the DOM tree. Since the <div> tag contains a newline before the <h1> tag, it creates a #text node.

To avoid the issue of firstChild and lastChild returning #text or #comment nodes, you can alternatively use the firstElementChild and lastElementChild properties to get only the first and last element node, respectively. However, they do not work in IE 9 and earlier.

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");
alert(main.firstElementChild.nodeName); // Outputs: H1
main.firstElementChild.style.color = "red";

var hint = document.getElementById("hint");
alert(hint.firstElementChild.nodeName); // Outputs: SPAN
hint.firstElementChild.style.color = "blue";
</script>

Similarly, you can use the childNodes property to access all child nodes of a given element, where the first child node is assigned index 0. Here's an example:

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");

// First check that the element has child nodes 
if(main.hasChildNodes()) {
    var nodes = main.childNodes;
    
    // Loop through node list and display node name
    for(var i = 0; i < nodes.length; i++) {
        alert(nodes[i].nodeName);
    }
}
</script>

The childNodes property returns all child nodes, including non-element nodes like text and comment nodes. To get a collection of elements only, use the children property instead.

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");

// First check that the element has child nodes 
if(main.hasChildNodes()) {
    var nodes = main.children;
    
    // Loop through node list and display node name
    for(var i = 0; i < nodes.length; i++) {
        alert(nodes[i].nodeName);
    }
}
</script>

#javascript 

Controller Extra Bundle for Symfony2

ControllerExtra for Symfony2

This bundle provides a collection of annotations for Symfony2 Controllers, designed to streamline the creation of certain objects and enable smaller and more concise actions.

Reference

By default, all annotations are loaded, but each individual annotation can be completely disabled by setting its active parameter to false.

Default values are:

controller_extra:
    resolver_priority: -8
    request: current
    paginator:
        active: true
        default_name: paginator
        default_page: 1
        default_limit_per_page: 10
    entity:
        active: true
        default_name: entity
        default_persist: true
        default_mapping_fallback: false
        default_factory_method: create
        default_factory_mapping: true
    form:
        active: true
        default_name: form
    object_manager:
        active: true
        default_name: form
    flush:
        active: true
        default_manager: default
    json_response:
        active: true
        default_status: 200
        default_headers: []
    log:
        active: true
        default_level: info
        default_execute: pre

ResolverEventListener is subscribed to the kernel.controller event with priority -8. This can be configured and customized with the resolver_priority config value. If you need to get ParamConverter entities, make sure this value is lower than 0, because this listener must always be executed after the ParamConverter one.
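
For example, a minimal sketch of that override in config.yml, using the default value shown in the reference above:

controller_extra:
    resolver_priority: -8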

Entity provider

In some annotations, you can define an entity in several ways. This chapter describes all of them.

By namespace

You can define an entity using its namespace. A simple new() will be performed.

/**
 * Simple controller method
 *
 * @SomeAnnotation(
 *      class = "Mmoreram\CustomBundle\Entity\MyEntity",
 * )
 */
public function indexAction()
{
}

By doctrine shortcut

You can define an entity using Doctrine shortcut notation. With this format, you should ensure that your entities follow the Symfony Bundle standards and are placed under the Entity/ folder.

/**
 * Simple controller method
 *
 * @SomeAnnotation(
 *      class = "MmoreramCustomBundle:MyEntity",
 * )
 */
public function indexAction()
{
}

By parameter

You can define an entity using a simple config parameter. Some projects use parameters to define all entity namespaces (to allow overriding). If you define the entity with a parameter, this bundle will try to instantiate it with a simple new(), accessing the container's ParameterBag directly.

parameters:

    #
    # Entities
    #
    my.bundle.entity.myentity: Mmoreram\CustomBundle\Entity\MyEntity
/**
 * Simple controller method
 *
 * @SomeAnnotation(
 *      class = "my.bundle.entity.myentity",
 * )
 */
public function indexAction()
{
}

Controller annotations

This bundle provides a reduced but useful set of annotations for your controller actions.

@CreatePaginator

Creates a Doctrine Paginator object, given a request and a configuration. This annotation simply injects into the controller a new Doctrine\ORM\Tools\Pagination\Paginator instance, ready to be iterated.

You can enable/disable this annotation by overriding the active flag in the configuration file config.yml

controller_extra:
    paginator:
        active: true

By default, if the name option is not set, the generated object will be placed in a parameter named $paginator. This behaviour can be configured using default_name in the configuration.
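
For example, to receive the paginator in a parameter named $myPaginator (a name chosen here just for illustration), you could set:

controller_extra:
    paginator:
        default_name: myPaginator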

This annotation can be configured with these sections

Paginator Entity

To create a new Paginator object you need to refer to an existing Entity. You can check all the available formats for defining it in the Entity Provider section.

<?php

use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 * )
 */
public function indexAction(Paginator $paginator)
{
}

Paginator page

You can specify in the Paginator annotation the page to fetch. By default, if none is specified, this bundle will use the default one defined in the configuration. You can override it in config.yml

controller_extra:
    paginator:
        default_page: 1

You can refer to an existing Request attribute using the ~value~ format, to any $_GET element using the ?field? format, or to any $_POST element using the #field# format.

You can choose between the Master Request and the Current Request when accessing attributes, by configuring the request value of the configuration.

use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * This Controller matches pattern /myroute/paginate/{foo}
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      page = "~foo~"
 * )
 */
public function indexAction(Paginator $paginator)
{
}

or you can hardcode the page to use.

use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * This Controller matches pattern /myroute/paginate/
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      page = 1
 * )
 */
public function indexAction(Paginator $paginator)
{
}

Paginator limit

You can specify in the Paginator annotation the limit to fetch. By default, if none is specified, this bundle will use the default one defined in the configuration. You can override it in config.yml

controller_extra:
    paginator:
        default_limit_per_page: 10

You can refer to an existing Request attribute using the ~value~ format, to any $_GET element using the ?field? format, or to any $_POST element using the #field# format.

use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * This Controller matches pattern /myroute/paginate/{foo}/{limit}
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      page = "~foo~",
 *      limit = "~limit~"
 * )
 */
public function indexAction(Paginator $paginator)
{
}

or you can hardcode the page to use.

use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * This Controller matches pattern /myroute/paginate/
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      page = 1,
 *      limit = 10
 * )
 */
public function indexAction(Paginator $paginator)
{
}

Paginator OrderBy

You can order your Pagination by defining the fields to order by and the desired direction. The orderBy section must be defined as an array of arrays, and each array should contain these positions:

  • First position: Entity alias (Principal object is set as x)
  • Second position: Entity field
  • Third position: Direction
  • Fourth position: Custom direction map (optional)
use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      orderBy = {
 *          {"x", "createdAt", "ASC"},
 *          {"x", "updatedAt", "DESC"},
 *          {"x", "id", 1, {
 *              0 => "ASC",
 *              1 => "DESC",
 *          }},
 *      }
 * )
 */
public function indexAction(Paginator $paginator)
{
}

With the third and fourth values you can define a map to match your own direction nomenclature with the DQL one. DQL only accepts ASC for ascending and DESC for descending.

This is very useful when you need to match a URL format with the DQL one. You can refer to an existing Request attribute using the ~value~ format, to any $_GET element using the ?field? format, or to any $_POST element using the #field# format.

use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * This Controller matches pattern /myroute/paginate/order/{field}/{direction}
 *
 * For example, some matchings...
 *
 * /myroute/paginate/order/id/1 -> ORDER BY id DESC
 * /myroute/paginate/order/enabled/0 -> ORDER BY enabled ASC
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      orderBy = {
 *          {"x", "createdAt", "ASC"},
 *          {"x", "updatedAt", "DESC"},
 *          {"x", "~field~", ~direction~, {
 *              0 => "ASC",
 *              1 => "DESC",
 *          }},
 *      }
 * )
 */
public function indexAction(Paginator $paginator)
{
}

The order of the definitions will alter the order of the DQL query.

Paginator Wheres

You can define some where statements in your Paginator. The wheres section must be defined as an array of arrays, and each array should contain these positions:

  • First position: Entity alias (Principal object is set as x)
  • Second position: Entity field
  • Third position: Operator =, <=, >, LIKE...
  • Fourth position: Value to compare with
  • Fifth position: Is a filter. By default, false
use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      wheres = {
 *          {"x", "enabled", "=", true},
 *          {"x", "age", ">", 18},
 *          {"x", "name", "LIKE", "Eferv%"},
 *      }
 * )
 */
public function indexAction(Paginator $paginator)
{
}

You can refer to an existing Request attribute using ~value~ format, to any $_GET element by using format ?field? or to any $_POST by using format #field#

use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * This Controller matches pattern /myroute/{field}
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      wheres = {
 *          {"x", "name", "LIKE", "~field~"},
 *      }
 * )
 */
public function indexAction(Paginator $paginator)
{
}

You can also use this feature for optional filtering by setting the last position to true. In that case, if the filter value is not found, that where line will be ignored.

use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * This Controller matches pattern /myroute?query=name%
 * This Controller matches pattern /myroute as well
 *
 * In both cases this will work. In the first case we will apply the where line
 * in the paginator. In the second case, we won't.
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      wheres = {
 *          {"x", "name", "LIKE", "?query?", true},
 *      }
 * )
 */
public function indexAction(Paginator $paginator)
{
}

Paginator Not Nulls

You can also require some fields to be not null. This works like the wheres section, but is specific for NULL checks. The notNulls section must be defined as an array of arrays, and each array should contain these positions:

  • First position: Object (Principal object is set as x)
  • Second position: Field
use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      notNulls = {
 *          {"x", "enabled"},
 *          {"x", "deleted"},
 *      }
 * )
 */
public function indexAction(Paginator $paginator)
{
}

Paginator Left Join

You can define some left joins in this section. The leftJoins section must be defined as an array of arrays, where each array can have these fields:

  • First position: Entity alias (Principal object is set as x)
  • Second position: Entity relation (Address)
  • Third position: Relation identifier (a)
  • Fourth position: If true, this relation is added to the select group. Otherwise, it won't be loaded until it is requested (optional)
use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      leftJoins = {
 *          {"x", "User", "u", true},
 *          {"x", "Address", "a", true},
 *          {"x", "Cart", "c"},
 *      }
 * )
 */
public function indexAction(Paginator $paginator)
{
}

Paginator Inner Join

You can define some inner joins in this section. The innerJoins section must be defined as an array of arrays, where each array can have these fields:

  • First position: Entity alias (x)
  • Second position: Entity relation (Address)
  • Third position: Relation identifier (a)
  • Fourth position: If true, this relation is added to the select group. Otherwise, it won't be loaded until it is requested (optional)
use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      innerJoins = {
 *          {"x", "User", "u", true},
 *          {"x", "Address", "a", true},
 *          {"x", "Cart", "c"},
 *      }
 * )
 */
public function indexAction(Paginator $paginator)
{
}

Paginator Attributes

A nice feature of this annotation is that you can also inject into your controller a Mmoreram\ControllerExtraBundle\ValueObject\PaginatorAttributes instance with some interesting information about your pagination.

  • currentPage : Current page fetched
  • totalElements : Total elements given your criteria. If no criteria are defined in your configuration, this value will show the total number of elements of the entity.
  • totalPages : Total pages you can fetch given a criteria.
  • limitPerPage: Maximum number of elements in each page.

To inject this object you need to define the "attributes" annotation field with the method parameter name.

use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;
use Mmoreram\ControllerExtraBundle\ValueObject\PaginatorAttributes;

/**
 * Simple controller method
 *
 * This Controller matches pattern /myroute/paginate/
 *
 * @CreatePaginator(
 *      attributes = "paginatorAttributes",
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      page = 1,
 *      limit = 10
 * )
 */
public function indexAction(
    Paginator $paginator,
    PaginatorAttributes $paginatorAttributes
)
{
    $currentPage = $paginatorAttributes->getCurrentPage();
    $totalElements = $paginatorAttributes->getTotalElements();
    $totalPages = $paginatorAttributes->getTotalPages();
    $limitPerPage = $paginatorAttributes->getLimitPerPage();

}

Paginator Example

This is a complete example with its DQL resolution.

use Doctrine\ORM\Tools\Pagination\Paginator;
use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;

/**
 * Simple controller method
 *
 * This Controller matches pattern /paginate/nb/{limit}/{page}
 *
 * Where:
 *
 * * limit = 10
 * * page = 1
 *
 * @CreatePaginator(
 *      entityNamespace = "ControllerExtraBundle:Fake",
 *      page = "~page~",
 *      limit = "~limit~",
 *      orderBy = {
 *          { "x", "createdAt", "ASC" },
 *          { "x", "updatedAt", "DESC" },
 *          { "x", "id", "0", {
 *              "1" = "ASC",
 *              "2" = "DESC",
 *          }}
 *      },
 *      wheres = {
 *          { "x", "enabled" , "=", true }
 *      },
 *      leftJoins = {
 *          { "x", "relation", "r" },
 *          { "x", "relation2", "r2" },
 *          { "x", "relation5", "r5", true },
 *      },
 *      innerJoins = {
 *          { "x", "relation3", "r3" },
 *          { "x", "relation4", "r4", true },
 *      },
 *      notNulls = {
 *          {"x", "address1"},
 *          {"x", "address2"},
 *      }
 * )
 */
public function indexAction(Paginator $paginator)
{
}

The DQL generated by this annotation is

    SELECT x, r4, r5
    FROM Mmoreram\ControllerExtraBundle\Tests\FakeBundle\Entity\Fake x

    INNER JOIN x.relation3 r3
    INNER JOIN x.relation4 r4

    LEFT JOIN x.relation r
    LEFT JOIN x.relation2 r2
    LEFT JOIN x.relation5 r5

    WHERE enabled = ?where0
    AND x.address1 IS NOT NULL
    AND x.address2 IS NOT NULL

    ORDER BY createdAt ASC, id ASC

PagerFanta Add-on

This annotation can create a PagerFanta instance if you need it. You only have to define your parameter as such, and the annotation resolver will wrap your paginator with a Pagerfanta object instance.

use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;
use Pagerfanta\Pagerfanta;

/**
 * Simple controller method
 *
 * This Controller matches pattern /myroute/paginate/
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      page = 1,
 *      limit = 10
 * )
 */
public function indexAction(Pagerfanta $paginator)
{
}

KNPPaginator Add-on

This annotation can create a KNPPaginator instance if you need it. You only have to define your parameter as such, and the annotation resolver will wrap your paginator with a KNPPaginator object instance.

use Mmoreram\ControllerExtraBundle\Annotation\CreatePaginator;
use Knp\Component\Pager\Pagination\PaginationInterface;

/**
 * Simple controller method
 *
 * This Controller matches pattern /myroute/paginate/
 *
 * @CreatePaginator(
 *      entityNamespace = "MmoreramCustomBundle:User",
 *      page = 1,
 *      limit = 10
 * )
 */
public function indexAction(PaginationInterface $paginator)
{
}

@LoadEntity

Loads an entity from your database, or creates a new one.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Entity;
use Mmoreram\ControllerExtraBundle\Entity\User;

/**
 * Simple controller method
 *
 * @Entity(
 *      namespace = "MmoreramCustomBundle:User",
 *      name  = "user"
 * )
 */
public function indexAction(User $user)
{
}

By default, if the name option is not set, the generated object will be placed in a parameter named $entity. This behaviour can be configured using default_name in the configuration.

You can also use setters in the Entity annotation. This means that you can call entity setters using Request attributes.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Entity;
use Mmoreram\ControllerExtraBundle\Entity\Address;
use Mmoreram\ControllerExtraBundle\Entity\User;

/**
 * Simple controller method
 *
 * @Entity(
 *      namespace = "MmoreramCustomBundle:Address",
 *      name  = "address"
 * )
 * @Entity(
 *      namespace = "MmoreramCustomBundle:User",
 *      name  = "user",
 *      setters = {
 *          "setAddress": "address"
 *      }
 * )
 */
public function indexAction(Address $address, User $user)
{
}

When the User instance is built, the method setAddress is called with the new Address instance as its parameter.

New entities are just created with a simple new(), so at that point they are not persisted. By default they will then be persisted using the configured manager, but you can disable this behaviour using the persist option.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Entity;
use Mmoreram\ControllerExtraBundle\Entity\User;

/**
 * Simple controller method
 *
 * @Entity(
 *      namespace = "MmoreramCustomBundle:User",
 *      name  = "user",
 *      persist = false
 * )
 */
public function indexAction(User $user)
{
}

Entity Mapping

When you define a new Entity annotation, you can also request the mapped entity given a map. This means that if a map is defined, this bundle will try to fetch the mapped instance satisfying it.

The keys of the map represent the names of the mapped fields and the values represent their desired values. Remember that you can refer to any Request attribute using the ~field~ format, to any $_GET element using the ?field? format, or to any $_POST element using the #field# format.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Entity;
use Mmoreram\ControllerExtraBundle\Entity\User;

/**
 * Simple controller method
 *
 * This Controller matches pattern /user/edit/{id}/{username}
 *
 * @Entity(
 *      namespace = "MmoreramCustomBundle:User",
 *      name  = "user",
 *      mapping = {
 *          "id": "~id~",
 *          "username": "~username~"
 *      }
 * )
 */
public function indexAction(User $user)
{
}

In this case, the bundle will try to get the mapped instance of User with the given id. If some mapping is defined and no entity is found, an EntityNotFoundException is thrown.

Entity Mapping Fallback

So what if one or more mapping references are not found? For example, you're trying to map the {id} parameter from your route, but this parameter is not even defined. What happens here? Well, you can then assume that you want a new entity instance, by using the mappingFallback.

By default, if the mappingFallback option is not set, the value used will be the default_mapping_fallback parameter defined in the configuration. By default this value is false.

Don't confuse this with the scenario where you're looking for an entity in your database, all mapping references have been resolved, and the entity is not found. In that case, a regular EntityNotFound exception will be thrown by Doctrine.

Let's see an example. Because we have enabled the mappingFallback, and because the mapping definition does not match the assigned route, we will get a new empty User entity.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Entity;
use Mmoreram\ControllerExtraBundle\Entity\User;

/**
 * Simple controller method
 *
 * This Controller matches pattern /user/edit/{id}
 *
 * @LoadEntity(
 *      namespace = "MmoreramCustomBundle:User",
 *      name  = "user",
 *      mapping = {
 *          "id": "~id~",
 *          "username": "~nonexisting~"
 *      },
 *      mappingFallback = true
 * )
 */
public function indexAction(User $user)
{
    // $user->getId() === null
}

Entity Repository

By default, the Doctrine entity manager provides the right repository for each entity (not the default one, but the specific one). However, you can define a custom repository to be used in your annotation with the repository configuration.

/**
 * Simple controller method
 *
 * @CreateEntity(
 *      namespace = "MmoreramCustomBundle:User",
 *      mapping = {
 *          "id": "~id~",
 *          "username": "~username~"
 *      },
 *      repository = {
 *          "class" = "Mmoreram\CustomBundle\Repository\AnotherRepository",
 *      },
 * )
 */
public function indexAction(User $user)
{
}

By default, the method findOneBy will always be used, unless you define another one.

/**
 * Simple controller method
 *
 * @CreateEntity(
 *      namespace = "MmoreramCustomBundle:User",
 *      mapping = {
 *          "id": "~id~",
 *          "username": "~username~"
 *      },
 *      repository = {
 *          "class" = "Mmoreram\CustomBundle\Repository\AnotherRepository",
 *          "method" = "find",
 *      },
 * )
 */
public function indexAction(User $user)
{
}

Entity Factory

When the annotation considers that a new entity must be created, because no mapping information has been provided or because the mapping fallback has been activated, by default a new instance will be created using the namespace value.

This configuration block has three positions

  • class - factory class
  • method - Method to use when retrieving the object
  • static - Method is static

You can define the factory with a simple namespace

/**
 * Simple controller method
 *
 * @CreateEntity(
 *      namespace = "MmoreramCustomBundle:User",
 *      factory = {
 *          "class" = "Mmoreram\CustomBundle\Factory\UserFactory",
 *          "method" = "create",
 *          "static" = true,
 *      },
 * )
 */
public function indexAction(User $user)
{
}

If you want to define your Factory as a service, with the possibility of overriding the namespace, you can simply define the service name. All other options behave the same way.

parameters:

    #
    # Factories
    #
    my.bundle.factory.user_factory: Mmoreram\CustomBundle\Factory\UserFactory
/**
 * Simple controller method
 *
 * @CreateEntity(
 *      factory = {
 *          "class" = "my.bundle.factory.user_factory",
 *          "method" = "create",
 *          "static" = true,
 *      },
 * )
 */
public function indexAction(User $user)
{
}

If you do not define the method, the default one will be used. You can override this default value by defining a new one in your config.yml. The same applies to the static value.

controller_extra:
    entity:
        default_factory_method: create
        default_factory_static: true

@CreateForm

Provides form injection in your controller actions. This annotation only needs the class to be defined, where you must set the namespace where your form type is placed.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\CreateForm;
use Symfony\Component\Form\AbstractType;

/**
 * Simple controller method
 *
 * @CreateForm(
 *      class = "\Mmoreram\CustomBundle\Form\Type\UserType",
 *      name  = "userType"
 * )
 */
public function indexAction(AbstractType $userType)
{
}

By default, if the name option is not set, the generated object will be placed in a parameter named $form. This behaviour can be configured using default_name in the configuration.

You can not only define your Type location using the namespace, in which case a new AbstractType element will be created, but you can also define it using the service alias, in which case this bundle will return an instance using Symfony DI.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\CreateForm;
use Symfony\Component\Form\AbstractType;

/**
 * Simple controller method
 *
 * @CreateForm(
 *      class = "user_type",
 *      name  = "userType"
 * )
 */
public function indexAction(AbstractType $userType)
{
}

This annotation allows you not only to create an instance of FormType, but also to inject a Form object or a FormView object.

To inject a Form object you only need to type-hint the method parameter as such.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\CreateForm;
use Symfony\Component\Form\Form;

/**
 * Simple controller method
 *
 * @CreateForm(
 *      class = "user_type",
 *      name  = "userForm"
 * )
 */
public function indexAction(Form $userForm)
{
}

You can also, using SensioFrameworkExtraBundle's ParamConverter, create a Form object with a previously created entity. You can define this entity using the entity parameter.

<?php

use Sensio\Bundle\FrameworkExtraBundle\Configuration\Route;
use Sensio\Bundle\FrameworkExtraBundle\Configuration\ParamConverter;
use Symfony\Component\Form\Form;

use Mmoreram\ControllerExtraBundle\Annotation\CreateForm;
use Mmoreram\ControllerExtraBundle\Entity\User;

/**
 * Simple controller method
 *
 * @Route(
 *      path = "/user/{id}",
 *      name = "view_user"
 * )
 * @ParamConverter("user", class="MmoreramCustomBundle:User")
 * @CreateForm(
 *      class  = "user_type",
 *      entity = "user"
 *      name   = "userForm",
 * )
 */
public function indexAction(User $user, Form $userForm)
{
}

To handle the current request, you can set handleRequest to true. By default this value is set to false.

<?php

use Sensio\Bundle\FrameworkExtraBundle\Configuration\Route;
use Sensio\Bundle\FrameworkExtraBundle\Configuration\ParamConverter;
use Symfony\Component\Form\Form;

use Mmoreram\ControllerExtraBundle\Annotation\CreateForm;
use Mmoreram\ControllerExtraBundle\Entity\User;

/**
 * Simple controller method
 *
 * @Route(
 *      path = "/user/{id}",
 *      name = "view_user"
 * )
 * @ParamConverter("user", class="MmoreramCustomBundle:User")
 * @CreateForm(
 *      class         = "user_type",
 *      entity        = "user"
 *      handleRequest = true,
 *      name          = "userForm",
 * )
 */
public function indexAction(User $user, Form $userForm)
{
}

You can also receive, as a method parameter, whether the form is valid, using the validate setting. The annotation will place the result of $form->isValid() in the specified method argument.

<?php

use Sensio\Bundle\FrameworkExtraBundle\Configuration\Route;
use Sensio\Bundle\FrameworkExtraBundle\Configuration\ParamConverter;
use Symfony\Component\Form\Form;

use Mmoreram\ControllerExtraBundle\Annotation\CreateForm;
use Mmoreram\ControllerExtraBundle\Entity\User;

/**
 * Simple controller method
 *
 * @Route(
 *      path = "/user/{id}",
 *      name = "view_user"
 * )
 * @ParamConverter("user", class="MmoreramCustomBundle:User")
 * @CreateForm(
 *      class         = "user_type",
 *      entity        = "user"
 *      handleRequest = true,
 *      name          = "userForm",
 *      validate      = "isValid",
 * )
 */
public function indexAction(User $user, Form $userForm, $isValid)
{
}

To inject a FormView object you only need to type-hint the method parameter as such.

<?php

use Symfony\Component\Form\FormView;

use Mmoreram\ControllerExtraBundle\Annotation\CreateForm;

/**
 * Simple controller method
 *
 * @CreateForm(
 *      class = "user_type",
 *      name  = "userFormView"
 * )
 */
public function indexAction(FormView $userFormView)
{
}

@Flush

The Flush annotation allows you to flush the entity manager at the end of the request, using the kernel.response event.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Flush;

/**
 * Simple controller method
 *
 * @Flush
 */
public function indexAction()
{
}

If not otherwise specified, the default Doctrine manager will be flushed with this annotation. You can overwrite the default manager in your config.yml file.

controller_extra:
    flush:
        default_manager: my_custom_manager

You can also override this value in every single Flush annotation instance by defining the manager value.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Flush;

/**
 * Simple controller method
 *
 * @Flush(
 *      manager = "my_own_manager"
 * )
 */
public function indexAction()
{
}

If you want to change the default manager for all annotation instances, you should override the bundle parameter in your config.yml file.

controller_extra:
    flush:
        default_manager: my_own_manager

If no entity is set, the annotation will flush everything. If you only need to flush one or many entities, you can explicitly define which entities must be flushed.

<?php

use Sensio\Bundle\FrameworkExtraBundle\Configuration\ParamConverter;

use Mmoreram\ControllerExtraBundle\Annotation\Flush;
use Mmoreram\ControllerExtraBundle\Entity\User;

/**
 * Simple controller method
 *
 * @ParamConverter("user", class="MmoreramCustomBundle:User")
 * @Flush(
 *      entity = "user"
 * )
 */
public function indexAction(User $user)
{
}

You can also define a set of entities to flush

<?php

use Sensio\Bundle\FrameworkExtraBundle\Configuration\ParamConverter;

use Mmoreram\ControllerExtraBundle\Annotation\Flush;
use Mmoreram\ControllerExtraBundle\Entity\Address;
use Mmoreram\ControllerExtraBundle\Entity\User;

/**
 * Simple controller method
 *
 * @ParamConverter("user", class="MmoreramCustomBundle:User")
 * @ParamConverter("address", class="MmoreramCustomBundle:Address")
 * @Flush(
 *      entity = {
 *          "user", 
 *          "address"
 *      }
 * )
 */
public function indexAction(User $user, Address $address)
{
}

If multiple @Mmoreram\Flush annotations are defined in the same action, the last instance will overwrite the previous ones. In any case, just one instance should be defined.

@ToJsonResponse

The JsonResponse annotation allows you to create a Symfony\Component\HttpFoundation\JsonResponse object, given a simple controller return value.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\ToJsonResponse;

/**
 * Simple controller method
 *
 * @ToJsonResponse
 */
public function indexAction(User $user, Address $address)
{
    return array(
        'This is my response'
    );
}

By default, the JsonResponse is created using the default status and headers defined in the bundle parameters. You can overwrite them in your config.yml file.

controller_extra:
    json_response:
        default_status: 403
        default_headers:
            "User-Agent": "Googlebot/2.1"

You can also overwrite these values in each @JsonResponse annotation.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\ToJsonResponse;

/**
 * Simple controller method
 *
 * @ToJsonResponse(
 *      status = 403,
 *      headers = {
 *          "User-Agent": "Googlebot/2.1"
 *      }
 * )
 */
public function indexAction(User $user, Address $address)
{
    return array(
        'This is my response'
    );
}

If an Exception is returned, the response status is set to 500 by default and the Exception message is returned as the response.

STATUS 500 Internal server error

{
    message : 'Exception message'
}

If we use an HttpExceptionInterface, its status code is used as the response status code. For example, if we return this exception

use Symfony\Component\HttpKernel\Exception\NotFoundHttpException;

...

return new NotFoundHttpException('Resource not found');

We'll receive this response

STATUS 404 Not Found

{
    message : 'Resource not found'
}

If the exception is thrown by an annotation (e.g. the Entity annotation), remember to add the JsonResponse annotation at the beginning, or at least before any annotation that could cause an exception.

If multiple @Mmoreram\JsonResponse annotations are defined in the same action, the last instance will overwrite the previous ones. In any case, just one instance should be defined.

@Log

The Log annotation allows you to log any plain message before or after the controller action execution.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Log;

/**
 * Simple controller method
 *
 * @Log("Executing index Action")
 */
public function indexAction()
{
}

You can define the level of the message. You can set the default level, used when none is specified, by overriding it in your config.yml file.

controller_extra:
    log:
        default_level: warning

Every annotation instance can overwrite this value using the level field.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Log;

/**
 * Simple controller method
 *
 * @Log(
 *      value   = "Executing index Action",
 *      level   = @Log::LVL_WARNING
 * )
 */
public function indexAction()
{
}

Several levels can be used, as defined in the Psr\Log\LoggerInterface interface:

  • @Mmoreram\Log::LVL_EMERG
  • @Mmoreram\Log::LVL_CRIT
  • @Mmoreram\Log::LVL_ERR
  • @Mmoreram\Log::LVL_WARN
  • @Mmoreram\Log::LVL_NOTICE
  • @Mmoreram\Log::LVL_INFO
  • @Mmoreram\Log::LVL_DEBUG
  • @Mmoreram\Log::LVL_LOG

You can also define when the log is executed. You can set the default behaviour, used when none is specified, by overriding it in your config.yml file.

controller_extra:
    log:
        default_execute: pre

Every annotation instance can overwrite this value using the execute field.

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Log;

/**
 * Simple controller method
 *
 * @Log(
 *      value   = "Executing index Action",
 *      execute = @Log::EXEC_POST
 * )
 */
public function indexAction()
{
}

Several executions can be used:

  • @Mmoreram\Log::EXEC_PRE - Logged before controller execution
  • @Mmoreram\Log::EXEC_POST - Logged after controller execution
  • @Mmoreram\Log::EXEC_BOTH - Logged both before and after controller execution

@Get

The Get annotation allows you to get any parameter from the request query string.

For a GET request like:

GET /my-page?foo=bar HTTP/1.1

You can simply get the foo var using the Get annotation

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Get;

/**
 * Simple controller method
 *
 * @Get(
 *     path = "foo"
 * )
 */
public function indexAction($foo)
{
    // Use the foo var
}

You can also customize the var name and the default value used when the var is not sent in the query string.

For a GET request like:

GET /my-page HTTP/1.1

And this annotation

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Get;

/**
 * Simple controller method
 *
 * @Get(
 *     path = "foo",
 *     name = "varName",
 *     default = 'bar',
 * )
 */
public function indexAction($varName)
{
    // This would print 'bar'
    echo $varName;
}

@Post

The Post annotation allows you to get any parameter from the post request body.

For a POST request like:

POST /my-page HTTP/1.1
foo=bar

You can simply get the foo var using the Post annotation

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Post;

/**
 * Simple controller method
 *
 * @Post(
 *     path = "foo"
 * )
 */
public function indexAction($foo)
{
    // Use the foo var
}

You can also customize the var name and the default value used when the var is not sent in the request body.

For a POST request like:

POST /my-page HTTP/1.1

And this annotation

<?php

use Mmoreram\ControllerExtraBundle\Annotation\Post;

/**
 * Simple controller method
 *
 * @Post(
 *     path = "foo",
 *     name = "varName",
 *     default = 'bar',
 * )
 */
public function indexAction($varName)
{
    // This would print 'bar'
    echo $varName;
}

Custom annotations

Using this bundle you can now create your own controller annotations in a very easy way.

Annotation

The annotation object. You need to define the fields your custom annotation will contain. It must extend the Mmoreram\ControllerExtraBundle\Annotation\Annotation abstract class.

<?php

namespace My\Bundle\Annotation;

use Mmoreram\ControllerExtraBundle\Annotation\Annotation;

/**
 * Entity annotation driver
 *
 * @Annotation
 * @Target({"METHOD"})
 */
final class MyCustomAnnotation extends Annotation
{
    /**
     * @var string
     *
     * Dummy field
     */
    public $field;
    
    /**
     * Get Dummy field
     *
     * @return string Dummy field
     */
    public function getField()
    {
        return $this->field;
    }
}

Resolver

Once you have defined your own annotation, you have to resolve how this annotation works in a controller. You can manage this using a Resolver. It must extend the Mmoreram\ControllerExtraBundle\Resolver\AnnotationResolver abstract class.

<?php

namespace My\Bundle\Resolver;

use ReflectionMethod;

use Symfony\Component\HttpFoundation\Request;

use Mmoreram\ControllerExtraBundle\Resolver\AnnotationResolver;
use Mmoreram\ControllerExtraBundle\Annotation\Annotation;

/**
 * MyCustomAnnotation Resolver
 */
class MyCustomAnnotationResolver extends AnnotationResolver
{
    /**
     * Specific annotation evaluation.
     *
     * This method must be implemented in every single EventListener
     * with specific logic
     *
     * All method code will be executed only if the specific active flag is true
     *
     * @param Request          $request
     * @param Annotation       $annotation
     * @param ReflectionMethod $method
     */
    public function evaluateAnnotation(
        Request $request,
        Annotation $annotation,
        ReflectionMethod $method
    )
    {
        /**
         * You can now manage your annotation.
         * You can access to its fields using public methods.
         * 
         * Annotation fields can be public and can be accessed directly,
         * but for testing it is better to use getters; they can be mocked.
         */
        $field = $annotation->getField();
        
        /**
         * You can also access to existing method parameters.
         * 
         * Available parameters are:
         * 
         * # ParamConverter parameters ( See `resolver_priority` config value )
         * # All method defined parameters, including the Request object if set.
         */
        $entity = $request->attributes->get('entity');
        
        /**
         * And you can now place new elements in the controller action.
         * In this example we are creating new method parameter
         * called $myNewField with some value
         */
        $request->attributes->set(
            'myNewField',
            new $field()
        );
        
        return $this;
    }

}

This class will be defined as a service, and this method is executed just before the current controller. You can also subscribe to some kernel events and do whatever you need to do (you can check Mmoreram\ControllerExtraBundle\Resolver\LogAnnotationResolver for some examples).

Definition

Once the Resolver is done, we need to define our service as an Annotation Resolver. We will use a custom tag.

parameters:
    #
    # Resolvers
    #
    my.bundle.resolver.my_custom_annotation_resolver.class: My\Bundle\Resolver\MyCustomAnnotationResolver

services:
    #
    # Resolvers
    #
    my.bundle.resolver.my_custom_annotation_resolver:
        class: %my.bundle.resolver.my_custom_annotation_resolver.class%
        tags:
            - { name: controller_extra.annotation }

Registration

We need to register our annotation inside our application. We can do it in the boot() method of the bundle file.

<?php

namespace My\Bundle;

use Symfony\Component\HttpKernel\Bundle\Bundle;
use Doctrine\Common\Annotations\AnnotationRegistry;

/**
 * MyBundle
 */
class ControllerExtraBundle extends Bundle
{

    /**
     * Boots the Bundle.
     */
    public function boot()
    {
        $kernel = $this->container->get('kernel');

        AnnotationRegistry::registerFile($kernel
            ->locateResource("@MyBundle/Annotation/MyCustomAnnotation.php")
        );
    }
}

Et voilà! We can now use our custom Annotation in our project controllers.
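
For example, a hypothetical controller using the annotation and resolver defined above (the class name in field is just an illustration; the resolver instantiates it and injects it as $myNewField):

<?php

use My\Bundle\Annotation\MyCustomAnnotation;

/**
 * Simple controller method
 *
 * @MyCustomAnnotation(
 *      field = "My\Bundle\Entity\MyEntity"
 * )
 */
public function indexAction($myNewField)
{
    // $myNewField holds a new My\Bundle\Entity\MyEntity instance,
    // placed in the request attributes by MyCustomAnnotationResolver
}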


Download Details:

Author: mmoreram
Source Code: https://github.com/mmoreram/ControllerExtraBundle

License: MIT license

#symfony #php 

Brook Legros

TinyTDS: FreeTDS Bindings for Ruby using DB-Library

TinyTDS - Simple and fast FreeTDS bindings for Ruby using DB-Library.

  • TravisCI - TravisCI
  • Build Status - Appveyor
  • Gem Version - Gem Version
  • Gitter chat - Community

About TinyTDS

The TinyTDS gem is meant to serve the extremely common use case of connecting to, querying, and iterating over results from Microsoft SQL Server or Sybase databases from Ruby, using FreeTDS's DB-Library API.

TinyTDS offers automatic casting to Ruby primitives along with proper encoding support. It converts all SQL Server datatypes to native Ruby primitives while supporting :utc or :local time zones for time-like types. To date it is the only Ruby client library that allows client encoding options, defaulting to UTF-8, while connecting to SQL Server. It also properly encodes all string and binary data. The motivation for TinyTDS is to become the de-facto low level connection mode for the SQL Server Adapter for ActiveRecord.

The API is simple and consists of these classes:

  • TinyTds::Client - Your connection to the database.
  • TinyTds::Result - Returned from issuing an #execute on the connection. It includes Enumerable.
  • TinyTds::Error - A wrapper for all FreeTDS exceptions.
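
A minimal sketch of how these three classes fit together (connection options here are placeholders; see the usage sections below for details):

require 'tiny_tds'

begin
  # TinyTds::Client - your connection to the database
  client = TinyTds::Client.new username: 'sa', password: 'secret', host: 'mydb.host.net'

  # TinyTds::Result - returned from #execute, iterable via #each
  result = client.execute("SELECT 1 AS [one]")
  result.each do |row|
    puts row # => {"one"=>1}
  end
rescue TinyTds::Error => e
  # TinyTds::Error - a wrapper for all FreeTDS exceptions
  puts e.message
ensure
  client.close if client
end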

Install

Installing with rubygems should just work. TinyTDS is currently tested on Ruby version 2.0.0 and upward.

$ gem install tiny_tds

If you use Windows, we pre-compile TinyTDS with static versions of FreeTDS and supporting libraries. If you're using RubyInstaller, the binary gem will require that the devkit is installed and in your path to operate properly.

On all other platforms, we will find these dependencies. It is recommended that you install the latest FreeTDS via your method of choice. For example, here is how to install FreeTDS on Ubuntu. You might also need the build-essential and possibly the libc6-dev packages.

$ apt-get install wget
$ apt-get install build-essential
$ apt-get install libc6-dev

$ wget http://www.freetds.org/files/stable/freetds-1.1.24.tar.gz
$ tar -xzf freetds-1.1.24.tar.gz
$ cd freetds-1.1.24
$ ./configure --prefix=/usr/local --with-tdsver=7.3
$ make
$ make install

Please read the MiniPortile and/or Windows sections at the end of this file for advanced configuration options past the following:

--with-freetds-dir=DIR
  Use the freetds library placed under DIR.

Getting Started

Optionally, Microsoft has done a great job writing some articles on how to get started with SQL Server and Ruby using TinyTDS. Please checkout one of the following posts that match your platform.

FreeTDS Compatibility & Configuration

TinyTDS is developed against FreeTDS 0.95, 0.99, and 1.0 current. Our default and recommended version is 1.0. We also test with SQL Server 2008, 2014, and Azure. However, using TinyTDS with SQL Server 2000 or 2005 should be just fine. Below are a few Q&A-style notes about installing FreeTDS.

NOTE: Windows users of our pre-compiled native gems need not worry about installing FreeTDS and its dependencies.

Do I need to install FreeTDS? Yes! Somehow, someway, you are going to need FreeTDS for TinyTDS to compile against.

OK, I am installing FreeTDS, how do I configure it? Contrary to what most people think, you do not need to specially configure FreeTDS in any way for client libraries like TinyTDS to use it. About the only requirement is that you compile it with libiconv for proper encoding support. FreeTDS must also be compiled with OpenSSL (or the like) to use it with Azure. See the "Using TinyTDS with Azure" section below for more info.

Do I need to configure --with-tdsver equal to anything? Most likely! Technically you should not have to. This is only a default for clients/configs that do not specify what TDS version they want to use. We are currently having issues with passing down a TDS version with the login bit. Till we get that fixed, if you are not using a freetds.conf or a TDSVER environment variable, then make sure to use 7.1.

But I want to use TDS version 7.2 for SQL Server 2005 and up! TinyTDS uses TDS version 7.1 (previously named 8.0) and fully supports all the data types supported by FreeTDS; this includes varchar(max) and nvarchar(max). Technically, compiling and using TDS version 7.2 with FreeTDS is not supported. But this does not mean those data types will not work. I know, it's confusing. If you want to learn more, read this thread: http://lists.ibiblio.org/pipermail/freetds/2011q3/027306.html

I want to configure FreeTDS using --enable-msdblib and/or --enable-sybase-compat so it works for my database. Cool? It's a waste of time and totally moot! Client libraries like TinyTDS define their own C structure names where they diverge from Sybase to SQL Server. Technically we use the MSDBLIB structures which does not mean we only work with that database vs Sybase. These configs are just a low level default for C libraries that do not define what they want. So I repeat, you do not NEED to use any of these, nor will they hurt anything since we control what C structure names we use internally!

Data Types

Our goal is to support every SQL Server data type and convert it to a logical Ruby object. When dates or times are returned, they are instantiated to either :utc or :local time depending on the query options. Only [datetimeoffset] types are excluded. All strings are associated to the connection's encoding and all binary data types are associated to Ruby's ASCII-8BIT/BINARY encoding.

Below is a list of the data types we support when using the 7.3 TDS protocol version. Using a lower protocol version will result in these types being returned as strings.

  • [date]
  • [datetime2]
  • [datetimeoffset]
  • [time]
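
For example, to make sure these come back as native Ruby objects rather than strings, you could request TDS 7.3 explicitly when connecting (the :tds_version option is documented below; credentials are placeholders):

client = TinyTds::Client.new username: 'sa', password: 'secret',
                             host: 'mydb.host.net', tds_version: '7.3'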

TinyTds::Client Usage

Connect to a database.

client = TinyTds::Client.new username: 'sa', password: 'secret', host: 'mydb.host.net'

Creating a new client takes a hash of options. For valid iconv encoding options, see the output of iconv -l. Only a few have been tested, and it is highly recommended to leave this blank and use the UTF-8 default.

  • :username - The database server user.
  • :password - The user password.
  • :dataserver - Can be the name for your data server as defined in freetds.conf. Raw hostname or hostname:port will work here too. FreeTDS says that named instance like 'localhost\SQLEXPRESS' work too, but I highly suggest that you use the :host and :port options below. Google how to find your host port if you are using named instances or go here.
  • :host - Used if :dataserver is blank. Can be a host name or IP.
  • :port - Defaults to 1433. Only used if :host is used.
  • :database - The default database to use.
  • :appname - Short string seen in SQL Servers process/activity window.
  • :tds_version - TDS version. Defaults to "7.3".
  • :login_timeout - Seconds to wait for login. Defaults to 60 seconds.
  • :timeout - Seconds to wait for a response to a SQL command. Default 5 seconds. Prior to 1.0rc5, FreeTDS was unable to set the timeout on a per-client basis, permitting only a global timeout value. This means that if you're using an older version, the timeout values for all clients will be overwritten each time you instantiate a new TinyTds::Client object. If you are using 1.0rc5 or later, all clients will have an independent timeout setting as you'd expect. Timeouts caused by network failure will raise a timeout error 1 second after the configured timeout limit is hit (see #481 for details).
  • :encoding - Any valid iconv value like CP1251 or ISO-8859-1. Default UTF-8.
  • :azure - Pass true to signal that you are connecting to azure.
  • :contained - Pass true to signal that you are connecting with a contained database user.
  • :use_utf16 - Instead of using UCS-2 for database wide character encoding use UTF-16. Newer Windows versions use this encoding instead of UCS-2. Default true.
  • :message_handler - Pass in a call-able object such as a Proc or a method to receive info messages from the database. It should have a single parameter, which will be a TinyTds::Error object representing the message. For example:
opts = ... # host, username, password, etc
opts[:message_handler] = Proc.new { |m| puts m.message }
client = TinyTds::Client.new opts
# => Changed database context to 'master'.
# => Changed language setting to us_english.
client.execute("print 'hello world!'").do
# => hello world!

Use the #active? method to determine if a connection is good. The implementation of this method may change, but it should always guarantee that a connection is good. Currently, it checks for either a closed or dead connection.

client.dead?    # => false
client.closed?  # => false
client.active?  # => true
client.execute("SQL TO A DEAD SERVER")
client.dead?    # => true
client.closed?  # => false
client.active?  # => false
client.close
client.closed?  # => true
client.active?  # => false

Escape strings.

client.escape("How's It Going'") # => "How''s It Going''"

Send a SQL string to the database and return a TinyTds::Result object.

result = client.execute("SELECT * FROM [datatypes]")

TinyTds::Result Usage

A result object is returned by the client's execute command. It is important that you either read the data from the query, most likely with the #each method, or cancel the results before asking the client to execute another SQL batch. Failing to do so will yield an error.

Calling #each on the result will lazily load each row from the database.

result.each do |row|
  # By default each row is a hash.
  # The keys are the fields, as you'd expect.
  # The values are pre-built Ruby primitives mapped from their corresponding types.
end

A result object has a #fields accessor. It can be called before the result rows are iterated over. Even if no rows are returned, #fields will still return the column names you expected. Any SQL that does not return columned data will always return an empty array for #fields. It is important to remember that if you access #fields before iterating over the results, the columns will always follow the default query options' :symbolize_keys setting at the client level and will ignore the query options passed to #each.

result = client.execute("USE [tinytdstest]")
result.fields # => []
result.do

result = client.execute("SELECT [id] FROM [datatypes]")
result.fields # => ["id"]
result.cancel
result = client.execute("SELECT [id] FROM [datatypes]")
result.each(:symbolize_keys => true)
result.fields # => [:id]

You can cancel a result object's data from being loaded by the server.

result = client.execute("SELECT * FROM [super_big_table]")
result.cancel

You can use result cancelation in conjunction with lazy loading of results, no problem.

result = client.execute("SELECT * FROM [super_big_table]")
result.each_with_index do |row, i|
  break if i > 10  # stop early using the row index; `row` itself is a hash
end
result.cancel

If the SQL executed by the client returns affected rows, you can easily find out how many.

result.each
result.affected_rows # => 24

This pattern is so common for UPDATE and DELETE statements that the #do method cancels any need for loading the result data and returns the #affected_rows.

result = client.execute("DELETE FROM [datatypes]")
result.do # => 72

Likewise, for INSERT statements, the #insert method cancels any need for loading the result data and executes SCOPE_IDENTITY() to return the primary key.

result = client.execute("INSERT INTO [datatypes] ([xml]) VALUES ('<html><br/></html>')")
result.insert # => 420

The result object can handle multiple result sets from batched SQL or stored procedures. It is critical to remember that calling #each with a block for the first time will yield each "row" of each result set. Calling #each a second time with a block will yield each "set".

sql = ["SELECT TOP (1) [id] FROM [datatypes]",
       "SELECT TOP (2) [bigint] FROM [datatypes] WHERE [bigint] IS NOT NULL"].join(' ')

set1, set2 = client.execute(sql).each
set1 # => [{"id"=>11}]
set2 # => [{"bigint"=>-9223372036854775807}, {"bigint"=>9223372036854775806}]

result = client.execute(sql)

result.each do |rowset|
  # First time data loading, yields each row from each set.
  # 1st: {"id"=>11}
  # 2nd: {"bigint"=>-9223372036854775807}
  # 3rd: {"bigint"=>9223372036854775806}
end

result.each do |rowset|
  # Second time over (if columns cached), yields each set.
  # 1st: [{"id"=>11}]
  # 2nd: [{"bigint"=>-9223372036854775807}, {"bigint"=>9223372036854775806}]
end

Use the #sqlsent? and #canceled? query methods on the client to determine if an active SQL batch still needs to be processed and/or if data results were canceled from the last result object. These values reset to true and false respectively for the client at the start of each #execute and new result object. If all rows are processed normally, #sqlsent? will return false. To demonstrate, let's assume we have 100 rows in the result object.

client.sqlsent?   # = false
client.canceled?  # = false

result = client.execute("SELECT * FROM [super_big_table]")

client.sqlsent?   # = true
client.canceled?  # = false

result.each do |row|
  # Assume we break after 20 rows with 80 still pending.
  break if row["id"] > 20
end

client.sqlsent?   # = true
client.canceled?  # = false

result.cancel

client.sqlsent?   # = false
client.canceled?  # = true

It is possible to get the return code after executing a stored procedure from either the result or client object.

client.return_code  # => nil

result = client.execute("EXEC tinytds_TestReturnCodes")
result.do
result.return_code  # => 420
client.return_code  # => 420

Query Options

Every TinyTds::Result object can pass query options to the #each method. The defaults are defined and configurable by setting options in the TinyTds::Client.default_query_options hash. The default values are:

  • :as => :hash - Object for each row yielded. Can be set to :array.
  • :symbolize_keys => false - Row hash keys. Defaults to shared/frozen string keys.
  • :cache_rows => true - Successive calls to #each return the cached rows.
  • :timezone => :local - Local to the Ruby client or :utc for UTC.
  • :empty_sets => true - Include empty result sets in queries that return multiple result sets.

Each result gets a copy of the default options you specify at the client level, which can be overridden by passing an options hash to the #each method. For example:

result.each(:as => :array, :cache_rows => false) do |row|
  # Each row is now an array of values ordered by #fields.
  # Rows are yielded and forgotten about, freeing memory.
end

Besides the standard query options, the result object can take one additional option. Using :first => true will load only the first row of data and cancel all remaining results.

result = client.execute("SELECT * FROM [super_big_table]")
result.each(:first => true) # => [{'id' => 24}]

Row Caching

By default, row caching is turned on because the SQL Server adapter for ActiveRecord would not work without it. I hope to find some time to create some performance patches for ActiveRecord that would allow it to take advantage of lazily yielded rows from result objects. Currently, only TinyTDS and the Mysql2 gem allow such a performance gain.

Encoding Error Handling

TinyTDS takes an opinionated stance on how we handle encoding errors. First, we treat errors differently on reads vs. writes. Our opinion is that if you are reading bad data due to your client's encoding option, you would rather just find ? marks in your strings than be blocked with exceptions. This is how things would work via ODBC or SSMS. On the other hand, writes will raise an exception. In this case we raise the SYBEICONVO/2402 error message, which has a description of "Error converting characters into server's character set. Some character(s) could not be converted.". Even though the severity of this message is only a 4 and TinyTDS will automatically strip/ignore unknown characters, we feel you should know that you are inserting bad encodings. In this way, a transaction can be rolled back, etc. Remember, any database write that has bad characters due to the client encoding will still be written to the database, but it is up to you to roll back said write if needed. Most ORMs like ActiveRecord handle this scenario just fine.

Timeout Error Handling

TinyTDS will raise a TinyTDS::Error when a timeout is reached based on the options supplied to the client. Depending on the reason for the timeout, the connection could be dead or alive. When database processing is the cause of the timeout, the connection should still be usable after the error is raised. When network failure is the cause of the timeout, the connection will be dead. If you attempt to execute another command batch on a dead connection, you will see a DBPROCESS is dead or not enabled error. Therefore, it is recommended to check for a dead? connection before trying to execute another command batch.

Binstubs

The TinyTDS gem uses binstub wrappers which mirror compiled FreeTDS Utilities binaries. These native executables are usually installed at the system level when installing FreeTDS. However, when using MiniPortile to install TinyTDS, as we do with Windows binaries, these binstubs will find and prefer the local gem exe directory executables. The following are the binstubs we wrap.

  • tsql - Used to test connections and debug compile time settings.
  • defncopy - Used to dump schema structures.

Using TinyTDS With Rails & The ActiveRecord SQL Server adapter.

TinyTDS is the default connection mode for the SQL Server adapter in versions 3.1 or higher. The SQL Server adapter can be found using the links below.

Using TinyTDS with Azure

TinyTDS is fully tested with the Azure platform. You must set the azure: true connection option when connecting. This is needed to specify the default database name in the login packet since Azure has no notion of USE [database]. FreeTDS must be compiled with OpenSSL too.

IMPORTANT: Do not use username@server.database.windows.net for the username connection option! You must use the shorter username@server instead!

Also, please read the Azure SQL Database General Guidelines and Limitations MSDN article to understand the differences. Specifically, the connection constraints section!

Connection Settings

A DBLIB connection does not have the same default SET options as a standard SSMS SQL Server connection. Hence, we recommend the following options after establishing your connection.

SQL Server

SET ANSI_DEFAULTS ON

SET QUOTED_IDENTIFIER ON
SET CURSOR_CLOSE_ON_COMMIT OFF
SET IMPLICIT_TRANSACTIONS OFF
SET TEXTSIZE 2147483647
SET CONCAT_NULL_YIELDS_NULL ON

Azure

SET ANSI_NULLS ON
SET ANSI_NULL_DFLT_ON ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON

SET QUOTED_IDENTIFIER ON
SET CURSOR_CLOSE_ON_COMMIT OFF
SET IMPLICIT_TRANSACTIONS OFF
SET TEXTSIZE 2147483647
SET CONCAT_NULL_YIELDS_NULL ON

Thread Safety

TinyTDS must be used with a connection pool for thread safety. If you use ActiveRecord or the Sequel gem, this is done for you. However, if you are using TinyTDS on your own, we recommend using the ConnectionPool gem when using threads.

Please read our thread_test.rb file for details on how we test its usage.

Emoji Support 😍

This is possible using FreeTDS version 0.95 or higher. You must use the use_utf16 login option or add the following config to your freetds.conf, in either the global section or a specific dataserver. If you are on Windows, the default location for your conf file will be in C:\Sites.

[global]
  use utf-16 = true

This option defaults to true, and since FreeTDS v1.0 the library behaves this way as well.

Compiling Gems for Windows

For the convenience of Windows users, TinyTDS ships pre-compiled gems for supported versions of Ruby on Windows. In order to generate these gems, rake-compiler-dock is used. This project provides several Docker images with rvm, cross-compilers and a number of different target versions of Ruby.

Run the following rake task to compile the gems for Windows. This will check the availability of Docker (and boot2docker on Windows or OS X) and will give some advice for download and installation. When Docker is running, it will download the Docker image (once only) and start the build:

$ rake gem:windows

The compiled gems will be placed in the ./pkg directory.

Development & Testing

First, clone the repo using the command line or your Git GUI of choice.

$ git clone git@github.com:rails-sqlserver/tiny_tds.git

After that, the quickest way to get setup for development is to use Docker. Assuming you have downloaded docker for your platform, you can use docker-compose to run the necessary containers for testing.

$ docker-compose up -d

This will download our SQL Server for Linux Docker image based on microsoft/mssql-server-linux. Our image already has the [tinytdstest] DB and tinytds users created. This will also download a Toxiproxy Docker image, which we can use to simulate network failures for tests. Basically, it does the following:

$ docker network create main-network
$ docker pull metaskills/mssql-server-linux-tinytds
$ docker run -p 1433:1433 -d --name sqlserver --network main-network metaskills/mssql-server-linux-tinytds
$ docker pull shopify/toxiproxy
$ docker run -p 8474:8474 -p 1234:1234 -d --name toxiproxy --network main-network shopify/toxiproxy

If you are using your own database, make sure to run these SQL commands as SA to get the test database and user installed.

CREATE DATABASE [tinytdstest];
CREATE LOGIN [tinytds] WITH PASSWORD = '', CHECK_POLICY = OFF, DEFAULT_DATABASE = [tinytdstest];
USE [tinytdstest];
CREATE USER [tinytds] FOR LOGIN [tinytds];
EXEC sp_addrolemember N'db_owner', N'tinytds';

From here you can build and run tests against an installed version of FreeTDS.

$ bundle install
$ bundle exec rake

Examples of using environment variables to customize the test task:

$ rake TINYTDS_UNIT_DATASERVER=mydbserver
$ rake TINYTDS_UNIT_DATASERVER=mydbserver TINYTDS_SCHEMA=sqlserver_2008
$ rake TINYTDS_UNIT_HOST=mydb.host.net TINYTDS_SCHEMA=sqlserver_azure
$ rake TINYTDS_UNIT_HOST=mydb.host.net TINYTDS_UNIT_PORT=5000 TINYTDS_SCHEMA=sybase_ase

Docker Builds

If you use a multi-stage Docker build to assemble your gems in one phase and then copy your app and gems into another, lighter container without build tools, you will need to make sure you tell the OS how to find the dependencies for TinyTDS.

After you have built and installed FreeTDS, it will normally place library files in /usr/local/lib. When TinyTDS builds native extensions, it already knows to look there, but if you copy your app to a new container, that link will be broken.

Set the LD_LIBRARY_PATH environment variable (export LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}) and run ldconfig. If you run ldd tiny_tds.so, you should not see any broken links. Make sure you also copied the library dependencies from your build container with a command like COPY --from=builder /usr/local/lib /usr/local/lib.

Help & Support

About Me

My name is Ken Collins and I currently maintain the SQL Server adapter for ActiveRecord and wrote this library as my first cut into learning Ruby C extensions. Hopefully it will help promote the power of Ruby and the Rails framework to those that have not yet discovered it. My blog is metaskills.net and I can be found on twitter as @metaskills. Enjoy!

License

TinyTDS is Copyright (c) 2010-2015 Ken Collins, ken@metaskills.net and Will Bond (Veracross LLC) wbond@breuer.com. It is distributed under the MIT license. Windows binaries contain pre-compiled versions of FreeTDS http://www.freetds.org/ which is licensed under the GNU LGPL license at http://www.gnu.org/licenses/lgpl-2.0.html


Author: rails-sqlserver
Source code: https://github.com/rails-sqlserver/tiny_tds
License:

#ruby   #ruby-on-rails 

Hertha  Mayer

Hertha Mayer

1594769515

How to validate mobile phone number in laravel with example

Data validation and sanitization is very important from a security point of view for a web application. We cannot rely on the user's input. In this article, I will show you how to validate a mobile phone number in Laravel with some examples.

When we collect a user's information in our application, we usually collect a phone number too. If validation on the mobile number field is not done, a user can put anything in that field, and without a genuine phone number this data would be useless.

Since we know that a mobile number cannot be alphanumeric or contain alphabets, and that it should be a 10-digit number, in these examples we will add 10-digit number validation to a Laravel application.

We will also see the use of regex in mobile number validation. So let's do it in two different ways with two examples.

Example 1:

In this first example, we will write the phone number validation in HomeController, where we will process the user's data.

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use App\User;

class HomeController extends Controller
{
    /**
     * Show the application dashboard.
     *
     * @return \Illuminate\Http\Response
     */
    public function create()
    {
        return view('createUser');
    }

    /**
     * Show the application dashboard.
     *
     * @return \Illuminate\Http\Response
     */
    public function store(Request $request)
    {
        $request->validate([
                'name' => 'required',
                'phone' => 'required|digits:10',
                'email' => 'required|email|unique:users'
            ]);

        $input = $request->all();
        $user = User::create($input);

        return back()->with('success', 'User created successfully.');
    }
}

Example 2:

In this second example, we will use regex to validate the user's mobile phone number before storing the user data in our database. Here, we will write the validation in HomeController like below.

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use App\User;
use Validator;

class HomeController extends Controller
{
    /**
     * Show the application dashboard.
     *
     * @return \Illuminate\Http\Response
     */
    public function create()
    {
        return view('createUser');
    }

    /**
     * Show the application dashboard.
     *
     * @return \Illuminate\Http\Response
     */
    public function store(Request $request)
    {
        $request->validate([
                'name' => 'required',
                'phone' => 'required|regex:/^([0-9\s\-\+\(\)]*)$/|min:10',
                'email' => 'required|email|unique:users'
            ]);

        $input = $request->all();
        $user = User::create($input);

        return back()->with('success', 'User created successfully.');
    }
}

#laravel #laravel phone number validation #laravel phone validation #laravel validation example #mobile phone validation in laravel #phone validation with regex #validate mobile in laravel

How to Build a Fake News Detector in Python

Exploring the fake news dataset, performing data analysis such as word clouds and ngrams, and fine-tuning the BERT transformer to build a fake news detector in Python using the transformers library.

Fake news is the intentional broadcasting of false or misleading claims as news, where the statements are purposely deceitful.

Newspapers, tabloids, and magazines have been supplanted by digital news platforms, blogs, social media feeds, and a plethora of mobile news applications. News organizations have benefited from the increased use of social media and mobile platforms by providing subscribers with up-to-the-minute information.

Consumers now have instant access to the latest news. These digital media platforms have risen in prominence due to their easy connectedness to the rest of the world, and they allow users to discuss and share ideas and debate topics such as democracy, education, health, research, and history. Fake news items on digital platforms are getting more popular and are used for profit, such as political and financial gain.

How Big is this Problem?

Because the Internet, social media, and digital platforms are widely used, anybody can propagate inaccurate and biased information. It is almost impossible to prevent the spread of fake news. There is a tremendous surge in the distribution of fake news, which is not restricted to one sector such as politics but includes sports, health, history, entertainment, and science and research.

The Solution

It is vital to recognize and differentiate between fake and accurate news. One method is to have an expert decide and fact-check every piece of information, but this takes time and requires expertise that cannot be shared. Secondly, we can use machine learning and artificial intelligence tools to automate the identification of fake news.

Online news information includes various unstructured-format data (such as documents, videos, and audio), but we will concentrate on text-format news here. With the progress of machine learning and natural language processing, we can now recognize the misleading and false character of an article or statement.

Several studies and experiments are being conducted to detect fake news across all media.

Our main goals in this tutorial are:

  • Explore and analyze the Fake News dataset.
  • Build a classifier that can distinguish fake news with as much accuracy as possible.

Here is the table of contents:

  • Introduction
  • How Big is this Problem?
  • The Solution
  • Data Exploration
    • Class Distribution
  • Data Cleaning for Analysis
  • Exploratory Data Analysis
    • Single-word Cloud
    • Most Frequent Bigram (Two-word Combination)
    • Most Frequent Trigram (Three-word Combination)
  • Building a Classifier by Fine-tuning BERT
    • Data Preparation
    • Tokenizing the Dataset
    • Loading and Fine-tuning the Model
    • Model Evaluation
  • Appendix: Creating a Submission File for Kaggle
  • Conclusion

Data Exploration

In this work, we used the fake news dataset from Kaggle to classify untrustworthy news articles as fake news. We have a complete training dataset containing the following characteristics:

  • id: unique ID for a news article
  • title: title of a news article
  • author: author of the news article
  • text: text of the article; could be incomplete
  • label: a label that marks the article as potentially unreliable, denoted by 1 (unreliable or fake) or 0 (reliable)

It is a binary classification problem in which we must predict whether a particular news story is reliable or not.

If you have a Kaggle account, you can simply download the dataset from the website there and extract the ZIP file.

I have also uploaded the dataset to Google Drive, and you can get it here or use the gdown library to download it automatically in Google Colab or Jupyter notebooks:

$ pip install gdown
# download from Google Drive
$ gdown "https://drive.google.com/uc?id=178f_VkNxccNidap-5-uffXUW475pAuPy&confirm=t"
Downloading...
From: https://drive.google.com/uc?id=178f_VkNxccNidap-5-uffXUW475pAuPy&confirm=t
To: /content/fake-news.zip
100% 48.7M/48.7M [00:00<00:00, 74.6MB/s]

Unzipping the files:

$ unzip fake-news.zip

Three files will appear in the current working directory: train.csv, test.csv, and submit.csv; we will use train.csv for most of the tutorial.

Installing the required dependencies:

$ pip install transformers nltk pandas numpy matplotlib seaborn wordcloud

Note: If you are in a local environment, make sure you install PyTorch for GPU; head to this page for a proper installation.

Let's import the essential libraries for the analysis:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

The NLTK corpora and modules must be installed using the standard NLTK downloader:

import nltk
nltk.download('stopwords')
nltk.download('wordnet')

The fake news dataset comprises original and fictitious article titles and text from various authors. Let's import our dataset:

# load the dataset
news_d = pd.read_csv("train.csv")
print("Shape of News data:", news_d.shape)
print("News data columns", news_d.columns)

Output:

 Shape of News data: (20800, 5)
 News data columns Index(['id', 'title', 'author', 'text', 'label'], dtype='object')

Here is what the dataset looks like:

# by using df.head(), we can immediately familiarize ourselves with the dataset. 
news_d.head()

Output:

id	title	author	text	label
0	0	House Dem Aide: We Didn’t Even See Comey’s Let...	Darrell Lucus	House Dem Aide: We Didn’t Even See Comey’s Let...	1
1	1	FLYNN: Hillary Clinton, Big Woman on Campus - ...	Daniel J. Flynn	Ever get the feeling your life circles the rou...	0
2	2	Why the Truth Might Get You Fired	Consortiumnews.com	Why the Truth Might Get You Fired October 29, ...	1
3	3	15 Civilians Killed In Single US Airstrike Hav...	Jessica Purkiss	Videos 15 Civilians Killed In Single US Airstr...	1
4	4	Iranian woman jailed for fictional unpublished...	Howard Portnoy	Print \nAn Iranian woman has been sentenced to...	1

We have 20,800 rows, which have five columns. Let's see some statistics of the text column:

#Text Word startistics: min.mean, max and interquartile range

txt_length = news_d.text.str.split().str.len()
txt_length.describe()

Output:

count    20761.000000
mean       760.308126
std        869.525988
min          0.000000
25%        269.000000
50%        556.000000
75%       1052.000000
max      24234.000000
Name: text, dtype: float64

Statistics for the title column:

#Title statistics 

title_length = news_d.title.str.split().str.len()
title_length.describe()

Output:

count    20242.000000
mean        12.420709
std          4.098735
min          1.000000
25%         10.000000
50%         13.000000
75%         15.000000
max         72.000000
Name: title, dtype: float64

The statistics for the training and testing sets are as follows:

  • The text attribute has a higher word count, with an average of 760 words; the 75th percentile is around 1050 words.
  • The title attribute is a short statement, with an average of 12 words, and 75% of them are around 15 words.

Our experiment will use both text and title together.

Class Distribution

Count plot for both labels:

sns.countplot(x="label", data=news_d);
print("1: Unreliable")
print("0: Reliable")
print("Distribution of labels:")
print(news_d.label.value_counts());

Output:

1: Unreliable
0: Reliable
Distribution of labels:
1    10413
0    10387
Name: label, dtype: int64

Label distribution

print(round(news_d.label.value_counts(normalize=True),2)*100);

Output:

1    50.0
0    50.0
Name: label, dtype: float64

The number of unreliable articles (fake, or 1) is 10,413, while the number of trustworthy articles (reliable, or 0) is 10,387. Almost 50% of the articles are fake, so the classes are balanced. Therefore, the accuracy metric will be a fair measure of our model's performance when building a classifier.
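To put that in numbers, here is a quick sanity check (a minimal sketch, not part of the original tutorial) that computes the majority-class baseline any useful classifier has to beat; with balanced classes it sits at roughly 50%:

# majority-class baseline: always predicting the most common label
baseline_acc = news_d.label.value_counts(normalize=True).max()
print(f"Majority-class baseline accuracy: {baseline_acc:.2%}")  # ~50.06%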

Data Cleaning for Analysis

In this section, we will clean our dataset to do some analysis:

  • Drop unused rows and columns.
  • Perform null value imputation.
  • Remove special characters.
  • Remove stop words.
# Constants that are used to sanitize the datasets 

column_n = ['id', 'title', 'author', 'text', 'label']
remove_c = ['id','author']
categorical_features = []
target_col = ['label']
text_f = ['title', 'text']
# Clean Datasets
import nltk
from nltk.corpus import stopwords
import re
from nltk.stem.porter import PorterStemmer
from collections import Counter

ps = PorterStemmer()
wnl = nltk.stem.WordNetLemmatizer()

stop_words = stopwords.words('english')
stopwords_dict = Counter(stop_words)

# Remove unused columns
def remove_unused_c(df,column_n=remove_c):
    df = df.drop(column_n,axis=1)
    return df

# Impute null values with None
def null_process(feature_df):
    for col in text_f:
        feature_df.loc[feature_df[col].isnull(), col] = "None"
    return feature_df

def clean_dataset(df):
    # remove unused column
    df = remove_unused_c(df)
    #impute null values
    df = null_process(df)
    return df

# Cleaning text from unused characters
# Note: str.replace() does not interpret regex, so we use re.sub() for these patterns
def clean_text(text):
    text = re.sub(r'http[\w:/\.]+', ' ', str(text))  # removing urls
    text = re.sub(r'[^\.\w\s]', ' ', text)  # remove everything but characters and punctuation
    text = re.sub(r'[^a-zA-Z]', ' ', text)  # keep letters only
    text = re.sub(r'\s\s+', ' ', text)  # collapse repeated whitespace
    text = text.lower().strip()
    return text

## NLTK preprocessing includes:
# stop word removal, stemming, and lemmatization.
# For our project we use stop word removal plus lemmatization
def nltk_preprocess(text):
    text = clean_text(text)
    wordlist = re.sub(r'[^\w\s]', '', text).split()
    #text = ' '.join([word for word in wordlist if word not in stopwords_dict])
    #text = [ps.stem(word) for word in wordlist if not word in stopwords_dict]
    text = ' '.join([wnl.lemmatize(word) for word in wordlist if word not in stopwords_dict])
    return  text

In the code block above:

  • We imported NLTK, which is a famous platform for developing Python applications that interact with human language. Next, we imported re for regex.
  • We imported stopwords from nltk.corpus. When working with words, particularly when considering semantics, we sometimes need to eliminate common words that do not add any significant meaning to a statement, such as "but", "can", "we", etc.
  • PorterStemmer is used to perform stemming with NLTK. Stemmers strip words of their morphological affixes, leaving only the word stem.
  • We imported WordNetLemmatizer() from the NLTK library for lemmatization. Lemmatization is much more effective than stemming. It goes beyond word reduction and evaluates a language's entire lexicon to apply morphological analysis to words, with the goal of just removing inflectional endings and returning the base or dictionary form of a word, known as the lemma. See the short stemming vs. lemmatization comparison after this list.
  • stopwords.words('english') lets us view the list of all the English stop words supported by NLTK.
  • The remove_unused_c() function is used to remove the unused columns.
  • We impute null values with None using the null_process() function.
  • Inside the clean_dataset() function, we call the remove_unused_c() and null_process() functions. This function is responsible for cleaning the data.
  • To clean the text of unused characters, we created the clean_text() function.
  • For preprocessing, we use stop word removal together with lemmatization. We created the nltk_preprocess() function for that purpose.
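To make the stemming vs. lemmatization distinction concrete, here is a short comparison (a minimal sketch; the example words are our own, not from the dataset):

from nltk.stem.porter import PorterStemmer
from nltk.stem import WordNetLemmatizer

ps = PorterStemmer()
wnl = WordNetLemmatizer()
print(ps.stem("studies"))                # => 'studi'  (crude suffix stripping)
print(wnl.lemmatize("studies"))          # => 'study'  (dictionary base form)
print(ps.stem("better"))                 # => 'better' (no stemming rule applies)
print(wnl.lemmatize("better", pos="a"))  # => 'good'   (WordNet knows the lemma)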

Preprocessing the text and title:

# Perform data cleaning on train and test dataset by calling clean_dataset function
df = clean_dataset(news_d)
# apply preprocessing on text through apply method by calling the function nltk_preprocess
df["text"] = df.text.apply(nltk_preprocess)
# apply preprocessing on title through apply method by calling the function nltk_preprocess
df["title"] = df.title.apply(nltk_preprocess)
# Dataset after cleaning and preprocessing step
df.head()

Output:

title	text	label
0	house dem aide didnt even see comeys letter ja...	house dem aide didnt even see comeys letter ja...	1
1	flynn hillary clinton big woman campus breitbart	ever get feeling life circle roundabout rather...	0
2	truth might get fired	truth might get fired october 29 2016 tension ...	1
3	15 civilian killed single u airstrike identified	video 15 civilian killed single u airstrike id...	1
4	iranian woman jailed fictional unpublished sto...	print iranian woman sentenced six year prison ...	1

Exploratory Data Analysis

In this section, we will perform:

  • Univariate Analysis: a statistical analysis of the text. We will use a word cloud for this purpose. A word cloud is a text data visualization approach in which the most common term is presented in the most considerable font size.
  • Bivariate Analysis: bigrams and trigrams will be used here. According to Wikipedia: "an n-gram is a contiguous sequence of n items from a given sample of text or speech. According to the application, the items can be phonemes, syllables, letters, words, or base pairs. The n-grams are typically collected from a text or speech corpus".

Single-word Cloud

The most frequent words appear in a bold, bigger font in a word cloud. This section will make a word cloud of all the words in the dataset.

The WordCloud() function from the wordcloud library will be used, and generate() is utilized for generating the word cloud image:

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# initialize the word cloud
wordcloud = WordCloud( background_color='black', width=800, height=600)
# generate the word cloud by passing the corpus
text_cloud = wordcloud.generate(' '.join(df['text']))
# plotting the word cloud
plt.figure(figsize=(20,30))
plt.imshow(text_cloud)
plt.axis('off')
plt.show()

Output:

Word cloud for all the fake news data

Word cloud for reliable news only:

true_n = ' '.join(df[df['label']==0]['text']) 
wc = wordcloud.generate(true_n)
plt.figure(figsize=(20,30))
plt.imshow(wc)
plt.axis('off')
plt.show()

Output:

Word cloud for reliable news

Word cloud for fake news only:

fake_n = ' '.join(df[df['label']==1]['text'])
wc= wordcloud.generate(fake_n)
plt.figure(figsize=(20,30))
plt.imshow(wc)
plt.axis('off')
plt.show()

Output:

Word cloud for fake news

Most Frequent Bigram (Two-word Combination)

An N-gram is a sequence of letters or words. A character unigram is made up of a single character, while a bigram comprises a series of two characters. Similarly, word N-grams are made up of a series of n words. The word "united" is a 1-gram (unigram). The combination of the words "united states" is a 2-gram (bigram), and "new york city" is a 3-gram.
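Before plotting, here is a tiny sketch of what nltk.ngrams produces; the sample sentence is made up for illustration:

import nltk

tokens = "new york city is in new york state".split()
print(list(nltk.ngrams(tokens, 2))[:3])  # => [('new', 'york'), ('york', 'city'), ('city', 'is')]
print(list(nltk.ngrams(tokens, 3))[:2])  # => [('new', 'york', 'city'), ('york', 'city', 'is')]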

Let's plot the most common bigrams in the reliable news:

def plot_top_ngrams(corpus, title, ylabel, xlabel="Number of Occurrences", n=2):
  """Utility function to plot top n-grams"""
  true_b = (pd.Series(nltk.ngrams(corpus.split(), n)).value_counts())[:20]
  true_b.sort_values().plot.barh(color='blue', width=.9, figsize=(12, 8))
  plt.title(title)
  plt.ylabel(ylabel)
  plt.xlabel(xlabel)
  plt.show()
plot_top_ngrams(true_n, 'Top 20 Frequently Occurring True news Bigrams', "Bigram", n=2)

Top bigrams in reliable news

The most common bigrams in the fake news:

plot_top_ngrams(fake_n, 'Top 20 Frequently Occurring Fake news Bigrams', "Bigram", n=2)

Top bigrams in fake news

Most Frequent Trigram (Three-word Combination)

The most common trigrams in the reliable news:

plot_top_ngrams(true_n, 'Top 20 Frequently Occurring True news Trigrams', "Trigrams", n=3)

Most common trigrams in reliable news

Now for the fake news:

plot_top_ngrams(fake_n, 'Top 20 Frequently Occurring Fake news Trigrams', "Trigrams", n=3)

Most common trigrams in fake news

The above plots give us some idea of what the two classes look like. In the next section, we will use the transformers library to build a fake news detector.

Building a Classifier by Fine-tuning BERT

This section borrows code extensively from the fine-tuning BERT tutorial to make a fake news classifier using the transformers library. So, for more detailed information, you can head to the original tutorial.

If you haven't installed transformers, you have to:

$ pip install transformers

Let's import the necessary libraries:

import torch
from transformers.file_utils import is_tf_available, is_torch_available, is_torch_tpu_available
from transformers import BertTokenizerFast, BertForSequenceClassification
from transformers import Trainer, TrainingArguments
import numpy as np
from sklearn.model_selection import train_test_split

import random

We want to make our results reproducible even if we restart our environment:

def set_seed(seed: int):
    """
    Helper function for reproducible behavior to set the seed in ``random``, ``numpy``, ``torch`` and/or ``tf`` (if
    installed).

    Args:
        seed (:obj:`int`): The seed to set.
    """
    random.seed(seed)
    np.random.seed(seed)
    if is_torch_available():
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # ^^ safe to call this function even if cuda is not available
    if is_tf_available():
        import tensorflow as tf

        tf.random.set_seed(seed)

set_seed(1)

The model we're going to use is bert-base-uncased:

# the model we're going to train: base uncased BERT
# check text classification models here: https://huggingface.co/models?filter=text-classification
model_name = "bert-base-uncased"
# max sequence length for each document/sentence sample
max_length = 512

Loading the tokenizer:

# load the tokenizer
tokenizer = BertTokenizerFast.from_pretrained(model_name, do_lower_case=True)

Data Preparation

Let's now clean the NaN values from the text, author, and title columns:

news_df = news_d[news_d['text'].notna()]
news_df = news_df[news_df["author"].notna()]
news_df = news_df[news_df["title"].notna()]

Next, we create a function that takes the dataset as a Pandas dataframe and returns the train/validation splits of the texts and labels as lists:

def prepare_data(df, test_size=0.2, include_title=True, include_author=True):
  texts = []
  labels = []
  for i in range(len(df)):
    text = df["text"].iloc[i]
    label = df["label"].iloc[i]
    if include_title:
      text = df["title"].iloc[i] + " - " + text
    if include_author:
      text = df["author"].iloc[i] + " : " + text
    if text and label in [0, 1]:
      texts.append(text)
      labels.append(label)
  return train_test_split(texts, labels, test_size=test_size)

train_texts, valid_texts, train_labels, valid_labels = prepare_data(news_df)

The above function takes the dataset as a dataframe and returns it as lists split into training and validation sets. Setting include_title to True means we add the title column to the text we're going to use for training, and setting include_author to True means we also add the author to the text.
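For example, if you wanted to train on the article body alone, you could call it like this (just an illustrative sketch; the rest of the tutorial keeps the defaults):

# a variant call that skips the title and author columns
train_texts, valid_texts, train_labels, valid_labels = prepare_data(
    news_df, test_size=0.2, include_title=False, include_author=False)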

Let's make sure the labels and texts have the same length:

print(len(train_texts), len(train_labels))
print(len(valid_texts), len(valid_labels))

Output:

14628 14628
3657 3657

Tokenizing the Dataset

Let's use the BERT tokenizer to tokenize our dataset:

# tokenize the dataset, truncate when passed `max_length`, 
# and pad with 0's when less than `max_length`
train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=max_length)
valid_encodings = tokenizer(valid_texts, truncation=True, padding=True, max_length=max_length)

Converting the encodings into a PyTorch dataset:

class NewsGroupsDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor([self.labels[idx]])
        return item

    def __len__(self):
        return len(self.labels)

# convert our tokenized data into a torch Dataset
train_dataset = NewsGroupsDataset(train_encodings, train_labels)
valid_dataset = NewsGroupsDataset(valid_encodings, valid_labels)

Loading and Fine-tuning the Model

We'll use BertForSequenceClassification to load our BERT transformer model:

# load the model
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

We set num_labels to 2 since this is a binary classification task. The function below is a callback to calculate the accuracy on each validation step:

from sklearn.metrics import accuracy_score

def compute_metrics(pred):
  labels = pred.label_ids
  preds = pred.predictions.argmax(-1)
  # calculate accuracy using sklearn's function
  acc = accuracy_score(labels, preds)
  return {
      'accuracy': acc,
  }

Let's initialize the training parameters:

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=1,              # total number of training epochs
    per_device_train_batch_size=10,  # batch size per device during training
    per_device_eval_batch_size=20,   # batch size for evaluation
    warmup_steps=100,                # number of warmup steps for learning rate scheduler
    logging_dir='./logs',            # directory for storing logs
    load_best_model_at_end=True,     # load the best model when finished training (default metric is loss)
    # but you can specify `metric_for_best_model` argument to change to accuracy or other metric
    logging_steps=200,               # log & save weights each logging_steps
    save_steps=200,
    evaluation_strategy="steps",     # evaluate each `logging_steps`
)

I've set per_device_train_batch_size to 10, but you should set it as high as your GPU can fit. We set logging_steps and save_steps to 200, which means we will perform evaluation and save the model weights every 200 training steps.

You can check this page for more detailed information on the available training parameters.

Let's instantiate the trainer:

trainer = Trainer(
    model=model,                         # the instantiated Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=valid_dataset,          # evaluation dataset
    compute_metrics=compute_metrics,     # the callback that computes metrics of interest
)

Training the model:

# train the model
trainer.train()

The training takes a few hours to finish, depending on your GPU. If you're on the free version of Colab, it should take about an hour with an NVIDIA Tesla K80. Here is the output:

***** Running training *****
  Num examples = 14628
  Num Epochs = 1
  Instantaneous batch size per device = 10
  Total train batch size (w. parallel, distributed & accumulation) = 10
  Gradient Accumulation steps = 1
  Total optimization steps = 1463
 [1463/1463 41:07, Epoch 1/1]
Step	Training Loss	Validation Loss	Accuracy
200		0.250800		0.100533		0.983867
400		0.027600		0.043009		0.993437
600		0.023400		0.017812		0.997539
800		0.014900		0.030269		0.994258
1000	0.022400		0.012961		0.998086
1200	0.009800		0.010561		0.998633
1400	0.007700		0.010300		0.998633
***** Running Evaluation *****
  Num examples = 3657
  Batch size = 20
Saving model checkpoint to ./results/checkpoint-200
Configuration saved in ./results/checkpoint-200/config.json
Model weights saved in ./results/checkpoint-200/pytorch_model.bin
<SNIPPED>
***** Running Evaluation *****
  Num examples = 3657
  Batch size = 20
Saving model checkpoint to ./results/checkpoint-1400
Configuration saved in ./results/checkpoint-1400/config.json
Model weights saved in ./results/checkpoint-1400/pytorch_model.bin

Training completed. Do not forget to share your model on huggingface.co/models =)

Loading best model from ./results/checkpoint-1400 (score: 0.010299865156412125).
TrainOutput(global_step=1463, training_loss=0.04888018785440506, metrics={'train_runtime': 2469.1722, 'train_samples_per_second': 5.924, 'train_steps_per_second': 0.593, 'total_flos': 3848788517806080.0, 'train_loss': 0.04888018785440506, 'epoch': 1.0})

Model Evaluation

Since load_best_model_at_end is set to True, the best weights will be loaded when the training is complete. Let's evaluate it with our validation set:

# evaluate the current model after training
trainer.evaluate()

Output:

***** Running Evaluation *****
  Num examples = 3657
  Batch size = 20
 [183/183 02:11]
{'epoch': 1.0,
 'eval_accuracy': 0.998632759092152,
 'eval_loss': 0.010299865156412125,
 'eval_runtime': 132.0374,
 'eval_samples_per_second': 27.697,
 'eval_steps_per_second': 1.386}

Saving the model and tokenizer:

# saving the fine tuned model & tokenizer
model_path = "fake-news-bert-base-uncased"
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

A new folder containing the model configuration and weights will appear after running the above cell. If you want to perform prediction later, you can simply reload it with the from_pretrained() method we used when we loaded the model, and you're good to go.
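For instance, reloading the fine-tuned model for inference might look like this (a minimal sketch; it assumes a CUDA GPU is available, as in the rest of the tutorial):

from transformers import BertTokenizerFast, BertForSequenceClassification

model_path = "fake-news-bert-base-uncased"  # the folder we saved above
model = BertForSequenceClassification.from_pretrained(model_path).to("cuda")
tokenizer = BertTokenizerFast.from_pretrained(model_path)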

Next, let's make a function that accepts the article text as an argument and returns whether it's fake or not:

def get_prediction(text, convert_to_label=False):
    # prepare our text into tokenized sequence
    inputs = tokenizer(text, padding=True, truncation=True, max_length=max_length, return_tensors="pt").to("cuda")
    # perform inference to our model
    outputs = model(**inputs)
    # get output probabilities by doing softmax
    probs = outputs[0].softmax(1)
    # executing argmax function to get the candidate label
    d = {
        0: "reliable",
        1: "fake"
    }
    if convert_to_label:
      return d[int(probs.argmax())]
    else:
      return int(probs.argmax())

I've taken an example from test.csv, which the model has never seen, to perform inference. I checked it, and it's an actual article from The New York Times:

real_news = """
Tim Tebow Will Attempt Another Comeback, This Time in Baseball - The New York Times",Daniel Victor,"If at first you don’t succeed, try a different sport. Tim Tebow, who was a Heisman   quarterback at the University of Florida but was unable to hold an N. F. L. job, is pursuing a career in Major League Baseball. <SNIPPED>
"""

The original text is in the Colab environment in case you want to copy it, as it is a complete article. Let's pass it to the model and see the results:

get_prediction(real_news, convert_to_label=True)

Output:

reliable

Appendix: Creating a Submission File for Kaggle

In this section, we will predict all the articles in test.csv to create a submission file and see our accuracy on the test set of the Kaggle competition:

# read the test set
test_df = pd.read_csv("test.csv")
# make a copy of the testing set
new_df = test_df.copy()
# add a new column that contains the author, title and article content
new_df["new_text"] = new_df["author"].astype(str) + " : " + new_df["title"].astype(str) + " - " + new_df["text"].astype(str)
# get the prediction of all the test set
new_df["label"] = new_df["new_text"].apply(get_prediction)
# make the submission file
final_df = new_df[["id", "label"]]
final_df.to_csv("submit_final.csv", index=False)

After we concatenate the author, title, and article text together, we apply the get_prediction() function to the new column to fill the label column; then we use the to_csv() method to create the submission file for Kaggle. Here's my submission score:

Submission score

We got 99.78% and 100% accuracy on the private and public leaderboards, respectively. That's awesome!

Conclusion

Alright, we're done with this tutorial. You can check this page to see the various training parameters you can tweak.

If you have a custom fake news dataset for fine-tuning, you only have to pass a list of samples to the tokenizer as we did; you won't need to change any other code after that. See the short sketch below.
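A minimal sketch of wiring in your own data; custom_texts and custom_labels are hypothetical names standing in for your own lists:

# hypothetical custom data; replace with your own lists
custom_texts = ["First article body ...", "Second article body ..."]
custom_labels = [0, 1]  # 0 = reliable, 1 = fake

enc = tokenizer(custom_texts, truncation=True, padding=True, max_length=max_length)
custom_dataset = NewsGroupsDataset(enc, custom_labels)
# custom_dataset can then be passed to Trainer as train_dataset / eval_dataset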

Check the full code here, or the Colab environment here.