Elton  Bogan

1602896400

Multi-domain text classification via Labelur

Labelur allows you to classify multi-domain text via a REST API. Labelur uses a modern zero-shot learning technique for multi-domain text classification, so you don't need to train a custom model.

What is Labelur?

Labelur is an online service that performs multi-domain text classification. It’s effortless to get started! Simply use any programming language of your choice, such as Python, Java, Node, or PHP (who’s still using PHP?) to send a POST request to Labelur’s API server with your text data. Labelur will reply with the label for the given text.
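The request flow described above could look roughly like the following Node.js (18+) sketch. Labelur's actual endpoint URL, authentication scheme, and JSON field names are not given here, so the `{ text }` request body, the `label` response field, and the `classifyText` helper itself are hypothetical placeholders; check Labelur's own documentation for the real contract. The optional `fetchImpl` parameter exists only so the function can be exercised without a network connection.

```javascript
// Hypothetical sketch of a Labelur-style classification call.
// ASSUMPTIONS (not taken from Labelur's documentation):
//   - the caller supplies the endpoint URL
//   - the request body is JSON of the form { text: "..." }
//   - the response body is JSON containing a "label" field
async function classifyText(text, endpoint, fetchImpl = globalThis.fetch) {
  if (typeof text !== "string" || text.length === 0) {
    throw new Error("text must be a non-empty string");
  }

  // Send the text to the API server as a JSON POST request
  const response = await fetchImpl(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });

  if (!response.ok) {
    throw new Error(`Request failed with HTTP status ${response.status}`);
  }

  // Parse the JSON reply and return the (assumed) label field
  const data = await response.json();
  return data.label;
}
```

For example, `classifyText("Spacious two-bedroom condo near downtown", "<your Labelur endpoint>")` would resolve to whatever label the service returns, assuming the request and response shapes above match the real API.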

How Labelur is different from other text classification services

Most text classification services only work on domain-specific data, such as sentiment analysis or legal document classification, or perform general text classification for generic topics. If you want to build a classifier for a specific domain, such as real estate, you need training data to create it. Traditionally, if you don’t have data to train a custom model, you cannot produce a good text classifier.

Labelur uses modern advances in NLP to classify text in most general domains. It uses knowledge distillation and zero-shot learning to classify texts. This is an excellent starting point if you don’t have your own data or don’t want to train and deploy your own ML models.

Navigating Between DOM Nodes in JavaScript

In the previous chapters you've learnt how to select individual elements on a web page. But there are many occasions where you need to access a child, parent or ancestor element. See the JavaScript DOM nodes chapter to understand the logical relationships between the nodes in a DOM tree.

DOM nodes provide several properties and methods that let you navigate, or traverse, the tree structure of the DOM and make changes very easily. In the following section we will learn how to navigate up, down, and sideways in the DOM tree using JavaScript.

Accessing the Child Nodes

You can use the firstChild and lastChild properties of a DOM node to access its first and last direct child node, respectively. If the node doesn't have any child nodes, these properties return null.

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");
console.log(main.firstChild.nodeName); // Prints: #text

var hint = document.getElementById("hint");
console.log(hint.firstChild.nodeName); // Prints: SPAN
</script>

Note: nodeName is a read-only property that returns the name of the current node as a string. For example, it returns the tag name for an element node, #text for a text node, #comment for a comment node, #document for the document node, and so on.

In the example above, the nodeName of the first child node of the main DIV element is #text instead of H1. This is because whitespace such as spaces, tabs, and newlines consists of valid characters that form #text nodes, which become part of the DOM tree. Since the <div> tag is followed by a newline before the <h1> tag, a #text node is created.

To avoid firstChild and lastChild returning #text or #comment nodes, you can use the firstElementChild and lastElementChild properties instead, which return only the first and last element nodes, respectively. However, these properties do not work in IE 8 and earlier.
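If you still need to support those older browsers, one common workaround is to walk the sibling chain yourself and skip every node whose nodeType is not 1 (an element node). The helpers below are an illustrative sketch; the names getFirstElementChild and getLastElementChild are hypothetical, and they rely only on the long-standing firstChild, lastChild, nextSibling, previousSibling, and nodeType properties.

```javascript
// Node.ELEMENT_NODE is 1; defined locally so the helpers also run outside a browser
var ELEMENT_NODE = 1;

// Return the first child that is an element node, skipping #text and #comment nodes
function getFirstElementChild(node) {
    var child = node.firstChild;
    while (child && child.nodeType !== ELEMENT_NODE) {
        child = child.nextSibling;
    }
    return child || null;
}

// Return the last child that is an element node, walking backwards from lastChild
function getLastElementChild(node) {
    var child = node.lastChild;
    while (child && child.nodeType !== ELEMENT_NODE) {
        child = child.previousSibling;
    }
    return child || null;
}

// Usage in a page with the markup above:
// var main = document.getElementById("main");
// console.log(getFirstElementChild(main).nodeName); // H1
// console.log(getLastElementChild(main).nodeName);  // P
```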

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");
alert(main.firstElementChild.nodeName); // Outputs: H1
main.firstElementChild.style.color = "red";

var hint = document.getElementById("hint");
alert(hint.firstElementChild.nodeName); // Outputs: SPAN
hint.firstElementChild.style.color = "blue";
</script>

Similarly, you can use the childNodes property to access all child nodes of a given element, where the first child node is assigned index 0. Here's an example:

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");

// First check that the element has child nodes 
if(main.hasChildNodes()) {
    var nodes = main.childNodes;
    
    // Loop through node list and display node name
    for(var i = 0; i < nodes.length; i++) {
        alert(nodes[i].nodeName);
    }
}
</script>

The childNodes property returns all child nodes, including non-element nodes like text and comment nodes. To get a collection of only element nodes, use the children property instead.

Example

<div id="main">
    <h1 id="title">My Heading</h1>
    <p id="hint"><span>This is some text.</span></p>
</div>

<script>
var main = document.getElementById("main");

// First check that the element has child nodes 
if(main.hasChildNodes()) {
    var nodes = main.children;
    
    // Loop through the element collection and display each element's name
    for(var i = 0; i < nodes.length; i++) {
        alert(nodes[i].nodeName);
    }
}
</script>
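To see how children relates to childNodes, you can build a similar list by filtering childNodes down to element nodes (nodeType 1). This is a sketch for illustration; getElementChildren is a hypothetical helper, and unlike the live HTMLCollection returned by children, it returns a static array that does not update if the DOM changes later.

```javascript
// Return a static array containing only the element children of a node.
// nodeType 1 is an element node; text nodes are 3 and comment nodes are 8.
function getElementChildren(node) {
    return Array.prototype.filter.call(node.childNodes, function(child) {
        return child.nodeType === 1;
    });
}

// Usage in a page with the markup above:
// var main = document.getElementById("main");
// var elements = getElementChildren(main);
// for (var i = 0; i < elements.length; i++) {
//     console.log(elements[i].nodeName); // H1, then P
// }
```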

Adaline  Kulas

1594162500

Multi-cloud Spending: 8 Tips To Lower Cost

A multi-cloud approach means using two or more cloud platforms to meet the various business requirements of an enterprise. A multi-cloud IT environment combines clouds from multiple vendors and removes dependence on a single public cloud service provider. Enterprises can therefore choose specific services from multiple public clouds and reap the benefits of each.

Given its affordability and agility, most enterprises now opt for a multi-cloud approach. A 2018 survey of the public cloud services market found that 81% of respondents use services from two or more providers. Meanwhile, the cloud computing services market has grown rapidly in recent years, and according to IDC, the worldwide public cloud services market is set to reach $500 billion within the next four years.

By choosing multi-cloud solutions strategically, enterprises can maximize the benefits of cloud computing and gain key competitive advantages. They can avoid the lengthy and cumbersome process of buying, installing, and testing high-priced systems. IaaS and PaaS solutions have been a windfall for enterprise budgets because they do not require huge up-front capital expenditure.

However, cost optimization remains a challenge in a multi-cloud environment, and many enterprises end up overpaying, whether or not they realize it. The tips below can help you make sure money is spent wisely on cloud computing services.

  • Deactivate underused or unattached resources

Many organizations get simple things wrong, and these mistakes turn out to be the root cause of needless spending and wasted resources. The first step to cost optimization in your cloud strategy is to identify underutilized resources that you have been paying for.

Enterprises often continue to pay for resources that were purchased earlier but are no longer useful. Identifying such unused and unattached resources and deactivating them regularly brings you one step closer to cost optimization. If needed, you can deploy automated cloud management tools that provide the analytics needed to optimize cloud spending and cut costs on an ongoing basis.

  • Figure out idle instances

Another key cost optimization strategy is to identify idle computing instances and consolidate them into fewer instances. An idle instance may run at only 1-5% CPU utilization, yet the service provider still bills you for the full instance.

Many enterprises have such non-production instances, which occupy unnecessary storage space and lead to overpaying. Re-evaluating your resource allocations regularly and removing unnecessary storage can save a significant amount of money. Resource allocation is not only a matter of CPU and memory; it also involves storage, network, and various other factors.
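As a rough illustration of how idle instances might be spotted, the sketch below averages CPU utilization samples per instance and flags anything under a threshold. The input shape, the instance names, the sample numbers, and the 5% default threshold are all hypothetical; in practice the samples would come from your cloud provider's monitoring service, whose API is not shown here.

```javascript
// Hypothetical helper: flag instances whose average CPU utilization is below a threshold.
// Each instance is assumed to look like { id: string, cpuSamples: number[] } (percent values).
function findIdleInstances(instances, thresholdPercent = 5) {
  return instances
    // Skip instances with no data rather than treating them as idle
    .filter((inst) => inst.cpuSamples.length > 0)
    // Compute the average utilization for each instance
    .map((inst) => {
      const total = inst.cpuSamples.reduce((sum, value) => sum + value, 0);
      return { id: inst.id, avgCpu: total / inst.cpuSamples.length };
    })
    // Keep only instances under the threshold, least utilized first
    .filter((inst) => inst.avgCpu < thresholdPercent)
    .sort((a, b) => a.avgCpu - b.avgCpu);
}

// Made-up sample data for illustration only
const sample = [
  { id: "web-1", cpuSamples: [42, 55, 61] },
  { id: "batch-7", cpuSamples: [2, 3, 1] },
  { id: "test-3", cpuSamples: [4, 6, 4] },
];

console.log(findIdleInstances(sample)); // flags batch-7 and test-3
```

Flagged instances are candidates for consolidation, not automatic shutdown; a short sampling window can miss periodic workloads.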

  • Deploy monitoring mechanisms

The key to efficient cost reduction in cloud computing technology lies in proactive monitoring. A comprehensive view of the cloud usage helps enterprises to monitor and minimize unnecessary spending. You can make use of various mechanisms for monitoring computing demand.

For instance, you can use a heatmap to visualize the highs and lows in computing demand. The heatmap reveals suitable start and stop times for your instances, which in turn can reduce costs. You can also deploy automated tools that schedule instances to start and stop. By following a heatmap, you can determine whether it is safe to shut down servers on holidays or weekends.
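The data behind such a heatmap can be sketched as follows: bucket utilization samples into a 7-by-24 grid (day of week by hour), average each cell, and list the cells that stay below a cutoff as candidate shutdown windows. The sample format (UTC timestamps in milliseconds plus a CPU percentage), the function names, and the 10% cutoff are assumptions for illustration only.

```javascript
// Hypothetical helper: average CPU utilization per (day of week, hour) cell.
// Each sample is assumed to look like { timestamp: <ms since epoch>, cpu: <percent> }.
function buildHeatmap(samples) {
  // 7 rows (Sunday = 0 to Saturday = 6) by 24 columns (hours, UTC)
  const sums = Array.from({ length: 7 }, () => new Array(24).fill(0));
  const counts = Array.from({ length: 7 }, () => new Array(24).fill(0));
  for (const { timestamp, cpu } of samples) {
    const date = new Date(timestamp);
    const day = date.getUTCDay();
    const hour = date.getUTCHours();
    sums[day][hour] += cpu;
    counts[day][hour] += 1;
  }
  // Average each cell; cells with no samples stay null
  return sums.map((row, day) =>
    row.map((sum, hour) => (counts[day][hour] > 0 ? sum / counts[day][hour] : null))
  );
}

// Hypothetical helper: list cells whose average stays below the cutoff,
// i.e. candidate windows for scheduled shutdowns.
function lowUsageWindows(heatmap, cutoffPercent = 10) {
  const windows = [];
  heatmap.forEach((row, day) => {
    row.forEach((avg, hour) => {
      if (avg !== null && avg < cutoffPercent) {
        windows.push({ day, hour, avg });
      }
    });
  });
  return windows;
}
```

Each `{ day, hour }` pair returned by `lowUsageWindows` could feed a start/stop schedule, ideally only after several weeks of samples confirm the pattern.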

Daron  Moore

1598404620

Hands-on Guide to Pattern - A Python Tool for Effective Text Processing and Data Mining

Text processing mainly relies on Natural Language Processing (NLP), which processes data so that a machine can understand human language through an application or product. Using NLP, we can derive information such as sentiment and polarity from textual data, which is useful for building text processing applications.

Python has several open-source libraries and modules, some of them built on top of NLTK, that help with text processing using NLP functions. Different libraries offer different functionalities for extracting meaningful results from data. One such library is Pattern.

Pattern is an open-source Python library that performs various NLP tasks. It is mostly used for text processing because of the many functionalities it provides. Beyond text processing, Pattern is also used for data mining, i.e., extracting data from sources such as Twitter and Google using its data mining functions.

In this article, we will try and cover the following points:

  • NLP Functionalities of Pattern
  • Data Mining Using Pattern

Ramya M

1608022599

Top 10 Multi vendor Marketplace Platform Providers 2022

Ecommerce has greatly changed the way people buy and sell. Online business is growing exponentially, and buying and selling now happen at the customer's doorstep. Multi seller ecommerce platforms have become the next level in the ecommerce niche.

Ecommerce marketplace platforms like Amazon, Flipkart, and eBay have already succeeded in the industry and set milestones in sales and revenue.

This has inspired many aspiring entrepreneurs to move their brick-and-mortar stores to multi vendor platforms.

What is Online Multi vendor Marketplace?

A multi vendor marketplace platform connects multiple sellers or vendors, who display and sell their products through the platform after agreeing to the terms set by the platform's admin. Each vendor can promote its products in its own way.

Here is a list of Top 10 Best Turnkey Multi vendor Marketplace Platforms:

Now that you have a better understanding of the key features for Multi vendor marketplace, let’s compare ten of the top Multi vendor providers.

1. Zielcommerce – All in One Multi vendor eCommerce Marketplace Platform

Zielcommerce is white-label, enterprise-grade online marketplace software. The multi vendor platform comes with a one-time payment option, and it is completely customizable and scalable.

Platform Highlights

  • The feature-rich UI and UX consistently attract users to the multi vendor marketplace.
  • The platform is user-friendly and compatible with all devices.
  • It’s a convenient marketplace solution that provides the all-round service required for a multi vendor marketplace platform.
  • This also supports easy brand recognition and you can get more visitors to your ecommerce platform.

Zielcommerce provides its users with a secure environment through its SSL-certified marketplace software, which helps earn users' trust. Its SEO-optimized platform makes it easy to promote your marketplace online, and its built-in communication channels let you stay connected with your customers at all times.

The pleasing features of this multi vendor marketplace solution

  • Multilingual and multiple currency support
  • Multiple payment options to facilitate buyers
  • Real-time tracking feature to track orders online
  • Wide delivery option for buyers’ convenience
  • A dedicated mobile application that will suit your business demands
  • Review and rating system to enhance the performance of the platform
  • 24/7 technical support from our end.

Best Use Cases

Client’s Rating

  • Ease of use: 4.5/5
  • Customer service: 4.7/5
  • Overall: 4.6/5

Explore Zielcommerce Multi vendor Ecommerce Platform

2. X-cart – a well-known multi vendor marketplace platform solution

X-cart is a standalone online marketplace solution for your online business needs. This multi vendor marketplace software offers comprehensive features that can meet customers’ expectations. X-cart takes a genuine approach, and users trust it for its outstanding functionality, which satisfies the demands of a multi vendor market.

Platform Highlights

  • X-cart stands out in the market by providing multi vendor marketplace solutions that help users increase their online credibility.
  • If you want to build a multi vendor marketplace platform like Amazon or eBay, X-cart is a perfect choice.

It has gained the trust of thousands of users, and people who use X-cart as their online multi vendor marketplace software have given it excellent reviews.

The salient features of this multi vendor ecommerce platform solution

  • One-time payment to purchase the marketplace software
  • In-built marketing and promotion tools to promote the multi vendor marketplace software
  • Trusted payment gateways integrated with the website
  • Reliable order management system.
  • On-time delivery management

Best Use Cases

  • Hyperlocal Ecommerce Platform
  • Jewellery e-commerce Store
  • Electronics Ecommerce Platform
  • Furniture Ecommerce Platform

Client’s Rating:

  • Ease of use: 3.5/5
  • Customer service: 3.4/5
  • Overall: 3.5/5

Explore Xcart Multi vendor Marketplace Software

3. CS-Cart – a perfect multi vendor marketplace solution

CS-Cart has never disappointed its users and it comes with the complete ecommerce marketplace solution for all your business demands. You can gain perfect control over the online multi vendor marketplace platform and can personalize the platform to suit your business needs. You will get higher visibility and can easily attract your target audience with the CS-Cart marketplace solution.

Platform Highlights

  • CS-Cart is the most reliable ecommerce marketplace platform and gives users a user-friendly experience.
  • The interface is easily understandable and no technical knowledge is needed to maintain the multi vendor marketplace software.

Its multilingual support helps you gain the attention of global audiences, take your brand all over the world, and build a strong brand with CS-Cart.

The key features of this multi vendor ecommerce website solution

  • SEO-friendly platform that will help you get top rankings in search engines.
  • Mobile-friendly, which will bring you more mobile users as customers.
  • Get genuine customer care support
  • Well-integrated with all third-party software.
  • The online multi vendor ecommerce platform will have social media logins

Best use cases

Client’s Rating:

  • Ease of use: 4.1/5
  • Customer service: 4.3/5
  • Overall: 4.2/5

Explore Cscart Online Marketplace Software (https://www.cs-cart.com)

4. Arcadier – Superlative Multi vendor marketplace platform

Arcadier is a SaaS (Software-as-a-Service) provider that allows businesses, SMEs, local communities, government agencies, and entrepreneurs to manage their online multi vendor marketplace platforms more efficiently and effectively. Arcadier has many attractive features that can grab the attention of vendors.

Platform Highlights

  • A platform can only be called a true ecommerce marketplace platform when it has multiple vendors.
  • Arcadier makes attracting multiple vendors quite easy.
  • The genuine support that comes with this online multi vendor enterprise marketplace platform helps retain both your vendors and your buyers, and the software adapts to your business needs.

Unlike other SaaS online marketplace platforms that offer a generic, one-size-fits-all solution, Arcadier lets users choose among multiple business models, from buying and selling products or services to renting out spaces.

The prominent features of this online multi vendor marketplace software solution

  • Sellers can add variations to each listing and attach images and surcharges to every variant.
  • User-friendly platform that helps you get top rankings in search engines.
  • Manually configure specific dates and hours for your ads on the calendar.
  • Complete customer support assistance

Best Use Cases

  • Clothing and accessories multi vendor ecommerce platform
  • Handicrafts Online marketplace platform
  • Hair stylist scheduling software
  • On demand movies online marketplace platform

Client’s Rating:

  • Ease of use: 3.2/5
  • Customer service: 3/5
  • Overall: 3.1/5

Explore Arcadier Multi vendor Marketplace Platform

5. Bigcommerce - Exquisite Multi Vendor Marketplace Software

A multi vendor marketplace solution that converts your single-admin online store into a multi vendor marketplace. It lets you add vendors and keep track of their orders and sales. Apart from vendor features, Bigcommerce offers buyer features that will impress buyers and encourage them to purchase products on your online multi vendor ecommerce platform.

Platform Highlights

  • Offers and discount features are available that will delight buyers and will make them refer more customers to your best ecommerce marketplace platform.
  • You can also easily retain your customers by keeping them informed about new arrivals and offers.
  • As a store admin, you have back-end access and can control and manage products, orders, vendors, and vendors' products.

It comes with an option under which a product is not visible on the storefront until the admin approves it. This online multi vendor marketplace platform also offers excellent features for creating various plans for vendors and a payment management system for vendors.

Impressive features of this online multi vendor marketplace software solution

  • Flexible product approval functionality
  • An admin has full access to manage the seller’s profile and products
  • Synchronizing products and orders from the Bigcommerce store to the marketplace
  • The admin can easily create a “Payment” for the seller once a product is out of stock.

Best Use Cases

  • Online salon scheduling software for salon owners
  • Educational books online marketplace platform
  • Food delivery multi seller ecommerce platform
  • Fashion and clothing ecommerce platform

Explore Bigcommerce Multi vendor Ecommerce Platform

6. IXXO - Ideal Multi Vendor marketplace software

IXXO is an ideal marketplace solution for those who want to open and manage a high-volume marketplace, as the IXXO online multi vendor ecommerce platform offers unlimited product and unlimited vendor capacity. Marketplace owners can configure privileges for each vendor individually. This helps owners provide only basic features to vendors who have little ecommerce experience.

Platform Highlights

The platform helps ensure that deliveries take place correctly. If there is any delay, the buyer is notified through a proper messaging system. This feature impresses customers and makes the platform stand out.

Splendid features of this best ecommerce marketplace platform solution

  • Simple Checkout Process
  • A wide range of payment options
  • Prompt and friendly responses
  • Feature-rich provider dashboard.

Best Use Cases

Client’s Rating:

  • Ease of use: 4.1/5
  • Customer service: 4.3/5
  • Overall: 4.2/5

Explore IXXO Cart Online Marketplace Software

7. Sharetribe - Structured Multi vendor marketplace solutions

Sharetribe is an excellent SaaS platform for building and launching online multi vendor marketplace software. You can change your color theme and photos instantly through simple settings.

Platform Highlights

  • Thanks to its integration process, the marketplace lets users sell products or services online without any technical support.
  • Sharetribe has built-in marketing tools that make it easy to promote your brand globally with less effort.

This online multi vendor marketplace platform gives customers a perfect shopping experience and vendors a satisfying selling experience. Users can trust Sharetribe for their business requirements and get a trustworthy marketplace solution that will take their business to greater levels.

Core features of this best ecommerce marketplace platform

  • It is a comprehensive tool for customizing your marketplace.
  • Responsive Design for users, optimized for every screen.
  • Higher conversion rates and lower bounce rates.
  • A comprehensive content management system to maintain an active market with visual content.

Best Use Cases

Client’s Rating:

  • Ease of use: 3.9/5
  • Customer service: 3.7/5
  • Overall: 3.8/5

Explore Sharetribe Multi vendor Marketplace Software

8. Appdupe - Intuitive Multi vendor marketplace software

An online multi vendor marketplace platform is an online marketplace where many sellers can sign up, create their profiles, add products, and sell whenever they want. Amazon is one of the best-known examples of a multi vendor platform today. Such ecommerce marketplace software offers multiple benefits for both its users and its vendors.

Platform Highlights

  • Appdupe lets customers share reviews and rate the products they have purchased.
  • Vendors can also read the reviews written by their customers, which helps them improve their presence on the multi vendor marketplace.
  • Appdupe also supports multiple revenue models and users can select the one that perfectly suits their business and can get better returns with minimum investment.

Impressive features of this best ecommerce marketplace platform

  • It provides a hassle-free process
  • Sellers can easily download and manage their products
  • Separate dashboards for sellers and buyers
  • Analytics that help enhance ROI

Best Use Cases

9. Miva - Groundbreaking Multi vendor marketplace platform

Miva is a flexible multi seller ecommerce platform that businesses can easily modify as they evolve, offering higher conversion rates, better integrations, and complete solutions for all aspects of online sales. This online multi vendor marketplace software helps them generate revenue and increase average order value with lower operating costs.

Platform Highlights

Miva suits any business model and business size. This online multi vendor marketplace platform is very cost-effective, so even a startup planning to open an online store with minimal investment can easily choose Miva.

The online ecommerce marketplace software looks as if it has been built from scratch. It includes all the essential features needed to run a multi vendor marketplace platform successfully. All you need to do is buy the platform and launch the marketplace, and you can start earning right away.

Intuitive features of this best marketplace software

  • Admin can control and manage the review and approval of new products
  • Flexible commissions for every vendor sale based on subscription plans
  • Separate dashboard for a vendor to manage their own product listing
  • Each seller has a unique profile on the marketplace, with product limits based on membership plans

Best Use Cases

Client’s Rating:

  • Ease of use: 4/5
  • Customer service: 3.7/5
  • Overall: 3.9/5

10. Quick eSelling – A Proven Multi vendor Marketplace platform

Quick eSelling is a popular multi vendor online marketplace platform with upgraded features that make it comfortable for merchants and sellers around the world to start their own online store. Quick eSelling includes online store features for customer engagement and retention. The platform has been designed to help you significantly increase your sales and save time.

Platform Highlights

  • Quick e-selling marketplace platform allows you to set commission plan for every individual vendor.
  • You can easily analyze their performance through proper analytics and reports.
  • You can boost poorly performing vendors by charging them a lower commission percentage to increase their sales.

This will satisfy vendors and encourage them to stay with your multi vendor marketplace software for a long time. You can get complete support from the technical team around the clock, and whenever customization is needed, the team will guide you in designing your own online multi seller ecommerce platform.

The essential feature of this Multi vendor marketplace software

  • SEO-friendly for ecommerce web development and essential to ensure high traffic
  • Vendors can check sales trends through graphs and data information inputs for building strategies
  • A secure platform for transactional communication between merchants and customers.
  • It makes it easy for you to launch your online business effectively

Best Use Cases

  • Fashion and accessories marketplace platform
  • E-Book marketplace software
  • Food and beverages ecommerce platform
  • Jewellery online multi vendor ecommerce platform.

Client’s Rating:

  • Ease of use: 3.5/5
  • Customer service: 3.3/5
  • Overall: 3.4/5

11. Smartstore Z – a multifaceted marketplace platform

This is a highly recommended online marketplace platform with thousands of active users. Its advanced security features let users store customers’ data securely, and its SSL certification protects the data exchanged with the platform.

Platform highlights

The extraordinary inventory management system will let sellers maintain their stock in an effective way. The order process will never be interrupted due to a shortage of stock. Proper notifications will be sent to respective sellers whenever their stock hits the minimum value.

This multi vendor software supports thousands of templates and plugins that can be used by users to customize their marketplace to meet their business demands. Being a user-friendly marketplace platform, no technical knowledge is needed to manage the platform. The dedicated dashboard will support the admin to know the exact working condition of their business.

Features of this multi vendor ecommerce marketplace platform.

  • The platform is perfectly scalable and supports future business expansion and can hold a huge customer database for a long time.
  • Buyers benefit from multiple delivery options. They can select a convenient delivery slot and order products.
  • Users can integrate their existing software that is used for their business operations with this multi vendor ecommerce platform as it supports any third-party API.
  • Sellers are allowed to have unlimited product listings that will help them to reach their customers easily.

Best use cases

  • On-demand cab booking marketplace platform
  • Online ticket booking platform
  • Refurbished goods selling platform
  • Hyperlocal multi vendor eCommerce platform

Client’s Rating:

  • Ease of use: 3.8/5
  • Customer service: 3.4/5
  • Overall: 3.6/5

Smartstore Z is the most compatible multi vendor eCommerce store that will fit into any business model and business size.

12. Brainview – a commendable online marketplace platform

Brainview has the most enchanting UI & UX that can gain the attention of your target audience and will make them buy products on your platform. Users can get globally connected as the marketplace supports multiple languages and multiple currencies. Advanced technologies are implemented just to provide a seamless shopping experience to customers.

Platform highlights

Excellent customer support is offered by this multi vendor platform and customers can have direct communication with the concerned seller and they can get more details about the product or service before they buy. This will reduce returns and refunds. Many customer-attracting features are available in this marketplace platform like loyalty programs, referral programs, and many more.

Acquiring more customers is not at all a challenge for this best multi vendor marketplace software. Several revenue streams like commission fee, subscription fee, affiliate modules, advertisements, and many more are integrated with this online marketplace platform.

Features of this multi vendor eCommerce platform

  • Supports a robust authentication process that keeps spam users out of the marketplace.
  • A dedicated mobile application is available and users of both Android and iOS can use and get a better user experience.
  • The platform offers multiple payment options that will facilitate customers to pay and buy online through secured payment gateways.
  • Customers can enjoy offers and discounts for all products they buy through this robust multi-vendor platform and this will motivate them to refer others to your marketplace.
  • Supports social media login and sharing as it benefits both sellers and buyers to login through social media credentials and share products in their social media pages.

Best use cases

  • Grocery eCommerce marketplace platform
  • Cosmetics and fashion accessories e-stores
  • Online movies download eCommerce platform
  • Hyperlocal multi-vendor eCommerce platform
  • Online jewelry marketplace platform

Client’s Rating:

  • Ease of use: 4/5
  • Customer service: 4.2/5
  • Overall: 4.1/5

Brainview is a significant multi vendor eCommerce store that gives users 100% customization and scalability.

Types of Multi vendor Marketplace Platform

There are several types of multi vendor marketplace software on the market. You need to understand each type and identify which type of marketplace platform suits your business best.

  • Vertical marketplace – this type of marketplace software concentrates on one particular category, so you cannot find a wide range of offerings on these platforms. Etsy is a good example of a vertical marketplace, as the platform focuses on handmade crafts.
  • Horizontal marketplace – this is the opposite of a vertical marketplace: you can find many types of products and services under one roof. Amazon is a perfect example of this type of marketplace.
  • Product-based marketplace – you can find a wide range of products in this marketplace. The products can be physical goods or even digital goods. Amazon and Flipkart are product-based marketplaces.
  • Service-based marketplace – service providers will list their services like plumbing, personal care, pest control, and many more. Upwork and Fiverr are service-based marketplaces.

Start an eCommerce Business with the Best Marketplace Platform Provider

The million-dollar question on every budding entrepreneur's mind is how to start an online multi-vendor eCommerce platform. Building eCommerce marketplace software needs your full attention, and it is not as simple as you might think. Vendors and buyers will come to know your business through this platform, and it is what will earn you money, so it cannot have any flaws.

One way to get online multi-vendor marketplace software is to build it from scratch. First, you hire a reputable multi-vendor eCommerce development company with ample experience in the field. Then you explain your requirements and expectations to them.

They will develop the platform and show you a demo. During the demo session, you can request modifications and they will answer your questions. Finally, your multi-seller eCommerce platform will be ready to launch and you can start promoting it.

The major drawback of building an online multi-vendor eCommerce platform from scratch is the long wait and the higher development cost. If that works for you, go ahead. Otherwise, there is another option.

The other option is to buy ready-made multi-vendor marketplace software that already has all the essential features needed to run the platform successfully. Such software has already been tested and proven, so you can launch it right after purchase.

This gives you an instant way to set up a multi-vendor online marketplace. It is very cost-effective and highly advisable for startups that are new to this field, and you can still customize the software to suit your business needs.

Must-Have Features in a Multi-Vendor Marketplace Platform

The features built into an online multi-vendor eCommerce platform determine the user experience and, with it, customer satisfaction. The essential features a multi-vendor eCommerce platform should have are listed below.

  • Easy customization – customers' buying behavior keeps changing, so a multi-vendor eCommerce website has to keep changing over time. Customization is therefore a baseline expectation of any multi-vendor eCommerce platform.
  • Advanced search and navigation tools – a good user interface provides easy navigation and lets buyers find products quickly.
  • Payment gateways – an online multi-vendor eCommerce website should have multiple payment gateways integrated with it, which gives buyers more convenience.
  • Secured website – an online multi-vendor eCommerce platform should have SSL configured and provide secure transactions for buyers and vendors.
  • Multilingual and multi-currency support – to reach a global audience, an online multi-vendor eCommerce website should support multiple languages and currencies.
  • Reviews and ratings – buyers expect this feature on any online multi-vendor eCommerce platform they purchase from.
  • Simple checkout – a complicated checkout process makes buyers abandon the site, so your online multi-vendor eCommerce platform needs a hassle-free checkout procedure.

Revenue Generation Channels on a Multi-Vendor Marketplace Platform

The main objective of building an online multi-vendor eCommerce platform is to earn profit and generate more sales, which is the ultimate motive for any entrepreneur. So we need to know which revenue sources multi-vendor marketplace software offers the platform admin.

  • Commission fee – a mutual agreement between the admin and the vendor, under which the vendor pays a certain percentage of commission on every product sold through the marketplace. The commission percentage can vary from vendor to vendor.
  • Subscription fee – the admin can set a subscription fee that vendors pay to become paid members of the platform. The membership needs to be renewed periodically.
  • Listing fee – vendors are charged when they want their products listed on the marketplace.
  • Advertisement fee – you can set aside some space in your marketplace for advertisements, allow third parties to post their ads in that space, and charge them accordingly.
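
To make these channels concrete, the sketch below totals an admin's revenue for one period from per-vendor commission rates, subscription fees, a flat listing fee, and advertising income. The Vendor class, the admin_revenue() function, and every rate and amount are hypothetical. They are chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    """Hypothetical vendor record for one billing period."""
    name: str
    monthly_sales: float     # gross sales made through the marketplace
    commission_rate: float   # e.g. 0.10 for 10%; may differ per vendor
    subscription_fee: float  # flat membership fee for the period
    listings: int            # number of products listed in the period


def admin_revenue(vendors, listing_fee, ad_revenue=0.0):
    """Sum commission, subscription, listing, and advertising income (illustrative model)."""
    commission = sum(v.monthly_sales * v.commission_rate for v in vendors)
    subscriptions = sum(v.subscription_fee for v in vendors)
    listings = sum(v.listings for v in vendors) * listing_fee
    return {
        "commission": commission,
        "subscriptions": subscriptions,
        "listings": listings,
        "advertising": ad_revenue,
        "total": commission + subscriptions + listings + ad_revenue,
    }


vendors = [
    Vendor("A", monthly_sales=10_000, commission_rate=0.10, subscription_fee=50, listings=20),
    Vendor("B", monthly_sales=4_000, commission_rate=0.15, subscription_fee=50, listings=5),
]
print(admin_revenue(vendors, listing_fee=0.5, ad_revenue=200))
```

Keeping the commission rate on each vendor reflects the point above that the percentage can vary from vendor to vendor.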

How Are Products and Services Delivered in a Multi-Vendor Marketplace?

A multi-vendor eCommerce platform should follow a hassle-free shipping and delivery process. This is where you can earn your buyers' trust, and it also helps you retain customers effectively.

  • Once the buyer places an order, the multi-vendor eCommerce platform sends a notification to the relevant vendor.
  • The vendor checks product availability and arranges shipping and delivery. In some cases, the platform admin takes care of shipping and delivery instead.
  • The platform is integrated with shipping logistics providers, whose staff collect the product from the seller and deliver it to the customer.
  • If a service is provided instead of a product, the service provider gets the notification and sends a technician to the customer's location.

How It's Beneficial

Online buyers want a one-stop solution for all their needs. Instead of searching several sites for different products, users can visit one online store and find a variety of products, which saves a lot of time. A single platform that hosts several vendors and their products is called a multi-vendor eCommerce platform.

With the evolution of technology, multi-vendor marketplace platforms have taken on a new dimension and can help predict customers' buying behavior. The key pillars of any multi-vendor marketplace are the admin, the vendors, and the buyers. The platform acts as a bridge that connects all of them so that each one benefits.

Conclusion

By all means, building the best multi-vendor marketplace software will benefit you and bring higher returns. We should also accept that online marketplace software faces many challenges in its early stages. Still, the future of this industry is promising, and you can confidently set your mind on starting your own multi-seller eCommerce platform.

Understanding the importance and workings of online multi-vendor marketplace software will help you build a flawless multi-vendor eCommerce platform. When you build a multi-vendor marketplace platform with the utmost care, you can win the market and gain your audience's attention with less effort.

#multi seller ecommerce platform #multi vendor marketplace platform #multi vendor marketplace software #best multi vendor marketplace platform #multi vendor ecommerce platform #online multi vendor software

How to Build a Fake News Detector in Python

Fake News Detection in Python

Exploring the fake news dataset, performing data analysis such as word clouds and n-grams, and fine-tuning a BERT transformer to build a fake news detector in Python using the transformers library.

Fake news is the intentional broadcasting of false or misleading claims as news, where the statements are purposely deceitful.

Newspapers, tabloids, and magazines have been supplanted by digital news platforms, blogs, social media feeds, and a plethora of mobile news applications. News organizations have benefited from the increased use of social media and mobile platforms by providing subscribers with up-to-the-minute information.

Consumers now have instant access to the latest news. These digital media platforms have risen to prominence thanks to their easy connectivity to the rest of the world. They let users discuss and share ideas and debate topics such as democracy, education, health, research, and history. Fake news on digital platforms is becoming increasingly common and is used for profit, such as political and financial gain.

How Big is this Problem?

Because the Internet, social media, and digital platforms are so widely used, anyone can spread inaccurate and biased information. It is almost impossible to prevent the spread of fake news. There has been a tremendous surge in the distribution of false news, and it is not restricted to one sector such as politics. It also reaches sports, health, history, entertainment, science, and research.

The Solution

It is vital to recognize and differentiate between false and accurate news. One method is to have an expert decide and fact-check every piece of information, but this takes time and requires expertise that cannot easily be shared. Alternatively, we can use machine learning and artificial intelligence tools to automate the identification of fake news.

Online news information includes various unstructured formats (such as documents, videos, and audio), but here we will focus on text-format news. With the progress of machine learning and natural language processing, we can now recognize the misleading and false character of an article or statement.

Several studies and experiments are being conducted to detect fake news across all media.

Our main goals in this tutorial are to:

  • Explore and analyze the Fake News dataset.
  • Build a classifier that can distinguish fake news as accurately as possible.

Here is the table of contents:

  • Introduction
  • How Big is this Problem?
  • The Solution
  • Data Exploration
    • Distribution of Classes
  • Data Cleaning for Analysis
  • Exploratory Data Analysis
    • Single-word Cloud
    • Most Frequent Bigram (Two-word Combination)
    • Most Frequent Trigram (Three-word Combination)
  • Building a Classifier by Fine-tuning BERT
    • Data Preparation
    • Tokenizing the Dataset
    • Loading and Fine-tuning the Model
    • Model Evaluation
  • Appendix: Creating a Submission File for Kaggle
  • Conclusion

Data Exploration

In this work, we used the fake news dataset from Kaggle to classify unreliable news articles as fake news. We have a complete training dataset containing the following attributes:

  • id: unique ID of a news article
  • title: title of the news article
  • author: author of the news article
  • text: text of the article; may be incomplete
  • label: a label that marks the article as potentially unreliable, denoted by 1 (unreliable or fake) or 0 (reliable).

This is a binary classification problem in which we must predict whether a particular news story is reliable or not.

If you have a Kaggle account, you can simply download the dataset from the website and extract the ZIP file.

I also uploaded the dataset to Google Drive. You can get it here, or use the gdown library to download it automatically in Google Colab or Jupyter notebooks:

$ pip install gdown
# download from Google Drive
$ gdown "https://drive.google.com/uc?id=178f_VkNxccNidap-5-uffXUW475pAuPy&confirm=t"
Downloading...
From: https://drive.google.com/uc?id=178f_VkNxccNidap-5-uffXUW475pAuPy&confirm=t
To: /content/fake-news.zip
100% 48.7M/48.7M [00:00<00:00, 74.6MB/s]

Unzipping the files:

$ unzip fake-news.zip

Three files will appear in the current working directory: train.csv, test.csv, and submit.csv. We will use train.csv for most of the tutorial.

Installing the required dependencies (scikit-learn provides train_test_split and accuracy_score, used later):

$ pip install transformers nltk pandas numpy matplotlib seaborn wordcloud scikit-learn

Note: If you are in a local environment, make sure you install PyTorch with GPU support. See the official PyTorch installation page for the right command.

Let's import the necessary libraries for analysis:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

The NLTK corpora and modules must be installed using the standard NLTK downloader:

import nltk
nltk.download('stopwords')
nltk.download('wordnet')

The fake news dataset comprises original and fictitious article titles and texts by various authors. Let's import our dataset:

# load the dataset
news_d = pd.read_csv("train.csv")
print("Shape of News data:", news_d.shape)
print("News data columns", news_d.columns)

Output:

 Shape of News data: (20800, 5)
 News data columns Index(['id', 'title', 'author', 'text', 'label'], dtype='object')

Here's what the dataset looks like:

# by using df.head(), we can immediately familiarize ourselves with the dataset. 
news_d.head()

Output:

id	title	author	text	label
0	0	House Dem Aide: We Didn’t Even See Comey’s Let...	Darrell Lucus	House Dem Aide: We Didn’t Even See Comey’s Let...	1
1	1	FLYNN: Hillary Clinton, Big Woman on Campus - ...	Daniel J. Flynn	Ever get the feeling your life circles the rou...	0
2	2	Why the Truth Might Get You Fired	Consortiumnews.com	Why the Truth Might Get You Fired October 29, ...	1
3	3	15 Civilians Killed In Single US Airstrike Hav...	Jessica Purkiss	Videos 15 Civilians Killed In Single US Airstr...	1
4	4	Iranian woman jailed for fictional unpublished...	Howard Portnoy	Print \nAn Iranian woman has been sentenced to...	1

We have 20,800 rows with five columns. Let's look at some statistics for the text column:

# Text word statistics: min, mean, max and interquartile range

txt_length = news_d.text.str.split().str.len()
txt_length.describe()

Output:

count    20761.000000
mean       760.308126
std        869.525988
min          0.000000
25%        269.000000
50%        556.000000
75%       1052.000000
max      24234.000000
Name: text, dtype: float64
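
The count row is lower than 20,800 because describe() skips missing values: rows whose text is NaN produce no word count. The 25%, 50%, and 75% rows are quartiles computed by linear interpolation. The standard-library sketch below reproduces the same summary on a handful of made-up texts so that each row is concrete. statistics.quantiles with method="inclusive" uses the same interpolation as pandas' default.

```python
import statistics

# Made-up mini-corpus; word counts are computed like str.split().str.len()
texts = ["one two", "a b c", "x", "p q r s", "m n o p q"]
lengths = [len(t.split()) for t in texts]

summary = {
    "count": len(lengths),
    "mean": statistics.mean(lengths),
    "std": statistics.stdev(lengths),  # sample standard deviation (ddof=1), as in pandas
    "min": min(lengths),
    # method="inclusive" matches the linear interpolation pandas uses for quartiles
    "quartiles": statistics.quantiles(lengths, n=4, method="inclusive"),
    "max": max(lengths),
}
print(summary)
```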

Statistics for the title column:

#Title statistics 

title_length = news_d.title.str.split().str.len()
title_length.describe()

Output:

count    20242.000000
mean        12.420709
std          4.098735
min          1.000000
25%         10.000000
50%         13.000000
75%         15.000000
max         72.000000
Name: title, dtype: float64

The statistics for the training set show that:

  • The text attribute has a higher word count, averaging 760 words, and 75% of texts have 1,052 words or fewer.
  • The title attribute is a short statement, averaging about 12 words, and 75% of titles have 15 words or fewer.

Our experiment will use the text and the title together.

Distribution of Classes

Count plots for both labels:

sns.countplot(x="label", data=news_d);
print("1: Unreliable")
print("0: Reliable")
print("Distribution of labels:")
print(news_d.label.value_counts());

Output:

1: Unreliable
0: Reliable
Distribution of labels:
1    10413
0    10387
Name: label, dtype: int64

Label distribution

print(round(news_d.label.value_counts(normalize=True),2)*100);

Output:

1    50.0
0    50.0
Name: label, dtype: float64

There are 10,413 unreliable articles (fake, label 1) and 10,387 trustworthy articles (reliable, label 0), so almost exactly 50% of the articles are fake. Because the classes are balanced, accuracy is a reasonable metric for measuring how well our classifier performs.
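
Balanced classes also give a clear reference point. A trivial model that always predicts the majority label reaches only about 50% accuracy, so any accuracy well above that reflects real signal. The arithmetic, using the label counts printed above:

```python
# Label counts reported by value_counts() above
counts = {1: 10413, 0: 10387}
total = sum(counts.values())

# A trivial baseline that always predicts the majority class
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(total, majority_label, round(100 * baseline_accuracy, 2))  # 20800 1 50.06
```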

Data Cleaning for Analysis

In this section, we will clean our dataset to do some analysis:

  • Drop unused columns.
  • Perform null value imputation.
  • Remove special characters.
  • Remove stop words.

# Constants that are used to sanitize the datasets 

column_n = ['id', 'title', 'author', 'text', 'label']
remove_c = ['id','author']
categorical_features = []
target_col = ['label']
text_f = ['title', 'text']
# Clean Datasets
import nltk
from nltk.corpus import stopwords
import re
from nltk.stem.porter import PorterStemmer
from collections import Counter

ps = PorterStemmer()
wnl = nltk.stem.WordNetLemmatizer()

stop_words = stopwords.words('english')
stopwords_dict = Counter(stop_words)

# Remove unused columns
def remove_unused_c(df,column_n=remove_c):
    df = df.drop(column_n,axis=1)
    return df

# Impute null values with None
def null_process(feature_df):
    for col in text_f:
        feature_df.loc[feature_df[col].isnull(), col] = "None"
    return feature_df

def clean_dataset(df):
    # remove unused column
    df = remove_unused_c(df)
    #impute null values
    df = null_process(df)
    return df

# Cleaning text from unused characters
def clean_text(text):
    text = re.sub(r'http[\w:/\.]+', ' ', str(text))  # remove URLs
    text = re.sub(r'\s\s+', ' ', text)  # collapse repeated whitespace
    # punctuation and other special characters are stripped in nltk_preprocess()
    text = text.lower().strip()
    return text

## NLTK preprocessing can include:
# stop-word removal, stemming and lemmatization.
# For our project we use stop-word removal and lemmatization (stemming is commented out)
def nltk_preprocess(text):
    text = clean_text(text)
    wordlist = re.sub(r'[^\w\s]', '', text).split()
    #text = ' '.join([word for word in wordlist if word not in stopwords_dict])
    #text = [ps.stem(word) for word in wordlist if not word in stopwords_dict]
    text = ' '.join([wnl.lemmatize(word) for word in wordlist if word not in stopwords_dict])
    return  text

In the code block above:

  • We imported NLTK, a well-known platform for building Python programs that work with human language data. Next, we import re for regular expressions.
  • We import stopwords from nltk.corpus. When working with words, especially when considering semantics, we sometimes need to eliminate common words that add no significant meaning to a statement, such as "but", "can", "we", etc.
  • PorterStemmer is used to perform stemming with NLTK. Stemmers strip words of their morphological affixes, leaving only the word stem.
  • We import WordNetLemmatizer() from the NLTK library for lemmatization. Lemmatization is generally more accurate than stemming. Rather than just chopping off affixes, it takes the language's full vocabulary into account and applies morphological analysis to remove only inflectional endings, returning the base or dictionary form of a word, known as the lemma.
  • stopwords.words('english') gives us the list of all English stop words supported by NLTK.
  • The remove_unused_c() function is used to drop unused columns.
  • We impute null values with the string "None" using the null_process() function.
  • Inside clean_dataset(), we call the remove_unused_c() and null_process() functions. This function is responsible for cleaning the data.
  • To clean text of unused characters, we created the clean_text() function.
  • For preprocessing, we use stop-word removal and lemmatization (the stemming line is commented out). We created the nltk_preprocess() function for this purpose.
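
Patterns such as r'http[\w:/\.]+' in clean_text() are regular expressions. Python's built-in str.replace() matches its first argument as a literal string, whereas re.sub() interprets it as a regex. pandas' Series.str.replace() also accepts regex=True for the same purpose. A minimal demonstration with a made-up sentence:

```python
import re

text = "Read more at http://example.com/story now!!"

# str.replace() looks for the literal characters "http[\w:/\.]+", so nothing changes
literal = text.replace(r"http[\w:/\.]+", " ")

# re.sub() applies the pattern as a regular expression
regex = re.sub(r"http[\w:/\.]+", " ", text)
regex = re.sub(r"\s\s+", " ", regex)  # collapse the leftover run of spaces

print(literal)  # Read more at http://example.com/story now!!
print(regex)    # Read more at now!!
```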

Preprocessing text and title:

# Perform data cleaning on the training dataset by calling the clean_dataset function
df = clean_dataset(news_d)
# apply preprocessing on text through apply method by calling the function nltk_preprocess
df["text"] = df.text.apply(nltk_preprocess)
# apply preprocessing on title through apply method by calling the function nltk_preprocess
df["title"] = df.title.apply(nltk_preprocess)
# Dataset after cleaning and preprocessing step
df.head()

Output:

title	text	label
0	house dem aide didnt even see comeys letter ja...	house dem aide didnt even see comeys letter ja...	1
1	flynn hillary clinton big woman campus breitbart	ever get feeling life circle roundabout rather...	0
2	truth might get fired	truth might get fired october 29 2016 tension ...	1
3	15 civilian killed single u airstrike identified	video 15 civilian killed single u airstrike id...	1
4	iranian woman jailed fictional unpublished sto...	print iranian woman sentenced six year prison ...	1

Exploratory Data Analysis

In this section, we will perform:

  • Univariate analysis: a statistical analysis of the text. We will use a word cloud for this purpose. A word cloud is a visualization technique for text data in which the most frequent term is shown in the largest font.
  • Bivariate analysis: here we will use bigrams and trigrams. According to Wikipedia: "an n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus".

Single-word Cloud

In a word cloud, the most frequent words appear in bold, larger fonts. This section creates a word cloud for all the words in the dataset.
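
The idea behind the font sizes can be sketched with plain word counts. Each word's count is scaled by the count of the most frequent word, so the top word gets 1.0 and the largest font. This is a simplified illustration on a made-up string, not the wordcloud library's actual implementation. The relative_frequencies() helper is hypothetical.

```python
from collections import Counter


def relative_frequencies(text, top=5):
    """Return the `top` most frequent words, each scaled by the highest count.

    Simplified illustration of word-cloud sizing (bigger value -> bigger font);
    not the wordcloud library's own code.
    """
    counts = Counter(text.split())
    if not counts:
        return {}
    most_common = counts.most_common(top)
    max_count = most_common[0][1]
    return {word: count / max_count for word, count in most_common}


print(relative_frequencies("news said news report said news"))
# {'news': 1.0, 'said': 0.6666666666666666, 'report': 0.3333333333333333}
```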

We will use the WordCloud class from the wordcloud library, and its generate() method to create the word cloud image:

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# initialize the word cloud
wordcloud = WordCloud( background_color='black', width=800, height=600)
# generate the word cloud by passing the corpus
text_cloud = wordcloud.generate(' '.join(df['text']))
# plotting the word cloud
plt.figure(figsize=(20,30))
plt.imshow(text_cloud)
plt.axis('off')
plt.show()

Output:

Word cloud for the entire fake news dataset

Word cloud for reliable news only:

true_n = ' '.join(df[df['label']==0]['text']) 
wc = wordcloud.generate(true_n)
plt.figure(figsize=(20,30))
plt.imshow(wc)
plt.axis('off')
plt.show()

Output:

Word cloud for reliable news

Word cloud for fake news only:

fake_n = ' '.join(df[df['label']==1]['text'])
wc= wordcloud.generate(fake_n)
plt.figure(figsize=(20,30))
plt.imshow(wc)
plt.axis('off')
plt.show()

Output:

Word cloud for fake news

Most Frequent Bigram (Two-word Combination)

An N-gram is a sequence of letters or words. A character unigram consists of a single character, while a character bigram is a sequence of two characters. Similarly, word N-grams consist of a sequence of n words. The word "united" is a 1-gram (unigram). The word combination "united states" is a 2-gram (bigram), and "new york city" is a 3-gram (trigram).
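
Word n-grams can be generated by zipping a token list with shifted copies of itself. Counting the resulting tuples then gives the most frequent ones, which is what plot_top_ngrams() does with nltk.ngrams and value_counts(). A dependency-free sketch on a made-up sentence. The ngrams() helper here is a stand-in, not NLTK's function.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return an iterator of contiguous n-token tuples, like nltk.ngrams(tokens, n)."""
    return zip(*(tokens[i:] for i in range(n)))


tokens = "new york city police new york city hall".split()
print(list(ngrams(tokens, 2))[:3])
# [('new', 'york'), ('york', 'city'), ('city', 'police')]

# Most frequent trigram, analogous to value_counts()[:20] in plot_top_ngrams()
print(Counter(ngrams(tokens, 3)).most_common(1))
# [(('new', 'york', 'city'), 2)]
```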

Let's plot the most common bigrams in reliable news:

def plot_top_ngrams(corpus, title, ylabel, xlabel="Number of Occurrences", n=2):
  """Utility function to plot top n-grams"""
  true_b = (pd.Series(nltk.ngrams(corpus.split(), n)).value_counts())[:20]
  true_b.sort_values().plot.barh(color='blue', width=.9, figsize=(12, 8))
  plt.title(title)
  plt.ylabel(ylabel)
  plt.xlabel(xlabel)
  plt.show()
plot_top_ngrams(true_n, 'Top 20 Frequently Occurring True news Bigrams', "Bigram", n=2)

Top bigrams in reliable news

The most common bigrams in fake news:

plot_top_ngrams(fake_n, 'Top 20 Frequently Occurring Fake news Bigrams', "Bigram", n=2)

Top bigrams in fake news

Most Frequent Trigram (Three-word Combination)

The most common trigrams in reliable news:

plot_top_ngrams(true_n, 'Top 20 Frequently Occurring True news Trigrams', "Trigrams", n=3)

Top trigrams in reliable news

And now for fake news:

plot_top_ngrams(fake_n, 'Top 20 Frequently Occurring Fake news Trigrams', "Trigrams", n=3)

Top trigrams in fake news

The plots above give us some idea of what both classes look like. In the next section, we will use the transformers library to build a fake news detector.

Building a Classifier by Fine-tuning BERT

This section makes heavy use of code from the BERT fine-tuning tutorial to build a fake news classifier with the transformers library. For more details, you can refer to the original tutorial.

If you haven't installed transformers yet, you need to:

$ pip install transformers

Let's import the necessary libraries:

import torch
from transformers import is_tf_available, is_torch_available
from transformers import BertTokenizerFast, BertForSequenceClassification
from transformers import Trainer, TrainingArguments
import numpy as np
from sklearn.model_selection import train_test_split

import random

We want our results to be reproducible even if we restart our environment:

def set_seed(seed: int):
    """
    Helper function for reproducible behavior to set the seed in ``random``, ``numpy``, ``torch`` and/or ``tf`` (if
    installed).

    Args:
        seed (:obj:`int`): The seed to set.
    """
    random.seed(seed)
    np.random.seed(seed)
    if is_torch_available():
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # ^^ safe to call this function even if cuda is not available
    if is_tf_available():
        import tensorflow as tf

        tf.random.set_seed(seed)

set_seed(1)
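
set_seed() works because a pseudo-random generator seeded with the same value produces the same sequence every time. The idea is shown below with Python's random module alone. numpy and torch behave the same way for their CPU generators, although some GPU kernels can still be nondeterministic. The sample_with_seed() helper is hypothetical.

```python
import random


def sample_with_seed(seed, k=5):
    """Seed Python's RNG, then draw k integers in [0, 100]."""
    random.seed(seed)
    return [random.randint(0, 100) for _ in range(k)]


first = sample_with_seed(1)
second = sample_with_seed(1)  # same seed -> identical sequence
print(first == second)  # True
```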

The model we are going to use is bert-base-uncased:

# the model we're going to train: base uncased BERT
# check text classification models here: https://huggingface.co/models?filter=text-classification
model_name = "bert-base-uncased"
# max sequence length for each document/sentence sample
max_length = 512

Loading the tokenizer:

# load the tokenizer
tokenizer = BertTokenizerFast.from_pretrained(model_name, do_lower_case=True)

Data Preparation

Now let's drop the rows that have NaN values in the text, author, and title columns:

news_df = news_d[news_d['text'].notna()]
news_df = news_df[news_df["author"].notna()]
news_df = news_df[news_df["title"].notna()]

Next, let's create a function that takes the dataset as a Pandas DataFrame and returns the train/validation splits of texts and labels as lists:

def prepare_data(df, test_size=0.2, include_title=True, include_author=True):
  texts = []
  labels = []
  for i in range(len(df)):
    text = df["text"].iloc[i]
    label = df["label"].iloc[i]
    if include_title:
      text = df["title"].iloc[i] + " - " + text
    if include_author:
      text = df["author"].iloc[i] + " : " + text
    if text and label in [0, 1]:
      texts.append(text)
      labels.append(label)
  return train_test_split(texts, labels, test_size=test_size)

train_texts, valid_texts, train_labels, valid_labels = prepare_data(news_df)

The function above takes the dataset as a DataFrame and returns it as lists split into training and validation sets. Setting include_title to True means we prepend the title to the text we will use for training. Setting include_author to True means we prepend the author to the text as well.

Let's make sure the labels and texts have the same length:

print(len(train_texts), len(train_labels))
print(len(valid_texts), len(valid_labels))

Output:

14628 14628
3657 3657
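
To see what prepare_data() feeds the model, the sketch below applies the same string concatenation to a made-up author, title, and body. It then checks the split arithmetic: train_test_split with a float test_size puts ceil(test_size × n) samples in the held-out split, which reproduces the 14,628/3,657 sizes above. compose_text() is a hypothetical helper.

```python
import math


def compose_text(author, title, text, include_title=True, include_author=True):
    """Mirror the concatenation done inside prepare_data()."""
    if include_title:
        text = title + " - " + text
    if include_author:
        text = author + " : " + text
    return text


print(compose_text("Jane Doe", "Some headline", "Body of the article."))
# Jane Doe : Some headline - Body of the article.

# Split sizes for test_size=0.2
n_samples = 14628 + 3657              # rows left after dropping NaNs
n_valid = math.ceil(0.2 * n_samples)  # held-out (validation) split
n_train = n_samples - n_valid
print(n_samples, n_train, n_valid)    # 18285 14628 3657
```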

Tokenizing the Dataset

Let's use the BERT tokenizer to tokenize our dataset:

# tokenize the dataset, truncating sequences longer than `max_length`;
# padding=True pads (with [PAD], id 0) up to the longest sequence in the list
train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=max_length)
valid_encodings = tokenizer(valid_texts, truncation=True, padding=True, max_length=max_length)
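
Conceptually, truncation=True and padding=True do the following to each list of token ids. This is a simplified stand-in, not the tokenizer's code. It works on made-up content ids and assumes the [CLS]=101, [SEP]=102, and [PAD]=0 ids of bert-base-uncased. It also omits token_type_ids. The encode_batch() helper is hypothetical.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special-token ids in bert-base-uncased's vocabulary


def encode_batch(batch, max_length):
    """Toy version of tokenizer(..., truncation=True, padding=True, max_length=...).

    `batch` holds lists of (made-up) content token ids without special tokens.
    """
    # Truncate content so that [CLS] + content + [SEP] fits within max_length
    seqs = [[CLS_ID] + ids[: max_length - 2] + [SEP_ID] for ids in batch]
    # padding=True pads every sequence to the longest one in the batch
    target = max(len(s) for s in seqs)
    input_ids = [s + [PAD_ID] * (target - len(s)) for s in seqs]
    # attention_mask marks real tokens with 1 and padding with 0
    attention_mask = [[1] * len(s) + [0] * (target - len(s)) for s in seqs]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


enc = encode_batch([[7, 8, 9, 10, 11], [5]], max_length=5)
print(enc["input_ids"])       # [[101, 7, 8, 9, 102], [101, 5, 102, 0, 0]]
print(enc["attention_mask"])  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

Since most articles exceed 512 tokens, the longest sequence in the list is capped at max_length, so in practice every encoded article ends up 512 tokens long.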

Converting the encodings into a PyTorch dataset:

class NewsGroupsDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor([self.labels[idx]])
        return item

    def __len__(self):
        return len(self.labels)

# convert our tokenized data into a torch Dataset
train_dataset = NewsGroupsDataset(train_encodings, train_labels)
valid_dataset = NewsGroupsDataset(valid_encodings, valid_labels)
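
The tokenizer returns a dict of parallel lists, one entry per example. __getitem__ picks index idx from every list to build one training example. The plain-Python sketch below shows the resulting shape on hypothetical encodings. It uses no torch tensors, and the label is a plain int rather than the one-element tensor used above.

```python
# Hypothetical encodings in the same dict-of-lists shape the tokenizer returns
encodings = {
    "input_ids": [[101, 7, 102], [101, 5, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
labels = [1, 0]


def get_item(idx):
    """Plain-Python analogue of NewsGroupsDataset.__getitem__ (no tensors)."""
    item = {k: v[idx] for k, v in encodings.items()}
    item["labels"] = labels[idx]
    return item


print(get_item(1))
# {'input_ids': [101, 5, 102], 'attention_mask': [1, 1, 1], 'labels': 0}
```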

Loading and Fine-tuning the Model

We will use BertForSequenceClassification to load our BERT transformer model:

# load the model
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

We set num_labels to 2 since this is a binary classification problem. The function below is a callback that calculates accuracy at each evaluation step:

from sklearn.metrics import accuracy_score

def compute_metrics(pred):
  labels = pred.label_ids
  preds = pred.predictions.argmax(-1)
  # calculate accuracy using sklearn's function
  acc = accuracy_score(labels, preds)
  return {
      'accuracy': acc,
  }
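
compute_metrics() receives raw logits. It takes the index of the largest logit in each row as the predicted class and then measures the fraction that matches the labels. The same two steps in plain Python, on made-up logits. accuracy_from_logits() is a hypothetical helper, not part of the Trainer API.

```python
def accuracy_from_logits(logits, labels):
    """Argmax over each row of logits, then the fraction of predictions matching labels.

    Plain-Python equivalent of pred.predictions.argmax(-1) + accuracy_score.
    """
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    return preds, correct / len(labels)


logits = [[2.1, -1.3], [-0.4, 0.9], [0.2, 0.1], [-1.0, 3.0]]  # made-up model outputs
labels = [0, 1, 1, 1]
print(accuracy_from_logits(logits, labels))  # ([0, 1, 0, 1], 0.75)
```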

Let's initialize the training arguments:

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=1,              # total number of training epochs
    per_device_train_batch_size=10,  # batch size per device during training
    per_device_eval_batch_size=20,   # batch size for evaluation
    warmup_steps=100,                # number of warmup steps for learning rate scheduler
    logging_dir='./logs',            # directory for storing logs
    load_best_model_at_end=True,     # load the best model when finished training (default metric is loss)
    # but you can specify `metric_for_best_model` argument to change to accuracy or other metric
    logging_steps=200,               # log & save weights each logging_steps
    save_steps=200,
    evaluation_strategy="steps",     # evaluate each `logging_steps` (named `eval_strategy` in newer transformers versions)
)

I set per_device_train_batch_size to 10, but you should set it as high as your GPU memory allows. logging_steps and save_steps are set to 200, which means we evaluate the model and save its weights every 200 training steps.
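
The step counts in the training log below follow directly from the dataset sizes and batch sizes. Assuming a single device and no gradient accumulation, as the log reports, the arithmetic is:

```python
import math

num_train, num_valid = 14628, 3657
train_batch, eval_batch = 10, 20
logging_steps = 200

steps_per_epoch = math.ceil(num_train / train_batch)  # optimization steps for one epoch
eval_batches = math.ceil(num_valid / eval_batch)      # batches per evaluation pass
# evaluation and checkpointing happen every `logging_steps` steps
checkpoints = list(range(logging_steps, steps_per_epoch + 1, logging_steps))
print(steps_per_epoch, eval_batches, checkpoints)
# 1463 183 [200, 400, 600, 800, 1000, 1200, 1400]
```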

You can check the TrainingArguments documentation for more details on the available training parameters.

Let's instantiate the trainer:

trainer = Trainer(
    model=model,                         # the instantiated Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=valid_dataset,          # evaluation dataset
    compute_metrics=compute_metrics,     # the callback that computes metrics of interest
)

Training the model:

# train the model
trainer.train()

Training can take a few hours depending on your GPU. On the free version of Colab with an NVIDIA Tesla K80, it takes about an hour. Here's the output:

***** Running training *****
  Num examples = 14628
  Num Epochs = 1
  Instantaneous batch size per device = 10
  Total train batch size (w. parallel, distributed & accumulation) = 10
  Gradient Accumulation steps = 1
  Total optimization steps = 1463
 [1463/1463 41:07, Epoch 1/1]
Step	Training Loss	Validation Loss	Accuracy
200		0.250800		0.100533		0.983867
400		0.027600		0.043009		0.993437
600		0.023400		0.017812		0.997539
800		0.014900		0.030269		0.994258
1000	0.022400		0.012961		0.998086
1200	0.009800		0.010561		0.998633
1400	0.007700		0.010300		0.998633
***** Running Evaluation *****
  Num examples = 3657
  Batch size = 20
Saving model checkpoint to ./results/checkpoint-200
Configuration saved in ./results/checkpoint-200/config.json
Model weights saved in ./results/checkpoint-200/pytorch_model.bin
<SNIPPED>
***** Running Evaluation *****
  Num examples = 3657
  Batch size = 20
Saving model checkpoint to ./results/checkpoint-1400
Configuration saved in ./results/checkpoint-1400/config.json
Model weights saved in ./results/checkpoint-1400/pytorch_model.bin

Training completed. Do not forget to share your model on huggingface.co/models =)

Loading best model from ./results/checkpoint-1400 (score: 0.010299865156412125).
TrainOutput(global_step=1463, training_loss=0.04888018785440506, metrics={'train_runtime': 2469.1722, 'train_samples_per_second': 5.924, 'train_steps_per_second': 0.593, 'total_flos': 3848788517806080.0, 'train_loss': 0.04888018785440506, 'epoch': 1.0})

Evaluating the model

Since load_best_model_at_end is set to True, the best weights are loaded once training finishes. Let's evaluate the model on our validation set:

# evaluate the current model after training
trainer.evaluate()

Output:

***** Running Evaluation *****
  Num examples = 3657
  Batch size = 20
 [183/183 02:11]
{'epoch': 1.0,
 'eval_accuracy': 0.998632759092152,
 'eval_loss': 0.010299865156412125,
 'eval_runtime': 132.0374,
 'eval_samples_per_second': 27.697,
 'eval_steps_per_second': 1.386}
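The eval_loss here equals the score in the "Loading best model from ./results/checkpoint-1400" line of the training log. When metric_for_best_model is left unset, the Trainer ranks checkpoints by evaluation loss, where lower is better. Steps 1200 and 1400 share the same 0.998633 accuracy, but the loss is lower at step 1400. The sketch below reproduces that choice from the logged validation losses. It assumes metric_for_best_model was not set in the training arguments, since that part of the arguments is not shown in this section.

```python
# Validation loss at each evaluation step, copied from the training log above
val_loss = {
    200: 0.100533,
    400: 0.043009,
    600: 0.017812,
    800: 0.030269,
    1000: 0.012961,
    1200: 0.010561,
    1400: 0.010300,
}

# Assumption: metric_for_best_model is unset, so checkpoints are compared by eval loss (lower is better)
best_step = min(val_loss, key=val_loss.get)
print(f"best checkpoint: ./results/checkpoint-{best_step} (eval_loss={val_loss[best_step]})")
```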

Saving the model and tokenizer:

# saving the fine tuned model & tokenizer
model_path = "fake-news-bert-base-uncased"
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

Running the cell above creates a new folder containing the model configuration and weights. To make predictions later, load it with the same from_pretrained() method we used when loading the model, and you're good to go.
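The sketch below checks that a save_pretrained()/from_pretrained() round trip restores the same weights. To stay runnable without the fine-tuned checkpoint, it uses a tiny, randomly initialised BertForSequenceClassification whose config sizes are hypothetical. It also assumes BertForSequenceClassification is the class the model was originally loaded with. If you used a different class, swap it in.

```python
import tempfile

import torch
from transformers import BertConfig, BertForSequenceClassification

# Hypothetical tiny config so the sketch runs in seconds; the real model uses bert-base-uncased sizes
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)
# eval() disables dropout so both models produce deterministic logits
original = BertForSequenceClassification(config).eval()

with tempfile.TemporaryDirectory() as model_path:
    # same call as model.save_pretrained(model_path), but into a throwaway folder
    original.save_pretrained(model_path)
    # from_pretrained() reads the config and weights back from the folder into memory
    reloaded = BertForSequenceClassification.from_pretrained(model_path).eval()

# compare logits for a dummy batch of token ids (every id must be < vocab_size)
input_ids = torch.tensor([[1, 5, 7, 2]])
with torch.no_grad():
    same = torch.allclose(original(input_ids).logits, reloaded(input_ids).logits)
print(same)
```

With the real model, you would call from_pretrained("fake-news-bert-base-uncased") on both the model class and the tokenizer class you used earlier.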

Next, let's create a function that takes the article text as an argument and returns whether it's fake or not:

import torch

def get_prediction(text, convert_to_label=False):
    # prepare our text into a tokenized sequence on the same device as the model
    inputs = tokenizer(text, padding=True, truncation=True, max_length=max_length, return_tensors="pt").to(model.device)
    # perform inference with our model (no gradients needed)
    with torch.no_grad():
        outputs = model(**inputs)
    # get output probabilities by doing softmax
    probs = outputs[0].softmax(1)
    # map class indices to human-readable labels
    d = {
        0: "reliable",
        1: "fake"
    }
    # executing argmax function to get the candidate label
    if convert_to_label:
        return d[int(probs.argmax())]
    else:
        return int(probs.argmax())
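The post-processing at the end of get_prediction() is a softmax followed by an argmax. The pure-Python sketch below mirrors those two steps on a hypothetical pair of logits; the numbers are made up, not real model output. Softmax is monotonic, so an argmax over the raw logits picks the same class. The softmax only matters if you want the probabilities themselves.

```python
import math

# Same index-to-label mapping as in get_prediction()
ID2LABEL = {0: "reliable", 1: "fake"}

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_to_label(logits, convert_to_label=False):
    """Mirror get_prediction()'s post-processing: softmax, then argmax."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best] if convert_to_label else best

# Hypothetical logits for one article (not produced by the model)
example_logits = [2.0, -1.0]
print(softmax(example_logits))
print(logits_to_label(example_logits, convert_to_label=True))  # → reliable
```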

I took an article from test.csv that the model has never seen. I checked it, and it's a real article from The New York Times:

real_news = """
Tim Tebow Will Attempt Another Comeback, This Time in Baseball - The New York Times",Daniel Victor,"If at first you don’t succeed, try a different sport. Tim Tebow, who was a Heisman   quarterback at the University of Florida but was unable to hold an N. F. L. job, is pursuing a career in Major League Baseball. <SNIPPED>
"""

The full article is in the Colab environment if you want to copy the original text. Let's pass it to the model and look at the result:

get_prediction(real_news, convert_to_label=True)

Output:

reliable

Appendix: Creating a submission file for Kaggle

In this section, we predict every article in test.csv to create a submission file and see our accuracy on the test set of the Kaggle competition:

import pandas as pd

# read the test set
test_df = pd.read_csv("test.csv")
# make a copy of the testing set
new_df = test_df.copy()
# add a new column that contains the author, title and article content
new_df["new_text"] = new_df["author"].astype(str) + " : " + new_df["title"].astype(str) + " - " + new_df["text"].astype(str)
# get the prediction of all the test set
new_df["label"] = new_df["new_text"].apply(get_prediction)
# make the submission file
final_df = new_df[["id", "label"]]
final_df.to_csv("submit_final.csv", index=False)

After concatenating the author, title, and article text into a new column, we apply get_prediction() to that column to fill the label column. We then use the to_csv() method to create the submission file for Kaggle. Here is my submission score:

Submission score

We got 99.78% accuracy on the private leaderboard and 100% on the public leaderboard. That's amazing!
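You may want to check the concatenation and the id/label layout of the submission file before spending GPU time on the whole test set. The sketch below runs the same pandas steps as the appendix code on two made-up rows. dummy_predict() is a hypothetical keyword rule standing in for get_prediction(). The in-memory CSV only mimics the column names used above (id, title, author, text).

```python
import io

import pandas as pd

# Tiny stand-in for test.csv (hypothetical rows, same column names as used in the appendix code)
csv_data = io.StringIO(
    "id,title,author,text\n"
    "1,Tim Tebow Will Attempt Another Comeback,Daniel Victor,Tim Tebow is pursuing baseball.\n"
    "2,You Won't Believe This,Anon,Shocking secret revealed.\n"
)
test_df = pd.read_csv(csv_data)

def dummy_predict(text):
    """Hypothetical stand-in for get_prediction(): 1 = fake, 0 = reliable."""
    return 1 if "shocking" in text.lower() else 0

new_df = test_df.copy()
# same "author : title - text" concatenation as in the appendix code
new_df["new_text"] = new_df["author"].astype(str) + " : " + new_df["title"].astype(str) + " - " + new_df["text"].astype(str)
new_df["label"] = new_df["new_text"].apply(dummy_predict)

final_df = new_df[["id", "label"]]
csv_text = final_df.to_csv(index=False)  # returns a string when no path is given
print(csv_text)
```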

Conclusion

Alright, we're done with the tutorial. You can check this page to see the various training parameters you can tweak.

If you have your own fake news dataset to fine-tune on, you just need to pass its list of samples to the tokenizer as we did; you won't need to change any other code.
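For a custom dataset, the main preparation step is getting parallel lists of texts and labels, split into training and validation parts. The training log above reports 14,628 training and 3,657 validation examples. Together that is 18,285, and 3,657 is exactly 20% of 18,285. The hypothetical helper below does an 80/20 shuffle-and-split with the standard library only. Its name, the seed, and the sample data are illustrative assumptions, not part of the tutorial's code.

```python
import random

def split_custom_dataset(texts, labels, valid_ratio=0.2, seed=42):
    """Hypothetical helper: shuffle (text, label) pairs and split them into train/validation lists.

    Returns (train_texts, valid_texts, train_labels, valid_labels); the text lists are
    what you would pass to the tokenizer.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    pairs = list(zip(texts, labels))
    random.Random(seed).shuffle(pairs)        # shuffle pairs so each text keeps its label
    n_valid = int(len(pairs) * valid_ratio)
    valid, train = pairs[:n_valid], pairs[n_valid:]
    train_texts = [t for t, _ in train]
    train_labels = [y for _, y in train]
    valid_texts = [t for t, _ in valid]
    valid_labels = [y for _, y in valid]
    return train_texts, valid_texts, train_labels, valid_labels

# Hypothetical sample data; 0 = reliable, 1 = fake (same convention as get_prediction())
my_texts = [
    "Local council approves new budget",
    "Miracle pill cures everything overnight",
    "Scientists publish peer-reviewed study",
    "Celebrity secretly replaced by clone",
    "City opens new public library",
]
my_labels = [0, 1, 0, 1, 0]
train_texts, valid_texts, train_labels, valid_labels = split_custom_dataset(my_texts, my_labels)
print(len(train_texts), len(valid_texts))  # 4 1
```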

Check the full code here, or the Colab environment here.