The history of Machine Comprehension (MC) goes back to the first concepts in Artificial Intelligence (AI). Alan Turing proposed, in his famous article “Computing Machinery and Intelligence”, what is now called the Turing test as a criterion of intelligence. Almost 70 years later, Question Answering (QA), a sub-domain of MC, is still one of the most difficult tasks in AI.
However, since last year, the field of Natural Language Processing (NLP) has evolved quickly thanks to advances in Deep Learning research and the advent of Transfer Learning techniques. Powerful pre-trained NLP models such as OpenAI-GPT, ELMo, BERT and XLNet have been made available by leading researchers in the field.
With such progress, several improved systems and applications for NLP tasks are expected to come out. One such system is the cdQA-suite, a package developed by some colleagues and me in a partnership between Telecom ParisTech, a French engineering school, and BNP Paribas Personal Finance, a European leader in financing for individuals.
When we think about QA systems, we should be aware of two different kinds: open-domain QA (ODQA) systems and closed-domain QA (CDQA) systems.
Open-domain systems deal with questions about nearly anything, and can only rely on general ontologies and world knowledge. One example of such a system is DrQA, an ODQA developed by Facebook Research that uses a large base of articles from Wikipedia as its source of knowledge. As these documents are related to several different topics and subjects we can understand why this system is considered an ODQA.
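On the other hand, closed-domain systems deal with questions within a specific domain (for example, medicine or automotive maintenance), and can exploit domain-specific knowledge by using a model that is fitted to a single-domain database. The cdQA-suite was built so that anyone who wants to build a closed-domain QA system can do so easily.
The cdQA-suite comprises three blocks: cdQA, the Python package that implements the QA pipeline; cdQA-annotator, a web-based application for annotating question-answer pairs; and cdQA-ui, a web-based user interface that can be coupled with cdQA.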
I will explain how each module works and how you can use it to build your QA system on your own data.
The cdQA architecture is based on two main components: the Retriever and the Reader. You can see below a schema of the system mechanism.
Mechanism of cdQA pipeline
When a question is sent to the system, the Retriever selects the documents in the database that are most likely to contain the answer. It is based on the same retriever as DrQA, which creates TF-IDF features from uni-grams and bi-grams and computes the cosine similarity between the question and each document in the database.
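To make the retrieval step more concrete, here is a minimal sketch of TF-IDF ranking over uni-grams and bi-grams using only the Python standard library. It is not the cdQA or DrQA implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def ngrams(text):
    """Lowercase, split on whitespace, and return uni-grams plus bi-grams."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def fit_idf(documents):
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(ngrams(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector (a dict) restricted to the fitted vocabulary."""
    counts = Counter(term for term in ngrams(text) if term in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, top_n=2):
    """Return the indices of the top_n documents most similar to the question."""
    idf = fit_idf(documents)
    q_vec = tfidf_vector(question, idf)
    scored = [(cosine(q_vec, tfidf_vector(doc, idf)), i) for i, doc in enumerate(documents)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [i for _, i in scored[:top_n]]


# Toy corpus, invented for this example
documents = [
    "The bank opened a new branch in Paris",
    "The excellence program started in 2016",
    "Interest rates stayed flat this quarter",
]
ranked = retrieve("When did the excellence program start", documents)
print(ranked)
```

The bi-grams are what let "excellence program" count as a stronger match than two unrelated occurrences of the same words would.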
After selecting the most probable documents, the system divides each document into paragraphs and sends them with the question to the Reader, which is a pre-trained Deep Learning model. The model used is the PyTorch version of the well-known NLP model BERT, made available by HuggingFace. The Reader then outputs the most probable answer it can find in each paragraph. After the Reader, a final layer in the system compares the answers using an internal score function and outputs the most likely one according to the scores.
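The flow described above (split into paragraphs, get one candidate answer per paragraph, keep the best-scoring one) can be sketched as follows. The Reader here is a hypothetical stand-in based on word overlap, not BERT, and the function names, the blank-line paragraph splitting, and the scoring rule are assumptions made only to illustrate the control flow.

```python
def split_into_paragraphs(document):
    """Split a document on blank lines and drop empty paragraphs."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def toy_reader(question, paragraph):
    """Hypothetical stand-in for the BERT Reader.

    Returns (answer, score): the sentence sharing the most words with the
    question, scored by that overlap. A real Reader would predict an answer
    span with a neural model instead.
    """
    question_words = set(question.lower().replace("?", "").split())
    best_answer, best_score = "", 0
    for sentence in paragraph.split(". "):
        words = set(sentence.lower().rstrip(".").split())
        overlap = len(question_words & words)
        if overlap > best_score:
            best_answer, best_score = sentence.rstrip("."), overlap
    return best_answer, best_score


def answer_question(question, documents):
    """Run the Reader on every paragraph and keep the highest-scoring answer."""
    candidates = []
    for title, text in documents.items():
        for paragraph in split_into_paragraphs(text):
            answer, score = toy_reader(question, paragraph)
            candidates.append(
                {"answer": answer, "score": score, "title": title, "paragraph": paragraph}
            )
    return max(candidates, key=lambda c: c["score"])


# Toy documents, invented for this example
documents = {
    "Branches": "The bank opened a branch in Paris.\n\nIt employs forty people.",
    "Programs": "The excellence program exists since 2016.\n\nIt trains young graduates.",
}
best = answer_question("Since when does the excellence program exist?", documents)
print(best["answer"], "|", best["title"])
```

Keeping the title and paragraph next to each candidate is what allows the final output to show where the answer was found.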
Before using the package, let’s install it. You can install it with pip install cdqa, but for this tutorial I will install it from source so that I can run a script that downloads the pre-trained models and the BNP dataset (a dataset of articles extracted from their public news webpage).
# Setting up cdQA package
git clone https://github.com/cdqa-suite/cdQA.git && cd cdQA && pip install .
# Download models and BNP dataset
python download.py
Now you can open a Jupyter notebook and follow the steps below to see how cdQA works:
You should have something like the following as output:
The output of a QAPipeline prediction
Notice that the system outputs not only an answer, but also the paragraph where the answer was found and the title of the document or article.
In the snippet above, the preprocessing / filtering steps were needed to transform the BNP Paribas dataframe to the following structure:
Structure of the Dataframe that should be sent to cdQA pipeline
If you use your own dataset, make sure that your dataframe has this structure.
When using the CPU version of the model, each prediction takes between 10 and 20 seconds. This execution time is due to the BERT Reader, which is a very large deep learning model (~110M parameters). If you have a GPU, you can directly use the GPU version of the model, models/bert_qa_vGPU-sklearn.joblib. These pre-trained models are also available on the releases page of the cdQA GitHub repository: https://github.com/cdqa-suite/cdQA/releases
You can also improve the performance of the pre-trained Reader, which was pre-trained on the SQuAD 1.1 dataset. If you have an annotated dataset in the same format as SQuAD (one can be generated with the help of the cdQA-annotator), you can fine-tune the Reader on it:
# Put the path to your json file in SQuAD format here
path_to_data = './data/SQuAD_1.1/train-v1.1.json'
cdqa_pipeline.fit_reader(path_to_data)
Please be aware that such fine-tuning should be performed on a GPU, as the BERT model is too large to be trained in a reasonable time on a CPU.
You can also check out other ways to do the same steps on the official tutorials: https://github.com/cdqa-suite/cdQA/tree/master/examples
In order to facilitate the data annotation, the team has built a web-based application, the cdQA-annotator.
In order to use it, you should have your dataset transformed to a JSON file with SQuAD-like format:
from cdqa.utils.converters import df2squad
# Converting dataframe to SQuAD format
json_data = df2squad(df=df, squad_version='v1.1', output_dir='.', filename='dataset-name.json')
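For reference, a SQuAD-like file nests articles, paragraphs, and question-answer pairs. The sketch below builds one such record by hand with the standard library, following the public SQuAD v1.1 layout. The title, context, question, and id are invented, and the "v1.1" version string simply mirrors the squad_version argument above, so treat those values as assumptions rather than the exact output of df2squad.

```python
import json

# Invented example text and answer
context = "The Excellence Program was launched in 2016 to train young graduates."
answer_text = "2016"

squad_like = {
    "version": "v1.1",
    "data": [
        {
            "title": "Example article",
            "paragraphs": [
                {
                    "context": context,
                    "qas": [
                        {
                            "id": "example-question-0001",
                            "question": "When was the Excellence Program launched?",
                            "answers": [
                                {
                                    "text": answer_text,
                                    # Character offset of the answer inside the context
                                    "answer_start": context.index(answer_text),
                                }
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}

serialized = json.dumps(squad_like, indent=2)
print(serialized)
```

The answer_start offset is what ties each answer to its exact position in the paragraph, which is why highlighting the answer in an annotation tool is enough to record it.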
Now you can install the annotator and run it:
# Clone the repo
git clone https://github.com/cdqa-suite/cdQA-annotator
# Install dependencies
cd cdQA-annotator
npm install
# Start development server
cd src
vue serve
Now you can go to http://localhost:8080/ and after loading your JSON file you will see something like this:
To start annotating question-answer pairs you just need to write a question, highlight the answer with the mouse cursor (the answer will be written automatically), and then click on
Annotating question-answer pairs with cdQA-annotator
After annotating, you can download the annotated file and use it to fine-tune the BERT Reader on your own data, as explained in the previous section.
The team has also provided a web-based user interface to couple with cdQA. In this section, I will describe how you can use the UI linked to the back-end of
First, you have to deploy a
cdQA REST API by executing on your shell (be sure you run it on
export dataset_path='path-to-dataset.csv'
export reader_path='path-to-reader-model'
FLASK_APP=api.py flask run -h 0.0.0.0
Second, you should proceed to the installation of the cdQA-ui package:
git clone https://github.com/cdqa-suite/cdQA-ui && cd cdQA-ui && npm install
Then, start the development server:
npm run serve
You can now access the web application on http://localhost:8080/. You will see something like the figure below:
Web application of cdQA-ui
As the application is connected to the back-end via the REST API, you can ask a question and the application will display an answer, the passage where the answer was found, and the title of the article:
Demonstration of the web application running
If you want to embed the interface in your website, you just need to add the following imports to your Vue app:
import Vue from 'vue'
import CdqaUI from 'cdqa-ui'
import BootstrapVue from "bootstrap-vue"
import "bootstrap/dist/css/bootstrap.css"
import "bootstrap-vue/dist/bootstrap-vue.css"

Vue.use(CdqaUI)
Vue.use(BootstrapVue)
Then you insert the cdQA interface component:
You can also check out a demo of the application on the official website: https://cdqa-suite.github.io/cdQA-website/#demo
In this article, I presented cdQA-suite, a software suite for the deployment of an end-to-end Closed Domain Question Answering System.
If you are interested in learning more about the project, feel free to check out the official GitHub repository: https://github.com/cdqa-suite. Do not hesitate to star and to follow the repositories if you liked the project and consider it valuable for you and your applications.
#python #machine-learning #deep-learning
Welcome to my blog. In this article, you are going to learn the top 10 Python tips and tricks.
#python #python hacks tricks #python learning tips #python programming tricks #python tips #python tips and tricks #python tips and tricks advanced #python tips and tricks for beginners #python tips tricks and techniques #python tutorial #tips and tricks in python #tips to learn python #top 30 python tips and tricks for beginners
Welcome to my blog. In this article, we will learn about the Python lambda function, the map function, and the filter function.
Lambda function in Python: a lambda is a one-line anonymous function. It takes any number of arguments but can have only one expression. The Python lambda syntax is:
Syntax: x = lambda arguments : expression
Now I will show you some Python lambda function examples:
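Here are a few minimal examples of the syntax above. They are illustrative sketches of my own rather than the original post's examples, and the variable names are arbitrary.

```python
# A lambda with two arguments and a single expression
add = lambda a, b: a + b

# A lambda used with map() to transform every item of a list
numbers = [1, 2, 3, 4, 5]
squares = list(map(lambda n: n * n, numbers))

# A lambda used with filter() to keep only the items that pass a test
evens = list(filter(lambda n: n % 2 == 0, numbers))

print(add(2, 3))
print(squares)
print(evens)
```

In Python 3, map() and filter() return lazy iterators, which is why the results are wrapped in list() before printing.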
#python #anonymous function python #filter function in python #lambda #lambda python 3 #map python #python filter #python filter lambda #python lambda #python lambda examples #python map
Are you preparing for a job interview or an exam that involves knowledge about Python? Or do you want to quickly go through common topics of Python?
Here is a list of 50 interview questions with answers. The list is in no particular order.
I hope you enjoy it.
#python #data-science #software-development #50 python interview questions and answers #interview questions and answers #python interview questions and answers
Few programming languages are as versatile as Python. It makes building cutting-edge applications easier, and developers are still exploring the full potential of end-to-end Python development services in various areas.
By areas, we mean FinTech, HealthTech, InsureTech, Cybersecurity, and more. These are New Economy areas, and Python can serve every one of them. Most of them require massive computational capabilities. Python's code is dynamic and powerful, capable of handling heavy traffic and substantial algorithmic workloads.
Software development is multidimensional today. Enterprise software requires intelligent applications with AI and ML capabilities. Consumer-facing applications require data analysis to deliver a better user experience. Netflix, Trello, and Amazon are real examples of such applications. Python helps to build them.
Python can do so many things that developers never run out of reasons to admire it. Python application development isn't restricted to web and enterprise applications. It is highly flexible and well suited to a wide range of uses.
Python is known for its tools and frameworks. There's a framework for everything. Django is helpful for building web applications, enterprise applications, scientific applications, and numerical computing. Flask is another web development framework, a lightweight one with few dependencies.
Web2Py, CherryPy, and Falcon offer strong capabilities for customizing Python development services. Most of them are open-source frameworks that allow rapid development.
Simple to read and write
Python has a simplified syntax, one that resembles the English language. Developers new to Python can easily understand where they stand in the development process. The ease of writing code allows quick application building.
The motivation behind building Python, as stated by its creator Guido van Rossum, was to enable even beginner developers to understand the programming language. The simple code also lets developers make quick changes without getting confused by unnecessary details.
Used by the best
Python isn't just another programming language. It must have something special, which is why the industry giants use it, and for many different purposes. Developers at Google use Python to build system administration tools, parallel data pushers, code review tools, testing and QA tools, and much more. Netflix uses Python web development services for its recommendation algorithm and media player.
Massive community support
Python has a steadily growing community that offers enormous support, with everyone from beginners to experts. There are plenty of tutorials, documentation, and guides available for Python web development solutions.
Today, many universities start with Python, adding to the number of people in the community. Python developers often collaborate on projects and help each other with algorithmic, functional, and application problem solving.
Python is the greatest enabler of data science, Machine Learning, and Artificial Intelligence at any enterprise software development company. Its use cases in cutting-edge applications are the most compelling reason for its success. Python is the second most popular tool after R for data analytics.
The ease of organizing, managing, and visualizing data through dedicated libraries makes it ideal for data-based applications. TensorFlow for neural networks and OpenCV for computer vision are two of Python's best-known use cases for Machine Learning applications.
Considering the advances in software and technology, Python is a YES for a diverse range of uses. Game development, web application development services, GUI development, ML and AI development, and enterprise and consumer applications all use Python to its full potential.
The disadvantages of Python web development solutions are often overlooked by developers and organizations because of the advantages it provides. They focus on quality over speed. That is why it makes sense to use Python for building the applications of the future.
#python development services #python development company #python app development #python development #python in web development #python software development
Python is awesome. It’s one of the easiest languages, with simple and intuitive syntax. But have you ever thought that there might be ways to write your Python code even more simply?
In this tutorial, you’re going to learn a variety of Python tricks that you can use to write more readable and efficient Python code, like a pro.
Swapping values in Python
Instead of creating a temporary variable to hold one of the values while swapping, you can do this:
>>> FirstName = "kalebu"
>>> LastName = "Jordan"
>>> FirstName, LastName = LastName, FirstName
>>> print(FirstName, LastName)
Jordan kalebu
#python #python-programming #python3 #python-tutorials #learn-python #python-tips #python-skills #python-development