Running Spark NLP in Docker Container for Named Entity Recognition

Using Spark NLP with a Jupyter notebook for natural language processing in a Docker environment.

As described in [1], natural language processing (NLP) is a research subfield shared by many fields, such as linguistics, computer science, information engineering, and artificial intelligence. NLP is concerned with the interactions between computers and human natural languages, and in particular with how to use computers to process and analyze natural language data (e.g., text and voice). Major challenges in NLP include speech recognition, natural language understanding (e.g., text understanding), and natural language generation.

One of the early applications of machine learning in text understanding is email and message spam detection [1]. With the advancement of deep learning, many new language understanding methods have been published, such as BERT (see [2] for an example of using MobileBERT for question answering).

Another popular NLP task is Named Entity Recognition (NER). The main purpose of NER is to extract named entities (e.g., personal names, organization names, location names, and product names) from unstructured text. Many open source NLP libraries and tools support NER, such as NLTK and spaCy [3]. Recently, Spark NLP [4] has been getting more and more attention because it supports a more complete list of NLP features [5][6].

It seems to me that the development of Spark NLP [4] is based on Ubuntu Linux and OpenJDK. Thus it’s straightforward to set up an environment for Spark NLP in Colab (see instructions and code examples), since Colab runs on the Ubuntu operating system. However, I noticed that it’s difficult to set up a local environment for Spark NLP on a Mac due to the following known exception:

[Image: screenshot of the exception]

To avoid the problem, this article demonstrates how to set up a Docker environment [7] to run Spark NLP for NER and other NLP features in a Docker container. Such a Docker environment can serve as a basis for establishing a Spark NLP microservices platform.

1. Introduction to Docker

As described in [7], Docker is a tool that allows us to easily deploy applications (e.g., Spark NLP) in a sandbox (called a container) that runs on any Docker-supported host operating system (e.g., macOS).

The basic concepts of Docker are:

  • Dockerfile
  • Docker image
  • Docker container

1.1 Dockerfile

A Dockerfile [7] is a simple text file that contains a list of commands (similar to Linux commands) for creating a Docker image. It’s a way to automate the Docker image creation process.

1.2 Docker Image

A Docker image [7] is a read-only template that contains a set of instructions for creating a Docker container that can run on the Docker platform. It provides a convenient way to package up applications and preconfigured server environments.

A Docker image is built from a Dockerfile.
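To make the relationship between a Dockerfile, an image, and a container more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the Dockerfile contents (base image, package list, port 8888), the image tag sparknlp-demo, and the helper names build_command and run_command are all hypothetical and are not the file or commands used in this article. The sketch only assembles the docker command lines; it does not invoke Docker.

```python
from typing import List

# Hypothetical Dockerfile contents (an assumption for illustration only):
# an Ubuntu base image with OpenJDK, plus Spark NLP and Jupyter from pip.
DOCKERFILE_TEXT = """\
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y openjdk-8-jdk python3-pip
RUN pip3 install pyspark spark-nlp jupyter
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
"""


def build_command(tag: str, context_dir: str = ".") -> List[str]:
    """Command that builds an image named `tag` from the Dockerfile in `context_dir`."""
    return ["docker", "build", "-t", tag, context_dir]


def run_command(tag: str, host_port: int = 8888, container_port: int = 8888) -> List[str]:
    """Command that starts a container from image `tag`, publishing one port to the host."""
    return ["docker", "run", "--rm", "-p", f"{host_port}:{container_port}", tag]


if __name__ == "__main__":
    # Dockerfile -> image -> container, shown as text rather than executed.
    print(DOCKERFILE_TEXT)
    print(" ".join(build_command("sparknlp-demo")))
    print(" ".join(run_command("sparknlp-demo")))
```

Saving the text as a file named Dockerfile and running the two printed commands in that directory would, under these assumptions, first build the read-only image and then start a container from it.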
