In this article, I share my thoughts on which challenging data science problems with business value we can solve amid Covid-19. Covid-19 is the novel coronavirus disease of 20**19** [1]. This article is useful both for data science enthusiasts who want to identify, formulate, and solve business problems, and for leaders who want to direct their teams toward the data science problems relevant to their business. I have observed _two main trends_ in the industry.

○ We have a business problem but are not sure which data helps to solve it.

○ We have data but are not sure which business problems to solve with it.

In these two cases, the starting point is different. However, the business problems and the data relevant to solving them are equally important in the data science world. The focus of the article is:

What business problems to solve?

What are the different data sets relevant to COVID-19 available?

Which Machine Learning, Deep Learning, and statistical techniques can be used to solve these problems, and what challenges are involved?

Summarize the problems to solve specific to industry verticals

As mentioned in my earlier article [2], there are 7 types of data, namely **numerical, categorical, text, image, video, speech, and signals**, on which data science problems are built, irrespective of the industry or domain.

Problems to solve with Numerical and Categorical data:

Table-1 summarizes the types of problems to solve with _numerical and categorical_ data, the data science techniques to use, and the challenges in solving those problems. The core business problems to solve are “What is the impact of Covid-19 on my business?” and “How much at risk are we of contracting the Covid-19 virus?”. These business problems are formulated as multi-step data science problems, as listed in Table-1.

Table-1: List of problems to solve with Numerical and Categorical data amid Covid-19

Here is the list of data sets available to solve the above set of problems. Along with these data sets, you may need data specific to your organization, to which you will have access. You can load the raw open-source data directly in your code (Python notebook), or download the .csv files and then load them for further processing.

  • Novel Coronavirus (COVID-19) Cases, provided by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU):

https://github.com/CSSEGISandData/COVID-19

  • Day level information on Covid-19 affected cases:

https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset

  • Covid-19 India-specific data, updated on a daily basis, can be accessed from the URL below. You may refer to the data for your own country for further analysis.

https://api.covid19india.org/

  • Time series data of confirmed cases, deaths, and recoveries:

https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases

Interesting articles on processing the loaded data are referenced in [3].

  • GDP Growth rate/ recession data references:

https://datahub.io/core/gdp#data

https://tradingeconomics.com/country-list/gdp-growth-rate

https://data.worldbank.org/topic/economy-and-growth

  • **Consumer Price Index (CPI)** data can be downloaded from:

https://datahub.io/core/cpi#data

  • Employment data reference:

https://datahub.io/core/employment-us#data

  • Population data can be downloaded from:

https://datahub.io/core/population

  • Gold historical price:

https://datahub.io/core/gold-prices

  • Weather data set:

https://www.kaggle.com/muthuj7/weather-dataset

  • Real estate / House price data:

https://datahub.io/core/house-prices-us#data

  • COVID-19 Vulnerability Index data and reference paper [4]:

https://github.com/closedloop-ai/cv19index

  • Drug discovery data set for COVID-19:

https://www.kaggle.com/jaisimha34/covid19-drug-discovery/data
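The loading step mentioned before the list can be sketched as follows for the JHU CSSE time series. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes the file is laid out with the location columns `Province/State`, `Country/Region`, `Lat`, `Long` followed by one column per date, and that the raw file lives at the path stored in `JHU_CONFIRMED_URL`. The function names, the sample rows, and the country names in them are made up for illustration. Only the Python standard library is used, so the sketch runs offline on the embedded sample.

```python
import csv
import io
import urllib.request
from collections import defaultdict

# Assumed location of the confirmed-cases time series in the JHU CSSE repository.
JHU_CONFIRMED_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
    "time_series_covid19_confirmed_global.csv"
)

# Columns that describe a location; every other column is assumed to be a date.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def fetch_text(url):
    """Download a CSV file and return its content as text (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def country_totals(csv_text):
    """Sum the cumulative counts per country and date from a wide CSV."""
    totals = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        country = row["Country/Region"]
        for column, value in row.items():
            if column in ID_COLUMNS or not value:
                continue  # skip location columns and empty cells
            totals[country][column] = totals[country].get(column, 0) + int(value)
    return dict(totals)


def daily_new_cases(cumulative):
    """Turn a date -> cumulative count mapping into per-day increases."""
    dates = list(cumulative)  # dicts keep the column order of the file
    return {
        later: cumulative[later] - cumulative[earlier]
        for earlier, later in zip(dates, dates[1:])
    }


# Tiny made-up sample in the assumed layout, so the sketch runs offline.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Examplia,10.0,20.0,1,3,6
North,Testland,30.0,40.0,0,2,2
South,Testland,31.0,41.0,5,5,9
"""

totals = country_totals(SAMPLE)
print(totals["Testland"])
print(daily_new_cases(totals["Testland"]))
```

To work with the live data instead of the sample, pass `fetch_text(JHU_CONFIRMED_URL)` to `country_totals`. The per-country series can then be joined with organization-specific data, such as daily sales, on the date key.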

Problems to solve with Text data using Natural Language Processing:

If you are comfortable with text processing, these problems may interest you. Table-2 summarizes a list of problems, the corresponding text data, the techniques to solve those problems, and the challenges involved. Recent advances such as Bidirectional Encoder Representations from Transformers (BERT) play a crucial role in solving these kinds of problems.

Table-2: List of problems to solve with Text data amid Covid-19

The available data set links are as follows:

  • COVID-19 Open Research Dataset (CORD-19) from Kaggle, with 50,000+ articles:

https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

  • Government measures data set:

https://www.acaps.org/covid19-government-measures-dataset

  • **Fake news detector** data and model can be accessed from:

https://github.com/yaqingwang/EANN-KDD18
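One text problem that a corpus such as CORD-19 lends itself to is ranking articles by relevance to a question. The sketch below is a simple baseline, not the BERT-based approach mentioned above: it swaps in plain TF-IDF weighting with cosine similarity so that it runs with the Python standard library alone. The three abstracts and the query are made up for illustration, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Return one TF-IDF weight dictionary per document, plus the IDF table."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    # Smoothed IDF, so a term present in every document keeps a small weight.
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: n * idf[term] for term, n in counts.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, documents):
    """Return (score, document index) pairs, most similar document first."""
    vectors, idf = build_index(documents)
    # Query terms that never occur in the corpus carry no information here.
    query_counts = Counter(t for t in tokenize(query) if t in idf)
    query_vector = {t: n * idf[t] for t, n in query_counts.items()}
    scores = [(cosine(query_vector, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


# Made-up abstracts standing in for real articles.
ABSTRACTS = [
    "Incubation period of the novel coronavirus and transmission dynamics",
    "Economic impact of lockdown measures on small businesses",
    "Vaccine candidates and antiviral drug screening for coronavirus",
]

ranked = rank("coronavirus incubation period", ABSTRACTS)
for score, index in ranked:
    print(round(score, 3), ABSTRACTS[index])
```

A baseline like this gives a reference point: a BERT-based ranker is worth its extra cost only if it retrieves clearly better articles than the TF-IDF ordering on the same questions.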


DS4Covid-19: What Problems to solve with Data Science amid Covid-19 ?