Abstract:

The Centers for Medicare & Medicaid Services (CMS) publishes a set of quality ratings for every nursing home in the country that participates in the Medicare or Medicaid program. These ratings help families understand how nursing homes differ in quality and support important decisions such as choosing a nursing home (Centers for Medicare and Medicaid Services, 2020). This paper addresses a set of research questions about the ratings data set. It uses machine learning techniques to model the data and to understand the relationships between features. Given a set of attributes, a machine learning model is trained to predict the overall rating of a nursing home. The performance of several models is evaluated, and answers to the research questions are provided.

Introduction:

The Centers for Medicare & Medicaid Services has introduced a star-based rating system that quantifies the quality of a nursing home. The ‘Overall Rating’ of a nursing home is calculated from its performance in three domains, each of which has its own rating.

**Health inspections:** Annual inspection surveys are conducted and the deficiencies found in each nursing home are recorded. A rating is assigned based on the number and severity of the deficiencies identified during inspections over the past three years. It also accounts for the number of revisits the department needed to confirm that a nursing home had corrected the faults identified during the inspections.

**Staffing:** RN stands for registered nurse and LPN stands for licensed practical nurse. The rating is based on the number of RN hours per resident per day and the total number of nurse staffing hours per resident per day.

**Quality Measures:** The rating draws on 15 Quality Measures from the Nursing Home Compare website, comprising 9 long-stay and 6 short-stay measures.

Because there are many features and it is hard to analyze all of them together, a single feature, the ‘Overall Rating’, rates each nursing home on a scale of 1 (lowest) to 5 (highest). Details about the three domains are given in (Centers for Medicare and Medicaid Services, 2020).

Dataset Description:

The dataset was obtained from the DATA.GOV website (Centers for Medicare and Medicaid Services, 2019). It consists of 86 columns and 15,437 records, each of which corresponds to one nursing home. Some of the important columns describing the nursing home are:

Federal Provider Number — A unique number provided by the federal government to a nursing home.

Provider Name — The name of the nursing home.

Provider Address — The address of the nursing home.

Provider City — The city where the nursing home is located.

Provider State — The state where the nursing home is located.

Provider Zip Code — The zip code associated with the nursing home.

Provider Phone Number — The phone number of the nursing home.

Provider SSA County — The Social Security Administration (SSA) code of the county in which the nursing home is located.

Provider County Name — The name of the county the nursing home belongs to.

Ownership Type — Describes whether the nursing home is a for-profit, government, or non-profit entity.

Provider Type — Describes whether the nursing home participates in Medicare, Medicaid, or both.

Provider Resides in Hospital — A Boolean value (TRUE or FALSE) indicating whether the provider resides in a hospital.

Legal Business Name — The legal business name of the nursing home.

There are other features related to the measures described in the introduction section of the paper. For example, some of the features related to ‘Health inspections’ include ‘Rating Cycle 1 Total Number of Health Deficiencies’, ‘Rating Cycle 1 Number of Standard Health Deficiencies’, ‘Rating Cycle 1 Health Deficiency Score’, etc. The features related to ‘Staffing’ include ‘Reported Nurse Aide Staffing Hours per Resident per Day’, ‘Reported LPN Staffing Hours per Resident per Day’, ‘Reported RN Staffing Hours per Resident per Day’, etc. The features related to ‘Quality Measures’ include ‘QM Rating’, ‘QM Rating Footnote’, ‘Long-Stay QM Rating’, etc.
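The split between descriptive columns and rating-related features can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code used in the paper: `SAMPLE_CSV` is an invented stand-in for the downloaded file, and the helper names `load_rows` and `split_columns` are hypothetical. Only the column names come from the dataset description above.

```python
import csv
import io

# Toy stand-in for the downloaded CSV; every row and value here is invented.
SAMPLE_CSV = """\
Federal Provider Number,Provider Name,Provider State,Ownership Type,Overall Rating,Reported RN Staffing Hours per Resident per Day
000001,Example Home A,TX,For profit,4,0.52
000002,Example Home B,VT,Non profit,,0.81
000003,Example Home C,CA,Government,2,
"""

# Columns that identify or describe a provider rather than measure quality.
DESCRIPTIVE_COLUMNS = {
    "Federal Provider Number", "Provider Name", "Provider Address",
    "Provider City", "Provider State", "Provider Zip Code",
    "Provider Phone Number", "Provider SSA County", "Provider County Name",
    "Ownership Type", "Provider Type", "Provider Resides in Hospital",
    "Legal Business Name",
}


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def split_columns(header):
    """Separate descriptive columns from candidate model features."""
    descriptive = [c for c in header if c in DESCRIPTIVE_COLUMNS]
    features = [c for c in header if c not in DESCRIPTIVE_COLUMNS]
    return descriptive, features


rows = load_rows(SAMPLE_CSV)
descriptive, features = split_columns(list(rows[0].keys()))
print("descriptive:", descriptive)
print("features:", features)
```

Reading every value as a string keeps leading zeros in the provider number intact and leaves missing ratings as empty strings, which a later cleaning step would need to handle explicitly.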

The summary statistics of the data are shown in Table 1. Since there are many columns, only a few of them are displayed in the table.

Table 1. Summary Statistics for Features in the Dataset

The ‘Total Amount of Fines in Dollars’ and ‘Number of Certified Beds’ features have the highest counts of values. The mean, standard deviation, and 75th percentile are highest for the ‘Total Amount of Fines in Dollars’ feature. The 25th and 50th percentiles are highest for the ‘Number of Certified Beds’ and ‘Average Number of Residents Per Day’ columns.
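To make the entries of such a summary table concrete, the sketch below computes the count, mean, standard deviation, and the 25%, 50%, and 75% values for a single column. Those three values are percentiles (quartiles) of the observed data. The numbers are invented toy values, `summarize` is a hypothetical helper, and the standard library is used in place of whatever tooling produced Table 1.

```python
import statistics


def summarize(values):
    """Return count, mean, std and quartiles of the non-missing values."""
    present = [v for v in values if v is not None]
    # method="inclusive" interpolates linearly between observed data points.
    q1, q2, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(present),
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),  # sample standard deviation
        "25%": q1,
        "50%": q2,
        "75%": q3,
    }


# Invented values, for illustration only; None marks a missing entry.
certified_beds = [60, 80, 100, 120, 140]
fines_in_dollars = [0, 0, None, 5000, 20000]

beds_summary = summarize(certified_beds)
fines_summary = summarize(fines_in_dollars)
print(beds_summary)
print(fines_summary)
```

The count reports only non-missing entries, which is why it can differ from column to column even though every column has the same number of records.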

Exploratory Data Analysis:

Figure 1 depicts the ‘Total Weighted Health Survey Score’ by state. California and Texas have higher total weighted health survey scores, while Vermont, Puerto Rico, and Guam have lower scores.

Fig. 1. Bar plot depicting Total Weighted Health Survey Score by State
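The aggregation behind a plot like Figure 1 can be sketched as a per-state total. This is an assumed reconstruction rather than the author's code: the records are invented, `total_by_state` is a hypothetical helper, and summing is an assumption about how the per-state value was produced.

```python
from collections import defaultdict


def total_by_state(records, value_key):
    """Sum value_key per state, skipping missing values; largest first."""
    totals = defaultdict(float)
    for rec in records:
        value = rec.get(value_key)
        if value in (None, ""):
            continue  # ignore records with no score
        totals[rec["Provider State"]] += float(value)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


SCORE = "Total Weighted Health Survey Score"

# Invented records, for illustration only.
records = [
    {"Provider State": "TX", SCORE: "40.0"},
    {"Provider State": "TX", SCORE: "60.5"},
    {"Provider State": "VT", SCORE: "12.0"},
    {"Provider State": "CA", SCORE: "80.0"},
    {"Provider State": "CA", SCORE: ""},
]

ranked = total_by_state(records, SCORE)
print(ranked)
```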

Figure 2 shows a heatmap of the sum of the ‘Overall Rating’ of nursing homes by state. California, Texas, Ohio, and Florida have the highest summed overall ratings. Considering only the 50 states, Alaska, Vermont, Delaware, and Wyoming have lower summed overall ratings.

Fig. 2. Heatmap showing the sum of Overall Ratings of all Nursing Homes by State
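Because a sum grows with the number of nursing homes in a state, it can be useful to look at the mean alongside it. The sketch below computes both for each state. The ratings are invented and `rating_sum_and_mean` is a hypothetical helper, so this illustrates the arithmetic only and says nothing about the actual state rankings.

```python
from collections import defaultdict


def rating_sum_and_mean(records):
    """Return {state: (sum of ratings, mean rating)}."""
    ratings = defaultdict(list)
    for rec in records:
        ratings[rec["Provider State"]].append(rec["Overall Rating"])
    return {
        state: (sum(values), sum(values) / len(values))
        for state, values in ratings.items()
    }


# Invented ratings: a state with many homes and one with few.
records = [
    {"Provider State": "CA", "Overall Rating": 3},
    {"Provider State": "CA", "Overall Rating": 3},
    {"Provider State": "CA", "Overall Rating": 3},
    {"Provider State": "CA", "Overall Rating": 3},
    {"Provider State": "VT", "Overall Rating": 5},
    {"Provider State": "VT", "Overall Rating": 4},
]

by_state = rating_sum_and_mean(records)
for state, (total, mean) in by_state.items():
    print(state, total, mean)
```

In this toy data the state with more homes has the larger sum while the state with fewer homes has the larger mean, so the two views answer different questions.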


Prediction of Overall Rating of a Nursing Home using Machine Learning