Importing the Dataset

As usual, the first step is importing the dataset of matches downloaded using the instructions in my last article (Part I).

import pandas as pd
#importing dataset
X = pd.read_csv('/content/drive/My Drive/Colab Notebooks/Projects/20200526_Chess Openings/chess1.csv')

Unfortunately, the data is not structured at all: every game is stored as a single string. I will need to preprocess all the data to create a dataset that holds a single chess move in every column.

n_moves = X.shape[0] #number of rows (games) in the dataset

#break every game into individual moves
moves = [[] for x in range(n_moves-1)]
for i in range(n_moves-1):
  game = X['0'][i].split(".")
  game.remove('1') #drop the leading move number
  for move in game:
    try:
      player_move = move.split(" ")
      moves[i].append(player_move[1]) #add white move
      moves[i].append(player_move[2]) #add black move
    except IndexError:
      #the fragment is missing a white or black move: report row and fragment
      print(i, move)
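To make the splitting logic concrete, here is a minimal sketch that applies the same steps to a single game string. The helper name `split_game` and the sample strings are hypothetical and only serve as an illustration; they are not part of the original notebook.

```python
def split_game(game_text):
    """Split one game string into moves, mirroring the loop above.

    Returns (parsed_moves, leftovers), where leftovers are the
    fragments that did not contain both a white and a black move.
    """
    fragments = game_text.split(".")
    fragments.remove("1")  # drop the leading move number
    parsed, leftovers = [], []
    for fragment in fragments:
        tokens = fragment.split(" ")
        try:
            parsed.append(tokens[1])  # white move
            parsed.append(tokens[2])  # black move
        except IndexError:
            leftovers.append(fragment)
    return parsed, leftovers


# Hypothetical sample game in the same notation as the dataset
sample_game = "1. d4 Nf6 2. Nf3 g6 3. Bf4 Bg7"
game_moves, skipped = split_game(sample_game)
print(game_moves)  # ['d4', 'Nf6', 'Nf3', 'g6', 'Bf4', 'Bg7']
print(skipped)     # []
```

Note that a game ending on a white move, such as "1. e4 e5 2. Nf3", still keeps that last white move, because it is appended before the missing black move raises the error.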

With this algorithm, I took the content of column 0 and broke down every string (and therefore every match). For example, I broke down “1. d4 Nf6 2. Nf3 g6 3. Bf4 Bg7 4. e3 O-O 5. h3…” into individual moves. The result is a list called moves that holds every individual move of each game as a separate string, so that each move can later be placed in its own column.

The loop also prints the index of every row in which it encounters an error. Fragments that raise an error are not added to the dataset as a complete move pair:

3302  
3303  
3304  
3305  
3306   
3307   
3308   
3309  
3310  
5280  
5284  
5285  

***Because of a small mistake in the code, the numbers are repeated once, but this has no consequence for our result.

Now, let us see the final result:

moves = pd.DataFrame(moves)
moves
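Because games have different lengths, the inner lists in moves are not all the same size. The sketch below, which uses two made-up games rather than the real dataset, suggests what to expect: pandas makes the table as wide as the longest game and leaves the missing cells of shorter games empty.

```python
import pandas as pd

# Hypothetical parsed games of different lengths (not taken from the dataset)
sample_moves = [
    ["d4", "Nf6", "Nf3", "g6"],
    ["e4", "e5"],
]

sample_df = pd.DataFrame(sample_moves)

# One column per move position, as wide as the longest game
print(sample_df.shape)

# Number of empty cells in each row (the padding of shorter games)
missing_per_game = sample_df.isna().sum(axis=1).tolist()
print(missing_per_game)
```

These empty cells have to be taken into account in any later step that expects every column to be filled.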

