Cluster Analysis Using N-gram Statistics for Daihinmin Programs and Performance Evaluations

Cluster Analysis Using N-gram Statistics for Daihinmin Programs and Performance Evaluations

Seiya Okubo (School of Management & Information, University of Shizuoka, Shizuoka, Japan), Takaaki Ayabe (Graduate School of Electro-Communications, University of Electro-Communications, Chofu, Japan) and Tetsuro Nishino (Graduate School of Informatics and Engineering, University of Electro-Communications, Chofu, Japan)
Copyright: © 2016 |Pages: 25
DOI: 10.4018/IJSI.2016040103
OnDemand PDF Download:
No Current Special Offers


In this paper, the authors elucidate the characteristics of the computer game Daihinmin, a popular Japanese card game that uses imperfect information. They first propose a method to extract feature values using n-gram statistics and a cluster analysis method that employs feature values. By representing the program card hands as several symbols, and the order of hands as simplified symbol strings, they obtain data that is suitable for feature extraction. The authors then evaluate the effectiveness of the proposed method through computer experiments. In these experiments, they apply their method to ten programs that were used in the UEC Computer Daihinmin Convention. In addition, the authors evaluate the robustness of the proposed method and apply it to recent programs. Finally, they show that their proposed method can successfully cluster Daihinmin programs with high probability.
Article Preview

2. Preliminary

Various studies have been conducted on the similarity of programs, such as comparisons of source code and of feature quantity behavior and extraction (Kikuchi, Goto, Wakatsuki & Nishino, 2014; Nakamura, 1997; Yamada & Mizuno, 2014). In this study, we use n-gram statistics and Ward’s method.

2.1. N-gram Statistics

N-gram statistics comprise a linguistic model of the type and emergence ratio of n items of element strings appearing next to a sequence of words or letters. N-gram statistics are typically used to extract idioms and identify authors in the area of natural language processing. In addition, these statistics are widely applied to extensive areas of research. By utilizing them in game studies, we can extract procedures with different lengths and identify established procedures, such as behavioral choices unconsciously made by players, without placing constraints on the game.

In studies of perfect information games, n-gram statistics are used to automatically extract established patterns (Nakamura, 1997). For Daihinmin, for example, no clustering or other detailed analyses have been made to date.

2.2 .Cluster Analysis

In this study, we use three types of distance concepts to compute distances. We then use Ward’s method for clustering. The computation methods for each distance concept are defined below.

  • Manhattan distance: Manhattan distance can be obtained by measuring the distance between two points along orthogonal coordinate axes. It is defined as Equation 1:


  • Euclidean distance: It is a distance that is applicable in Euclidean space. It is defined as Equation 2:


  • Chebyshev distance: This distance defines the greatest difference between two vectors in a dimension as the distance. It is designed as Equation 3:


We use Ward’s method for hierarchical cluster analysis, which herein refers to the following.

Complete Article List

Search this Journal:
Open Access Articles
Volume 10: 4 Issues (2022): 2 Released, 2 Forthcoming
Volume 9: 4 Issues (2021)
Volume 8: 4 Issues (2020)
Volume 7: 4 Issues (2019)
Volume 6: 4 Issues (2018)
Volume 5: 4 Issues (2017)
Volume 4: 4 Issues (2016)
Volume 3: 4 Issues (2015)
Volume 2: 4 Issues (2014)
Volume 1: 4 Issues (2013)
View Complete Journal Contents Listing