First Y-Short Tandem Repeat Categorical Dataset for Clustering Applications // Dataset Papers in Biology, Volume 2013 (2013), Article ID 364725, 9 pages
http://dx.doi.org/10.7167/2013/364725
Seman et al.
The Y-chromosome short tandem repeat (Y-STR) data were collected mainly to benchmark the performance of clustering methods. There are six Y-STR datasets, divided into two categories: Y-STR surname data and Y-haplogroup data. The Y-STR data are categorical and differ from other categorical data in that they consist of many similar and nearly similar objects. This characteristic has caused problems for existing clustering algorithms when they are applied to these data.
http://www.hindawi.com/dpis/biology/2013/364725/
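To illustrate what "similar and nearly similar objects" can mean for categorical data, the following minimal sketch computes a simple matching (mismatch-count) dissimilarity between haplotypes. Both the measure and the allele values are assumptions chosen for illustration. They are not taken from the datasets, and the abstract does not state which dissimilarity measure the authors use. The function name `simple_matching_distance` is hypothetical.

```python
def simple_matching_distance(a, b):
    """Count the positions at which two categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical haplotypes: each position is the repeat count at one Y-STR
# marker, treated as a category rather than as a number.
h1 = (14, 24, 11, 13)
h2 = (14, 24, 11, 13)  # same values as h1
h3 = (14, 24, 11, 14)  # differs from h1 at one marker
h4 = (15, 22, 10, 12)  # differs from h1 at every marker

d_same = simple_matching_distance(h1, h2)  # 0
d_near = simple_matching_distance(h1, h3)  # 1
d_far = simple_matching_distance(h1, h4)   # 4
```

When many records sit at distance 0 or 1 from one another, as in this toy case, a distance-based clustering method has few distinct values with which to separate groups. This is one plausible reading of the difficulty the abstract describes.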