A Framework For Enhancing The Accuracy Of K-Means Clustering Algorithm With Linear Data Structures By Removing The Outliers

Document Type : Primary Research paper


Dept. of Computer Applications, Bishop Heber College (Autonomous), Tiruchirappalli, India (Affiliated to Bharathidasan University, Tiruchirappalli)


Clustering is a common technique for statistical data analysis that is used in various fields, such as data mining, machine learning, pattern recognition, bioinformatics and image analysis. It is the method of grouping related data objects drawn from dissimilar sets: it partitions a dataset into subsets so that the data objects in each subset are similar according to a defined distance measure. K-means is a well-known clustering algorithm, valued for its simplicity and computational efficiency. In K-means, the similarity of data objects is determined by a distance measure, which underpins robust algorithms for both classification and clustering. Distance measures therefore play a vital role in the performance of the K-means algorithm. The essential function of a distance metric is to measure the distance between data objects in a dataset. The K-means algorithm calculates the distance between each data object and the centroids, and clusters are formed by assigning each data object to the centroid at minimum distance, based on the resulting values [Nasooti et al. 2015]. The calculation of distance thus plays a major role in the clustering process, and the choice of a proper distance calculation technique depends on the type of the data.
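The assignment step described above, in which each data object is grouped with the centroid at minimum distance, can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: the function names (`euclidean`, `manhattan`, `assign_to_nearest`), the two metrics chosen, and the sample points are all assumptions introduced here.

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points of equal dimension.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def assign_to_nearest(points, centroids, distance=euclidean):
    # For each point, return the index of the centroid at minimum distance.
    # Ties go to the centroid with the lower index.
    labels = []
    for p in points:
        dists = [distance(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

# Hypothetical sample data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]
labels = assign_to_nearest(points, centroids)

# The same point can be assigned differently under different metrics.
probe = [(0.0, 0.0)]
probe_centroids = [(3.0, 3.0), (0.0, 5.0)]
probe_euclidean = assign_to_nearest(probe, probe_centroids, euclidean)
probe_manhattan = assign_to_nearest(probe, probe_centroids, manhattan)
```

In the second example the probe point is closer to the first centroid under Euclidean distance (about 4.24 versus 5) but closer to the second under Manhattan distance (6 versus 5), which illustrates why the choice of distance calculation depends on the data.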