Logical analysis of built-in DBSCAN Functions in Popular Data Science Programming Languages

Authors

  • Md Amiruzzaman, West Chester University, West Chester, PA, USA
  • Rashik Rahman, University of Asia Pacific, Dhaka, Bangladesh
  • Md Rajibul Islam, University of Asia Pacific, Dhaka, Bangladesh
  • Rizal Mohd Nor, International Islamic University Malaysia, Kuala Lumpur, Malaysia

Keywords:

Clustering, DBSCAN, Geo-coordinates, Machine learning, Spatial

Abstract

DBSCAN is a density-based clustering algorithm commonly applied to location data, where it is used to find relationships and patterns in geographical data. Because of its widespread application, several programming languages used in data science provide DBSCAN as a built-in function, and researchers and data scientists rely on these functions to cluster and analyze their study data. Every implementation requires the user to supply a radius distance (i.e., eps) and a minimum number of samples for a cluster (i.e., min_sample). Because the inputs are the same, the output of all built-in DBSCAN functions is generally assumed to be the same. However, the built-in DBSCAN function in Python yields different results from those of the other programming languages analyzed in this study. We propose a scientific way to assess the results of built-in DBSCAN functions and to examine inconsistencies in their output. This study reveals various differences and advises caution when working with built-in functionality.
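To make the roles of the two inputs concrete, the following is a minimal pure-Python sketch of DBSCAN, not the implementation of any library examined in the study. The function name `dbscan`, the sample `points`, and the `count_self` switch are all assumptions introduced for illustration. `count_self` toggles whether a point counts toward its own minimum-sample threshold; it is one hypothetical convention on which two implementations could differ, and it is not claimed to be the cause of the differences the study reports.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_samples, count_self=True):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE.

    count_self is a hypothetical switch, not taken from the paper: it decides
    whether a point counts toward its own min_samples threshold.
    """
    n = len(points)

    def neighbors(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    def is_core(neigh):
        size = len(neigh) if count_self else len(neigh) - 1
        return size >= min_samples

    labels = [None] * n  # None means "not visited yet"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neigh = neighbors(i)
        if not is_core(neigh):
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster
        queue = list(neigh)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neigh_j = neighbors(j)
            if is_core(neigh_j):
                queue.extend(neigh_j)  # only core points expand the cluster
        cluster += 1
    return labels


# Illustrative data: a tight group of three, a pair, and an isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
with_self = dbscan(points, eps=1.5, min_samples=2, count_self=True)
without_self = dbscan(points, eps=1.5, min_samples=2, count_self=False)
print(with_self)     # [0, 0, 0, 1, 1, -1]
print(without_self)  # [0, 0, 0, -1, -1, -1]
```

With identical `eps` and `min_samples`, the pair at (10, 10) and (10, 11) forms a cluster under one convention and is labeled noise under the other, which shows why identical inputs do not by themselves guarantee identical output.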

MIJST, Vol. 10(1), June 2022: 25-32

Published

2022-07-20

How to Cite

Amiruzzaman, M., Rahman, R., Islam, M. R., & Nor, R. M. (2022). Logical analysis of built-in DBSCAN Functions in Popular Data Science Programming Languages. MIST International Journal of Science and Technology, 10(1), 25–32. Retrieved from https://www.banglajol.info/index.php/MIJST/article/view/60817

Section

Articles