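KB: Using DBSCAN in R with a precomputed distance

The file “points.txt” contains the X and Y coordinates of a set of points. We want to compute the pairwise distances with a custom formula and pass the result to DBSCAN as a distance object, instead of letting DBSCAN compute Euclidean distances from the raw coordinates.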

Data file – points.txt (tab-delimited, with a header row)

x    y
5    8
6    7
6    5
2    4
3    4
5    4
7    4
9    4
3    3
8    2
7    5

R Script

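library(dbscan)

# Read the tab-delimited point coordinates (columns x and y)
data=read.table("d:/temp/points.txt", sep="\t", header=TRUE)
n=nrow(data)

# Symmetric matrix that will hold the pairwise custom distances
distance=matrix(0,nrow=n,ncol=n)

for(i in seq_len(n)){
    for(j in i:n){
        # Custom distance: (sqrt(|dx|) + sqrt(|dy|))^2
        dx=abs(data$x[i]-data$x[j])^0.5
        dy=abs(data$y[i]-data$y[j])^0.5
        distance[i,j]=(dx+dy)^2
        distance[j,i]=distance[i,j]
    }
}

# as.dist() turns the matrix into a "dist" object, which dbscan() treats as precomputed distances
result=dbscan(as.dist(distance), eps=4, minPts=3)

# Cluster label for each point; 0 marks noise
result$cluster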
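
The double loop above can also be written without explicit loops. The following is a minimal sketch rather than part of the original script: it assumes the same 11 points typed inline instead of read from points.txt, and uses base R's outer() to build the same matrix. The names pts, dx and dy are illustrative, and the dbscan() call only runs if the dbscan package is installed.

```r
# Same 11 points as points.txt, typed inline so the sketch is self-contained
pts <- data.frame(
    x = c(5, 6, 6, 2, 3, 5, 7, 9, 3, 8, 7),
    y = c(8, 7, 5, 4, 4, 4, 4, 4, 3, 2, 5)
)

# outer() applies "-" to every pair (i, j), giving an n x n matrix of differences
dx <- sqrt(abs(outer(pts$x, pts$x, "-")))
dy <- sqrt(abs(outer(pts$y, pts$y, "-")))

# Same custom distance as the loop: (sqrt(|dx|) + sqrt(|dy|))^2
distance <- (dx + dy)^2

# Run DBSCAN on the precomputed distances only if the package is available
if (requireNamespace("dbscan", quietly = TRUE)) {
    result <- dbscan::dbscan(as.dist(distance), eps = 4, minPts = 3)
    print(result$cluster)
}
```

For a table this small the loop and the vectorized form are interchangeable; the outer() version mainly pays off as the number of points grows.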

 

HOWTO: Basic data processing in R

Reading CSV data without header

MyData = read.csv("D:/temp/iris.txt", header=FALSE)

colnames(MyData) <- c("col1","col2","col3","col4")

 

Reading Tab-delimited data without header

MyData = read.table("D:/temp/iris.txt", sep="\t", header=FALSE)

colnames(MyData) <- c("col1","col2","col3","col4")
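
To try both readers without depending on a file under D:/temp, the sketch below writes a small headerless sample to temporary files and reads it back. The sample values and the names rows, csv_file, tab_file, FromCsv and FromTab are made up for illustration. col.names is a standard read.table argument (read.csv passes it through) that sets the column names in the same call, as an alternative to a separate colnames() step.

```r
# Hypothetical sample rows, not the contents of iris.txt
rows <- data.frame(a = c(5.1, 4.9), b = c(3.5, 3.0),
                   c = c(1.4, 1.4), d = c(0.2, 0.2))

csv_file <- tempfile(fileext = ".csv")
tab_file <- tempfile(fileext = ".txt")

# Write the rows without a header line, once comma- and once tab-delimited
write.table(rows, csv_file, sep = ",", row.names = FALSE, col.names = FALSE)
write.table(rows, tab_file, sep = "\t", row.names = FALSE, col.names = FALSE)

cols <- c("col1", "col2", "col3", "col4")

# Name the columns while reading instead of afterwards
FromCsv <- read.csv(csv_file, header = FALSE, col.names = cols)
FromTab <- read.table(tab_file, sep = "\t", header = FALSE, col.names = cols)

# Compare the two results
print(all.equal(FromCsv, FromTab))

# Clean up the temporary files
unlink(c(csv_file, tab_file))
```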

 

Viewing table

View(MyData)

 

Removing column from table

MyData$col4 <- NULL
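
Assigning NULL removes one column at a time. As a small sketch on a made-up data frame, the same idea extends to dropping several columns by name; the names unwanted and Reduced are illustrative only.

```r
# Hypothetical table with the same column names as above
MyData <- data.frame(col1 = 1:3, col2 = 4:6, col3 = 7:9, col4 = 10:12)

# Remove a single column by assigning NULL
MyData$col4 <- NULL

# Remove several columns at once by name;
# drop = FALSE keeps the result a data frame even if one column remains
unwanted <- c("col2", "col3")
Reduced <- MyData[, !(names(MyData) %in% unwanted), drop = FALSE]

print(names(MyData))
print(names(Reduced))
```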

 

Listing objects in memory

ls()

 

Removing data from memory

rm(MyData)
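
A short sketch of how ls() and rm() interact, using placeholder objects (MyData and OtherData here are created only for the example):

```r
# Placeholder objects for illustration
MyData <- data.frame(col1 = 1:3)
OtherData <- c(1, 2, 3)

# ls() returns the object names as a character vector
before <- "MyData" %in% ls()

# Remove one object; OtherData is untouched
rm(MyData)
after <- "MyData" %in% ls()

print(c(before = before, after = after))
print("OtherData" %in% ls())
```

Note that rm(list = ls()) removes every object in the current environment, so it is worth checking ls() first.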