How can I speed up the following loop when the actual loop range is in the thousands?

In the code below, DF3 is a data frame, and OP and k are two of its columns. Here, k takes values from 1 to 10.

l1 <- seq(1, 10, 1)
E<-matrix(data=0, nrow=10, ncol=10)
for (i in seq_along(l1)){
  for (j in seq_along(l1)){
    E[i,j]=sum(ifelse (DF3$OP[DF3$k==i]<DF3$OP[DF3$k==j],1,0))
  }
}

DF3 example:

k   OP
1   60
1   30
1   38
1   46
2   29
2   35
2   13
2   82
3   100
3   72
3   63
3   45
  • Here are some minor suggestions for simplicity: (1) instead of seq(1, 2552, 1), you can use 1:2552; (2) instead of seq_along(l1), you can just use l1; (3) instead of sum(ifelse( ... , 1, 0)), drop the ifelse() call, because summing logical TRUE/FALSE values is the same as summing 1's and 0's. Commented Dec 19, 2020 at 0:52
  • I think you'll scare away a lot of people trying to help by having a problem that's too big to show and/or just "play" with. While I know you want to expand this to much larger sizes, it might be helpful to show it on a much smaller matrix, on the scale of 10x10 instead of 2552x2552.
    – r2evans
    Commented Dec 19, 2020 at 0:55
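
The third suggestion in the first comment can be checked directly. This is a minimal sketch using the OP values of groups 1 and 2 from the example data; the variable names are only illustrative.

```r
# A comparison returns a logical vector, and sum() counts each TRUE as 1,
# so wrapping the comparison in ifelse(..., 1, 0) adds work without
# changing the result.
a <- c(60, 30, 38, 46)   # OP values where k == 1
b <- c(29, 35, 13, 82)   # OP values where k == 2

with_ifelse    <- sum(ifelse(a < b, 1, 0))
without_ifelse <- sum(a < b)

stopifnot(with_ifelse == without_ifelse)
without_ifelse
```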

3 Answers

4

Perhaps you can simplify your nested for loops by using combn, which computes the values for one triangle of the matrix only. Those values are sufficient to obtain the whole matrix, provided all groups have the same size and no paired values are tied:

E <- matrix(data = 0, nrow = max(DF3$k), ncol = max(DF3$k))
v <- split(DF3$OP, DF3$k)
E[lower.tri(E)] <- combn(v, 2, FUN = function(x) sum(do.call("-", x) < 0))
E[upper.tri(E)] <- max(lengths(v)) - t(E)[upper.tri(E)]
E <- t(E)

and finally you will get

> E
     [,1] [,2] [,3]
[1,]    0    2    3
[2,]    2    0    3
[3,]    1    1    0

Data

> dput(DF3)
structure(list(k = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L), OP = c(60L, 30L, 38L, 46L, 29L, 35L, 13L, 82L, 100L,
72L, 63L, 45L)), class = "data.frame", row.names = c(NA, -12L
))
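
The step that derives one triangle from the other relies on the two counts for a pair of groups adding up to the group size. The following is a small sketch of when that holds, using groups 1 and 2 from the data above; `a_tie` is a hypothetical vector introduced only to show the failure case.

```r
a <- c(60, 30, 38, 46)   # OP values where k == 1
b <- c(29, 35, 13, 82)   # OP values where k == 2
n <- length(a)

# With no ties, every position counts as either a < b or b < a
no_tie_ok <- (sum(a < b) + sum(b < a)) == n

# Hypothetical data with one tie (35 vs 35): that position counts for neither
a_tie  <- c(60, 35, 38, 46)
tie_ok <- (sum(a_tie < b) + sum(b < a_tie)) == n

c(no_tie_ok = no_tie_ok, tie_ok = tie_ok)
```

If ties can occur in the real data, computing both directions explicitly avoids relying on this identity.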
1

The logical comparison

DF3$OP[DF3$k==i]<DF3$OP[DF3$k==j]

assumes that every group of k has the same number of records. If the group sizes differ, R recycles the shorter vector (with a warning when the lengths are not multiples of each other), and the resulting counts are no longer meaningful.
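
As a rough illustration of what goes wrong with unequal group sizes, the sketch below compares a four-element vector against shorter ones. The short vectors `y_short` and `y_odd` are hypothetical and not part of the original data.

```r
x <- c(60, 30, 38, 46)

# Length 2 divides length 4: R silently recycles y_short to 29, 35, 29, 35
y_short <- c(29, 35)
res_multiple <- x < y_short

# Length 3 does not divide length 4: R still recycles, but warns.
# tryCatch() captures the warning text instead of printing it.
y_odd <- c(29, 35, 13)
res_odd <- tryCatch(x < y_odd, warning = function(w) conditionMessage(w))

res_multiple
res_odd
```

The silent case is the more dangerous one, since the sum is computed without any hint that the elements were not paired as intended.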

In your dataset, you have 4 records in each group. That suggests that instead of working with a dataframe, we may be better off working with a matrix.

ncols = length(unique(DF3$k))
# one column per group; assumes rows are sorted by k and groups have equal size
mat = matrix(DF3$OP, ncol = ncols)
E = matrix(0L, ncols, ncols)

for (i in seq_len(ncols)) {
  x = mat[, i]
  for (j in seq_len(ncols)) {
    E[i, j] = sum(x < mat[, j])
  }
}
E

##      [,1] [,2] [,3]
## [1,]    0    2    3
## [2,]    2    0    3
## [3,]    1    1    0
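
One possible further step, not part of the answer above, is to drop the inner loop by letting R recycle one column against the whole matrix. This is a sketch under the same assumptions (rows sorted by k, equal group sizes); `E_vec` is an illustrative name.

```r
DF3 <- data.frame(
  k  = rep(1:3, each = 4),
  OP = c(60, 30, 38, 46, 29, 35, 13, 82, 100, 72, 63, 45)
)

ncols <- length(unique(DF3$k))
mat   <- matrix(DF3$OP, ncol = ncols)   # one column per group

E_vec <- matrix(0, ncols, ncols)
for (i in seq_len(ncols)) {
  # mat[, i] is recycled down each column of mat, so this single
  # comparison tests column i against every column at once;
  # colSums() then counts the TRUE values per column.
  E_vec[i, ] <- colSums(mat[, i] < mat)
}
E_vec
```

Whether this is faster in practice depends on the number of groups and records, so it is worth timing on the real data.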
1

Since you compare every pair of k groups in both directions (less than and greater than), you are essentially filling the upper and lower triangles of a square matrix. Therefore, consider splitting the data frame into OP vectors per group with by and passing them to combn to fill the matrix.

OP_list <- by(DF3, DF3$k, function(sub) sub$OP)
OP_list
# DF3$k: 1
# [1] 60 30 38 46
# ------------------------------------------------------------ 
# DF3$k: 2
# [1] 29 35 13 82
# ------------------------------------------------------------ 
# DF3$k: 3
# [1] 100  72  63  45

E <- matrix(data=0, nrow=max(DF3$k), ncol=max(DF3$k))

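# COMPARE ACROSS ALL COMBINATIONS OF K-GROUP VECTORS
# combn order matches lower.tri (column-major) order, so fill the lower
# triangle with the "<" counts, transpose them into the upper triangle,
# then fill the lower triangle with the ">" counts
E[lower.tri(E)] <- combn(OP_list, 2, function(x) sum(x[[1]] < x[[2]]))
E <- t(E)
E[lower.tri(E)] <- combn(OP_list, 2, function(x) sum(x[[1]] > x[[2]]))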

E
#      [,1] [,2] [,3]
# [1,]    0    2    3
# [2,]    2    0    3
# [3,]    1    1    0

Should work for small or large sets. See: Online Demo
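
When pairing combn results with triangle indexing, the fill order matters. The sketch below checks, for a hypothetical four-group case, which triangle lines up with the order in which combn emits pairs; all names in it are illustrative.

```r
n     <- 4                  # hypothetical number of groups
pairs <- t(combn(n, 2))     # rows: (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)

M <- matrix(0, n, n)
lower_idx <- which(lower.tri(M), arr.ind = TRUE)   # positions in fill order
upper_idx <- which(upper.tri(M), arr.ind = TRUE)

# Lower triangle: position (row, col) corresponds to the pair (col, row),
# and those pairs appear in the same order as combn produces them.
lower_matches <- all(lower_idx[, c("col", "row")] == pairs)

# Upper triangle: the column-major fill order is (1,2) (1,3) (2,3) (1,4) ...,
# which differs from combn order once there are more than three groups.
upper_matches <- all(upper_idx[, c("row", "col")] == pairs)

c(lower_matches = lower_matches, upper_matches = upper_matches)
```

With exactly three groups both orders coincide, which is why the small example gives the right answer either way. For larger inputs, a safe pattern is to assign combn results to the lower triangle and transpose when the upper triangle is wanted.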
