I would like to solve this problem in R without using SQL.

How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?

Sure, I could use sqldf to do it, but there must be a cool apply method in R to do it, too?

    I don't know what "distinct" does, but x[which.max(x$var),] does the other part.
    – Frank
    Commented May 18, 2013 at 18:28
    cool -- thanks! I'd never used which.max. nice!
    – Don
    Commented May 18, 2013 at 20:06

Setup data. First read in the data:

Lines <- "id  home  datetime  player   resource
1   10   04/03/2009  john    399 
2   11   04/03/2009  juliet  244
5   12   04/03/2009  borat   555
3   10   03/03/2009  john    300
4   11   03/03/2009  juliet  200
6   12   03/03/2009  borat   500
7   13   24/12/2008  borat   600
8   13   01/01/2009  borat   700"
DF <- read.table(text = Lines, header = TRUE)
DF$datetime <- as.Date(DF$datetime, format = "%d/%m/%Y")

1) base - by. There are many ways to do this with various packages, but here is a base R solution first:

> do.call("rbind", by(DF, DF$home, function(x) x[which.max(x$datetime), ]))
   id home   datetime player resource
10  1   10 2009-03-04   john      399
11  2   11 2009-03-04 juliet      244
12  5   12 2009-03-04  borat      555
13  8   13 2009-01-01  borat      700
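To make the one-liner above easier to follow, here is a sketch that spells out the same split, pick, and re-stack steps with split() and lapply(). The data frame is rebuilt inline only so the snippet runs on its own; the names pieces, latest and result are illustrative and not part of the original answer.

```r
# Rebuild the example data so the sketch is self-contained.
DF <- data.frame(
  id = c(1, 2, 5, 3, 4, 6, 7, 8),
  home = c(10, 11, 12, 10, 11, 12, 13, 13),
  datetime = as.Date(c("04/03/2009", "04/03/2009", "04/03/2009", "03/03/2009",
                       "03/03/2009", "03/03/2009", "24/12/2008", "01/01/2009"),
                     format = "%d/%m/%Y"),
  player = c("john", "juliet", "borat", "john", "juliet", "borat", "borat", "borat"),
  resource = c(399, 244, 555, 300, 200, 500, 600, 700)
)

# 1. One data frame per value of home.
pieces <- split(DF, DF$home)

# 2. From each piece keep the row with the latest datetime.
#    which.max() returns the first maximum, so ties yield a single row.
latest <- lapply(pieces, function(x) x[which.max(x$datetime), ])

# 3. Stack the one-row pieces back into a single data frame.
result <- do.call(rbind, latest)
result
```

Because which.max() stops at the first maximum, a home with two rows on the same latest date would contribute only the first of them.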

1a) base - ave. A variation that also uses only base R:

FUN <- function(x) which.max(x) == seq_along(x)
is.max <- ave(xtfrm(DF$datetime), DF$home, FUN = FUN) == 1
DF[is.max, ]
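If tied rows should all be kept rather than only the first maximum per group, one possible variation (a sketch, not something the answer above proposes) compares each date with its group maximum. The names day_number and latest_per_home are made up for illustration, and the data is rebuilt inline so the snippet runs by itself.

```r
# Rebuild the example data so the sketch is self-contained.
DF <- data.frame(
  id = c(1, 2, 5, 3, 4, 6, 7, 8),
  home = c(10, 11, 12, 10, 11, 12, 13, 13),
  datetime = as.Date(c("04/03/2009", "04/03/2009", "04/03/2009", "03/03/2009",
                       "03/03/2009", "03/03/2009", "24/12/2008", "01/01/2009"),
                     format = "%d/%m/%Y"),
  player = c("john", "juliet", "borat", "john", "juliet", "borat", "borat", "borat"),
  resource = c(399, 244, 555, 300, 200, 500, 600, 700)
)

# Work on plain numbers so ave() does not have to handle the Date class.
day_number <- as.numeric(DF$datetime)

# For every row, the latest day number seen within that row's home.
latest_per_home <- ave(day_number, DF$home, FUN = max)

# Keep each row whose date equals its group maximum (ties are all kept).
tied_max <- DF[day_number == latest_per_home, ]
tied_max
```

With the sample data there are no ties, so this selects the same four rows as the which.max version.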

2) sqldf. And here it is using sqldf, just in case:

> library(sqldf)
> sqldf("select id, home, max(datetime) datetime, player, resource 
+        from DF 
+        group by home")
  id home   datetime player resource
1  1   10 2009-03-04   john      399
2  2   11 2009-03-04 juliet      244
3  5   12 2009-03-04  borat      555
4  8   13 2009-01-01  borat      700

I do not use SQL either, so I would do it this way.


df <- read.table("your file", "your options") # I leave this to you


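idx <- which(df$group_column == "desired_group")      # row numbers of the desired group
row_with_max_value <- idx[which.max(df$values[idx])]  # row number of the maximum within that group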

"row_with_max_value" contains the row number of your data frame (df) that holds the maximum of the column "values" (df$values) within the group "desired_group" of "group_column". If "group_column" is not of type character, drop the quotes around the group value and write it in the matching format (for example, a bare number).

If you need the value itself, then index the column with that row number: df$values[row_with_max_value]


It is probably not the most elegant way, but you do not need SQL and it works (at least for me ;)
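The same idea can be applied to every group at once instead of one "desired_group" at a time. The following is only a sketch under assumptions: the toy data frame is hypothetical, it reuses the column names "values" and "group_column" from above, and rows_by_group is an illustrative name.

```r
# Hypothetical toy data with the column names used in this answer.
df <- data.frame(
  values = c(3, 9, 4, 7, 1),
  group_column = c("a", "a", "b", "b", "b"),
  stringsAsFactors = FALSE
)

# Row numbers of df, split by group.
rows_by_group <- split(seq_len(nrow(df)), df$group_column)

# For each group, the row number (in df) that holds the largest value.
row_with_max_value <- sapply(rows_by_group,
                             function(idx) idx[which.max(df$values[idx])])

row_with_max_value          # named vector: one row number per group
df[row_with_max_value, ]    # the corresponding rows
```

Indexing df with the resulting vector returns one row per group, which is the same shape of result the other approaches in this thread produce.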

