IT干货网

R: Subset by latest date

lidabo, May 4, 2025, Programming & Design

I have:

Keyword   Date   Pos   Bid 
a       4/11/14   1   5.00 
a       4/13/14   1   5.00 
a       4/14/14   1   5.00 
b        6/2/14   3   9.00 
b        7/2/14   4   9.00   
b        8/2/14   4   9.00 
c       8/29/14   2   3.00 
c       8/30/14   2   3.00 
c       8/31/14   2   3.00 

I need to subset it so that only the row with the latest date is kept for each keyword:
Keyword   Date   Pos   Bid 
a       4/14/14   1   5.00 
b        8/2/14   4   9.00 
c       8/31/14   2   3.00 

I tried:
Latest = ddply( df,  
                'Keyword',  
                function(x) c ( 
                    Date = max(as.Date(x$Date, '%m/%d/%y')),  
                    Pos = x$Pos[which(x$Date == max(as.Date(x$Date, '%m/%d/%y')))],  
                    Bid = x$Bid[which(x$Date == max(as.Date(x$Date, '%m/%d/%y')))] 
                ) 
         ) 


Latest = subset( x,  
                 Date = max(as.Date(Date, '%m/%d/%y')),  
                 select = c('Identity', 'Date', 'Round.Avg.Pos.', 'Search.Bid') 
         ) 

But these either give me errors or don't return what I want. What am I missing?

Thanks.

Please refer to the following approaches:

You could try

 library(dplyr) 
 
  df %>%  
     mutate(Date = as.Date(Date, format = "%m/%d/%y")) %>%  # parse m/d/yy strings as dates
     group_by(Keyword) %>%   
     arrange(desc(Date)) %>%                                # newest date first
     slice(1)                                               # first row within each Keyword
 
  #   Keyword       Date Pos Bid 
  #1       a 2014-04-14   1   5 
  #2       b 2014-08-02   4   9 
  #3       c 2014-08-31   2   3 

Or
   df %>%  
      group_by(Keyword) %>% 
      mutate(Date=as.Date(Date, format= "%m/%d/%y"))%>%  
      filter(Date==max(Date)) 

Or using base R
  indx <- with(df, ave(as.Date(Date, format="%m/%d/%y"), Keyword, FUN=max)) 
  df[with(df, as.Date(Date, format='%m/%d/%y')==indx),] 
  #  Keyword    Date Pos Bid 
  #3       a 4/14/14   1   5 
  #6       b  8/2/14   4   9 
  #9       c 8/31/14   2   3 

Or using ddply (from the plyr package)
  library(plyr) 
  ddply(df, .(Keyword), function(x) { 
                  Date <- as.Date(x$Date, '%m/%d/%y') 
                  x[Date == max(Date), ]}) 
 
  #  Keyword    Date Pos Bid 
  #1       a 4/14/14   1   5 
  #2       b  8/2/14   4   9 
  #3       c 8/31/14   2   3 
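
One more base R variant, offered only as a sketch and not part of the original answer: sort the rows so the newest date comes first within each keyword, then keep the first row per keyword with `duplicated()`. The names `parsed`, `sorted`, and `latest` are illustrative, and the data frame is rebuilt inline from the sample data so the snippet runs on its own.

```r
# Sample data copied from the question (dates stored as m/d/yy strings)
df <- data.frame(
  Keyword = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  Date    = c("4/11/14", "4/13/14", "4/14/14", "6/2/14", "7/2/14",
              "8/2/14", "8/29/14", "8/30/14", "8/31/14"),
  Pos     = c(1L, 1L, 1L, 3L, 4L, 4L, 2L, 2L, 2L),
  Bid     = c(5, 5, 5, 9, 9, 9, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Parse the strings once so ordering is done on real dates, not text
parsed <- as.Date(df$Date, format = "%m/%d/%y")

# Order by keyword, then by date descending (negated numeric date)
sorted <- df[order(df$Keyword, -as.numeric(parsed)), ]

# duplicated() flags every repeat of a keyword, so negating it
# keeps only the first (newest) row of each keyword
latest <- sorted[!duplicated(sorted$Keyword), ]
rownames(latest) <- NULL

print(latest)
```

Unlike the `Date == max(Date)` filters, this keeps exactly one row per keyword even when two rows share the same latest date, which may or may not be what you want.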

Data
df <- structure(list(Keyword = c("a", "a", "a", "b", "b", "b", "c",  
 "c", "c"), Date = c("4/11/14", "4/13/14", "4/14/14", "6/2/14",  
 "7/2/14", "8/2/14", "8/29/14", "8/30/14", "8/31/14"), Pos = c(1L,  
1L, 1L, 3L, 4L, 4L, 2L, 2L, 2L), Bid = c(5, 5, 5, 9, 9, 9, 3,  
3, 3)), .Names = c("Keyword", "Date", "Pos", "Bid"), class = "data.frame", row.names = c(NA,  
-9L)) 

