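I am trying to use mutate_if to perform calculations based on variable names. For example, if a variable name contains "demo", compute the mean, and if it contains "meas", compute the median: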

library(tidyverse) 
library(stringr) 
 
exm_data <- tibble(
  group = sample(letters[1:5], size = 50, replace = TRUE), 
  demo_age = rnorm(50), 
  demo_height = runif(50, min = 48, max = 80), 
  meas_score1 = rnorm(50), 
  meas_score2 = rnorm(50) 
) 
exm_data 
#> # A tibble: 50 x 5 
#>    group    demo_age demo_height  meas_score1 meas_score2 
#>    <chr>       <dbl>       <dbl>        <dbl>       <dbl> 
#>  1     a -1.46539563    58.22435 -0.760692567   0.1077901 
#>  2     b  1.90983770    56.57976  0.262933462  -1.0186600 
#>  3     c  0.58502114    66.26322  2.283491647   0.3215542 
#>  4     b -0.97228337    74.82932  2.447551824  -0.4763201 
#>  5     a  0.65814161    72.19627 -0.592671739  -0.0521247 
#>  6     c -0.62133706    75.49976  0.005813255  -0.4195284 
#>  7     b  0.40650836    60.99083  0.809183477  -0.1127530 
#>  8     c -0.48251421    50.94077 -1.171749420   1.7268231 
#>  9     b  1.24476630    71.39803  1.786950340   0.7980217 
#> 10     c -0.09704469    69.52001 -0.511872217  -1.1465523 
#> # ... with 40 more rows 
 
 
exm_data %>% 
  mutate_if(str_detect(colnames(.), "demo"), mean) %>% 
  mutate_if(str_detect(colnames(.), "meas"), median) 
#> # A tibble: 50 x 5 
#>    group    demo_age demo_height meas_score1 meas_score2 
#>    <chr>       <dbl>       <dbl>       <dbl>       <dbl> 
#>  1     a -0.03250753    64.31412 -0.09909911   0.1307904 
#>  2     b -0.03250753    64.31412 -0.09909911   0.1307904 
#>  3     c -0.03250753    64.31412 -0.09909911   0.1307904 
#>  4     b -0.03250753    64.31412 -0.09909911   0.1307904 
#>  5     a -0.03250753    64.31412 -0.09909911   0.1307904 
#>  6     c -0.03250753    64.31412 -0.09909911   0.1307904 
#>  7     b -0.03250753    64.31412 -0.09909911   0.1307904 
#>  8     c -0.03250753    64.31412 -0.09909911   0.1307904 
#>  9     b -0.03250753    64.31412 -0.09909911   0.1307904 
#> 10     c -0.03250753    64.31412 -0.09909911   0.1307904 
#> # ... with 40 more rows 

As you can see, this works as expected. However, I want to do these calculations by group, and when I add a group_by statement, it breaks:

exm_data %>% 
  group_by(group) %>% 
  mutate_if(str_detect(colnames(.), "demo"), mean) %>% 
  mutate_if(str_detect(colnames(.), "meas"), median) 
#> Error: length(.p) == length(vars) is not TRUE 

Is there a way to use mutate_if on a grouped tibble based on column names?

Please refer to the following approach:

You can use mutate_at together with contains from dplyr, as follows:

library(dplyr) 
 
exm_data %>%
  group_by(group) %>%
  # funs() is deprecated since dplyr 0.8.0, so the function is passed directly
  mutate_at(vars(contains('demo')), mean) %>%
  mutate_at(vars(contains('meas')), median)

This gives:

# A tibble: 50 x 5 
# Groups:   group [5] 
   group    demo_age demo_height meas_score1 meas_score2 
   <chr>       <dbl>       <dbl>       <dbl>       <dbl> 
 1     d  0.12916082    60.26550   0.1932882  -0.5356818 
 2     b -0.31142894    64.50839   0.3219514  -0.4777860 
 3     b -0.31142894    64.50839   0.3219514  -0.4777860 
 4     a -0.34373403    64.84180   0.1929516  -0.3821047 
 5     a -0.34373403    64.84180   0.1929516  -0.3821047 
 6     b -0.31142894    64.50839   0.3219514  -0.4777860 
 7     d  0.12916082    60.26550   0.1932882  -0.5356818 
 8     a -0.34373403    64.84180   0.1929516  -0.3821047 
 9     d  0.12916082    60.26550   0.1932882  -0.5356818 
10     c -0.05963747    59.07845  -0.2395409  -0.4484245 
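
In dplyr 1.0.0 and later, the scoped verbs (mutate_if, mutate_at) are superseded by across(). A minimal sketch of the same grouped calculation, assuming a recent dplyr is installed; the seed and the name result are only there to make the example self-contained and are not part of the original post:

```r
library(dplyr)

set.seed(1)  # hypothetical seed, only so the example is reproducible

# Same shape as the example data in the question
exm_data <- tibble(
  group = sample(letters[1:5], size = 50, replace = TRUE),
  demo_age = rnorm(50),
  demo_height = runif(50, min = 48, max = 80),
  meas_score1 = rnorm(50),
  meas_score2 = rnorm(50)
)

result <- exm_data %>%
  group_by(group) %>%
  mutate(
    across(contains("demo"), mean),    # per-group mean of the demo_* columns
    across(contains("meas"), median)   # per-group median of the meas_* columns
  ) %>%
  ungroup()

result
```

Because across() selects columns by name inside mutate(), the grouping column is left out of the selection automatically.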


As a bonus, you do not need to load stringr.
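
As for why the original attempt fails: on a grouped tibble, mutate_if only considers the non-grouping columns, while colnames(.) still includes the grouping column, so the logical vector passed as the predicate is one element too long. The sketch below only illustrates that length mismatch; the names grouped, n_all and n_nongroup are hypothetical and the data is a small stand-in for exm_data:

```r
library(dplyr)

# Small stand-in with the same columns as exm_data
exm_small <- tibble(
  group = c("a", "a", "b", "b"),
  demo_age = c(1, 2, 3, 4),
  demo_height = c(60, 62, 64, 66),
  meas_score1 = c(0.1, 0.2, 0.3, 0.4),
  meas_score2 = c(1.1, 1.2, 1.3, 1.4)
)

grouped <- exm_small %>% group_by(group)

# All column names, grouping column included
n_all <- length(colnames(grouped))

# Column names that remain once the grouping column is excluded
n_nongroup <- length(setdiff(colnames(grouped), group_vars(grouped)))

n_all       # 5
n_nongroup  # 4
```

Selecting by name with contains avoids the problem because no logical vector has to line up with the column list.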

