I have a DataFrame:

  routeId  latitude_value  longitude_value 
  r1       28.210216        22.813209 
  r2       28.216103        22.496735 
  r3       28.161786        22.842318 
  r4       28.093110        22.807081 
  r5       28.220370        22.503500 
  r6       28.220370        22.503500 
  r7       28.220370        22.503500 

From this I want to generate a DataFrame df2 that looks something like this:
routeId    nearest 
  r1         r3         (for example) 
  r2       ...    similarly for all the routes. 

The logic I am trying to implement is:

For each route, compute the Euclidean distance to every other route,
iterating over routeId.

There is a function for computing the Euclidean distance:
dist = math.hypot(x2 - x1, y2 - y1) 

But I am confused about how to build a function that takes the DataFrame, or how to use .apply() for this:
def  get_nearest_route(): 
    ..... 
    return df2 

Please refer to the following approach:

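We can use scipy.spatial.distance.cdist, or nested for loops, to build the pairwise distance matrix, and then replace each row's minimum with the route name to find the closest route.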

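import numpy as np
import pandas as pd
import scipy.spatial.distance

# Pairwise Euclidean distances between all routes (n x n matrix)
mat = scipy.spatial.distance.cdist(df[['latitude_value','longitude_value']],
                                   df[['latitude_value','longitude_value']], metric='euclidean')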
 
# If you don't want scipy, you can use plain Python like this:
# import math 
# mat = [] 
# for i,j in zip(df['latitude_value'],df['longitude_value']): 
#     k = [] 
#     for l,m in zip(df['latitude_value'],df['longitude_value']): 
#         k.append(math.hypot(i - l, j - m)) 
#     mat.append(k) 
# mat = np.array(mat) 
 
new_df = pd.DataFrame(mat, index=df['routeId'], columns=df['routeId'])  
Output of new_df:
routeId        r1        r2        r3        r4        r5        r6        r7 
routeId                                                                       
r1       0.000000  0.316529  0.056505  0.117266  0.309875  0.309875  0.309875 
r2       0.316529  0.000000  0.349826  0.333829  0.007998  0.007998  0.007998 
r3       0.056505  0.349826  0.000000  0.077188  0.343845  0.343845  0.343845 
r4       0.117266  0.333829  0.077188  0.000000  0.329176  0.329176  0.329176 
r5       0.309875  0.007998  0.343845  0.329176  0.000000  0.000000  0.000000 
r6       0.309875  0.007998  0.343845  0.329176  0.000000  0.000000  0.000000 
r7       0.309875  0.007998  0.343845  0.329176  0.000000  0.000000  0.000000     
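As a quick sanity check on the matrix above, a single entry can be reproduced by hand with math.hypot from the question. The sketch below recomputes the r1/r3 cell from the coordinates in the original DataFrame; the variable names are made up for illustration, and the result should agree with the 0.056505 shown in the table.

```python
import math

# Coordinates of r1 and r3, copied from the DataFrame in the question
r1 = (28.210216, 22.813209)
r3 = (28.161786, 22.842318)

# Straight-line (Euclidean) distance in degree units, the same quantity cdist computes
d_r1_r3 = math.hypot(r3[0] - r1[0], r3[1] - r1[1])
print(round(d_r1_r3, 6))
```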
 
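# Replace each route's minimum distance with the column name, and everything else with `False`.
# new_df[new_df != 0].min() is the smallest nonzero distance per route (column-wise; the matrix is symmetric).
# .eq(..., 0) compares every row against its own value, giving a mask that marks the nonzero minima.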
closest = np.where(new_df.eq(new_df[new_df != 0].min(),0),new_df.columns,False) 
 
# Drop the False entries from each row and collect the remaining column names as a list.
df['close'] = [i[i.astype(bool)].tolist() for i in closest] 
 
 
 routeId  latitude_value  longitude_value         close 
0      r1       28.210216        22.813209          [r3] 
1      r2       28.216103        22.496735  [r5, r6, r7] 
2      r3       28.161786        22.842318          [r1] 
3      r4       28.093110        22.807081          [r3] 
4      r5       28.220370        22.503500          [r2] 
5      r6       28.220370        22.503500          [r2] 
6      r7       28.220370        22.503500          [r2]  
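The masking step is easier to follow on a tiny matrix. The following is a minimal sketch with a hypothetical 3x3 symmetric distance table (labels a, b, c are invented here, with b and c at the same location); it only shows how the nonzero minimum and the boolean mask come out, not the full answer.

```python
import pandas as pd

labels = ['a', 'b', 'c']
# Hypothetical symmetric distance matrix: b and c coincide (distance 0)
toy = pd.DataFrame([[0.0, 1.0, 2.0],
                    [1.0, 0.0, 0.0],
                    [2.0, 0.0, 0.0]], index=labels, columns=labels)

# Zeros become NaN and are skipped by min(), so this is the smallest nonzero distance per route
nonzero_min = toy[toy != 0].min()

# axis=0 aligns nonzero_min with the row labels: row r is compared against nonzero_min[r]
mask = toy.eq(nonzero_min, axis=0)
print(mask)
```

Because zeros are excluded, b and c are both matched to a rather than to each other, which is the same effect seen for r5, r6 and r7 above.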

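If you do not want to ignore zero distances (so that other routes at the same coordinates count as nearest), then: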
# Store the array values in a variable (a copy, so new_df itself is left untouched)
arr = new_df.to_numpy(copy=True)
# A point should not be its own nearest neighbour, so replace the diagonal with NaN
arr[np.diag_indices_from(arr)] = np.nan
 
# Replace each row's non-NaN minimum with the column name, and everything else with False
new_close = np.where(arr == np.nanmin(arr, axis=1)[:,None],new_df.columns,False) 
 
# Get column names ignoring false.  
df['close'] = [i[i.astype(bool)].tolist() for i in new_close] 
 
   routeId  latitude_value  longitude_value         close 
0      r1       28.210216        22.813209          [r3] 
1      r2       28.216103        22.496735  [r5, r6, r7] 
2      r3       28.161786        22.842318          [r1] 
3      r4       28.093110        22.807081          [r3] 
4      r5       28.220370        22.503500      [r6, r7] 
5      r6       28.220370        22.503500      [r5, r7] 
6      r7       28.220370        22.503500      [r5, r6] 
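To get the two-column df2 sketched in the question (one nearest route per routeId), the steps above could be wrapped in a function along the following lines. This is a variant under stated assumptions rather than the code from the answer: it uses NumPy broadcasting in place of cdist, masks the diagonal with infinity, and on ties keeps only the first match. The function name get_nearest_route comes from the question; the column name nearest and the tie-breaking rule are choices made here.

```python
import numpy as np
import pandas as pd


def get_nearest_route(df):
    """Return a DataFrame with each routeId and its single nearest other route."""
    coords = df[['latitude_value', 'longitude_value']].to_numpy(dtype=float)
    # Pairwise coordinate differences via broadcasting: shape (n, n, 2)
    diff = coords[:, None, :] - coords[None, :, :]
    # Euclidean distance matrix, shape (n, n)
    dist = np.hypot(diff[..., 0], diff[..., 1])
    # A route must not be reported as its own nearest neighbour
    np.fill_diagonal(dist, np.inf)
    # Position of the smallest distance in each row (first one wins on ties)
    nearest_pos = dist.argmin(axis=1)
    route_ids = df['routeId'].to_numpy()
    return pd.DataFrame({'routeId': route_ids, 'nearest': route_ids[nearest_pos]})


df = pd.DataFrame({
    'routeId': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7'],
    'latitude_value': [28.210216, 28.216103, 28.161786, 28.093110,
                       28.220370, 28.220370, 28.220370],
    'longitude_value': [22.813209, 22.496735, 22.842318, 22.807081,
                        22.503500, 22.503500, 22.503500],
})
df2 = get_nearest_route(df)
print(df2)
```

Since zero distances are not ignored here, r5, r6 and r7 are matched to one another, and r2 is matched to r5 only because it is the first of the three tied routes.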

