I have 4 columns in my dataframe containing the following data:
- Start_latitude
- Start_longitude
- Stop_latitude
- Stop_longitude
I need to compute the distance between each start and stop latitude/longitude pair and store it in a new column.
I came across a package (geopy) that can do this for me, but it expects each point as a tuple. How do I apply this geopy function across all the records of a pandas dataframe?
I'd recommend using pyproj instead of geopy. Much of geopy is a client for online geocoding services, whereas pyproj runs entirely locally (so it won't depend on an internet connection) and operates on whole arrays in a single call, which is much faster than calling a function once per row. It is also more transparent about its methods (see here for instance), which come from PROJ (formerly Proj.4), the library that underlies essentially all open-source GIS software and, probably, many of the web services you'd use.
```python
#!/usr/bin/env python3
import numpy as np
import pandas as pd
from pyproj import Geod

# Distances are measured on the WGS84 ellipsoid, which is more accurate than a spherical method
wgs84_geod = Geod(ellps='WGS84')

def Distance(lat1, lon1, lat2, lon2):
    """Return the distances (in metres) between pairs of lat/lon points."""
    # Geod.inv takes longitudes first: inv(lons1, lats1, lons2, lats2). Yes, this order is correct.
    az12, az21, dist = wgs84_geod.inv(lon1, lat1, lon2, lat2)
    return dist

# Create test data
lat1 = np.random.uniform(-90, 90, 100)
lon1 = np.random.uniform(-180, 180, 100)
lat2 = np.random.uniform(-90, 90, 100)
lon2 = np.random.uniform(-180, 180, 100)

# Package as a dataframe
df = pd.DataFrame({'lat1': lat1, 'lon1': lon1, 'lat2': lat2, 'lon2': lon2})

# Add/update a column with the distances (in metres); Geod.inv is vectorised,
# so whole columns are passed in one call rather than looping over rows
df['dist'] = Distance(df['lat1'].to_numpy(), df['lon1'].to_numpy(),
                      df['lat2'].to_numpy(), df['lon2'].to_numpy())
```
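If installing pyproj isn't an option, a fully vectorised spherical (haversine) calculation can be written with NumPy alone. This is a sketch, not a drop-in replacement: it assumes a spherical Earth with a mean radius of 6,371,008.8 m, so its results will differ slightly from the WGS84 ellipsoidal distances above (typically by well under 1%). The column names follow the question, and `haversine_m` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Assumption: spherical Earth with the mean radius, not the WGS84 ellipsoid
EARTH_RADIUS_M = 6_371_008.8

def haversine_m(lat1, lon1, lat2, lon2):
    """Vectorised haversine distance in metres; inputs are in degrees.

    Accepts scalars, NumPy arrays or equal-length pandas Series.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against rounding pushing `a` slightly above 1 for near-antipodal points
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Random test data laid out with the question's column names
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "Start_latitude": rng.uniform(-90, 90, n),
    "Start_longitude": rng.uniform(-180, 180, n),
    "Stop_latitude": rng.uniform(-90, 90, n),
    "Stop_longitude": rng.uniform(-180, 180, n),
})

# One call over whole columns; no per-row Python function calls
df["dist"] = haversine_m(df["Start_latitude"], df["Start_longitude"],
                         df["Stop_latitude"], df["Stop_longitude"])
```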
PyProj has some documentation here.
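For completeness, the row-by-row approach the question asks about can be done with `DataFrame.apply(..., axis=1)`, which passes each row to a function so the two tuples can be built inside a lambda. To keep this sketch dependency-free, `great_circle_km` is a hypothetical stand-in (a spherical haversine formula) that takes two `(latitude, longitude)` tuples the way geopy's distance functions do; the two-row dataframe uses the question's column names with made-up coordinates.

```python
import math
import pandas as pd

def great_circle_km(p1, p2):
    """Hypothetical stand-in with a geopy-like interface: two (lat, lon)
    tuples in degrees in, haversine distance in km out (spherical Earth)."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against rounding pushing the argument slightly above 1
    return 2 * 6371.0088 * math.asin(min(1.0, math.sqrt(a)))

df = pd.DataFrame({
    "Start_latitude": [0.0, 51.5],
    "Start_longitude": [0.0, -0.1],
    "Stop_latitude": [0.0, 48.9],
    "Stop_longitude": [1.0, 2.35],
})

# axis=1 hands each row to the lambda as a Series; build both tuples from it
df["distance_km"] = df.apply(
    lambda row: great_circle_km(
        (row["Start_latitude"], row["Start_longitude"]),
        (row["Stop_latitude"], row["Stop_longitude"]),
    ),
    axis=1,
)
```

With geopy installed, the call inside the lambda could be swapped for `geopy.distance.geodesic(p1, p2).km`. Either way, `apply` runs Python code once per row, so on large dataframes it will be far slower than passing whole columns to a vectorised function.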