Using Basemap and GeoNamesCache to Plot K-Means Clusters
This is the third of four stories that aim to address the issue of identifying disease outbreaks by extracting news headlines from popular news sources.
This article aims to provide an easy way to view the clusters determined in the second article on a global and US-level scale. First, a list of large cities is gathered and placed, with their corresponding latitudes and longitudes, inside a dataset. Next, a function is made that plots the cluster points on a map, with a different color for each cluster. Lastly, the function is called for the points in the United States, the centers of the clusters in the United States, the points globally, and the centers of the clusters globally.
A detailed explanation of how this is implemented is shown below:
Step 1: Compiling a List of the Largest Cities in the US
First, the city name, latitude, longitude, and population are extracted from ‘largest_us_cities.csv’, a file containing the US cities with a population over 30,000. Cities with a population over 200,000 were added to a dictionary; Anchorage and Honolulu were excluded because they skewed the positioning of the map. Next, using the haversine formula, which computes the distance between pairs of coordinates, cities close to one another were deduplicated, with a population heuristic determining which city should be kept.
from math import radians, sin, cos, asin, sqrt
import numpy as np

file2 = open('largest_us_cities.csv', 'r')
large_cities = file2.readlines()

# Keep cities with population >= 200,000; Anchorage, Honolulu, and Greensboro
# are excluded because they skew the positioning of the map
large_city_data = {}
for i in range(1, len(large_cities)):
    large_city_values = large_cities[i].strip().split(';')
    lat_long = large_city_values[-1].split(',')
    if (int(large_city_values[-2]) >= 200000
            and large_city_values[0] not in ("Anchorage", "Honolulu", "Greensboro")):
        large_city_data[large_city_values[0]] = [lat_long[0], lat_long[1], large_city_values[-2]]

def haversine(point_a, point_b):
    # point_a and point_b are (latitude, longitude) pairs in degrees
    lat1, lon1 = point_a[0], point_a[1]
    lat2, lon2 = point_b[0], point_b[1]
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    r = 6371  # Earth's mean radius in kilometers
    return c * r

# Deduplicate cities within 80 km of each other, keeping the more populous one
# (populations are cast to int so they compare numerically, not as strings)
for i in list(large_city_data.keys()):
    for j in list(large_city_data.keys()):
        if (i != j and haversine((float(large_city_data[i][0]), float(large_city_data[i][1])),
                                 (float(large_city_data[j][0]), float(large_city_data[j][1]))) < 80.0):
            if int(large_city_data[j][2]) > int(large_city_data[i][2]):
                large_city_data[i] = [np.nan, np.nan, large_city_data[i][2]]
            else:
                large_city_data[j] = [np.nan, np.nan, large_city_data[j][2]]

large_city_data['Chicago'] = [41.8781136, -87.6297982, 2718782]
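As a sanity check on the 80 km deduplication threshold, the haversine helper can be exercised on its own. The standalone sketch below uses approximate coordinates for Chicago and Milwaukee as illustrative inputs, and assumes points are passed as (latitude, longitude) pairs:

```python
from math import radians, sin, cos, asin, sqrt

def haversine(point_a, point_b):
    # point_a and point_b are (latitude, longitude) pairs in degrees
    lat1, lon1 = point_a
    lat2, lon2 = point_b
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    return c * 6371  # Earth's mean radius in kilometers

# Approximate city centers (illustrative values)
chicago = (41.8781, -87.6298)
milwaukee = (43.0389, -87.9065)
print(round(haversine(chicago, milwaukee)), "km")
```

Since the two cities come out roughly 130 km apart, both survive the 80 km cutoff; two cities under 80 km apart would be collapsed into the more populous one.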
Step 2: Plotting K-Means Clusters and Cluster Centers Using Basemap
First, a function is created with seven parameters: df1, num_cluster, typeof, path, size, add_large_city, and figsize. Using the Basemap library, a geographic model of either the US or the world is generated, depending on the typeof parameter, and the figsize parameter controls the model size. A dictionary is then created whose keys are the cluster labels, subdivided by latitude and longitude; the values contain the latitude and longitude of each headline in that cluster.
A list of colors is initialized, and a specific color is assigned to each cluster label. The latitude and longitude points are plotted in these colors on the geographic model made above. If the add_large_city parameter is true, the largest cities are also added to the plot. The figure is saved to a “.png” file using the path parameter.
def print_k_means(df1, num_cluster, typeof, path, size, add_large_city, figsize):
    if typeof == "US":
        map_plotter = Basemap(projection='lcc', lon_0=-95, llcrnrlon=-119, llcrnrlat=22,
                              urcrnrlon=-64, urcrnrlat=49, lat_1=33, lat_2=45)
    else:
        map_plotter = Basemap()
    fig = plt.figure(figsize=(24, 16) if figsize else (12, 8))
    # Group each headline's coordinates by its cluster label
    cluster_vals = {}
    for i in range(num_cluster):
        cluster_vals[str(i) + "_long"] = []
        cluster_vals[str(i) + "_lat"] = []
    for index in df1.index:
        label = str(df1['cluster_label'][index])
        cluster_vals[label + '_long'].append(float(df1['longitude'][index]))
        cluster_vals[label + '_lat'].append(float(df1['latitude'][index]))
    num_list = list(range(num_cluster))
    color_list = ['rosybrown', 'lightcoral', 'indianred', 'brown', 'maroon', 'red',
                  'darksalmon', 'sienna', 'chocolate', 'sandybrown', 'peru', 'darkorange',
                  'burlywood', 'orange', 'tan', 'darkgoldenrod', 'goldenrod', 'gold',
                  'darkkhaki', 'olive', 'olivedrab', 'yellowgreen', 'darkolivegreen',
                  'chartreuse', 'darkseagreen', 'forestgreen', 'darkgreen', 'mediumseagreen',
                  'mediumaquamarine', 'turquoise', 'lightseagreen', 'darkslategrey', 'darkcyan',
                  'cadetblue', 'deepskyblue', 'lightskyblue', 'steelblue', 'lightslategrey',
                  'midnightblue', 'mediumblue', 'blue', 'slateblue', 'darkslateblue',
                  'mediumpurple', 'rebeccapurple', 'thistle', 'plum', 'violet', 'purple',
                  'fuchsia', 'orchid', 'mediumvioletred', 'deeppink', 'hotpink', 'palevioletred']
    colors = [color_list[i] for i in range(num_cluster)]
    # Plot each cluster's points in its assigned color
    for target, color in zip(num_list, colors):
        map_plotter.scatter(cluster_vals[str(target) + '_long'], cluster_vals[str(target) + '_lat'],
                            latlon=True, s=size, c=color)
    map_plotter.shadedrelief()
    if add_large_city:
        # Label the large cities that survived the deduplication step
        for city in list(large_city_data.keys()):
            if not pd.isna(large_city_data[city][0]):
                x, y = map_plotter(float(large_city_data[city][1]), float(large_city_data[city][0]))
                plt.plot(x, y, "ok", markersize=4)
                plt.text(x, y, city, fontsize=16)
    plt.show()
    fig.savefig(path)
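Stripped of the Basemap plotting, the grouping step inside print_k_means reduces to building per-cluster coordinate lists keyed by cluster label. A minimal sketch, using a made-up three-row DataFrame in place of df1:

```python
import pandas as pd

# Toy stand-in for df1: three headlines in two clusters (illustrative values)
df1 = pd.DataFrame({
    'latitude': ['41.88', '29.76', '42.36'],
    'longitude': ['-87.63', '-95.37', '-71.06'],
    'cluster_label': [0, 1, 0],
})

num_cluster = 2
cluster_vals = {}
for i in range(num_cluster):
    cluster_vals[str(i) + "_long"] = []
    cluster_vals[str(i) + "_lat"] = []
for index in df1.index:
    label = str(df1['cluster_label'][index])
    cluster_vals[label + '_long'].append(float(df1['longitude'][index]))
    cluster_vals[label + '_lat'].append(float(df1['latitude'][index]))

print(cluster_vals['0_lat'])   # latitudes of the two cluster-0 headlines
```

Each `<label>_lat`/`<label>_long` pair then feeds one scatter call in one color, which is what produces the distinct cluster colors on the map.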
Step 3: Running the Function
The print_k_means function is run on the df_no_us dataframe to make a scatterplot of the latitudes and longitudes of headlines pertaining to the US. Next, a geographic center for each cluster is determined and stored in another dataframe called df_center_us. The print_k_means function is run on the df_center_us dataframe with large cities added, to determine the cities closest to the disease outbreak centers; the point size is also increased for easier readability. A similar process is run for df_no_world. Each of the dataframes is stored in a “.csv” file.
print_k_means(df_no_us, us_clusters, "US", "corona_disease_outbreaks_us.png", 50, False, False)
df_no_us.to_csv("corona_disease_outbreaks_us.csv")

df_center_us = {'latitude': [], 'longitude': [], 'cluster_label': []}
for i in range(us_clusters):
    df_1 = df_no_us.loc[df_no_us['cluster_label'] == i]
    df_1 = df_1.reset_index()
    del df_1['index']
    latitude = []
    longitude = []
    for index in df_1.index:
        latitude.append(float(df_1['latitude'][index]))
        longitude.append(float(df_1['longitude'][index]))
    df_1['latitude'] = latitude
    df_1['longitude'] = longitude
    sum_latitude = df_1['latitude'].sum()
    sum_longitude = df_1['longitude'].sum()
    # Only clusters with at least 20 US headlines get a center
    if len(df_1['latitude']) >= 20:
        df_center_us['latitude'].append(sum_latitude / len(df_1['latitude']))
        df_center_us['longitude'].append(sum_longitude / len(df_1['longitude']))
        df_center_us['cluster_label'].append(i)
df_center_us = pd.DataFrame(data=df_center_us)
for index in df_center_us.index:
    df_center_us['cluster_label'][index] = index
print_k_means(df_center_us, len(df_center_us['latitude']), "US",
              "corona_disease_outbreaks_us_centers.png", 500, True, True)
df_center_us.to_csv("corona_disease_outbreaks_us_centers.csv")

df_center_world = {'latitude': [], 'longitude': [], 'cluster_label': []}
for i in range(world_clusters):
    df_1 = df_no_world.loc[df_no_world['cluster_label'] == i]
    df_1 = df_1.reset_index()
    del df_1['index']
    latitude = []
    longitude = []
    for index in df_1.index:
        latitude.append(float(df_1['latitude'][index]))
        longitude.append(float(df_1['longitude'][index]))
    df_1['latitude'] = latitude
    df_1['longitude'] = longitude
    sum_latitude = df_1['latitude'].sum()
    sum_longitude = df_1['longitude'].sum()
    # Only clusters with at least 10 world headlines get a center
    if len(df_1['latitude']) >= 10:
        df_center_world['latitude'].append(sum_latitude / len(df_1['latitude']))
        df_center_world['longitude'].append(sum_longitude / len(df_1['longitude']))
        df_center_world['cluster_label'].append(i)
df_center_world = pd.DataFrame(data=df_center_world)
for index in df_center_world.index:
    df_center_world['cluster_label'][index] = index
print_k_means(df_center_world, len(df_center_world['latitude']), "world",
              "corona_disease_outbreaks_world_centers.png", 500, False, True)
df_center_world.to_csv("corona_disease_outbreaks_world_centers.csv")
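The per-cluster averaging above can also be expressed with a pandas groupby. The sketch below uses made-up coordinates and omits the minimum-headline-count filter for brevity; it computes the same mean latitude and longitude per cluster label:

```python
import pandas as pd

# Illustrative clustered headline coordinates (made-up values)
df_no_us = pd.DataFrame({
    'latitude': [41.0, 43.0, 30.0, 32.0],
    'longitude': [-88.0, -90.0, -95.0, -97.0],
    'cluster_label': [0, 0, 1, 1],
})

# Mean latitude/longitude per cluster, equivalent to the summing loop above
centers = (df_no_us.groupby('cluster_label')[['latitude', 'longitude']]
           .mean()
           .reset_index())
print(centers)
```

The small-cluster filter could be restored by counting rows per group first and dropping groups below the threshold before averaging.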
Click this link for access to the Github repository for a detailed explanation of the code: Github.
Translated from: https://medium.com/@neuralyte/using-basemap-and-geonamescache-to-plot-k-means-clusters-995847513fc2