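Getting Started with Real-Time Topic Clustering

We recently announced our biggest feature release to date. In this release, we introduced the ability to group news articles by similarity using real-time clustering.

This new feature enables News API users to:

Identify developing topics of interest in the news
Identify developing events of interest in world news content
Deduplicate content in news feeds

You can read more about the full feature release here.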

What is real-time clustering?

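Real-time clustering is a proprietary clustering model that uses a combination of enriched data and other signals to group articles about the same event or topic in real time.

This capability lets News API users significantly improve the efficiency and accuracy of their applications by:

Identifying important events and topics of interest in the news
Discovering trending events and topics as they unfold in the news
Tracking breaking news events or topics
Summarizing event or topic details
Querying and investigating these events and topics
Deduplicating news articles
Filtering noise out of the feeds shown to users and analysts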

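What is a cluster?

A cluster is a collection of stories grouped together by how similar they are. In the News API, a cluster is a JSON object that provides a cluster ID along with metadata about the news in that cluster.

Clusters have the following properties:

A unique ID within the News API
One or more stories associated with it
A story only ever belongs to one cluster
A predicted location for the stories
Timestamps of the earliest and latest stories
A representative "hero" story that best summarizes the event the cluster covers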

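Getting started with real-time clustering

Getting up and running with real-time clustering is straightforward. There are several ways to retrieve clusters in the News API, and we walk through them below. For a more technical description and more code, see our clustering documentation.

Clustering requires an Advanced or Enterprise license key. Start a free trial to get API credentials, or contact sales to upgrade your account.

Retrieving clusters using the Clusters endpoint

The Clusters endpoint lets you search for clusters made up of articles published within a specific time frame. This is particularly useful for monitoring "breaking" news events or emerging topics that are gaining coverage.

Each cluster comes with metadata about the number of stories it contains, which makes it easy to spot new or growing clusters. The size of a cluster is often a strong indicator that a new or important event is unfolding. Other filtering options include source location, which lets you build localized cluster searches.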
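Because each cluster carries a story count, one rough way to spot growing clusters is to compare counts between two polls of the endpoint. The sketch below assumes each poll has already been reduced to a plain mapping of cluster ID to story count. The function name `find_growing_clusters` and the `min_growth` threshold are hypothetical helpers for illustration, not part of the News API or its SDK.

```python
def find_growing_clusters(previous, current, min_growth=5):
    """Return (cluster_id, growth) pairs for clusters that are new or have grown.

    `previous` and `current` map cluster ID -> story count from two polls.
    A cluster missing from `previous` is treated as having had 0 stories.
    """
    growing = []
    for cluster_id, count in current.items():
        growth = count - previous.get(cluster_id, 0)
        if growth >= min_growth:
            growing.append((cluster_id, growth))
    # Largest growth first, so the fastest-moving clusters come out on top.
    growing.sort(key=lambda pair: pair[1], reverse=True)
    return growing


# Illustrative counts only, not real API output.
earlier = {101: 12, 102: 40}
later = {101: 30, 102: 41, 103: 15}
print(find_growing_clusters(earlier, later))
```

A sensible value for `min_growth` depends on how often you poll and how busy the topic is, so treat the default here as a placeholder.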

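The Clusters endpoint serves as the first step in the discovery process, identifying the clusters that matter to you. Once you have retrieved cluster objects, you can query the Stories endpoint with their IDs to collect the stories (articles) associated with each cluster. Below is an example of querying the Clusters endpoint with the Python SDK.

In this query, we make a simple call to the Clusters endpoint, looking for clusters of 10 or more stories published in the last 6 hours.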

    import os
    from pprint import pprint

    import aylien_news_api
    from aylien_news_api.rest import ApiException

    # Read the credentials from environment variables.
    configuration = aylien_news_api.Configuration()
    configuration.api_key['X-AYLIEN-NewsAPI-Application-ID'] = os.environ.get('NEWSAPI_APP_ID')
    configuration.api_key['X-AYLIEN-NewsAPI-Application-Key'] = os.environ.get('NEWSAPI_APP_KEY')

    client = aylien_news_api.ApiClient(configuration)
    api_instance = aylien_news_api.DefaultApi(client)

    try:
        # Only return clusters that contain at least 10 stories.
        api_response = api_instance.list_clusters(
            time_end='NOW-6HOURS',
            story_count_min=10
        )
        pprint(api_response)
    except ApiException as e:
        print("Exception when calling DefaultApi->list_clusters: %s\n" % e)

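Retrieving clusters using the Trends endpoint

You can also retrieve cluster information using the Trends endpoint. The Trends endpoint lets you filter clusters based on the stories they contain. For example, you can filter for clusters containing stories that have a particular category label, mention a particular entity, or even have a particular sentiment score.

This approach is useful for identifying events about a specific topic or entity in real time.

The Trends endpoint returns cluster IDs ordered by the number of stories associated with each cluster. Once you have the cluster IDs, you can get each cluster's metadata from the Clusters endpoint, and then the stories for each cluster from the Stories endpoint.

A story only ever belongs to one cluster
The relationship between a story and a cluster does not change; the story will not be reassigned to another cluster later

Keep in mind that with this approach your query is limited to the top 100 clusters (by cluster size). If you want to run a process that requires real-time monitoring, make sure your queries are very specific and cover a small enough time interval to retrieve all relevant clusters.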
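One way to keep each query under the 100-cluster cap is to split a longer period into several short windows and query them one at a time. The sketch below only builds the window boundaries as relative date strings in the `NOW-6HOURS` style used in this post. The helper name `relative_windows` is hypothetical, and using a bare `NOW` as the final bound, or passing each pair as `published_at_start` and `published_at_end`, is an assumption you should check against the API documentation.

```python
def relative_windows(total_hours, step_hours):
    """Split the last `total_hours` into consecutive (start, end) windows.

    Each bound is a relative date string such as 'NOW-6HOURS'.
    The final window ends at 'NOW'.
    """
    if total_hours <= 0 or step_hours <= 0:
        raise ValueError("total_hours and step_hours must be positive")
    windows = []
    for start in range(total_hours, 0, -step_hours):
        end = max(start - step_hours, 0)
        start_str = "NOW-{}HOURS".format(start)
        end_str = "NOW-{}HOURS".format(end) if end > 0 else "NOW"
        windows.append((start_str, end_str))
    return windows


# Three 4-hour windows covering the last 12 hours, oldest first.
for start, end in relative_windows(12, 4):
    print(start, end)
```

Shorter windows mean more requests, so the step size is a trade-off between completeness and the number of calls you make.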

    from pprint import pprint

    import aylien_news_api

    configuration = aylien_news_api.Configuration()

    # Configure API key authorization: app_id
    configuration.api_key['X-AYLIEN-NewsAPI-Application-ID'] = 'YOUR_APP_ID'

    # Configure API key authorization: app_key
    configuration.api_key['X-AYLIEN-NewsAPI-Application-Key'] = 'YOUR_API_KEY'

    # Create an instance of the API class
    api_instance = aylien_news_api.DefaultApi(aylien_news_api.ApiClient(configuration))


    def get_cluster_from_trends():
        """Returns the IDs of the clusters that match the filters below"""
        response = api_instance.list_trends(
            field='clusters',
            categories_taxonomy='iptc-subjectcode',
            categories_id=['11000000'],
            published_at_end='NOW-12HOURS',
            entities_body_links_dbpedia=['http://dbpedia.org/resource/United_States_Congress']
        )
        return [item.value for item in response.trends]


    def get_cluster_metadata(cluster_id):
        """Returns the representative story, number of stories, and time value for a given cluster"""
        response = api_instance.list_clusters(id=[cluster_id])
        clusters = response.clusters

        if clusters is None or len(clusters) == 0:
            return None

        first_cluster = clusters[0]
        return {
            "cluster": first_cluster.id,
            "representative_story": first_cluster.representative_story,
            "story_count": first_cluster.story_count,
            "time": first_cluster.time
        }


    def get_top_stories(cluster_id):
        """Returns 3 stories associated with the cluster from the highest-ranking publishers"""
        response = api_instance.list_stories(
            clusters=[cluster_id],
            sort_by="source.rankings.alexa.rank",
            per_page=3
        )
        return response.stories


    cluster_ids = get_cluster_from_trends()

    for cluster_id in cluster_ids:
        metadata = get_cluster_metadata(cluster_id)
        if metadata is not None:
            metadata["stories"] = get_top_stories(cluster_id)
            pprint(metadata)
        else:
            print("{} empty".format(cluster_id))

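Retrieving clusters using the Stories endpoint

With the Stories endpoint, you can collect a stream or list of news articles that match your query and group them by cluster using the cluster ID attached to each story. This is useful for deduplicating news articles when working with feeds or tickers, because you can "collapse" the stories in the real-time feed you are monitoring.

For example, the snippet below retrieves stories published in the last 6 hours that mention Donald Trump in the title and groups them by the cluster they were assigned to.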

    import aylien_news_api

    configuration = aylien_news_api.Configuration()

    # Configure API key authorization: app_id
    configuration.api_key['X-AYLIEN-NewsAPI-Application-ID'] = 'YOUR_APP_ID'

    # Configure API key authorization: app_key
    configuration.api_key['X-AYLIEN-NewsAPI-Application-Key'] = 'YOUR_API_KEY'

    # Create an instance of the API class
    api_instance = aylien_news_api.DefaultApi(aylien_news_api.ApiClient(configuration))


    def get_stories():
        """Returns a list of story objects"""
        response = api_instance.list_stories(
            title='Donald Trump',
            published_at_start='NOW-6HOURS',
            per_page=100
        )
        return response.stories


    stories = get_stories()
    clustered_stories = {}

    # Group story titles under the ID of the cluster each story belongs to.
    for story in stories:
        if len(story.clusters) > 0:
            cluster = story.clusters[0]
            clustered_stories.setdefault(cluster, []).append(story.title)

    # Print each cluster ID, its story count, and the first title in it.
    for cluster in clustered_stories:
        print(cluster, len(clustered_stories[cluster]), clustered_stories[cluster][0])
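The "collapse" step described above can also be sketched without the SDK, by keeping only the first story seen for each cluster ID. This is a minimal illustration that assumes each story has been reduced to a plain dict with a `title` and a `clusters` list. The function name `collapse_by_cluster` and the sample feed are hypothetical, not part of the News API.

```python
def collapse_by_cluster(stories):
    """Keep the first story seen for each cluster ID, preserving feed order.

    Each story is a dict with a 'title' and a 'clusters' list of cluster IDs.
    Stories with no cluster are kept, since there is nothing to merge them with.
    """
    seen = set()
    collapsed = []
    for story in stories:
        cluster_ids = story.get("clusters") or []
        if not cluster_ids:
            collapsed.append(story)
            continue
        cluster_id = cluster_ids[0]
        if cluster_id not in seen:
            seen.add(cluster_id)
            collapsed.append(story)
    return collapsed


# Illustrative feed only, not real API output.
feed = [
    {"title": "Story A", "clusters": [7]},
    {"title": "Story A (syndicated)", "clusters": [7]},
    {"title": "Story B", "clusters": []},
    {"title": "Story C", "clusters": [9]},
]
print([s["title"] for s in collapse_by_cluster(feed)])
```

Keeping the first story is the simplest choice. An application could instead keep the cluster's representative story if that suits the feed better.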