Automating news collection and entity extraction
Automatic download of news feeds from different sources, extraction of their entities through NLP, and visualization of the results in DOMO.
Automate the search for news across different sources and bring the results together in a single space, identifying the entities they mention for later viewing.
The challenge is to reduce the time spent searching for news across different media. By extracting entities from news content and headlines, news on topics of interest can be located automatically, without browsing the pages of each newspaper.
A further objective is to create alerts for new news items on topics of potential interest.
A search and display solution for news from different media.
Using the RSS connectors provided by DOMO, news is downloaded automatically into datasets, which are then processed to keep only the data of interest.
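To illustrate the kind of rows an RSS download produces, here is a minimal sketch that parses an RSS 2.0 document with the Python standard library. It is not the DOMO connector, which handles this step inside the platform. The sample feed, the function name `parse_feed`, and the column names (`source`, `title`, `link`, `published`, `description`) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 sample; a real feed would be fetched from a source URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Daily</title>
    <item>
      <title>Acme opens new plant in Valencia</title>
      <link>https://example.com/news/1</link>
      <pubDate>Mon, 06 Mar 2023 09:00:00 GMT</pubDate>
      <description>Acme Corp announced a new plant.</description>
    </item>
    <item>
      <title>Storms delay flights in Madrid</title>
      <link>https://example.com/news/2</link>
      <pubDate>Tue, 07 Mar 2023 18:30:00 GMT</pubDate>
      <description>Several flights were delayed.</description>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Turn an RSS 2.0 document into a list of flat rows, one per news item."""
    channel = ET.fromstring(xml_text).find("channel")
    source = channel.findtext("title", default="")
    rows = []
    for item in channel.findall("item"):
        # RSS dates use the RFC 822 format; keep only the calendar date.
        published = parsedate_to_datetime(item.findtext("pubDate")).date()
        rows.append({
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": published.isoformat(),
            "description": item.findtext("description", default=""),
        })
    return rows


rows = parse_feed(SAMPLE_FEED)
```

Each row corresponds to one news item, which is the shape a later processing step would filter down to the fields of interest.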
This Dataset then feeds an NLP algorithm imported into a Python script that runs in the Jupyter Workspace provided by DOMO, which writes the entities of each news item to a separate output Dataset.
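The shape of this step can be sketched as follows. The project does not name the NLP algorithm it imports, so the extractor here is a deliberately naive stand-in (runs of capitalized words) rather than a real named-entity model. The function names and the column names `link`, `title`, `description`, `entity` and `mentions` are assumptions for the example.

```python
import re
from collections import Counter

# Runs of capitalized words, e.g. "European Central Bank".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*")


def extract_entities(text):
    """Naive stand-in for an NER model: return runs of capitalized words."""
    return [match.group(0) for match in CAPITALIZED_RUN.finditer(text)]


def build_entity_rows(news_rows, extractor=extract_entities):
    """Produce one output row per (news item, entity) pair with a mention count."""
    output = []
    for row in news_rows:
        # Entities are taken from both the headline and the content.
        text = f"{row['title']}. {row['description']}"
        for entity, mentions in Counter(extractor(text)).items():
            output.append({"link": row["link"], "entity": entity, "mentions": mentions})
    return output


news = [{
    "link": "u1",
    "title": "storms hit Madrid",
    "description": "flights from Madrid were delayed",
}]
entity_rows = build_entity_rows(news)
```

Passing the extractor as a parameter keeps the input and output row formats independent of the model, so a real NER library could be swapped in without changing the rest of the script.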
Finally, a dashboard shows the news statistics and content, with filters such as publication date, news source, and detected entities.
DOMO also allows the creation of alerts that can notify users, for example, when news on a topic of interest is added.
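DOMO alerts are configured within the platform itself, so the following is only a plain-Python sketch of the rule such an alert expresses: notify when an entity on a watch list appears in a news item that has not triggered an alert before. The function `find_alerts` and the row fields `link` and `entity` are hypothetical and are not part of any DOMO API.

```python
def find_alerts(entity_rows, watched_topics, already_seen):
    """Return rows whose entity is on the watch list and has not alerted yet.

    `already_seen` is a set of (link, entity) keys that is updated in place,
    so repeated runs over the same data do not raise duplicate alerts.
    """
    watched = {topic.casefold() for topic in watched_topics}
    alerts = []
    for row in entity_rows:
        entity = row["entity"].casefold()
        key = (row["link"], entity)
        if entity in watched and key not in already_seen:
            already_seen.add(key)
            alerts.append(row)
    return alerts


detected = [
    {"link": "u1", "entity": "Madrid"},
    {"link": "u2", "entity": "Acme"},
]
seen = set()
first_run = find_alerts(detected, ["madrid"], seen)
second_run = find_alerts(detected, ["madrid"], seen)
```

The first run raises one alert for the Madrid item; the second run over the same rows raises none, because the item has already been recorded.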
Panel's experience in process integration and automation has made it possible to get the most out of the DOMO platform.
Dashboard creation
DOMO | Python | RSS Connectors | NLP
Panel Data Science Methodology
POINTS OF INTEREST
Easy, automated loading of feeds from the sources of interest.
Less time needed for the news extraction process, thanks to the automation capabilities DOMO provides.
The ability to connect to the source URLs, process the news through ETL and Jupyter notebooks, and visualize the results, all within the same tool.
Alerts reduce dependence on the dashboards, which need to be reviewed only when an alert of interest is generated.
Examples of KPIs used:
- Total number of news items downloaded daily.
- Number of relevant news items (according to predefined criteria) downloaded daily.
- Percentage of downloaded news items with at least one relevant entity detected.
- Average news processing time (from download to display).
- Number of alerts generated by topics of interest.
- Number of active alerts at a given time.
- Percentage of alerts that result in the display of relevant news.
- Number of entities detected per news item.
- Number of news sources connected to the system.
- Number of views of news and entity statistics in the dashboard.
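Several of these KPIs reduce to simple aggregations over the news and entity datasets. The sketch below computes four of them in plain Python. The column names (`link`, `source`, `published`) and the function `compute_kpis` are assumptions, and because the relevance criteria are not specified here, it counts any detected entity rather than only relevant ones.

```python
from collections import Counter


def compute_kpis(news_rows, entity_rows):
    """Compute a few of the listed KPIs from flat news and entity rows."""
    total = len(news_rows)
    news_per_day = Counter(row["published"] for row in news_rows)
    entities_per_item = Counter(row["link"] for row in entity_rows)
    # Counter returns 0 for links with no detected entities.
    counts = [entities_per_item[row["link"]] for row in news_rows]
    with_entity = sum(1 for count in counts if count > 0)
    return {
        "news_per_day": dict(news_per_day),
        "pct_with_entity": 100.0 * with_entity / total if total else 0.0,
        "avg_entities_per_item": sum(counts) / total if total else 0.0,
        "connected_sources": len({row["source"] for row in news_rows}),
    }


news = [
    {"link": "u1", "source": "A", "published": "2023-03-06"},
    {"link": "u2", "source": "A", "published": "2023-03-06"},
    {"link": "u3", "source": "B", "published": "2023-03-07"},
    {"link": "u4", "source": "B", "published": "2023-03-07"},
]
entities = [
    {"link": "u1", "entity": "Madrid"},
    {"link": "u1", "entity": "Acme"},
    {"link": "u2", "entity": "Valencia"},
]
kpis = compute_kpis(news, entities)
```

In this sample, two of the four items have at least one entity, so the percentage is 50 and the average is 0.75 entities per item. In the project itself these figures would be produced by the dashboard rather than by a script.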