

Scrape a Set of Articles From Different News Sources

Given all of the recent news about the American troop withdrawal from Afghanistan, we will focus on news about the United States for this project.
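As a rough sketch of this step, the snippet below builds each news source with Newspaper3k's `newspaper.build`, downloads a handful of articles from each, and keeps only the ones that mention the United States. The function names `mentions_topic` and `collect_articles`, the list of topic terms, and the per-source limit are all assumptions made for illustration, not part of the original project.

```python
def mentions_topic(text, terms=("united states", "u.s.", "american")):
    """Return True if the text mentions any of the (assumed) topic terms."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def collect_articles(source_urls, limit_per_source=10):
    """Parse up to limit_per_source articles per source and keep the on-topic ones."""
    # Third-party package, imported here so mentions_topic works without it.
    import newspaper

    kept = []
    for source_url in source_urls:
        # memoize_articles=False makes repeated runs see the same articles again.
        source = newspaper.build(source_url, memoize_articles=False)
        for article in source.articles[:limit_per_source]:
            try:
                article.download()
                article.parse()
            except newspaper.ArticleException:
                continue  # skip articles that could not be downloaded
            if mentions_topic(article.title + " " + article.text):
                kept.append(article)
    return kept
```

Calling `collect_articles` with a list of news site home pages of your choosing should return parsed `Article` objects. The simple substring match is deliberately crude, so it will also catch phrases such as "South American".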

For example, if you are able to get the text of a public figure's speeches or writings, you can easily visualize the most important topics they cover with a word cloud. To take it further, companies could combine this with sentiment analysis to find out which of their products are written about the most and how positively or negatively they are viewed.

For example, here is a word cloud from ‘Laudato Si’, a Vatican encyclical put out 6 years ago. The document is about 250 pages; however, we can very quickly get the gist of what the encyclical is about by looking at the 100 most-used words in the paper.

From the word cloud of this paper, we can guess that the encyclical concerns matters of the planet and humanity, that there is some sort of problem, and that something must be done to help the planet, maybe for the good of ‘us’ or humanity, maybe for God as well. Depending on the text we are analyzing, we may even be able to determine the basic theme or arguments of the paper just from looking at a word cloud. As we see, creating a word cloud with the help of Newspaper3k and data analysis can give us a lot of information about a text in a single picture.

Now that we see the possibilities here, let’s begin to make our own word clouds.
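To make the "100 most-used words" idea concrete, here is a minimal sketch that counts word frequencies with the standard library and passes them to the `wordcloud` package for drawing. The tiny stop-word list and the names `top_words` and `show_cloud` are assumptions for illustration. A real project would likely use a fuller stop-word list.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list, for illustration only.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "is", "that", "for", "it", "we", "be"}


def top_words(text, n=100, stopwords=STOPWORDS):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in stopwords and len(w) > 1)
    return counts.most_common(n)


def show_cloud(frequencies):
    """Draw a word cloud from (word, count) pairs."""
    # Third-party packages, imported here so top_words works without them.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(dict(frequencies))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
```

With the full text of a document in a string, `show_cloud(top_words(text))` would draw a cloud in which more frequent words appear larger.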

To carry on from our introduction to Newspaper3k, we can now take our basic knowledge and realize the possibilities of what we can do with this library. Here I’m going to demonstrate a project which takes articles from a set of different news agencies, picks out the most used words from them, and shows a word cloud of the results with the help of NLP and Matplotlib. You can check out the full code on GitHub here. In this article, we are going to scrape a series of articles from several different news sources, use Newspaper3k to extract the keywords from each one, and then create a word cloud from those keywords that displays the most important topics of the day. Word clouds may not be the most penetrating way to analyze text data, but they can be a very engaging and simple means of discovering words or common word patterns that frequently appear.
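The keyword step described above might look roughly like the following. It uses Newspaper3k's `Article` class (`download`, `parse`, `nlp`, then the `keywords` attribute) to get keywords for one article, and a `Counter` to merge the keywords of many articles into one frequency table. The helper names `article_keywords` and `count_keywords` are hypothetical, and counting each keyword once per article is a design choice made here rather than something the original project specifies.

```python
from collections import Counter


def article_keywords(url):
    """Download one article and return the keywords Newspaper3k extracts."""
    # Third-party package, imported here so count_keywords works without it.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    article.nlp()  # needs NLTK's 'punkt' tokenizer data to be installed
    return article.keywords


def count_keywords(keyword_lists):
    """Merge per-article keyword lists into one frequency table."""
    counts = Counter()
    for keywords in keyword_lists:
        # Lowercase and de-duplicate so each article counts a keyword once.
        counts.update({kw.lower() for kw in keywords})
    return counts
```

Given a list of article URLs, `count_keywords(article_keywords(u) for u in urls)` yields a table of how many articles share each keyword, which is the kind of input a word cloud needs.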
