Visualizing Textual Data in R

INTRODUCTION

Technology has allowed us to digitize text, and this has led to the development of tools to analyze and visualize the information stored in it. A natural question is how we can make use of that information.

In this document, we will study how to quickly generate a word cloud in R. A word cloud is a graphical representation in which the font size of each word corresponds to its frequency relative to the other words. The bigger the word, the higher its frequency. Color, in this document, does not have any interpretation. The text is not formatted or processed beforehand using techniques such as stemming or the removal of stop words.
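To make the link between frequency and font size concrete, here is a minimal sketch that counts word frequencies using base R only. The sample sentence and the variable names (txt, tokens, freq) are illustrative assumptions and are not part of the recipe. Counts of this kind are what determine the relative size of each word in a word cloud.

```r
# Illustrative sample text (an assumption, not taken from the recipe)
txt <- "the cat and the dog and the bird"

# Lower-case the text and split it on runs of non-letter characters
tokens <- strsplit(tolower(txt), "[^a-z]+")[[1]]
tokens <- tokens[tokens != ""]  # drop any empty strings left by the split

# Count occurrences and sort from most to least frequent
freq <- sort(table(tokens), decreasing = TRUE)
print(freq)
```

In a word cloud built from this sentence, "the" would be drawn largest, followed by "and", with the remaining words at the smallest size.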

Getting ready

To generate a simple word cloud, we first install the necessary packages with the install.packages() function and then load them with the library() function:

install.packages(c("wordcloud","RColorBrewer","tm"))

library(wordcloud)

library(RColorBrewer)

library(tm)

The RColorBrewer package provides us with a range of color palettes that can be used via the brewer.pal() function in our word cloud:

pal = brewer.pal(6,"RdGy")
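It can help to inspect a palette before using it. The following sketch, which assumes the RColorBrewer package is already installed, prints the six colors returned by brewer.pal() and looks up the "RdGy" entry in the package's brewer.pal.info table. The requireNamespace() guard is an addition for illustration so that the snippet does nothing if the package is missing.

```r
# Sketch: inspect the palette (runs only if RColorBrewer is installed)
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  pal <- RColorBrewer::brewer.pal(6, "RdGy")

  # brewer.pal() returns a character vector of hex color codes
  print(pal)
  print(length(pal))

  # brewer.pal.info lists, per palette, the maximum number of colors
  # and the palette category
  print(RColorBrewer::brewer.pal.info["RdGy", ])
}
```

To preview palettes graphically, the package also provides display.brewer.all().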

 

Getting the word cloud

Finally, we can generate a word cloud by passing the text as the first argument to the wordcloud() function:

wordcloud("I also want to thank all the members of Congress and my administration who are here today for the wonderful work that they do. I want to thank Mayor Gray and everyone here at THEARC for having me.", min.freq = 1, scale = c(2, 0.5), random.color = TRUE, colors = pal)

The first step in our recipe is to load the required packages in R. We then create a color palette using the RColorBrewer package. The brewer.pal() function takes two arguments: the first is the number of different colors we would like to use in our visualization, and the second is the name of the color palette. To view a list of all available color palettes, please refer to the RColorBrewer package manual at http://cran.r-project.org/web/packages/RColorBrewer/RColorBrewer.pdf.

We are now ready to plot a word cloud using the wordcloud() function. The first argument to wordcloud() is the text itself. In the next few recipes, we will learn how to use an actual document in place of a literal string. The second argument is min.freq, which sets the minimum frequency a word must have in order to be plotted. We have used min.freq=1 to plot all the words. The scale argument gives the range of word sizes, largest first. Setting random.color to TRUE assigns colors from the palette at random, which is why color carries no meaning here. Finally, the colors argument is used to pass the color palette generated via the brewer.pal() function.
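The effect of min.freq is easier to see with explicit counts. The sketch below passes a vector of words and a matching vector of frequencies to wordcloud() instead of raw text. The words and counts are made up for illustration, and the requireNamespace() guard and set.seed() call are additions that the recipe itself does not use.

```r
# Hypothetical word counts, made up for illustration
words <- c("thank", "want", "congress", "mayor", "work")
freq  <- c(2, 2, 1, 1, 1)

# min.freq = 2 keeps only words whose frequency is at least 2
keep <- freq >= 2
print(words[keep])

# Draw the cloud only if the required packages are installed
if (requireNamespace("wordcloud", quietly = TRUE) &&
    requireNamespace("RColorBrewer", quietly = TRUE)) {
  pal <- RColorBrewer::brewer.pal(6, "RdGy")
  set.seed(42)  # makes the random layout and colors reproducible
  wordcloud::wordcloud(words, freq, min.freq = 2, scale = c(2, 0.5),
                       random.color = TRUE, colors = pal)
}
```

With min.freq = 2 only "thank" and "want" are plotted. Lowering it to 1 would plot all five words.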

You can also produce a word cloud from a document rather than from a string typed into the call. For details, see the wordcloud package manual at https://cran.r-project.org/web/packages/wordcloud/wordcloud.pdf.
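As one possible approach (not necessarily the one described in the manual), the sketch below reads a text file with base R, counts its words, and passes the counts to wordcloud(). The temporary file and its contents are created here only so that the example is self-contained. In practice doc_path, a hypothetical name, would point to your own document.

```r
# Write a small sample document to a temporary file (illustrative only)
doc_path <- tempfile(fileext = ".txt")
writeLines(c("I want to thank Mayor Gray.",
             "I want to thank everyone here."), doc_path)

# Read the document and collapse its lines into a single string
txt <- paste(readLines(doc_path), collapse = " ")

# Tokenize on non-letter characters and count each word
tokens <- strsplit(tolower(txt), "[^a-z]+")[[1]]
tokens <- tokens[tokens != ""]
counts <- table(tokens)
print(counts)

# Draw the cloud only if the wordcloud package is installed
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(counts), as.vector(counts),
                       min.freq = 1, scale = c(2, 0.5))
}
```

Counting the words yourself keeps the preprocessing explicit, so you can see exactly which tokens reach the plot.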

Critical thinking

Interpret the word cloud.