Produces a graphical representation (a wordcloud or facet-wrapped histograms) of the level of importance of each word within a text document (across different text groups), according to a specified measure.
word_imp(textdoc, metric = "tf", words_to_filter = NULL)
Argument | Description
---|---
textdoc | An
metric | (character) The measure for determining the level of importance of each word within the text document. Options include "tf" (term frequency) and "tf-idf" (term frequency-inverse document frequency). Default: "tf".
words_to_filter | A pre-defined vector of words (terms) to filter out from the DTD prior to highlighting word importance. Default: NULL (no words are filtered).
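The words_to_filter step amounts to dropping the listed terms before any importance measure is computed. A minimal base-R sketch, assuming the text has already been tokenised into a hypothetical data frame with one row per word occurrence (columns group and word); filter_words() is an illustrative helper, not part of the package:

```r
# Hypothetical token table: one row per word occurrence, tagged with its group.
# The column names `group` and `word` are illustrative assumptions.
tokens <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  word  = c("police", "crime", "community", "policing", "crime", "trust"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop every row whose word appears in `words_to_filter`.
# With the default NULL, `%in%` returns FALSE for every row, so nothing is dropped.
filter_words <- function(tokens, words_to_filter = NULL) {
  tokens[!(tokens$word %in% words_to_filter), , drop = FALSE]
}

wf <- c("police", "policing")
filtered <- filter_words(tokens, wf)
print(filtered$word)
```

Filtering before scoring matters because removed terms then no longer contribute to group totals, so they cannot dominate the plot or inflate the denominators of the other words' scores.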
A graphical representation of word importance according to the specified metric. A wordcloud is used to represent word importance if "tf" is specified, while facet-wrapped histograms are used if "tf-idf" is specified.
A wordcloud represents each word with a size corresponding to its level of importance. In the facet-wrapped histograms, words are ranked within each group (histogram) in order of importance.
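Ranking words within each group, as the facet-wrapped histograms do, can be sketched in base R as follows. The scores data frame and the top_words_by_group() helper are hypothetical; the score column stands in for whichever metric was chosen:

```r
# Hypothetical importance scores per (group, word); illustrative values only.
scores <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  word  = c("crime", "community", "street", "trust", "crime", "patrol"),
  score = c(0.50, 0.30, 0.20, 0.45, 0.35, 0.20),
  stringsAsFactors = FALSE
)

# Hypothetical helper: within each group, sort words by score (highest first),
# attach a within-group rank, and keep the top `n` words.
top_words_by_group <- function(scores, n = 2) {
  per_group <- split(scores, scores$group)
  ranked <- lapply(per_group, function(d) {
    d <- d[order(d$score, decreasing = TRUE), , drop = FALSE]
    d$rank <- seq_len(nrow(d))
    head(d, n)
  })
  out <- do.call(rbind, ranked)
  rownames(out) <- NULL
  out
}

top <- top_words_by_group(scores, n = 2)
print(top)
```

Each group is ranked independently, so the same word (here "crime") can hold a different rank in different panels.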
The function determines the most important words across the various groupings of a text document. The measure options are "tf" (term frequency) and "tf-idf" (term frequency-inverse document frequency). The idea of tf is to rank words by their number of occurrences across the text document, whereas tf-idf finds words that are used frequently within a particular group but rarely across the other groups in the document, by decreasing the weight of words that are common to many groups.
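The contrast between the two measures can be illustrated with a small base-R sketch. It follows the definitions used by tidytext's bind_tf_idf(), treating each group as a "document": tf is a word's count divided by the total number of words in its group, and idf is log(number of groups / number of groups containing the word). The input layout and the tf_idf_by_group() helper are assumptions for illustration, not word_imp()'s internal code:

```r
# Hypothetical helper: tf, idf and tf-idf per (group, word).
# Input: one row per word occurrence, with columns `group` and `word`.
tf_idf_by_group <- function(tokens) {
  # Count occurrences of each word within each group; drop absent pairs
  counts <- as.data.frame(table(group = tokens$group, word = tokens$word),
                          responseName = "n", stringsAsFactors = FALSE)
  counts <- counts[counts$n > 0, , drop = FALSE]

  # Term frequency: share of the group's words taken up by this word
  group_totals <- tapply(counts$n, counts$group, sum)
  counts$tf <- counts$n / as.numeric(group_totals[counts$group])

  # Inverse document frequency, with groups playing the role of documents
  n_groups <- length(unique(counts$group))
  groups_with_word <- tapply(counts$group, counts$word,
                             function(g) length(unique(g)))
  counts$idf <- log(n_groups / as.numeric(groups_with_word[counts$word]))

  counts$tf_idf <- counts$tf * counts$idf
  rownames(counts) <- NULL
  counts
}

# Illustrative tokens: "crime" appears in both groups
tokens <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  word  = c("crime", "crime", "community", "crime", "trust"),
  stringsAsFactors = FALSE
)
res <- tf_idf_by_group(tokens)
print(res)
```

Here "crime" is the most frequent word overall, so it ranks highest under tf, but because it occurs in both groups its idf is log(2/2) = 0 and its tf-idf is zero; "community" and "trust", each confined to a single group, receive the positive tf-idf scores.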
Silge, J. and Robinson, D. (2016) tidytext: Text mining and analysis using tidy data principles in R. Journal of Open Source Software, 1, 37.
# words to filter out
wf <- c("police", "policing")
output <- word_imp(textdoc = policing_dtd, metric = "tf", words_to_filter = wf)