Opinion score of a digital text document (DTD)

Given a DTD, this function computes the overall opinion score based on the proportion of text records classified as expressing a positive, negative or neutral sentiment. The function first transforms the text document into a tidy-format dataframe, described as the observed sentiment document (OSD) (Adepeju and Jimoh, 2021), in which each text record is assigned a sentiment class based on the sum of the sentiment scores of the words in that record.

opi_score(textdoc, metric = 1, fun = NULL)

Arguments

textdoc	An `n` x `1` list (dataframe) of individual text records, where `n` is the total number of records.
metric	(an integer) The metric used to calculate the opinion score. Valid values are `1, 2, ..., 5`. Default: `1`. With `P`, `N` and `O` denoting the number of positive, negative and neutral text records, respectively, the metrics are: `1`: Polarity (percentage difference), `((P - N) / (P + N)) * 100` (bounds: -100%, +100%); `2`: Polarity (proportional difference), `(abs(P - N) / (P + N + O)) * 100` (bounds: 0, +100%); `3`: Positivity, `(P / (P + N + O)) * 100` (bounds: 0, +100%); `4`: Negativity, `(N / (P + N + O)) * 100` (bounds: 0, +100%) (Malshe, 2019; Lowe et al., 2011); `5`: a user-defined opinion score function (see the `fun` parameter below).
fun	A user-defined function, used only when the `metric` parameter (above) is set to `5`. For example, given an opinion score function defined as `myfun <- function(P, N, O){ ("some tasks to do"); return("a value") }`, the argument is passed as `fun = myfun`. Default: `NULL`.
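
The formulas for metrics `1` to `4` can be checked by hand. The sketch below is only an illustration: `opinion_metrics` is a hypothetical helper, not a function of the package, and it assumes that `P`, `N` and `O` are plain record counts. The counts used (40 positive, 45 negative, 1 neutral) are those shown in the Examples section.

```r
# Hypothetical helper (not part of the package): evaluates metrics 1 to 4
# from counts of positive (P), negative (N) and neutral (O) records.
opinion_metrics <- function(P, N, O) {
  total <- P + N + O
  c(
    polarity_pct  = (P - N) / (P + N) * 100,   # metric 1
    polarity_prop = abs(P - N) / total * 100,  # metric 2
    positivity    = P / total * 100,           # metric 3
    negativity    = N / total * 100            # metric 4
  )
}

# Counts taken from the sentiment table in the Examples section
res <- opinion_metrics(P = 40, N = 45, O = 1)
print(round(res, 2))
```

With these counts, metric `1` evaluates to about -5.88, which agrees with the `"-5.88%"` reported in the first example. Note that metric `1` ignores neutral records, while metrics `2` to `4` include them in the denominator.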

Value

Returns an opi_object containing details of the opinion measures from the text document.

Details

An opinion score is derived from all the sentiments (i.e. positive, negative and neutral) expressed within a text document. We deploy a lexicon-based approach (Taboada et al. 2011) using the AFINN lexicon (Nielsen, 2011).
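
The lexicon-based classification described above can be sketched as follows. This is a minimal illustration in base R, not the package's implementation: `toy_lexicon` is a made-up stand-in for AFINN (whose real entries are integer word scores), `classify_record` is a hypothetical helper, and the tokenisation is deliberately simplistic.

```r
# Toy lexicon: hypothetical word scores in the AFINN style (NOT the real AFINN list)
toy_lexicon <- c(good = 3, great = 3, helpful = 2,
                 bad = -3, terrible = -3, unfair = -2)

# Hypothetical helper: sum the scores of the words in one text record
# and map the sign of the total to a sentiment class.
classify_record <- function(text, lexicon) {
  # Lower-case, replace anything that is not a letter with a space, split on whitespace
  words <- strsplit(gsub("[^a-z]", " ", tolower(text)), "\\s+")[[1]]
  # Words absent from the lexicon yield NA and are dropped by na.rm = TRUE
  total <- sum(lexicon[words], na.rm = TRUE)
  if (total > 0) "positive" else if (total < 0) "negative" else "neutral"
}

records <- c("The police were helpful and great",
             "Terrible and unfair response",
             "Officers arrived at noon")

sentiments <- vapply(records, classify_record, character(1),
                     lexicon = toy_lexicon, USE.NAMES = FALSE)

# A small OSD-like table: one row per record with its sentiment class
osd <- data.frame(ID = seq_along(records), sentiment = sentiments)
print(osd)
```

The counts of each class in such a table are what the `P`, `N` and `O` terms of the opinion score formulas refer to.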

References

(1) Adepeju, M. and Jimoh, F. (2021). An Analytical Framework for Measuring Inequality in the Public Opinions on Policing – Assessing the impacts of COVID-19 Pandemic using Twitter Data. https://doi.org/10.31235/osf.io/c32qh
(2) Malshe, A. (2019). Data Analytics Applications. Online book available at: https://ashgreat.github.io/analyticsAppBook/index.html. Date accessed: 15th December 2020.
(3) Taboada, M. et al. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2), pp. 267-307.
(4) Lowe, W. et al. (2011). Scaling policy preferences from coded political texts. Legislative Studies Quarterly, 36(1), pp. 123-155.
(5) Razorfish (2009). Fluent: The Razorfish Social Influence Marketing Report. Accessed: 24th February 2021.
(6) Nielsen, F. A. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. Proceedings of the ESWC2011 Workshop on 'Making Sense of Microposts': Big things come in small packages, pp. 93-98.

Examples

# Use police/pandemic posts on Twitter
# Experiment with a standard metric (e.g. metric 1)
score <- opi_score(textdoc = policing_dtd, metric = 1, fun = NULL)
#print result
print(score)
#> $sentiments
#> 
#> 
#> |sentiment | No_of_text_records|
#> |:---------|------------------:|
#> |negative  |                 45|
#> |neutral   |                  1|
#> |positive  |                 40|
#> 
#> $opiscore
#> [1] "-5.88%"
#> 
#> $metric
#> [1] "Polarity (Percentage Difference)"
#> 
#> $equation
#> [1] "((#Positive - #Negative)/(#Positive + #Negative))*100%"
#> 
#> $OSD
#> # A tibble: 86 x 2
#>       ID sentiment
#>    <int> <chr>    
#>  1     1 positive 
#>  2     2 positive 
#>  3     3 negative 
#>  4     4 negative 
#>  5     5 positive 
#>  6     6 positive 
#>  7    10 positive 
#>  8    11 positive 
#>  9    12 positive 
#> 10    13 negative 
#> # ... with 76 more rows
#> 

# Example using a user-defined opinion score:
# a demonstration with a component of the SIM opinion
# score function (Razorfish, 2009). The opinion
# function can be expressed as:

myfun <- function(P, N, O){
  score <- (P + O - N)/(P + O + N)
  return(score)
}

#Run analysis
score <- opi_score(textdoc = policing_dtd, metric = 5, fun = myfun)
#print results
print(score)
#> $sentiments
#> 
#> 
#> |sentiment | No_of_text_records|
#> |:---------|------------------:|
#> |negative  |                 45|
#> |neutral   |                  1|
#> |positive  |                 40|
#> 
#> $opiscore
#> [1] -0.04651163
#> 
#> $metric
#> [1] "User-defined"
#> 
#> $equation
#> function(P, N, O){
#>   score <- (P + O - N)/(P + O + N)
#> return(score)
#> }
#> <environment: 0x00000000476af450>
#> 
#> $OSD
#> # A tibble: 86 x 2
#>       ID sentiment
#>    <int> <chr>    
#>  1     1 positive 
#>  2     2 positive 
#>  3     3 negative 
#>  4     4 negative 
#>  5     5 positive 
#>  6     6 positive 
#>  7    10 positive 
#>  8    11 positive 
#>  9    12 positive 
#> 10    13 negative 
#> # ... with 76 more rows
#>
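
A user-defined score function divides by the total record count, so it may be worth guarding against an empty input. The variant below is a hedged sketch: `safe_sim` is a hypothetical name, and whether `opi_score` can ever call `fun` with all-zero counts is not stated in this page, so the guard is a defensive assumption rather than a documented requirement.

```r
# Hypothetical variant of the user-defined function above:
# returns NA instead of NaN when there are no records at all.
safe_sim <- function(P, N, O) {
  total <- P + O + N
  if (total == 0) {
    return(NA_real_)
  }
  (P + O - N) / total
}

# Same counts as in the example output above
safe_sim(P = 40, N = 45, O = 1)
```

Under the same assumptions it would be passed in the same way as `myfun`, i.e. `fun = safe_sim` together with `metric = 5`.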