This function simulates the expectation distribution of the observed opinion score (computed using the opi_score function). The resulting tidy-format dataframe can be described as the expected sentiment document (ESD) (Adepeju and Jimoh, 2021).

opi_sim(osd_data, nsim=99, metric = 1, fun = NULL, quiet=TRUE)

Arguments

osd_data

A list (dataframe). An n x 3 OSD, where n is the number of text records that have been successfully classified as expressing a positive, negative or neutral sentiment. Column 1 of the OSD is the text record ID, column 2 holds the sentiment class (i.e. positive, negative, or neutral), and column 3 takes one of two values, present or absent, indicating whether or not a record includes any of the specified theme keywords.

nsim

(an integer) Number of replicas (ESD) to simulate. Recommended values are 99, 999, 9999, and so on. Since the run time is proportional to the number of replicas, a moderate number of simulations, such as 999, is recommended. Default: 99.

metric

(an integer) The metric used to calculate the opinion score. Default: 1. See the documentation of the opi_score function for details. The value given here must match the one passed to the opi_score function in order to compute a statistical significance value (p-value).

fun

A user-defined function, used when the metric parameter is set to 5. See the documentation of the opi_score function for details.

quiet

(TRUE or FALSE) Whether to suppress processing messages. Default: TRUE.
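As an illustration of the fun argument, a user-defined score might look like the sketch below. It assumes the function receives the counts of positive, negative and neutral records as three arguments; the names P, N and O and the function my_score are hypothetical, and the signature actually expected should be checked in the opi_score documentation.

```r
# Hypothetical user-defined opinion score: percentage of positive
# records among all classified records.
# ASSUMPTION: the arguments are the counts of positive (P),
# negative (N) and neutral (O) records; check `opi_score` for the
# signature it really expects.
my_score <- function(P, N, O) {
  total <- P + N + O
  if (total == 0) return(NA_real_)  # no classified records
  100 * P / total
}

# 30 positive records out of 100 classified records
my_score(P = 30, N = 50, O = 20)
```

Under that assumption, the function would be supplied as opi_sim(osd_data, metric = 5, fun = my_score).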

Value

Returns a list of expected opinion scores with length equal to the number of simulations (nsim) specified.
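One common use of the expected scores is to compare them with the observed opinion score. The sketch below assumes the scores can be flattened to a numeric vector and uses the standard Monte Carlo p-value, (r + 1) / (nsim + 1), where r is the number of simulated scores at least as large as the observed one. Whether the package computes its p-value in exactly this way is an assumption, and the values of exp_score and observed are made up for illustration.

```r
set.seed(1)

# Stand-in for the output of opi_sim(); illustrative values only.
exp_score <- as.list(rnorm(99, mean = 50, sd = 5))
observed  <- 62  # hypothetical observed opinion score

sims <- unlist(exp_score)  # flatten the list to a numeric vector
nsim <- length(sims)

# One-sided (upper-tail) Monte Carlo p-value
r <- sum(sims >= observed)
p_value <- (r + 1) / (nsim + 1)
p_value
```

With nsim = 99 the smallest attainable p-value under this formula is 1/100 = 0.01, which is one reason values such as 99, 999 and 9999 are convenient.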

Details

Employs a non-parametric randomization testing approach to generate the expectation distribution of the observed opinion scores (see details in Adepeju and Jimoh, 2021).
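The general idea of a randomization test can be sketched as follows: hold the sentiment classes fixed, shuffle the keyword labels across records, and recompute a score for each shuffle. This is a generic illustration, not the package's implementation; the toy data frame, its column names and the toy_score function (percentage of negative sentiment among records labelled present) are all hypothetical.

```r
set.seed(42)

# Toy OSD-like data: record ID, sentiment class, keyword indicator
toy <- data.frame(
  ID        = 1:200,
  sentiment = sample(c("positive", "negative", "neutral"), 200,
                     replace = TRUE),
  keywords  = sample(c("present", "absent"), 200, replace = TRUE,
                     prob = c(0.35, 0.65))
)

# Hypothetical score: % negative among records with keywords present
toy_score <- function(sentiment, keywords) {
  100 * mean(sentiment[keywords == "present"] == "negative")
}

nsim <- 99
expected <- replicate(nsim, {
  shuffled <- sample(toy$keywords)  # permute labels, no replacement
  toy_score(toy$sentiment, shuffled)
})

length(expected)
```

Because each shuffle only permutes the labels, the number of records marked present stays the same in every replica; only their pairing with sentiment classes changes.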

References

(1) Adepeju, M. and Jimoh, F. (2021). An Analytical Framework for Measuring Inequality in the Public Opinions on Policing – Assessing the impacts of COVID-19 Pandemic using Twitter Data. https://doi.org/10.31235/osf.io/c32qh

Examples

# Prepare OSD data from the output
# of the `opi_score` function.
score <- opi_score(textdoc = policing_dtd,
                   metric = 1, fun = NULL)

# extract OSD
OSD <- score$OSD
# note that `OSD` is shorter in length
# than `policing_dtd`, meaning that some
# text records were not classified

# Bind a fictitious indicator column
osd_data2 <- data.frame(cbind(OSD,
  keywords = sample(c("present", "absent"), nrow(OSD),
                    replace = TRUE, prob = c(0.35, 0.65))))

# generate expected distribution
exp_score <- opi_sim(osd_data2, nsim = 99, metric = 1,
                     fun = NULL, quiet = TRUE)
#> Adding missing grouping variables: `keywords`
# preview the distribution
hist(exp_score)