Abstract
Studies in socio-technical aspects of security often rely on user studies and statistical inferences on investigated relations to make their case. They thereby enable practitioners and scientists alike to judge the validity and reliability of the research undertaken.
To ascertain this capacity, we investigated the reporting fidelity of security user studies.
Based on a systematic literature review of 114 user studies in cyber security from selected venues in the 10 years 2006–2016, we evaluated the fidelity of the reporting of 1775 statistical inferences using the R package statcheck. We conducted a systematic classification of incomplete reporting, reporting inconsistencies, and decision errors, leading to a multinomial logistic regression (MLR) on the impact of publication venue/year as well as a comparison to a compatible field of psychology.
We found that half the cyber security user studies considered reported incomplete results, in stark contrast to comparable results in a field of psychology. Our MLR on analysis outcomes yielded a slight increase in the likelihood of incomplete tests over time, while SOUPS showed a few percent greater likelihood of reporting statistics correctly than other venues.
In this study, we offer the first fully quantitative analysis of the state-of-play of socio-technical studies in security. While we highlight the impact and prevalence of incomplete reporting, we also offer fine-grained diagnostics and recommendations on how to respond to the situation.
Preregistered at the Open Science Framework: osf.io/549qn/.
References
American Psychological Association (ed.): Publication Manual of the American Psychological Association, 6th revised edn. American Psychological Association (2009)
Coopamootoo, K.P.L., Groß, T.: Cyber security and privacy experiments: a design and reporting toolkit. In: Hansen, M., Kosta, E., Nai-Fovino, I., Fischer-Hübner, S. (eds.) Privacy and Identity 2017. IAICT, vol. 526, pp. 243–262. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92925-5_17
Coopamootoo, K., Groß, T.: Systematic evaluation for evidence-based methods in cyber security. Technical report TR-1528, Newcastle University (2017)
Coopamootoo, K.P.L., Groß, T.: Evidence-based methods for privacy and identity management. In: Lehmann, A., Whitehouse, D., Fischer-Hübner, S., Fritsch, L., Raab, C. (eds.) Privacy and Identity 2016. IAICT, vol. 498, pp. 105–121. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-55783-0_9
Cumming, G.: Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. Routledge, New York (2013)
Elson, M., Przybylski, A.K.: The science of technology and human behavior - standards old and new. J. Media Psychol. 29(1), 1–7 (2017). https://doi.org/10.1027/1864-1105/a000212
Epskamp, S., Nuijten, M.B.: statcheck: extract statistics from articles and recompute p values (v1.3.0), May 2018. https://CRAN.R-project.org/package=statcheck
Fox, J., Andersen, R.: Effect displays for multinomial and proportional-odds logit models. Sociol. Methodol. 36(1), 225–255 (2006)
Lakens, D.: Checking your stats, and some errors we make, October 2015. http://daniellakens.blogspot.com/2015/10/checking-your-stats-and-some-errors-we.html
LeBel, E.P., McCarthy, R.J., Earp, B.D., Elson, M., Vanpaemel, W.: A unified framework to quantify the credibility of scientific findings. Adv. Methods Pract. Psychol. Sci. 1(3), 389–402 (2018)
Maxion, R.: Making experiments dependable. In: Jones, C.B., Lloyd, J.L. (eds.) Dependable and Historic Computing. LNCS, vol. 6875, pp. 344–357. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24541-1_26
Moher, D., et al.: CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. J. Clin. Epidemiol. 63(8), e1–e37 (2010)
Nuijten, M.B., van Assen, M.A., Hartgerink, C.H., Epskamp, S., Wicherts, J.: The validity of the tool “statcheck” in discovering statistical reporting inconsistencies (2017). https://psyarxiv.com/tcxaj/
Nuijten, M.B., Hartgerink, C.H.J., van Assen, M.A.L.M., Epskamp, S., Wicherts, J.M.: The prevalence of statistical reporting errors in psychology (1985–2013). Behav. Res. Methods 48(4), 1205–1226 (2015). https://doi.org/10.3758/s13428-015-0664-2
Peisert, S., Bishop, M.: How to design computer security experiments. In: Futcher, L., Dodge, R. (eds.) WISE 2007. IAICT, vol. 237, pp. 141–148. Springer, New York (2007). https://doi.org/10.1007/978-0-387-73269-5_19
Ripley, B., Venables, W.: nnet: feed-forward neural networks and multinomial log-linear models, February 2016. https://CRAN.R-project.org/package=nnet
Schechter, S.: Common pitfalls in writing about security and privacy human subjects experiments, and how to avoid them (2013). https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/commonpitfalls.pdf
Schmidt, T.: Sources of false positives and false negatives in the STATCHECK algorithm: reply to Nuijten et al. (2016). https://arxiv.org/abs/1610.01010
Acknowledgment
We would like to thank Malte Elson for the discussions on statcheck, on the corresponding analyses in psychology, and on general research methodology. We thank the anonymous reviewers of STAST 2019 for their discussion and insightful comments, as well as the volume co-editor Theo Tryfonas for offering additional pages to include the requested changes.
This study was funded in part by the UK Research Institute in the Science of Cyber Security (RISCS) under a National Cyber Security Centre (NCSC) grant on “Pathways to Enhancing Evidence-Based Research Methods for Cyber Security” (Pathway I led by Thomas Groß). The author was funded in part by the ERC Starting Grant CASCAde (GA no. 716980).
Appendices
A Details on Qualitative Analysis
A.1 Errors Committed by statcheck
Parsing Accuracy. In all 34 error cases, statcheck parsed the PDF file correctly, and its raw test representation corresponded to the PDF. In all but two tests, statcheck recognized the test correctly. In said two cases, it mistook a non-standard-reported Shapiro-Wilk test as a \(\chi ^2\) test, creating two false positives. There was one case in which the p-value statcheck computed for an independent-samples t-test differed marginally from our own calculation, presumably because of an unreported Welch correction.
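statcheck's core consistency check recomputes the p-value from the reported test statistic and degrees of freedom, and compares it with the reported p at the reported precision. The sketch below illustrates the idea for a z statistic, for which the two-tailed p has a closed form via the complementary error function; the function name and tolerance handling are our own simplification, not statcheck's actual implementation (which covers t, F, \(\chi^2\), r, and z tests in R).

```python
from math import erfc, sqrt

def z_p_consistent(z, reported_p, digits=3):
    """Recompute the two-tailed p-value for a z statistic and check
    whether it matches the reported value at the reported precision."""
    p = erfc(abs(z) / sqrt(2))  # two-tailed p under the standard normal
    return p, round(p, digits) == round(reported_p, digits)

# A correctly reported test: z = 1.96, p = .05
p, ok = z_p_consistent(1.96, 0.05)   # ok is True
# An inconsistent report: the recomputed p is ~.0099, not .05
_, bad = z_p_consistent(2.58, 0.05)  # bad is False
```

A real checker would additionally distinguish mere inconsistencies from decision errors, i.e. cases where the recomputed p crosses the significance threshold the reported p claims.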
One-Tailed Tests. In seven cases, statcheck recognized one-tailed tests correctly. For three of those tests, the authors framed the hypotheses as one-tailed. In three other tests, the authors used one-tailed test results without declaring their use. There was one additional case in which the authors seemed to have used a one-tailed test, yet the rounding was so far off the one-tailed result that statcheck no longer accepted it as “valid if one-tailed”. There was one test marked as “one-tail” which statcheck did not recognize as one-tailed, yet that test also suffered from rounding errors.
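A reported p counts as “valid if one-tailed” when it matches half the recomputed two-tailed value. A minimal sketch of that check, using a z statistic as a simplifying assumption (the helper name is ours, not statcheck's):

```python
from math import erfc, sqrt

def valid_if_one_tailed(z, reported_p, digits=3):
    """A reported p is consistent with a one-tailed reading if it
    equals half the two-tailed p recomputed from the statistic."""
    p_two = erfc(abs(z) / sqrt(2))
    return round(p_two / 2, digits) == round(reported_p, digits)

# z = 1.645 gives a two-tailed p of ~.10, i.e. a one-tailed p of ~.05
assert valid_if_one_tailed(1.645, 0.05)
```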
Dependent-Samples Tests. There were 7 papers using dependent-samples methods (such as matched-pair tests or mixed-methods regressions). We found that statcheck treated the corresponding dependent-samples statistics correctly.
Multiple Comparison Corrections. In three cases, statcheck did not recognize p-values that were correctly Bonferroni-corrected, counting as three false positives. It remains open, however, how many papers should have employed multiple-comparison corrections but did not, an analysis statcheck does not perform.
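Such false positives could be suppressed by additionally testing the reported p against a Bonferroni-adjusted recomputation, that is, the raw p multiplied by the number of comparisons m (capped at 1). A sketch under the same simplifying z-statistic assumption; note that m would have to be inferred from the paper, which statcheck does not attempt:

```python
from math import erfc, sqrt

def bonferroni_consistent(z, m, reported_p, digits=3):
    """Check a reported p against the Bonferroni-adjusted p-value:
    the raw two-tailed p times the number of comparisons m."""
    p_raw = erfc(abs(z) / sqrt(2))
    p_adj = min(1.0, m * p_raw)
    return round(p_adj, digits) == round(reported_p, digits)

# z = 2.58 has raw p ~.0099; corrected over m = 5 comparisons, p ~.049
assert bonferroni_consistent(2.58, 5, 0.049)
```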
A.2 Errors Committed by Authors
Typos. We considered 6 errors to be typos or transcription errors (\(18\%\)). One further error seemed to be a copy-paste error (\(3\%\)).
Rounding Errors. Of all 34 reported errors, we found 8 to be rounding errors (\(24\%\)).
Miscalculations. We found 13 cases to be erroneous calculations (\(38\%\)).
A.3 Composition of Incomplete p-Values
Of 1523 incomplete cases, 134 were declared “non-significant” without giving the actual p-value (\(8.8\%\)). Further, 6 were reported only as \(p > .05\) (\(0.394\%\)).
Of the incomplete cases, 102 were reported statistically significant at a .05 significance level (\(6.7\%\)).
Of the incomplete cases, 477 were reported statistically significant at a lower significance level of .01, .001, or .0001 (\(31.3\%\)).
Of 1523 incomplete p-values, 680 gave an exact p-value (\(44.6\%\)). Of those exactly reported p-values, half (367) were claimed statistically significant at a significance level of \(\alpha = .05\) (\(54\%\)). Of those exactly reported p-values, 19 claimed an impossible p-value of \(p = 0\) (\(2.79\%\)).
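The proportions above follow directly from the reported counts; a quick arithmetic check, with the counts taken from this appendix:

```python
# Counts among the 1523 incomplete p-values, as reported above
counts_of_1523 = {"declared non-significant": 134, "p > .05": 6,
                  "significant at .05": 102, "significant below .01": 477,
                  "exact p given": 680}
# Shares of the 1523 incomplete p-values, in percent
shares = {k: round(100 * v / 1523, 1) for k, v in counts_of_1523.items()}
# Of the 680 exact p-values: 367 claimed significant, 19 reported p = 0
exact_shares = (round(100 * 367 / 680), round(100 * 19 / 680, 2))
```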
Online Supplementary Materials
We made the materials of the study (specification of the input SLR, included sample, contingency tables) publicly available at its Open Science Framework repository (see Footnote 1).
© 2021 Springer Nature Switzerland AG
Groß, T. (2021). Fidelity of Statistical Reporting in 10 Years of Cyber Security User Studies. In: Groß, T., Tryfonas, T. (eds.) Socio-Technical Aspects in Security and Trust. STAST 2019. LNCS, vol. 11739. Springer, Cham. https://doi.org/10.1007/978-3-030-55958-8_1
Print ISBN: 978-3-030-55957-1
Online ISBN: 978-3-030-55958-8