Fidelity of Statistical Reporting in 10 Years of Cyber Security User Studies

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11739)

Abstract

Studies in socio-technical aspects of security often rely on user studies and statistical inferences on investigated relations to make their case. They thereby enable practitioners and scientists alike to judge the validity and reliability of the research undertaken.

To ascertain this capacity, we investigated the reporting fidelity of security user studies.

Based on a systematic literature review of 114 user studies in cyber security from selected venues in the 10 years 2006–2016, we evaluated the fidelity of the reporting of 1775 statistical inferences using the R package statcheck. We conducted a systematic classification of incomplete reporting, reporting inconsistencies, and decision errors, leading to a multinomial logistic regression (MLR) on the impact of publication venue/year as well as a comparison to a compatible field of psychology.
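
For illustration, a minimal sketch of such a checking pipeline in R, assuming statcheck v1.3.0's interface and a hypothetical local directory holding the sampled papers as PDFs:

```r
# statcheck extracts APA-style NHST results from text and recomputes
# each p-value from the reported test statistic and degrees of freedom.
library(statcheck)

# "papers/" is a hypothetical directory containing the SLR sample.
results <- checkPDFdir("papers/")

# One row per extracted inference; statcheck flags reporting
# inconsistencies (Error) and gross decision errors (DecisionError).
table(results$Error, results$DecisionError)
```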

We found that half of the cyber security user studies considered reported incomplete results, in stark contrast to comparable results in a field of psychology. Our MLR on analysis outcomes yielded a slightly increasing likelihood of incomplete tests over time, while SOUPS papers were a few percent more likely to report statistics correctly than papers at other venues.
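
The venue/year regression could look as follows, a sketch assuming a hypothetical per-test data frame with an outcome classification and paper metadata:

```r
library(nnet)  # provides multinom() for multinomial log-linear models

# tests is a hypothetical data frame: one row per reported inference,
# with outcome in {complete, incomplete, inconsistency, decision_error}
# plus the venue and publication year of the containing paper.
fit <- multinom(outcome ~ venue + year, data = tests)
summary(fit)

exp(coef(fit))  # relative-risk ratios for interpretation
```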

In this study, we offer the first fully quantitative analysis of the state of play of socio-technical studies in security. While we highlight the impact and prevalence of incomplete reporting, we also offer fine-grained diagnostics and recommendations on how to respond to the situation.

Preregistered at the Open Science Framework: osf.io/549qn/.


Notes

  1. osf.io/549qn/.

References

  1. American Psychological Association (ed.): Publication Manual of the American Psychological Association, 6th revised edn. American Psychological Association (2009)

  2. Coopamootoo, K.P.L., Groß, T.: Cyber security and privacy experiments: a design and reporting toolkit. In: Hansen, M., Kosta, E., Nai-Fovino, I., Fischer-Hübner, S. (eds.) Privacy and Identity 2017. IAICT, vol. 526, pp. 243–262. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92925-5_17

  3. Coopamootoo, K.P.L., Groß, T.: Systematic evaluation for evidence-based methods in cyber security. Technical report TR-1528, Newcastle University (2017)

  4. Coopamootoo, K.P.L., Groß, T.: Evidence-based methods for privacy and identity management. In: Lehmann, A., Whitehouse, D., Fischer-Hübner, S., Fritsch, L., Raab, C. (eds.) Privacy and Identity 2016. IAICT, vol. 498, pp. 105–121. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-55783-0_9

  5. Cumming, G.: Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. Routledge, New York (2013)

  6. Elson, M., Przybylski, A.K.: The science of technology and human behavior – standards old and new. J. Media Psychol. 29(1), 1–7 (2017). https://doi.org/10.1027/1864-1105/a000212

  7. Epskamp, S., Nuijten, M.B.: statcheck: extract statistics from articles and recompute p values (v1.3.0), May 2018. https://CRAN.R-project.org/package=statcheck

  8. Fox, J., Andersen, R.: Effect displays for multinomial and proportional-odds logit models. Sociol. Methodol. 36(1), 225–255 (2006)

  9. Lakens, D.: Checking your stats, and some errors we make, October 2015. http://daniellakens.blogspot.com/2015/10/checking-your-stats-and-some-errors-we.html

  10. LeBel, E.P., McCarthy, R.J., Earp, B.D., Elson, M., Vanpaemel, W.: A unified framework to quantify the credibility of scientific findings. Adv. Methods Pract. Psychol. Sci. 1(3), 389–402 (2018)

  11. Maxion, R.: Making experiments dependable. In: Jones, C.B., Lloyd, J.L. (eds.) Dependable and Historic Computing. LNCS, vol. 6875, pp. 344–357. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24541-1_26

  12. Moher, D., et al.: CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. J. Clin. Epidemiol. 63(8), e1–e37 (2010)

  13. Nuijten, M.B., van Assen, M.A., Hartgerink, C.H., Epskamp, S., Wicherts, J.: The validity of the tool “statcheck” in discovering statistical reporting inconsistencies (2017). https://psyarxiv.com/tcxaj/

  14. Nuijten, M.B., Hartgerink, C.H.J., van Assen, M.A.L.M., Epskamp, S., Wicherts, J.M.: The prevalence of statistical reporting errors in psychology (1985–2013). Behav. Res. Methods 48(4), 1205–1226 (2015). https://doi.org/10.3758/s13428-015-0664-2

  15. Peisert, S., Bishop, M.: How to design computer security experiments. In: Futcher, L., Dodge, R. (eds.) WISE 2007. IAICT, vol. 237, pp. 141–148. Springer, New York (2007). https://doi.org/10.1007/978-0-387-73269-5_19

  16. Ripley, B., Venables, W.: nnet: feed-forward neural networks and multinomial log-linear models, February 2016. https://CRAN.R-project.org/package=nnet

  17. Schechter, S.: Common pitfalls in writing about security and privacy human subjects experiments, and how to avoid them (2013). https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/commonpitfalls.pdf

  18. Schmidt, T.: Sources of false positives and false negatives in the STATCHECK algorithm: reply to Nuijten et al. (2016). https://arxiv.org/abs/1610.01010

Acknowledgment

We would like to thank Malte Elson for the discussions on statcheck, on the corresponding analyses in psychology, and on general research methodology. We thank the anonymous reviewers of STAST 2019 for their discussion and insightful comments, as well as the volume co-editor Theo Tryfonas for offering additional pages to include the requested changes.

This study was in part funded by the UK Research Institute in the Science of Cyber Security (RISCS) under a National Cyber Security Centre (NCSC) grant on “Pathways to Enhancing Evidence-Based Research Methods for Cyber Security” (Pathway I led by Thomas Groß). The author was in part funded by the ERC Starting Grant CASCAde (GA no. 716980).

Author information


Correspondence to Thomas Groß.

Appendices

A Details on Qualitative Analysis

A.1 Errors Committed by statcheck

Parsing Accuracy. In all 34 error cases, statcheck parsed the PDF file correctly, and its raw test representation corresponded to the PDF. In all but two tests, statcheck recognized the test correctly. In those two cases, it mistook a non-standard-reported Shapiro-Wilk test as a \(\chi ^2\) test, creating two false positives. There was one case in which the p-value computed by statcheck for an independent-samples t-test differed slightly from our own calculation, yet only marginally so, presumably because of an unreported Welch correction.
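
A minimal sketch of the recomputation at issue, with hypothetical reported values; the point is that the recomputed p-value depends only on the statistic and the degrees of freedom:

```r
# Recompute a two-tailed p from a reported independent-samples t-test,
# e.g. t(58) = 2.05, as statcheck does.
t_stat <- 2.05; df <- 58
p_pooled <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# An unreported Welch correction yields a (typically non-integer) df
# below n1 + n2 - 2, so the authors' own p can differ marginally.
p_welch <- 2 * pt(abs(t_stat), df = 51.3, lower.tail = FALSE)
```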

One-Tailed Tests. In seven cases, statcheck recognized one-tailed tests correctly. For three of those tests, the authors had framed their hypotheses as one-tailed. In three other tests, the authors used one-tailed test results without declaring their use. In one additional case, the authors seemed to have used a one-tailed test, yet the rounding was so far off the one-tailed result that statcheck no longer accepted it as “valid if one-tailed.” One further test was marked as “one-tail” but not recognized by statcheck as one-tailed; that test also suffered from rounding errors.
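
In essence, the “valid if one-tailed” check accepts a reported p that matches half the recomputed two-tailed p; a sketch with hypothetical values:

```r
t_stat <- 1.70; df <- 40; p_reported <- 0.048  # hypothetical report
p_two <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
p_one <- p_two / 2

# Accept if the reported value matches the halved p within rounding;
# heavy rounding breaks this match and the test gets flagged anyway.
round(p_one, 3) == p_reported
```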

Dependent-Samples Tests. There were seven papers using dependent-samples methods (such as matched-pairs tests or mixed-model regressions). We found that statcheck treated the corresponding dependent-samples statistics correctly.
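
This is expected from the recomputation logic: for a dependent-samples t-test, the p-value again follows from just the statistic and its degrees of freedom, as in this sketch with hypothetical values:

```r
# Paired t-test on n pairs: df = n - 1, same recomputation as above.
t_stat <- 2.31; n_pairs <- 25
p <- 2 * pt(abs(t_stat), df = n_pairs - 1, lower.tail = FALSE)
```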

Multiple Comparison Corrections. In three cases, statcheck did not recognize p-values that were correctly Bonferroni-corrected, counting as three false positives. It remains an open question, however, how many papers should have employed multiple-comparison corrections but did not, an analysis statcheck does not perform.
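
The mismatch arises because a Bonferroni-corrected report no longer equals the p-value recomputed naively from the test statistic; a sketch using base R's p.adjust (values hypothetical):

```r
p_raw <- c(0.012, 0.030, 0.004)   # uncorrected p-values, m = 3 tests
p_corrected <- p.adjust(p_raw, method = "bonferroni")
# Equivalent to pmin(1, length(p_raw) * p_raw); reporting p_corrected
# next to the raw test statistic looks inconsistent to statcheck.
```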

A.2 Errors Committed by Authors

Typos. Of the 34 reported errors, we considered 6 to be typos or transcription errors (\(18\%\)). One further error seemed to be a copy-paste error (\(3\%\)).

Rounding Errors. Of all 34 reported errors, we found 8 to be rounding errors (\(24\%\)).

Miscalculations. We found 13 cases to be erroneous calculations (\(38\%\)).

A.3 Composition of Incomplete p-Values

Of 1523 incomplete cases, 134 were declared “non-significant” without giving the actual p-value (\(8.8\%\)). A further 6 were reported as \(p > .05\) (\(0.4\%\)).

Of the incomplete cases, 102 were reported statistically significant at a .05 significance level (\(6.7\%\)).

Of the incomplete cases, 477 were reported statistically significant at a lower significance level of .01, .001, or .0001 (\(31.3\%\)).

Of 1523 incomplete p-values, 680 gave an exact p-value (\(44.6\%\)). Of those exactly reported p-values, half (367) were claimed statistically significant at a significance level of \(\alpha = .05\) (\(54\%\)). Of those exactly reported p-values, 19 claimed an impossible p-value of \(p = 0\) (\(2.8\%\)).
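
As a check on the arithmetic, a short R snippet reproducing these percentages from the reported counts:

```r
n <- 1523  # incomplete p-values
round(100 * c(ns_only = 134, gt_05 = 6, sig_05 = 102,
              sig_lower = 477, exact = 680) / n, 1)
# Of the 680 exact p-values: 367 claimed significant, 19 reported p = 0.
round(100 * c(sig = 367, p_zero = 19) / 680, 1)
```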

Online Supplementary Materials

We made the materials of the study (the specification of the input SLR, the included sample, and the contingency tables) publicly available at its Open Science Framework repository (see Footnote 1).

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Groß, T. (2021). Fidelity of Statistical Reporting in 10 Years of Cyber Security User Studies. In: Groß, T., Tryfonas, T. (eds) Socio-Technical Aspects in Security and Trust. STAST 2019. Lecture Notes in Computer Science, vol. 11739. Springer, Cham. https://doi.org/10.1007/978-3-030-55958-8_1

  • DOI: https://doi.org/10.1007/978-3-030-55958-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-55957-1

  • Online ISBN: 978-3-030-55958-8
