Introduction

Retraction formally removes a paper from the scientific literature. But retraction does not end the diffusion of a paper or its findings. The continued diffusion of retracted papers has been documented since the 1990s (e.g., Pfeifer and Snodgrass 1990).

Previous research has documented the extent to which the current information environment contributes to continued diffusion of retracted research (Davis 2012). For example, readers may not know of the retraction because they saved a local copy of the paper before it was retracted (Davis 2012). Further, journal publishers often fail to alert readers that an article was retracted. Many databases and search engines, such as Web of Science, Google Scholar, and Scopus, provide almost no warning of the retraction in their search results (Teixeira da Silva and Bornemann-Cimenti 2017; van der Vet and Nijveen 2016). In addition, although papers are retracted for different reasons, including self-retraction by the authors and various types of scientific misconduct such as data fabrication, data falsification, unethical author conduct, and plagiarism (Samp et al. 2012), about 10% of retraction notices do not indicate the reason for the retraction (Bozzo et al. 2017; Moylan and Kowalczuk 2016; Vuong 2020), making it difficult for a reader to determine the scope or severity of the problem (Guengerich 2015; Marcus and Oransky 2011).

The rate of retraction rose dramatically from 1997 to 2012, and since 2012, the number of journal article retractions appears to be roughly level at about 4 retractions per 10,000 publications (Brainard 2018). The highest percentages of retractions are found in medicine, life sciences, and chemistry journals (Grieneisen and Zhang 2012). Authors with multiple retractions (Grieneisen and Zhang 2012) and the rising total number of publications (Brainard 2018; Grieneisen and Zhang 2012) impact this rate. Scrutiny is also a factor: Fanelli (2013, p. 6) argues that “journal editors are getting better at identifying and removing papers that are either fraudulent or plainly wrong”.

Such scrutiny, however, does not appear to fall on the bibliographies of papers that cite retracted works. Large numbers of citations may accumulate even after a paper has been retracted. In one analysis, over 30% of the 74,000 citations to 3000 retracted articles accumulated a year or more after retraction (Cor and Sood 2018). Worryingly, scientific problems with a paper do not reduce its post-retraction diffusion; Bar-Ilan and Halevi (2018) found that papers with scientific distortion (errors and misconduct that impact the results of a paper) continued to accrue citations and readers at a faster rate than papers retracted for other reasons (including ethical misconduct such as plagiarism; and administrative error).

Our understanding of post-retraction citation is limited because older studies and longitudinal studies are affected by the rapid changes in the information environment and scholarly practices that have occurred in the past few decades. It is known that positive post-retraction citations can persist over decades in individual cases: for instance, 24 years for “an article retracted in 1982, but cited in 2006” in the field of psychiatry (Korpela 2010). However, articles in 1982 were published in print, and the period up to the early 2000s was an era of particularly rapid change in scholarly publishing, including the adoption of the Internet for Web-based digital journal publishing: “In 1993, very few scientific, technical, and medical (STM) journals had an electronic version, and yet by 2003, virtually all of them did.” (Renear and Palmer 2009). Studies of scholarly article reading behavior show significant changes from 1977 through 2005 (Tenopir et al. 2009), such as the increased likelihood for scholars to browse across disciplinary lines, and the reduced amount of reading time per paper. This rapid change in paper access and in reading behavior (and likely in citation behavior) means that we still need longitudinal data about positive post-retraction citation in the digital publishing era. Within the digital publishing era, the longest documented instance of positive post-retraction citation we are aware of is 9 years: Bar-Ilan and Halevi (2017) documented an article, both published and retracted in 2007, that received only positive post-retraction citations in 2015 and 2016, even though it was clearly marked as retracted; they found it by searching ScienceDirect for articles with retraction notices and the word “Retracted” in the title. In general, it is not known to what extent the visibility of retraction relates to or correlates with post-retraction citation (Balhara and Mishra 2014; Kim et al. 2019).

To fill this gap, the present paper uses a case study of long-term post-retraction citation to demonstrate that the current digital publishing and digital library environment communicates retraction status poorly and unevenly. Our case study focuses on a 2005 publication in the field of respiratory medicine (Matsuyama et al. 2005) which was retracted in 2008 (CHEST 2008) for reporting on falsified clinical trial data. We build on, and incorporate data from a previous study (Fulton et al. 2015a), which analyzed persistent post-retraction citation over 6 years, as of 2014. Compared to that prior work, our paper illustrates the potential for extended propagation of misinformation over a citation network; to do this we adopt network analysis methods not used in the previous case study. We extend the prior work with citation context analysis data for 66 additional publications (144 total) from 2006 through 2019 and an updated assessment of the problems with the display of retraction information on 12 digital platforms. We also add a new analysis of the second-generation citations that cite the retracted paper’s direct citations from both a network and citation context perspective, including a second-generation citation context analysis of 152 recent publications likely to spread misinformation from the retracted paper.

Our case study on a paper retracted for presenting fake data from a human clinical trial demonstrates how retracted research can continue to spread and how the current information environment contributes to this problem. This work contributes the longest longitudinal study of continued citation of a retracted paper within the era of digital scientific publication: nearly exclusive (96%) positive post-retraction citation over the 11 years from the 2008 retraction through 2019. We also demonstrate the possibility for misinformation to spread to a second generation of publications that do not directly cite the retracted paper itself. The remainder of our paper is organized as follows. First, we discuss related work. Then, we describe our case study and our methods for analyzing the diffusion. We then present our results, discuss them, and finally conclude the paper.

Related work

Citation of retracted papers

Retraction is intended to remove seriously flawed papers from the scientific literature, and papers are not meant to be cited after retraction (Wager et al. 2009, 2019). While intentional post-retraction citation does happen (see e.g., the example on page 13 of Grieneisen and Zhang 2012), citation of research discredited due to misconduct is typically thought to be accidental or careless. Yet continued citation of papers retracted due to misconduct has been repeatedly observed since the 1990s (Bornemann-Cimenti et al. 2016; Fulton et al. 2015a; Garfield and Welljams-Dorof 1990; Kochan and Budd 1992; Korpela 2010; Neale et al. 2007, 2010; Whitely et al. 1994).

A particular challenge in synthesizing research results on post-retraction citation is that different sources of retracted papers and of citations with substantive differences in coverage (Chen et al. 2013) have been used. Post-retraction citation was first studied on the biomedical literature (Garfield and Welljams-Dorof 1990; Pfeifer and Snodgrass 1990; Wright 1991), and in the subsequent decades most citation analyses sourced retracted papers primarily from MEDLINE/PubMed and citations from Web of Science (e.g., Budd et al. 1998; Madlock-Brown and Eichmann 2015; Redman et al. 2008). The most comprehensive identification of retracted papers is probably from Grieneisen and Zhang (2012), which sought to assemble retracted papers from 42 sources, and the Retraction Watch database (Retraction Watch n.d.), which became publicly searchable in 2018. For retrieving citation data, Web of Science has been the most frequent source of citations (e.g., Budd et al. 1998; Garfield and Welljams-Dorof 1990; Madlock-Brown and Eichmann 2015; Pfeifer and Snodgrass 1990; Redman et al. 2008; Wright 1991), with some recent studies using data from Scopus (e.g., Bar-Ilan and Halevi 2018; Budd et al. 2016; Gray et al. 2018; Rubbo et al. 2018; van der Vet and Nijveen 2016), Google Scholar (Jan et al. 2018), or multiple of these sources (e.g., Avenell et al. 2019; Fulton et al. 2015a; Hamilton 2019).

Limited research has been conducted on post-retraction citation beyond biomedicine. Bar-Ilan and Halevi sourced retractions from ScienceDirect (2017, 2018), and in Wray and Andersen’s study of retracted articles published in the journal Science almost all received post-retraction citations (2018). Retracted engineering articles indexed in Web of Science received fewer citations than a set of control articles published in the same journal volume and number (Rubbo et al. 2018). Some arts and humanities articles from Retraction Watch and Scopus have been shown to receive ongoing post-retraction citations as well (Halevi 2019).

The impact of biomedical retractions on clinical research and clinical care has also been analyzed in citation research. In medical retractions, Steen pointed out the particular risks that citation of papers in clinical medicine has for patients and potential study participants (Steen 2011b). Hundreds of secondary research studies with human subjects cited his sample of retracted clinical papers, and 36% of these were post-retraction citations (Steen 2011b). Citation of 12 clinical trials retracted for misconduct has distorted the evidence base in bone health research: conclusions will change for at least 8 of 23 reviews (35%) on nutritional supplements for preventing hip fractures, falls, osteoporosis, and the utility of vitamin D for reducing severity of Parkinson’s disease (Avenell et al. 2019). Conclusions are also affected in five reviews and guidelines from organizations including the American Heart Association, the American College of Physicians, and the U.S. Agency for Healthcare Research and Quality (Avenell et al. 2019). Yet 85% of the post-retraction citations to these same 12 retracted clinical trials “expressed no concern” (Avenell et al. 2019, p. 2). Also potentially propagating clinical error, at least 5 systematic reviews in nursing included and synthesized articles that had already been retracted (Gray et al. 2018). Such citations do not meet consensus medical journal guidelines stating that retracted papers are not to be cited as science (International Committee of Medical Journal Editors 2019).

On average, retracted papers receive fewer post-retraction citations than unretracted papers. Pfeifer and Snodgrass estimated that post-retraction citations were reduced 35%, based on comparing citation profiles of 82 retracted biomedical articles in Index Medicus published from 1973 to 1987 to a cumulative citation curve of all works published in the same years and same journals using ISI SCISEARCH and SCI Journal Citation Reports (Pfeifer and Snodgrass 1990). Furman et al. estimated that retracted papers received 65% fewer citations than the two adjacent papers in the same journal, published immediately before and immediately after the retracted paper, using 677 retracted biomedical articles from 1972 to 2006 from PubMed that could be matched to Web of Science (Furman et al. 2012). Mott et al. found that citations to randomized controlled trials (RCTs) were reduced by 46% 1 year after retraction compared to matched control RCTs published at similar times in the same journal, by using an interrupted time-series analysis of the monthly citations for 218 RCTs with Web of Science citations (Mott et al. 2019). Retraction is also thought to depress research in closely related fields (Azoulay et al. 2015).

Author self-citation of retracted papers

Self-citation has sometimes been associated with post-retraction citation. In an analysis of 740 retracted articles from MEDLINE with citations from Web of Science, Madlock-Brown and Eichmann found that 5% of post-retraction citations (to 135 or 18% of the articles) were self-citations, and only 10% of self-citations also cited the retraction notice (Madlock-Brown and Eichmann 2015). Rubbo et al. analyzed 238 retracted articles in engineering from Web of Science, and compared them to a control group of 236 articles from the same journal volume and number. The retracted articles had more citations on average and more self-citations, and only 14 (5%) of the retraction notices had been cited. Self-citations made up 481/1291 (37%) of the pre-retraction citations and 74/1057 (7%) of the post-retraction citations (Rubbo et al. 2018). Self-citation was more prevalent before retraction (81 or 34%) than post-retraction self-citation (22 or 9%) (Rubbo et al. 2018).

Accessibility and visibility of retracted papers and their retraction notices

In 1991, Wright wrote, “A retraction can…be published or identified in many sources, and still be ignored in all. Librarians, information scientists, and journal editors have found no infallible means of communicating retractions of publication to the users of the literature.” (Wright 1991). Solutions have changed in the transition from printed journals to electronic access: in the early 1990s, libraries’ physical holdings were the focus of retraction alerting (Pfeifer and Snodgrass 1990) while as late as 2000, rubber stamping retractions in print volumes was compelling. Davis advocated a three-pronged approach for reducing citation of retracted papers in 2012: Databases and search engines should alert readers before they read, bibliographic management software should alert authors before writing, and journals should check bibliographies for retracted articles before publishing (Davis 2012).

Davis’ suggestions have still not been completely implemented, 8 years later. The simplest tool for readers, Crossref’s CrossMark, has limited coverage because as of 2020 publishers must pay to participate; CrossMark provides a “Check for updates” button to find retraction status and other corrections directly from PDF or HTML articles (Bornemann-Cimenti et al. 2016; Kim et al. 2019). CrossMark is also only as good as the publisher data, which seems to be lacking. For instance, 3 years after retraction, 22% of MEDLINE-indexed articles retracted in 2008 were not annotated as retracted on publisher websites and PDFs (Decullier et al. 2013). Missing watermarks (Elia et al. 2014) and database errors in Web of Science and PubMed (Schmidt 2018) also limit alerting of readers.

Reference management software packages do not systematically collect or display retraction status, though new tools inside reference management software are emerging, including ReTracker (Cheng et al. 2019) and, more recently, Zotero’s retraction notification based on the Retraction Watch Database (Stillman 2019). As of 2020, not all citation styles indicate how to format a citation to a retracted paper, and manual correction is needed in order to add “[retracted in (citation)]” as required by the American Medical Association Manual of Style when using common reference management software such as EndNote, RefWorks, and Zotero (Suelzer et al. 2019).

Journals do not systematically check bibliographies for retracted articles. When journals do prevent citation of retracted articles it seems noteworthy (Brand et al. 2017). Medical journal guidelines state that “Authors are responsible for checking that none of the references cite retracted articles except in the context of referring to the retraction” (International Committee of Medical Journal Editors 2019) and refer authors to PubMed, which does, however, have some known errors in its retraction indexing (Schmidt 2018).

Correlation between visibility of retraction status and post-retraction citation

An emerging area of research investigates the correlation between how visible the retraction status is, in databases or on retracted articles themselves, and how often articles receive post-retraction citations. As of 2014, 25% of the retraction notices on articles on mental disorders were not available in Web of Science, and the articles without a freely accessible retraction notice were more likely to gain post-retraction citations (Balhara and Mishra 2014). By contrast, indicating retraction in HTML and PDF versions did not significantly impact the number of post-retraction citations published at least 1 year after the retraction notice for 114 retracted articles from KoreaMed, as cited in Scopus and Korea Medical Citation Index (Kim et al. 2019). The reason for the discordance between these studies is not immediately obvious. One limitation is that neither study read the text of post-retraction citations to determine whether they were positive. It is particularly important to determine whether positive post-retraction citation and visibility of retraction status are correlated.

Diffusion and persistence of error

Citation analysis is often seen as a means of measuring knowledge flows. Citation contexts, the texts surrounding citations, have been analyzed for a variety of purposes; perhaps the best known applications in the scientometrics community have been to study citation motivation (Small 1982) or a work’s impact and reception (Bornmann et al. 2020). Distortions in information propagation have been studied using similar methods. For example, a study of the “accuracy of quotations” (including paraphrased information) assessed whether the original work was reflected accurately when cited (De Lacey et al. 1985), and subsequent work has suggested that readers may copy reference strings uncritically without reading them (e.g., Stang et al. 2018). Movement of knowledge between languages, including translation error, has been implicated in some cases of misinformation (Wetterer 2006). Persistence in belief has also been studied using citation contexts: a study by Tatsioni et al. (2007) analyzed the persistence of belief in evidence from lower evidence types such as observational studies when these conflict with evidence from randomized controlled trials (RCTs). This relies on the medical hierarchy of evidence (Grimes and Schulz 2002), in which valid RCTs are prioritized above observational studies. We consider the underlying method, regardless of the motivation or the particular attributes of the text being analyzed, as citation context analysis.

Network science has also been used to study distortions in citation. Greenberg coined the term ‘citation bias’ to refer to selective citation of the literature, and demonstrated how unsubstantiated claims about Alzheimer’s disease were bolstered in papers and in funded NIH grants by citation bias (Greenberg 2009). In such cases, deceptive citation practices may suggest a consensus that does not yet exist. Consensus formation itself has been studied by sociologists of science by examining the extent of segmentation and modularization of the literature, where extreme fragmentation demonstrates “epistemic rivalries” (Shwed and Bearman 2010). Chen and Song (2017) suggest that uncertainty and contradiction need further attention within the study of research fronts and science of science, and propose visualization as a helpful approach.

Combining citation context analysis and network analysis

While some studies primarily analyze citation contexts (e.g., Suelzer et al. 2019) or citation statistics (e.g., Peterson 2013), we are aware of three studies that have drawn on synergies between citation context analysis and network visualization perspectives (Chen et al. 2013; Dinh et al. 2019; van der Vet and Nijveen 2016). The most similar to the present work (van der Vet and Nijveen 2016) presents a case study of citation networks and citation contexts for a single retracted paper that was published in Nature in December 2012 and retracted in February 2014. Van der Vet and Nijveen extracted the closure of the citation network from Scopus twice: shortly after the article was retracted (March 2014, 37 direct citations, 187 total nodes) and again 1 year later (March 2015, 57 direct citations, 1626 total nodes) (van der Vet and Nijveen 2016). Direct citation contexts are categorized by the article sections in which they appear (i.e., Introduction, Methods and Materials, Results, Discussion) and citing articles are classified as review or original contribution; they note that only 2 citations indicated controversy or cited the retraction notice (van der Vet and Nijveen 2016). They also read 10 articles from the indirect citation network, identified using a keyword search for the topic of the retracted paper to look for indirect citations that might spread results from the retracted paper (van der Vet and Nijveen 2016). A particular limitation of van der Vet and Nijveen (2016) is its short timeframe: the 13-month period used is within the washout period for some studies of post-retraction citation. Our study is larger than that of van der Vet and Nijveen (2016) in both timeframe (11 years after the retraction notice in our case compared to 13 months in theirs) and network size (2542 in our second-generation network compared to 1626 in their citation closure network).

By comparison, the present work focuses on the continued direct citation of our case study paper for more than a decade following its 2008 retraction. We analyze both the direct citations and the depth-two citation network, as of 2019, 11 years after retraction. We also systematically assess how visible the retraction is in 12 common digital platforms and document access errors.

Case study

Our case study centers around a single paper (Matsuyama et al. 2005) retracted (CHEST 2008) for presenting fake data from a human clinical trial. The paper was published in December 2005 and retracted in October 2008 because “the university that employs the authors determined that one of the authors, Wataru Matsuyama (now deceased), falsified data” (CHEST 2008). From here on, we will refer to the paper, “Effects of Omega-3 Polyunsaturated Fatty Acids on Inflammatory Markers in COPD” (Matsuyama et al. 2005), as the Matsuyama paper. It is one of 17 papers by Wataru Matsuyama retracted from 2007 to 2010 (Grieneisen and Zhang 2012; Steen 2011a); through mid-2018, he was considered one of the most prolifically retracted authors (Grieneisen and Zhang 2012; The Retraction Watch Leaderboard [Internet Archive, 2018-05-30] 2018).

The Matsuyama paper studied a nutritional intervention for a chronic disease: it purported to show that a diet rich in omega-3 polyunsaturated fatty acids reduced inflammation and improved exercise capacity in persons with chronic obstructive pulmonary disease (COPD). Omega-3 polyunsaturated fatty acids are of particular interest as an adjunct to conventional pharmacotherapies in people with inflammatory diseases such as hypertriglyceridemia and rheumatoid arthritis, due to the cost-effectiveness and favorable safety profile (Calder and Zurier 2001; Chang et al. 2018; Samuel et al. 2011). The design used in the Matsuyama study is the strongest established method for showing treatment efficacy: a randomized controlled trial (RCT). A correctly conducted RCT would better inform clinical practice, because at the time of publication there were only four observational studies investigating the relationship between omega-3 polyunsaturated fatty acids and inflammation in people with COPD (Fulton et al. 2015b), despite promising results in other inflammatory conditions (Calder 2006).

The Matsuyama RCT should have been the seminal RCT on which future clinical research would build. Although the specific effects of retracting seminal works due to falsified data are unclear, such a retraction is likely to delay the field of research, with researcher focus shifting to other areas. Citations to the Matsuyama paper are particularly problematic. No RCT addressing the same specific aims as the Matsuyama study has been published to date. The evidence base has grown slowly over time and there are now four published RCTs in the general area of omega-3 polyunsaturated fatty acids and COPD; two specifically investigated oral nutritional supplements that contained omega-3 polyunsaturated fatty acids with other supplements such as vitamin D in cachectic COPD patients (Calder et al. 2018; van de Bool et al. 2017), the third focused on COPD exacerbations (Ogasawara et al. 2018) and the fourth was a feasibility study (Fulton et al. 2017). It should be noted that the largest RCT (>25,000 participants) investigating both vitamin D and fish oil alone and in combination on respiratory symptoms and acute exacerbations in COPD is expected to be completed in November 2020 (Gold et al. 2016; NCT01728571: LungVITamin D and OmegA-3 Trial (LungVITAL), n.d.). The Matsuyama paper cannot be considered “evidence” since the data was faked. However, at present no other paper at or above this level in the medical evidence hierarchy with the same aim could be cited to support a knowledge claim, for or against the purported (faked) results of the trial. Uncritical citation continues to diffuse the faked data, and implies belief in the efficacy of a yet unvalidated treatment.

Methods and results

Our analysis has three parts, focusing on (1) the direct citations to the Matsuyama paper, (2) the citations to those papers (second-generation citations with respect to the Matsuyama paper), and (3) the visibility of the Matsuyama paper’s retraction status in digital platforms. Data (Schneider and Ye 2019; Ye et al. 2020; Ye and Schneider 2020) and code have also been publicly deposited, and to simplify the presentation, additional details are presented in a supplemental appendix.

Direct citations to the Matsuyama paper

We searched Web of Science and Google Scholar for publications citing the Matsuyama paper. Details are available in the methods supplement to this paper. All forms of publications, published before December 31, 2019, in any language were included.

Network analysis

Our network analysis covers 148 direct citations to the Matsuyama paper (Matsuyama et al. 2005). Articles in fourteen languages cited the Matsuyama paper: Chinese, English, French, German, Greek, Italian, Japanese, Norwegian, Polish, Russian, Serbian, Spanish, Thai, Ukrainian.

Figure 1 shows the year-by-year citation pattern. For the purpose of deeming articles as post-retraction, we use a 2-month washout period after the retraction notice was issued in October 2008 (CHEST 2008). This gives us 32 pre-retraction citations (before January 2009, with the first citations found in 2006) and 116 post-retraction citations (from January 2009 through December 31, 2019).
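The washout rule described above can be illustrated with a minimal sketch. It assumes each citing publication has a known publication date; the function name `classify_citation` and the exact day boundaries are hypothetical, chosen only to match the months stated in the text (retraction notice in October 2008, post-retraction period starting January 2009). It is not the code used in the study.

```python
from datetime import date

# Month of the retraction notice per the text (October 2008).
# Assumption: the first day of the month is used as the boundary.
RETRACTION_NOTICE = date(2008, 10, 1)

# After the 2-month washout, citations from January 2009 onward
# are treated as post-retraction.
POST_RETRACTION_START = date(2009, 1, 1)


def classify_citation(published: date) -> str:
    """Label a citing publication by its publication date."""
    if published >= POST_RETRACTION_START:
        return "post-retraction"
    if published >= RETRACTION_NOTICE:
        return "washout"
    return "pre-retraction"


# Hypothetical publication dates, for illustration only.
labels = [classify_citation(d) for d in (
    date(2006, 5, 1),
    date(2008, 11, 15),
    date(2015, 3, 20),
)]
```

In the counts reported here, washout citations are grouped with the pre-retraction citations, so only the "post-retraction" label contributes to the 116.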

Fig. 1

Number of citations to the Matsuyama paper by year, blue for pre-retraction and washout citations (pre-retraction 2006–October 2008; washout October–December 2008) and red for post-retraction citations 2009-2019. (Color figure online)

Citation context analysis

Of the 148 direct citations to the Matsuyama paper (Matsuyama et al. 2005), ultimately, we examined and report on citation contexts of 144 items (112 post-retraction).

We examined bibliographies to verify that each publication cited the Matsuyama paper and to determine how it was cited in the text. We manually extracted all citation contexts referring to the Matsuyama paper. We translated non-English citation contexts primarily with Google Translate.

Following Fulton et al. (2015a), we annotated three aspects of the citation contexts: whether they were positive or negative; whether they described methods or results of the Matsuyama paper as opposed to citing a general concept (as shown in Table 1); and, for post-retraction publications, whether they mentioned the retraction by using words such as “retraction” or “retracted” in the citation context or cited the retraction notice (CHEST 2008). The data supplement to the present paper provides citation contexts and annotations for the 66 new citations not previously reported in Fulton et al. (2015a).

Table 1 Sample annotations of citation contexts that described methods or results of the Matsuyama paper and that cited a general concept

There were only 5 negative citations, i.e., citations that refer to the article as poor research (5/144; 3.5%). These are a survey of retracted drug literature (Samp et al. 2012); three articles we (Fulton/Hill) authored, namely a protocol (Fulton et al. 2013), the previous study of post-retraction citation (Fulton et al. 2015a), and a systematic review (Fulton et al. 2015b); and a Cochrane Review (Abdelhamid et al. 2018). We deemed that not mentioning the article in-text but including it in the bibliography counted as positive citation; this applied to five items: four U.S. patents (Jackowski et al. 2012a, b, 2014, 2015) and one Spanish textbook chapter (Lozano et al. 2011) with no in-text citation that listed the Matsuyama paper and that did not cite its retraction notice (CHEST 2008) or mention its retracted status.

Only 5 publications mentioned the retraction (5/112; 4.5%)—these were exactly the same as the negative citations mentioned above. The remaining 107 post-retraction citations would not meet the International Committee of Medical Journal Editors’ guidelines, to not cite retracted work as science (International Committee of Medical Journal Editors 2019).
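As a quick check of the arithmetic, the proportions reported above can be recomputed from the stated counts. This is only an illustrative sketch: the variable names and the `percent` helper are hypothetical, and the counts are taken directly from the text.

```python
# Counts as reported in the text.
examined_total = 144            # citation contexts examined, 2006 to 2019
examined_post_retraction = 112  # of which post-retraction
negative = 5                    # negative citations (all post-retraction)
mentioned_retraction = 5        # the same five publications


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


negative_share = percent(negative, examined_total)
mention_share = percent(mentioned_retraction, examined_post_retraction)

# Post-retraction citations that neither mention nor cite the retraction.
not_mentioning = examined_post_retraction - mentioned_retraction
not_mentioning_share = percent(not_mentioning, examined_post_retraction)
```

The last share, rounded to a whole number, corresponds to the 96% figure for positive post-retraction citation.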

Table 2 shows how many of the citations were annotated as positive/negative overall and mentioned/not mentioned out of the 144 citations we examined from 2006 to 2019.

Table 2 Number of citations from 2006 to 2019 in the categories (mentioned/not mentioned/published before retraction or in 2-month washout period) and (negative/positive)

Table 3 reports on how in-depth the description of the Matsuyama paper was in the 144 citations from 2006 to 2019. Overall 44 of the 112 post-retraction citations we examined described the methods or results of the Matsuyama paper without mentioning the retraction.

Table 3 We distinguish whether articles describe methods or results of the Matsuyama paper versus cite a general concept versus cite the Matsuyama paper only in their bibliography with no in-text citation

Diffusion of unsubstantiated content

Diffusion into clinical venues is particularly troublesome. This includes a French nutrition treatise (Pison et al. 2016) and a Japanese translation of the European Society for Clinical Nutrition and Metabolism’s Life-long Learning Programme in Clinical Nutrition and Metabolism (Nagahama et al. 2013). Likewise, a book promoting Omega-3 oils contains two chapters with post-retraction citations of the Matsuyama paper (Fitzpatrick 2011; Monk et al. 2011). Most shocking is the 2017 version of textbook materials for the European Society for Clinical Nutrition and Metabolism’s Life-long Learning Programme in Clinical Nutrition and Metabolism (Schols 2017).

We focus next on the potential for diffusion of misinformation into a second generation of citations.

Analysis of second-generation citations to the Matsuyama paper

We searched Web of Science and Google Scholar for publications citing the 148 direct citations to the Matsuyama paper. All forms of publications, published before December 31, 2019, in any language were included. This resulted in the identification of 2542 articles.

Network analysis

Figure 2 shows the network of 148 direct and 2542 second-generation citing articles of the Matsuyama paper. For the second generation, Fig. 2 represents the worst case: that the faked data in the Matsuyama paper affects its direct citations as well as each publication citing one of the direct citations.

Fig. 2

Citation network centered around the 2005 Matsuyama paper (large black circle, partly obscured). There are 148 direct citations, published from 2006 to 2019 (blue squares) and 2542 second-generation citations to them, published from 2006 to 2019 (small red circles). Some nodes overlap due to the density of the network. (Color figure online)
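As an illustration only, the direct and second-generation sets in such a network can be derived from a list of (citing, cited) pairs. The Python sketch below is not the tooling used in this study; the function name and the toy identifiers (M, D1, S1, and so on) are hypothetical. An article that cites both the root paper and one of its direct citations is kept in both sets, so the overlap can be reported separately.

```python
from collections import defaultdict


def citation_generations(edges, root):
    """Return (direct, second_generation) sets for a root paper.

    edges: iterable of (citing_id, cited_id) pairs.
    An article may appear in both sets if it cites the root
    and also cites one of the root's direct citations.
    """
    cited_by = defaultdict(set)
    for citing, cited in edges:
        cited_by[cited].add(citing)

    direct = set(cited_by.get(root, set()))
    second = set()
    for paper in direct:
        second |= cited_by.get(paper, set())
    second.discard(root)  # guard against a citation loop back to the root
    return direct, second


# Toy data (hypothetical identifiers, not taken from the study).
toy_edges = [
    ("D1", "M"), ("D2", "M"),   # direct citations of the root paper M
    ("D2", "D1"),               # D2 is also a second-generation citation
    ("S1", "D1"), ("S2", "D1"), ("S2", "D2"),
]
direct, second = citation_generations(toy_edges, "M")
overlap = direct & second
print(sorted(direct), sorted(second), sorted(overlap))
```

Counting each second-generation article once, however many direct citations it cites, corresponds to the worst-case reading of the network, in which any path back to the root paper is treated as potential exposure.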

Second-generation articles in thirty-two languages cited an article that cited the Matsuyama paper: Bosnian, Chinese, Croatian, Czech, Dutch, English, Finnish, French, German, Greek, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Mongolian, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovenian, Spanish, Swedish, Thai, Turkish, and Ukrainian.

Figure 3 shows the historical growth of the direct and second-generation citation network pre-retraction (including a 2-month washout), from the Matsuyama paper’s publication year (2005) through the end of 2008 (2 months after the retraction notice). The second-generation network has continued to grow after retraction, as shown in Fig. 4.

Fig. 3

Growth per year in the pre-retraction citation network of the Matsuyama paper (large black circle), from first year of citation (2006) through 2008 (retraction notice + 2 months). Totals and publications shown are cumulative (2006-given year) for direct citations (blue squares) but limited to new (published in the given year) for second-generation citations (red circles). Arrows point from citing article to cited article. (Color figure online)

Fig. 4

Growth per year in the post-retraction citation network of the Matsuyama paper (large black circle, partly obscured), from 2009 through the end of our study, December 31, 2019. Totals and publications shown are cumulative (2006-given year) for direct citations (blue squares) but limited to new (published in the given year) for second-generation citations (red circles). Arrows point from citing article to cited article. (Color figure online)

Diffusion of misinformation from publications discussing methods and results of the Matsuyama paper

Our approach to investigating the possible diffusion of misinformation to a second generation of citations is shown in Fig. 5. This is predicated on the idea that misinformation is most likely to be transmitted by citing a direct citation that describes the methods and results of the Matsuyama paper.

Fig. 5

Analysis of second-generation citations that cite a direct citation that describes Matsuyama’s methods and results but does not mention the retraction. The starting point is the full network shown in Fig. 2.

We show two network diagrams. Figure 6 covers all 60 articles that describe the methods and results of the Matsuyama paper and that do not (post-retraction) or could not (pre-retraction) cite the Matsuyama paper’s retraction notice; there are 1481 second-generation citations to these 60 direct citations. Figure 7 shows the smaller post-retraction network that we sought to analyze using a second-generation citation context analysis. To make a second-generation citation context analysis feasible, we limited our attention to the most recent 35 direct citations (2010–2019) that do not mention the retraction but do mention methods or results of the Matsuyama paper, and the 161 second-generation citations associated with them.

Fig. 6

Network of 60 direct citations (blue squares) with citation contexts describing methods and/or results of the Matsuyama paper (large black circle, partly obscured), and their 1481 citations (red circles). These direct citations were published from 2006 to 2019 and second-generation citations were published from 2007 to 2019. Arrows point from citing article to cited article. (Color figure online)

Fig. 7

Network corresponding to the second-generation citation context analysis. The Matsuyama paper is shown as a large black circle. Its direct citations are shown as blue squares. The focus is 35 direct citations representing articles from 2010 to 2019 with citation contexts describing the Matsuyama paper’s methods and/or results, and their 161 citations (red circles, except for 4 second-generation citations that are also direct citations and hence shown as blue squares: F009, F026, F047, F064 in our supplemental data) published from 2011 to 2019. Arrows point from citing article to cited article. (Color figure online)

Citation context analysis applied to selected second-generation citations to publications discussing methods and results of the Matsuyama paper

There were 44 direct post-retraction (2009–2019) citations citing methods and results of the Matsuyama paper and their 1481 citations. We limited our attention to the most recent 35 direct (2010–2019) citations that do not mention the retraction but do mention methods or results of the Matsuyama paper. Of their 161 citations—which are second-generation citations from the perspective of the Matsuyama paper—we were able to access 152 second-generation citations. Of these we marked 23 as possibly spreading misinformation, i.e., relying on information in one of Matsuyama’s direct citations that seemed to have come at least in part from the Matsuyama paper. They were spread by 4 different review articles, with the bulk spread by Giudetti and Cagnazzo (2012), cited in 18 of the 23 cases of misinformation, with the other reviews being cited in 1, 2, and 2 papers. These 23 examples are given in the data supplement and we next discuss four of them in detail.

A government research bulletin from Nepal (Jha 2016) cites a book chapter “Health benefits of flaxseed” (Fitzpatrick 2011): “Whilst it is true that very little ALA converts to the long chain polyunsaturated omega-3 found in marine oils, it does have beneficial effects itself (Fitzpatrick 2011).” The Matsuyama study provided ALA (alpha linolenic acid)-rich nutritional support and its retracted results are described in Fitzpatrick (2011) as “evidence” of the anti-inflammatory impact of flax.

As another example, an Irish nutritional support shop recommends n-3 fats to athletes (Healthy Fats, Fish Oils and Omega-3 Supplementation 2017): “During periods of illness, this may help promote recovery and faster return to training. Interestingly, n-3 fats are sometimes provided to COPD patients (severe airway damage and breathing difficulties) and prior to surgery in order to support the immune system and speed recovery by helping to control inflammation and infection, and repair damaged cells [17].” This is on the basis of “Immunologic impact of nutrient depletion in chronic obstructive pulmonary disease” (Herzog and Cunningham-Rundles 2011), which cites the Matsuyama paper as having demonstrated “Improved 6 min walk test, decreased leukotriene B4 level, TNF-alpha, IL-8 [91]” (i.e., the faked data for which the Matsuyama paper was retracted).

A pre-clinical study on lung repair following dust exposure (Nordgren et al. 2018) draws indirectly on the retracted science, via citation to Giudetti and Cagnazzo (2012), to motivate its work: “Furthermore, studies reveal diets high in omega-3 polyunsaturated fatty acids (n-3 PUFA) may be beneficial in inflammatory lung conditions, including asthma and COPD (17).” Giudetti and Cagnazzo discuss evidence for benefit in specific inflammatory lung diseases and devote two paragraphs to the effects on COPD; one paragraph cites Matsuyama, a review article, and an intervention in patients undergoing physical rehabilitation, while the second paragraph provides detailed results from the Matsuyama study. Later this is summarized as “Nutritional interventions with n-3 PUFA supplementation have been shown to be particularly beneficial in patients with COPD [110,111]…” Yet in effect, the only cited “evidence” for the effect of n-3 PUFAs on inflammation in COPD came from the retracted Matsuyama paper.

Possible misinformation also may have spread by moving broader claims related to the Matsuyama paper across fields. As discussed above, we considered COPD-related statements supported by Giudetti and Cagnazzo’s review as possibly spreading misinformation, due to reliance on the Matsuyama paper. Yet the Giudetti and Cagnazzo review, titled “Beneficial effects of n-3 PUFA on chronic airway inflammatory diseases,” went beyond COPD to discuss clinical applications for three related diseases—asthma, cystic fibrosis, and Acute Respiratory Distress Syndrome. A weaker case of possible spread of misinformation is found in Thesing et al. (2018), which moves the study of Omega-3 fatty acids into the realm of anxiety and depression. Its three mentions of Giudetti and Cagnazzo (2012) relate to inflammatory diseases, for instance: “Some randomized controlled trials have shown that intake of N-3 PUFAs ameliorate or even prevent physical illnesses such as inflammatory (Giudetti and Cagnazzo 2012; Simopoulos 2002) and cardiovascular diseases (La Rovere and Christensen 2015; Simopoulos 1999), while others have not (Hoogeveen et al. 2014; Kromhout et al. 2010).” This statement is not exactly wrong, since Thesing mentions the wider class of inflammatory diseases without specifying COPD. But the misinformation within the Giudetti and Cagnazzo review about COPD adds fragility to this statement, especially since it is the only lung-related paper in Thesing’s bibliography. Moving this across communities, from lung disease researchers to a psychiatric research community, may increase the risk of propagating misinformation.

Visibility of the Matsuyama paper’s retraction status in digital platforms

We ask how readers, including citing authors, could become aware that the paper is retracted, by assessing how the Matsuyama paper (Matsuyama et al. 2005) and its retraction notice (CHEST 2008) are displayed in 12 digital platforms. Additional details are given in a methods supplement.

Expectations from industry guidelines

The key industry guidelines for medical articles are the International Committee of Medical Journal Editors (ICMJE) “Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals” (International Committee of Medical Journal Editors 2019), to which Chest, the journal that published the Matsuyama paper, adheres (International Committee of Medical Journal Editors, n.d.). They direct editors to flowcharts of the Committee on Publication Ethics (COPE) for detailed procedures. COPE also publishes retraction guidelines (Wager et al. 2009, 2019), which are similar to those of ICMJE.

As of December 2019, the ICMJE guidelines state that retraction notices “should be prominently labelled, appear on an electronic or numbered print page that is included in an electronic or a print Table of Contents to ensure proper indexing, and include in their heading the title of the original article. Online, the retraction [notice] and original article should be linked in both directions and the retracted article should be clearly labelled as retracted in all its forms.” (International Committee of Medical Journal Editors 2019, p. 9). The retraction notice should indicate its authorship, the reason for the retraction, and “a complete citation reference” to the retracted article (International Committee of Medical Journal Editors 2019, p. 9). COPE does add that “Journals are responsible that retractions [i.e., retraction notices] are labelled in such a way that they are identified by bibliographic databases and should also include a link to the retracted article. The retraction should appear on all online searches for the retracted publication.” (Wager et al. 2019, p. 6).

Expectations for our analysis

Analyzing previous research (Decullier et al. 2013; Elia et al. 2014; K. Wright and McDaid 2011) and industry guidelines (International Committee of Medical Journal Editors 2019; Wager et al. 2019) led us to establish the following visibility expectations for our analysis:

  • All search results for the title of the retracted article should also return the retraction notice.

  • For retracted articles on full-text sites

    • Each article landing page, full-text HTML article, and full-text PDF article should have a phrase indicating the retraction status (such as “retracted,” “withdrawn,” etc.) or a watermark indicating the retraction status.

    • Each landing page, full-text HTML article, and full-text PDF article should have a computer-actionable link to the retraction notice.

  • For retraction notices in full-text sites

    • The retraction notice should appear in Table of Contents for the issue in which it appears, with a designated page number.

    • The heading of the retraction notice should include the phrase “retraction notice” and the title of the retracted article.

    • The textual content of the retraction notice should state authorship, reason for retraction, and formally cite the retracted article.

    • Each landing page, full-text HTML notice, and full-text PDF notice should have a computer-actionable link to the retracted article.

  • For database records for retracted articles

    • A phrase indicating the retraction status, such as “retraction,” “retracted,” or “withdrawn” should appear in the article record.

    • The article record should have a computer-actionable link to the retraction notice. This could link to the database’s record for the retraction notice, or to the full-text retraction notice.

    • The article record should have sufficient bibliographic information to retrieve the retraction notice.

  • For database records for retraction notices

    • The phrase “retraction notice” should appear in the text of the notice record.

    • The notice record should have a computer-actionable link to the retracted article. This could link to the database’s record for the retracted article, or to the full-text retracted article.

    • The notice record should have sufficient bibliographic information to retrieve the retracted article.
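To make the expectations for database records of retracted articles concrete, the following Python sketch encodes them as three simple checks. It is a hypothetical illustration, not the assessment instrument used in this study (our assessment was done manually); the class, field names, and example records are assumptions introduced here.

```python
from dataclasses import dataclass

# Status phrases named in the expectations for article records.
STATUS_PHRASES = ("retraction", "retracted", "withdrawn")


@dataclass
class ArticleRecord:
    """Hypothetical, simplified view of one database record."""
    display_text: str        # text a reader sees in the record
    links_to_notice: bool    # computer-actionable link to the notice?
    notice_citation: str     # bibliographic details of the notice, "" if none


def unmet_expectations(record):
    """List which of the three article-record expectations are not met."""
    unmet = []
    lowered = record.display_text.lower()
    if not any(phrase in lowered for phrase in STATUS_PHRASES):
        unmet.append("retraction status phrase")
    if not record.links_to_notice:
        unmet.append("link to retraction notice")
    if not record.notice_citation.strip():
        unmet.append("bibliographic information for notice")
    return unmet


# Made-up example records (not observations from any real database).
silent = ArticleRecord("Effects of a treatment on markers", False, "")
labelled = ArticleRecord("RETRACTED: Effects of a treatment on markers",
                         True, "Journal vol(issue):page")
print(unmet_expectations(silent))
print(unmet_expectations(labelled))
```

A substring match on status phrases is deliberately permissive; it would also accept a record where the word appears only in an unrelated field, which is one reason a manual check remains necessary.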

All search results for the title of the retracted article should also return the retraction notice

For sites with a search interface, we asked whether the retraction notice (CHEST 2008) appeared in searches for the title of the retracted publication (Matsuyama et al. 2005): “Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD”.

Searches failed to meet our expectations in several ways.

The Matsuyama article’s retraction status could easily be missed, because the retraction notice appears before the article in only 4 out of the 12 searches (CINAHL EBSCOhost, MEDLINE—Ovid, PubMed, and Web of Science—All Databases).

In Google Scholar and Web of Science—Core collection the retraction notice did not appear at all in default searches as shown in Table 4. At first we were puzzled that the retraction notice did not appear in the Web of Science—Core collection. We suspect that this is due to use of MEDLINE indexing as a quality measure for inclusion in Web of Science—Core, because the record that did appear in the Web of Science—All Databases has the designation “PubMed-not-MEDLINE”.

Table 4 Searches using the default search and entering the (Matsuyama et al. 2005) article title: Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD

For ScienceDirect, the retraction notice did not appear in the first 25 records, the default first page size. The retracted article was first and the retraction notice was listed as result number 82 out of 194 results and not prominently labeled as shown in Fig. 8, making it very unlikely that a user would see the retraction notice.

Fig. 8

ScienceDirect shows the retraction notice (CHEST 2008) as record 82 of 194 in the default search for “Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD” as of January 20, 2020

Non-default search options at ScienceDirect, not shown in the table, were somewhat better. The retraction notice is the fourth result of four on January 20, 2020 when entering the title string in the “Search in this journal” search from the ScienceDirect journal homepage for Chest. However, the retraction notice was not found by entering the title string in ScienceDirect’s Advanced search option “Title, abstract or author-specified keywords”; this search returned just 1 result, the retracted article, on January 20, 2020.

Retracted article on full-text sites

We considered full-text sources to be digital library sites that hosted PDFs. On these sites, we assessed whether each article landing page, full-text HTML article, and full-text PDF article had (1) text such as “retracted” or “retraction” or a watermark indicating the retraction status and (2) a computer-actionable link to the retraction notice. Results are shown in Table 5.

Table 5 Retracted article on full-text sites

No watermarks were used. Retraction text that does appear is not necessarily in the metadata: On Semantic Scholar the Retraction Watch blog post “More evidence scientists continue to cite retracted papers” about (Fulton et al. 2015a) is in the “paper mentions” section. The most prominent information here, from an Ovid record, could still be missed by a skimming reader.

Retraction notice in full-text sites

We found the retraction notice in only two full-text sites. For these, we assessed four aspects of the retraction notice (CHEST 2008), as shown in Table 6: the front matter, the heading content, the textual content, and its actionable links to the article. Regarding the front matter, we asked whether the retraction notice appeared in the Table of Contents for the issue in which it appears, with a designated page number. In the heading we checked for words such as “retraction”, “retraction notice”, “withdrawn”, or “withdrawal”, and for the title of the retracted article. In the textual content of the retraction notice we checked for authorship (here, by Chest), reason for retraction (here, “because the university that employs the authors determined that one of the authors, Wataru Matsuyama (now deceased), falsified data.”), and a formal citation to the retracted article. Finally, we checked whether each landing page, full-text HTML notice, and full-text PDF notice had a computer-actionable link to the retracted article.

Table 6 Retraction notice (CHEST 2008) at full-text sources

Multiple areas for improvement are evident. The lack of article title in the heading impairs search results as shown in Fig. 8 above. Citation databases do not index the retraction notice (CHEST 2008) as a citing article since it is not a formal citation. Linking to the retracted article, called for in the guidelines (International Committee of Medical Journal Editors 2019; Wager et al. 2019), is also lacking.

Database records for the retracted article

For each database record referring to the article (Matsuyama et al. 2005), we asked whether a word such as “retraction” or “retracted” appeared in the article record; whether the article record included a computer-actionable link to the retraction notice (CHEST 2008); and whether there was sufficient bibliographic information to retrieve the retraction notice (CHEST 2008) manually by volume and issue number. Table 7 shows the results.

Table 7 Database records for the Matsuyama paper (Matsuyama et al. 2005)

While 5/8 databases included a word like “retracted” in the record, with the notable exception of PubMed, the retraction status is poorly signaled. Confirming information from the retraction notice would be difficult from these records, as only 1/8 (MEDLINE-Ovid) had a link to the retraction notice, and only 2/8 had full bibliographic information for the retraction notice.

The CINAHL database did show some improvements during our study; the retraction notice was not found in our searches for Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD on September 27, 2018, but by January 2020, a second record with the retraction notice had appeared.

Database records for the retraction notice

The most problematic results came in attempting to resolve to the full text of the retraction notice from databases. For each database record referring to the retraction notice (CHEST 2008), we asked whether a phrase such as “retraction notice” appeared in the notice record; whether the notice record included a computer-actionable link to the retracted article (Matsuyama et al. 2005); and whether there was sufficient bibliographic information within the record itself to retrieve the retracted article (Matsuyama et al. 2005) manually by volume and issue number. Table 8 shows the results.

Table 8 Database records for the retraction notice (CHEST 2008). Multiple resolution errors are due to presence of multiple link resolver buttons from a database

Resolution errors from databases pose a significant challenge for a reader trying to reach the retraction notice via a database search. Only 1/8 databases (and 1/9 database records), again MEDLINE Ovid, consistently resolved the retraction notice to its full text correctly in our tests. Errors are varied, as shown in Fig. 8 above and Figs. 9, 10, 11 and 12 below. Most errors provided dead ends with no further information, though some redirected to the journal homepage or to a search page (Fig. 10).

Fig. 9

The ChestNet error page stating “This page does not exist”, retrieved from the “Check article availability” link from the EBSCOhost CINAHL record for the retraction notice (CHEST 2008) on January 20, 2020

Fig. 10

The ScienceDirect Error page stating “No results found”, retrieved from EMBASE—record 1 (copyright Elsevier) for (CHEST 2008) via the FindItUIC ScienceDirect journals link on January 20, 2020

Fig. 11

The Elsevier errata notice for a different article (Acute ST-Segment Elevation Myocardial Infarction) (https://doi.org/10.1016/S0012-3692(08)60334-7), retrieved from the “Full text on publisher’s website” from the EMBASE Elsevier record for the retraction notice (CHEST 2008) on January 20, 2020

Fig. 12

The Elsevier error page (https://linkinghub.elsevier.com/retrieve/pii/134/4/893-a) stating “Requested article is not found in IHub.”, retrieved from the link out from the PubMed page for the retraction notice (https://www.ncbi.nlm.nih.gov/pubmed/18842931) (CHEST 2008) on January 20, 2020

Delays in updating were also a factor; for instance, the Cochrane record for the retraction notice indicates that it was added October 31, 2014, 6 years after the retraction notice was published in October 2008. There are several inaccuracies: Scopus lists the retraction notice as an erratum, which it is not. Metadata errors such as an incorrect page number (892 in place of the expected 893) or inclusion of an author name hindered resolution in some cases, as is most evident in Fig. 10.

Due to these errors, readers looking for the retraction notice would either need to use search results (challenging as noted above), type in the DOI (if they somehow find it), or navigate through volume, issue, and page number in order to find the retraction notice.

Discussion and conclusions

Our case study showed that a purported nutritional treatment for lung disease has received continued citation for 11 years after its formal retraction from the literature in 2008, when data from the human trial promoting this treatment was deemed fake. We have demonstrated the spread of misinformation from a unique knowledge claim, that could not be supported by alternative data at the same level of evidence. This diffusion of misinformation has the potential to harm patients, through promoting a nutritional treatment that has not yet been established through strong, concordant evidence, and which may be used as an adjunct to pharmacotherapy. It is also conceivable that, based on the publicity of the outcomes from the Matsuyama study, some individuals with COPD may perceive omega-3 supplements as a possible natural alternative to pharmacotherapy.

Information about the purported treatment benefit has flowed widely to researchers, but also to clinical audiences, through educational modules (Nagahama et al. 2013; Schols 2008, 2013, 2017), clinical nutrition reviews (Fitzpatrick 2011; Hall III et al. 2009), and textbooks (Pison et al. 2016), including direct translation of educational modules (Nagahama et al. 2013; Schols 2013) from English into other languages. Information directed to the general public has not been corrected (Arnold 2009a, b, 2010). Researchers specifically citing the results of the Matsuyama paper seem unaware of the fact that the paper has been retracted: only 5 direct citations to the Matsuyama paper mentioned the retraction (Abdelhamid et al. 2018; Fulton et al. 2013, 2015a, b; Samp et al. 2012), and those same 5 alone described it as poor research. The retraction is not mentioned in 96% (107/112) of direct post-retraction citations we were able to examine. As of the current writing, the post-retraction citation continues unchecked, with eight additional new 2019 citations (Al-Haidose 2019; Cai et al. 2019; Collins et al. 2019; Hani 2019; Nguyen 2019; Omar et al. 2019; Ran et al. 2019; Wang and Wu 2019), including two 2019 Ph.D. dissertations (Al-Haidose 2019; Nguyen 2019); and, of the five peer-reviewed articles, one (Wang and Wu 2019) was published by Elsevier, who had published the retraction notice 11 years earlier in their journal Chest. Authors, editors, and publishers still do not have adequate tools to identify and flag retractions.

Unlike the average retracted RCT, the Matsuyama paper’s citations increased substantially after retraction. Other studies (Azoulay et al. 2015; Furman et al. 2012; Mott et al. 2019) have found decreases in citation after retraction. In particular, Mott et al. (2019) suggested that awareness of retraction status (such as through media attention) decreases citation. Mott et al. analyzed citations to a set of 218 retracted RCTs, using the Web of Science citations received in a 4-year period centered around retraction. The articles were split into two subgroups: 154 RCTs that received significant media attention when retracted due to 2 high-profile misconduct cases (Fujii and Boldt) and 64 non-high-profile retracted RCTs. The high-profile cases received fewer citations: Fujii and Boldt’s RCTs (mean 5.2) as compared to non-high-profile retracted RCTs (mean 21.7). The articles were less cited after retraction (e.g., citations to the 64 RCTs decreased 1.8% per month, compared to the trend in matched controls). The Matsuyama paper we studied was on par with the 21.7 mean for non-high-profile cases in Mott’s study, in overall number of citations in Web of Science: 21 citations, 7 in the 24 months before its October 2008 retraction, and 14 in the following 24 months. However, the trend of citations actually increased, which seems consistent with a lack of awareness of the Matsuyama paper’s retraction status.
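A windowed comparison of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by Mott et al. or in this study: the function names are hypothetical, months are counted as whole calendar months, the retraction month itself is assigned to the "after" window, and the citation dates shown are invented toy values.

```python
from datetime import date


def month_index(d):
    """Map a date to a running calendar-month number."""
    return d.year * 12 + d.month


def window_counts(citation_dates, retraction, months=24):
    """Count citations in equal windows before and after retraction.

    Assumption: the retraction month belongs to the 'after' window.
    """
    r = month_index(retraction)
    before = sum(1 for d in citation_dates
                 if r - months <= month_index(d) < r)
    after = sum(1 for d in citation_dates
                if r <= month_index(d) < r + months)
    return before, after


# Toy citation dates (invented for illustration only).
toy_dates = [
    date(2006, 9, 15),   # 25 months before: outside the window
    date(2006, 10, 1),   # 24 months before: counted as 'before'
    date(2008, 9, 30),   # counted as 'before'
    date(2008, 10, 5),   # retraction month: counted as 'after'
    date(2010, 9, 1),    # 23 months after: counted as 'after'
    date(2010, 10, 1),   # 24 months after: outside the window
]
before, after = window_counts(toy_dates, date(2008, 10, 1))
print(before, after)
```

Because many citation records carry only a publication year, any real application would need an explicit rule for assigning such records to a window; that choice can change the counts near the boundaries.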

The wider information environment is implicated in the spread of the Matsuyama paper “Effects of omega-3 polyunsaturated fatty acids on inflammatory markers in COPD” and its faked results. Unknowing citation of a retracted paper is never acceptable. Yet a paper’s retraction status is not always evident to readers. Even subscription databases and publisher websites may lack up-to-date information about a retraction status. In the case of the Matsuyama paper, 11 years after its retraction, basic information that readers need, such as the retraction status, is still missing from several providers, and is difficult to understand at other providers. In particular, retraction status was not indicated on any PDF or HTML full-text. Further, resolution errors from databases pose a significant challenge for a reader trying to reach the retraction notice via a database search, in that only 1/8 databases (and 1/9 database records) consistently resolved the retraction notice to its full text correctly in our tests.

It is striking to compare our case paper to an also-retracted paper which cites our case paper and was written by some of the same authors: “Use of Tiotropium Bromide for Pre-operative Treatment in Chronic Obstructive Pulmonary Disease Patients: Comparison with Oxitropium Bromide”. It had 11 citations from Google Scholar and Web of Science as of December 2019. Of these 11, three citations—its retraction notice (Matsuyama et al. 2008) (self-retracted for data problems) and 2 studies of retracted publications—cite with awareness of the retraction. Of the remaining 8 citations, 3 were in the year of retraction, while the remaining 5 citations in Greek, Chinese, and English from 2011 to 2017 are clearly post-retraction but only available in Google Scholar. This fits the picture of the lower impact factor of the journal (consistently Q3 in its JCR impact factor category, Medicine, General and Internal—SCIE, since 2001), but may also be impacted by the visibility of the retraction. On Web of Science, the retraction notice (Matsuyama et al. 2008) appears in search results for the article’s title, and in the citations. As of early 2020, the article’s abstract on the publisher website, J-STAGE, reads “This article was retracted. See the Notification.”, the HTML links to the retraction notice (Matsuyama et al. 2008), and the first page of the PDF includes a copy of the retraction notice itself. Visibility of the retraction notice is of course not the only difference between this article and our case paper but it is suggestive.

The current information environment facilitates the spread of research papers, but basic facts about these papers, such as their retraction status, do not spread as swiftly as the PDFs or citations to these papers themselves. Our case study suggests that unknowing and likely unintentional citation of retracted papers could be common, and that post-retraction citation may be correlated with visibility of retraction status. However, this work is limited to evaluation of a single case (N = 1). Future research should conduct large-scale examination of the correlation between visibility of retraction status and post-retraction citation, especially positive post-retraction citation.

Improving this situation will take dedicated efforts from multiple stakeholders in the scholarly communication ecosystem. While continued citation of retracted papers may be acceptable in rare circumstances, authors, editors and publishers should ensure that this happens only in full knowledge of the retraction. Citations to retracted papers should clearly mark the paper as retracted, and citation of the retraction notice should be encouraged as an alternative to direct citation of retracted papers. Citation standards should be explicit, and bibliographic management tools should follow existing standards (Suelzer et al. 2019). Publishers must bear responsibility for clearly marking retracted papers, by watermarking their own copies in all formats (e.g., PDF, HTML, EPUB); by prominently linking to the retraction notice wherever the paper appears, including the article landing page and issue table of contents; and by providing accurate metadata, including the retraction status, to partners. Publishers and typesetters should also surveil bibliographies for papers recognized as retracted, for follow-up editorial action, whether to acknowledge the retraction or to replace the cited paper. Database providers and aggregators must demand up-to-date metadata from publishers, and should consider partnering with alternative metadata producers (such as the Retraction Watch Database). Parties responsible for the retraction, including authors for self-retraction and investigative committees for misconduct-related retraction, should search citation databases and alert citing authors directly to the retraction notice, for follow-up action (such as corrections, retrenchment in new citations, etc.). This is particularly important for papers whose conclusions fundamentally depend on retracted work (Fu and Schneider 2020). While currently no tools provide notification to the authors of pre-retraction citations, such tools could also be beneficial.

Quality assessment of digital libraries must ensure that known problems in a paper’s validity or reproducibility, as documented by retraction, are as evident to readers as standard information such as a paper’s title, authors, or venue. While the diffusion of information is often studied at a macro-level, for understanding the flow of information across geographic, linguistic, or field-specific barriers, this paper demonstrates how mixed methods can be used to document the diffusion of a single paper and its ideas, through network analysis and citation analysis. For retracted papers, this detailed assessment of diffusion is justified by the desire to check the spread of misinformation.

In conclusion, to date there are no published high quality RCTs reporting on the effect of omega-3 polyunsaturated fatty acids alone on inflammation and exercise capacity in people with COPD (the aims of the Matsuyama paper) (Fulton et al. 2015b; Scoditti et al. 2019). While the effects of the continued citation of the retracted Matsuyama paper on the research field are difficult to quantify, it is conceivable that, believing this research question to be answered, scientists may have chosen to pursue less-well established research directions, leading to significant delays in determining the true effect. The continued citation of this paper without acknowledgement of the retraction seems to support this notion and possibly will not be resolved until this question is answered in a properly conducted RCT.