The following known technical problems are still under investigation.

Impact Factors

Citation counting is exhaustive, so intended citations are unlikely to be missed. However, we still have difficulty identifying citable documents in a few journals, such as Science. For most journals, we have successfully separated citable documents (peer-reviewed articles) from non-citable documents (e.g., editorials, news articles, and opinions), but in a few journals the two kinds of documents have very similar structures. In those cases some non-citable documents are counted as citable, which inflates the denominator of the impact factor. The impact factors of these few journals are therefore underestimated.
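The effect can be illustrated with a minimal sketch. It assumes the impact factor is computed as citations received divided by the number of citable documents. The function name and all numbers below are hypothetical and chosen only for illustration; they do not describe any real journal or the actual implementation.

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Citations received divided by the number of citable items."""
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations / citable_items


# Hypothetical journal: 1000 citations and 200 peer-reviewed articles.
citations = 1000
peer_reviewed = 200

# Hypothetical count of editorials and news items that were
# mistakenly classified as citable documents.
misclassified = 300

correct_if = impact_factor(citations, peer_reviewed)
underestimated_if = impact_factor(citations, peer_reviewed + misclassified)

print(correct_if, underestimated_if)
```

The citation count is the same in both calls; only the denominator grows, so the second value is lower than the first.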

Non-English Languages

Although the NLP system is designed specifically for English and we parse only English text, some items should not be affected by this limitation. Author names and affiliations are often written with non-ASCII characters, particularly in European languages. To handle them, we convert all character encodings to UTF-8. However, depending on the source, this conversion might not be perfect. As a result, estimates associated with such names and affiliations might be too low.
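The sketch below shows one way such a conversion could go wrong. It is not the conversion actually used here: the function `to_utf8` and its list of candidate encodings are assumptions made for illustration. It tries each candidate in order and keeps the first one that decodes without error, which can silently pick the wrong encoding.

```python
def to_utf8(raw: bytes, candidates=("utf-8", "cp1252", "latin-1")) -> bytes:
    """Re-encode raw bytes as UTF-8 using the first candidate that decodes.

    The candidate list is a hypothetical guess order, not a detection method.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue
    # Reached only if every candidate fails: keep what can be decoded
    # and substitute the replacement character for the rest.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


# A Latin-1 encoded name is recovered correctly.
german = to_utf8(b"M\xfcller").decode("utf-8")

# A name encoded in ISO-8859-2 decodes without error under cp1252,
# but the first letter comes out as a different character.
polish = to_utf8("Łukasz".encode("iso-8859-2")).decode("utf-8")

print(german, polish)
```

In the second case no error is raised, yet the converted name no longer matches the same author's name from a correctly encoded source, so records for that author may not be merged.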