Time Aware Ranking Algorithm for Scientific Publications
Moath Dawood Mahmoud Abudayeh
معاذ داود محمود ابودية
With the rapid growth of the Web, the emergence of digital information resources, and the decision by universities and libraries to make their contents accessible over the Internet, a large amount of data has become available for users to search and query. However, the sheer volume of this content makes it difficult to reach the required data quickly using traditional search methods, which depend on matching keywords or estimating relevance. As a result, the need for ranking algorithms emerged in information retrieval systems. Ranking and evaluation are closely related, because a ranking is produced from specific evaluation criteria and indicators. One of the most widely used algorithms for ranking scientific publications is PageRank, which evaluates publications using popularity metrics derived from link analysis. However, PageRank was designed mainly to rank Web pages rather than scientific publications. Because Web networks and citation networks differ in nature, applying it to publications produces unfair rankings that are biased in favor of old publications. The bias stems from the algorithm's heavy reliance on the number of citations as an indicator of popularity, since older publications have had more time to accumulate citations. This study addresses the bias in favor of old publications by introducing a new indicator, the Citation Change Rate, and integrating it with the PageRank algorithm. Time information, such as the publication date and the time at which each citation occurred, is used together with citation data to produce time-aware rankings. The proposed ranking method was tested on a dataset of scientific papers in the field of medical physics, indexed in the Dimensions database and published from 2005 to 2017. The results showed that the proposed ranking method takes into account the characteristics and dynamic nature of the publication network. This resulted in fairer rankings for publications of different ages and less bias against recent publications.
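To make the idea concrete, the following is a minimal Python sketch of how a time-based indicator could be combined with PageRank on a citation graph. The abstract does not define the Citation Change Rate or how it is integrated, so everything specific here is an assumption made for illustration: the function `citation_change_rate` (relative growth in citations between the two most recent years), the choice to feed it into the teleport (prior) distribution, and the toy papers and counts. It should not be read as the method proposed in this study.

```python
def pagerank(citations, damping=0.85, iterations=100, prior=None):
    """Power-iteration PageRank on a citation graph.

    citations: dict mapping each paper to the list of papers it cites.
    prior: optional dict of non-negative weights for the teleport step
           (uniform when omitted).
    """
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers = sorted(papers)
    if prior is None:
        prior = {p: 1.0 for p in papers}
    total = sum(prior[p] for p in papers)
    teleport = {p: prior[p] / total for p in papers}
    rank = dict(teleport)
    for _ in range(iterations):
        # Papers that cite nothing pass their score back through the teleport step.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        base = 1 - damping + damping * dangling
        new_rank = {p: base * teleport[p] for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            for q in cited:
                new_rank[q] += damping * rank[p] / len(cited)
        rank = new_rank
    return rank


def citation_change_rate(yearly_counts):
    """Hypothetical stand-in: relative growth in citations over the last two years."""
    if len(yearly_counts) < 2:
        return 0.0
    previous, latest = yearly_counts[-2], yearly_counts[-1]
    return (latest - previous) / max(previous, 1)


# Toy citation graph (made-up data): each paper maps to the papers it cites.
citations = {
    "A2005": [],
    "B2010": ["A2005"],
    "C2015": ["A2005", "B2010"],
    "D2016": ["A2005", "C2015"],
    "E2017": ["A2005", "C2015"],
}
# Made-up citations received in the two most recent years.
yearly_counts = {
    "A2005": [5, 5],
    "B2010": [2, 2],
    "C2015": [1, 4],
    "D2016": [0, 0],
    "E2017": [0, 0],
}

plain = pagerank(citations)
# Assumption: papers whose citations are growing get a larger teleport weight.
prior = {p: 1.0 + max(citation_change_rate(c), 0.0) for p, c in yearly_counts.items()}
time_aware = pagerank(citations, prior=prior)

for paper in sorted(citations):
    print(f"{paper}: plain={plain[paper]:.3f} time_aware={time_aware[paper]:.3f}")
```

In this toy graph the oldest paper still ranks first under both variants, while the recent paper with growing citations (`C2015`) receives a higher score in the time-aware variant than in plain PageRank.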
Based on the new ranking scores, 13 papers published in the last four years are now among the top 100 ranked papers in this dataset. In addition, the new rankings showed no radical changes or unreasonable jumps relative to the original ones: the Spearman correlation coefficient between the results of the proposed ranking method and those of the original PageRank algorithm was 90%. This is taken as an indication of the quality and accuracy of the results.
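For readers unfamiliar with the agreement measure, the sketch below shows how a Spearman correlation coefficient between two sets of ranking scores can be computed: each score list is converted to ranks (ties share the mean rank), and the Pearson correlation of the two rank vectors is taken. The function names and the two score lists are illustrative assumptions, not data or code from this study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Made-up scores for five papers under two ranking methods.
baseline_scores = [0.40, 0.25, 0.15, 0.12, 0.08]
time_aware_scores = [0.35, 0.20, 0.22, 0.13, 0.10]

rho = spearman(baseline_scores, time_aware_scores)
print(f"Spearman rho = {rho:.2f}")
```

In these toy lists only the second and third papers swap places, so the coefficient stays close to 1; a value near 1 means the two methods order the papers in largely the same way.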