How to Choose Appropriate Experts for Peer Review: An Intelligent Recommendation Method in a Big Data Context

Duanduan Liu; Wei Xu; Wei Du; Fuyin Wang

1 Introduction

Talent introduction is very important in the construction of a university faculty; the quality of the talent reflects the academic level of a university as well as, to some extent, its comprehensive strength. Therefore, universities must take effective measures to strictly control the quality of applicants in the process of their talent introduction. In general, universities use peer review to evaluate applicants. That is, they select a certain number of experts, usually three to five, who have similar research areas as the applicant to review their application documents. In this process, the choice of reviewers has a great impact on the assessment of applicants because appropriate reviewers will help universities to select excellent talent. On the other hand, unsuitable reviewers might result in the loss of talent. Therefore, choosing appropriate reviewers becomes a key step in finding the best talent for an institution.

Currently, the most widely used expert selection method is manual. In this process, university staff first collect applicants’ application documents and personal information and identify applicants’ features. Then they search a database of experts with high professional levels and try to find those with research areas similar to those of the applicants. Next, they send invitations to the selected experts by phone or e-mail. Finally, they determine whether the applicant is employable according to the expert reviewers’ opinions. At present, most colleges and universities use this method to select reviewers. The method works, but it has obvious defects. First, it is time-consuming. The staff must search expert databases, find suitable experts with similar research areas, and choose three to five names from the resulting list. This consumes a lot of energy and time, and the defect becomes more obvious when the number of experts in the expert database is very large. Furthermore, considerable manpower and resources are needed to maintain and improve the expert database. Second, manual selection is one-sided. For example, staff usually search for relevant experts by the discipline codes provided by the applicant and the experts, although discipline codes alone do not reflect research areas comprehensively. Much other content can reflect research areas, such as publications, research projects, and so on. In addition, staff use titles, awards, and status to weigh the quality of the experts. These indicators represent only a part of their quality, and the method still needs a more comprehensive evaluation standard for the experts’ productivity. Third, in the process of manual expert selection, it is difficult to avoid subjectivity both in the retrieval process and in the weighing process.
The manual method also ignores the possible relationships between experts and applicants. For example, if the expert and the applicant have any cooperative relations or other relationships, the expert will not be suitable for reviewing the applicant.

In order to solve the above problems, we propose an intelligent recommendation method to recommend appropriate experts for peer review and developed a recommendation system to realize it. The proposed method has two stages in the recommendation process. In the first stage, we collected as much information as possible about experts and applicants. We obtained most of the information about the experts from an expert database provided by the university and also extracted useful information from research social network websites and the experts’ personal home pages. We obtained information about the applicants from their personal information (e.g., discipline codes, work experience, graduate school) and application documents (e.g., publications, research projects, patents). Then we created an expert profile and an applicant profile representing all their characteristics by using an information filtering method. In the second stage, we designed a recommendation model to recommend appropriate reviewers for each applicant. The recommendation model contains three modules: the relevance module, the connectivity module, and the quality module. An aggregation model was constructed to integrate these modules. Through these two stages, the most suitable list of experts was recommended for each applicant. The proposed recommendation system has been implemented, and a real survey has been taken to verify the effectiveness of the proposed intelligent recommendation method.

The remainder of this article is organized as follows. Section 2 reviews the related literature on expert recommendation. Section 3 introduces the proposed recommendation method. In Section 4, we implement the recommendation system and verify the results. In Section 5, we conclude the article and indicate future work.

2 Literature Review

The rapid development of the internet has led to the accumulation of massive amounts of data, and thus we find ourselves entering the age of big data. Meanwhile, we are faced with vast and diverse data from which to find useful information and about which to make good decisions. Obviously, this is a difficult task because of information overload () and information asymmetry (). Therefore, we need an intelligent method to help us deal with these large amounts of data and obtain worthy information. The recommendation service that arises at this historic moment solves this thorny problem. At present, this recommendation service has been applied to various situations, such as finding experts, job recommendations, restaurant searches, route recommendations, and so on. As information technology has become increasingly complete, the recommendation system has become better and better and can be used for more and more functions.

Expert recommendation has been researched widely and deeply by many researchers. Expert recommendation is the task that identifies appropriate experts with profound knowledge and rich experience in a specified expertise area to help users cope with problems. Expert recommendation has many purposes, such as finding experts for consulting (; ), for reviewing research projects (), for collaboration (), and so on.

Many research approaches have been proposed for expert recommendation service. Balog et al. () suggested a language model to seek suitable experts to help users solve problems. Cao, Liu, Bao, & Li () built a two-stage language model for expert search. Daud, Li, Zhou, & Muhammad () proposed a time topic model to find experts related to a specific expertise area. Deng, King, & Lyu () integrated a statistical language model, topic-based model, and hybrid model to achieve better performance in expert ranking. Li, Liu, & Li () used a fuzzy linguistic method and fuzzy text classification to assist users in solving tacit knowledge problems. Fang et al. () presented a general probabilistic model to solve the expert finding problem. Macdonald & Ounis () saw the problem of finding experts as a voting problem and proposed a voting model for expert finding. Wang, Jiao, Abrahams, Fan, & Zhang () proposed a novel algorithm, ExpertRank, which considers document-based relevance and one’s authority to find experts. Yukawa, Kasahara, Kato, & Kita () found appropriate experts with a content-based method. Balog & Rijke () introduced expert profiling and applied it to expert finding. With the emergence of social networks, more and more researchers have begun to add social network analysis to the recommendation studies. Silva et al. () took social network analysis into consideration for project selection. Xu, Sun, Ma, & Du () built a connectivity analysis module based on social networking for recommending R&D project opportunities. Liu, Chen, Kao, & Wang () used link analysis for expert finding in question-answering websites. Fazel-Zarandi, Devlin, Huang, & Contractor () presented an expert recommendation system using social network analysis. From the above studies, we can see that there are two major types of research methods in expert recommendation service: content-based methods and network-based methods.
Content-based methods use text mining technology to find similarities between two targets. There are two types of content-based methods. One is a profile-based method. For example, some researchers have created an expert profile that represents the characteristics of the expert (e.g., area of expertise) to model the expert’s expertise and calculate the similarity between the expert’s and the applicant’s expertise areas. Then they generate an expert ranking according to relevant scores using certain matching algorithms (; ; ; ). The other type is a document-based method that ranks documents first in the corpus given an expertise area (). A network-based method makes use of relationships among experts and users to build networks, such as a citation network, collaboration network, and other relationship networks. In real life, individuals usually interact with each other and form various social networks, and analysing these social networks can enhance the recommendation. Furthermore, some researchers use a hybrid method combining a content-based method and a network-based method to achieve better performance (; ).

Although these methods have various advantages, they still have problems, such as scalability and efficiency in the context of big data where massive amounts of information are involved. Therefore, we employed big data analysis tools to deal with these problems. Furthermore, from the existing literature, we found that most expert recommendation is limited to finding experts in a given expertise area, and there are few studies about recommending experts for individuals. In other words, existing research lacks personalized consideration. In fact, there are many differences among people apart from areas of expertise, such as educational background, social relationships, and so on. There are many discussions in the field of electronic commerce about personalized recommendations, which deal with recommending goods for individuals according to personal preference. However, in the field of expert recommendation, there is still a lot of room for improvement. Certainly, some researchers have noticed this. For example, Fazel-Zarandi et al. () took users’ motivations into consideration when they built profiles for finding experts. Liu et al. () considered user authority and reputation in recommending experts. Nevertheless, personalized analysis is still insufficient in the field of expert recommendation.

To solve the problems discussed above, we proposed an intelligent recommendation method to recommend appropriate experts for peer review in the context of big data. The proposed method aims at recommending suitable experts for individuals, with the characteristics of each individual having a great impact on the recommendation results. In our research, we designed an expert recommendation model that considers personalities and includes relevance analysis, connectivity analysis, and quality analysis for expert recommendation.

3 The Proposed Method

In this paper, we propose an intelligent expert recommendation method and construct a recommendation research framework with two main stages. The first stage includes data collection and profiling, and the second stage contains an expert recommendation framework employing relevance analysis, connectivity analysis, and quality analysis to build a more comprehensive model for expert recommendation. The process and structure of the proposed research framework are shown in Figure 1.

Figure 1

Architecture of our expert recommendation research framework.

In Figure 1 we can see the process of expert recommendation for each applicant. The proposed research framework has two stages to recommend reviewers for an applicant. In the first stage, we collect the applicant’s information and the experts’ information by various channels. The applicants fill in their personal information (e.g., expertise area, educational background, work experience) and submit their application documents (e.g., publications, research projects, patents) while the expert information is collected from an expert database provided by universities and from research social network websites. Then, we build expert profiles and an applicant profile using the collected information. In the second stage, we construct a comprehensive expert recommendation model from three aspects: quality, relevance, and connectivity. Finally, we build an aggregation model with various constraints integrating these three indicators. We use the proposed aggregation model to rank all experts and recommend top key experts for each applicant.

3.1 Data collection and profiling

In this research, we collected two types of data: expert data and applicant data. For expert data, we used information from an expert database provided by the university. The expert database alone is not enough for our research, because such a database needs to be updated continuously and most universities do not carry out this maintenance work. A recommendation service needs a large amount of data to produce accurate results. Thus, we also extracted useful information from research social network websites and personal homepages. With the popularity of social networks, we are able to obtain a great deal of information from them that makes up for the inadequacy of the expert database. Through these channels, we tried to obtain useful information about the experts’ areas of expertise and academic achievements, such as publications, research projects, awards, titles, and so on. The applicants filled in their personal information (e.g., research area, educational background, work experience) and submitted their application documents (e.g., publications, research projects) through the proposed recommendation system.

After gathering the above information, we built an expert profile and an applicant profile that used relevant information and key attributes to extract areas of expertise. Obviously, the quality of profiling has a great impact on the effectiveness of the expert recommendation. In the expert recommendation context, we focused on how to gather necessary information to build more comprehensive profiles. Vivacqua, Oliveira, and Souza () stated that profiles can be built from declaration and inference methods that reflect subjective and objective information. In this article, we constructed expert and applicant profiles from both subjective and objective perspectives. In the applicant profile, subjective information refers to those structure keywords that are filled in by the applicant. The objective information can be extracted from submitted application documents, such as publications and research projects that represent the applicant’s expertise. In the expert profile, subjective information refers to those structure keywords that are self-identified by the expert. The objective information also can be obtained from academic achievements, such as publication and research projects. Furthermore, the objective information can also reflect relationships between experts and applicants through co-authored papers or collaborative research projects. Also, the level of publications and research projects represent the expert’s expertise level. Therefore, in this research, we collected all the above information and constructed relevant matrices for subsequent model analysis. For example, the keyword-document matrix can be used for a relevance analysis model, the expert-applicant matrix can be used for a connectivity analysis model, and the publication-journal matrix and project-type matrix can be used for a quality analysis model.

3.2 Expert recommendation framework

The core of this research is the proposed expert recommendation model. After collecting relevant information and profiling, we used the proposed model to deal with the above data and generate a ranking score of experts. Then, experts with high scores were recommended for the applicant. The proposed expert recommendation model has three modules: a relevance analysis model, connectivity analysis model, and quality analysis model. The components of our expert recommendation model are shown in Figure 2.

Figure 2

Components of our expert recommendation model.

As shown in Figure 2, our expert recommendation model processes expert profiles and applicant profiles in three ways. First, a relevance analysis model measures the similarity between the experts’ expertise and the applicant’s expertise and selects relevant experts as candidate reviewers. Second, a connectivity analysis model excludes those experts who have relationships with the applicant to ensure the fairness of the peer review. In general, the experts remaining after these two steps can review the applicant, but the number of candidate reviewers is still very large. Third, a quality analysis model therefore evaluates the experts’ level of expertise so that the remaining candidates can be ranked. Finally, an aggregation model integrates these three aspects to find the most suitable experts to review the applicant.

3.2.1 Relevance analysis model

The relevance analysis model is used for calculating the similarity between expert and applicant. When searching for an expert to be a reviewer to review an applicant, we first need to consider the similarity between the two targets. In this research, we focus on the similarity of areas of expertise. An expert’s relevance can be divided into two parts: subjective and objective.

Subjective relevance can be measured by a list of structure keywords that are self-identified by experts and applicants. We obtained these standard terms from the expert database and the applicant’s submission, and constructed two sets to represent the subjective information of expert and applicant.

\[ S_i = \{ \mathit{key}_1, \mathit{key}_2, \ldots, \mathit{key}_5 \} \tag{1} \]

\[ S_j = \{ \mathit{key}_1, \mathit{key}_2, \ldots, \mathit{key}_5 \} \tag{2} \]

where S_i denotes the self-identified expertise area set of expert i and S_j denotes the self-identified research area of applicant j. We selected Jaccard similarity () to measure the similarity between expert i and applicant j:

\[ \mathit{Sim}_{ij}(\mathit{self}) = \frac{\# |S_i \cap S_j|}{\# |S_i \cup S_j|} \tag{3} \]

where #|•| denotes the number of keywords in each set.
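As a minimal sketch of the Jaccard measure in Eq. (3), the following Python function compares two self-identified keyword sets. The function name, the example keywords, and the treatment of two empty sets (returning 0) are assumptions made for illustration; they are not taken from the implemented system.

```python
def jaccard_similarity(expert_keywords, applicant_keywords):
    """Jaccard similarity of two keyword collections, following Eq. (3)."""
    s_i = set(expert_keywords)
    s_j = set(applicant_keywords)
    union = s_i | s_j
    if not union:
        # Assumption: two empty keyword sets are treated as having no similarity.
        return 0.0
    return len(s_i & s_j) / len(union)


# Hypothetical keyword sets, invented for illustration only.
expert = {"data mining", "recommender systems", "text mining",
          "social networks", "big data"}
applicant = {"recommender systems", "big data", "machine learning",
             "text mining", "information retrieval"}

sim_self = jaccard_similarity(expert, applicant)
print(round(sim_self, 4))  # 3 shared keywords out of 7 distinct ones
```

Because the measure counts only exact matches, it presupposes that both sides choose their keywords from the same standard vocabulary.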

Objective relevance can be measured by the academic achievements of expert and applicant. In this research, we focus on the relevance of publications and research projects, which can, to some extent, reveal research areas. The main content of a document, whether a publication or a research project, can be represented by a list of keywords and corresponding weights. Traditionally, many calculation methods have used the frequency of the keyword to represent its weight. For our purposes, however, frequency alone is not sufficient. Unlike a paper or a research project, which generally involves only one research area, a person may have different research areas at different times, so papers published at different times may have different themes, and recently published papers best reflect the person’s current research area. Therefore, we took the time factor into consideration when we calculated the weight of keywords: the smaller the time interval between publication and the present, the greater the weight. We used a list of keywords and their frequencies to represent a publication as follows:

\[ \mathit{Pub}_k = \{ (\mathit{key}_1, f_{1,k}), (\mathit{key}_2, f_{2,k}), \ldots, (\mathit{key}_m, f_{m,k}), \ldots \} \tag{4} \]

where Pub_k denotes publication k and f_m,k denotes the frequency of key_m in publication k. Then all publications are represented as follows:

\[ \mathit{Pub} = \{ (\mathit{key}_1, w_1), (\mathit{key}_2, w_2), \ldots, (\mathit{key}_m, w_m), \ldots \} \tag{5} \]

\[ w_m = \sum_{k=1}^{K} \frac{1}{t_k} \times f_{m,k} \tag{6} \]

where w_m denotes the weight of key_m in all publications and t_k denotes the time interval between the published time of publication k and current time.
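The time-weighted keyword weight of Eq. (6) can be sketched in Python as follows. The text does not state the unit of t_k or how a publication from the current year is handled, so this sketch assumes whole years with a floor of 1 to avoid division by zero. The record layout (`year`, `keywords`) and the sample data are hypothetical.

```python
from collections import defaultdict


def keyword_weights(publications, current_year):
    """Aggregate keyword frequencies over publications, discounted by age (Eq. 6)."""
    weights = defaultdict(float)
    for pub in publications:
        # Assumption: t_k is measured in whole years and floored at 1.
        t_k = max(current_year - pub["year"], 1)
        for key, freq in pub["keywords"].items():
            weights[key] += freq / t_k
    return dict(weights)


# Hypothetical publication records, invented for illustration only.
pubs = [
    {"year": 2014, "keywords": {"expert finding": 4, "text mining": 2}},
    {"year": 2010, "keywords": {"text mining": 5}},
]

w = keyword_weights(pubs, current_year=2015)
print(w)
```

In this example the older paper contributes only one fifth of its raw frequency, so "expert finding" outweighs "text mining" even though the latter appears more often in total.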

We used the same approach to deal with research projects. We constructed a publication vector and a research project vector for each expert and applicant as follows:

\[ \mathit{Vec}_i(\mathit{pub}) = \left\langle (\mathit{key}_1, w_1), (\mathit{key}_2, w_2), \ldots, (\mathit{key}_m, w_m), \ldots \right\rangle \tag{7} \]

\[ \mathit{Vec}_i(\mathit{pro}) = \left\langle (\mathit{key}_1, w_1), (\mathit{key}_2, w_2), \ldots, (\mathit{key}_m, w_m), \ldots \right\rangle \tag{8} \]

\[ \mathit{Vec}_j(\mathit{pub}) = \left\langle (\mathit{key}_1, w_1), (\mathit{key}_2, w_2), \ldots, (\mathit{key}_m, w_m), \ldots \right\rangle \tag{9} \]

\[ \mathit{Vec}_j(\mathit{pro}) = \left\langle (\mathit{key}_1, w_1), (\mathit{key}_2, w_2), \ldots, (\mathit{key}_m, w_m), \ldots \right\rangle \tag{10} \]

where Vec_i(pub) and Vec_i(pro) denote the publication and research project vectors of expert i, and Vec_j(pub) and Vec_j(pro) denote the publication and research project vectors of applicant j. w_m denotes the weight of key_m in the publications or research projects. Then, we used the cosine similarity to measure the similarity between expert i and applicant j.

\[ \mathit{Sim}_{ij}(\mathit{pub}) = \frac{\mathit{Vec}_i(\mathit{pub}) \cdot \mathit{Vec}_j(\mathit{pub})}{\| \mathit{Vec}_i(\mathit{pub}) \| \, \| \mathit{Vec}_j(\mathit{pub}) \|} \tag{11} \]

\[ \mathit{Sim}_{ij}(\mathit{pro}) = \frac{\mathit{Vec}_i(\mathit{pro}) \cdot \mathit{Vec}_j(\mathit{pro})}{\| \mathit{Vec}_i(\mathit{pro}) \| \, \| \mathit{Vec}_j(\mathit{pro}) \|} \tag{12} \]

where Sim_ij(pub) denotes the publication similarity between expert i and applicant j and Sim_ij(pro) denotes the research project similarity between expert i and applicant j.
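The cosine similarity of Eqs. (11) and (12) can be sketched over sparse keyword vectors stored as Python dictionaries, where a keyword missing from one side counts as weight zero. The function name, the example vectors, and the choice to return 0 for an all-zero vector are assumptions for illustration.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse keyword-weight vectors."""
    # Only keywords present in both vectors contribute to the dot product.
    dot = sum(weight * vec_b.get(key, 0.0) for key, weight in vec_a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in vec_a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a person with no keywords has zero similarity to anyone.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical keyword-weight vectors, invented for illustration only.
expert_vec = {"expert finding": 3.0, "text mining": 4.0}
applicant_vec = {"text mining": 4.0, "big data": 3.0}

sim_pub = cosine_similarity(expert_vec, applicant_vec)
print(round(sim_pub, 2))
```

The same function applies unchanged to the research project vectors, since both kinds of vector share the keyword-weight representation.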

Finally, we integrated the subjective relevance and objective relevance and generated an integrated relevance R_ij that reflects the similarity between expert i and applicant j.

\[ R_{ij} = \alpha \, \mathit{Sim}_{ij}(\mathit{self}) + \beta \, \mathit{Sim}_{ij}(\mathit{pub}) + \gamma \, \mathit{Sim}_{ij}(\mathit{pro}) \tag{13} \]

where α + β + γ = 1.
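A minimal Python sketch of the weighted combination in Eq. (13) follows. The text gives no numeric values for α, β, and γ, so the defaults below are placeholders chosen only for illustration, as are the three input similarity scores.

```python
def integrated_relevance(sim_self, sim_pub, sim_pro,
                         alpha=0.2, beta=0.4, gamma=0.4):
    """Weighted sum of the three similarity scores, following Eq. (13)."""
    # The weights must sum to one, as required by the model.
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma must equal 1")
    return alpha * sim_self + beta * sim_pub + gamma * sim_pro


# Hypothetical similarity scores, invented for illustration only.
r_ij = integrated_relevance(sim_self=0.5, sim_pub=0.64, sim_pro=0.30)
print(round(r_ij, 3))
```

Since each component lies in [0, 1] and the weights sum to one, the integrated relevance also lies in [0, 1].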

3.2.2 Quality analysis model

The quality analysis model was used for evaluating the experts’ level of expertise. Obviously, experts with higher professional levels are more suitable to review applicants. In this research, we used publications and research projects to evaluate the quality of the experts. We selected three aspects, quantity, quality, and time interval, to calculate the performance of experts on the publication level. The quantity of publications reflects the experts’ contributions in a certain field. Quality contains two factors: citations and the level of the publication’s journal. In general, impact factor is a commonly used evaluation index for journals. However, in some special fields, impact factors are low across the board even for high-quality journals. Thus we use the ratio of the journal’s impact factor to the greatest impact factor in its field (). The time interval is the time difference between the publication time and the current time, and it reflects how recently the expert has been active. Quantity and quality are benefit attributes, and the time interval is a cost attribute. We calculated the performance of experts in publication levels as follows:

\[ Q_j(\mathit{pub}) = \sum_{k=1}^{K} (C_k + 1) \times \frac{\mathit{if}_k}{\mathit{if}_{\max}} \times \frac{1}{t_k} \tag{14} \]

where Q_j(pub) denotes the publication quality of expert j, C_k denotes the number of citations of publication k, if_k represents the impact factor of the journal in which publication k appeared, and if_max denotes the greatest impact factor among the journals in that field. t_k denotes the time interval between the publication time and the current time, and K denotes the number of publications of expert j.
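The publication quality score of Eq. (14) can be sketched in Python as below. As an assumption, t_k is taken in whole years with a floor of 1, because the text does not specify its unit or the handling of papers published in the current year. The field names and sample records are hypothetical.

```python
def publication_quality(publications, current_year):
    """Sum citation-, journal-, and recency-weighted scores over publications (Eq. 14)."""
    total = 0.0
    for pub in publications:
        # Assumption: t_k is measured in whole years and floored at 1.
        t_k = max(current_year - pub["year"], 1)
        # Journal level relative to the best journal in the same field.
        journal_ratio = pub["impact_factor"] / pub["field_max_impact_factor"]
        # The +1 lets an uncited paper still contribute to the score.
        total += (pub["citations"] + 1) * journal_ratio / t_k
    return total


# Hypothetical publication records, invented for illustration only.
pubs = [
    {"year": 2013, "citations": 9,
     "impact_factor": 2.0, "field_max_impact_factor": 4.0},
    {"year": 2014, "citations": 0,
     "impact_factor": 3.0, "field_max_impact_factor": 4.0},
]

q_pub = publication_quality(pubs, current_year=2015)
print(q_pub)
```

Quantity enters the score implicitly: every additional publication adds a positive term to the sum.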

For research projects, we used the project level to reflect quality. In general, a research project can be classified into four levels: national (N), ministry (M), provincial (P), and local city (C). The national level ranks highest. We defined the research project quality as follows:

\[ Q_j(\mathit{pro}) = \sum_{k=1}^{4} w_k^{p} \cdot q_{kj}^{p} \tag{15} \]

where Q_j(pro) denotes the research project quality of expert j, w_k^p represents the weight of project level k, and q^p_kj represents the number of research projects at level k in which expert j participated in the past five years.

Finally, we obtained an integrated quality Q_j of expert j as follows:

\[ Q_j = \mu \, Q_j(\mathit{pub}) + (1 - \mu) \, Q_j(\mathit{pro}) \tag{16} \]
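Equations (15) and (16) can be sketched together in Python. The text does not give numeric level weights or a value for μ, so the weights 4, 3, 2, 1 and μ = 0.5 below are placeholders. Whether the two components are normalized before being combined is also not stated; this sketch combines them as they are.

```python
# Assumed level weights for national, ministry, provincial, and city projects.
LEVEL_WEIGHTS = {"N": 4.0, "M": 3.0, "P": 2.0, "C": 1.0}


def project_quality(project_counts, level_weights=LEVEL_WEIGHTS):
    """Weighted count of an expert's projects over the four levels (Eq. 15)."""
    return sum(weight * project_counts.get(level, 0)
               for level, weight in level_weights.items())


def integrated_quality(q_pub, q_pro, mu=0.5):
    """Convex combination of publication and project quality (Eq. 16)."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return mu * q_pub + (1.0 - mu) * q_pro


# Hypothetical counts of projects in the past five years, for illustration only.
counts = {"N": 1, "P": 2}
q_pro = project_quality(counts)

# 3.25 is an illustrative publication quality score, not a reported result.
q_j = integrated_quality(q_pub=3.25, q_pro=q_pro, mu=0.5)
print(q_pro, q_j)
```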

3.2.3 Connectivity analysis model

Researchers form relationships by collaborating with each other, especially in similar areas of expertise where more collaboration opportunities exist than other situations. Obviously, if an expert has a collaborative relationship with an applicant, he or she is not suitable to review the applicant because of a conflict of interest. Therefore, when we are selecting appropriate experts as reviewers, we should exclude experts with conflicts of interest to ensure fairness of the review. In this research, a connectivity analysis model was used for excluding conflicts of interest, and we focused on co-authoring relationships in publications and collaboration in research projects.

To solve the conflict of interest problem, we constructed two matrices: a publication-level matrix and a project-level matrix. We assume that there are m experts and n applicants and thus will get two m × n matrices:

(17)

M p u b = [e 11 e 12 … e 1 n e 21 e 22 … e 2 n … … e m 1 e m 2 … e m n]

M17 \documentclass[10pt]{article} \usepackage{wasysym} \usepackage[substack]{amsmath} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage[mathscr]{eucal} \usepackage{mathrsfs} \usepackage{pmc} \usepackage[Euler]{upgreek} \pagestyle{empty} \oddsidemargin -1.0in \begin{document} \[ {M_{pub}} = \left[ \begin{array}{l} {e_{11}}{e_{12}}{\rm{ }} \ldots {\rm{ }}{e_{1n}}\\ {e_{21}}{e_{22}}{\rm{ }} \ldots {\rm{ }}{e_{2n}}\\ {\rm{ }} \ldots {\rm{ }} \ldots \\ {e_{m1}}{e_{m2}}{\rm{ }} \ldots {\rm{ }}{e_{mn}} \end{array} \right] \] \end{document}

(18)

\[ M_{pro} = \begin{bmatrix} e_{11}^{p} & e_{12}^{p} & \ldots & e_{1n}^{p} \\ e_{21}^{p} & e_{22}^{p} & \ldots & e_{2n}^{p} \\ \vdots & \vdots & \ddots & \vdots \\ e_{m1}^{p} & e_{m2}^{p} & \ldots & e_{mn}^{p} \end{bmatrix} \]

where e_ij in M_pub denotes the number of collaborations between expert i and applicant j on the publication-level while e^p_ij in M_pro denotes the number of collaborations between expert i and applicant j on the research project-level.

Then we defined connectivity C_ij as follows:

(19)

\[ C_{ij} = \begin{cases} 1, & e_{ij} > 0 \text{ or } e_{ij}^{p} > 0 \\ 0, & e_{ij} = 0 \text{ and } e_{ij}^{p} = 0 \end{cases} \]

where C_ij denotes the connectivity between expert i and applicant j. It follows from the above formula that if an expert has collaborated with an applicant, whether on a publication or on a research project, the expert loses the opportunity to review that applicant.
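The connectivity rule can be sketched in Python as follows. The function name `connectivity` and the collaboration counts are hypothetical; the matrices are plain nested lists with experts as rows and applicants as columns, matching M_pub and M_pro.

```python
from typing import List


def connectivity(m_pub: List[List[int]], m_pro: List[List[int]]) -> List[List[int]]:
    """Return C, where C[i][j] = 1 if expert i has collaborated with applicant j."""
    if len(m_pub) != len(m_pro) or any(len(a) != len(b) for a, b in zip(m_pub, m_pro)):
        raise ValueError("M_pub and M_pro must have the same shape")
    return [
        # A collaboration at either level (publication or project) creates a conflict.
        [1 if (e_pub > 0 or e_pro > 0) else 0 for e_pub, e_pro in zip(row_pub, row_pro)]
        for row_pub, row_pro in zip(m_pub, m_pro)
    ]


# Hypothetical collaboration counts for 3 experts (rows) and 2 applicants (columns).
m_pub = [[2, 0], [0, 0], [0, 1]]
m_pro = [[0, 0], [0, 3], [0, 0]]
print(connectivity(m_pub, m_pro))  # [[1, 0], [0, 1], [0, 1]]
```

In this example only the second expert may review the first applicant's competitor-free pairing for applicant 1 alongside the third expert, and only the first expert remains eligible for the second applicant.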

3.2.4 Aggregation model

Through the above three models we obtained three scores for each expert: a relevance score (R_ij), a quality score (Q_j), and a connectivity score (C_ij). We proposed an aggregation model that integrates these aspects into a single comprehensive score. In this research, we need to recommend experts with high relevance and quality scores. The best situation is that the expert receives high scores in both relevance and quality. An expert with a high quality score but a low relevance score is not suitable to review the applicant. We must also exclude any expert whose connectivity score is equal to one, which means there is a link between the expert and the applicant. Therefore, the proposed aggregation model can be denoted as follows:

(20)

\[ Score_{ij} = (1 - C_{ij}) \times R_{ij} \times Q_j \]

where Score_ij denotes the comprehensive score of expert j for applicant i. An expert is eligible to review an applicant only if C_ij = 0. If the expert’s connectivity score is equal to one, Score_ij is equal to zero and the expert loses the opportunity to review the applicant. The proposed model ranks all experts according to their scores and outputs a list of recommended experts for an applicant.
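A minimal sketch of the aggregation and ranking step for a single applicant is given below. The function name `rank_experts`, the dictionary layout, the expert labels, and all scores are illustrative assumptions. Dropping conflicted experts from the output, rather than listing them with a score of zero, is likewise a design choice of this sketch.

```python
from typing import Dict, List, Tuple


def rank_experts(
    relevance: Dict[str, float],
    quality: Dict[str, float],
    connectivity: Dict[str, int],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Rank experts for one applicant by (1 - C) * R * Q, highest score first."""
    scores = {}
    for expert, r in relevance.items():
        c = connectivity.get(expert, 0)
        if c == 1:
            # Conflict of interest: the score would be zero, so the expert is excluded.
            continue
        scores[expert] = (1 - c) * r * quality[expert]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical scores for three experts and one applicant.
relevance = {"A": 0.9, "B": 0.7, "C": 0.8}
quality = {"A": 0.5, "B": 0.9, "C": 0.6}
conflicts = {"A": 0, "B": 0, "C": 1}
print([name for name, _ in rank_experts(relevance, quality, conflicts)])  # ['B', 'A']
```

Because the score is a product, an expert needs both high relevance and high quality to rank well: expert A has the highest relevance here but is outranked by expert B.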

4 Empirical Analyses

4.1 Data and method

The proposed intelligent expert recommendation method was implemented to aid universities or other research institutions in selecting excellent talent. In this section, we discuss how we verified the validity and accuracy of the proposed method and compared its performance with other baseline methods.

First, we collected expert and applicant information from various sources, such as the expert database provided by the university, research social network websites, and personal homepages. We then extracted information about areas of expertise, publications, and research projects for use in the next phase of the experiment. Second, we compared the proposed method with a baseline method to test whether the proposed method performs better. The proposed method takes time factors into consideration and redefines the weight of each keyword when calculating the similarity between expert and applicant, whereas the traditional method gives all keywords the same weight. Unlike a paper or a project, an applicant may work in different research areas during different time periods, so keywords from different periods differ in their influence and importance for the applicant. We assume that the shorter the interval between a paper’s publication and the present, the better the paper represents the applicant’s research area. The details of the comparative experiment are as follows:

The baseline method: This recommendation method, widely used in research expert recommendation, combines relevance, connectivity, and quality. Its relevance analysis model gives all publications the same weight when calculating the similarity between applicant and experts.
The proposed method: This method builds on the baseline method and also uses relevance, connectivity, and quality to recommend appropriate experts for each applicant. The difference is the time factor: unlike a paper or a research project, which generally focuses on one research field, an applicant may have different research areas at different times. We therefore added a time factor to the relevance analysis model. Our hypothesis is that the smaller the time interval between a publication and the present, the better the publication reflects the applicant’s area of expertise. Based on this hypothesis, we redefined the weights of keywords extracted from publications and research projects so that the shorter the time interval, the greater the weight.
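The contrast between the two methods can be illustrated with a short Python sketch. The exact weighting function is not restated in this section, so the exponential decay used here, the `decay` parameter, the function name `keyword_weights`, and the sample publications are all assumptions made purely for illustration. Setting `decay=0` gives every publication the same weight, which corresponds to the baseline behaviour.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def keyword_weights(
    publications: List[Tuple[int, List[str]]],
    current_year: int,
    decay: float = 0.2,
) -> Dict[str, float]:
    """Return normalized keyword weights from (year, keywords) records.

    Assumption: a publication's weight is exp(-decay * age_in_years), so
    recent publications contribute more. decay = 0 weights all equally.
    """
    weights: Dict[str, float] = defaultdict(float)
    for year, keywords in publications:
        age = max(current_year - year, 0)
        pub_weight = math.exp(-decay * age)
        for keyword in keywords:
            weights[keyword] += pub_weight
    total = sum(weights.values())
    return {kw: w / total for kw, w in weights.items()} if total else {}


# Hypothetical applicant who moved from databases to recommender systems.
pubs = [(2008, ["databases"]), (2014, ["recommender systems"])]
print(keyword_weights(pubs, current_year=2015, decay=0.0))  # equal weights (baseline)
print(keyword_weights(pubs, current_year=2015, decay=0.2))  # recent keyword dominates
```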

In this paper, we built an aggregation model integrating quality, relevance, and connectivity, and we proposed several criteria for each aspect. These criteria carry different weights within each aspect. To solve this multi-criteria weighting problem, we adopted the analytic hierarchy process (AHP) to set the weights of the indicators.
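For readers unfamiliar with AHP, the sketch below derives criterion weights from a pairwise comparison matrix. It uses the row geometric mean method, a common approximation to the principal eigenvector; the text does not say which AHP computation was used, so this choice, the function name `ahp_weights`, and the comparison values are assumptions.

```python
import math
from typing import List


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Approximate AHP priority weights via normalized row geometric means."""
    n = len(pairwise)
    if n == 0 or any(len(row) != n for row in pairwise):
        raise ValueError("pairwise comparison matrix must be square and non-empty")
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical comparisons of three criteria: entry [i][j] states how much more
# important criterion i is than criterion j, and [j][i] is its reciprocal.
pairwise = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
print(ahp_weights(pairwise))
```

This example matrix is perfectly consistent, so the weights come out in the ratio 6 : 3 : 1. With real expert judgments the matrix is usually only approximately consistent, and a consistency check is normally applied before the weights are used.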

4.2 Evaluation metrics

In our research, we recommended a list of appropriate experts as reviewers for each applicant and asked the researchers in our survey to rate the recommendation results based on relevance. We used the Average Rating score (AR) and Normalized Discounted Cumulative Gain (NDCG) to measure the performance of the proposed recommendation method. AR and NDCG were computed for the top one and the top five recommended experts. The AR was computed using the ratings from all the users and reflects the average rating of all the recommendations. The NDCG is commonly used to evaluate search engine performance and is suited to graded judgments. In our work, the AR and NDCG were defined as follows:

(21)

\[ AR = \frac{1}{|U|} \sum_{i=1}^{|U|} \frac{1}{N} \sum_{j=1}^{N} r_{ij} \]

(22)

\[ NDCG = \frac{1}{|U|} \sum_{i=1}^{|U|} Z \sum_{j=1}^{N} \frac{2^{r_{ij}} - 1}{\log(1 + j)} \]

where |U| denotes the number of researchers in our survey, N is the number of recommended experts (N = 1 or 5), r_ij denotes the rating of researcher i on expert j, and Z is a normalization constant chosen so that a perfect ranking’s NDCG value is 1.
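Both metrics can be computed with a few lines of Python, sketched here under stated assumptions. The text does not give the rating scale or the exact form of Z, so this sketch assumes a maximum rating of 5 and takes Z as the reciprocal of the discounted gain of a list in which every position receives that maximum rating. The function names and sample ratings are hypothetical. The base of the logarithm cancels in the normalization, so the natural logarithm is used.

```python
import math
from typing import List


def average_rating(ratings: List[List[float]]) -> float:
    """AR: mean over users of each user's mean rating of the N recommended experts."""
    if not ratings or any(not user for user in ratings):
        raise ValueError("ratings must contain at least one rating per user")
    return sum(sum(user) / len(user) for user in ratings) / len(ratings)


def dcg(user_ratings: List[float]) -> float:
    """Discounted cumulative gain with gain 2^r - 1 and discount log(1 + j)."""
    return sum((2 ** r - 1) / math.log(1 + j) for j, r in enumerate(user_ratings, start=1))


def ndcg(ratings: List[List[float]], max_rating: float = 5) -> float:
    """NDCG averaged over users; Z = 1 / DCG of an all-max_rating list (assumption)."""
    if not ratings or any(not user for user in ratings):
        raise ValueError("ratings must contain at least one rating per user")
    total = 0.0
    for user in ratings:
        ideal = dcg([max_rating] * len(user))
        total += dcg(user) / ideal
    return total / len(ratings)


# Hypothetical ratings from two researchers for their top five recommended experts.
ratings = [[5, 4, 4, 3, 5], [4, 4, 5, 5, 3]]
print(average_rating(ratings), ndcg(ratings))
```

To evaluate only the top one recommendation, pass each researcher's first rating alone, for example `[[5], [4]]`.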

4.3 Results and discussion

We compared the proposed method with the baseline method in terms of the AR and NDCG metrics. The results are shown in Table 1 and Table 2:

Table 1

Performance of two methods in terms of AR.

Top N	The proposed method	The baseline method
Top 1	4.23	4.05
Top 5	4.68	4.26

Table 2

Performance of two methods in terms of NDCG.

Top N	The proposed method	The baseline method
Top 1	0.78	0.62
Top 5	0.84	0.74

From the above tables, the values of both metrics are higher for the proposed method than for the baseline method, for both the top one and the top five recommendations. We therefore conclude that the proposed method performs better than the baseline method and recommends more appropriate reviewers.

5 Conclusions

In this article, we proposed an intelligent recommendation method in the context of big data to recommend appropriate experts for applicant review. The proposed recommendation method combined a relevance analysis model, a connectivity analysis model, and a quality analysis model and added a time factor to construct a comprehensive recommendation model. Our method was implemented in an online research community, and the results show that the proposed method is more effective than existing ones.

However, some research problems still need more study. First, although this paper takes some personal aspects into consideration, such as time factors, our analysis of personalization is not extensive enough. We should find more features of the applicant with which to model the applicant’s profile. Second, in this paper we only analysed a collaboration network. Other social networks, such as friend networks and citation networks, could also affect the recommendation results. We will focus on these points in the future. Third, we used some big data analysis methods in our research, such as collecting data from the internet (e.g., social network websites and personal homepages), handling various types of data, and building several calculation models. In the future, we plan to adopt the MapReduce framework and cloud computing to improve efficiency and effectiveness. We will also expand the scale of the data to improve the accuracy of our expert recommendation system. Finally, the proposed recommendation method can be used in other situations, such as job searching and research project selection.

Data Science Journal

Proceedings Papers