Introduction
Recently, the number of people using Question and Answer (Q&A) sites on the Internet has been increasing (Yahoo! Answers, 2013; Yahoo! Chiebukuro, 2013). Q&A sites are online communities where users post questions and answers. Hence, these sites can be regarded as databases containing enormous amounts of knowledge that can be used to solve various problems. When a user posts a question, other users may respond. The questioner selects the most appropriate response as the “Best Answer” and awards the respondent some points as a form of fee. The Best Answer is the response that the questioner subjectively finds most satisfying. Several research efforts have attempted to estimate the Best Answer (Blooma, Chua, & Goh, 2008; Agichtein, Castillo, Donato, Gionis, & Mishne, 2008; Wang, Tu, Feng, & Zhang, 2009; Kim, Oh, & Oh, 2007; Nishihara, Matsumura, & Yachida, 2008).
As the number of users of Q&A sites increases and more questions are posted, it becomes harder for respondents to select questions that match their specialty and interests. Consequently, a question posed by a user may not be seen or answered by qualified respondents. Moreover, if a question does not reach an appropriate respondent, a mismatch may occur, which may cause the following problems:
- A questioner may acquire incorrect knowledge from inappropriate answers;
- Respondents may not have the necessary knowledge to properly answer the question, and thus the problem remains unsolved;
- Users may be offended by answers that contain abusive words, slander, or statements against public order and standards of decency.
The authors’ objective is to present questions to qualified users who can appropriately answer them, thus avoiding the problems described above. Specifically, the authors conducted an impression evaluation experiment on 60 statements posted on Yahoo! Chiebukuro, a Q&A site in Japan (Yokoyama, Hochin, Nomiya, & Satoh, 2011). Applying factor analysis to the scores obtained in the experiment yielded nine factors (Yokoyama et al., 2011).
However, this approach yields factor scores only for the statements used in the experiment. To estimate the factor scores of other statements, multiple regression analysis is applied to the feature values of the statements. The authors adopt the syntactic information of the statements, such as word classes (for example, nouns and verbs), and the number of appearances (or the percentage) of alphanumeric characters and kanji (Yokoyama et al., 2011), the Chinese characters used in the Japanese writing system (“Text Seer Manual”, 2013). Moreover, word imageability, closing sentence expressions, word familiarity, and notation validity are also adopted as feature values (Sakuma, Ijuin, Fushimi, Tatsumi, Tanaka, Amano, & Kondoh, 2008; Amano & Kondoh, 2003). The overall estimation accuracy has been shown to be good. The authors have confirmed the validity of estimating the scores of each factor by obtaining the major feature values (Amano & Kondoh, 2003).
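The estimation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`char_ratios`, `fit`, `predict`) are hypothetical, only two of the feature values (the kanji ratio and the alphanumeric ratio) are computed, kanji are approximated by the Unicode CJK Unified Ideographs block, and the regression is an ordinary least-squares fit solved through the normal equations.

```python
def char_ratios(statement):
    """Return [kanji ratio, ASCII alphanumeric ratio] for one statement.

    Assumption: kanji are approximated by the range U+4E00..U+9FFF.
    """
    n = len(statement)
    if n == 0:
        return [0.0, 0.0]
    kanji = sum(1 for ch in statement if "\u4e00" <= ch <= "\u9fff")
    alnum = sum(1 for ch in statement if ch.isascii() and ch.isalnum())
    return [kanji / n, alnum / n]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(features, scores):
    """Fit score = b0 + b1*x1 + ... by least squares (normal equations)."""
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 is the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, feature):
    """Estimate a factor score for one feature vector."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], feature))
```

In this sketch, one model would be fitted per factor: the feature vectors of the statements used in the experiment serve as `features`, their factor scores as `scores`, and `predict` then estimates the score of a statement outside the experiment from its feature vector.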