1. Introduction
With the rapid development of Internet technology, the volume of drug data available online has grown quickly. These data are stored in many source-specific formats, and users must locate the resources they need within them. Search engines greatly reduce the difficulty of finding information, but they often do not return the most satisfactory results. For example, when a drug is unavailable and a user queries a general-purpose search tool with indication terms, the query returns so many results that the user can hardly select the most appropriate drug(s) by simple judgment. An inappropriate choice may not only fail to treat the condition but also lead to serious adverse consequences. Therefore, a method for finding an alternative drug rapidly and accurately has broad application prospects in the field of drug information retrieval.
To solve this problem, this paper presents a Drug Similarity Computation (DSC) algorithm based on weighted indications. Our contributions consist of three parts. First, we identify a website with well-structured data from which to collect drug data resources; the collected data are then organized to establish a user dictionary and a drug library based on Chinese word segmentation technology (Abudoulikemu, 2010; Wu, 2011; Zhang, 2014; Ni, 2014). An important question is how to determine the keywords of drug indications. Wang (Wang, 2011; Wang et al., 2011) puts forward a synergy of cognitive informatics, which helps us extract such indication terms. Second, weights are assigned to the drug indication terms to establish a weighted indication knowledge database, which prepares for the calculation of drug similarity. Finally, drug similarity is computed to obtain a subset of drugs with similar treatment effects. When a user enters a drug name, the presented algorithm sorts and filters the records in this subset and recommends reasonable candidates, so that the user can choose the best alternative drug(s) based on the computed similarity.
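The retrieval step outlined above (score, filter, sort, recommend) can be sketched as follows. This is only an illustration under stated assumptions, not the DSC algorithm itself: the similarity measure used here is a weighted Jaccard ratio chosen for simplicity, and the names `weighted_similarity`, `recommend`, `threshold`, and `top_k`, as well as the layout of the knowledge base, are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed layout of the weighted indication knowledge base:
# drug name -> {indication term: weight}.
KnowledgeBase = Dict[str, Dict[str, float]]


def weighted_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Weighted Jaccard ratio (an assumed stand-in for the paper's measure):
    sum of per-term minimum weights divided by sum of per-term maximum weights."""
    terms = set(a) | set(b)
    numerator = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    denominator = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    return numerator / denominator if denominator else 0.0


def recommend(query: str, kb: KnowledgeBase,
              threshold: float = 0.3, top_k: int = 5) -> List[Tuple[str, float]]:
    """Score every other drug against the query drug, drop candidates below
    the threshold, and return the top_k by descending similarity."""
    target = kb[query]  # raises KeyError if the drug is not in the knowledge base
    scored = [(name, weighted_similarity(target, indications))
              for name, indications in kb.items() if name != query]
    kept = [(name, score) for name, score in scored if score >= threshold]
    kept.sort(key=lambda item: (-item[1], item[0]))  # ties broken by name
    return kept[:top_k]


# Placeholder drugs and weights, for illustration only.
kb: KnowledgeBase = {
    "drug_a": {"fever": 0.6, "headache": 0.4},
    "drug_b": {"fever": 0.6, "cough": 0.4},
    "drug_c": {"rash": 1.0},
}
candidates = recommend("drug_a", kb)
```

With these placeholder weights, only `drug_b` shares an indication term with `drug_a`, so it is the only candidate that survives the filter. The threshold and the cut-off `top_k` are tuning choices in this sketch, not values taken from the paper.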
The primary task of drug data sensing is to extract medical terms from the indication texts of drugs. According to inference algebra (IA) (Wang, 2012), inference, as the basic mechanism of thought, is an ability gifted to human beings, and IA explores it in three categories: a) logical inferences; b) analytic inferences; and c) hybrid inferences. The extraction of medical terms can use Chinese word segmentation technology and our prior window-split idea (Zhang, 2014). In addition, drug data sensing must select a similarity calculation method. Concept algebra (CA) is a denotational mathematical structure for formal knowledge representation and manipulation in machine learning and cognitive computing, and it provides a rigorous tool for knowledge modeling and processing (Wang et al., 2011). Current similarity calculation methods can be roughly divided into two kinds. The first is statistical and relies on a large-scale corpus: similarity is calculated from the probability distribution of the contexts in which words occur. The second is based on the hierarchical relations of a complete semantic dictionary; for example, Liu (Liu, 2002) put forward a word similarity calculation based on “HowNet” (Guan, 2002; Li, 2012). The dictionary-based method is simple, effective, and intuitive, and users can complete the calculation quickly by building the related database, so it is also the main method.
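As a rough illustration of dictionary-based term extraction from an indication text, the sketch below uses forward maximum matching, a standard segmentation technique swapped in here for demonstration. It is not the window-split method of Zhang (2014), and the function name `extract_terms` and the sample dictionary are assumptions.

```python
from typing import List, Set


def extract_terms(text: str, dictionary: Set[str]) -> List[str]:
    """Forward maximum matching: at each position, take the longest dictionary
    term that starts there; skip one character when nothing matches."""
    if not dictionary:
        return []
    max_len = max(len(term) for term in dictionary)
    terms: List[str] = []
    pos = 0
    while pos < len(text):
        matched = False
        # Try the longest window first and shrink it until a term matches.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in dictionary:
                terms.append(candidate)
                pos += size
                matched = True
                break
        if not matched:
            pos += 1  # character is not the start of any dictionary term
    return terms


# Hypothetical user dictionary: hypertension, headache, fever.
user_dictionary = {"高血压", "头痛", "发热"}
# Sample indication text: "used for hypertension and headache".
found = extract_terms("用于高血压和头痛", user_dictionary)
# found == ["高血压", "头痛"]
```

Trying the longest window first means a longer term is preferred over any shorter term that is its prefix, which suits multi-character medical terms.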