1. Introduction
Plants have played an important role throughout human history. Their chemical compounds act on the human body through the same processes as pharmaceutical drugs, and many well-known drugs are derived from plants: digitalis is isolated from foxglove, Taxol from yew, vincristine from periwinkle, and morphine, one of the most effective pain relievers, from the opium poppy. Because herbal medicines act on the body in the same way as conventional drugs, they can also cause the same kinds of side effects (Tapsell, 2006). The best use of plants in medicine therefore requires careful documentation. This is the domain of ethnobotany, the study of plants used by primitive societies in various parts of the world (Acharya, 2008). The first step toward the correct use of a plant, however, is recognizing its species.
In the 1990s, researchers found that yew tree bark could not serve as a sustainable source of the blockbuster drug Taxol, which forced them to stop relying on it. This is one example of the many cases in which the clinical potential of plant compounds is limited by low production levels in the source species. In the case of Taxol, a precursor turned out to be more readily available in a renewable part of the tree, and a semi-synthetic protocol could be developed to convert it into the drug. In the search for more efficient ways to exploit the wealth of bioactive compounds, researchers have turned to metabolic engineering of effective plant and microbial production platforms, techniques that are based on DNA sequencing.
DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. It includes any method or technology used to determine the order of the four bases (adenine, guanine, cytosine, and thymine) in a strand of DNA. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery. Knowledge of DNA sequences is now one of the main fields of research in biology, with applications in medicine, biotechnology, and species recognition. Because sequencing has become fast, species ranging from plants and animals to humans and microbes can be recognized from their DNA sequences. Much work has been done on recognizing unknown DNA sequences; it falls into several categories, including alignment-based, alignment-free, and statistical methods.
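To make the alignment-free category concrete, the sketch below turns a DNA string into a fixed-length vector of k-mer frequencies and compares two sequences by the distance between their vectors. It is only an illustration of the general idea, not the method used in this paper; the function names `kmer_profile` and `profile_distance` and the default `k=2` are assumptions chosen for the example.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"  # the four bases: adenine, cytosine, guanine, thymine


def kmer_profile(sequence, k=2):
    """Return the relative frequency of every k-mer over ACGT (hypothetical helper)."""
    seq = sequence.upper()
    # Count overlapping windows of length k, skipping windows with ambiguous bases.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(BASES)
    )
    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product(BASES, repeat=k)]
    # Every possible k-mer gets an entry, so all profiles have the same length.
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def profile_distance(seq_a, seq_b, k=2):
    """Euclidean distance between two k-mer profiles; no alignment is computed."""
    pa, pb = kmer_profile(seq_a, k), kmer_profile(seq_b, k)
    return sum((pa[m] - pb[m]) ** 2 for m in pa) ** 0.5
```

Because every sequence maps to a vector of the same length (4^k entries), such profiles can be fed directly to standard machine learning classifiers, whatever the lengths of the original sequences.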
Data mining is the core stage of the knowledge discovery process; it aims to extract interesting, nontrivial, implicit, previously unknown, and potentially useful information from large databases (Fayyad, 1996). Machine learning is the part of data mining that focuses on prediction based on known properties learned from training data.
This paper presents a bagging-based approach that uses machine learning algorithms to identify DNA sequences for the recognition of medicinal plants. The rest of the paper is organized as follows. Section 2 reviews the literature in the domains that this work touches. Section 3 describes the data set used and the collection of the Medical Plants Genome Resources. Section 4 discusses the proposed approach. Section 5 details the results obtained in the experiments. Finally, Section 6 gives the main conclusions and outlines future work.
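As a rough illustration of the bagging idea named above, the sketch below trains several copies of a base learner on bootstrap samples of the training data and combines their predictions by majority vote. It is a minimal sketch under stated assumptions, not the approach detailed in Section 4: the base learner is a simple 1-nearest-neighbor rule picked for brevity, the feature vectors are assumed to be numeric tuples, and the names `nearest_neighbor_predict` and `bagging_predict` are hypothetical.

```python
import random
from collections import Counter


def nearest_neighbor_predict(train, x):
    """Assumed base learner: return the label of the training point closest to x."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(train, key=lambda item: sq_dist(item[0], x))[1]


def bagging_predict(train, x, n_models=25, seed=0):
    """Predict the label of x by majority vote over bootstrap-trained learners."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same vote
    votes = Counter()
    for _ in range(n_models):
        # Bootstrap sample: draw len(train) items from train with replacement.
        sample = [rng.choice(train) for _ in train]
        votes[nearest_neighbor_predict(sample, x)] += 1
    # The most frequent label among the individual predictions wins.
    return votes.most_common(1)[0][0]


# Toy data: (feature vector, label) pairs; the labels are placeholders.
toy_train = [
    ((0.0, 0.1), "species_a"), ((0.1, 0.0), "species_a"),
    ((0.2, 0.1), "species_a"), ((0.1, 0.2), "species_a"),
    ((0.9, 1.0), "species_b"), ((1.0, 0.9), "species_b"),
    ((0.8, 0.9), "species_b"), ((0.9, 0.8), "species_b"),
]
prediction = bagging_predict(toy_train, (0.15, 0.1))
```

Voting over many resampled learners tends to reduce the variance of an unstable base learner, which is the usual motivation for bagging.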