Introduction
The cost of developing a single new drug is now estimated to exceed two billion dollars, and the approval process often takes decades. According to a recent study by Wong and Siah (2019), only about fourteen percent of all newly developed drugs pass through clinical trials. Many of these drugs fail because they lack efficacy or have adverse side-effects (Pitts, 2014). The problem of accurately predicting the therapeutic indications and side-effects of new drugs has led many researchers to work on developing machine learning (ML) systems to classify compounds based on Anatomical Therapeutic Chemical (ATC) classes. ML systems that filter out drugs with a significant probability of failing in clinical trials offer the promise of accelerating drug development and reducing costs.
The ATC coding system, controlled by the World Health Organization, categorizes drugs into overlapping classes at five different levels based on their therapeutic, pharmacological, and chemical properties and on the organs or systems the drugs act on. The first level identifies the broad anatomical groups a drug targets by coding it with one of fourteen letters:
A: Alimentary tract and metabolism
B: Blood and blood-forming organs
C: Cardiovascular system
D: Dermatologicals
G: Genitourinary system and sex hormones
H: Systemic hormonal preparations, excluding sex hormones and insulins
J: Anti-infectives for systemic use
L: Antineoplastic and immunomodulating agents
M: Musculoskeletal system
N: Nervous system
P: Antiparasitic products, insecticides, and repellents
R: Respiratory system
S: Sensory organs
V: Various
Levels 2 through 4 of the ATC coding system represent therapeutic, pharmacological, and chemical subgroups, whereas level 5 identifies the individual chemical substance. A drug is given one or more ATC codes based on its memberships in the different classes contained in these five levels. Acetylsalicylic acid, or Aspirin, for example, has three ATC codes because it functions as a local oral treatment (level 1 group A), as a platelet inhibitor (level 1 group B), and as an analgesic and antipyretic (level 1 group N).
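The five-level hierarchy can be made concrete with a short sketch. It assumes the standard seven-character ATC code layout (one letter, two digits, one letter, one letter, two digits), which the text above does not spell out. The function name `atc_levels` and the three Aspirin codes are illustrative assumptions and are not taken from this article.

```python
def atc_levels(code):
    """Split a 7-character ATC code into its five hierarchical prefixes.

    Assumed layout (not stated in the article): the prefix for levels 1-5
    ends after 1, 3, 4, 5, and 7 characters respectively.
    """
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    prefix_end = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}
    return {level: code[:end] for level, end in prefix_end.items()}


# Assumed ATC codes for acetylsalicylic acid, used only for illustration.
aspirin_codes = ["A01AD05", "B01AC06", "N02BA01"]

# Collect the distinct level 1 (anatomical) groups across all codes.
level1_groups = sorted({atc_levels(c)[1] for c in aspirin_codes})
```

Under these assumptions, `level1_groups` contains the letters A, B, and N, matching the three anatomical groups described for Aspirin.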
Although the ATC classification system has become an essential tool for providing guidance to drug developers regarding the potential clinical value of a compound, only a fraction of all pharmaceuticals has been assigned ATC codes. That so few drugs have been labeled is due to the labor-intensive experimental methods required to identify a new drug's ATC classes. This bottleneck has resulted in the proposal of many ML methods and the establishment of some web servers capable of performing automatic ATC classification (Dunkel, Günther, Ahmed, Wittig, & Preissner, 2008; Wu, Ai, Liu, & Fan, 2013). Most research in this area, including the study presented here, focuses on identifying the fourteen organ/system classes at the first level of the ATC coding system. As the Aspirin example shows, a compound can belong to several level 1 classes at once, so predicting a compound's level 1 classification is a complex multi-label problem rather than a single-label one.
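To illustrate what the multi-label formulation means in practice, the following minimal sketch encodes a drug's level 1 memberships as a fourteen-element binary vector, one position per anatomical group. The names `ATC_LEVEL1`, `encode_level1`, and `decode_level1` are hypothetical, and this encoding is a common convention for multi-label targets, not necessarily the representation used in the study.

```python
# The fourteen level 1 letters, in the order listed in the introduction.
ATC_LEVEL1 = "ABCDGHJLMNPRSV"


def encode_level1(groups):
    """Return a 14-element 0/1 list marking each level 1 group the drug belongs to."""
    groups = set(groups)
    unknown = groups - set(ATC_LEVEL1)
    if unknown:
        raise ValueError(f"unknown level 1 groups: {sorted(unknown)}")
    return [1 if letter in groups else 0 for letter in ATC_LEVEL1]


def decode_level1(vector):
    """Inverse of encode_level1: recover the group letters from a 0/1 list."""
    if len(vector) != len(ATC_LEVEL1):
        raise ValueError("expected one entry per level 1 group")
    return [letter for letter, flag in zip(ATC_LEVEL1, vector) if flag]


# Aspirin belongs to groups A, B, and N, so three positions are set at once.
aspirin_target = encode_level1({"A", "B", "N"})
```

A single-label classifier would have to pick exactly one of the fourteen positions, whereas a multi-label model must decide each position independently, which is why a target such as `aspirin_target` can contain several ones.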