Introduction
In the era of big data, organizations in nearly every sector conduct business over the network, and large amounts of data accumulate as a result (Bouramoul, 2016; Mary & Malarvizhi, 2014; Pereira & Pereira, 2015; Qumsiyeh & Ng, 2016; Shen, Liu, Shen, Liu, & Sun, 2017; Shen, Shen, Chen, Huang, & Susilo, 2016; Tsai, 2011; Tsou, 2010). In the healthcare field, large volumes of patient information, drug information, and diagnosis and treatment records are stored (Barbantan, Porumb, Lemnaru, & Potolea, 2016; Wang X, 2015). In education, there is a wealth of information about students, teachers and specialties. In the telecommunications industry, massive amounts of traffic and communication data are generated every day (Trasarti, Giannotti, Nanni, Pedreschi, & Renso, 2011). Analyzing and making effective use of the data in these areas can help each industry allocate resources reasonably, increase productivity and discover opportunities. Mining the information hidden in these data can help managers make decisions that improve the quality and efficiency of production and daily life (Daly & Taniar, 2004; Silvestri, Corazza, Benerecetti, & Alicante, 2016; Taniar, Rahayu, Vincent, & Daly, 2008). Whatever the field, however, the latest and most advanced technologies and methods are usually first disclosed to the world in the form of patents, as inventors seek to secure a technological lead as early as possible.
A patent is a special kind of text on the Internet, with strict format requirements and writing conventions that pull in two opposite directions. On the one hand, the techniques and inventions must be expressed clearly; on the other hand, the wording is kept as obscure as possible to prevent the invention from being imitated or infringed. As a carrier of human wisdom and innovation, patents contain rich technical, economic and legal information. In recent years, patents have become a major object of analysis and mining. Effective use of patent information can give enterprises important support in technological innovation, risk avoidance, patent purchasing, protection of their interests and so on (Mandl, 2017; Tseng, Lin, & Lin, 2007; Zhang, Li, & Li, 2015).
Patent annotation (Agatonovic et al., 2008; Carvalho, Franca, & Lima, 2014) is a key step in online patent retrieval, analysis and mining. It extracts important information from patents, such as techniques, functions and keywords, which makes it possible to carry out retrieval, analysis and mining at the semantic level and thus achieve a degree of intelligence. In online patent retrieval, recall can be improved by expanding the search terms with similar techniques or functions. In patent analysis, a patent technology-effect matrix can be constructed by tabulating the technologies and effects of multiple patents, which helps patent applicants discover patent minefields (areas already densely covered by patents) and patent blank areas (areas with few or no patents) (Chen, 2011; Zhang, 2017). In patent mining, annotation is a key step in patent classification, clustering and recommendation.
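To make the technology-effect matrix more concrete, the following is a minimal Python sketch, assuming that an annotation step has already extracted a list of techniques and a list of effects for each patent. The patent IDs, annotation values and the names `build_tech_effect_matrix` and `print_matrix` are hypothetical, and pairing every technique of a patent with every one of its effects is a simplifying assumption; the works cited above may construct the matrix differently.

```python
# Minimal sketch of a patent technology-effect matrix (hypothetical data and names).

# Hypothetical annotation output: each patent ID maps to the techniques and
# effects extracted from its abstract. Illustrative values, not real patents.
annotations = {
    "P1": {"techniques": ["neural network"], "effects": ["higher accuracy"]},
    "P2": {"techniques": ["neural network", "caching"], "effects": ["lower latency"]},
    "P3": {"techniques": ["caching"], "effects": ["lower latency", "lower cost"]},
}


def build_tech_effect_matrix(annotations):
    """Return (techniques, effects, cells), where cells maps each
    (technique, effect) pair to the sorted IDs of patents annotated with both.

    Assumption: every technique of a patent is paired with every effect of it.
    """
    techniques = sorted({t for a in annotations.values() for t in a["techniques"]})
    effects = sorted({e for a in annotations.values() for e in a["effects"]})
    cells = {(t, e): [] for t in techniques for e in effects}
    for pid in sorted(annotations):
        ann = annotations[pid]
        for t in ann["techniques"]:
            for e in ann["effects"]:
                cells[(t, e)].append(pid)
    return techniques, effects, cells


def print_matrix(techniques, effects, cells):
    """Print patent counts in tabular form: rows are techniques, columns are effects."""
    row_w = max(len(t) for t in techniques) + 2
    col_w = max(len(e) for e in effects) + 2
    print("".ljust(row_w) + "".join(e.ljust(col_w) for e in effects))
    for t in techniques:
        counts = (str(len(cells[(t, e)])).ljust(col_w) for e in effects)
        print(t.ljust(row_w) + "".join(counts))


techniques, effects, cells = build_tech_effect_matrix(annotations)
print_matrix(techniques, effects, cells)

# Empty cells: technique-effect combinations that no patent in the set covers.
blank_cells = [pair for pair, pids in cells.items() if not pids]
# Most crowded cell: the combination claimed by the most patents.
densest = max(cells, key=lambda pair: len(cells[pair]))
print("blank:", blank_cells)
print("densest:", densest, cells[densest])
```

In this toy set, the empty cells are candidates for blank areas, while the cell collecting the most patents hints at a crowded area; what count should mark a real minefield is deliberately left open here.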
The patent abstract is an important component of the patent text. It describes and summarizes the background, purpose, method and function of the patent in a brief space, and usually contains no specialized or complicated legal information, while retaining most of the important information in the patent. The patent abstract is therefore a very good data source for patent annotation, analysis and mining.