1. Introduction To Source Code Search
Source code search primarily involves searching an internal or external code base for code that matches a search query. Software developers search for relevant code during various phases of the software development lifecycle, while implementing a code change (Sadowski et al., 2015), or at the start of new functionality development. Source code search mainly involves the search, evaluation, retrieval and application of source code from various sources to solve a development problem (Hummel & Atkinson, 2006). According to Hucka & Graham (2016), 52% of people use general-purpose search engines to find reusable software; other sources of information are asking colleagues (45%), literature surveys (34%), social sites (25%), public software repositories (21%) and mailing lists (12%).
Developers like to look at existing working code samples and use them as references for their development needs. With the vast amount of code available on the web, the web itself can serve as a source code repository (Hummel & Atkinson, 2006). Although code reuse is the primary purpose of code search, developers also use search tools for other purposes such as code understanding, code repair (Ke et al., 2015) and impact analysis. The main motivations for code search are examined in the coming sections.
The availability of free and open source software (FOSS) has further increased the scope and effectiveness of code search (Rao, 2013) and helps in code implementation (Gallardo-Valencia, 2013). The emergence of social media platforms and Web 2.0 technologies has given rise to a new set of code sharing platforms such as StackOverflow (Ponzanelli et al., 2014), YouTube, Yahoo Answers, Facebook questions and Quora (Barzilay et al., 2013). These platforms mainly harness the collective intelligence of the crowd through users’ active participation and contribution. Public code repositories such as GitHub, BitBucket and SourceForge also provide a rich source of reusable code. Modern code search engines leverage these crowdsourced code sources to find relevant search results.
1.1 Contribution of This Paper
The high-level contributions of this paper are as follows:
1. The paper presents an extensive literature survey of code search goals, introduces the novel concept of “intent-based code search”, and defines a taxonomy based on the searcher’s intent (search goals). Intent-based code search identifies the primary search goal (code reuse, code understanding or code repair) and customizes the search process (indexing, matching, result display) based on the identified goal. For each goal, we have identified the sub-goals, search methods, query types and query matching techniques available in state-of-the-art tools.
2. The paper analyzes the challenges and gaps in the state of the art for each of the code search goals.
3. We have conducted a developer survey about code search to re-validate the categories of the code search goal taxonomy, and the paper discusses the high-level findings from that survey.
4. We have analyzed the implications of code search goals for search tool designers and have recommended key features based on that analysis.
5. The paper provides a comprehensive survey of key code search tools, covering details such as matching techniques, ranking algorithms, UI and data sources.
6. The paper elaborates on the key metrics used for evaluating the quality of code search results.
7. The paper compiles trends and potential research topics in the area of code search.
Note: We have used “code search goals” and “user intent” synonymously in this paper.
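To make the idea of intent-based code search from the first contribution more concrete, the following is a minimal Python sketch of the two steps it describes: identifying the primary search goal and then customizing indexing, matching and result display for that goal. The cue words, the `SearchPlan` fields, the plan descriptions and the function names are all hypothetical and chosen only for illustration; the paper does not prescribe this heuristic, and a real tool would likely use a far richer classifier.

```python
from dataclasses import dataclass
from enum import Enum


class SearchGoal(Enum):
    """The three primary search goals named in the taxonomy."""
    CODE_REUSE = "code reuse"
    CODE_UNDERSTANDING = "code understanding"
    CODE_REPAIR = "code repair"


# Hypothetical cue words per goal (an assumption, not taken from the paper).
GOAL_CUES = {
    SearchGoal.CODE_REPAIR: {"error", "exception", "fix", "bug", "crash"},
    SearchGoal.CODE_UNDERSTANDING: {"why", "what", "explain", "where"},
    SearchGoal.CODE_REUSE: {"example", "sample", "snippet", "implement"},
}


@dataclass(frozen=True)
class SearchPlan:
    """How the search process is customized for one goal."""
    goal: SearchGoal
    index: str
    matcher: str
    display: str


# Hypothetical per-goal settings for indexing, matching and result display.
PLANS = {
    SearchGoal.CODE_REUSE: SearchPlan(
        SearchGoal.CODE_REUSE, "snippet index",
        "keyword and signature matching", "copyable snippets"),
    SearchGoal.CODE_UNDERSTANDING: SearchPlan(
        SearchGoal.CODE_UNDERSTANDING, "symbol and call-graph index",
        "identifier matching", "definitions with usages"),
    SearchGoal.CODE_REPAIR: SearchPlan(
        SearchGoal.CODE_REPAIR, "question-and-answer index",
        "error-message matching", "fixes with explanations"),
}


def identify_goal(query: str) -> SearchGoal:
    """Pick the goal whose cue words overlap most with the query tokens."""
    tokens = set(query.lower().split())
    scores = {goal: len(tokens & cues) for goal, cues in GOAL_CUES.items()}
    # Ties resolve to the goal listed first in GOAL_CUES.
    best = max(scores, key=scores.get)
    # With no cue at all, fall back to code reuse, which the introduction
    # describes as the primary purpose of code search.
    return best if scores[best] > 0 else SearchGoal.CODE_REUSE


def plan_search(query: str) -> SearchPlan:
    """Map a query to the search plan for its identified goal."""
    return PLANS[identify_goal(query)]
```

Under these assumptions, a call such as `plan_search("fix null pointer exception in parser")` would be routed to the code repair plan, while a query containing none of the cue words would fall back to code reuse.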