A Survey and Taxonomy of Intent-Based Code Search

Shailesh Kumar Shivakumar
Copyright: © 2021 |Pages: 42
DOI: 10.4018/IJSI.2021010106

Abstract

In this paper, the authors introduce the novel concept of intent-based code search, which categorizes code search goals into a hierarchy. They explore state-of-the-art tools, techniques, and algorithms for source code search, and survey the field through its core use cases: code reusability, code understanding, and code repair. They propose a user intent-based taxonomy of code search goals. The taxonomy is derived from an in-depth analysis of the code search literature and validated through a developer survey conducted as part of this paper. It rests on a logical categorization of code search goals and on the characteristics shared within each category (such as query type and expected response). The paper also details the latest trends, surveys code search tools, and discusses the implications for tool design.

Source code search primarily involves searching an internal or external code base for code that matches the search query. Software developers search for relevant code during various phases of the software development lifecycle, while implementing a code change (Sadowski et al., 2015), or when starting development of new functionality. Source code search mainly involves the search, evaluation, retrieval, and application of source code from various sources to solve a development problem (Hummel & Atkinson, 2006). To find reusable software, 52% of people use general-purpose search engines (Hucka & Graham, 2016). Other sources of information are asking colleagues (45%), literature surveys (34%), social sites (25%), public software repositories (21%), and mailing lists (12%) (Hucka & Graham, 2016).

Developers like to look at existing working code samples and use them as references for their development needs. With the vast amount of code available on the web, the web itself can serve as a source code repository (Hummel & Atkinson, 2006). Though code reusability is the primary purpose of code search, developers also use search tools for other purposes such as code understanding, code repair (Ke et al., 2015), and impact analysis. We look at the main motivations for code search in the coming sections.

The availability of free and open source software (FOSS) has further increased the scope and effectiveness of code search (Rao, 2013) and helps in code implementation (Gallardo-Valencia, 2013). The emergence of social media platforms and Web 2.0 technologies has given rise to a new set of code sharing platforms such as StackOverflow (Ponzanelli et al., 2014), YouTube, Yahoo Answers, Facebook questions, and Quora (Barzilay et al., 2013). These platforms mainly harness the collective intelligence of the crowd through users’ active participation and contribution. Public code repositories such as GitHub, BitBucket, and SourceForge also provide a rich source of reusable code. Modern code search engines leverage these crowdsourced code sources to find relevant search results.

1.1 Contribution of This Paper

The high-level contributions of this paper are as follows:

  1. The paper presents an extensive literature survey of code search goals, introduces the novel concept of “intent-based code search”, and defines a taxonomy based on the searcher’s intent (search goals). Intent-based code search identifies the primary search goal (code reuse, code understanding, or code repair) and customizes the search process (indexing, matching, and result display) based on the identified goal. For each goal, we identify the sub-goals, search methods, query types, query matching, and techniques available in state-of-the-art tools.

  2. An analysis of the challenges and gaps in the state of the art for each of the code search goals.

  3. We conducted a developer survey about code search to validate the categories of the code search goal taxonomy; the paper discusses the high-level findings from this survey.

  4. We analyzed the implications of code search goals for search tool designers and recommend the key features needed based on that analysis.

  5. The paper provides a comprehensive survey of key code search tools, with details such as matching techniques, ranking algorithms, UI, and data sources.

  6. The paper elaborates on the key metrics used to evaluate the quality of code search results.

  7. A compilation of trends and potential research topics in the area of code search.

Note: We use “code search goals” and “user intent” synonymously in this paper.
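
To make the idea of intent-based code search concrete, the following is a minimal Python sketch of the two steps named in the first contribution: identifying the primary search goal and then customizing the search process for it. It is an illustration under stated assumptions, not the method proposed in the paper. The cue-word lists, the index names, the result views, and the function names `identify_goal` and `plan_search` are all hypothetical, and a simple keyword count stands in for whatever goal-identification technique a real tool would use.

```python
import re
from dataclasses import dataclass
from enum import Enum


class SearchGoal(Enum):
    """The three primary code search goals named in the text."""
    REUSE = "code reuse"
    UNDERSTANDING = "code understanding"
    REPAIR = "code repair"


# Hypothetical cue words per goal; illustrative only, not taken from the paper.
CUE_WORDS = {
    SearchGoal.REPAIR: {"error", "exception", "bug", "fix", "crash", "fails"},
    SearchGoal.UNDERSTANDING: {"why", "what", "explain", "meaning", "difference"},
    SearchGoal.REUSE: {"example", "sample", "snippet", "implement", "library"},
}


@dataclass(frozen=True)
class SearchPlan:
    """How the search process is customized for one goal (assumed fields)."""
    goal: SearchGoal
    index_name: str
    result_view: str


# Hypothetical per-goal settings for indexing and result display.
PLANS = {
    SearchGoal.REUSE: SearchPlan(SearchGoal.REUSE, "snippet_index", "ranked code snippets"),
    SearchGoal.UNDERSTANDING: SearchPlan(SearchGoal.UNDERSTANDING, "docs_index", "code with explanations"),
    SearchGoal.REPAIR: SearchPlan(SearchGoal.REPAIR, "fix_index", "candidate patches"),
}


def identify_goal(query: str) -> SearchGoal:
    """Pick the goal whose cue words overlap most with the query tokens."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    scores = {goal: len(tokens & cues) for goal, cues in CUE_WORDS.items()}
    best = max(scores, key=scores.get)
    # With no matching cue, fall back to reuse, which the text calls
    # the primary purpose of code search.
    return best if scores[best] > 0 else SearchGoal.REUSE


def plan_search(query: str) -> SearchPlan:
    """Map a query to the search settings for its identified goal."""
    return PLANS[identify_goal(query)]
```

For instance, `plan_search("fix crash on startup")` would select the repair plan, while a query with no cue words, such as `"quicksort java"`, falls back to the reuse plan. The point of the sketch is only the separation between goal identification and goal-specific configuration of the search process.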
