1. Introduction
An important indicator of code quality is compliance with naming conventions (referred to interchangeably here as naming style or naming guidelines). These conventions usually consist of a set of programming practices or rules such as “class name should start with a capital C” or “do not use underscore with the method/function name”. In modern programming languages, compliance with these conventions is optional in the sense that programs that violate them can still be syntactically and functionally correct. However, compliance is essential to ensure that software is understandable and readable, and thus more likely to be reusable and maintainable (Elish and Offutt 2001; Galin 2004). From the perspective of the software industry, developers are encouraged to apply such conventions because software is rarely maintained or reviewed by its original author for its whole life, especially within the open source model. Names in a program represent defined concepts because they connect the source code to the problem domain (Binkley et al. 2013; Caprile and Tonella 2000). This matters because different readers may infer different meanings from the same code depending on naming and other conventions, regardless of the architecture and design of the code (Green and Ledgard 2011).
A convention violation in one place can damage program readability in numerous other places; thus, there is a real comprehension cost when names are not chosen carefully (Butler et al. 2009; Butler et al. 2010). For example, the purpose of a method that returns the variable detect may take some time to understand. Using more words and underscores (e.g., detect_bad_sector) would remove the violation and clarify the method’s objective. Another example that highlights the importance of naming conventions is the variable lcf_name. Its meaning may also take considerable time to work out unless the reader discovers that the developer prefixes variable names with a tag that indicates the scope and data type of the variable. Here, lcf_name indicates that the f_name variable has local scope (l) and character data type (c). With such a notation (called Hungarian notation (Wang et al. 2014)), the problem becomes much worse because any programming language has several data types and different scopes, so the reader must learn many such tags. Making program names easily understandable is more of a necessity now than in the past because programming has become a team-based activity rather than an individual one. Several studies have noted the importance of careful naming (Caprile and Tonella 2000; Butler et al. 2009; Lawrie et al. 2006; Rilling and Klemola 2003).
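The decoding burden that a Hungarian-style prefix places on the reader can be illustrated with a minimal Java sketch. The class name HungarianPrefix, the decode method, and the two lookup tables are hypothetical and introduced only for illustration; the only tags taken from the text are l (local scope) and c (character data type), and a real project would need one entry for every scope and data type it uses.

```java
import java.util.Map;

public class HungarianPrefix {
    // Hypothetical tag tables: only 'l' and 'c' are taken from the lcf_name example.
    private static final Map<Character, String> SCOPES = Map.of('l', "local");
    private static final Map<Character, String> TYPES = Map.of('c', "character");

    /** Splits a two-letter scope/type prefix from a name and describes it. */
    public static String decode(String name) {
        if (name == null || name.length() < 3) {
            return name + ": no recognizable prefix";
        }
        String scope = SCOPES.get(name.charAt(0)); // first letter: scope tag
        String type = TYPES.get(name.charAt(1));   // second letter: data type tag
        if (scope == null || type == null) {
            return name + ": no recognizable prefix";
        }
        return name.substring(2) + ": " + scope + " scope, " + type + " type";
    }

    public static void main(String[] args) {
        System.out.println(decode("lcf_name"));
        System.out.println(decode("detect_bad_sector"));
    }
}
```

A descriptive name such as detect_bad_sector needs no lookup table at all, whereas a reader of lcf_name has to carry the equivalent of these tables in memory.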
To the best of our knowledge, no available study determines how well code complies with the naming conventions of its language. In this study, we conducted an experiment to fill this gap empirically for the Java and C# programming languages. We chose these languages not only because of their widespread use but also because of their early attention to naming conventions; the founders of the languages (Microsoft for C# and Sun Microsystems/Oracle for Java) defined a single naming guideline for the entire programming language (Sun Microsystems 1999; Pradeep 2008; Jasonall 2008; DoFactory website). Herein, 120 arbitrarily selected open source Java and C# classes were evaluated with respect to naming conventions. Section 2 describes the naming conventions of Java and C# that were studied. Section 3 describes the design and planning of the experiment. The experimental results are analysed and discussed in section 4. Section 5 describes the study’s limitations and the threats to its validity. Section 6 presents an overview of the related work in this area. Section 7 presents a conclusion and plans for future work.