Kyushu University Academic Staff Educational and Research Activities Database
List of Papers
Yasutaka Kamei (Last modified date: 2024.03.12)

Professor / Advanced Software Engineering / Department of Advanced Information Technology / Faculty of Information Science and Electrical Engineering


Papers
1. Analyzing the Impact of Automatic Test Case Generation Considering Execution Paths on Automated Program Repair
In automated program repair (APR), the cost of the patch generation process can be reduced if automatically generated test suites are used. Automatic test-case generation techniques often take classes as input, so this study aims to identify which classes should be given to such techniques. We investigate the relationship between the test suites that detected failures and the classes that developers actually fixed. We observe cases in which test suites do not identify the classes fixed by developers as the cause of failures; in these cases, however, the fixed classes still appear on the execution traces exercised by the test suites. Based on this finding, we examine the impact of the automatically generated test suites on the performance of APR. We demonstrate that, when all classes exercised by the failed test cases are taken into account, the total number of generated patches decreases but the number of correct patches increases.
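As a rough illustration of the selection idea in this study, the sketch below (with a hypothetical coverage mapping and class names) gathers every class exercised by failing test cases, rather than only the class reported at the failure site, as the input set for an automatic test-case generator.

```python
# Sketch: choose input classes for automatic test-case generation based on
# the classes exercised by failing test cases. The coverage mapping below is
# hypothetical illustrative data, not taken from the study.
failing_test_coverage = {
    "LangTest#testAbbreviate": ["org.example.WordUtils", "org.example.StringUtils"],
    "LangTest#testCapitalize": ["org.example.WordUtils"],
}

def classes_for_test_generation(coverage):
    """Return every class on the execution path of a failing test,
    not only the class reported as the failure location."""
    classes = set()
    for exercised in coverage.values():
        classes.update(exercised)
    return sorted(classes)

print(classes_for_test_generation(failing_test_coverage))
# The resulting list would be handed to a test-generation tool as its targets.
```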
2. A case study of automatic debugging problem generation using novice programmers' bug fix histories
Debugging support for novice programmers has been an active research area in recent years. However, rather than directly supporting their debugging, training such programmers toward an intermediate level through exercises in which they debug programs has been overlooked. For this training, it is important to prepare programs containing bugs that capture the tendencies of novice programmers. We therefore focus on Learning-Mutation, which learns bugs from pairs of buggy and fixed programs using machine translation and automatically injects bugs into programs. In this study, we applied Learning-Mutation to programs written by novice programmers at Kyushu University. By comparing the bugs injected by Learning-Mutation with the actual bugs made by such programmers, we evaluated whether Learning-Mutation can support these exercises by preparing buggy programs. As a result, the injected bugs are similar to the actual bugs, and bug patterns such as missing semicolons and undeclared variables or functions accounted for more than 36% when the number of tokens was small. On the other hand, as the number of tokens increased, the number of incorrect expressions increased. Furthermore, although some bugs are difficult to generate, beam search alleviates this difficulty.
3. An Empirical Analysis of the Evolution of README Files.
4. Jeongju Sohn, Yasutaka Kamei, Shane McIntosh, Shin Yoo, Leveraging Fault Localisation to Enhance Defect Prediction, 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2021.03.
5. Gopi Krishnan Rajbahadur, Shaowei Wang, Gustavo Ansaldi, Yasutaka Kamei, Ahmed E. Hassan, The impact of feature importance methods on the interpretation of defect classifiers, IEEE Transactions on Software Engineering, 10.1109/TSE.2021.3056941, 2021.02, Classifier specific (CS) and classifier agnostic (CA) feature importance methods are widely used (often interchangeably) by prior studies to derive feature importance ranks from a defect classifier. However, different feature importance methods are likely to compute different feature importance ranks even for the same dataset and classifier. Hence such interchangeable use of feature importance methods can lead to conclusion instabilities unless there is a strong agreement among different methods. Therefore, in this paper, we evaluate the agreement between the feature importance ranks associated with the studied classifiers through a case study of 18 software projects and six commonly used classifiers. We find that: 1) The computed feature importance ranks by CA and CS methods do not always strongly agree with each other. 2) The computed feature importance ranks by the studied CA methods exhibit a strong agreement including the features reported at top-1 and top-3 ranks for a given dataset and classifier, while even the commonly used CS methods yield vastly different feature importance ranks. Such findings raise concerns about the stability of conclusions across replicated studies. We further observe that the commonly used defect datasets are rife with feature interactions and these feature interactions impact the computed feature importance ranks of the CS methods (not the CA methods). We demonstrate that removing these feature interactions, even with simple methods like CFS improves agreement between the computed feature importance ranks of CA and CS methods. In light of our findings, we provide guidelines for stakeholders and practitioners when performing model interpretation and directions for future research, e.g., future research is needed to investigate the impact of advanced feature interaction removal methods on computed feature importance ranks of different CS methods..
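The contrast between classifier-specific and classifier-agnostic feature importance can be illustrated with a small sketch; the snippet below, which uses synthetic data rather than the studied defect datasets, derives a CS rank from a random forest's impurity-based importances, a CA rank from permutation importance, and measures their agreement with Kendall's tau.

```python
# Sketch: compare a classifier-specific (CS) importance with a
# classifier-agnostic (CA) importance and measure rank agreement.
from scipy.stats import kendalltau, rankdata
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

cs_rank = rankdata(-clf.feature_importances_)        # CS: impurity-based importances
ca = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
ca_rank = rankdata(-ca.importances_mean)             # CA: permutation importance

tau, _ = kendalltau(cs_rank, ca_rank)
print(f"Kendall's tau between CS and CA ranks: {tau:.2f}")
```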
6. Ryujiro Nishinaka, Naoyasu Ubayashi, Yasutaka Kamei, Ryosuke Sato, How Fast and Effectively Can Code Change History Enrich Stack Overflow?, Proceedings - IEEE International Conference on Software Quality, Reliability and Security, QRS 2020, 10.1109/QRS51102.2020.00066, 467-478, 2020.12.
7. Yasutaka Kamei, Andy Zaidman, Guest editorial: Mining software repositories 2018, Empirical Software Engineering, 10.1007/s10664-020-09817-8, 25, 3, 2055-2057, 2020.05.
8. Masanari Kondo, Cor-Paul Bezemer, Yasutaka Kamei, Ahmed E. Hassan, Osamu Mizuno, The impact of feature reduction techniques on defect prediction models., Empirical Software Engineering, 10.1007/s10664-018-9679-5, 24, 4, 1925-1963, 2019.08, © 2019, Springer Science+Business Media, LLC, part of Springer Nature. Defect prediction is an important task for preserving software quality. Most prior work on defect prediction uses software features, such as the number of lines of code, to predict whether a file or commit will be defective in the future. There are several reasons to keep the number of features that are used in a defect prediction model small. For example, using a small number of features avoids the problem of multicollinearity and the so-called ‘curse of dimensionality’. Feature selection and reduction techniques can help to reduce the number of features in a model. Feature selection techniques reduce the number of features in a model by selecting the most important ones, while feature reduction techniques reduce the number of features by creating new, combined features from the original features. Several recent studies have investigated the impact of feature selection techniques on defect prediction. However, there do not exist large-scale studies in which the impact of multiple feature reduction techniques on defect prediction is investigated. In this paper, we study the impact of eight feature reduction techniques on the performance and the variance in performance of five supervised learning and five unsupervised defect prediction models. In addition, we compare the impact of the studied feature reduction techniques with the impact of the two best-performing feature selection techniques (according to prior work). The following findings are the highlights of our study: (1) The studied correlation and consistency-based feature selection techniques result in the best-performing supervised defect prediction models, while feature reduction techniques using neural network-based techniques (restricted Boltzmann machine and autoencoder) result in the best-performing unsupervised defect prediction models. In both cases, the defect prediction models that use the selected/generated features perform better than those that use the original features (in terms of AUC and performance variance). (2) Neural network-based feature reduction techniques generate features that have a small variance across both supervised and unsupervised defect prediction models. Hence, we recommend that practitioners who do not wish to choose a best-performing defect prediction model for their data use a neural network-based feature reduction technique..
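The distinction between feature reduction and feature selection drawn above can be sketched as follows; this is only an illustration on synthetic data, with PCA standing in for a reduction technique and a crude correlation filter approximating correlation-based selection such as CFS.

```python
# Sketch: feature reduction (PCA creates new combined features) vs. a simple
# correlation-based feature selection (keeps a subset of original features)
# before defect prediction. Synthetic data replaces a real defect dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Feature reduction: project the original features onto 5 components.
pca = PCA(n_components=5).fit(X_tr)
auc_red = roc_auc_score(
    y_te, LogisticRegression(max_iter=1000)
        .fit(pca.transform(X_tr), y_tr)
        .predict_proba(pca.transform(X_te))[:, 1])

# Crude correlation-based selection: drop one of each pair with |r| > 0.8.
corr = np.corrcoef(X_tr, rowvar=False)
drop = {j for i in range(corr.shape[0]) for j in range(i + 1, corr.shape[1])
        if abs(corr[i, j]) > 0.8}
keep = [i for i in range(X_tr.shape[1]) if i not in drop]
auc_sel = roc_auc_score(
    y_te, LogisticRegression(max_iter=1000)
        .fit(X_tr[:, keep], y_tr)
        .predict_proba(X_te[:, keep])[:, 1])

print(f"AUC with PCA-reduced features:          {auc_red:.3f}")
print(f"AUC with correlation-selected features: {auc_sel:.3f}")
```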
9. Naoyasu Ubayashi, Yasutaka Kamei, Ryosuke Sato, When and Why Do Software Developers Face Uncertainty?, Proceedings - 19th IEEE International Conference on Software Quality, Reliability and Security (QRS 2019), 10.1109/QRS.2019.00045, 288-299, 2019.07, Recently, many developers have begun to notice that uncertainty is a crucial problem in software development. Unfortunately, no one knows how often uncertainty appears or what kinds of uncertainty exist in actual projects, because there have been no empirical studies on uncertainty. To address this problem, we conduct a large-scale empirical study analyzing commit messages and revision histories of 1,444 OSS projects randomly selected from GitHub repositories. The main findings are as follows: 1) uncertainty appears at an average rate of 1.44%; 2) uncertain program behavior, uncertain variables/values/names, and uncertain program defects are the major kinds of uncertainty; and 3) developers sometimes act not to resolve uncertainty but to escape or ignore it. Uncertainty exists everywhere to a certain degree, and developers cannot ignore its existence.
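A minimal sketch of the mining step behind such a study might look like the following; the keyword list and commit messages are illustrative assumptions, not the paper's actual coding scheme.

```python
# Sketch: estimate how often uncertainty appears in commit messages by
# matching a small keyword list (assumed, for illustration only).
import re

UNCERTAINTY_PATTERNS = [r"\bnot sure\b", r"\bprobably\b", r"\bmaybe\b",
                        r"\bTODO\b", r"\bFIXME\b", r"\bI think\b"]

def is_uncertain(message):
    return any(re.search(p, message, re.IGNORECASE) for p in UNCERTAINTY_PATTERNS)

commit_messages = [
    "Fix null check in parser",
    "Not sure this handles unicode correctly, needs review",
    "Refactor build script",
]
ratio = sum(map(is_uncertain, commit_messages)) / len(commit_messages)
print(f"uncertain commits: {ratio:.2%}")
```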
10. Giancarlo Sierra, Emad Shihab, Yasutaka Kamei, A survey of self-admitted technical debt, Journal of Systems and Software, 10.1016/j.jss.2019.02.056, 152, 70-82, 2019.06, Technical Debt is a metaphor used to express sub-optimal source code implementations that are introduced for short-term benefits that often need to be paid back later, at an increased cost. In recent years, various empirical studies have focused on investigating source code comments that indicate Technical Debt often referred to as Self-Admitted Technical Debt (SATD). Since the introduction of SATD as a concept, an increasing number of studies have examined various aspects pertaining to SATD. Therefore, in this paper we survey research work on SATD, analyzing the characteristics of current approaches and techniques for SATD detection, comprehension, and repayment. To motivate the submission of novel and improved work, we compile tools, resources, and data sets made available to replicate or extend current SATD research. To set the stage for future work, we identify open challenges in the study of SATD, areas that are missing investigation, and discuss potential future research avenues..
12. Naoyasu Ubayashi, Takuya Watanabe, Yasutaka Kamei, Ryosuke Sato, Git-based integrated uncertainty manager, 41st IEEE/ACM International Conference on Software Engineering: Companion, ICSE-Companion 2019 Proceedings - 2019 IEEE/ACM 41st International Conference on Software Engineering Companion, ICSE-Companion 2019, 10.1109/ICSE-Companion.2019.00047, 95-98, 2019.05, Nowadays, many software systems are required to be updated and delivered in a short period of time. It is important for developers to make software embrace uncertainty, because user requirements or design decisions are not always completely determined. This paper introduces iArch-U, an Eclipse-based uncertainty-aware software development tool chain, for developers to properly describe, trace, and manage uncertainty crosscutting over UML modeling, Java programming, and testing phases. Integrating with Git, iArch-U can manage why/when/where uncertain concerns arise or are fixed to be certain in a project. In this tool demonstration, we show the world of uncertainty-aware software development using iArch-U. Our tool is open source software released from http://posl.github.io/iArch/..
13. Shaiful Alam Chowdhury, Abram Hindle, Rick Kazman, Takumi Shuto, Ken Matsui, Yasutaka Kamei, GreenBundle: An Empirical Study on the Energy Impact of Bundled Processing, Proceedings - International Conference on Software Engineering, 10.1109/ICSE.2019.00114, 2019-May, 1107-1118, 2019.05, © 2019 IEEE. Energy consumption is a concern in the data-center and at the edge, on mobile devices such as smartphones. Software that consumes too much energy threatens the utility of the end-user's mobile device. Energy consumption is fundamentally a systemic kind of performance and hence it should be addressed at design time via a software architecture that supports it, rather than after release, via some form of refactoring. Unfortunately developers often lack knowledge of what kinds of designs and architectures can help address software energy consumption. In this paper we show that some simple design choices can have significant effects on energy consumption. In particular we examine the Model-View-Controller architectural pattern and demonstrate how converting to Model-View-Presenter with bundling can improve the energy performance of both benchmark systems and real world applications. We show the relationship between energy consumption and bundled and delayed view updates: bundling events in the presenter can often reduce energy consumption by 30%..
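The bundling idea can be sketched in a few lines: instead of redrawing the view on every model event, a presenter buffers events and issues one bundled update. The class and method names below are illustrative, not taken from the paper's implementation.

```python
# Sketch: Model-View-Presenter with bundling. One redraw for many events
# is cheaper (and less energy-hungry) than one redraw per event.
class BundlingPresenter:
    def __init__(self, view, bundle_size=10):
        self.view = view
        self.bundle_size = bundle_size
        self._pending = []

    def on_model_event(self, event):
        self._pending.append(event)
        if len(self._pending) >= self.bundle_size:
            self.flush()

    def flush(self):
        if self._pending:
            self.view.render(self._pending)   # one bundled view update
            self._pending = []

class ConsoleView:
    def render(self, events):
        print(f"redraw with {len(events)} bundled events")

presenter = BundlingPresenter(ConsoleView(), bundle_size=3)
for i in range(7):
    presenter.on_model_event(f"update-{i}")
presenter.flush()  # flush the remainder
```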
14. Hoa Khanh Dam, Truyen Tran, John Grundy, Aditya Ghose, Yasutaka Kamei, Towards effective AI-powered agile project management, 41st IEEE/ACM International Conference on Software Engineering: New Ideas and Emerging Results, ICSE-NIER 2019 Proceedings - 2019 IEEE/ACM 41st International Conference on Software Engineering New Ideas and Emerging Results, ICSE-NIER 2019, 10.1109/ICSE-NIER.2019.00019, 41-44, 2019.05, The rise of Artificial intelligence (AI) has the potential to significantly transform the practice of project management. Project management has a large socio-technical element with many uncertainties arising from variability in human aspects, e.g. customers' needs, developers' performance and team dynamics. AI can assist project managers and team members by automating repetitive, high-volume tasks to enable project analytics for estimation and risk prediction, providing actionable recommendations, and even making decisions. AI is potentially a game changer for project management in helping to accelerate productivity and increase project success rates. In this paper, we propose a framework where AI technologies can be leveraged to offer support for managing agile projects, which have become increasingly popular in the industry..
18. Naoyasu Ubayashi, Yasutaka Kamei, Ryosuke Sato, iArch-U/MC: An uncertainty-aware model checker for embracing known unknowns, ICSOFT 2018 - Proceedings of the 13th International Conference on Software Technologies, 176-184, 2019.01, Embracing uncertainty in software development is one of the crucial research topics in software engineering. In most projects, we have to deal with uncertain concerns by using informal ways such as documents, mailing lists, or issue tracking systems. This task is tedious and error-prone. In particular, uncertainty in programming is one of the challenging issues to be tackled, because it is difficult to verify the correctness of a program when there are uncertain user requirements, unfixed design choices, and alternative algorithms. This paper proposes iArch-U/MC, an uncertainty-aware model checker for verifying whether or not some important properties are guaranteed even if Known Unknowns remain in a program. Our tool is based on LTSA (Labelled Transition System Analyzer) and is implemented as an Eclipse plug-in.
19. Gopi Krishnan Rajbahadur, Shaowei Wang, Yasutaka Kamei, Ahmed E. Hassan, Impact of Discretization Noise of the Dependent Variable on Machine Learning Classifiers in Software Engineering, IEEE Transactions on Software Engineering, 10.1109/TSE.2019.2924371, 2019.01, Researchers usually discretize a continuous dependent variable into two target classes by introducing an artificial discretization threshold (e.g., median). However, such discretization may introduce noise (i.e., discretization noise) due to ambiguous class loyalty of data points that are close to the artificial threshold. Previous studies do not provide a clear directive on the impact of discretization noise on classifiers or on how to handle such noise. In this paper, we propose a framework to help researchers and practitioners systematically estimate the impact of discretization noise on classifiers in terms of its impact on various performance measures and the interpretation of classifiers. Through a case study of 7 software engineering datasets, we find that: 1) discretization noise affects the different performance measures of a classifier differently for different datasets; 2) though the interpretation of the classifiers is impacted by the discretization noise on the whole, the top 3 most important features are not affected by the discretization noise. Therefore, we suggest that practitioners and researchers use our framework to understand the impact of discretization noise on the performance of their built classifiers and estimate the exact amount of discretization noise to be discarded from the dataset to avoid the negative impact of such noise.
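A simplified sketch of the discretization step and of flagging near-threshold points follows; the "noise band" used here is an assumed illustration, not the framework's exact procedure.

```python
# Sketch: discretize a continuous dependent variable (defect counts) at the
# median and flag points close to that artificial threshold as candidates
# for discretization noise. The closeness band is an assumption.
import numpy as np

defect_counts = np.array([0, 0, 1, 1, 2, 2, 2, 3, 5, 8, 13])
threshold = np.median(defect_counts)

labels = (defect_counts > threshold).astype(int)       # defective vs. not
band = 0.5 * np.std(defect_counts)                     # assumed noise band
near_threshold = np.abs(defect_counts - threshold) <= band

print("threshold:", threshold)
print("labels:   ", labels)
print("possible discretization noise at indices:", np.where(near_threshold)[0])
```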
21. Yasutaka Kamei, Takahiro Matsumoto, Kazuhiro Yamashita, Naoyasu Ubayashi, Takashi Iwasaki, Shuichi Takayama, Studying the Cost and Effectiveness of OSS Quality Assessment Models: An Experience Report of Fujitsu QNET, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 10.1587/transinf.2018EDP7163, E101D, 11, 2744-2753, 2018.11, Nowadays, open source software (OSS) systems are adopted by proprietary software projects. To reduce the risk of using problematic OSS systems (e.g., causing system crashes), it is important for proprietary software projects to assess OSS systems in advance. Therefore, OSS quality assessment models are studied to obtain information regarding the quality of OSS systems. Although the OSS quality assessment models are partially validated using a small number of case studies, to the best of our knowledge, there are few studies that empirically report how industrial projects actually use OSS quality assessment models in their own development process. In this study, we empirically evaluate the cost and effectiveness of OSS quality assessment models at Fujitsu Kyushu Network Technologies Limited (Fujitsu QNET). To conduct the empirical study, we collect datasets from (a) 120 OSS projects that Fujitsu QNET's projects actually used and (b) 10 problematic OSS projects that caused major problems in the projects. We find that (1) it takes average and median times of 51 and 49 minutes, respectively, to gather all assessment metrics per OSS project and (2) there is a possibility that we can filter problematic OSS systems by using the threshold derived from a pool of assessment metrics. Fujitsu QNET's developers agree that our results lead to improvements in Fujitsu QNET's OSS assessment process. We believe that our work significantly contributes to the empirical knowledge about applying OSS assessment techniques to industrial projects..
23. Junji Shimagaki, Yasutaka Kamei, Naoyasu Ubayashi, Abram Hindle, Automatic topic classification of test cases using text mining at an Android smartphone vendor, Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2018), 10.1145/3239235.3268927, 2018.10, Background: An Android smartphone is an ecosystem of applications, drivers, operating system components, and assets. The volume of the software is large and the number of test cases needed to cover the functionality of an Android system is substantial. Enormous effort has already been invested in properly quantifying "what features and apps were tested and verified?". This insight is provided by dashboards that summarize test coverage and results per feature. One method to achieve this is to manually tag or label test cases with the topic or function they cover, much like function points. At the studied Android smartphone vendor, tests are labelled with manually defined tags, so-called "feature labels (FLs)", and the FLs serve to categorize hundreds to thousands of test cases into 10 to 50 groups. Aim: Unfortunately for developers, manual assignment of FLs to thousands of test cases is a time-consuming task, leading to inaccurately labeled test cases, which render the dashboard useless. We created an automated system that suggests tags/labels to developers for their test cases rather than relying on manual labeling. Method: We use machine learning models to predict and label the functionality tested by 10,000 test cases developed at the company. Results: Through quantitative experiments, our models achieved acceptable F-1 performance of 0.3 to 0.88. Through qualitative studies with expert teams, we also showed that the hierarchy and path of tests is a good predictor of a feature's label. Conclusions: We find that this method can reduce the tedious manual effort that software developers spend classifying test cases, while providing more accurate classification results.
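One standard text-mining setup for this kind of label suggestion is TF-IDF features over test names/paths plus a linear classifier, sketched below on tiny illustrative data (the real system and its feature labels are proprietary).

```python
# Sketch: predict a test case's feature label (FL) from its name/path text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

test_paths = [
    "camera/test_capture_photo_low_light",
    "camera/test_video_recording_4k",
    "telephony/test_incoming_call_while_roaming",
    "telephony/test_sms_delivery_report",
]
feature_labels = ["Camera", "Camera", "Telephony", "Telephony"]

model = make_pipeline(TfidfVectorizer(token_pattern=r"[A-Za-z0-9]+"),
                      LogisticRegression(max_iter=1000))
model.fit(test_paths, feature_labels)

print(model.predict(["camera/test_flash_mode_switch"]))  # expected: ['Camera']
```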
25. Takashi Watanabe, Akito Monden, Zeynep Yucel, Yasutaka Kamei, Shuji Morisaki, Cross-Validation-Based Association Rule Prioritization Metric for Software Defect Characterization, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 10.1587/transinf.2018EDP7020, E101D, 9, 2269-2278, 2018.09, Association rule mining discovers relationships among variables in a data set, representing them as rules. These rules are expected to often have predictive abilities, that is, to be able to predict future events, but commonly used rule interestingness measures, such as support and confidence, do not directly assess their predictive power. This paper proposes a cross-validation-based metric that quantifies the predictive power of such rules for characterizing software defects. The results of evaluating this metric experimentally using four open-source data sets (Mylyn, NetBeans, Apache Ant and jEdit) show that it can improve rule prioritization performance over conventional metrics (support, confidence and odds ratio) by 72.8% for Mylyn, 15.0% for NetBeans, 10.5% for Apache Ant and 0% for jEdit in terms of the SumNormPre(100) precision criterion. This suggests that the proposed metric can provide better rule prioritization performance than conventional metrics and can at least provide similar performance even in the worst case.
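The core idea of scoring a rule by held-out precision rather than in-sample confidence can be sketched as follows; the rule representation and toy module data are assumptions for illustration only.

```python
# Sketch: score an association rule by its precision on held-out folds.
import numpy as np
from sklearn.model_selection import KFold

def cv_precision(rows, rule_lhs, rule_rhs, n_splits=5):
    """Average precision of the rule LHS => RHS measured on held-out folds."""
    rows = np.array(rows, dtype=object)
    scores = []
    for _, test_idx in KFold(n_splits=n_splits, shuffle=True,
                             random_state=0).split(rows):
        test = rows[test_idx]
        matched = [r for r in test if rule_lhs(r)]
        if matched:
            scores.append(sum(rule_rhs(r) for r in matched) / len(matched))
    return float(np.mean(scores)) if scores else 0.0

# Toy modules: (lines_of_code, past_changes, defective)
data = [(120, 9, True), (40, 1, False), (300, 15, True),
        (25, 0, False), (200, 7, True), (60, 2, False)] * 5
score = cv_precision(data,
                     rule_lhs=lambda r: r[0] > 100,   # "large module"
                     rule_rhs=lambda r: r[2])         # "is defective"
print(f"cross-validated precision: {score:.2f}")
```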
27. Shane McIntosh, Yasutaka Kamei, Are Fix-Inducing Changes a Moving Target? A Longitudinal Case Study of Just-In-Time Defect Prediction, IEEE Transactions on Software Engineering, 10.1109/TSE.2017.2693980, 44, 5, 412-428, 2018.05, Just-In-Time (JIT) models identify fix-inducing code changes. JIT models are trained using techniques that assume that past fix-inducing changes are similar to future ones. However, this assumption may not hold, e.g., as system complexity tends to accrue, expertise may become more important as systems age. In this paper, we study JIT models as systems evolve. Through a longitudinal case study of 37,524 changes from the rapidly evolving Qt and OpenStack systems, we find that fluctuations in the properties of fix-inducing changes can impact the performance and interpretation of JIT models. More specifically: (a) the discriminatory power (AUC) and calibration (Brier) scores of JIT models drop considerably one year after being trained; (b) the role that code change properties (e.g., Size, Experience) play within JIT models fluctuates over time; and (c) those fluctuations yield over- and underestimates of the future impact of code change properties on the likelihood of inducing fixes. To avoid erroneous or misleading predictions, JIT models should be retrained using recently recorded data (within three months). Moreover, quality improvement plans should be informed by JIT models that are trained using six months (or more) of historical data, since they are more resilient to period-specific fluctuations in the importance of code change properties..
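The retraining advice above suggests a sliding-window evaluation; a hedged sketch, assuming a pandas DataFrame of time-ordered changes with a binary fix_inducing column, is shown below.

```python
# Sketch: retrain a JIT defect model on a sliding window of recent changes
# and test it on the following period. Column names are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def sliding_window_jit(changes, features, train_months=6, test_months=3):
    changes = changes.sort_values("date")
    start, last = changes["date"].min(), changes["date"].max()
    while start + pd.DateOffset(months=train_months) < last:
        train_end = start + pd.DateOffset(months=train_months)
        test_end = train_end + pd.DateOffset(months=test_months)
        train = changes[(changes["date"] >= start) & (changes["date"] < train_end)]
        test = changes[(changes["date"] >= train_end) & (changes["date"] < test_end)]
        if (not train.empty and not test.empty
                and train["fix_inducing"].nunique() == 2
                and test["fix_inducing"].nunique() == 2):
            model = LogisticRegression(max_iter=1000).fit(
                train[features], train["fix_inducing"])
            auc = roc_auc_score(test["fix_inducing"],
                                model.predict_proba(test[features])[:, 1])
            print(f"{train_end.date()} -> {test_end.date()}: AUC = {auc:.2f}")
        start += pd.DateOffset(months=test_months)

# Hypothetical usage:
# df = pd.read_csv("changes.csv", parse_dates=["date"])
# sliding_window_jit(df, features=["size", "experience", "entropy"])
```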
28. Ariel Rodriguez, Fumiya Tanaka, Yasutaka Kamei, Empirical study on the relationship between developer's working habits and efficiency, Proceedings - 2018 ACM/IEEE 15th International Conference on Mining Software Repositories (MSR 2018), co-located with the 40th International Conference on Software Engineering (ICSE 2018), 10.1145/3196398.3196458, 74-77, 2018.05, Software developers can have a reputation for frequently working long and irregular hours, which are widely considered to inhibit mental capacity and negatively affect work quality. This paper analyzes the working habits of software developers and the effects these habits have on efficiency, based on a large amount of data extracted from the actions of developers in the IDE (Integrated Development Environment) Visual Studio. We use events that record the times at which all developer actions were performed, along with the numbers of successful and failed build and test events. Due to the high level of detail of the events provided by the KaVE project's tool, we were able to analyze the data in a way that previous studies have not. We structure our study along three dimensions: (1) days of the week, (2) time of the day, and (3) continuous work. Our findings will help software developers and team leaders to appropriately allocate working times and to maximize work quality.
30. Xiaochen Li, He Jiang, Yasutaka Kamei, Xin Chen, Bridging Semantic Gaps between Natural Languages and APIs with Word Embedding, IEEE Transactions on Software Engineering, 10.1109/TSE.2018.2876006, 2018.01, Developers increasingly rely on text matching tools to analyze the relation between natural language words and APIs. However, semantic gaps, namely textual mismatches between words and APIs, negatively affect these tools. Previous studies have transformed words or APIs into low-dimensional vectors for matching; however, inaccurate results were obtained due to the failure of modeling words and APIs simultaneously. To resolve this problem, two main challenges are to be addressed: the acquisition of massive words and APIs for mining and the alignment of words and APIs for modeling. Therefore, this study proposes Word2API to effectively estimate relatedness of words and APIs. Word2API collects millions of commonly used words and APIs from code repositories to address the acquisition challenge. Then, a shuffling strategy is used to transform related words and APIs into tuples to address the alignment challenge. Using these tuples, Word2API models words and APIs simultaneously. Word2API outperforms baselines by 10%-49.6% of relatedness estimation in terms of precision and NDCG. Word2API is also effective on solving typical software tasks, e.g., query expansion and API documents linking. A simple system with Word2API-expanded queries recommends up to 21.4% more related APIs for developers. Meanwhile, Word2API improves comparison algorithms by 7.9%-17.4% in linking questions in Question&Answer communities to API documents..
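A miniature version of the Word2API training pipeline could look like the sketch below, which uses gensim's Word2Vec on a tiny, fabricated corpus of shuffled word-API tuples; the paper's corpus is built from millions of real code changes.

```python
# Sketch: train a word-embedding model on "sentences" that mix natural-language
# words with API names drawn from the same code change, then query relatedness.
import random
from gensim.models import Word2Vec

word_api_tuples = [
    ["read", "file", "content", "java.io.BufferedReader#readLine",
     "java.io.FileReader#FileReader"],
    ["parse", "json", "string", "org.json.JSONObject#JSONObject"],
    ["copy", "file", "java.nio.file.Files#copy"],
] * 50

random.seed(0)
for t in word_api_tuples:          # shuffling interleaves words and APIs
    random.shuffle(t)

model = Word2Vec(sentences=word_api_tuples, vector_size=50, window=5,
                 min_count=1, epochs=20, seed=0)
print(model.wv.similarity("file", "java.io.FileReader#FileReader"))
```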
31. Keisuke Watanabe, Naoyasu Ubayashi, Takuya Fukamachi, Shunya Nakamura, Hokuto Muraoka, Yasutaka Kamei, IArch-U: Interface-Centric Integrated Uncertainty-Aware Development Environment, Proceedings - 2017 IEEE/ACM 9th International Workshop on Modelling in Software Engineering, MiSE 2017, 10.1109/MiSE.2017.7, 40-46, 2017.06, © 2017 IEEE. Uncertainty can appear in all aspects of software development: Uncertainty in requirements analysis, design decisions, implementation and testing. If uncertainty can be dealt with modularly, we can add or delete uncertain concerns to/from models, code and tests whenever these concerns arise or are fixed to certain concerns. To deal with this problem, we developed iArch-U, an IDE (Integrated Development Environment) for managing uncertainty modularly in all phases in software development. In this paper, we introduce an overview of iArch-U. The iArch-U IDE is open source software and can be downloaded from GitHub..
32. Gopi Krishnan Rajbahadur, Shaowei Wang, Yasutaka Kamei, Ahmed E. Hassan, The impact of using regression models to build defect classifiers, IEEE International Working Conference on Mining Software Repositories, 10.1109/MSR.2017.4, 135-145, 2017.06, © 2017 IEEE. It is common practice to discretize continuous defect counts into defective and non-defective classes and use them as a target variable when building defect classifiers (discretized classifiers). However, this discretization of continuous defect counts leads to information loss that might affect the performance and interpretation of defect classifiers. Another possible approach to build defect classifiers is through the use of regression models then discretizing the predicted defect counts into defective and non-defective classes (regression-based classifiers). In this paper, we compare the performance and interpretation of defect classifiers that are built using both approaches (i.e., discretized classifiers and regression-based classifiers) across six commonly used machine learning classifiers (i.e., linear/logistic regression, random forest, KNN, SVM, CART, and neural networks) and 17 datasets. We find that: i) Random forest based classifiers outperform other classifiers (best AUC) for both classifier building approaches, ii) In contrast to common practice, building a defect classifier using discretized defect counts (i.e., discretized classifiers) does not always lead to better performance. Hence we suggest that future defect classification studies should consider building regression-based classifiers (in particular when the defective ratio of the modeled dataset is low). Moreover, we suggest that both approaches for building defect classifiers should be explored, so the best-performing classifier can be used when determining the most influential features..
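The two ways of building a defect classifier compared in this paper can be sketched on synthetic data as follows: a discretized classifier trained on binary labels versus a regression-based classifier whose predicted counts are used as ranking scores.

```python
# Sketch: discretized classifier vs. regression-based classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
counts = np.maximum(0, X[:, 0] * 2 + X[:, 1] + rng.normal(size=1000)).round()
X_tr, X_te, c_tr, c_te = train_test_split(X, counts, random_state=0)
y_tr, y_te = (c_tr > 0).astype(int), (c_te > 0).astype(int)

disc = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
auc_disc = roc_auc_score(y_te, disc.predict_proba(X_te)[:, 1])

reg = RandomForestRegressor(random_state=0).fit(X_tr, c_tr)
auc_reg = roc_auc_score(y_te, reg.predict(X_te))  # counts used as scores

print(f"discretized classifier AUC:      {auc_disc:.3f}")
print(f"regression-based classifier AUC: {auc_reg:.3f}")
```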
35. Keisuke Watanabe, Takuya Fukamachi, Naoyasu Ubayashi, Yasutaka Kamei, Automated A/B Testing with Declarative Variability Expressions, Proceedings - 10th IEEE International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2017, 10.1109/ICSTW.2017.72, 387-388, 2017.04, A/B testing is an experimental strategy often used in web and mobile application development. In A/B testing, a developer has to implement multiple variations of an application, randomly assign each variation to a subset of the entire user population, and analyze log data to decide which variation should be used as the final product. It is therefore challenging to keep the application code clean in A/B testing, because defining software variations and assigning users to each variation require modifications to the code. Some existing tools address this problem. In this context, we propose a solution based on the Archface-U interface and AOP (Aspect-Oriented Programming) that aims to minimize code complication in A/B testing.
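Independently of the Archface-U/AOP mechanism proposed here, the assignment half of A/B testing can be sketched as a deterministic hash of the user into a variation, as below; all names are illustrative.

```python
# Sketch: deterministic user-to-variation assignment for A/B testing.
import hashlib

def assign_variant(user_id, experiment="checkout-button", ratio_a=0.5):
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "A" if bucket < ratio_a else "B"

def render_checkout(user_id):
    variant = assign_variant(user_id)
    print(f"user={user_id} variant={variant}")   # would be logged for analysis
    return "green button" if variant == "A" else "blue button"

for uid in ["alice", "bob", "carol"]:
    render_checkout(uid)
```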
38. Pawin Suthipornopas, Pattara Leelaprute, Akito Monden, Hidetake Uwano, Yasutaka Kamei, Naoyasu Ubayashi, Kenji Araki, Kingo Yamada, Ken Ichi Matsumoto, Industry Application of Software Development Task Measurement System: TaskPit, IEICE Transactions on Information and Systems, 10.1587/transinf.2016EDP7222, E100D, 3, 462-472, 2017.03, © 2017 The Institute of Electronics, Information and Communication Engineers. To identify problems in a software development process, we have been developing an automated measurement tool called TaskPit, which monitors software development tasks such as programming, testing and documentation based on the execution history of software applications. This paper introduces the system requirements, design and implementation of TaskPit; then, presents two real-world case studies applying TaskPit to actual software development. In the first case study, we applied TaskPit to 12 software developers in a certain software development division. As a result, several concerns (to be improved) have been revealed such as (a) a project leader spent too much time on development tasks while he was supposed to be a manager rather than a developer, (b) several developers rarely used e-mails despite the company's instruction to use e-mail as much as possible to leave communication records during development, and (c) several developers wrote too long e-mails to their customers. In the second case study, we have recorded the planned, actual, and self reported time of development tasks. As a result, we found that (d) there were unplanned tasks in more than half of days, and (e) the declared time became closer day by day to the actual time measured by TaskPit. These findings suggest that TaskPit is useful not only for a project manager who is responsible for process monitoring and improvement but also for a developer who wants to improve by him/herself..
39. Junji Shimagaki, Yasutaka Kamei, Shane McIntosh, David Pursehouse, Naoyasu Ubayashi, Why are commits being reverted? A comparative study of industrial and open source projects, Proceedings - 2016 IEEE International Conference on Software Maintenance and Evolution, ICSME 2016, 10.1109/ICSME.2016.83, 301-311, 2017.01, © 2016 IEEE. Software development is a cyclic process of integrating new features while introducing and fixing defects. During development, commits that modify source code files are uploaded to version control systems. Occasionally, these commits need to be reverted, i.e., the code changes need to be completely backed out of the software project. While one can often speculate about the purpose of reverted commits (e.g., the commit may have caused integration or build problems), little empirical evidence exists to substantiate such claims. The goal of this paper is to better understand why commits are reverted in large software systems. To that end, we quantitatively and qualitatively study two proprietary and four open source projects to measure: (1) the proportion of commits that are reverted, (2) the amount of time that commits that are eventually reverted linger within a codebase, and (3) the most frequent reasons why commits are reverted. Our results show that 1%-5% of the commits in the studied systems are reverted. Those commits that are eventually reverted linger within the studied codebases for 1-35 days (median). Furthermore, we identify 13 common reasons for reverting commits, and observe that the frequency of reverted commits of each reason varies broadly from project to project. A complementary qualitative analysis suggests that many reverted commits could have been avoided with better team communication and change awareness. Our findings made Sony Mobile's stakeholders aware that internally reverted commits can be reduced by paying more attention to their own changes. On the other hand, externally reverted commits could be minimized only if external stakeholders are involved to improve inter-company communication or requirements elicitation..
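A small sketch of the measurement's starting point, identifying reverted commits and how long the original commit lingered, is shown below using Git's default revert message format; the log entries are fabricated examples.

```python
# Sketch: find reverted commits from commit messages and estimate how long
# the original commit lingered in the codebase.
import re
from datetime import datetime

REVERT_RE = re.compile(r"This reverts commit ([0-9a-f]{7,40})")

log = [  # (sha, author_date, message)
    ("a1b2c3d", datetime(2016, 3, 1), "Add experimental cache layer"),
    ("d4e5f6a", datetime(2016, 3, 9),
     'Revert "Add experimental cache layer"\n\nThis reverts commit a1b2c3d.'),
]
dates = {sha: d for sha, d, _ in log}

for sha, date, message in log:
    m = REVERT_RE.search(message)
    if m:
        original = m.group(1)
        lingered = (date - dates[original]).days if original in dates else None
        print(f"{sha} reverts {original}; original lingered {lingered} days")
```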
40. Yasutaka Kamei, Everton Maldonado, Emad Shihab, Naoyasu Ubayashi, Using Analytics to Quantify the Interest of Self-Admitted Technical Debt, International Workshop on Technical Debt Analytics (TDA2016), pp.1-4, 2016.12.
42. Shane McIntosh, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan, An empirical study of the impact of modern code review practices on software quality, Empirical Software Engineering, 10.1007/s10664-015-9381-9, 21, 5, 2146-2189, 2016.10, © 2015, Springer Science+Business Media New York. Software code review, i.e., the practice of having other team members critique changes to a software system, is a well-established best practice in both open source and proprietary software domains. Prior work has shown that formal code inspections tend to improve the quality of delivered software. However, the formal code inspection process mandates strict review criteria (e.g., in-person meetings and reviewer checklists) to ensure a base level of review quality, while the modern, lightweight code reviewing process does not. Although recent work explores the modern code review process, little is known about the relationship between modern code review practices and long-term software quality. Hence, in this paper, we study the relationship between post-release defects (a popular proxy for long-term software quality) and: (1) code review coverage, i.e., the proportion of changes that have been code reviewed, (2) code review participation, i.e., the degree of reviewer involvement in the code review process, and (3) code reviewer expertise, i.e., the level of domain-specific expertise of the code reviewers. Through a case study of the Qt, VTK, and ITK projects, we find that code review coverage, participation, and expertise share a significant link with software quality. Hence, our results empirically confirm the intuition that poorly-reviewed code has a negative impact on software quality in large systems using modern reviewing tools..
43. Kwabena Ebo Bennin, Koji Toda, Yasutaka Kamei, Jacky Keung, Akito Monden, Naoyasu Ubayashi, Empirical Evaluation of Cross-Release Effort-Aware Defect Prediction Models, Proceedings - 2016 IEEE International Conference on Software Quality, Reliability and Security, QRS 2016, 10.1109/QRS.2016.33, 214-221, 2016.10, © 2016 IEEE. To prioritize quality assurance efforts, various fault prediction models have been proposed. However, the best performing fault prediction model is unknown due to three major drawbacks: (1) comparison of few fault prediction models considering small number of data sets, (2) use of evaluation measures that ignore testing efforts and (3) use of n-fold cross-validation instead of the more practical cross-release validation. To address these concerns, we conducted cross-release evaluation of 11 fault density prediction models using data sets collected from 2 releases of 25 open source software projects with an effort-Aware performance measure known as Norm(Popt). Our result shows that, whilst M5 and K∗ had the best performances, they were greatly influenced by the percentage of faulty modules present and size of data set. Using Norm(Popt) produced an overall average performance of more than 50% across all the selected models clearly indicating the importance of considering testing efforts in building fault-prone prediction models..
44. Yasutaka Kamei, Takafumi Fukushima, Shane McIntosh, Kazuhiro Yamashita, Naoyasu Ubayashi, Ahmed E. Hassan, Studying just-in-time defect prediction using cross-project models, Empirical Software Engineering, 10.1007/s10664-015-9400-x, 21, 5, 2072-2106, 2016.10, © 2015, Springer Science+Business Media New York. Unlike traditional defect prediction models that identify defect-prone modules, Just-In-Time (JIT) defect prediction models identify defect-inducing changes. As such, JIT defect models can provide earlier feedback for developers, while design decisions are still fresh in their minds. Unfortunately, similar to traditional defect models, JIT models require a large amount of training data, which is not available when projects are in initial development phases. To address this limitation in traditional defect prediction, prior work has proposed cross-project models, i.e., models learned from other projects with sufficient history. However, cross-project models have not yet been explored in the context of JIT prediction. Therefore, in this study, we empirically evaluate the performance of JIT models in a cross-project context. Through an empirical study on 11 open source projects, we find that while JIT models rarely perform well in a cross-project context, their performance tends to improve when using approaches that: (1) select models trained using other projects that are similar to the testing project, (2) combine the data of several other projects to produce a larger pool of training data, and (3) combine the models of several other projects to produce an ensemble model. Our findings empirically confirm that JIT models learned using other projects are a viable solution for projects with limited historical data. However, JIT models tend to perform best in a cross-project context when the data used to learn them are carefully selected..
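Of the three approaches listed above, combining the models of several other projects into an ensemble is the most mechanical to sketch. The snippet below trains one change-level (JIT) classifier per "other" project and averages their predicted defect-inducing probabilities on the target project; the change metrics, the logistic regression learner and the simple probability averaging are assumptions for illustration, not the paper's exact setup.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def fake_project(n):
        """Hypothetical change-level data: [lines added, lines deleted, files touched]."""
        X = rng.poisson([30, 10, 2], size=(n, 3)).astype(float)
        y = (X[:, 0] + 3 * X[:, 2] + rng.normal(0, 10, n) > 45).astype(int)
        return X, y

    # Train one JIT model per "other" project, then average their predicted
    # defect-inducing probabilities on the target project (a simple ensemble).
    other_projects = [fake_project(500) for _ in range(3)]
    models = [LogisticRegression(max_iter=1000).fit(X, y) for X, y in other_projects]

    X_target, y_target = fake_project(200)
    proba = np.mean([m.predict_proba(X_target)[:, 1] for m in models], axis=0)
    flagged = proba > 0.5
    print("flagged changes:", int(flagged.sum()),
          "of which truly defect-inducing:", int(y_target[flagged].sum()))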
45. Xin Xia, Emad Shihab, Yasutaka Kamei, David Lo and Xinyu Wang, Predicting Crashing Releases of Mobile Applications, International Symposium on Empirical Software Engineering and Measurement (ESEM), (To appear). (Ciudad Real, Spain)., pp.29:1-29:10, September 2016. (Ciudad Real, Spain)., 2016.09, Context: The quality of mobile applications has a vital impact on their user’s experience, ratings and ultimately overall success. Given the high competition in the mobile application market, i.e., many mobile applications perform the same or similar functionality, users of mobile apps tend to be less tolerant to quality issues.
Goal: Therefore, identifying these crashing releases early on so that they can be avoided will help mobile app developers keep their user base and ensure the overall success of their apps.
Method: To help mobile developers, we use machine learning techniques to effectively predict mobile app releases that are more likely to cause crashes, i.e., crashing releases. To perform our prediction, we mine and use a number of factors about the mobile releases, grouped into six unique dimensions: complexity, time, code, diffusion, commit, and text, and use a Naive Bayes classifier to perform our prediction.
Results: We perform an empirical study on 10 open source mobile applications containing a total of 2,638 releases from the F-Droid repository. On average, our approach can achieve F1 and AUC scores that improve over a baseline (random) predictor by 50% and 28%, respectively. We also find that factors related to text extracted from the commit logs prior to a release are the best predictors of crashing releases and have the largest effect.
Conclusions: Our proposed approach could help to identify crashing releases for mobile apps..
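A minimal sketch of the prediction step described in the Method paragraph above, assuming a release-level feature table (stand-ins for the six dimensions) and a known crashing/non-crashing label; scikit-learn's GaussianNB stands in for the Naive Bayes classifier, and all feature names and data are hypothetical.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    n = 300  # hypothetical number of releases

    # Hypothetical release-level factors (stand-ins for the complexity, time, code,
    # diffusion, commit and text dimensions named in the abstract).
    X = np.column_stack([
        rng.poisson(40, n),          # churned lines
        rng.integers(1, 60, n),      # days since the previous release
        rng.poisson(5, n),           # files touched
        rng.poisson(12, n),          # number of commits
        rng.random(n),               # share of "bug"-related words in commit logs
    ])
    y = (X[:, 4] + 0.01 * X[:, 0] + rng.normal(0, 0.4, n) > 1.0).astype(int)  # crashing release?

    clf = GaussianNB()
    scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
    print("mean F1 over 5 folds:", round(scores.mean(), 2))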
46. Keisuke Miura, Shane Mcintosh, Yasutaka Kamei, Ahmed E. Hassan and Naoyasu Ubayashi, The Impact of Task Granularity on Co-evolution Analyses, International Symposium on Empirical Software Engineering and Measurement (ESEM), (To appear). (Ciudad Real, Spain)., 2016.09, Aim: In this paper, we set out to understand the impact that the revision granularity has on co-change analyses. Method: We conduct an empirical study of 14 open source systems that are developed by the Apache Software Foundation. To understand the impact that the revision granularity may have on co-change activity, we study work items, i.e., logical groups of revisions that address a single issue. Results: We find that work item grouping has the potential to impact co-change activity, since 29% of work items consist of 2 or more revisions in 7 of the 14 studied systems. Deeper quantitative analysis shows that, in 7 of the 14 studied systems: (1) 11% of largest work items are entirely composed of small revisions, and would be missed by traditional approaches to filter or analyze large changes, (2) 83% of revisions that co-change under a single work item cannot be grouped using the typical configuration of the sliding time window technique and (3) 48% of work items that involve multiple developers cannot be grouped at the revision-level. Conclusions: Since the work item granularity is the natural means that practitioners use to separate development tasks, future software evolution studies, especially co-change analyses, should be conducted at the work item level..
47. Xin Xia, Emad Shihab, Yasutaka Kamei, David Lo, Xinyu Wang, Predicting Crashing Releases of Mobile Applications, International Symposium on Empirical Software Engineering and Measurement, 10.1145/2961111.2962606, 08-09-September-2016, 2016.09, © 2016 ACM. Context: The quality of mobile applications has a vital impact on their user's experience, ratings and ultimately overall success. Given the high competition in the mobile application market, i.e., many mobile applications perform the same or similar functionality, users of mobile apps tend to be less tolerant to quality issues. Goal: Therefore, identifying these crashing releases early on so that they can be avoided will help mobile app developers keep their user base and ensure the overall success of their apps. Method: To help mobile developers, we use machine learning techniques to effectively predict mobile app releases that are more likely to cause crashes, i.e., crashing releases. To perform our prediction, we mine and use a number of factors about the mobile releases, grouped into six unique dimensions: complexity, time, code, diffusion, commit, and text, and use a Naive Bayes classifier to perform our prediction. Results: We perform an empirical study on 10 open source mobile applications containing a total of 2,638 releases from the F-Droid repository. On average, our approach can achieve F1 and AUC scores that improve over a baseline (random) predictor by 50% and 28%, respectively. We also find that factors related to text extracted from the commit logs prior to a release are the best predictors of crashing releases and have the largest effect. Conclusions: Our proposed approach could help to identify crashing releases for mobile apps..
48. Keisuke Miura, Shane McIntosh, Yasutaka Kamei, Ahmed E. Hassan, Naoyasu Ubayashi, The Impact of Task Granularity on Co-evolution Analyses, International Symposium on Empirical Software Engineering and Measurement, 10.1145/2961111.2962607, 08-09-September-2016, 2016.09, © 2016 ACM. Background: Substantial research in the software evolution field aims to recover knowledge about development from the project history that is archived in repositories, such as a Version Control System (VCS). However, the data that is archived in these repositories can be analyzed at different levels of granularity. Although software evolution is a well-studied phenomenon at the revision-level, revisions may be too fine-grained to accurately represent development tasks. Aim: In this paper, we set out to understand the impact that the revision granularity has on co-change analyses. Method: We conduct an empirical study of 14 open source systems that are developed by the Apache Software Foundation. To understand the impact that the revision granularity may have on co-change activity, we study work items, i.e., logical groups of revisions that address a single issue. Results: We find that work item grouping has the potential to impact co-change activity, since 29% of work items consist of 2 or more revisions in 7 of the 14 studied systems. Deeper quantitative analysis shows that, in 7 of the 14 studied systems: (1) 11% of largest work items are entirely composed of small revisions, and would be missed by traditional approaches to filter or analyze large changes, (2) 83% of revisions that co-change under a single work item cannot be grouped using the typical configuration of the sliding time window technique and (3) 48% of work items that involve multiple developers cannot be grouped at the revision-level. Conclusions: Since the work item granularity is the natural means that practitioners use to separate development tasks, future software evolution studies, especially co-change analyses, should be conducted at the work item level..
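The "sliding time window technique" referred to above groups consecutive revisions committed by the same author within a fixed time window into one logical change. The sketch below shows that heuristic on (author, timestamp, revision id) tuples; the 3-minute window is an assumed, illustrative setting rather than the configuration used in the paper.

    from datetime import datetime, timedelta

    WINDOW = timedelta(minutes=3)  # illustrative window size, not the paper's setting

    def group_by_sliding_window(revisions):
        """Group revisions by the same author committed within WINDOW of each other.

        `revisions` is an iterable of (author, timestamp, revision_id), assumed
        sorted by timestamp. Returns a list of lists of revision ids."""
        groups, last_seen = [], {}
        for author, ts, rev in revisions:
            prev = last_seen.get(author)
            if prev is not None and ts - prev[0] <= WINDOW:
                prev[1].append(rev)          # extend the author's open group
            else:
                group = [rev]
                groups.append(group)
                last_seen[author] = [ts, group]
                continue
            last_seen[author][0] = ts        # slide the window forward
        return groups

    revs = [
        ("alice", datetime(2016, 9, 1, 10, 0), "r1"),
        ("alice", datetime(2016, 9, 1, 10, 2), "r2"),
        ("bob",   datetime(2016, 9, 1, 10, 3), "r3"),
        ("alice", datetime(2016, 9, 1, 10, 30), "r4"),
    ]
    print(group_by_sliding_window(revs))   # [['r1', 'r2'], ['r3'], ['r4']]

Work-item grouping, by contrast, relies on issue identifiers rather than on timing, which is why the paper finds that many logically related revisions cannot be recovered by the time-window heuristic alone.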
49. Kwabena Ebo Bennin, Koji Toda, Yasutaka Kamei, Jacky Keung, Akito Monden and Naoyasu Ubayashi, Empirical Evaluation of Cross-Release Effort-Aware Defect Prediction Models, International Conference on Software Quality, Reliability and Security (QRS2016), pp.214-221, August 2016. (Vienna, Austria)., 2016.08.
50. Kazuhiro Yamashita, Changyun Huang, Meiyappan Nagappan, Yasutaka Kamei, Audris Mockus, Ahmed E. Hassan and Naoyasu Ubayashi, Thresholds for Size and Complexity Metrics: A Case Study from the Perspective of Defect Density, International Conference on Software Quality, Reliability and Security (QRS2016), pp.191-201, August 2016. (Vienna, Austria)., 2016.08.
51. Takashi Watanabe, Akito Monden, Yasutaka Kamei, Shuji Morisaki, Identifying recurring association rules in software defect prediction, 2016 IEEE/ACIS 15th International Conference on Computer and Information Science, ICIS 2016 - Proceedings, 10.1109/ICIS.2016.7550867, 861-866, 2016.08, © 2016 IEEE. Association rule mining discovers patterns of co-occurrences of attributes as association rules in a data set. The derived association rules are expected to be recurrent, that is, the patterns recur in the future in other data sets. This paper defines the recurrence of a rule and aims to find a criterion to distinguish highly recurrent rules from less recurrent ones using a data set for software defect prediction. An experiment with the Eclipse Mylyn defect data set showed that rules supported by fewer than 30 transactions exhibited low recurrence. We also found that the lower bound on the number of transactions for selecting high-recurrence rules depends on the required precision of defect prediction..
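The observation above suggests filtering out rules backed by too few transactions before expecting them to recur. Below is a minimal sketch of such filtering, assuming rules are (antecedent, consequent) item sets and each defect-data row is a set of attribute=value items; the threshold of 30 reflects the paper's observation, while the rule representation and toy data are illustrative.

    # A rule is (antecedent frozenset, consequent frozenset); a transaction is a set
    # of attribute=value items. The support count of a rule is the number of
    # transactions containing both sides.
    MIN_TRANSACTIONS = 30  # below this, the paper observed low recurrence

    def support_count(rule, transactions):
        antecedent, consequent = rule
        items = antecedent | consequent
        return sum(1 for t in transactions if items <= t)

    def keep_recurrent_candidates(rules, transactions, min_count=MIN_TRANSACTIONS):
        """Keep only rules backed by at least `min_count` transactions."""
        return [r for r in rules if support_count(r, transactions) >= min_count]

    # Toy data: 100 modules, 40 of which are large, complex and defective.
    transactions = [{"size=large", "complexity=high", "defective=yes"}] * 40 \
                 + [{"size=small", "complexity=low", "defective=no"}] * 60
    rules = [
        (frozenset({"size=large", "complexity=high"}), frozenset({"defective=yes"})),
        (frozenset({"size=small", "complexity=high"}), frozenset({"defective=yes"})),
    ]
    print(len(keep_recurrent_candidates(rules, transactions)))  # 1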
52. Kwabena Ebo Bennin, Jacky Keung, Akito Monden, Yasutaka Kamei, Naoyasu Ubayashi, Investigating the Effects of Balanced Training and Testing Datasets on Effort-Aware Fault Prediction Models, Proceedings - International Computer Software and Applications Conference, 10.1109/COMPSAC.2016.144, 1, 154-163, 2016.08, © 2016 IEEE. To prioritize software quality assurance efforts, fault prediction models have been proposed to distinguish faulty modules from clean modules. The performances of such models are often biased due to the skewness or class imbalance of the datasets considered. To improve the prediction performance of these models, sampling techniques have been employed to rebalance the distribution of fault-prone and non-fault-prone modules. The effect of these techniques has been evaluated in terms of accuracy/geometric mean/F1-measure in previous studies; however, these measures do not consider the effort needed to fix faults. To empirically investigate the effect of sampling techniques on the performance of software fault prediction models in a more realistic setting, this study employs Norm(Popt), an effort-aware measure that considers the testing effort. We performed two sets of experiments aimed at (1) assessing the effects of sampling techniques on effort-aware models and finding the appropriate class distribution for training datasets, and (2) investigating the role of balanced training and testing datasets on the performance of predictive models. Of the four sampling techniques applied, the over-sampling techniques outperformed the under-sampling techniques, with Random Over-sampling performing best with respect to the Norm(Popt) evaluation measure. Also, the performance of all the prediction models improved when sampling techniques were applied at rates of (20-30)% on the training datasets, implying that a strictly balanced dataset (50% faulty modules and 50% clean modules) does not result in the best performance for effort-aware models. Our results also indicate that the performances of effort-aware models are significantly dependent on the proportions of the two classes in the testing dataset. Models trained on moderately balanced datasets are more likely to withstand fluctuations in performance as the class distribution in the testing data varies..
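Random Over-sampling, the best-performing technique above, duplicates randomly chosen minority-class (faulty) rows until a target class ratio is reached. The sketch below rebalances a training set to roughly 25% faulty modules, in line with the (20-30)% range reported above; the implementation and toy data are illustrative, and real studies would typically use a dedicated sampling library or the authors' exact procedure.

    import numpy as np

    def random_oversample(X, y, target_minority_ratio=0.25, seed=0):
        """Duplicate randomly chosen minority-class rows until the minority class
        makes up `target_minority_ratio` of the training set (sketch only)."""
        rng = np.random.default_rng(seed)
        X, y = np.asarray(X, float), np.asarray(y)
        minority = int(y.sum() < len(y) - y.sum())  # 1 if faulty is the minority class
        idx_min = np.flatnonzero(y == minority)
        idx_maj = np.flatnonzero(y != minority)
        # Solve n_min_new / (n_min_new + n_maj) = ratio for the needed minority size.
        needed = int(np.ceil(target_minority_ratio * len(idx_maj) / (1 - target_minority_ratio)))
        extra = rng.choice(idx_min, size=max(needed - len(idx_min), 0), replace=True)
        keep = np.concatenate([idx_maj, idx_min, extra])
        rng.shuffle(keep)
        return X[keep], y[keep]

    X = np.arange(20).reshape(10, 2)
    y = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])   # 10% faulty modules
    Xb, yb = random_oversample(X, y, 0.25)
    print(round(yb.mean(), 2))   # about 0.25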
53. Takashi Watanabe, Akito Monden, Yasutaka Kamei, Shuji Morisaki, Identifying Recurring Association Rules in Software Defect Prediction, International Conference on Computer and Information Science (ICIS2016), pp.1-6, June 2016. (Okayama, Japan)., 2016.06.
54. Kwabena Ebo Bennin, Jacky Keung, Akito Monden, Yasutaka Kamei and Naoyasu Ubayashi, Investigating the Effects of Balanced Training and Testing Data Sets on Effort-Aware Fault Prediction Models, International Conference on Computers, Software and Applications (COMPSAC), 2016.06.
55. Junji Shimagaki, Yasutaka Kamei, Shane Mcintosh, Ahmed E. Hassan and Naoyasu Ubayashi, A Study of the Quality-Impacting Practices of Modern Code Review at Sony Mobile, the International Conference on Software Engineering (ICSE2016) Software Engineering in Practice (SEIP), 2016.05, Nowadays, a flexible, lightweight variant of the code review process (i.e., the practice of having other team members critique software changes) is adopted by open source and proprietary software projects. While this flexibility is a blessing (e.g., enabling code reviews to span the globe), it does not mandate minimum review quality criteria like the formal code inspections of the past. Recent work shows that lax reviewing can impact the quality of open source systems. In this paper, we investigate the impact that code reviewing practices have on the quality of a proprietary system that is developed by Sony Mobile. We begin by replicating open source analyses of the relationship between software quality (as approximated by post-release defect-proneness) and: (1) code review coverage, i.e., the proportion of code changes that have been reviewed and (2) code review participation, i.e., the degree of reviewer involvement in the code review process. We also perform a qualitative analysis, with a survey of 93 stakeholders, semi-structured interviews with 15 stakeholders, and a follow-up survey of 25 senior engineers. Our results indicate that while past measures of review coverage and participation do not share a relationship with defect-proneness at Sony Mobile, reviewing measures that are aware of the Sony Mobile development context are associated with defect-proneness. Our results have led to improvements of the Sony Mobile code review process..
56. Masateru Tsunoda, Yasutaka Kamei, Atsushi Sawada, Assessing the differences of clone detection methods used in the fault-prone module prediction, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016, 10.1109/SANER.2016.65, 15-16, 2016.05, © 2016 IEEE. We have investigated through several experiments the differences in fault-prone module prediction accuracy caused by differences in the constituent code clone metrics of the prediction model. Previous studies used one or more code clone metrics as independent variables to build accurate prediction models. While those studies often used the clone detection method proposed by Kamiya et al. to calculate the metrics, the effect of the detection method on prediction accuracy is not clear. In our experiment, we built prediction models using a dataset collected from an open source software project. The result suggests that prediction accuracy improves when clone metrics derived from various clone detection tools are used..
57. Junji Shimagaki, Yasutaka Kamei, Shane McIntosh, Ahmed E. Hassan, Naoyasu Ubayashi, A study of the quality-impacting practices of modern code review at Sony mobile, Proceedings - International Conference on Software Engineering, 10.1145/2889160.2889243, 212-221, 2016.05, © 2016 ACM. Nowadays, a flexible, lightweight variant of the code review process (i.e., the practice of having other team members critique software changes) is adopted by open source and proprietary software projects. While this flexibility is a blessing (e.g., enabling code reviews to span the globe), it does not mandate minimum review quality criteria like the formal code inspections of the past. Recent work shows that lax reviewing can impact the quality of open source systems. In this paper, we investigate the impact that code reviewing practices have on the quality of a proprietary system that is developed by Sony Mobile. We begin by replicating open source analyses of the relationship between software quality (as approximated by post-release defect-proneness) and: (1) code review coverage, i.e., the proportion of code changes that have been reviewed and (2) code review participation, i.e., the degree of reviewer involvement in the code review process. We also perform a qualitative analysis, with a survey of 93 stakeholders, semi-structured interviews with 15 stakeholders, and a follow-up survey of 25 senior engineers. Our results indicate that while past measures of review coverage and participation do not share a relationship with defect-proneness at Sony Mobile, reviewing measures that are aware of the Sony Mobile development context are associated with defect-proneness. Our results have led to improvements of the Sony Mobile code review process..
58. Masateru Tsunoda, Yasutaka Kamei, Atsushi Sawada, Assessing the differences of clone detection methods used in the fault-prone module prediction, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016, 10.1109/SANER.2016.65, 15-16, 2016.05, © 2016 IEEE. We have investigated through several experiments the differences in fault-prone module prediction accuracy caused by differences in the constituent code clone metrics of the prediction model. Previous studies used one or more code clone metrics as independent variables to build accurate prediction models. While those studies often used the clone detection method proposed by Kamiya et al. to calculate the metrics, the effect of the detection method on prediction accuracy is not clear. In our experiment, we built prediction models using a dataset collected from an open source software project. The result suggests that prediction accuracy improves when clone metrics derived from various clone detection tools are used..
59. Bodin Chinthanet, Passakorn Phannachitta, Yasutaka Kamei, Pattara Leelaprute, Arnon Rungsawang, Naoyasu Ubayashi and Kenichi Matsumoto, A Review and Comparison of Methods for Determining the Best Analogies in Analogy-based Software Effort Estimation, International Symposium on Applied Computing (SAC 2016) Poster Session, 2016.04.
60. Yasutaka Kamei, Emad Shihab, Defect Prediction: Accomplishments and Future Challenges, Leaders of Tomorrow / Future of Software Engineering Track at International Conference on Software Analysis Evolution and Reengineering (SANER2016), Issue 2, pp.99-104., 2016.03.
61. Kazuhiro Yamashita, Yasutaka Kamei, Shane McIntosh, Ahmed E. Hassan and Naoyasu Ubayashi, Magnet or Sticky? Measuring Project Characteristics from the Perspective of Developer Attraction and Retention, Journal of Information Processing, Vol.24, No.2, pp.339-348, 2016.03.
62. Yasutaka Kamei, Software Quality Assurance 2.0: Proactive, Practical, and Relevant, IEEE SOFTWARE, 33, 2, 102-103, 2016.03.
63. Yasutaka Kamei, Software Quality Assurance 2.0: Proactive, Practical, and Relevant, IEEE SOFTWARE, 33, 2, 102-103, 2016.03.
64. Meiyappan Nagappan, Romain Robbes, Yasutaka Kamei, Eric Tanter, Shane Mcintosh, Audris Mockus, Ahmed E. Hassan, An Empirical Study of goto in C Code from GitHub Repositories, the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE2015), pp.404-414, 2015.09.
65. Yasutaka Kamei, Takafumi Fukushima, Shane McIntosh, Kazuhiro Yamashita, Naoyasu Ubayashi and Ahmed E. Hassan, Studying Just-In-Time Defect Prediction using Cross-Project Models, Journal of Empirical Software Engineering, Online first (pp.1-35), 2015.09.
66. Kazuhiro Yamashita, Shane McIntosh, Yasutaka Kamei, Ahmed E. Hassan and Naoyasu Ubayashi, Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects, International Workshop on Principles of Software Evolution (IWPSE 2015), pp.46-55, 2015.08.
67. Meiyappan Nagappan, Romain Robbes, Yasutaka Kamei, Éric Tanter, Shane Mcintosh, Audris Mockus, Ahmed E. Hassan, An empirical study of goto in C code from github repositories, 2015 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2015 - Proceedings, 10.1145/2786805.2786834, 404-414, 2015.08, © 2015 ACM. It is nearly 50 years since Dijkstra argued that goto obscures the flow of control in program execution and urged programmers to abandon the goto statement. While past research has shown that goto is still in use, little is known about whether goto is used in the unrestricted manner that Dijkstra feared, and if it is 'harmful' enough to be a part of a post-release bug. We, therefore, conduct a two-part empirical study - (1) qualitatively analyze a statistically representative sample of 384 files from a population of almost 250K C programming language files collected from over 11K GitHub repositories and find that developers use goto in C files for error handling (80.21 ± 5%) and cleaning up resources at the end of a procedure (40.36 ± 5%); and (2) quantitatively analyze the commit history from the release branches of six OSS projects and find that no goto statement was removed/modified in the post-release phase of four of the six projects. We conclude that developers limit themselves to using goto appropriately in most cases, and not in an unrestricted manner like Dijkstra feared, thus suggesting that goto does not appear to be harmful in practice..
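The sample of 384 files and the ±5% figures above are consistent with the standard sample-size calculation for estimating a proportion at a 95% confidence level with a 5% margin of error (this rationale is an inference from the numbers, not stated in the abstract):

    # Sample size for estimating a proportion: n = z^2 * p * (1 - p) / e^2.
    z = 1.96      # two-sided 95% confidence
    p = 0.5       # most conservative assumption about the true proportion
    e = 0.05      # +/- 5% margin of error
    n = z ** 2 * p * (1 - p) / e ** 2
    print(round(n, 1))   # 384.2, hence a sample of 384 files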
68. Takuya Fukamachi, Naoyasu Ubayashi, Shintaro Hosoai, Yasutaka Kamei, Poster: Conquering Uncertainty in Java Programming, Proceedings - International Conference on Software Engineering, 10.1109/ICSE.2015.266, 2, 823-824, 2015.08, © 2015 IEEE. Uncertainty in programming is one of the challenging issues to be tackled, because it is error-prone for many programmers to temporarily set aside uncertain concerns using only simple language constructs such as comments and conditional statements. This paper proposes ucJava, a new Java programming environment for conquering uncertainty. Our environment provides a modular programming style for uncertainty and supports test-driven development taking uncertainty into consideration..
69. Kazuhiro Yamashita, Shane McIntosh, Yasutaka Kamei, Ahmed E. Hassan, Naoyasu Ubayashi, Revisiting the applicability of the pareto principle to core development teams in open source software projects, International Workshop on Principles of Software Evolution (IWPSE), 10.1145/2804360.2804366, 30-Aug-2015, 46-55, 2015.08, © 2015 ACM. It is often observed that the majority of the development work of an Open Source Software (OSS) project is contributed by a core team, i.e., a small subset of the pool of active developers. In fact, recent work has found that core development teams follow the Pareto principle-roughly 80% of the code contributions are produced by 20% of the active developers. However, those findings are based on samples of between one and nine studied systems. In this paper, we revisit prior studies about core developers using 2,496 projects hosted on GitHub. We find that even when we vary the heuristic for detecting core developers, and when we control for system size, team size, and project age: (1) the Pareto principle does not seem to apply for 40%-87% of GitHub projects; and (2) more than 88% of GitHub projects have fewer than 16 core developers. Moreover, we find that when we control for the quantity of contributions, bug fixing accounts for a similar proportion of the contributions of both core (18%-20%) and non-core developers (21%-22%). Our findings suggest that the Pareto principle is not compatible with the core teams of many GitHub projects. In fact, several of the studied GitHub projects are susceptible to the bus factor, where the impact of a core developer leaving would be quite harmful..
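The Pareto check described above can be expressed directly: rank developers by contribution, take the top 20%, and test whether they account for at least 80% of the contributions. A minimal sketch with hypothetical commit counts:

    import math

    def pareto_holds(commits_per_developer, top_share=0.2, work_share=0.8):
        """True if the top `top_share` of developers produced >= `work_share` of commits."""
        counts = sorted(commits_per_developer, reverse=True)
        k = max(1, math.ceil(top_share * len(counts)))
        return sum(counts[:k]) / sum(counts) >= work_share

    # Toy project: one dominant core developer, nine occasional contributors.
    print(pareto_holds([500, 20, 15, 10, 8, 6, 5, 4, 3, 2]))   # True
    # A project with evenly spread contributions, as in many studied GitHub projects.
    print(pareto_holds([30] * 10))                              # False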
70. Shane Mcintosh, Yasutaka Kamei, Bram Adams and Ahmed E. Hassan, An Empirical Study of the Impact of Modern Code Review Practices on Software Quality, Journal of Empirical Software Engineering, Online first (pp.1-45), 2015.05.
71. Takuya Fukamachi, Naoyasu Ubayashi, Shintaro Hosoai, Yasutaka Kamei, Modularity for Uncertainty, International Workshop on Modeling in Software Engineering (MiSE2015), pp.7-12, 2015.05.
72. Takuya Fukamachi, Naoyasu Ubayashi, Shintaro Hosoai, Yasutaka Kamei, Poster: Conquering Uncertainty in Java Programming, International Conference on Software Engineering (ICSE2015), Poster Session., 2015.05.
73. Ayse Tosun Misirli, Emad Shihab, Yasutaka Kamei, Studying High Impact Fix-Inducing Changes, Journal of Empirical Software Engineering, Online first (pp.1-37), 2015.05.
74. Changyun Huang, Ataru Osaka, Yasutaka Kamei, Naoyasu Ubayashi, Automated DSL Construction Based on Software Product Lines, International Conference on Model-Driven Engineering and Software Development (MODELSWARD2015), Poster Session, 2015.02.
75. A Bug Triaging Method for Reducing the Time to Fix Bugs in Large-scale Open Source Software Development
This paper proposes a bug triaging method to reduce the time to fix bugs in large-scale open source software development. Our method considers the upper limit on the number of tasks that a developer can fix in a certain period. In this paper, we conduct a case study applying our method to the Mozilla Firefox and Eclipse Platform projects and show the following findings: (1) using our method mitigates the situation where the majority of bug-fixing tasks are assigned to particular developers, (2) our method can reduce the time to fix bugs by up to 50%-83% compared with manual bug triaging and by up to 34%-38% compared with the existing method, and (3) the two factors used in our method, Preference (suitability for fixing a bug) and Limit (limits on developers' working hours), have a dispersion effect on the task assignment..
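The paper's algorithm is not reproduced here, but the two factors it names, a per-developer Preference score for each bug and a per-developer Limit on workload, suggest a constrained assignment problem. The following greedy sketch is only a hypothetical illustration of that idea under those assumptions, not the proposed method.

    def triage(bugs, preference, limit):
        """Assign each bug to the developer with the highest preference score who
        still has capacity (greedy sketch; `preference[dev][bug]` in [0, 1],
        `limit[dev]` = max bugs per period)."""
        load = {dev: 0 for dev in limit}
        assignment = {}
        for bug in bugs:
            candidates = [d for d in limit if load[d] < limit[d]]
            if not candidates:
                break                       # no remaining capacity this period
            best = max(candidates, key=lambda d: preference[d].get(bug, 0.0))
            assignment[bug] = best
            load[best] += 1
        return assignment

    preference = {
        "dev_a": {"bug1": 0.9, "bug2": 0.8, "bug3": 0.7},
        "dev_b": {"bug1": 0.4, "bug2": 0.6, "bug3": 0.9},
    }
    limit = {"dev_a": 1, "dev_b": 2}        # dev_a has little capacity this period
    print(triage(["bug1", "bug2", "bug3"], preference, limit))
    # {'bug1': 'dev_a', 'bug2': 'dev_b', 'bug3': 'dev_b'}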
76. Peiyuan Li, Naoyasu Ubayashi, Di Ai, Yu Ning Li, Shintaro Hosoai, Yasutaka Kamei, Sketch-Based Gradual Model-Driven Development, International Workshop on Innovative Software Development Methodologies and Practices (InnoSWDev 2014), pp.100-105, 2014.11.
77. Naoyasu Ubayashi, Di Ai, Peiyuan Li, Yu Ning Li, Shintaro Hosoai, Yasutaka Kamei, Uncertainty-Aware Architectural Interface, International Workshop on Advanced Modularization Techniques (AOAsia/Pacific 2014), pp.4-6, 2014.11.
78. Akinori Ihara, Yasutaka Kamei, Masao Ohira, Ahmed E. Hassan, Naoyasu Ubayashi, Ken Ichi Matsumoto, Early identification of future committers in open source software projects, Proceedings - International Conference on Quality Software, 10.1109/QSIC.2014.30, 47-56, 2014.11, © 2014 IEEE. There exist two types of developers in Open Source Software (OSS) projects: 1) committers, who have permission to commit edited source code to the Version Control System (VCS), and 2) developers, who contribute source code but cannot commit to the VCS directly. In order to develop and evolve high quality OSS, projects are always in search of new committers. OSS projects often promote strong developers to become committers. When existing committers find strong developers, they propose their promotion to a committer role. Delaying the committer promotion might lead to strong developers departing from an OSS project and the project losing them. However, early committer promotion comes with its own slew of risks as well (e.g., the promotion of inexperienced developers). Hence, committer-promotion decisions are critical for the quality and successful evolution of OSS projects. In this paper, we examine the committer-promotion phenomenon for two OSS projects (Eclipse and Firefox). We find that the amount of activity by future committers was higher than the amount of activity by developers who did not become committers. We also find that some developers are promoted to a committer role very rapidly (within a few months) while some developers take more than a year to become a committer. Finally, we develop a committer-identification model to assist OSS projects in identifying future committers..
79. Akinori Ihara, Yasutaka Kamei, Masao Ohira, Ahmed E. Hassan, Naoyasu Ubayashi and Kenichi Matsumoto, Early Identification of Future Committers in Open Source Software Projects, International Conference on Quality Software (QSIC2014), pp.47-56, 2014.10.
80. Naoyasu Ubayashi, Di Ai, Peiyuan Li, Yu Ning Li, Shintaro Hosoai and Yasutaka Kamei, Abstraction-aware Verifying Compiler for Yet Another MDD, International Conference on Automated Software Engineering (ASE 2014) [new ideas paper track], pp.557-562, 2014.09.
81. On Measuring the Difficulty of Program Comprehension based on Cerebral Blood Flow
In this research, we aim to quantify the difficulty of program comprehension during source code reading. We use Near-Infrared Spectroscopy (NIRS) to measure brain activation. In an experiment with 10 subjects, 8 of them showed strong brain activation while reading strongly obfuscated programs that are extremely difficult to comprehend. We also normalized the data for each participant and aggregated them for statistical testing; a t-test showed a statistically significant difference..
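The statistical step described above (per-participant normalization followed by a t-test comparing the obfuscated and non-obfuscated reading conditions) can be sketched as below; the measurements are hypothetical and a paired test is assumed because each participant read both kinds of programs.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n_participants = 10

    # Hypothetical per-participant activation values (already normalized within each
    # participant, as the abstract describes) for the two reading conditions.
    plain      = rng.normal(0.0, 1.0, n_participants)
    obfuscated = plain + rng.normal(0.6, 0.5, n_participants)   # assumed higher activation

    # Paired t-test, since every participant read both kinds of programs.
    t, p = stats.ttest_rel(obfuscated, plain)
    print(f"t = {t:.2f}, p = {p:.4f}")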
82. Shuhei Ohsako, Yasutaka Kamei, Shintaro Hosoai, Weiqiang Kong, Kimitaka Kato, Akihiko Ishizuka, Kazutoshi Sakaguchi, Miyuki Kawataka, Yoshitsugu Morita, Naoyasu Ubayashi and Akira Fukuda, A Case Study on Introducing the Design Thinking into PBL, International Conference on Frontiers in Education: CS and CE (FECS 2014), 2014.07.
83. Takafumi Fukushima, Yasutaka Kamei, Shane McIntosh, Kazuhiro Yamashita and Naoyasu Ubayashi, An Empirical Study of Just-In-Time Defect Prediction Using Cross-Project Models, International Working Conference on Mining Software Repositories (MSR 2014), pp.172-181, 2014.06.
84. Kazuhiro Yamashita, Shane McIntosh, Yasutaka Kamei and Naoyasu Ubayashi, Magnet or Sticky?: An OSS Project-by-Project Typology, International Working Conference on Mining Software Repositories (MSR 2014), pp.344-347, 2014.06.
85. Takao Nakagawa, Yasutaka Kamei, Hidetake Uwano, Akito Monden, Kenichi Matsumoto and Daniel M. German, Quantifying Programmers' Mental Workload during Program Comprehension Based on Cerebral Blood Flow Measurement: A Controlled Experiment, International Conference on Software Engineering (ICSE2014), NIER Track, pp.448-451, 2014.06.
86. Shane Mcintosh, Yasutaka Kamei, Bram Adams and Ahmed E. Hassan, The Impact of Code Review Coverage and Code Review Participation on Software Quality: A Case Study of the Qt, VTK, and ITK Projects, International Working Conference on Mining Software Repositories (MSR 2014), pp.192-201, 2014.06.
87. Di Ai, Naoyasu Ubayashi, Peiyuan Li, Daisuke Yamamoto, Yu Ning Li, Shintaro Hosoai, Yasutaka Kamei, iArch: An IDE for Supporting Fluid Abstraction, International Conference on Modularity'14, Tool Demo Session, 2014.04.
88. Changyun Huang, Naoyasu Ubayashi and Yasutaka Kamei, Towards Language-Oriented Software Development, International Workshop on Open and Original Problems in Software Language Engineering (OOPSLE 2014), 2014.02.
89. Di Ai, Naoyasu Ubayashi, Peiyuan Li, Shintaro Hosoai and Yasutaka Kamei, iArch - An IDE for Supporting Abstraction-aware Design Traceability, International Conference on Model-Driven Engineering and Software Development (MODELSWARD2014), Poster Session, 2014.01.
90. Emad Shihab, Yasutaka Kamei, Bram Adams, and Ahmed E. Hassan, Is Lines of Code a Good Measure of Effort in Effort-Aware Models?, Information and Software Technology, Vol.55, No.11, 2013.11.
91. Emad Shihab, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan, Is lines of code a good measure of effort in effort-aware models?, Information and Software Technology, 10.1016/j.infsof.2013.06.002, 55, 11, 1981-1993, 2013.11, Context Effort-aware models, e.g., effort-aware bug prediction models aim to help practitioners identify and prioritize buggy software locations according to the effort involved with fixing the bugs. Since the effort of current bugs is not yet known and the effort of past bugs is typically not explicitly recorded, effort-aware bug prediction models are forced to use approximations, such as the number of lines of code (LOC) of the predicted files. Objective Although the choice of these approximations is critical for the performance of the prediction models, there is no empirical evidence on whether LOC is actually a good approximation. Therefore, in this paper, we investigate the question: is LOC a good measure of effort for use in effort-aware models? Method We perform an empirical study on four open source projects, for which we obtain explicitly-recorded effort data, and compare the use of LOC to various complexity, size and churn metrics as measures of effort. Results We find that using a combination of complexity, size and churn metrics is a better measure of effort than using LOC alone. Furthermore, we examine the impact of our findings on previous effort-aware bug prediction work and find that using LOC as a measure for effort does not significantly affect the list of files being flagged; however, using LOC under-estimates the amount of effort required compared to our best effort predictor by approximately 66%. Conclusion Studies using effort-aware models should not assume that LOC is a good measure of effort. For the case of effort-aware bug prediction, using LOC provides results that are similar to combining complexity, churn, size and LOC as a proxy for effort when prioritizing the most risky files. However, we find that for the purpose of effort-estimation, using LOC may under-estimate the amount of effort required. © 2013 Elsevier B.V. All rights reserved..
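Effort-aware prioritization of the kind studied above amounts to ordering files by predicted defect risk per unit of estimated effort. The sketch below contrasts LOC alone with a combined effort proxy (an equal-weight sum of normalized LOC, churn and complexity); the metric values and the weighting are illustrative assumptions, not the paper's models.

    import numpy as np

    def rank_by_risk_density(risk, effort):
        """Return file indices ordered so the most risk per unit effort comes first."""
        return np.argsort(-(np.asarray(risk) / np.asarray(effort, float)))

    # Hypothetical files: predicted defect risk plus three candidate effort metrics.
    risk       = np.array([0.9, 0.7, 0.4, 0.2])
    loc        = np.array([1200,  300,  150,  800])
    churn      = np.array([  40,  200,   10,   30])
    complexity = np.array([  25,   35,    5,   12])

    # Effort proxy 1: LOC only.  Effort proxy 2: equal-weight combination of
    # min-max normalized LOC, churn and complexity (an assumed, simple combination).
    norm = lambda m: (m - m.min()) / (m.max() - m.min())
    combined = norm(loc) + norm(churn) + norm(complexity) + 1e-6  # avoid divide-by-zero

    print("LOC-based order:     ", rank_by_risk_density(risk, loc))
    print("Combined-based order:", rank_by_risk_density(risk, combined))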
92. Emad Shihab, Akinori Ihara, Yasutaka Kamei, Walid M. Ibrahim, Masao Ohira, Bram Adams, Ahmed E. Hassan and Ken-ichi Matsumoto, Studying Re-opened Bugs in Open Source Software, Journal of Empirical Software Engineering, Vol.18, No.5, pp.1005-1042, 2013.10.
93. Emad Shihab, Akinori Ihara, Yasutaka Kamei, Walid M. Ibrahim, Masao Ohira, Bram Adams, Ahmed E. Hassan, Ken Ichi Matsumoto, Studying re-opened bugs in open source software, Empirical Software Engineering, 10.1007/s10664-012-9228-6, 18, 5, 1005-1042, 2013.10, Bug fixing accounts for a large amount of the software maintenance resources. Generally, bugs are reported, fixed, verified and closed. However, in some cases bugs have to be re-opened. Re-opened bugs increase maintenance costs, degrade the overall user-perceived quality of the software and lead to unnecessary rework by busy practitioners. In this paper, we study and predict re-opened bugs through a case study on three large open source projects - namely Eclipse, Apache and OpenOffice. We structure our study along four dimensions: (1) the work habits dimension (e.g., the weekday on which the bug was initially closed), (2) the bug report dimension (e.g., the component in which the bug was found) (3) the bug fix dimension (e.g., the amount of time it took to perform the initial fix) and (4) the team dimension (e.g., the experience of the bug fixer). We build decision trees using the aforementioned factors that aim to predict re-opened bugs. We perform top node analysis to determine which factors are the most important indicators of whether or not a bug will be re-opened. Our study shows that the comment text and last status of the bug when it is initially closed are the most important factors related to whether or not a bug will be re-opened. Using a combination of these dimensions, we can build explainable prediction models that can achieve a precision between 52.1-78.6 % and a recall in the range of 70.5-94.1 % when predicting whether a bug will be re-opened. We find that the factors that best indicate which bugs might be re-opened vary based on the project. The comment text is the most important factor for the Eclipse and OpenOffice projects, while the last status is the most important one for Apache. These factors should be closely examined in order to reduce maintenance cost due to re-opened bugs. © 2012 Springer Science+Business Media, LLC..
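A minimal sketch of the prediction setup described above: a decision tree trained on factors drawn from the work-habits, bug-report, bug-fix and team dimensions to predict whether a closed bug will be re-opened. The factor encoding and data below are hypothetical stand-ins, not the study's dataset.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import precision_score, recall_score

    rng = np.random.default_rng(2)
    n = 400  # hypothetical closed bug reports

    X = np.column_stack([
        rng.integers(0, 7, n),        # work habits: weekday the bug was closed on
        rng.integers(0, 5, n),        # bug report: component id
        rng.exponential(5, n),        # bug fix: days spent on the initial fix
        rng.integers(1, 50, n),       # team: fixer's experience (previous fixes)
        rng.poisson(3, n),            # bug report: number of comments when closed
    ])
    y = ((X[:, 4] > 4) & (X[:, 2] > 3)).astype(int)   # re-opened? (synthetic label)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
    pred = tree.predict(X_te)
    print("precision:", round(precision_score(y_te, pred, zero_division=0), 2))
    print("recall:   ", round(recall_score(y_te, pred, zero_division=0), 2))

Top node analysis, as used in the paper, then inspects which factors appear closest to the root of such trees.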
94. Changyun Huang, Yasutaka Kamei, Kazuhiro Yamashita and Naoyasu Ubayashi, Using Alloy to Support Feature-Based DSL Construction for Mining Software Repositories, International Workshop on Model-driven Approaches in Software Product Line Engineering and Workshop on Scalable Modeling Techniques for Software Product Lines (MAPLE/SCALE 2013), 2013.08.
95. Experimental Evaluation of Eleven Fault-Density Models.
96. Masateru Tsunoda, Kyohei Fushida, Yasutaka Kamei, Masahide Nakamura, Kohei Mitsui, Keita Goto, and Ken-ichi Matsumoto, An Authentication Method with Spatiotemporal Interval and Partial Matching, International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2013), 2013.07.
97. Tetsuya Oishi, Weiqiang Kong, Yasutaka Kamei, Norimichi Hiroshige, Naoyasu Ubayashi and Akira Fukuda, An Empirical Study on Remote Lectures Using Video Conferencing Systems, International Conference on Frontiers in Education: CS and CE (FECS 2013), 2013.07.
98. Yasutaka Kamei, Emad Shihab, Bram Adams, Ahmed E. Hassan, Audris Mockus, Anand Sinha and Naoyasu Ubayashi, A Large-Scale Empirical Study of Just-In-Time Quality Assurance, IEEE Transactions on Software Engineering, Vol.39, No.6, pp.757-773, 2013.06.
99. Naoyasu Ubayashi and Yasutaka Kamei, Design Module: A Modularity Vision Beyond Code -Not Only Program Code But Also a Design Model Is a Module-, International Workshop on Modeling in Software Engineering (MiSE2013), 2013.05.
100. Changyun Huang, Kazuhiro Yamashita, Yasutaka Kamei, Kenji Hisazumi and Naoyasu Ubayashi, Domain Analysis for Mining Software Repositories -Towards Feature-based DSL Construction-, International Workshop on Product LinE Approaches in Software Engineering (PLEASE 2013), 2013.05.
101. Masateru Tsunoda, Koji Toda, Kyohei Fushida, Yasutaka Kamei, Meiyappan Nagappan and Naoyasu Ubayashi, Revisiting Software Development Effort Estimation Based on Early Phase Development Activities, International Working Conference on Mining Software Repositories (MSR 2013), 2013.05.
102. Tetsuya Oishi, Yasutaka Kamei, Weiqiang Kong, Norimichi Hiroshige, Naoyasu Ubayashi, Akira Fukuda, An Experience Report on Remote Lecture Using Multi-point Control Unit, International Conference on Education and Teaching (ICET 2013), pp.1-8, 2013.03.
103. Naoyasu Ubayashi and Yasutaka Kamei, UML-based Design and Verification Method for Developing Dependable Context-Aware Systems, International Conference on Model-Driven Engineering and Software Development (MODELSWARD 2013), pp.89-94, 2013.02.
104. Akito Monden, Jacky Keung, Shuji Morisaki, Yasutaka Kamei and Kenichi Matsumoto, A Heuristic Rule Reduction Approach to Software Fault-proneness Prediction, Asia-Pacific Software Engineering Conference (APSEC 2012), pp.838-847, 2012.12.
105. Akinori Ihara, Yasutaka Kamei, Akito Monden, Masao Ohira, Jacky Keung, Naoyasu Ubayashi and Kenichi Matsumoto, An Investigation on Software Bug Fix Prediction for Open Source Software Projects -A Case Study on the Eclipse Project-, International Workshop on Software Analysis, Testing and Applications (SATA2012), pp.112-119, 2012.12.
106. Phiradet Bangcharoensap, Akinori Ihara, Yasutaka Kamei, Ken-ichi Matsumoto, Locating Source Code to be Fixed based on Initial Bug Reports -A Case Study on the Eclipse Project, International Workshop on Empirical Software Engineering in Practice (IWESEP2012), pp.10-15, 2012.10.
107. Hiroki Nakamura, Rina Nagano, Kenji Hisazumi, Yasutaka Kamei, Naoyasu Ubayashi and Akira Fukuda, QORAL : External Domain-Specific Language for Mining Software Repositories., International Workshop on Empirical Software Engineering in Practice (IWESEP2012), pp.23-29, 2012.10.
108. Naoyasu Ubayashi and Yasutaka Kamei, UML4COP: UML-based DSML for Context-Aware Systems, International Workshop on Domain-Specific Modeling (DSM2012), pp.33-38, 2012.10.
109. Rina Nagano, Hiroki Nakamura, Yasutaka Kamei, Bram Adams, Kenji Hisazumi, Naoyasu Ubayashi and Akira Fukuda, Using the GPGPU for Scaling Up Mining Software Repositories, International Conference on Software Engineering (ICSE2012), Poster Session, pp.1435-1436, 2012.06.
110. Naoyasu Ubayashi, Yasutaka Kamei, Verifiable Architectural Interface for Supporting Model-Driven Development with Adequate Abstraction Level, International Workshop on Modeling in Software Engineering (MiSE2012), pp.15-21, 2012.06.
111. Naoyasu Ubayashi, Yasutaka Kamei, An Extensible Aspect-oriented Modeling Environment for Constructing Domain-Specific Languages, IEICE Transactions on Information and Systems, Vol.E95-D No.4 pp.942-958., 2012.04.
112. Naoyasu Ubayashi, Yasutaka Kamei, An extensible aspect-oriented modeling environment for constructing domain-specific languages, IEICE Transactions on Information and Systems, 10.1587/transinf.E95.D.942, E95-D, 4, 942-958, 2012.04, AspectM, an aspect oriented modeling (AOM) language, provides not only basic modeling constructs but also an extension mechanism called metamodel access protocol (MMAP) that allows a modeler to modify the metamodel. MMAP consists of metamodel extension points, extension operations, and primitive predicates for navigating the metamodel. Although the notion of MMAP is useful, it needs tool support. This paper proposes a method for implementing a MMAP based AspectM support tool. It consists of model editor, model weaver, and model verifier. We introduce the notion of edit-time structural reflection and extensible model weaving. Using these mechanisms, a modeler can easily construct domain-specific languages (DSLs). We show a case study using the AspectM support tool and discuss the effectiveness of the extension mechanism provided by MMAP. As a case study, we show a UML based DSL for describing the external contexts of embedded systems. Copyright © 2012 The Institute of Electronics, Information and Communication Engineers..
113. Naoyasu Ubayashi and Yasutaka Kamei, Architectural Point Mapping for Design Traceability, Foundations of Aspect-Oriented Languages workshop (FOAL2012), pp.39-44, 2012.03.
114. Fault-prone Module Prediction Across Software Development Projects : Lessons Learned from 18 Projects.
115. A Candidate Committer Prediction Based on Developer Activities in Open Source Software Projects.
116. dcNavi: A Concern-oriented Recommendation System for Debugging Support
Programmers tend to spend a lot of time debugging code. They check the erroneous phenomena, navigate the code, search past bug fixes, and modify the code. If a sequence of these debug activities can be automated, programmers can use their time for more creative tasks. To deal with this problem, we propose dcNavi (Debug Concern Navigator), a concern-oriented recommendation system for debugging. dcNavi provides appropriate hints to programmers according to their debug concerns by mining a repository containing not only program information but also test results and program modification history. In this paper, we evaluate the effectiveness of our approach in terms of the reusability of past bug fixes by using nine open source repositories created in the Eclipse plug-in projects..
117. Hidetake Uwano, Yasutaka Kamei, Akito Monden, Ken-Ichi Matsumoto, An Analysis of Cost-overrun Projects using Financial Data and Software Metrics, The Joint Conference of the 21th International Workshop on Software Measurement and the 6th International Conference on Software Process and Product Measurement (IWSM/MENSURA2011), pp.227-232, 2011.11.
118. Yasutaka Kamei, Hiroki Sato, Akito Monden, Shinji Kawaguchi, Hidetake Uwano, Masataka Nagura, Ken-Ichi Matsumoto, Naoyasu Ubayashi, An Empirical Study of Fault Prediction with Code Clone Metrics, The Joint Conference of the 21th International Workshop on Software Measurement and the 6th International Conference on Software Process and Product Measurement (IWSM/MENSURA2011), pp.55-61, 2011.11.
119. Ryosuke Nakashiro, Yasutaka Kamei, Naoyasu Ubayashi, Shin Nakajima, Akihito Iwai, Translation Pattern of BPEL Process into Promela Code, The Joint Conference of the 21th International Workshop on Software Measurement and the 6th International Conference on Software Process and Product Measurement (IWSM/MENSURA2011), pp.285-290, 2011.11.
120. Emad Shihab, Audris Mockus, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan,, High-Impact Defects: A Study of Breakage and Surprise Defects, the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE2011), pp.300-310, 2011.09.
121. Naoyasu Ubayashi, Yasutaka Kamei, Masayuki Hirayama, Tetsuo Tamai, Context Analysis Method for Embedded Systems ---Exploring a Requirement Boundary between a System and Its Context, 3rd Workshop on Context-Oriented Programming (COP 2011), pp.143-152, 2011.08.
122. Shuji Morisaki, Yasutaka Kamei, and Ken-ichi Matsumoto, Experimental Evaluation of Effect of Specifying a Focused Defect Classification in Software Inspection, JSSST Journal, Vol.28, No.3, pp.173-178, 2011.08.
123. Shizuka Uchio, Naoyasu Ubayashi, Yasutaka Kamei, CJAdviser: SMT-based Debugging Support for ContextJ*, 3rd Workshop on Context-Oriented Programming (COP 2011), pp.1-6, 2011.07.
124. Masaru Shiozuka, Naoyasu Ubayashi, Yasutaka Kamei, Debug Concern Navigator, the 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE 2011), pp.197-202, 2011.07.
125. Naoyasu Ubayashi, Yasutaka Kamei, Stepwise Context Boundary Exploration Using Guide Words, the 23rd International Conference on Advanced Information Systems Engineering (CAiSE 2011 Forum), pp.131-138., 2011.06.
126. Shane McIntosh, Bram Adams, Thanh H. D. Nguyen, Yasutaka Kamei and Ahmed E. Hassan, An Empirical Study of Build Maintenance Effort, the 33rd International Conference on Software Engineering (ICSE2011), pp.141-150, 2011.05.
127. Analyzing Software Reliability Based on Developer Metrics.
128. Analyzing Software Reliability Based on Developer Metrics.
129. A Replicated Experiment to Fault-Prone Module Detection with Clone Metrics.
130. A Replicated Experiment to Fault-Prone Module Detection with Clone Metrics.
131. A Code Review Technique to Reduce Fix Assurance Test Size
In testing phases of software development projects, detected defects are fixed by modifying artifacts, including source code. Most detected defects require test cases both to confirm that the modification is correct and to confirm that the modification does not introduce new defects. In this paper, we propose a code reading technique to reduce the size of such testing. The proposed method preferentially detects defects that potentially require a larger amount of testing, by giving the reviewer information that helps estimate the size of testing. In an evaluation experiment with 18 subjects, including 6 commercial software developers, the proposed technique reduced the size of testing by a factor of 2.1 compared with test-case-based reading and by a factor of 1.9 compared with ad-hoc reading..
132. Modeling of Test Effort Allocation and Software Reliability in Fault-prone Module Detection
Various fault-prone detection models have been proposed to improve software reliability. However, while improvements in prediction accuracy have been discussed, there has been little discussion about how the models should be used in the field, i.e., how test effort should be allocated. Thus, the improvement in software reliability achieved by fault-prone module detection was not clear. In this paper, we propose the TEAR (Test Effort Allocation and software Reliability) model, which represents the relationship among fault-prone module detection, test effort allocation and software reliability. The results of simulations based on the TEAR model showed that greater test effort should be allocated to fault-prone modules when prediction accuracy is high and/or when the number of faulty modules is small. On the other hand, fault-prone module detection should not be used when prediction accuracy is low or the number of faulty modules is large..
133. Modeling of Test Effort Allocation and Software Reliability in Fault-prone Module Detection
Various fault-prone detection models have been proposed to improve software reliability. However, while improvements in prediction accuracy have been discussed, there has been little discussion about how the models should be used in the field, i.e., how test effort should be allocated. Thus, the improvement in software reliability achieved by fault-prone module detection was not clear. In this paper, we propose the TEAR (Test Effort Allocation and software Reliability) model, which represents the relationship among fault-prone module detection, test effort allocation and software reliability. The results of simulations based on the TEAR model showed that greater test effort should be allocated to fault-prone modules when prediction accuracy is high and/or when the number of faulty modules is small. On the other hand, fault-prone module detection should not be used when prediction accuracy is low or the number of faulty modules is large..
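The TEAR model itself is not reproduced here. The sketch below is a much simpler, hypothetical simulation of the same trade-off: given a predictor of a given accuracy and a given proportion of faulty modules, it compares concentrating test effort on modules flagged as fault-prone against spreading effort evenly, under an assumed exponential fault-detection curve.

    import numpy as np

    def expected_faults_found(n_modules=1000, faulty_rate=0.1, accuracy=0.85,
                              total_effort=1000.0, extra_to_flagged=2.0, seed=0):
        """Toy simulation (not the TEAR model): flagged modules receive
        `extra_to_flagged` times the test effort of unflagged modules."""
        rng = np.random.default_rng(seed)
        faulty = rng.random(n_modules) < faulty_rate
        # A predictor that labels each module correctly with probability `accuracy`.
        flagged = np.where(rng.random(n_modules) < accuracy, faulty, ~faulty)
        weights = np.where(flagged, extra_to_flagged, 1.0)
        effort = total_effort * weights / weights.sum()
        # Assumed detection model: P(fault is found) = 1 - exp(-effort on the module).
        p_detect = 1.0 - np.exp(-effort)
        return float((p_detect * faulty).sum())

    even = expected_faults_found(extra_to_flagged=1.0)
    focused = expected_faults_found(extra_to_flagged=2.0)
    print(round(even, 1), round(focused, 1))   # focused allocation finds more faults here

Lowering `accuracy` or raising `faulty_rate` in this toy setting shrinks or reverses the advantage of the focused allocation, which mirrors the qualitative conclusion stated in the abstract.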
134. The Effect of Collaborative Filtering on Software Component Recommendation
To clarify the effect of collaborative filtering (CF) on recommending high-generality / low-generality software components, we experimentally verified two hypotheses: (1) the recommendation accuracy of CF for high-generality components is better than that of conventional methods (a random algorithm and a user-average algorithm), and (2) the recommendation accuracy of CF for low-generality components is better than that of the conventional methods. We evaluated the recommendation accuracy of CF with a dataset containing 29 open source software development projects (including 2,558 used components). As a result, hypothesis (2) was supported: for low-generality components, CF outperformed the conventional methods and the median NDPM improved from 0.55 to 0.33..
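User-based collaborative filtering in this setting treats projects as "users" and components as "items". The sketch below scores unseen components for a target project from the component-usage vectors of its most similar projects (cosine similarity); the 0/1 usage matrix, the similarity measure and the neighbourhood size are assumptions for illustration, not the study's exact algorithm.

    import numpy as np

    def recommend(usage, target, k=2):
        """Rank components for `target` project using its k most similar projects.

        `usage` is a projects x components 0/1 matrix of past component use."""
        usage = np.asarray(usage, float)
        norms = np.linalg.norm(usage, axis=1, keepdims=True) + 1e-12
        sim = (usage @ usage.T) / (norms * norms.T)        # cosine similarity
        neighbours = [p for p in np.argsort(-sim[target]) if p != target][:k]
        scores = sim[target, neighbours] @ usage[neighbours]
        scores[usage[target] > 0] = -np.inf                # hide already-used components
        return np.argsort(-scores)

    # Toy usage matrix: 4 projects x 5 components.
    usage = [[1, 1, 0, 0, 1],
             [1, 1, 1, 0, 0],
             [0, 1, 1, 1, 0],
             [1, 0, 0, 0, 1]]
    print(recommend(usage, target=3))   # components ranked for project 3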