Kyushu University Researcher Information
Publication List
Yasutaka Kamei (亀井 靖高)   Data last updated: 2024.03.12

Professor / Advanced Software Engineering, Department of Advanced Information Technology, Faculty of Information Science and Electrical Engineering


Original Papers
1. Dong Wang, Masanari Kondo, Yasutaka Kamei, Raula Gaikovina Kula, Naoyasu Ubayashi, When conversations turn into work: a taxonomy of converted discussions and issues in GitHub., Empirical Software Engineering, 10.1007/s10664-023-10366-z, 28, 6, 138-138, 2023.11.
2. Dong Wang, Tao Xiao, Teyon Son, Raula Gaikovina Kula, Takashi Ishio, Yasutaka Kamei, Kenichi Matsumoto, More than React: Investigating the Role of Emoji Reaction in GitHub Pull Requests., Empirical Software Engineering, 10.1007/s10664-023-10336-5, 28, 5, 123-123, 2023.10.
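
The study above mines emoji reactions on GitHub pull requests. As a rough illustration of the first mining step (not the authors' pipeline), reactions can be fetched through GitHub's REST API; pull requests share the issues endpoint, and the owner/repo/number below are placeholders.

```python
# Sketch: counting emoji reactions on a GitHub issue or pull request.
# Hypothetical target; set GITHUB_TOKEN in the environment before running.
import os
from collections import Counter

import requests

OWNER, REPO, NUMBER = "octocat", "hello-world", 1  # placeholders

url = f"https://api.github.com/repos/{OWNER}/{REPO}/issues/{NUMBER}/reactions"
headers = {
    "Accept": "application/vnd.github+json",
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
}
reactions = requests.get(url, headers=headers, timeout=30).json()

# Each reaction object carries a `content` field such as "+1" or "heart".
print(Counter(r["content"] for r in reactions))
```
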
3. Olivier Nourry, Yutaro Kashiwa, Bin Lin, Gabriele Bavota, Michele Lanza, Yasutaka Kamei, The Human Side of Fuzzing: Challenges Faced by Developers During Fuzzing Activities, ACM Transactions on Software Engineering and Methodology, 10.1145/3611668, 2023.08, Fuzz testing, also known as fuzzing, is a software testing technique aimed at identifying software vulnerabilities. In recent decades, fuzzing has gained increasing popularity in the research community. However, existing studies led by fuzzing experts mainly focus on improving the coverage and performance of fuzzing techniques. That is, there is still a gap in empirical knowledge regarding fuzzing, especially about the challenges developers face when they adopt fuzzing. Understanding these challenges can provide valuable insights to both practitioners and researchers on how to further improve fuzzing processes and techniques. We conducted a study to understand the challenges encountered by developers during fuzzing. More specifically, we first manually analyzed 829 randomly sampled fuzzing-related GitHub issues and constructed a taxonomy consisting of 39 types of challenges (22 related to the fuzzing process itself, 17 related to using external fuzzing providers). We then surveyed 106 fuzzing practitioners to verify the validity of our taxonomy and collected feedback on how the fuzzing process can be improved. Our taxonomy, accompanied with representative examples and highlighted implications, can serve as a reference point on how to better adopt fuzzing techniques for practitioners, and indicates potential directions researchers can work on toward better fuzzing approaches and practices..
4. 松田 雄河, 山手 響介, 近藤 将成, 柏 祐太郎, 亀井 靖高, 鵜林 尚靖, An Analysis of How Execution-Path-Aware Automated Test Case Generation Affects Automated Program Repair, Computer Software, 10.11309/jssst.40.1_45, 40, 1, 1_45-1_56, 2023.01, If test suites produced by automated test case generation are useful for test-suite-based automated program repair, the cost of generating patches can be reduced. One family of automated test generation techniques produces a test suite for a class given as input. The goal of this study is to clarify which class should be given as input when automated test case generation is used for automated program repair. To this end, we investigated the relationship between failing test suites and the classes that developers actually fixed. We found that when the class targeted by a failing test suite differs from the class fixed by developers, the cause is that the fixed class lies on the execution path of the failing test cases. Based on this observation, we then investigated how automatically generated test suites affect the results of automated program repair. Using test suites that were generated with the classes on the execution paths of failing test cases taken into account reduces the number of generated patches but can increase the number of correct fixes.
5. 秋山 楽登, 中村 司, 近藤 将成, 亀井 靖高, 鵜林 尚靖, A Case Study on Automatic Generation of Debugging Exercises Using the Bug-Fix Histories of Novice Programmers, Computer Software, 10.11309/jssst.39.4_10, 39, 4, 4_10-4_16, 2022.10, Research on debugging support for novice programmers has been active in recent years. However, learning support that provides debugging exercises capturing the bug tendencies of novices has not been studied. This study aims to generate such exercises. We focus on Learning-Mutation, a technique that applies machine translation to source code before and after real bug fixes in order to learn and generate bugs. We applied Learning-Mutation to data from novice programmers at Kyushu University and evaluated whether the generated bugs can be used to create debugging exercises by comparing them with real bugs. We found that when the number of tokens is small, the generated bugs resemble real ones, with missing semicolons and undeclared variables or functions accounting for more than 36%. On the other hand, as the number of tokens grows, the likelihood of injecting bugs that differ from real ones increases. We also found that increasing the beam width of beam search brings the distribution of generated bugs closer to that of actual novice bugs.
6. Olivier Nourry, Yutaro Kashiwa, Bin Lin, Gabriele Bavota, Michele Lanza, Yasutaka Kamei, AIP: Scalable and Reproducible Execution Traces in Energy Studies on Mobile Devices, 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), 10.1109/icsme55016.2022.00057, 2022.10.
7. Yutaro Kashiwa, Ryoma Nishikawa, Yasutaka Kamei, Masanari Kondo, Emad Shihab, Ryosuke Sato, Naoyasu Ubayashi, An empirical study on self-admitted technical debt in modern code review., Inf. Softw. Technol., 10.1016/j.infsof.2022.106855, 146, 106855-106855, 2022.06.
8. Hiroki Kuramoto, Masanari Kondo, Yutaro Kashiwa, Yuta Ishimoto, Kaze Shindo, Yasutaka Kamei, Naoyasu Ubayashi, Do visual issue reports help developers fix bugs?, Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, 10.1145/3524610.3527882, 2022.05.
9. 新堂 風, 近藤 将成, 柏 祐太郎, 東 英明, 柗本 真佑, 亀井 靖高, 鵜林 尚靖, An Investigation of Self-Admitted Technical Debt Removal in Container Virtualization Technology, IPSJ Journal, 10.20729/00217598, 63, 4, 949-959, 2022.04, Developers occasionally leave bugs and issues that need to be resolved in source code on purpose, for reasons such as a lack of development effort. Self-Admitted Technical Debt (SATD) refers to such bugs and issues. As it is pivotal to reduce SATD in source code, researchers have been investigating the additions and deletions of SATD. A prior study found SATD in Docker, one of the container virtualization technologies attracting attention in recent years. Although Docker is an important technology because of the shift to cloud computing, no prior work studies the deletion of SATD in Docker. Hence, in this study, we aim to reveal the process of SATD deletion in Docker to support development with Docker. We investigate the characteristics of SATD deletions in Dockerfiles, the text files used to build container images, in the top 250 most popular repositories on Docker Hub. We found that about 40.7% of the SATD in the Dockerfiles is resolved, with a lifetime of 67 days on median and 166 days on average. We also found many SATD comments that request a review of the Dockerfile itself, as well as SATD caused by external systems. To encourage developers to resolve SATD caused by external systems as quickly as possible, we created a tool that detects changes in external systems and notifies developers when such changes resolve the cause of the SATD..
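
As a toy illustration of how SATD comments like those studied above are typically detected, the sketch below scans Dockerfile comment lines with a keyword heuristic. The keyword list is a common heuristic from the SATD literature, not this paper's classifier, and real studies hand-validate the matches.

```python
# Sketch: flagging candidate SATD comments in a Dockerfile.
# Keyword heuristic only; matches would normally be validated manually.
import re

SATD_PATTERN = re.compile(r"\b(TODO|FIXME|HACK|XXX|workaround|temporary)\b", re.I)

def satd_comments(dockerfile_text: str) -> list[tuple[int, str]]:
    hits = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#") and SATD_PATTERN.search(stripped):
            hits.append((lineno, stripped))
    return hits

example = """\
FROM ubuntu:20.04
# TODO: pin this version once upstream fixes the build
RUN apt-get update && apt-get install -y curl
# HACK: temporary workaround for an issue in the base image
COPY . /app
"""
for lineno, comment in satd_comments(example):
    print(lineno, comment)
```
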
10. 沖野 健太郎, 松尾 春紀, 山本 大貴, 近藤 将成, 亀井 靖高, 鵜林 尚靖, A Performance Evaluation of Deep Learning Models in a Similar-Solution Source Code Retriever Based on Tree Edit Distance, IPSJ Journal, 10.20729/00217602, 63, 4, 986-998, 2022.04, Automatic program generation is an active research topic in software engineering. To make automatic program generation more practical, a prior study applies source code search to automatic program generation: source code whose tree structure is likely to be close to the desired source code is retrieved and used as a template. That study reported that generation with source code search produces more programs than generation without it. In this study, we focus on a source code retriever trained with the tree edit distance. By empirically investigating the factors that affect the performance of source code search, we aim to gain insights for improving the accuracy of automatic program generation. We examined three factors that affect search accuracy: the structure of the deep learning model, the input format of the source code, and the complexity of the problem, and compared search accuracy on AtCoder problems. We found that the encoder part of the Transformer is promising for similar-solution source code search, that vectorized representations of abstract syntax trees are not clearly effective for the AtCoder problems, and that problem complexity affects search accuracy..
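
The retriever above ranks solutions by how close their ASTs are in tree edit distance. True tree edit distance (e.g., Zhang-Shasha) is more involved; the sketch below swaps in a crude proxy that compares flattened sequences of AST node types, just to illustrate the ranking idea.

```python
# Sketch: ranking candidate solutions by AST-shape similarity to a query.
# A sequence-based proxy for tree edit distance, not the paper's retriever.
import ast
from difflib import SequenceMatcher

def ast_signature(src: str) -> list[str]:
    # Flattened sequence of node type names (ast.walk is breadth-first).
    return [type(node).__name__ for node in ast.walk(ast.parse(src))]

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, ast_signature(a), ast_signature(b)).ratio()

query = "print(sum(int(x) for x in input().split()))"
candidates = [
    "a, b = map(int, input().split())\nprint(a + b)",
    "for i in range(10):\n    print(i * i)",
]
ranked = sorted(candidates, key=lambda c: similarity(query, c), reverse=True)
print(ranked[0])  # the arithmetic-on-stdin candidate ranks first
```
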
11. Hideaki Azuma, Shinsuke Matsumoto, Yasutaka Kamei, Shinji Kusumoto, An empirical study on self-admitted technical debt in Dockerfiles., Empirical Software Engineering, 10.1007/s10664-021-10081-7, 27, 2, 49-49, 2022.01.
12. Olivier Nourry, Yutaro Kashiwa, Yasutaka Kamei, Naoyasu Ubayashi, Does shortening the release cycle affect refactoring activities: A case study of the JDT Core, Platform SWT, and UI projects., Information & Software Technology, 10.1016/j.infsof.2021.106623, 139, 106623-106623, 2021.11.
13. Jiayuan Zhou, Shaowei Wang, Yasutaka Kamei, Ahmed E. Hassan, Naoyasu Ubayashi, Studying donations and their expenses in open source projects: a case study of GitHub projects collecting donations through open collectives., Empirical Software Engineering, 10.1007/s10664-021-10060-y, 27, 1, 24-24, 2021.11.
14. Yutaro Kashiwa, Kazuki Shimizu, Bin Lin, Gabriele Bavota, Michele Lanza, Yasutaka Kamei, Naoyasu Ubayashi, Does Refactoring Break Tests and to What Extent?, ICSME, 10.1109/ICSME52107.2021.00022, 171-182, 2021.10.
15. Hassan Atwi, Bin Lin, Nikolaos Tsantalis, Yutaro Kashiwa, Yasutaka Kamei, Naoyasu Ubayashi, Gabriele Bavota, Michele Lanza, PYREF: Refactoring Detection in Python Projects., SCAM, 10.1109/SCAM52516.2021.00025, 136-141, 2021.09.
16. Sophia Quach, Maxime Lamothe, Yasutaka Kamei, Weiyi Shang, An empirical study on the use of SZZ for identifying inducing changes of non-functional bugs., Empirical Software Engineering, 10.1007/s10664-021-09970-8, 26, 4, 71-71, 2021.07.
17. Sophia Quach, Maxime Lamothe, Bram Adams, Yasutaka Kamei, Weiyi Shang, Evaluating the impact of falsely detected performance bug-inducing changes in JIT models., Empirical Software Engineering, 10.1007/s10664-021-10004-6, 26, 5, 97-97, 2021.07.
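
The two entries above evaluate SZZ-style identification of bug-inducing changes. SZZ's core step traces the lines deleted by a bug-fix commit back, via blame, to the commits that last touched them. A stripped-down sketch of that step follows, assuming a local git clone; production SZZ variants additionally filter comment, whitespace, and refactoring changes.

```python
# Sketch: the core SZZ step - blame the lines a fix commit deleted.
# Simplified; real SZZ variants filter cosmetic changes, refactorings, etc.
import re
import subprocess

def git(repo: str, *args: str) -> str:
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout

def inducing_candidates(repo: str, fix_commit: str) -> set[str]:
    candidates = set()
    diff = git(repo, "diff", f"{fix_commit}^", fix_commit, "--unified=0")
    current_file, old_line = None, 0
    for line in diff.splitlines():
        if line.startswith("--- a/"):
            current_file = line[6:]
        elif m := re.match(r"@@ -(\d+)", line):
            old_line = int(m.group(1))
        elif line.startswith("-") and not line.startswith("---") and current_file:
            # Blame this deleted line in the pre-fix version of the file.
            blame = git(repo, "blame", "--porcelain",
                        "-L", f"{old_line},{old_line}",
                        f"{fix_commit}^", "--", current_file)
            candidates.add(blame.split()[0])  # first token = commit hash
            old_line += 1
    return candidates
```
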
18. Gopi Krishnan Rajbahadur, Shaowei Wang, Yasutaka Kamei, Ahmed E. Hassan, Impact of Discretization Noise of the Dependent Variable on Machine Learning Classifiers in Software Engineering., IEEE Transactions on Software Engineering, 10.1109/TSE.2019.2924371, 47, 7, 1414-1430, 2021.07, Researchers usually discretize a continuous dependent variable into two target classes by introducing an artificial discretization threshold (e.g., median). However, such discretization may introduce noise (i.e., discretization noise) due to ambiguous class loyalty of data points that are close to the artificial threshold. Previous studies do not provide a clear directive on the impact of discretization noise on the classifiers and how to handle such noise. In this paper, we propose a framework to help researchers and practitioners systematically estimate the impact of discretization noise on classifiers in terms of its impact on various performance measures and the interpretation of classifiers. Through a case study of 7 software engineering datasets, we find that: 1) discretization noise affects the different performance measures of a classifier differently for different datasets; 2) though the interpretation of the classifiers is impacted by the discretization noise on the whole, the top 3 most important features are not affected by the discretization noise. Therefore, we suggest that practitioners and researchers use our framework to understand the impact of discretization noise on the performance of their built classifiers and estimate the exact amount of discretization noise to be discarded from the dataset to avoid the negative impact of such noise..
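
The framework's starting point is that dichotomizing a continuous outcome at an artificial threshold (e.g., the median) makes points near the threshold ambiguous. A toy illustration of locating that ambiguous region is below; the +/-5% band is an arbitrary choice for the sketch, not the paper's estimator.

```python
# Sketch: points near the median cut-off carry ambiguous class labels.
import numpy as np

rng = np.random.default_rng(0)
y_continuous = rng.lognormal(mean=2.0, sigma=0.7, size=1000)  # e.g., bug counts

threshold = np.median(y_continuous)
y_class = (y_continuous > threshold).astype(int)  # artificial dichotomization

# Flag points within +/-5% of the threshold as potential discretization noise.
band = 0.05 * threshold
noisy = np.abs(y_continuous - threshold) <= band
print(f"threshold={threshold:.2f}, ambiguous points: {noisy.mean():.1%}")
```
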
19. 亀井靖高, 清水一輝, 柏祐太郎, 佐藤亮介, 鵜林尚靖, An Empirical Analysis of the Evolution of README Files, IPSJ Journal (Web), 62, 4, 2021.04.
20. Jeongju Sohn, Yasutaka Kamei, Shane McIntosh, Shin Yoo, Leveraging Fault Localisation to Enhance Defect Prediction, 28th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 10.1109/SANER50967.2021.00034, 284-294, 2021.03.
21. Gopi Krishnan Rajbahadur, Shaowei Wang, Gustavo Ansaldi, Yasutaka Kamei, Ahmed E. Hassan, The impact of feature importance methods on the interpretation of defect classifiers, IEEE Transactions on Software Engineering, 10.1109/TSE.2021.3056941, 2021.02, Classifier specific (CS) and classifier agnostic (CA) feature importance methods are widely used (often interchangeably) by prior studies to derive feature importance ranks from a defect classifier. However, different feature importance methods are likely to compute different feature importance ranks even for the same dataset and classifier. Hence such interchangeable use of feature importance methods can lead to conclusion instabilities unless there is a strong agreement among different methods. Therefore, in this paper, we evaluate the agreement between the feature importance ranks associated with the studied classifiers through a case study of 18 software projects and six commonly used classifiers. We find that: 1) The computed feature importance ranks by CA and CS methods do not always strongly agree with each other. 2) The computed feature importance ranks by the studied CA methods exhibit a strong agreement including the features reported at top-1 and top-3 ranks for a given dataset and classifier, while even the commonly used CS methods yield vastly different feature importance ranks. Such findings raise concerns about the stability of conclusions across replicated studies. We further observe that the commonly used defect datasets are rife with feature interactions and these feature interactions impact the computed feature importance ranks of the CS methods (not the CA methods). We demonstrate that removing these feature interactions, even with simple methods like CFS improves agreement between the computed feature importance ranks of CA and CS methods. In light of our findings, we provide guidelines for stakeholders and practitioners when performing model interpretation and directions for future research, e.g., future research is needed to investigate the impact of advanced feature interaction removal methods on computed feature importance ranks of different CS methods..
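
The CS-versus-CA comparison above can be made concrete with a small experiment: a random forest's impurity-based scores (classifier-specific) against permutation importance (classifier-agnostic), with rank agreement measured by Kendall's tau. A minimal sketch on synthetic data, not the paper's 18-project setup:

```python
# Sketch: agreement between CS (impurity) and CA (permutation) importances.
from scipy.stats import kendalltau
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

cs_scores = model.feature_importances_                # classifier-specific
ca_scores = permutation_importance(model, X, y, n_repeats=10,
                                   random_state=0).importances_mean  # agnostic

# Kendall's tau is rank-based, so the raw scores can be compared directly.
tau, _ = kendalltau(cs_scores, ca_scores)
print(f"Kendall tau between CS and CA rankings: {tau:.2f}")
```
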
22. Ryujiro Nishinaka, Naoyasu Ubayashi, Yasutaka Kamei, Ryosuke Sato, How Fast and Effectively Can Code Change History Enrich Stack Overflow?, Proceedings - IEEE International Conference on Software Quality, Reliability and Security, QRS 2020, 10.1109/QRS51102.2020.00066, 467-478, 2020.12.
23. Yasutaka Kamei, Andy Zaidman, Guest editorial: Mining software repositories 2018, Empirical Software Engineering, 10.1007/s10664-020-09817-8, 25, 3, 2055-2057, 2020.05.
24. 村岡 北斗, 鵜林 尚靖, 亀井 靖高, 佐藤 亮介, An Empirical Analysis of Uncertainty Focusing on Reverts, IPSJ Journal, 2020.04.
25. Masanari Kondo, Cor-Paul Bezemer, Yasutaka Kamei, Ahmed E. Hassan, Osamu Mizuno, The impact of feature reduction techniques on defect prediction models., Empirical Software Engineering, 10.1007/s10664-018-9679-5, 24, 4, 1925-1963, 2019.08, Defect prediction is an important task for preserving software quality. Most prior work on defect prediction uses software features, such as the number of lines of code, to predict whether a file or commit will be defective in the future. There are several reasons to keep the number of features that are used in a defect prediction model small. For example, using a small number of features avoids the problem of multicollinearity and the so-called ‘curse of dimensionality’. Feature selection and reduction techniques can help to reduce the number of features in a model. Feature selection techniques reduce the number of features in a model by selecting the most important ones, while feature reduction techniques reduce the number of features by creating new, combined features from the original features. Several recent studies have investigated the impact of feature selection techniques on defect prediction. However, there do not exist large-scale studies in which the impact of multiple feature reduction techniques on defect prediction is investigated. In this paper, we study the impact of eight feature reduction techniques on the performance and the variance in performance of five supervised learning and five unsupervised defect prediction models. In addition, we compare the impact of the studied feature reduction techniques with the impact of the two best-performing feature selection techniques (according to prior work). The following findings are the highlights of our study: (1) The studied correlation and consistency-based feature selection techniques result in the best-performing supervised defect prediction models, while feature reduction techniques using neural network-based techniques (restricted Boltzmann machine and autoencoder) result in the best-performing unsupervised defect prediction models. In both cases, the defect prediction models that use the selected/generated features perform better than those that use the original features (in terms of AUC and performance variance). (2) Neural network-based feature reduction techniques generate features that have a small variance across both supervised and unsupervised defect prediction models. Hence, we recommend that practitioners who do not wish to choose a best-performing defect prediction model for their data use a neural network-based feature reduction technique..
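
The selection/reduction distinction the paper draws (keep a subset of original features versus construct new combined ones) can be shown in a few lines. A minimal sketch on synthetic data; the correlation-based selectors and neural reducers studied in the paper are swapped here for SelectKBest and PCA for brevity.

```python
# Sketch: feature selection keeps original features; reduction builds new ones.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=30, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, reducer in [("selection (top-5)", SelectKBest(f_classif, k=5)),
                      ("reduction (PCA-5)", PCA(n_components=5))]:
    Z_tr = reducer.fit_transform(X_tr, y_tr)
    Z_te = reducer.transform(X_te)
    clf = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(Z_te)[:, 1])
    print(f"{name}: AUC={auc:.3f}")
```
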
26. Naoyasu Ubayashi, Yasutaka Kamei, Ryosuke Sato, When and Why Do Software Developers Face Uncertainty?, Proceedings - 19th IEEE International Conference on Software Quality, Reliability and Security, QRS 2019, 10.1109/QRS.2019.00045, 288-299, 2019.07, Recently, many developers have begun to notice that uncertainty is a crucial problem in software development. Unfortunately, no one knows how often uncertainty appears or what kinds of uncertainty exist in actual projects, because there are no empirical studies on uncertainty. To deal with this problem, we conduct a large-scale empirical study analyzing commit messages and revision histories of 1,444 OSS projects randomly selected from the GitHub repositories. The main findings are as follows: 1) Uncertainty appears at a ratio of 1.44% (average); 2) Uncertain program behavior, uncertain variables/values/names, and uncertain program defects are the major kinds of uncertainty; and 3) Developers sometimes take an action not to resolve uncertainty but to escape or ignore it. Uncertainty exists everywhere in a certain percentage, and developers cannot ignore the existence of uncertainty..
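
The study above classifies uncertainty by reading commit messages. A toy version of the first mining step is sketched below; the keyword list is illustrative only (the paper's coding scheme was manual and far more elaborate), and the script assumes it is run inside a git clone.

```python
# Sketch: flagging commit messages that hint at developer uncertainty.
import re
import subprocess

UNCERTAIN = re.compile(r"\b(not sure|maybe|probably|might|unclear|"
                       r"don'?t know|for now|workaround)\b", re.I)

log = subprocess.run(["git", "log", "--format=%h%x09%s"],
                     capture_output=True, text=True, check=True).stdout

subjects = [line.split("\t", 1) for line in log.splitlines() if "\t" in line]
flagged = [(sha, subj) for sha, subj in subjects if UNCERTAIN.search(subj)]
print(f"{len(flagged)}/{len(subjects)} commit subjects hint at uncertainty")
for sha, subj in flagged[:10]:
    print(" ", sha, subj)
```
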
27. Giancarlo Sierra, Emad Shihab, Yasutaka Kamei, A survey of self-admitted technical debt, Journal of Systems and Software, 10.1016/j.jss.2019.02.056, 152, 70-82, 2019.06, Technical Debt is a metaphor used to express sub-optimal source code implementations that are introduced for short-term benefits that often need to be paid back later, at an increased cost. In recent years, various empirical studies have focused on investigating source code comments that indicate Technical Debt, often referred to as Self-Admitted Technical Debt (SATD). Since the introduction of SATD as a concept, an increasing number of studies have examined various aspects pertaining to SATD. Therefore, in this paper we survey research work on SATD, analyzing the characteristics of current approaches and techniques for SATD detection, comprehension, and repayment. To motivate the submission of novel and improved work, we compile tools, resources, and data sets made available to replicate or extend current SATD research. To set the stage for future work, we identify open challenges in the study of SATD, areas that are missing investigation, and discuss potential future research avenues..
28. Thong Hoang, Hoa Khanh Dam, Yasutaka Kamei, David Lo, Naoyasu Ubayashi, DeepJIT: An end-to-end deep learning framework for just-in-time defect prediction, IEEE International Working Conference on Mining Software Repositories, 10.1109/MSR.2019.00016, 2019-May, 34-45, 2019.05, Software quality assurance efforts often focus on identifying defective code. To find likely defective code early, change-level defect prediction - aka. Just-In-Time (JIT) defect prediction - has been proposed. JIT defect prediction models identify likely defective changes and they are trained using machine learning techniques with the assumption that historical changes are similar to future ones. Most existing JIT defect prediction approaches make use of manually engineered features. Unlike those approaches, in this paper, we propose an end-to-end deep learning framework, named DeepJIT, that automatically extracts features from commit messages and code changes and uses them to identify defects. Experiments on two popular software projects (i.e., QT and OPENSTACK) on three evaluation settings (i.e., cross-validation, short-period, and long-period) show that the best variant of DeepJIT (DeepJIT-Combined), compared with the best performing state-of-the-art approach, achieves improvements of 10.36-11.02% for the project QT and 9.51-13.69% for the project OPENSTACK in terms of the Area Under the Curve (AUC)..
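
DeepJIT learns features end-to-end from commit text and diffs; the baseline style it is compared against builds a classifier over manually engineered change metrics. A sketch of that baseline style follows; the metric names echo common JIT work (size, diffusion, developer experience) and the data is synthetic, so this is not the paper's model or dataset.

```python
# Sketch: a classical JIT defect predictor over engineered change metrics,
# the style of baseline that end-to-end models like DeepJIT are compared to.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
changes = pd.DataFrame({
    "lines_added": rng.poisson(30, n),
    "files_touched": rng.poisson(3, n),
    "dev_experience": rng.integers(1, 500, n),  # prior commits by the author
})
# Synthetic ground truth: big, scattered changes by newcomers are riskier.
risk = (0.02 * changes.lines_added + 0.3 * changes.files_touched
        - 0.004 * changes.dev_experience)
label = (risk + rng.normal(0, 1, n) > np.median(risk)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(changes, label, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
```
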
29. Naoyasu Ubayashi, Takuya Watanabe, Yasutaka Kamei, Ryosuke Sato, Git-based integrated uncertainty manager, Proceedings - 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion, ICSE-Companion 2019, 10.1109/ICSE-Companion.2019.00047, 95-98, 2019.05, Nowadays, many software systems are required to be updated and delivered in a short period of time. It is important for developers to make software embrace uncertainty, because user requirements or design decisions are not always completely determined. This paper introduces iArch-U, an Eclipse-based uncertainty-aware software development tool chain, for developers to properly describe, trace, and manage uncertainty crosscutting over UML modeling, Java programming, and testing phases. Integrating with Git, iArch-U can manage why/when/where uncertain concerns arise or are fixed to be certain in a project. In this tool demonstration, we show the world of uncertainty-aware software development using iArch-U. Our tool is open source software released from http://posl.github.io/iArch/..
30. Shaiful Alam Chowdhury, Abram Hindle, Rick Kazman, Takumi Shuto, Ken Matsui, Yasutaka Kamei, GreenBundle: An Empirical Study on the Energy Impact of Bundled Processing, Proceedings - International Conference on Software Engineering, 10.1109/ICSE.2019.00114, 2019-May, 1107-1118, 2019.05, Energy consumption is a concern in the data-center and at the edge, on mobile devices such as smartphones. Software that consumes too much energy threatens the utility of the end-user's mobile device. Energy consumption is fundamentally a systemic kind of performance and hence it should be addressed at design time via a software architecture that supports it, rather than after release, via some form of refactoring. Unfortunately developers often lack knowledge of what kinds of designs and architectures can help address software energy consumption. In this paper we show that some simple design choices can have significant effects on energy consumption. In particular we examine the Model-View-Controller architectural pattern and demonstrate how converting to Model-View-Presenter with bundling can improve the energy performance of both benchmark systems and real world applications. We show the relationship between energy consumption and bundled and delayed view updates: bundling events in the presenter can often reduce energy consumption by 30%..
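
The core mechanism studied above is bundling: instead of re-rendering the view on every model event, the presenter buffers events and flushes them periodically. The small simulation below only counts view updates (the paper's 30% figure is about energy measured on real devices); the 500 ms flush interval is an arbitrary choice for the sketch.

```python
# Sketch: bundling model events in the presenter reduces view updates.
import random

random.seed(0)
# Event timestamps (seconds) over a 10-second burst of model changes.
events = sorted(random.uniform(0, 10) for _ in range(200))

unbundled_renders = len(events)  # MVC-style: one view update per event

FLUSH_INTERVAL = 0.5  # presenter flushes buffered events every 500 ms
bundled_renders, next_flush, buffered = 0, FLUSH_INTERVAL, 0
for t in events:
    while t >= next_flush:          # flush boundaries that passed before t
        if buffered:
            bundled_renders += 1
            buffered = 0
        next_flush += FLUSH_INTERVAL
    buffered += 1
if buffered:                        # final partial flush
    bundled_renders += 1

print(f"unbundled: {unbundled_renders} renders, bundled: {bundled_renders}")
```
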
31. Hoa Khanh Dam, Truyen Tran, John Grundy, Aditya Ghose, Yasutaka Kamei, Towards effective AI-powered agile project management, Proceedings - 2019 IEEE/ACM 41st International Conference on Software Engineering: New Ideas and Emerging Results, ICSE-NIER 2019, 10.1109/ICSE-NIER.2019.00019, 41-44, 2019.05, The rise of Artificial intelligence (AI) has the potential to significantly transform the practice of project management. Project management has a large socio-technical element with many uncertainties arising from variability in human aspects, e.g. customers' needs, developers' performance and team dynamics. AI can assist project managers and team members by automating repetitive, high-volume tasks to enable project analytics for estimation and risk prediction, providing actionable recommendations, and even making decisions. AI is potentially a game changer for project management in helping to accelerate productivity and increase project success rates. In this paper, we propose a framework where AI technologies can be leveraged to offer support for managing agile projects, which have become increasingly popular in the industry..
32. Naoyasu Ubayashi, Yasutaka Kamei, Ryosuke Sato, iArch-U/MC: An uncertainty-aware model checker for embracing known unknowns, ICSOFT 2018 - Proceedings of the 13th International Conference on Software Technologies, 176-184, 2019.01, Embracing uncertainty in software development is one of the crucial research topics in software engineering. In most projects, we have to deal with uncertain concerns by using informal ways such as documents, mailing lists, or issue tracking systems. This task is tedious and error-prone. Especially, uncertainty in programming is one of the challenging issues to be tackled, because it is difficult to verify the correctness of a program when there are uncertain user requirements, unfixed design choices, and alternative algorithms. This paper proposes iArch-U/MC, an uncertainty-aware model checker for verifying whether or not some important properties are guaranteed even if Known Unknowns remain in a program. Our tool is based on LTSA (Labelled Transition System Analyzer) and is implemented as an Eclipse plug-in..
33. Yasutaka Kamei, Takahiro Matsumoto, Kazuhiro Yamashita, Naoyasu Ubayashi, Takashi Iwasaki, Shuichi Takayama, Studying the Cost and Effectiveness of OSS Quality Assessment Models: An Experience Report of Fujitsu QNET, IEICE Transactions on Information and Systems, 10.1587/transinf.2018EDP7163, E101D, 11, 2744-2753, 2018.11, Nowadays, open source software (OSS) systems are adopted by proprietary software projects. To reduce the risk of using problematic OSS systems (e.g., causing system crashes), it is important for proprietary software projects to assess OSS systems in advance. Therefore, OSS quality assessment models are studied to obtain information regarding the quality of OSS systems. Although the OSS quality assessment models are partially validated using a small number of case studies, to the best of our knowledge, there are few studies that empirically report how industrial projects actually use OSS quality assessment models in their own development process. In this study, we empirically evaluate the cost and effectiveness of OSS quality assessment models at Fujitsu Kyushu Network Technologies Limited (Fujitsu QNET). To conduct the empirical study, we collect datasets from (a) 120 OSS projects that Fujitsu QNET's projects actually used and (b) 10 problematic OSS projects that caused major problems in the projects. We find that (1) it takes average and median times of 51 and 49 minutes, respectively, to gather all assessment metrics per OSS project and (2) there is a possibility that we can filter problematic OSS systems by using the threshold derived from a pool of assessment metrics. Fujitsu QNET's developers agree that our results lead to improvements in Fujitsu QNET's OSS assessment process. We believe that our work significantly contributes to the empirical knowledge about applying OSS assessment techniques to industrial projects..
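
The filtering idea reported above derives thresholds from a pool of assessment metrics and flags candidate OSS projects that fall outside them. A toy version using percentile thresholds is sketched below; the metric names, pool, and cut-offs are invented for illustration, not Fujitsu QNET's actual metrics.

```python
# Sketch: flag OSS projects whose assessment metrics fall below thresholds
# derived from a pool of previously vetted projects.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical metric pool from 120 previously adopted OSS projects.
pool = {
    "commits_last_year": rng.poisson(400, 120),
    "active_maintainers": rng.poisson(5, 120),
}
# Use the 10th percentile of the vetted pool as a lower bound per metric.
thresholds = {m: np.percentile(v, 10) for m, v in pool.items()}

candidate = {"commits_last_year": 12, "active_maintainers": 1}
violations = [m for m, t in thresholds.items() if candidate[m] < t]
print("flagged as risky:" if violations else "passes:", violations)
```
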
34. Junji Shimagaki, Yasutaka Kamei, Naoyasu Ubayashi, Abram Hindle, Automatic topic classification of test cases using text mining at an Android smartphone vendor, Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2018, 10.1145/3239235.3268927, 2018.10, Background: An Android smartphone is an ecosystem of applications, drivers, operating system components, and assets. The volume of the software is large and the number of test cases needed to cover the functionality of an Android system is substantial. Enormous effort has been already taken to properly quantify "what features and apps were tested and verified?". This insight is provided by dashboards that summarize test coverage and results per feature. One method to achieve this is to manually tag or label test cases with the topic or function they cover, much like function points. At the studied Android smartphone vendor, tests are labelled with manually defined tags, so-called "feature labels (FLs)", and the FLs serve to categorize 100s to 1000s test cases into 10 to 50 groups. Aim: Unfortunately for developers, manual assignment of FLs to 1000s of test cases is a time-consuming task, leading to inaccurately labeled test cases, which will render the dashboard useless. We created an automated system that suggests tags/labels to the developers for their test cases rather than manual labeling. Method: We use machine learning models to predict and label the functionality tested by 10,000 test cases developed at the company. Results: Through the quantitative experiments, our models achieved acceptable F-1 performance of 0.3 to 0.88. Also through the qualitative studies with expert teams, we showed that the hierarchy and path of tests was a good predictor of a feature's label. Conclusions: We find that this method can reduce tedious manual effort that software developers spend classifying test cases, while providing more accurate classification results..
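
The study above found that a test's hierarchy and path are strong predictors of its feature label. A minimal sketch of that setup, with TF-IDF over test paths and a linear classifier; the paths and labels below are invented, not the vendor's data.

```python
# Sketch: predicting a test case's feature label from its path/name text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_paths = [
    "camera/photo/test_capture_hdr",
    "camera/video/test_record_4k",
    "telephony/calls/test_incoming_call",
    "telephony/sms/test_send_long_message",
]
train_labels = ["camera", "camera", "telephony", "telephony"]

model = make_pipeline(
    TfidfVectorizer(token_pattern=r"[A-Za-z0-9]+"),  # split on / and _
    LogisticRegression(max_iter=1000),
).fit(train_paths, train_labels)

print(model.predict(["camera/photo/test_zoom_levels"]))  # -> ['camera']
```
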
35. Takashi Watanabe, Akito Monden, Zeynep Yucel, Yasutaka Kamei, Shuji Morisaki, Cross-Validation-Based Association Rule Prioritization Metric for Software Defect Characterization, IEICE Transactions on Information and Systems, 10.1587/transinf.2018EDP7020, E101D, 9, 2269-2278, 2018.09, Association rule mining discovers relationships among variables in a data set, representing them as rules. These rules are expected to have predictive ability, that is, to be able to predict future events, but commonly used rule interestingness measures, such as support and confidence, do not directly assess their predictive power. This paper proposes a cross-validation-based metric that quantifies the predictive power of such rules for characterizing software defects. The results of evaluating this metric experimentally on four open-source data sets (Mylyn, NetBeans, Apache Ant and jEdit) show that it can improve rule prioritization performance over conventional metrics (support, confidence and odds ratio) by 72.8% for Mylyn, 15.0% for NetBeans, 10.5% for Apache Ant and 0% for jEdit in terms of the SumNormPre(100) precision criterion. This suggests that the proposed metric can provide better rule prioritization performance than conventional metrics and can at least provide similar performance even in the worst case..
36. Shane McIntosh, Yasutaka Kamei, Are Fix-Inducing Changes a Moving Target? A Longitudinal Case Study of Just-In-Time Defect Prediction, IEEE Transactions on Software Engineering, 10.1109/TSE.2017.2693980, 44, 5, 412-428, 2018.05, Just-In-Time (JIT) models identify fix-inducing code changes. JIT models are trained using techniques that assume that past fix-inducing changes are similar to future ones. However, this assumption may not hold, e.g., as system complexity tends to accrue, expertise may become more important as systems age. In this paper, we study JIT models as systems evolve. Through a longitudinal case study of 37,524 changes from the rapidly evolving Qt and OpenStack systems, we find that fluctuations in the properties of fix-inducing changes can impact the performance and interpretation of JIT models. More specifically: (a) the discriminatory power (AUC) and calibration (Brier) scores of JIT models drop considerably one year after being trained; (b) the role that code change properties (e.g., Size, Experience) play within JIT models fluctuates over time; and (c) those fluctuations yield over- and underestimates of the future impact of code change properties on the likelihood of inducing fixes. To avoid erroneous or misleading predictions, JIT models should be retrained using recently recorded data (within three months). Moreover, quality improvement plans should be informed by JIT models that are trained using six months (or more) of historical data, since they are more resilient to period-specific fluctuations in the importance of code change properties..
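
The longitudinal evaluation scheme behind these findings trains on a trailing window and tests on the following period, sliding forward over the project history. A sketch of that scheme over a synthetic change log (the six-month training and three-month testing windows follow the abstract's recommendations; the features and labels are invented):

```python
# Sketch: longitudinal JIT evaluation - train on a 6-month window,
# test on the following 3 months, then slide the window forward.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "when": pd.date_range("2015-01-01", periods=n, freq="3h"),
    "size": rng.poisson(40, n),
    "experience": rng.integers(1, 300, n),
})
df["buggy"] = (0.02 * df["size"] - 0.005 * df["experience"]
               + rng.normal(0, 1, n) > 0).astype(int)

start = df["when"].min()
while start + pd.DateOffset(months=9) <= df["when"].max():
    tr = df[(df.when >= start) & (df.when < start + pd.DateOffset(months=6))]
    te = df[(df.when >= start + pd.DateOffset(months=6))
            & (df.when < start + pd.DateOffset(months=9))]
    m = LogisticRegression(max_iter=1000).fit(tr[["size", "experience"]], tr.buggy)
    auc = roc_auc_score(te.buggy, m.predict_proba(te[["size", "experience"]])[:, 1])
    print(f"window starting {start.date()} -> test AUC {auc:.3f}")
    start += pd.DateOffset(months=3)
```
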
37. Ariel Rodriguez, Fumiya Tanaka, Yasutaka Kamei, Empirical study on the relationship between developer's working habits and efficiency, Proceedings - 2018 ACM/IEEE 15th International Conference on Mining Software Repositories, MSR 2018, 10.1145/3196398.3196458, 74-77, 2018.05, Software developers can have a reputation for frequently working long and irregular hours, which are widely considered to inhibit mental capacity and negatively affect work quality. This paper analyzes the working habits of software developers and the effects these habits have on efficiency, based on a large amount of data extracted from the actions of developers in the IDE (Integrated Development Environment) Visual Studio. We use events that record the times at which all developer actions were performed, along with the numbers of successful and failed build and test events. Due to the high level of detail of the events provided by the KaVE project's tool, we were able to analyze the data in a way that previous studies have not been able to. We structure our study along three dimensions: (1) day of the week, (2) time of the day, and (3) continuous work. Our findings will help software developers and team leaders to appropriately allocate working times and to maximize work quality..
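
The study's first two dimensions (day of week, time of day) map naturally onto timestamp grouping. A toy sketch with pandas follows; the event data here is randomly generated, not the KaVE dataset.

```python
# Sketch: summarizing developer activity by day of week and hour of day.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
events = pd.DataFrame({
    "timestamp": pd.to_datetime("2018-01-01")
                 + pd.to_timedelta(rng.integers(0, 90 * 24 * 3600, 10_000),
                                   unit="s"),
    "build_failed": rng.random(10_000) < 0.2,
})

by_day = events.groupby(events.timestamp.dt.day_name()).build_failed.mean()
by_hour = events.groupby(events.timestamp.dt.hour).build_failed.mean()
print(by_day.round(3))
print(by_hour.idxmax(), "is the hour with the highest failure rate")
```
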
38. Naoyasu Ubayashi, Hokuto Muraoka, Daiki Muramoto, Yasutaka Kamei, Ryosuke Sato, Poster: Exploring uncertainty in GitHub OSS projects: When and how do developers face uncertainty?, Proceedings - International Conference on Software Engineering, 10.1145/3183440.3194966, 272-273, 2018.05, Recently, many developers have begun to notice that uncertainty is a crucial problem in software development. Unfortunately, no one knows how often uncertainty appears or what kinds of uncertainty exist in actual projects, because there are no empirical studies on uncertainty. To deal with this problem, we conduct a large-scale empirical study analyzing commit messages and revision histories of 1,444 OSS projects selected from the GitHub repositories..
52. Shane McIntosh, Yasutaka Kamei, Are Fix-Inducing Changes a Moving Target? A Longitudinal Case Study of Just-In-Time Defect Prediction, IEEE Transactions on Software Engineering, 10.1109/TSE.2017.2693980, 44, 5, 412-428, 2018.05, © 2017 IEEE. Just-In-Time (JIT) models identify fix-inducing code changes. JIT models are trained using techniques that assume that past fix-inducing changes are similar to future ones. However, this assumption may not hold, e.g., as system complexity tends to accrue, expertise may become more important as systems age. In this paper, we study JIT models as systems evolve. Through a longitudinal case study of 37,524 changes from the rapidly evolving Qt and OpenStack systems, we find that fluctuations in the properties of fix-inducing changes can impact the performance and interpretation of JIT models. More specifically: (a) the discriminatory power (AUC) and calibration (Brier) scores of JIT models drop considerably one year after being trained; (b) the role that code change properties (e.g., Size, Experience) play within JIT models fluctuates over time; and (c) those fluctuations yield over- and underestimates of the future impact of code change properties on the likelihood of inducing fixes. To avoid erroneous or misleading predictions, JIT models should be retrained using recently recorded data (within three months). Moreover, quality improvement plans should be informed by JIT models that are trained using six months (or more) of historical data, since they are more resilient to period-specific fluctuations in the importance of code change properties..
53. Ariel Rodriguez, Fumiya Tanaka, Yasutaka Kamei, Empirical study on the relationship between developer's working habits and efficiency, Proceedings - International Conference on Software Engineering, 10.1145/3196398.3196458, 74-77, 2018.05, © 2018 ACM. Software developers can have a reputation for frequently working long and irregular hours which are widely considered to inhibit mental capacity and negatively affect work quality. This paper analyzes the working habits of software developers and the effects these habits have on efficiency based on a large amount of data extracted from the actions of developers in the IDE (Integrated Development Environment), Visual Studio. We use events that recorded the times at which all developer actions were performed along with the numbers of successful and failed build and test events. Due to the high level of detail of the events provided by KaVE project's tool, we were able to analyze the data in a way that previous studies have not been able to. We structure our study along three dimensions: (1) days of the week, (2) time of the day, and (3) continuous work. Our findings will help software developers and team leaders to appropriatly allocate working times and to maximize work quality..
55. 中野 大扉, 亀井 靖高, 佐藤 亮介, 鵜林 尚靖, 高山 修一, 岩崎 孝司, An Empirical Experiment on a Weighting Method for Pre-adoption Quality Evaluation of OSS (OSS事前品質評価における重み付け手法の実証実験), コンピュータソフトウェア, 2018.03.
56. 廣瀬 賢幸, 鵜林 尚靖, 亀井 靖高, 佐藤 亮介, A Study of Automated Bug Fixing Using Stack Overflow (Stack Overflowを利用した自動バグ修正の検討), コンピュータソフトウェア, 2018.03.
57. Xiaochen Li, He Jiang, Yasutaka Kamei, Xin Chen, Bridging Semantic Gaps between Natural Languages and APIs with Word Embedding, IEEE Transactions on Software Engineering, 10.1109/TSE.2018.2876006, 2018.01, Developers increasingly rely on text matching tools to analyze the relation between natural language words and APIs. However, semantic gaps, namely textual mismatches between words and APIs, negatively affect these tools. Previous studies have transformed words or APIs into low-dimensional vectors for matching; however, inaccurate results were obtained due to the failure of modeling words and APIs simultaneously. To resolve this problem, two main challenges are to be addressed: the acquisition of massive words and APIs for mining and the alignment of words and APIs for modeling. Therefore, this study proposes Word2API to effectively estimate the relatedness of words and APIs. Word2API collects millions of commonly used words and APIs from code repositories to address the acquisition challenge. Then, a shuffling strategy is used to transform related words and APIs into tuples to address the alignment challenge. Using these tuples, Word2API models words and APIs simultaneously. Word2API outperforms baselines in relatedness estimation by 10%-49.6% in terms of precision and NDCG. Word2API is also effective in solving typical software tasks, e.g., query expansion and API document linking. A simple system with Word2API-expanded queries recommends up to 21.4% more related APIs for developers. Meanwhile, Word2API improves comparison algorithms by 7.9%-17.4% in linking questions in Question & Answer communities to API documents..
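A minimal sketch of the shuffling idea described above, not the authors' Word2API implementation: related words and API names are mixed into joint training sequences so that a single embedding models both vocabularies, and relatedness is then read off as vector similarity. The training tuples below are toy examples, and gensim is assumed to be available.

```python
import random
from gensim.models import Word2Vec

# Hypothetical word-API tuples mined from commits and code.
tuples = [
    (["read", "file", "content"], ["java.io.FileReader.read", "java.io.BufferedReader.readLine"]),
    (["parse", "json", "string"], ["org.json.JSONObject.getString"]),
]

# Alignment step: mix the words and APIs of each tuple into one shuffled sequence.
sentences = []
for words, apis in tuples:
    seq = words + apis
    random.shuffle(seq)
    sentences.append(seq)

# One embedding space over both words and APIs.
model = Word2Vec(sentences, vector_size=50, min_count=1, window=5)
print(model.wv.most_similar("read", topn=3))  # APIs/words most related to "read"
```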
58. Keisuke Watanabe, Naoyasu Ubayashi, Takuya Fukamachi, Shunya Nakamura, Hokuto Muraoka, Yasutaka Kamei, IArch-U: Interface-Centric Integrated Uncertainty-Aware Development Environment, Proceedings - 2017 IEEE/ACM 9th International Workshop on Modelling in Software Engineering, MiSE 2017, 10.1109/MiSE.2017.7, 40-46, 2017.06, © 2017 IEEE. Uncertainty can appear in all aspects of software development: Uncertainty in requirements analysis, design decisions, implementation and testing. If uncertainty can be dealt with modularly, we can add or delete uncertain concerns to/from models, code and tests whenever these concerns arise or are fixed to certain concerns. To deal with this problem, we developed iArch-U, an IDE (Integrated Development Environment) for managing uncertainty modularly in all phases in software development. In this paper, we introduce an overview of iArch-U. The iArch-U IDE is open source software and can be downloaded from GitHub..
59. Gopi Krishnan Rajbahadur, Shaowei Wang, Yasutaka Kamei, Ahmed E. Hassan, The impact of using regression models to build defect classifiers, IEEE International Working Conference on Mining Software Repositories, 10.1109/MSR.2017.4, 135-145, 2017.06, © 2017 IEEE. It is common practice to discretize continuous defect counts into defective and non-defective classes and use them as a target variable when building defect classifiers (discretized classifiers). However, this discretization of continuous defect counts leads to information loss that might affect the performance and interpretation of defect classifiers. Another possible approach to building defect classifiers is to use regression models and then discretize the predicted defect counts into defective and non-defective classes (regression-based classifiers). In this paper, we compare the performance and interpretation of defect classifiers that are built using both approaches (i.e., discretized classifiers and regression-based classifiers) across six commonly used machine learning classifiers (i.e., linear/logistic regression, random forest, KNN, SVM, CART, and neural networks) and 17 datasets. We find that: i) random forest based classifiers outperform other classifiers (best AUC) for both classifier building approaches, ii) in contrast to common practice, building a defect classifier using discretized defect counts (i.e., discretized classifiers) does not always lead to better performance. Hence, we suggest that future defect classification studies should consider building regression-based classifiers (in particular when the defective ratio of the modeled dataset is low). Moreover, we suggest that both approaches for building defect classifiers should be explored, so the best-performing classifier can be used when determining the most influential features..
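The two classifier-building approaches compared in this paper can be sketched as follows (illustrative only, on synthetic data): a discretized classifier is trained on defective/clean labels, while a regression-based classifier predicts defect counts and uses the predictions as a ranking score.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(500, 10)                       # module metrics (toy data)
defect_counts = rng.poisson(0.3, size=500)  # defects per module (toy data)

X_tr, X_te, y_tr, y_te = train_test_split(X, defect_counts, random_state=0)

# (a) Discretized classifier: counts are collapsed to defective (>0) before training.
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr > 0)
auc_clf = roc_auc_score(y_te > 0, clf.predict_proba(X_te)[:, 1])

# (b) Regression-based classifier: predicted counts act as the classification score.
reg = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
auc_reg = roc_auc_score(y_te > 0, reg.predict(X_te))

print(f"discretized AUC={auc_clf:.2f}, regression-based AUC={auc_reg:.2f}")
```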
60. Keisuke Watanabe, Naoyasu Ubayashi, Takuya Fukamachi, Shunya Nakamura, Hokuto Muraoka, Yasutaka Kamei, iArch-U: Interface-Centric Integrated Uncertainty-aware Development Environment, International Workshop on Modeling in Software Engineering (MiSE2017), 2017.05.
61. Gopi Krishnan Rajbahadur, Shaowei Wang, Yasutaka Kamei, Ahmed E. Hassan, The Impact Of Using Regression Models to Build Defect Classifiers, International Conference on Mining Software Repositories (MSR 2017), 2017.05.
62. Keisuke Watanabe, Takuya Fukamachi, Naoyasu Ubayashi, Yasutaka Kamei, Automated A/B Testing with Declarative Variability Expressions, Proceedings - 10th IEEE International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2017, 10.1109/ICSTW.2017.72, 387-388, 2017.04, © 2017 IEEE. A/B testing is an experimental strategy often used in web or mobile application development. In A/B testing, a developer has to implement multiple variations of an application, assign each variation to a subset of the entire user population randomly, and analyze log data to decide which variation should be used as the final product. It is therefore challenging to keep the application code clean in A/B testing, because defining variations of software or assigning users to each variation requires modifying the code. There are some existing tools that approach this problem. In this context of A/B testing research, we propose a solution based on the interface Archface-U and AOP (Aspect-Oriented Programming) that aims to minimize the complication of code in A/B testing..
63. Ayse Tosun, Emad Shihab, Yasutaka Kamei, Erratum to: Studying high impact fix-inducing changes (Empirical Software Engineering, (2016), 21, 2, (605-641), 10.1007/s10664-015-9370-z), Empirical Software Engineering, 10.1007/s10664-016-9455-3, 22, 2, 848, 2017.04, © 2016, Springer Science+Business Media New York. The original version of this article unfortunately contained a mistake. The name of the third author was incorrectly displayed as "Yasukata Kamei". The correct information is as shown above..
66. Pawin Suthipornopas, Pattara Leelaprute, Akito Monden, Hidetake Uwano, Yasutaka Kamei, Naoyasu Ubayashi, Kenji Araki, Kingo Yamada, Ken-ichi Matsumoto, Industry Application of Software Development Task Measurement System: TaskPit, IEICE Transactions on Information and Systems, Vol.E100-D, No.3, pp.462-472, 2017.03.
67. 戸田 航史, 亀井 靖高, 吉田 則裕, An Investigation of the Impact of Data Cleansing on Code Review Analysis (コードレビュー分析におけるデータクレンジングの影響調査), 情報処理学会論文誌, 2017.03.
68. Pawin Suthipornopas, Pattara Leelaprute, Akito Monden, Hidetake Uwano, Yasutaka Kamei, Naoyasu Ubayashi, Kenji Araki, Kingo Yamada, Ken Ichi Matsumoto, Industry Application of Software Development Task Measurement System: TaskPit, IEICE Transactions on Information and Systems, 10.1587/transinf.2016EDP7222, E100D, 3, 462-472, 2017.03, © 2017 The Institute of Electronics, Information and Communication Engineers. To identify problems in a software development process, we have been developing an automated measurement tool called TaskPit, which monitors software development tasks such as programming, testing and documentation based on the execution history of software applications. This paper introduces the system requirements, design and implementation of TaskPit, and then presents two real-world case studies applying TaskPit to actual software development. In the first case study, we applied TaskPit to 12 software developers in a software development division. As a result, several concerns to be improved were revealed, such as (a) a project leader spent too much time on development tasks while he was supposed to be a manager rather than a developer, (b) several developers rarely used e-mail despite the company's instruction to use e-mail as much as possible to leave communication records during development, and (c) several developers wrote overly long e-mails to their customers. In the second case study, we recorded the planned, actual, and self-reported time of development tasks. As a result, we found that (d) there were unplanned tasks on more than half of the days, and (e) the declared time became closer day by day to the actual time measured by TaskPit. These findings suggest that TaskPit is useful not only for a project manager who is responsible for process monitoring and improvement but also for a developer who wants to improve by him/herself..
69. 渡辺 啓介, 深町 拓也, 鵜林 尚靖, 細合 晋太郎, 亀井 靖高, Automating A/B Testing with Declarative Variability Descriptions (宣言的な可変性記述によるA/Bテストの自動化), コンピュータソフトウェア, 2017.02.
70. Junji Shimagaki, Yasutaka Kamei, Shane McIntosh, David Pursehouse, Naoyasu Ubayashi, Why are commits being reverted? A comparative study of industrial and open source projects, Proceedings - 2016 IEEE International Conference on Software Maintenance and Evolution, ICSME 2016, 10.1109/ICSME.2016.83, 301-311, 2017.01, © 2016 IEEE. Software development is a cyclic process of integrating new features while introducing and fixing defects. During development, commits that modify source code files are uploaded to version control systems. Occasionally, these commits need to be reverted, i.e., the code changes need to be completely backed out of the software project. While one can often speculate about the purpose of reverted commits (e.g., the commit may have caused integration or build problems), little empirical evidence exists to substantiate such claims. The goal of this paper is to better understand why commits are reverted in large software systems. To that end, we quantitatively and qualitatively study two proprietary and four open source projects to measure: (1) the proportion of commits that are reverted, (2) the amount of time that commits that are eventually reverted linger within a codebase, and (3) the most frequent reasons why commits are reverted. Our results show that 1%-5% of the commits in the studied systems are reverted. Those commits that are eventually reverted linger within the studied codebases for 1-35 days (median). Furthermore, we identify 13 common reasons for reverting commits, and observe that the frequency of reverted commits of each reason varies broadly from project to project. A complementary qualitative analysis suggests that many reverted commits could have been avoided with better team communication and change awareness. Our findings made Sony Mobile's stakeholders aware that internally reverted commits can be reduced by paying more attention to their own changes. On the other hand, externally reverted commits could be minimized only if external stakeholders are involved to improve inter-company communication or requirements elicitation..
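One simple way to locate reverted commits and measure how long they lingered, in the spirit of the study above, is to scan the history for the trailer that `git revert` writes. This is a heuristic sketch, not the paper's identification procedure.

```python
import re
import subprocess

def find_reverts(repo_path):
    # NUL-separated entries: full hash | commit timestamp | raw body.
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=%H|%ct|%B%x00"],
        capture_output=True, text=True, check=True,
    ).stdout
    commit_time, reverts = {}, []
    for entry in filter(None, log.split("\x00")):
        sha, ts, body = entry.strip().split("|", 2)
        commit_time[sha] = int(ts)
        m = re.search(r"This reverts commit ([0-9a-f]{7,40})", body)
        if m:
            reverts.append((m.group(1), sha))
    for reverted, reverting in reverts:
        if reverted in commit_time:
            days = (commit_time[reverting] - commit_time[reverted]) / 86400
            print(f"{reverted[:8]} lingered {days:.1f} days before being reverted")
```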
71. Yasutaka Kamei, Everton Maldonado, Emad Shihab, Naoyasu Ubayashi, Using Analytics to Quantify the Interest of Self-Admitted Technical Debt, International Workshop on Technical Debt Analytics (TDA2016), pp.1-4, December 2016., 2016.12.
72. Junji Shimagaki, Yasutaka Kamei, Shane Mcintosh, David Pursehouse and Naoyasu Ubayashi, Why are Commits being Reverted? A Comparative Study of Industrial and Open Source Projects, International Conference on Software Maintenance and Evolution (ICSME2016), pp.301-311, October 2016. (Raleigh, North Carolina, USA), 2016.10, Software development is a cyclic process of integrating new features while introducing and fixing defects. During development, commits that modify source code files are uploaded to version control systems. Occasionally, these commits need to be reverted, i.e., the code changes need to be completely backed out of the software project. While one can often speculate about the purpose of reverted commits (e.g., the commit may have caused integration or build problems), little empirical evidence exists to substantiate such claims. The goal of this paper is to better understand why commits are reverted in large software systems. To that end, we quantitatively and qualitatively study two proprietary and four open source projects to measure: (1) the proportion of commits that are reverted, (2) the amount of time that commits that are eventually reverted linger within a codebase, and (3) the most frequent reasons why commits are reverted. Our results show that 1%-5% of the commits in the studied systems are reverted. Those commits that are eventually reverted linger within the studied codebases for 1-35 days (median). Furthermore, we identify 13 common reasons for reverting commits, and observe that the frequency of reverted commits of each reason varies broadly from project to project. A complementary qualitative analysis suggests that many reverted commits could have been avoided with better team communication and change awareness. Our findings made Sony Mobile’s stakeholders aware that internally reverted commits can be reduced by paying more attention to their own changes. On the other hand, externally reverted commits could be minimized only if external stakeholders are involved to improve inter-company communication or requirements elicitation..
73. Shane McIntosh, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan, An empirical study of the impact of modern code review practices on software quality, Empirical Software Engineering, 10.1007/s10664-015-9381-9, 21, 5, 2146-2189, 2016.10, © 2015, Springer Science+Business Media New York. Software code review, i.e., the practice of having other team members critique changes to a software system, is a well-established best practice in both open source and proprietary software domains. Prior work has shown that formal code inspections tend to improve the quality of delivered software. However, the formal code inspection process mandates strict review criteria (e.g., in-person meetings and reviewer checklists) to ensure a base level of review quality, while the modern, lightweight code reviewing process does not. Although recent work explores the modern code review process, little is known about the relationship between modern code review practices and long-term software quality. Hence, in this paper, we study the relationship between post-release defects (a popular proxy for long-term software quality) and: (1) code review coverage, i.e., the proportion of changes that have been code reviewed, (2) code review participation, i.e., the degree of reviewer involvement in the code review process, and (3) code reviewer expertise, i.e., the level of domain-specific expertise of the code reviewers. Through a case study of the Qt, VTK, and ITK projects, we find that code review coverage, participation, and expertise share a significant link with software quality. Hence, our results empirically confirm the intuition that poorly-reviewed code has a negative impact on software quality in large systems using modern reviewing tools..
74. Kwabena Ebo Bennin, Koji Toda, Yasutaka Kamei, Jacky Keung, Akito Monden, Naoyasu Ubayashi, Empirical Evaluation of Cross-Release Effort-Aware Defect Prediction Models, Proceedings - 2016 IEEE International Conference on Software Quality, Reliability and Security, QRS 2016, 10.1109/QRS.2016.33, 214-221, 2016.10, © 2016 IEEE. To prioritize quality assurance efforts, various fault prediction models have been proposed. However, the best performing fault prediction model is unknown due to three major drawbacks: (1) comparison of few fault prediction models considering a small number of data sets, (2) use of evaluation measures that ignore testing efforts and (3) use of n-fold cross-validation instead of the more practical cross-release validation. To address these concerns, we conducted a cross-release evaluation of 11 fault density prediction models using data sets collected from 2 releases of 25 open source software projects with an effort-aware performance measure known as Norm(Popt). Our result shows that, whilst M5 and K∗ had the best performances, they were greatly influenced by the percentage of faulty modules present and the size of the data set. Using Norm(Popt) produced an overall average performance of more than 50% across all the selected models, clearly indicating the importance of considering testing efforts in building fault-prone prediction models..
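For readers unfamiliar with the effort-aware measure used here, the following sketch shows one common normalized-Popt-style formulation: the area under the cumulative %faults vs. %effort curve of the model's ranking, rescaled between the worst and optimal orderings. This is an illustration, not necessarily the paper's exact definition.

```python
import numpy as np

def area(faults, effort, order):
    """Area under the cumulative %faults vs. %effort curve for a given module ranking."""
    f = np.concatenate(([0.0], np.cumsum(faults[order]))) / faults.sum()
    e = np.concatenate(([0.0], np.cumsum(effort[order]))) / effort.sum()
    return np.trapz(f, e)

def norm_popt(faults, effort, predicted_density):
    pred = np.argsort(-predicted_density)   # rank modules by predicted fault density
    opt = np.argsort(-(faults / effort))    # optimal (oracle) ranking
    worst = np.argsort(faults / effort)     # worst possible ranking
    a_pred, a_opt, a_worst = (area(faults, effort, o) for o in (pred, opt, worst))
    return (a_pred - a_worst) / (a_opt - a_worst)
```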
75. Yasutaka Kamei, Takafumi Fukushima, Shane McIntosh, Kazuhiro Yamashita, Naoyasu Ubayashi, Ahmed E. Hassan, Studying just-in-time defect prediction using cross-project models, Empirical Software Engineering, 10.1007/s10664-015-9400-x, 21, 5, 2072-2106, 2016.10, © 2015, Springer Science+Business Media New York. Unlike traditional defect prediction models that identify defect-prone modules, Just-In-Time (JIT) defect prediction models identify defect-inducing changes. As such, JIT defect models can provide earlier feedback for developers, while design decisions are still fresh in their minds. Unfortunately, similar to traditional defect models, JIT models require a large amount of training data, which is not available when projects are in initial development phases. To address this limitation in traditional defect prediction, prior work has proposed cross-project models, i.e., models learned from other projects with sufficient history. However, cross-project models have not yet been explored in the context of JIT prediction. Therefore, in this study, we empirically evaluate the performance of JIT models in a cross-project context. Through an empirical study on 11 open source projects, we find that while JIT models rarely perform well in a cross-project context, their performance tends to improve when using approaches that: (1) select models trained using other projects that are similar to the testing project, (2) combine the data of several other projects to produce a larger pool of training data, and (3) combine the models of several other projects to produce an ensemble model. Our findings empirically confirm that JIT models learned using other projects are a viable solution for projects with limited historical data. However, JIT models tend to perform best in a cross-project context when the data used to learn them are carefully selected..
76. Xin Xia, Emad Shihab, Yasutaka Kamei, David Lo and Xinyu Wang, Predicting Crashing Releases of Mobile Applications, International Symposium on Empirical Software Engineering and Measurement (ESEM), pp.29:1-29:10, September 2016. (Ciudad Real, Spain), 2016.09, Context: The quality of mobile applications has a vital impact on their user's experience, ratings and ultimately overall success. Given the high competition in the mobile application market, i.e., many mobile applications perform the same or similar functionality, users of mobile apps tend to be less tolerant to quality issues.
Goal: Therefore, identifying these crashing releases early on so that they can be avoided will help mobile app developers keep their user base and ensure the overall success of their apps.
Method: To help mobile developers, we use machine learning techniques to effectively predict mobile app releases that are more likely to cause crashes, i.e., crashing releases. To perform our prediction, we mine and use a number of factors about the mobile releases, grouped into six unique dimensions: complexity, time, code, diffusion, commit, and text, and use a Naive Bayes classifier to perform our prediction.
Results: We perform an empirical study on 10 open source mobile applications containing a total of 2,638 releases from the F-Droid repository. On average, our approach can achieve F1 and AUC scores that improve over a baseline (random) predictor by 50% and 28%, respectively. We also find that factors related to text extracted from the commit logs prior to a release are the best predictors of crashing releases and have the largest effect.
Conclusions: Our proposed approach could help to identify crashing releases for mobile apps..
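The prediction setup described above can be sketched roughly as follows; the feature names and the `releases` DataFrame are hypothetical stand-ins for the paper's six dimensions of release factors.

```python
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Hypothetical release-level factors spanning the six dimensions.
FEATURES = ["complexity", "days_since_last_release", "lines_added",
            "files_touched", "n_commits", "commit_msg_negativity"]

def evaluate(releases):
    """releases: DataFrame with the FEATURES columns and an is_crashing label."""
    X, y = releases[FEATURES], releases["is_crashing"]
    return cross_val_score(GaussianNB(), X, y, scoring="f1", cv=10).mean()
```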
77. Keisuke Miura, Shane Mcintosh, Yasutaka Kamei, Ahmed E. Hassan and Naoyasu Ubayashi, The Impact of Task Granularity on Co-evolution Analyses, International Symposium on Empirical Software Engineering and Measurement (ESEM), September 2016. (Ciudad Real, Spain), 2016.09, Aim: In this paper, we set out to understand the impact that the revision granularity has on co-change analyses. Method: We conduct an empirical study of 14 open source systems that are developed by the Apache Software Foundation. To understand the impact that the revision granularity may have on co-change activity, we study work items, i.e., logical groups of revisions that address a single issue. Results: We find that work item grouping has the potential to impact co-change activity, since 29% of work items consist of 2 or more revisions in 7 of the 14 studied systems. Deeper quantitative analysis shows that, in 7 of the 14 studied systems: (1) 11% of the largest work items are entirely composed of small revisions, and would be missed by traditional approaches that filter or analyze large changes, (2) 83% of revisions that co-change under a single work item cannot be grouped using the typical configuration of the sliding time window technique and (3) 48% of work items that involve multiple developers cannot be grouped at the revision level. Conclusions: Since the work item granularity is the natural means that practitioners use to separate development tasks, future software evolution studies, especially co-change analyses, should be conducted at the work item level..
78. Xin Xia, Emad Shihab, Yasutaka Kamei, David Lo, Xinyu Wang, Predicting Crashing Releases of Mobile Applications, International Symposium on Empirical Software Engineering and Measurement, 10.1145/2961111.2962606, 08-09-September-2016, 2016.09, © 2016 ACM. Context: The quality of mobile applications has a vital impact on their user's experience, ratings and ultimately overall success. Given the high competition in the mobile application market, i.e., many mobile applications perform the same or similar functionality, users of mobile apps tend to be less tolerant to quality issues. Goal: Therefore, identifying these crashing releases early on so that they can be avoided will help mobile app developers keep their user base and ensure the overall success of their apps. Method: To help mobile developers, we use machine learning techniques to effectively predict mobile app releases that are more likely to cause crashes, i.e., crashing releases. To perform our prediction, we mine and use a number of factors about the mobile releases, grouped into six unique dimensions: complexity, time, code, diffusion, commit, and text, and use a Naive Bayes classifier to perform our prediction. Results: We perform an empirical study on 10 open source mobile applications containing a total of 2,638 releases from the F-Droid repository. On average, our approach can achieve F1 and AUC scores that improve over a baseline (random) predictor by 50% and 28%, respectively. We also find that factors related to text extracted from the commit logs prior to a release are the best predictors of crashing releases and have the largest effect. Conclusions: Our proposed approach could help to identify crashing releases for mobile apps..
79. Keisuke Miura, Shane McIntosh, Yasutaka Kamei, Ahmed E. Hassan, Naoyasu Ubayashi, The Impact of Task Granularity on Co-evolution Analyses, International Symposium on Empirical Software Engineering and Measurement, 10.1145/2961111.2962607, 08-09-September-2016, 2016.09, © 2016 ACM. Background: Substantial research in the software evolution field aims to recover knowledge about development from the project history that is archived in repositories, such as a Version Control System (VCS). However, the data that is archived in these repositories can be analyzed at different levels of granularity. Although software evolution is a well-studied phenomenon at the revision-level, revisions may be too fine-grained to accurately represent development tasks. Aim: In this paper, we set out to understand the impact that the revision granularity has on co-change analyses. Method: We conduct an empirical study of 14 open source systems that are developed by the Apache Software Foundation. To understand the impact that the revision granularity may have on co-change activity, we study work items, i.e., logical groups of revisions that address a single issue. Results: We find that work item grouping has the potential to impact co-change activity, since 29% of work items consist of 2 or more revisions in 7 of the 14 studied systems. Deeper quantitative analysis shows that, in 7 of the 14 studied systems: (1) 11% of largest work items are entirely composed of small revisions, and would be missed by traditional approaches to filter or analyze large changes, (2) 83% of revisions that co-change under a single work item cannot be grouped using the typical configuration of the sliding time window technique and (3) 48% of work items that involve multiple developers cannot be grouped at the revision-level. Conclusions: Since the work item granularity is the natural means that practitioners use to separate development tasks, future software evolution studies, especially co-change analyses, should be conducted at the work item level..
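The two grouping granularities contrasted above can be sketched as follows (illustrative; the commit tuples and issue-ID pattern are assumptions): the classic sliding time window groups same-author commits that occur within a few minutes of each other, while work-item grouping keys commits on the issue ID named in the message.

```python
import re
from itertools import groupby

def sliding_window_groups(commits, window_minutes=5):
    """commits: list of (author, unix_time, message) tuples."""
    groups, current = [], []
    for c in sorted(commits, key=lambda c: c[1]):
        # Close the group when the author changes or the time gap is too large.
        if current and (c[0] != current[-1][0] or
                        c[1] - current[-1][1] > window_minutes * 60):
            groups.append(current)
            current = []
        current.append(c)
    if current:
        groups.append(current)
    return groups

def work_item_groups(commits, issue_pattern=r"[A-Z]+-\d+"):
    """Group commits by the first issue ID mentioned in the commit message."""
    keyed = [(re.search(issue_pattern, c[2]), c) for c in commits]
    keyed = [(m.group(0), c) for m, c in keyed if m]
    keyed.sort(key=lambda kc: kc[0])
    return [[c for _, c in grp] for _, grp in groupby(keyed, key=lambda kc: kc[0])]
```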
80. Kwabena Ebo Bennin, Koji Toda, Yasutaka Kamei, Jacky Keung, Akito Monden and Naoyasu Ubayashi, Empirical Evaluation of Cross-Release Effort-Aware Defect Prediction Models, International Conference on Software Quality, Reliability and Security (QRS2016), pp.214-221, August 2016. (Vienna, Austria)., 2016.08.
81. Kazuhiro Yamashita, Changyun Huang, Meiyappan Nagappan, Yasutaka Kamei, Audris Mockus, Ahmed E. Hassan and Naoyasu Ubayashi, Thresholds for Size and Complexity Metrics: A Case Study from the Perspective of Defect Density, International Conference on Software Quality, Reliability and Security (QRS2016), pp.191-201, August 2016. (Vienna, Austria)., 2016.08.
82. Takashi Watanabe, Akito Monden, Yasutaka Kamei, Shuji Morisaki, Identifying recurring association rules in software defect prediction, 2016 IEEE/ACIS 15th International Conference on Computer and Information Science, ICIS 2016 - Proceedings, 10.1109/ICIS.2016.7550867, 861-866, 2016.08, © 2016 IEEE. Association rule mining discovers patterns of co-occurrences of attributes as association rules in a data set. The derived association rules are expected to be recurrent, that is, the patterns recur in the future in other data sets. This paper defines the recurrence of a rule and aims to find a criterion to distinguish high recurrence rules from low recurrence ones using a data set for software defect prediction. An experiment with the Eclipse Mylyn defect data set showed that rules supported by fewer than 30 transactions showed low recurrence. We also found that the lower bound on transactions for selecting high recurrence rules depends on the required precision of defect prediction..
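The selection criterion suggested by this study reduces to a support-count filter; a minimal sketch follows (rule mining itself is omitted, and the rule/transaction structures are hypothetical).

```python
def support_count(transactions, itemset):
    """Number of transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))

def high_recurrence_rules(rules, transactions, min_transactions=30):
    """rules: list of dicts with set-valued "antecedent" and "consequent" keys.
    Keep only rules backed by at least `min_transactions` transactions,
    the threshold below which the study observed low recurrence."""
    return [r for r in rules
            if support_count(transactions, r["antecedent"] | r["consequent"])
               >= min_transactions]
```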
83. Kwabena Ebo Bennin, Jacky Keung, Akito Monden, Yasutaka Kamei, Naoyasu Ubayashi, Investigating the Effects of Balanced Training and Testing Datasets on Effort-Aware Fault Prediction Models, Proceedings - International Computer Software and Applications Conference, 10.1109/COMPSAC.2016.144, 1, 154-163, 2016.08, © 2016 IEEE. To prioritize software quality assurance efforts, fault prediction models have been proposed to distinguish faulty modules from clean modules. The performances of such models are often biased due to the skewness or class imbalance of the datasets considered. To improve the prediction performance of these models, sampling techniques have been employed to rebalance the distribution of fault-prone and non-fault-prone modules. The effect of these techniques has been evaluated in terms of accuracy/geometric mean/F1-measure in previous studies; however, these measures do not consider the effort needed to fix faults. To empirically investigate the effect of sampling techniques on the performance of software fault prediction models in a more realistic setting, this study employs Norm(Popt), an effort-aware measure that considers the testing effort. We performed two sets of experiments aimed at (1) assessing the effects of sampling techniques on effort-aware models and finding the appropriate class distribution for training datasets and (2) investigating the role of balanced training and testing datasets on the performance of predictive models. Of the four sampling techniques applied, the over-sampling techniques outperformed the under-sampling techniques, with Random Over-sampling performing best with respect to the Norm(Popt) evaluation measure. Also, performance of all the prediction models improved when sampling techniques were applied at rates of (20-30)% on the training datasets, implying that a strictly balanced dataset (50% faulty modules and 50% clean modules) does not result in the best performance for effort-aware models. Our results also indicate that performances of effort-aware models are significantly dependent on the proportions of the two types of classes in the testing dataset. Models trained on moderately balanced datasets are more likely to withstand fluctuations in performance as the class distribution in the testing data varies..
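A minimal sketch of the best-performing setting reported above: random over-sampling of the faulty class until it makes up a target share of the training data, here around the (20-30)% range the study found effective. Plain numpy keeps it self-contained; this is not the authors' experimental code.

```python
import numpy as np

def random_oversample(X, y, target_minority_share=0.25, seed=0):
    """Duplicate random minority (y == 1) rows until they form the target share."""
    rng = np.random.RandomState(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    # Choose n so that (len(minority) + n) / (total + n) equals the target share.
    n_needed = int(target_minority_share * len(majority) /
                   (1 - target_minority_share)) - len(minority)
    if n_needed <= 0:
        return X, y
    extra = rng.choice(minority, size=n_needed, replace=True)
    idx = np.concatenate([majority, minority, extra])
    rng.shuffle(idx)
    return X[idx], y[idx]
```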
84. Takashi Watanabe, Akito Monden, Yasutaka Kamei, Shuji Morisaki, Identifying Recurring Association Rules in Software Defect Prediction, International Conference on Computer and Information Science (ICIS2016), pp.1-6, June 2016. (Okayama, Japan)., 2016.06.
85. Kwabena Ebo Bennin, Jacky Keung, Akito Monden, Yasutaka Kamei and Naoyasu Ubayashi, Investigating the Effects of Balanced Training and Testing Data Sets on Effort-Aware Fault Prediction Models, International Conference on Computers, Software and Applications (COMPSAC), 2016.06.
86. Junji Shimagaki, Yasutaka Kamei, Shane Mcintosh, Ahmed E. Hassan and Naoyasu Ubayashi, A Study of the Quality-Impacting Practices of Modern Code Review at Sony Mobile, the International Conference on Software Engineering (ICSE2016) Software Engineering in Practice (SEIP), 2016.05, Nowadays, a flexible, lightweight variant of the code review process (i.e., the practice of having other team members critique software changes) is adopted by open source and proprietary software projects. While this flexibility is a blessing (e.g., enabling code reviews to span the globe), it does not mandate minimum review quality criteria like the formal code inspections of the past. Recent work shows that lax reviewing can impact the quality of open source systems. In this paper, we investigate the impact that code reviewing practices have on the quality of a proprietary system that is developed by Sony Mobile. We begin by replicating open source analyses of the relationship between software quality (as approximated by post-release defect-proneness) and: (1) code review coverage, i.e., the proportion of code changes that have been reviewed and (2) code review participation, i.e., the degree of reviewer involvement in the code review process. We also perform a qualitative analysis, with a survey of 93 stakeholders, semi-structured interviews with 15 stakeholders, and a follow-up survey of 25 senior engineers. Our results indicate that while past measures of review coverage and participation do not share a relationship with defect-proneness at Sony Mobile, reviewing measures that are aware of the Sony Mobile development context are associated with defect-proneness. Our results have led to improvements of the Sony Mobile code review process..
87. Masateru Tsunoda, Yasutaka Kamei, Atsushi Sawada, Assessing the differences of clone detection methods used in the fault-prone module prediction, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering, SANER 2016, 10.1109/SANER.2016.65, 15-16, 2016.05, © 2016 IEEE. We have investigated, through several experiments, the differences in fault-prone module prediction accuracy caused by differences in the constituent code clone metrics of the prediction model. Previous studies use one or more code clone metrics as independent variables to build an accurate prediction model. While these studies often use the clone detection method proposed by Kamiya et al. to calculate the metrics, the effect of the detection method on the prediction accuracy is unclear. In our experiment, we built prediction models using a dataset collected from an open source software project. The result suggests that prediction accuracy improves when clone metrics derived from various clone detection tools are used..
88. Junji Shimagaki, Yasutaka Kamei, Shane McIntosh, Ahmed E. Hassan, Naoyasu Ubayashi, A study of the quality-impacting practices of modern code review at Sony mobile, Proceedings - International Conference on Software Engineering, 10.1145/2889160.2889243, 212-221, 2016.05, © 2016 ACM. Nowadays, a flexible, lightweight variant of the code review process (i.e., the practice of having other team members critique software changes) is adopted by open source and proprietary software projects. While this flexibility is a blessing (e.g., enabling code reviews to span the globe), it does not mandate minimum review quality criteria like the formal code inspections of the past. Recent work shows that lax reviewing can impact the quality of open source systems. In this paper, we investigate the impact that code reviewing practices have on the quality of a proprietary system that is developed by Sony Mobile. We begin by replicating open source analyses of the relationship between software quality (as approximated by post-release defect-proneness) and: (1) code review coverage, i.e., the proportion of code changes that have been reviewed and (2) code review participation, i.e., the degree of reviewer involvement in the code review process. We also perform a qualitative analysis, with a survey of 93 stakeholders, semi-structured interviews with 15 stakeholders, and a follow-up survey of 25 senior engineers. Our results indicate that while past measures of review coverage and participation do not share a relationship with defect-proneness at Sony Mobile, reviewing measures that are aware of the Sony Mobile development context are associated with defect-proneness. Our results have led to improvements of the Sony Mobile code review process..
90. Bodin Chinthanet, Passakorn Phannachitta, Yasutaka Kamei, Pattara Leelaprute, Arnon Rungsawang, Naoyasu Ubayashi and Kenichi Matsumoto, A Review and Comparison of Methods for Determining the Best Analogies in Analogy-based Software Effort Estimation, International Symposium on Applied Computing (SAC 2016) Poster Session, 2016.04.
91. Bodin Chinthanet, Pattara Leelaprute, Arnon Rungsawang, Passakorn Phannachitta, Naoyasu Ubayashi, Yasutaka Kamei, Kenichi Matsumoto, A review and comparison of methods for determining the best analogies in analogy-based software effort estimation, Proceedings of the ACM Symposium on Applied Computing, 10.1145/2851613.2851974, 04-08-April-2016, 1554-1557, 2016.04, © 2016 ACM. Analogy-based effort estimation (ABE) is a commonly used software development effort estimation method. ABE is based on reusing effort values from similar past projects, where the appropriate number of past projects (the k value) to be reused is one of the long-standing debates in ABE research. To date, many approaches to finding this k value have been proposed. One important reason for this inconclusive debate is that different studies appear to produce different conclusions about the appropriate k value. Therefore, in this study, we revisit 8 common approaches to determining the k value most appropriate in general situations. With a more robust and comprehensive evaluation methodology using 5 robust error measures subject to the Wilcoxon rank-sum statistical test, we found that the conflicting results in previous studies were not mainly due to the use of different methodologies or different datasets, but rather that the performance of the different approaches actually varies widely..
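The core ABE procedure that this study revisits can be sketched in a few lines (illustrative only; min-max normalization, Euclidean similarity and mean adaptation are one common configuration, and k is exactly the parameter under debate).

```python
import numpy as np

def abe_estimate(past_features, past_efforts, new_project, k=3):
    """Estimate effort for `new_project` from the k most similar past projects."""
    lo, hi = past_features.min(axis=0), past_features.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)    # min-max normalization
    P = (past_features - lo) / scale
    q = (new_project - lo) / scale
    dist = np.linalg.norm(P - q, axis=1)       # Euclidean distance to each past project
    nearest = np.argsort(dist)[:k]             # the k best analogies
    return past_efforts[nearest].mean()        # mean adaptation of their efforts
```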
92. Ayse Tosun Misirli, Emad Shihab, Yasutaka Kamei, Studying high impact fix-inducing changes, Empirical Software Engineering, 10.1007/s10664-015-9370-z, 21, 2, 605-641, 2016.04, © 2015, Springer Science+Business Media New York. As software systems continue to play an important role in our daily lives, their quality is of paramount importance. Therefore, a plethora of prior research has focused on predicting components of software that are defect-prone. One aspect of this research focuses on predicting software changes that are fix-inducing. Although the prior research on fix-inducing changes has many advantages in terms of highly accurate results, it has one main drawback: it gives the same level of impact to all fix-inducing changes. We argue that treating all fix-inducing changes the same is not ideal, since a small typo in a change is easier to address by a developer than a thread synchronization issue. Therefore, in this paper, we study high impact fix-inducing changes (HIFCs). Since the impact of a change can be measured in different ways, we first propose a measure of impact of the fix-inducing changes, which takes into account the implementation work that needs to be done by developers in later (fixing) changes. Our measure of impact for a fix-inducing change uses the amount of churn, the number of files and the number of subsystems modified by developers during an associated fix of the fix-inducing change. We perform our study using six large open source projects to build specialized models that identify HIFCs, determine the best indicators of HIFCs and examine the benefits of prioritizing HIFCs. Using change factors, we are able to predict 56% to 77% of HIFCs with an average false alarm (misclassification) rate of 16%. We find that the lines of code added, the number of developers who worked on a change, and the number of prior modifications on the files modified during a change are the best indicators of HIFCs. Lastly, we observe that a specialized model for HIFCs can provide inspection effort savings of 4% over the state-of-the-art models. We believe our results would help practitioners prioritize their efforts towards the most impactful fix-inducing changes and save inspection effort..
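The impact measure described above can be approximated as follows (a simplified sketch, not the paper's exact definition; the column names and the top-quartile cutoff are assumptions): combine the churn, files, and subsystems touched by the associated fix, then flag the highest-impact fix-inducing changes as HIFCs.

```python
import pandas as pd

def label_hifcs(fixes: pd.DataFrame, quantile=0.75):
    """fixes: one row per fix-inducing change, with hypothetical columns
    fix_churn, fix_files, fix_subsystems describing its associated fix."""
    # Standardize each dimension so no single one dominates the combined score.
    z = (fixes[["fix_churn", "fix_files", "fix_subsystems"]]
         .apply(lambda col: (col - col.mean()) / col.std()))
    impact = z.sum(axis=1)
    # Flag changes whose combined impact falls in the top quartile as HIFCs.
    return impact > impact.quantile(quantile)
```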
93. Yasutaka Kamei, Emad Shihab, Defect Prediction: Accomplishments and Future Challenges, Leaders of Tomorrow / Future of Software Engineering Track at the International Conference on Software Analysis, Evolution and Reengineering (SANER2016), Issue 2, pp.99-104, 2016.03.
94. Felienne Hermans, Janet Siegmund, Thomas Fritz, Gabriele Bavota, Meiyappan Nagappan, Abram Hindle, Yasutaka Kamei, Ali Mesbah, Bram Adams, Leaders of tomorrow on the future of software engineering: A roundtable, IEEE Software, 10.1109/MS.2016.55, 33, 2, 99-104, 2016.03, © 1984-2012 IEEE. Nine rising stars in software engineering describe how software engineering research will evolve, highlighting emerging opportunities and groundbreaking solutions. They predict the rise of end-user programming, the monitoring of developers through neuroimaging and biometrics sensors, analysis of data from unstructured documents, the mining of mobile marketplaces, and changes to how we create and release software..
95. Kazuhiro Yamashita, Yasutaka Kamei, Shane McIntosh, Ahmed E. Hassan and Naoyasu Ubayashi, Magnet or Sticky? Measuring Project Characteristics from the Perspective of Developer Attraction and Retention, Journal of Information Processing, Vol.24, No.2, pp.339-348, 2016.03.
96. Yasutaka Kamei, Software Quality Assurance 2.0: Proactive, Practical, and Relevant, IEEE SOFTWARE, 33, 2, 102-103, 2016.03.
98. Kazuhiro Yamashita, Yasutaka Kamei, Shane McIntosh, Ahmed E. Hassan, Naoyasu Ubayashi, Magnet or sticky? Measuring project characteristics from the perspective of developer attraction and retention, Journal of Information Processing, 10.2197/ipsjjip.24.339, 24, 2, 339-348, 2016.03, © 2016 Information Processing Society of Japan. Open Source Software (OSS) is vital to both end users and enterprises. As OSS systems are becoming a type of infrastructure, long-term OSS projects are desired. For the survival of OSS projects, the projects need to not only retain existing developers, but also attract new developers to grow. To better understand how projects retain and attract contributors, our preliminary study aimed to measure the personnel attraction and retention of OSS projects using a pair of population migration metrics, called the Magnet (personnel attraction) and Sticky (retention) metrics. Because the preliminary study analyzed only 90 projects, which are not representative of GitHub, this paper extends the preliminary study to better understand the generalizability of the results by analyzing 16,552 GitHub projects. Furthermore, we also add a pilot study to investigate the typical duration between releases to find a more appropriate release duration. The study results show that (1) approximately 23% of developers remain in the same projects that they contribute to, (2) larger projects are likely to attract and retain more developers, (3) 53% of terminal projects eventually decay to a state of fewer than ten developers and (4) 55% of attractive projects remain in an attractive category..
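A minimal sketch of the two migration-style metrics named above, assuming hypothetical per-period mappings from project name to the set of active developer ids; the paper's operational definitions may differ in detail.

```python
def magnet_and_sticky(devs_t, devs_t1, project):
    """devs_t, devs_t1: dicts project -> set of developer ids active in
    period t and period t+1 respectively."""
    all_t = set().union(*devs_t.values())
    # Magnet: share of ecosystem newcomers in t+1 that this project attracts.
    newcomers = set().union(*devs_t1.values()) - all_t
    attracted = devs_t1[project] - all_t
    magnet = len(attracted) / len(newcomers) if newcomers else 0.0
    # Sticky: share of the project's period-t developers still active in it in t+1.
    stayed = devs_t[project] & devs_t1[project]
    sticky = len(stayed) / len(devs_t[project]) if devs_t[project] else 0.0
    return magnet, sticky
```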
100. 小須田 光, 亀井 靖高, 鵜林 尚靖, An Empirical Evaluation of Associating Crash Report Submission Frequency with Defects (クラッシュレポートの送信頻度と不具合との関連付けに関する実証的評価), コンピュータソフトウェア, Vol.32, No.4, pp.131-140, 2015.12.
101. 中川 尊雄, 亀井 靖高, 上野 秀剛, 門田 暁人, 鵜林 尚靖, 松本 健一, Measuring the Difficulty of Program Comprehension Based on Brain Activity (脳活動に基づくプログラム理解の困難さ測定), コンピュータソフトウェア, 2015.11.
102. Meiyappan Nagappan, Romain Robbes, Yasutaka Kamei, Eric Tanter, Shane Mcintosh, Audris Mockus, Ahmed E. Hassan, An Empirical Study of goto in C Code from GitHub Repositories, the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE2015), pp.404-414, 2015.09.
103. Yasutaka Kamei, Takafumi Fukushima, Shane McIntosh, Kazuhiro Yamashita, Naoyasu Ubayashi and Ahmed E. Hassan, Studying Just-In-Time Defect Prediction using Cross-Project Models, Journal of Empirical Software Engineering, Online first (pp.1-35), 2015.09.
104. Kazuhiro Yamashita, Shane McIntosh, Yasutaka Kamei, Ahmed E. Hassan and Naoyasu Ubayashi, Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects, International Workshop on Principles of Software Evolution (IWPSE 2015), pp.46-55, 2015.08.
105. Meiyappan Nagappan, Romain Robbes, Yasutaka Kamei, Éric Tanter, Shane Mcintosh, Audris Mockus, Ahmed E. Hassan, An empirical study of goto in C code from github repositories, 2015 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2015 - Proceedings, 10.1145/2786805.2786834, 404-414, 2015.08, © 2015 ACM. It is nearly 50 years since Dijkstra argued that goto obscures the flow of control in program execution and urged programmers to abandon the goto statement. While past research has shown that goto is still in use, little is known about whether goto is used in the unrestricted manner that Dijkstra feared, and if it is 'harmful' enough to be a part of a post-release bug. We, therefore, conduct a two part empirical study - (1) qualitatively analyze a statistically representative sample of 384 files from a population of almost 250K C programming language files collected from over 11K GitHub repositories and find that developers use goto in C files for error handling (80.21 ± 5%) and cleaning up resources at the end of a procedure (40.36 ± 5%); and (2) quantitatively analyze the commit history from the release branches of six OSS projects and find that no goto statement was removed/modified in the post-release phase of four of the six projects. We conclude that developers limit themselves to using goto appropriately in most cases, and not in an unrestricted manner like Dijkstra feared, thus suggesting that goto does not appear to be harmful in practice..
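A toy heuristic in the spirit of part (1) of the study: scan a C file for goto statements and count how many jump to labels whose names suggest error handling or cleanup. The label list is an assumption, and the paper relied on manual analysis rather than a regex.

```python
import re
from pathlib import Path

GOTO = re.compile(r"\bgoto\s+(\w+)\s*;")
ERROR_LABELS = re.compile(r"^(err|error|fail|out|cleanup|done|exit)", re.I)

def classify_gotos(c_file):
    """Return (total gotos, gotos whose target label suggests error handling)."""
    text = Path(c_file).read_text(errors="ignore")
    labels = [m.group(1) for m in GOTO.finditer(text)]
    error_handling = sum(1 for label in labels if ERROR_LABELS.match(label))
    return len(labels), error_handling
```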
106. Takuya Fukamachi, Naoyasu Ubayashi, Shintaro Hosoai, Yasutaka Kamei, Poster: Conquering Uncertainty in Java Programming, Proceedings - International Conference on Software Engineering, 10.1109/ICSE.2015.266, 2, 823-824, 2015.08, © 2015 IEEE. Uncertainty in programming is one of the challenging issues to be tackled, because it is error-prone for many programmers to temporarily set aside uncertain concerns using only simple language constructs such as comments and conditional statements. This paper proposes ucJava, a new Java programming environment for conquering uncertainty. Our environment provides a modular programming style for uncertainty and supports test-driven development that takes uncertainty into consideration..
107. Kazuhiro Yamashita, Shane McIntosh, Yasutaka Kamei, Ahmed E. Hassan, Naoyasu Ubayashi, Revisiting the applicability of the pareto principle to core development teams in open source software projects, International Workshop on Principles of Software Evolution (IWPSE), 10.1145/2804360.2804366, 30-Aug-2015, 46-55, 2015.08, © 2015 ACM. It is often observed that the majority of the development work of an Open Source Software (OSS) project is contributed by a core team, i.e., a small subset of the pool of active developers. In fact, recent work has found that core development teams follow the Pareto principle: roughly 80% of the code contributions are produced by 20% of the active developers. However, those findings are based on samples of between one and nine studied systems. In this paper, we revisit prior studies about core developers using 2,496 projects hosted on GitHub. We find that even when we vary the heuristic for detecting core developers, and when we control for system size, team size, and project age: (1) the Pareto principle does not seem to apply for 40%-87% of GitHub projects; and (2) more than 88% of GitHub projects have fewer than 16 core developers. Moreover, we find that when we control for the quantity of contributions, bug fixing accounts for a similar proportion of the contributions of both core (18%-20%) and non-core developers (21%-22%). Our findings suggest that the Pareto principle is not compatible with the core teams of many GitHub projects. In fact, several of the studied GitHub projects are susceptible to the bus factor, where the impact of a core developer leaving would be quite harmful..
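The Pareto check at the heart of this study is easy to state as code (illustrative; `contributions` is a hypothetical mapping from developer to contribution count): compute the share of contributions produced by the top 20% most active developers and compare it with 80%.

```python
def top20_share(contributions):
    """Fraction of all contributions produced by the top 20% of developers."""
    counts = sorted(contributions.values(), reverse=True)
    n_core = max(1, round(0.2 * len(counts)))
    return sum(counts[:n_core]) / sum(counts)

# Example: top20_share({"a": 50, "b": 30, "c": 10, "d": 5, "e": 5}) -> 0.5,
# i.e., the Pareto principle (~0.8) would not hold for this toy project.
```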
108. Shane Mcintosh, Yasutaka Kamei, Bram Adams and Ahmed E. Hassan, An Empirical Study of the Impact of Modern Code Review Practices on Software Quality, Journal of Empirical Software Engineering, Online first (pp.1-45), 2015.05.
109. Takuya Fukamachi, Naoyasu Ubayashi, Shintaro Hosoai, Yasutaka Kamei, Modularity for Uncertainty, International Workshop on Modeling in Software Engineering (MiSE2015), pp.7-12, 2015.05.
110. Takuya Fukamachi, Naoyasu Ubayashi, Shintaro Hosoai, Yasutaka Kamei, Poster: Conquering Uncertainty in Java Programming, International Conference on Software Engineering (ICSE2015), Poster Session., 2015.05.
111. Ayse Tosun Misirli, Emad Shihab, Yasutaka Kamei, Studying High Impact Fix-Inducing Changes, Journal of Empirical Software Engineering, Online first (pp.1-37), 2015.05.
112. Changyun Huang, Ataru Osaka, Yasutaka Kamei, Naoyasu Ubayashi, Automated DSL Construction Based on Software Product Lines, International Conference on Model-Driven Engineering and Software Development (MODELSWARD2015), Poster Session, 2015.02.
113. 戸田 航史, 亀井 靖高, 濵﨑 一樹, 吉田 則裕, An Analysis of How Review and Patch Development Experience Affect Review Time in the Chromium Project (Chromiumプロジェクトにおけるレビュー・パッチ開発経験がレビューに要する時間に与える影響の分析), コンピュータソフトウェア, Vol.32, No.1, pp.227-233, 2015.02.
114. 柏 祐太郎, 大平 雅雄, 阿萬 裕久, 亀井 靖高, A Bug Triaging Method for Reducing Bug-Fixing Time in Large-Scale OSS Development (大規模OSS開発における不具合修正時間の短縮化を目的としたバグトリアージ手法), 情報処理学会論文誌, Vol.56, No.2, pp.669-681, 2015.02.
115. 柏 祐太郎, 大平 雅雄, 阿萬 裕久, 亀井 靖高, A Bug Triaging Method for Reducing Bug-Fixing Time in Large-Scale OSS Development (大規模OSS開発における不具合修正時間の短縮化を目的としたバグトリアージ手法), 情報処理学会論文誌, 56, 2, 669-681, 2015.02, This paper proposes a bug triaging method to reduce the time to fix bugs in large-scale open source software development. Our method considers a developer's aptitude and, in addition, the upper limit of tasks which can be fixed by a developer in a certain period. In this paper, we conduct a case study of applying our method to the Mozilla Firefox and Eclipse Platform projects and show the following findings: (1) using our method mitigates the situation where the majority of bug-fixing tasks are assigned to particular developers, (2) our method can reduce the time to fix bugs by up to 50% in Firefox compared with the manual bug triaging method (unmeasurable for Platform due to excessive error), and by 34% in Firefox and 38% in Platform compared with the existing method, and (3) the two factors used in our method, Preference (a developer's aptitude for fixing a bug) and Limit (the limit of a developer's working hours), each contribute to the task-dispersion effect to a similar degree..
116. Peiyuan Li, Naoyasu Ubayashi, Di Ai, Yu Ning Li, Shintaro Hosoai, Yasutaka Kamei, Sketch-Based Gradual Model-Driven Development, International Workshop on Innovative Software Development Methodologies and Practices (InnoSWDev 2014), pp.100-105, 2014.11.
117. Naoyasu Ubayashi, Di Ai, Peiyuan Li, Yu Ning Li, Shintaro Hosoai, Yasutaka Kamei, Uncertainty-Aware Architectural Interface, International Workshop on Advanced Modularization Techniques (AOAsia/Pacific 2014), pp.4-6, 2014.11.
118. Akinori Ihara, Yasutaka Kamei, Masao Ohira, Ahmed E. Hassan, Naoyasu Ubayashi, Ken Ichi Matsumoto, Early identification of future committers in open source software projects, Proceedings - International Conference on Quality Software, 10.1109/QSIC.2014.30, 47-56, 2014.11, © 2014 IEEE. There exist two types of developers in Open Source Software (OSS) projects: 1) committers, who have permission to commit edited source code to the Version Control System (VCS); 2) developers, who contribute source code but cannot commit to the VCS directly. In order to develop and evolve high quality OSS, projects are always in search of new committers. OSS projects often promote strong developers to become committers. When existing committers find strong developers, they propose their promotion to a committer role. Delaying the committer promotion might lead to strong developers departing from an OSS project and the project losing them. However, early committer promotion comes with its own slew of risks as well (e.g., the promotion of inexperienced developers). Hence, committer-promotion decisions are critical for the quality and successful evolution of OSS projects. In this paper, we examine the committer-promotion phenomenon for two OSS projects (Eclipse and Firefox). We find that the amount of activity by future committers was higher than that by developers who did not become committers. We also find that some developers are promoted to a committer role very rapidly (within a few months) while some developers take over one year to become a committer. Finally, we develop a committer-identification model to assist OSS projects in identifying future committers..
119. Peiyuan Li, Naoyasu Ubayashi, Di Ai, Yu Ning Li, Shintaro Hosoai, Yasutaka Kamei, Sketch-Based gradual model-driven development, International Workshop on Innovative Software Development Methodologies and Practices, InnoSWDev 2014 - Proceedings, 10.1145/2666581.2666595, 100-105, 2014.11, Copyright © 2014 ACM. This paper proposes an abstraction-aware reverse engineering method in which a developer just makes a mark on an important code region, as if drawing a quick sketch on the program listing. A support tool called iArch slices a program from the marked program points and generates an abstract design model faithful to the intention of the developer. The developer can modify the design model and re-generate the code while preserving the abstraction level and the traceability. Archface, an interface mechanism between design and code, plays an important role in abstraction-aware traceability checking. If the developer wants to obtain a more concrete design model from the code, he or she only has to make additional marks on the program listing. We can gradually transition to a model-driven development style..
120. Naoyasu Ubayashi, Di Ai, Peiyuan Li, Yu Ning Li, Shintaro Hosoai, Yasutaka Kamei, Uncertainty-aware architectural interface, 9th International Workshop on Advanced Modularization Techniques, AOAsia 2014 - Proceedings, 10.1145/2666358.2666579, 4-6, 2014.11, In most software development projects, design models tend to contain uncertainty, because all of the design concerns cannot be captured at the early development phase. It is preferable to be able to check consistency or traceability among design models and programs even if they contain uncertain concerns. To deal with this problem, we propose the notion of uncertainty-aware Archface, an interface mechanism exposing a set of architectural points that should be shared between design and code. We can explicitly describe uncertainty in design models or programs by specifying uncertain architectural points..
121. Akinori Ihara, Yasutaka Kamei, Masao Ohira, Ahmed E. Hassan, Naoyasu Ubayashi and Kenichi Matsumoto, Early Identification of Future Committers in Open Source Software Projects, International Conference on Quality Software (QSIC2014), pp.47-56, 2014.10.
122. Naoyasu Ubayashi, Di Ai, Peiyuan Li, Yu Ning Li, Shintaro Hosoai and Yasutaka Kamei, Abstraction-aware Verifying Compiler for Yet Another MDD, International Conference on Automated Software Engineering (ASE 2014) [new ideas paper track], pp.557-562, 2014.09.
123. 中川 尊雄, 亀井 靖高, 上野 秀剛, 門田 暁人, 松本 健一, Toward Measuring the Difficulty of Program Comprehension by Cerebral Blood Flow, コンピュータ ソフトウェア, 10.11309/jssst.31.3_270, 31, 3, 3_270-3_276, 2014.09, This paper attempts to observe quantitatively, by measuring cerebral blood flow with near-infrared spectroscopy (NIRS), whether developers are in a state of difficulty while comprehending a program. In an experiment measuring the cerebral blood flow of 10 subjects comprehending two programs of different difficulty, 8 of the 10 subjects showed more active brain activity while comprehending the harder program. Furthermore, after normalizing each subject's brain activity as Z-scores and aggregating by difficulty, a t-test showed a significant difference in mean brain activity (p
124. 中川 尊雄, 亀井 靖高, 上野 秀剛, 門田 暁人, 松本 健一, Toward Measuring the Difficulty of Program Comprehension by Cerebral Blood Flow, コンピュータソフトウェア, Vol.31, No.3, pp.270-276, 2014.08 (a Z-score/t-test sketch for this paper appears after this list).
125. Shuhei Ohsako, Yasutaka Kamei, Shintaro Hosoai, Weiqiang Kong, Kimitaka Kato, Akihiko Ishizuka, Kazutoshi Sakaguchi, Miyuki Kawataka, Yoshitsugu Morita, Naoyasu Ubayashi and Akira Fukuda, A Case Study on Introducing the Design Thinking into PBL, International Conference on Frontiers in Education: CS and CE (FECS 2014), 2014.07.
126. Takafumi Fukushima, Yasutaka Kamei, Shane McIntosh, Kazuhiro Yamashita and Naoyasu Ubayashi, An Empirical Study of Just-In-Time Defect Prediction Using Cross-Project Models, International Working Conference on Mining Software Repositories (MSR 2014), pp.172-181, 2014.06.
127. Kazuhiro Yamashita, Shane McIntosh, Yasutaka Kamei and Naoyasu Ubayashi, Magnet or Sticky?: An OSS Project-by-Project Typology, International Working Conference on Mining Software Repositories (MSR 2014), pp.344-347, 2014.06.
128. Takao Nakagawa, Yasutaka Kamei, Hidetake Uwano, Akito Monden, Kenichi Matsumoto and Daniel M. German, Quantifying Programmers' Mental Workload during Program Comprehension Based on Cerebral Blood Flow Measurement: A Controlled Experiment, International Conference on Software Engineering (ICSE2014), NIER Track, pp.448-451, 2014.06.
129. Shane McIntosh, Yasutaka Kamei, Bram Adams and Ahmed E. Hassan, The Impact of Code Review Coverage and Code Review Participation on Software Quality: A Case Study of the Qt, VTK, and ITK Projects, International Working Conference on Mining Software Repositories (MSR 2014), pp.192-201, 2014.06.
130. Daisuke Nakano, Akito Monden, Yasutaka Kamei, Kenichi Matsumoto, Simulation of effort allocation strategies in software testing using bug module, Computer Software, 31, 2, 118-128, 2014.05, To date, various techniques for predicting fault-prone modules have been proposed; however, test strategies, which assign a certain amount of test effort to each module, have been rarely studied. This paper proposes a simulation model of software testing that can evaluate various test strategies. The simulation model estimates the number of discoverable faults with respect to the given test resources, the test strategy, complexity metrics of a set of modules to be tested, and the fault prediction results. Based on a case study of simulation applying fault prediction to two open source software (Eclipse and Mylyn), we show the relationship between the available test effort and the effective test strategy..
131. 中野 大輔, 門田 暁人, 亀井 靖高, 松本 健一, Simulation of Effort Allocation Strategies in Software Testing Using Bug-Module Prediction, コンピュータソフトウェア, Vol.31, No.2, pp.118-128, 2014.05.
132. 角田 雅照, 戸田 航史, 伏田 享平, 亀井 靖高, Meiyappan Nagappan, 鵜林 尚靖, A Quantitative Evaluation of Software Development Effort Estimation Methods Using Early-Phase Activity Records, コンピュータソフトウェア, Vol.31, No.2, pp.129-143, 2014.05.
133. Takafumi Fukushima, Yasutaka Kamei, Shane McIntosh, Kazuhiro Yamashita, Naoyasu Ubayashi, An empirical study of just-in-time defect prediction using cross-project models, 11th Working Conference on Mining Software Repositories, MSR 2014 - Proceedings, 10.1145/2597073.2597075, 172-181, 2014.05, Prior research suggests that predicting defect-inducing changes, i.e., Just-In-Time (JIT) defect prediction, is a more practical alternative to traditional defect prediction techniques, providing immediate feedback while design decisions are still fresh in the minds of developers. Unfortunately, similar to traditional defect prediction models, JIT models require a large amount of training data, which is not available when projects are in their initial development phases. To address this limitation, prior work has proposed cross-project models, i.e., models learned from older projects with sufficient history. However, cross-project models had not yet been explored in the context of JIT prediction. Therefore, in this study, we empirically evaluate the performance of JIT cross-project models. Through a case study on 11 open source projects, we find that in a JIT cross-project context: (1) high-performance within-project models rarely perform well; (2) models trained on projects that have similar correlations between predictor and dependent variables often perform well; and (3) ensemble learning techniques that leverage historical data from several other projects (e.g., voting experts) often perform well. Our findings empirically confirm that JIT cross-project models learned using other projects are a viable solution for projects with little historical data. However, JIT cross-project models perform best when the data used to learn them is carefully selected (a toy voting-experts sketch appears after this list)..
134. Kazuhiro Yamashita, Shane McIntosh, Yasutaka Kamei, Naoyasu Ubayashi, Magnet or sticky? An OSS project-by-project typology, 11th Working Conference on Mining Software Repositories, MSR 2014 - Proceedings, 10.1145/2597073.2597116, 344-347, 2014.05, For Open Source Software (OSS) projects, retaining existing contributors and attracting new ones is a major concern. In this paper, we expand and adapt a pair of population migration metrics to analyze migration trends in a collection of open source projects. Namely, we study: (1) project stickiness, i.e., its tendency to retain existing contributors, and (2) project magnetism, i.e., its tendency to attract new contributors. Using quadrant plots, we classify projects as attractive (highly magnetic and sticky), stagnant (highly sticky, weakly magnetic), fluctuating (highly magnetic, weakly sticky), or terminal (weakly magnetic and sticky). Through analysis of the MSR challenge dataset, we find that: (1) quadrant plots can effectively identify at-risk projects, (2) stickiness is often motivated by professional activity, and (3) transitions among quadrants as a project ages often coincide with interesting events in the evolution history of a project (the magnetism/stickiness computation is sketched after this list)..
136. Shane McIntosh, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan, The impact of code review coverage and code review participation on Software quality: A case study of the Qt, VTK, and ITK projects, 11th Working Conference on Mining Software Repositories, MSR 2014 - Proceedings, 10.1145/2597073.2597076, 192-201, 2014.05, Software code review, i.e., the practice of having third-party team members critique changes to a software system, is a well-established best practice in both open source and proprietary software domains. Prior work has shown that the formal code inspections of the past tend to improve the quality of software delivered by students and small teams. However, the formal code inspection process mandates strict review criteria (e.g., in-person meetings and reviewer checklists) to ensure a base level of review quality, while the modern, lightweight code reviewing process does not. Although recent work explores the modern code review process qualitatively, little research quantitatively explores the relationship between properties of the modern code review process and software quality. Hence, in this paper, we study the relationship between software quality and: (1) code review coverage, i.e., the proportion of changes that have been code reviewed, and (2) code review participation, i.e., the degree of reviewer involvement in the code review process. Through a case study of the Qt, VTK, and ITK projects, we find that both code review coverage and participation share a significant link with software quality. Low code review coverage and participation are estimated to produce components with up to two and five additional post-release defects respectively. Our results empirically confirm the intuition that poorly reviewed code has a negative impact on software quality in large systems using modern reviewing tools (both review metrics are sketched after this list)..
137. Di Ai, Naoyasu Ubayashi, Peiyuan Li, Daisuke Yamamoto, Yu Ning Li, Shintaro Hosoai, Yasutaka Kamei, iArch: An IDE for Supporting Fluid Abstraction, International Conference on Modularity'14, Tool Demo Session, 2014.04.
138. Changyun Huang, Naoyasu Ubayashi and Yasutaka Kamei, Towards Language-Oriented Software Development, International Workshop on Open and Original Problems in Software Language Engineering (OOPSLE 2014), 2014.02.
139. Di Ai, Naoyasu Ubayashi, Peiyuan Li, Shintaro Hosoai and Yasutaka Kamei, iArch - An IDE for Supporting Abstraction-aware Design Traceability, International Conference on Model-Driven Engineering and Software Development (MODELSWARD2014), Poster Session, 2014.01.
140. Emad Shihab, Yasutaka Kamei, Bram Adams, and Ahmed E. Hassan, Is Lines of Code a Good Measure of Effort in Effort-Aware Models?, Information and Software Technology, Vol.55, No.11, 2013.11.
141. Emad Shihab, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan, Is lines of code a good measure of effort in effort-aware models?, Information and Software Technology, 10.1016/j.infsof.2013.06.002, 55, 11, 1981-1993, 2013.11, Context: Effort-aware models, e.g., effort-aware bug prediction models, aim to help practitioners identify and prioritize buggy software locations according to the effort involved in fixing the bugs. Since the effort of current bugs is not yet known and the effort of past bugs is typically not explicitly recorded, effort-aware bug prediction models are forced to use approximations, such as the number of lines of code (LOC) of the predicted files. Objective: Although the choice of these approximations is critical for the performance of the prediction models, there is no empirical evidence on whether LOC is actually a good approximation. Therefore, in this paper, we investigate the question: is LOC a good measure of effort for use in effort-aware models? Method: We perform an empirical study on four open source projects, for which we obtain explicitly-recorded effort data, and compare the use of LOC to various complexity, size and churn metrics as measures of effort. Results: We find that a combination of complexity, size and churn metrics is a better measure of effort than LOC alone. Furthermore, we examine the impact of our findings on previous effort-aware bug prediction work and find that using LOC as a measure of effort does not significantly affect the list of files being flagged; however, using LOC under-estimates the amount of effort required compared to our best effort predictor by approximately 66%. Conclusion: Studies using effort-aware models should not assume that LOC is a good measure of effort. For effort-aware bug prediction, using LOC provides results similar to combining complexity, churn, size and LOC as a proxy for effort when prioritizing the most risky files. However, for the purpose of effort estimation, using LOC may under-estimate the amount of effort required (a correlation sketch appears after this list)..
142. Emad Shihab, Akinori Ihara, Yasutaka Kamei, Walid M. Ibrahim, Masao Ohira, Bram Adams, Ahmed E. Hassan and Ken-ichi Matsumoto, Studying Re-opened Bugs in Open Source Software, Journal of Empirical Software Engineering, Vol.18, No.5, pp.1005-1042, 2013.10.
143. Emad Shihab, Akinori Ihara, Yasutaka Kamei, Walid M. Ibrahim, Masao Ohira, Bram Adams, Ahmed E. Hassan, Ken Ichi Matsumoto, Studying re-opened bugs in open source software, Empirical Software Engineering, 10.1007/s10664-012-9228-6, 18, 5, 1005-1042, 2013.10, Bug fixing accounts for a large amount of the software maintenance resources. Generally, bugs are reported, fixed, verified and closed. However, in some cases bugs have to be re-opened. Re-opened bugs increase maintenance costs, degrade the overall user-perceived quality of the software and lead to unnecessary rework by busy practitioners. In this paper, we study and predict re-opened bugs through a case study on three large open source projects, namely Eclipse, Apache and OpenOffice. We structure our study along four dimensions: (1) the work habits dimension (e.g., the weekday on which the bug was initially closed), (2) the bug report dimension (e.g., the component in which the bug was found), (3) the bug fix dimension (e.g., the amount of time it took to perform the initial fix), and (4) the team dimension (e.g., the experience of the bug fixer). We build decision trees using the aforementioned factors that aim to predict re-opened bugs. We perform top node analysis to determine which factors are the most important indicators of whether or not a bug will be re-opened. Our study shows that the comment text and last status of the bug when it is initially closed are the most important factors related to whether or not a bug will be re-opened. Using a combination of these dimensions, we can build explainable prediction models that achieve a precision between 52.1% and 78.6% and a recall between 70.5% and 94.1% when predicting whether a bug will be re-opened. We find that the factors that best indicate which bugs might be re-opened vary based on the project. The comment text is the most important factor for the Eclipse and OpenOffice projects, while the last status is the most important one for Apache. These factors should be closely examined in order to reduce maintenance cost due to re-opened bugs (a toy decision-tree sketch appears after this list)..
144. 小林 寛武, 戸田 航史, 亀井 靖高, 門田 暁人, 峯 恒憲, 鵜林 尚靖, An Empirical Evaluation of Eleven Fault-Density Prediction Models, 電子情報通信学会論文誌, Vol.J96-D, No.8, pp.1892-1902, 2013.08.
145. Changyun Huang, Yasutaka Kamei, Kazuhiro Yamashita and Naoyasu Ubayashi, Using Alloy to Support Feature-Based DSL Construction for Mining Software Repositories, International Workshop on Model-driven Approaches in Software Product Line Engineering and Workshop on Scalable Modeling Techniques for Software Product Lines (MAPLE/SCALE 2013), 2013.08.
146. 小林 寛武, 戸田 航史, 亀井 靖高, 門田 暁人, 峯 恒憲, 鵜林 尚靖, An Empirical Evaluation of Eleven Fault-Density Prediction Models, 電子情報通信学会論文誌. D, 情報・システム = The IEICE transactions on information and systems (Japanese edition), 96, 8, 1892-1902, 2013.08, In software testing and maintenance, many models that predict the presence or the number of faults in a module have been proposed with the aim of assuring reliability within limited resources. However, previous studies had several problems: few evaluation projects (datasets), few models compared, no evaluation accounting for module size (fault density was not predicted), and training and evaluation data drawn from the same sample (no cross-version evaluation). It was therefore unclear which fault prediction model is superior. This paper experimentally evaluates the performance of eleven fault-density prediction models using datasets collected from two versions each of 25 open source projects. The experiments show that a tree model (M5) performs best (a stand-in regression-tree sketch appears after this list)..
147. Masateru Tsunoda, Kyohei Fushida, Yasutaka Kamei, Masahide Nakamura, Kohei Mitsui, Keita Goto, and Ken-ichi Matsumoto, An Authentication Method with Spatiotemporal Interval and Partial Matching, International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2013), 2013.07.
148. Tetsuya Oishi, Weiqiang Kong, Yasutaka Kamei, Norimichi Hiroshige, Naoyasu Ubayashi and Akira Fukuda, An Empirical Study on Remote Lectures Using Video Conferencing Systems, International Conference on Frontiers in Education: CS and CE (FECS 2013), 2013.07.
149. Yasutaka Kamei, Emad Shihab, Bram Adams, Ahmed E. Hassan, Audris Mockus, Anand Sinha and Naoyasu Ubayashi, A Large-Scale Empirical Study of Just-In-Time Quality Assurance, IEEE Transactions on Software Engineering, Vol.39, No.6, pp.757-773, 2013.06.
150. Naoyasu Ubayashi and Yasutaka Kamei, Design Module: A Modularity Vision Beyond Code -Not Only Program Code But Also a Design Model Is a Module-, International Workshop on Modeling in Software Engineering (MiSE2013), 2013.05.
151. Changyun Huang, Kazuhiro Yamashita, Yasutaka Kamei, Kenji Hisazumi and Naoyasu Ubayashi, Domain Analysis for Mining Software Repositories -Towards Feature-based DSL Construction-, International Workshop on Product LinE Approaches in Software Engineering (PLEASE 2013), 2013.05.
152. Masateru Tsunoda, Koji Toda, Kyohei Fushida, Yasutaka Kamei, Meiyappan Nagappan and Naoyasu Ubayashi, Revisiting Software Development Effort Estimation Based on Early Phase Development Activities, International Working Conference on Mining Software Repositories (MSR 2013), 2013.05.
153. Tetsuya Oishi, Yasutaka Kamei, Weiqiang Kong, Norimichi Hiroshige, Naoyasu Ubayashi, Akira Fukuda, An Experience Report on Remote Lecture Using Multi-point Control Unit, International Conference on Education and Teaching (ICET 2013), pp.1-8, 2013.03.
154. Naoyasu Ubayashi and Yasutaka Kamei, UML-based Design and Verification Method for Developing Dependable Context-Aware Systems, International Conference on Model-Driven Engineering and Software Development (MODELSWARD 2013), pp.89-94, 2013.02.
155. Akito Monden, Jacky Keung, Shuji Morisaki, Yasutaka Kamei and Kenichi Matsumoto, A Heuristic Rule Reduction Approach to Software Fault-proneness Prediction, Asia-Pacific Software Engineering Conference (APSEC 2012), pp.838-847, 2012.12.
156. Akinori Ihara, Yasutaka Kamei, Akito Monden, Masao Ohira, Jacky Keung, Naoyasu Ubayashi and Kenichi Matsumoto, An Investigation on Software Bug Fix Prediction for Open Source Software Projects -A Case Study on the Eclipse Project-, International Workshop on Software Analysis, Testing and Applications (SATA2012), pp.112-119, 2012.12.
157. Phiradet Bangcharoensap, Akinori Ihara, Yasutaka Kamei, Ken-ichi Matsumoto, Locating Source Code to be Fixed based on Initial Bug Reports -A Case Study on the Eclipse Project, International Workshop on Empirical Software Engineering in Practice (IWESEP2012), pp.10-15, 2012.10.
158. Hiroki Nakamura, Rina Nagano, Kenji Hisazumi, Yasutaka Kamei, Naoyasu Ubayashi and Akira Fukuda, QORAL: External Domain-Specific Language for Mining Software Repositories, International Workshop on Empirical Software Engineering in Practice (IWESEP2012), pp.23-29, 2012.10.
159. Naoyasu Ubayashi and Yasutaka Kamei, UML4COP: UML-based DSML for Context-Aware Systems, International Workshop on Domain-Specific Modeling (DSM2012), pp.33-38, 2012.10.
160. 内尾 静, 鵜林 尚靖, 亀井 靖高, Debugging Support for Context-Oriented Programming Using an SMT Solver, コンピュータソフトウェア, Vol.29, No.3, pp.108-114, 2012.08.
161. Rina Nagano, Hiroki Nakamura, Yasutaka Kamei, Bram Adams, Kenji Hisazumi, Naoyasu Ubayashi and Akira Fukuda, Using the GPGPU for Scaling Up Mining Software Repositories, International Conference on Software Engineering (ICSE2012), Poster Session, pp.1435-1436, 2012.06.
162. Naoyasu Ubayashi, Yasutaka Kamei, Verifiable Architectural Interface for Supporting Model-Driven Development with Adequate Abstraction Level, International Workshop on Modeling in Software Engineering (MiSE2012), pp.15-21, 2012.06.
163. Naoyasu Ubayashi, Yasutaka Kamei, An Extensible Aspect-oriented Modeling Environment for Constructing Domain-Specific Languages, IEICE Transactions on Information and Systems, Vol.E95-D, No.4, pp.942-958, 2012.04.
164. Naoyasu Ubayashi, Yasutaka Kamei, An extensible aspect-oriented modeling environment for constructing domain-specific languages, IEICE Transactions on Information and Systems, 10.1587/transinf.E95.D.942, E95-D, 4, 942-958, 2012.04, AspectM, an aspect-oriented modeling (AOM) language, provides not only basic modeling constructs but also an extension mechanism called metamodel access protocol (MMAP) that allows a modeler to modify the metamodel. MMAP consists of metamodel extension points, extension operations, and primitive predicates for navigating the metamodel. Although the notion of MMAP is useful, it needs tool support. This paper proposes a method for implementing a MMAP-based AspectM support tool. It consists of a model editor, a model weaver, and a model verifier. We introduce the notion of edit-time structural reflection and extensible model weaving. Using these mechanisms, a modeler can easily construct domain-specific languages (DSLs). We show a case study using the AspectM support tool and discuss the effectiveness of the extension mechanism provided by MMAP. As a case study, we show a UML-based DSL for describing the external contexts of embedded systems..
165. Naoyasu Ubayashi and Yasutaka Kamei, Architectural Point Mapping for Design Traceability, Foundations of Aspect-Oriented Languages workshop (FOAL2012), pp.39-44, 2012.03.
166. 塩塚 大, 鵜林 尚靖, 亀井 靖高, dcNavi: A Concern-Oriented Recommendation System for Debugging, 情報処理学会論文誌, Vol.53, No.2, pp.631-643, 2012.03.
167. 亀井 靖高, 大平 雅雄, 伊原 彰紀, 小山 貴和子, 柗本 真佑, 松本 健一, 鵜林 尚靖, The Impact of Time Zone Differences on Information Exchange among OSS Developers in a Global Environment, 情報社会学会学会誌, Vol.6, No.2, pp.17-32, 2012.03.
168. 藏本 達也, 亀井 靖高, 門田 暁人, 松本 健一, Toward Fault-Prone Module Prediction across Software Development Projects: Lessons Learned from Experiments on 18 Projects, 電子情報通信学会論文誌, Vol.J95-D, No.3, pp.425-436, 2012.03.
169. 藏本 達也, 亀井 靖高, 門田 暁人, 松本 健一, Toward Fault-Prone Module Prediction across Software Development Projects: Lessons Learned from Experiments on 18 Projects, 電子情報通信学会論文誌. D, 情報・システム = The IEICE transactions on information and systems (Japanese edition), 95, 3, 425-436, 2012.03, In software testing and maintenance, many models that predict whether a module contains faults (fault-prone module prediction models) have been proposed to assure reliability within limited resources. Building such a model, however, requires metrics and defect data measured in past versions of the same project, which makes adoption difficult for organizations that have not collected development data and for newly started projects. To identify techniques useful for building and applying models across projects, this paper experimentally examines four research questions. Through experiments on 18 project datasets, we obtained the following lessons: (1) random forest is effective for cross-project prediction; (2) preprocessing (normalizing) the training data has no effect; (3) when similarity between datasets can be confirmed, high prediction accuracy can be expected; and (4) ensemble learning over data from multiple projects is effective..
170. 伊原 彰紀, 亀井 靖高, 大平 雅雄, 松本 健一, 鵜林 尚靖, Predicting Committer Candidates Using Developers' Activity Volume in OSS Projects, 電子情報通信学会論文誌, Vol.J95-D, No.2, pp.237-249, 2012.02.
171. 伊原 彰紀, 亀井 靖高, 大平 雅雄, 松本 健一, 鵜林 尚靖, Predicting Committer Candidates Using Developers' Activity Volume in OSS Projects, 電子情報通信学会論文誌. D, 情報・システム = The IEICE transactions on information and systems (Japanese edition), 95, 2, 237-249, 2012.02, This paper aims to find, among the general developers participating in an open source software (OSS) project, capable developers who should be recommended as committers (committer candidates). With hundreds of bugs reported to projects every day, the excessive burden on committers has prolonged bug fixing. One remedy is to increase the number of committers, but finding committer candidates among a project's general developers is not easy. To find committer candidates, we analyzed the past activities and activity volumes of existing committers and built a committer prediction model, using the history of activities (submitting patches, verifying patches, and development discussion) and the length of project participation of committer candidates and general developers. The analysis shows that developers who continuously submit and verify patches are promoted to committers, and that the prediction model is five to seven times more accurate than random prediction..
172. 塩塚 大, 鵜林 尚靖, 亀井 靖高, dcNavi: A Concern-Oriented Recommendation System for Debugging, 情報処理学会論文誌, 53, 2, 631-643, 2012.02, Programmers tend to spend a lot of time debugging code. They check the erroneous phenomena, navigate the code, search the past bug fixes, and modify the code. If a sequence of these debug activities can be automated, programmers can use their time for more creative tasks. To deal with this problem, we propose dcNavi (Debug Concern Navigator), a concern-oriented recommendation system for debugging. dcNavi provides appropriate hints to programmers according to their debug concerns by mining a repository containing not only program information but also test results and program modification history. In this paper, we evaluate the effectiveness of our approach in terms of the reusability of past bug fixes by using nine open source repositories created in the Eclipse plug-in projects..
173. 角田 雅照, 伏田 享平, 亀井 靖高, 中村 匡秀, 三井 康平, 後藤 慶多, 松本 健一, An Authentication Method Based on Spatiotemporal Information and Actions, 知能と情報(日本知能情報ファジィ学会誌), Vol.23, No.6, pp.874-881, 2011.12.
174. Hidetake Uwano, Yasutaka Kamei, Akito Monden, Ken-Ichi Matsumoto, An Analysis of Cost-overrun Projects using Financial Data and Software Metrics, The Joint Conference of the 21st International Workshop on Software Measurement and the 6th International Conference on Software Process and Product Measurement (IWSM/MENSURA2011), pp.227-232, 2011.11.
175. Yasutaka Kamei, Hiroki Sato, Akito Monden, Shinji Kawaguchi, Hidetake Uwano, Masataka Nagura, Ken-Ichi Matsumoto, Naoyasu Ubayashi, An Empirical Study of Fault Prediction with Code Clone Metrics, The Joint Conference of the 21st International Workshop on Software Measurement and the 6th International Conference on Software Process and Product Measurement (IWSM/MENSURA2011), pp.55-61, 2011.11.
176. Ryosuke Nakashiro, Yasutaka Kamei, Naoyasu Ubayashi, Shin Nakajima, Akihito Iwai, Translation Pattern of BPEL Process into Promela Code, The Joint Conference of the 21st International Workshop on Software Measurement and the 6th International Conference on Software Process and Product Measurement (IWSM/MENSURA2011), pp.285-290, 2011.11.
177. Emad Shihab, Audris Mockus, Yasutaka Kamei, Bram Adams, Ahmed E. Hassan, High-Impact Defects: A Study of Breakage and Surprise Defects, the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE2011), pp.300-310, 2011.09.
178. Naoyasu Ubayashi, Yasutaka Kamei, Masayuki Hirayama, Tetsuo Tamai, Context Analysis Method for Embedded Systems ---Exploring a Requirement Boundary between a System and Its Context, 3rd Workshop on Context-Oriented Programming (COP 2011), pp.143-152, 2011.08.
179. Shuji Morisaki, Yasutaka Kamei, and Ken-ichi Matsumoto, Experimental Evaluation of Effect of Specifying a Focused Defect Classification in Software Inspection, JSSST Journal, Vol.28, No.3, pp.173-178, 2011.08.
180. Shizuka Uchio, Naoyasu Ubayashi, Yasutaka Kamei, CJAdviser: SMT-based Debugging Support for ContextJ*, 3rd Workshop on Context-Oriented Programming (COP 2011), pp.1-6, 2011.07.
181. Masaru Shiozuka, Naoyasu Ubayashi, Yasutaka Kamei, Debug Concern Navigator, the 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE 2011), pp.197-202, 2011.07.
182. Naoyasu Ubayashi, Yasutaka Kamei, Stepwise Context Boundary Exploration Using Guide Words, the 23rd International Conference on Advanced Information Systems Engineering (CAiSE 2011 Forum), pp.131-138., 2011.06.
183. Shane McIntosh, Bram Adams, Thanh H. D. Nguyen, Yasutaka Kamei and Ahmed E. Hassan, An Empirical Study of Build Maintenance Effort, the 33rd International Conference on Software Engineering (ICSE2011), pp.141-150, 2011.05.
184. 柗本 真佑, 亀井 靖高, 門田 暁人, 松本 健一, An Analysis of Software Reliability Based on Developer Metrics, 電子情報通信学会論文誌. D, 情報・システム = The IEICE transactions on information and systems (Japanese edition), 93, 8, 1576-1589, 2010.08, Software reliability has often been analyzed using metrics computed from characteristics of the software product itself. This paper instead analyzes software reliability based on characteristics of the developers who created the product (developer metrics), such as the number of changed lines and commits per developer, and the number of developers involved in each module. The analysis is based on four hypotheses. Hypothesis 1a: developers differ in how readily they introduce bugs. Hypothesis 1b: a developer's tendency to introduce bugs can be judged from developer characteristics such as changed lines and commit counts. Hypothesis 2: modules changed by many developers are more likely to have bugs introduced. Hypothesis 3: developer metrics are useful for identifying fault-prone modules. Analysis of metrics data collected from the Eclipse project supported all of the hypotheses, revealing that developers differ by a factor of at least five in how readily they introduce bugs, and that modules involving more developers are slightly more prone to bug injection..
186. 亀井 靖高, 左藤 裕紀, 門田 暁人, 川口 真司, 上野 秀剛, 名倉 正剛, 松本 健一, A Replicated Study of Fault-Prone Module Prediction Using Clone Metrics, 電子情報通信学会論文誌. D, 情報・システム = The IEICE transactions on information and systems (Japanese edition), 93, 4, 544-547, 2010.04, This paper replicates Baba et al.'s study of fault-prone module prediction using code-clone metrics. In an experiment using module data from three versions of the Eclipse project (3.0, 3.1 and 3.2), and unlike the original study, no improvement in accuracy was confirmed. To investigate why, we analyzed the relationship between clone metrics and faults. The analysis suggests that while clone metrics do not help fault-prone module prediction for small modules, they are effective for modules of the larger scale targeted by Baba et al..
188. 田村 晃一, 亀井 靖高, 上野 秀剛, 森崎 修司, 松本 健一, A Code Review Method for Reducing Fix-Confirmation Test Size, 情報処理学会論文誌, 50, 12, 3074-3083, 2009.12, When a defect is fixed during the testing phase of software development, it is often necessary both to confirm the fixed part and to run tests confirming that the fix introduces no new defects. This paper proposes a code review method aimed at reducing the confirmation and re-testing required when defects are fixed. The method gives reviewers information from which test size can be estimated, so that defects that would otherwise require large fix-confirmation tests are anticipated and detected preferentially. In a comparison against Test Case Based Reading (TCBR) and Ad-Hoc Reading (AHR) with 18 subjects, including six with commercial development experience, the proposed method reduced fix-confirmation test size by a factor of 2.1 on average relative to TCBR and 1.9 relative to AHR..
189. 柿元 健, 門田 暁人, 亀井 靖高, 柗本 真佑, 松本 健一, 楠本 真二, Modeling Test Effort Allocation and Software Reliability in Fault-Prone Module Prediction, 情報処理学会論文誌, 50, 7, 1716-1724, 2009.07, Various fault-prone module prediction models have been proposed to improve software reliability. However, while improvements in prediction accuracy have been discussed, there has been little discussion of how the models should be used in the field, i.e., how test effort should be allocated, so the benefit to software reliability remained unclear. This paper proposes the TEAR (Test Effort Allocation and software Reliability) model, which represents the relationship among fault-prone module prediction, test effort allocation, and software reliability, and enables planning a per-module test effort allocation that maximizes software reliability within a given total test effort. Simulations based on the TEAR model show that more test effort should be allocated to fault-prone modules when the expected prediction accuracy is high or the number of faulty modules is small, whereas the prediction results should not be used to allocate test effort when prediction accuracy is low or faulty modules are numerous (a simplified allocation simulation appears after this list)..
191. 亀井 靖高, 角田 雅照, 柿元 健, 大杉 直樹, 門田 暁人, 松本 健一, The Effect of Collaborative Filtering on Software Component Recommendation, 情報処理学会論文誌, 50, 3, 1139-1143, 2009.03, To clarify the effect of collaborative filtering (CF) on recommending high-generality and low-generality software components, we experimentally verified two hypotheses: (1) the recommendation accuracy of CF for high-generality components is better than that of conventional methods (a random algorithm and a user-average algorithm), and (2) the recommendation accuracy of CF for low-generality components is better than that of the conventional methods. We evaluated the recommendation accuracy of CF on a dataset of 29 open source software development projects (2,558 used components in total). As a result, hypothesis (2) was supported: CF outperformed the conventional methods for low-generality components, improving the median NDPM from 0.55 to 0.33 (a minimal CF sketch appears after this list)..
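A few of the entries above describe concrete, computable techniques; the short Python sketches that follow illustrate them. Each sketch is a hypothetical illustration on invented data and names, not the authors' implementation. First, entry 115: assigning bugs by developer aptitude (Preference) under a per-developer workload cap (Limit), pictured here as a greedy assignment.

```python
# Greedy, capacity-limited bug triage in the spirit of entry 115.
# All developers, scores, and hours below are invented for illustration.

def triage(bugs, preference, limit_hours):
    """Assign each bug to the most apt developer who still has capacity.
    preference[d][component]: developer d's aptitude for a component.
    limit_hours[d]: cap on hours d can take on in one period."""
    load = {d: 0.0 for d in limit_hours}
    assignment = {}
    for bug_id, component, est_hours in bugs:
        ranked = sorted(preference,
                        key=lambda d: preference[d].get(component, 0.0),
                        reverse=True)
        for d in ranked:
            if load[d] + est_hours <= limit_hours[d]:  # respect the cap
                assignment[bug_id] = d
                load[d] += est_hours
                break
    return assignment

bugs = [(1, "ui", 3.0), (2, "core", 5.0), (3, "ui", 4.0)]
preference = {"alice": {"ui": 0.9, "core": 0.4},
              "bob": {"ui": 0.5, "core": 0.8}}
limit_hours = {"alice": 6.0, "bob": 10.0}
print(triage(bugs, preference, limit_hours))
# {1: 'alice', 2: 'bob', 3: 'bob'} -- bug 3 spills over because alice is full
```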
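Entry 118 builds a committer-identification model from developers' early activity. A minimal stand-in is any classifier over activity counts; here, logistic regression on synthetic numbers (the three features are invented; scikit-learn is assumed available).

```python
# Toy committer-identification model (cf. entry 118): predict from early
# activity whether a developer will later be promoted. Data is synthetic.
from sklearn.linear_model import LogisticRegression

# Features per developer: [patches_submitted, patches_reviewed, mails_sent]
X = [[40, 12, 30], [3, 0, 5], [25, 8, 14], [1, 1, 2], [33, 20, 40], [2, 0, 1]]
y = [1, 0, 1, 0, 1, 0]  # 1 = later became a committer

model = LogisticRegression().fit(X, y)
print(model.predict([[28, 10, 20]]))      # -> [1], likely future committer
print(model.predict_proba([[2, 0, 3]]))   # low promotion probability
```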
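For entries 123-124, the analysis step is per-subject Z-score normalization of NIRS readings followed by a t-test between the easy and hard comprehension conditions. A sketch with fabricated readings (the paper's actual signal processing is more involved):

```python
# Per-subject Z-score normalization, then a paired t-test (cf. entries
# 123-124). All readings below are fabricated.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)

def subject_condition_means():
    """One subject: several readings per condition, normalized with the
    subject's own mean/std (Z-score) so subjects are comparable."""
    easy = rng.normal(0.50, 0.05, size=20)
    hard = rng.normal(0.56, 0.05, size=20)   # higher flow on the hard task
    both = np.concatenate([easy, hard])
    z = (both - both.mean()) / both.std(ddof=1)
    return z[:20].mean(), z[20:].mean()

means = np.array([subject_condition_means() for _ in range(10)])
t, p = ttest_rel(means[:, 1], means[:, 0])
print(f"t={t:.2f}, p={p:.4f}")  # expect higher mean activity for hard task
```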
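For entry 133, one of the ensemble ideas ("voting experts") can be sketched as follows: train one JIT defect model per historical project and let the models vote on a change from a project with no history. Everything below, including the defect-inducing rule, is synthetic.

```python
# "Voting experts" flavor of JIT cross-project prediction (cf. entry 133).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def synthetic_project(n=200):
    X = rng.uniform(0, 100, size=(n, 3))        # e.g., added, deleted, files
    y = (X[:, 0] + X[:, 1] > 110).astype(int)   # toy defect-inducing rule
    return X, y

experts = [LogisticRegression(max_iter=1000).fit(*synthetic_project())
           for _ in range(3)]

def vote(change):
    votes = [m.predict([change])[0] for m in experts]
    return int(sum(votes) >= 2)   # majority across project experts

print(vote([80.0, 60.0, 2.0]), vote([5.0, 10.0, 1.0]))  # expect 1, 0
```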
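For entry 134, magnetism and stickiness are migration-style proportions over contributor sets; the quadrant thresholds and contributor sets below are invented for illustration.

```python
# Magnet/sticky computation in the spirit of entry 134. Magnetism ~ share of
# all first-time contributors a project attracts in a period; stickiness ~
# share of its previous contributors it retains.

def magnet_sticky(prev, curr, new_everywhere):
    stickiness = len(prev & curr) / len(prev)
    magnetism = len(curr & new_everywhere) / len(new_everywhere)
    return magnetism, stickiness

prev = {"a", "b", "c", "d"}                  # contributors last period
curr = {"a", "b", "e", "f"}                  # contributors this period
new_everywhere = {"e", "f", "g", "h", "i"}   # first-timers anywhere
m, s = magnet_sticky(prev, curr, new_everywhere)
quadrant = ("attractive" if m >= 0.3 and s >= 0.5 else
            "stagnant" if s >= 0.5 else
            "fluctuating" if m >= 0.3 else "terminal")
print(round(m, 2), round(s, 2), quadrant)    # 0.4 0.5 attractive
```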
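For entry 136, the two studied review properties reduce to simple proportions over change records. "Participation" is approximated here by reviewer comments per reviewed change, which is only one possible operationalization; the records are invented.

```python
# Review coverage and participation as simple metrics (cf. entry 136).
changes = [
    {"reviewed": True,  "reviewer_comments": 4},
    {"reviewed": True,  "reviewer_comments": 0},   # rubber-stamped
    {"reviewed": False, "reviewer_comments": 0},
    {"reviewed": True,  "reviewer_comments": 2},
]

coverage = sum(c["reviewed"] for c in changes) / len(changes)
reviewed = [c for c in changes if c["reviewed"]]
participation = sum(c["reviewer_comments"] for c in reviewed) / len(reviewed)
print(f"coverage={coverage:.2f}, participation={participation:.1f}")
# coverage=0.75, participation=2.0
```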
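Entry 141's question, in miniature, is which proxy tracks recorded fix effort better: LOC alone or a combined size/complexity/churn proxy. A sketch using Spearman rank correlation; all numbers and the combined proxy's weights are arbitrary inventions.

```python
# Comparing effort proxies by rank correlation (cf. entry 141).
from scipy.stats import spearmanr

effort_hours = [2, 10, 3, 25, 8, 14]           # recorded effort per file
loc          = [120, 800, 400, 600, 700, 900]
complexity   = [4, 30, 5, 60, 18, 35]
churn        = [10, 300, 20, 700, 150, 420]

combined = [l + 10 * c + ch for l, c, ch in zip(loc, complexity, churn)]
rho_loc, _ = spearmanr(loc, effort_hours)
rho_comb, _ = spearmanr(combined, effort_hours)
print(f"LOC vs effort:      rho={rho_loc:.2f}")    # ~0.66 on this toy data
print(f"combined vs effort: rho={rho_comb:.2f}")   # 1.00 on this toy data
```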
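For entry 143, re-opened-bug prediction with decision trees can be sketched as below; the four features loosely mirror the paper's work-habit, report, fix, and team dimensions, but the encoding and data are invented.

```python
# Toy re-opened-bug predictor (cf. entry 143): a shallow decision tree.
from sklearn.tree import DecisionTreeClassifier

# [closed_on_weekend, num_comments, fix_time_days, fixer_experience_years]
X = [[0, 2, 1, 5], [1, 14, 30, 1], [0, 3, 2, 4],
     [1, 20, 45, 1], [0, 1, 1, 6], [1, 9, 20, 2]]
y = [0, 1, 0, 1, 0, 1]   # 1 = bug was later re-opened

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# Inspecting the root split is a crude stand-in for "top node analysis".
print(tree.predict([[1, 12, 25, 2]]))   # -> [1], likely re-opened
```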
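Entry 146 finds a model tree (M5) best among eleven fault-density models. scikit-learn ships no M5 implementation, so this sketch substitutes a plain regression tree just to show the fault-density prediction setup; modules and densities are synthetic.

```python
# Fault-density prediction with a regression tree as an M5 stand-in
# (cf. entry 146). Data is synthetic.
from sklearn.tree import DecisionTreeRegressor

# Features per module: [LOC, cyclomatic complexity, past changes]
X = [[200, 5, 2], [1500, 40, 25], [300, 8, 4],
     [2200, 60, 30], [120, 3, 1], [900, 25, 12]]
fault_density = [0.5, 4.0, 0.8, 5.5, 0.2, 2.5]   # faults per KLOC

model = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, fault_density)
print(model.predict([[1000, 30, 15]]))   # predicted faults/KLOC
```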
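For entry 189, the TEAR model's core trade-off (how a predictor's accuracy changes the best effort allocation) can be illustrated with a much-simplified simulation; the exponential detection curve, the accuracy value, and all other parameters are invented.

```python
# A much-simplified take on entry 189's trade-off: given a fixed test
# budget, does trusting a fault-prone predictor beat uniform allocation?
import numpy as np

rng = np.random.default_rng(2)
n_modules, budget = 100, 200.0
faulty = rng.random(n_modules) < 0.2          # 20% of modules contain a fault

def discovered(effort):
    p_find = 1 - np.exp(-0.5 * effort)        # more effort, higher detection
    return int(np.sum(faulty & (rng.random(n_modules) < p_find)))

def predictor(accuracy):
    flip = rng.random(n_modules) > accuracy   # mislabel some modules
    return np.where(flip, ~faulty, faulty)

uniform = np.full(n_modules, budget / n_modules)
pred = predictor(accuracy=0.8)
focused = np.where(pred, budget / max(pred.sum(), 1), 0.0)

print("uniform :", discovered(uniform))   # spread thin over all modules
print("focused :", discovered(focused))   # concentrated on predicted ones
```

Lowering `accuracy` toward chance makes the focused strategy lose its edge, which mirrors the entry's conclusion that prediction results should not drive allocation when accuracy is low.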
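Finally, for entry 191, user-based collaborative filtering over a project-by-component usage matrix. The matrix is invented, and a real evaluation would score whole rankings with NDPM rather than take a single argmax.

```python
# User-based collaborative filtering for component recommendation
# (cf. entry 191). The usage matrix is invented.
import numpy as np

usage = np.array([[1, 1, 0, 1, 0],    # rows: past projects
                  [1, 0, 1, 1, 0],    # cols: components (1 = used)
                  [0, 1, 0, 1, 1],
                  [1, 1, 1, 0, 0]], dtype=float)
target = np.array([1, 1, 0, 0, 0], dtype=float)   # the new project so far

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

sims = np.array([cosine(row, target) for row in usage])
scores = sims @ usage                  # similarity-weighted usage
scores[target == 1] = -np.inf          # don't re-recommend known components
print("recommend component index:", int(np.argmax(scores)))   # -> 3
```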
