maven dataset

An example dataset of Breiman's variable importance scores

A dataset of Breiman's variable importance scores for software metrics. Each of the 1,000 rows holds one calculation of the scores, with one column per metric.

Format

A data frame with 1,000 rows and 27 variables:

  • Avg_CloneLineCount: Average number of physical lines in the clone siblings of a clone.
  • Avg_CountLineComment: Average number of comment lines in the methods that contain clone siblings of a clone.
  • Avg_Cyclomatic: McCabe Cyclomatic complexity of the method that contains the clone.
  • Avg_ImproveCommitCount: Number of commits that impact the method containing the clone.
  • Avg_LineAdded: Number of lines added into the method that contains the clone.
  • Avg_LineCodeCount: Number of source code lines in the method that contains the clone.
  • Avg_MaxNesting: Maximum nesting level of control constructs in the method that contains the clone.
  • Avg_NewFeatureCommitCount: Number of commits that introduce a new feature and that impact the method containing the clone.
  • Avg_RatioCommentToCode: Ratio of CommentLineCount to LineCodeCount.
  • Avg_RatioLineCodeCount: Ratio of LineCount to CloneLineCount.
  • Avg_TokenCount: Number of tokens in the clone.
  • CloneType: Type of clone class to which the clone belongs.
  • Diff_CloneLineCount: Number of physical lines in the clone.
  • Diff_CountLineComment: Number of comment lines in the method that contains the clone.
  • Diff_Cyclomatic: McCabe Cyclomatic complexity of the method that contains the clone.
  • Diff_DeveloperCount: Number of distinct developers who modified the method that contains the clone.
  • Diff_Essential: Numerical measure of the structuredness of the method that contains the clone.
  • Diff_FanIn: Number of unique methods that call the method containing the clone.
  • Diff_FanOut: Number of unique methods that are called by the method containing the clone.
  • Diff_FixCommitCount: Number of commits with a description of fixing bugs and that impact the method containing the clone.
  • Diff_LineCodeDeclCount: Number of declarative source code lines in the method that contains the clone.
  • Diff_LineCount: Number of lines in the method that contains the clone.
  • Diff_LineDeleted: Number of lines deleted from the method that contains the clone.
  • Diff_NewFeatureCommitCount: Number of commits that introduce a new feature and that impact the method containing the clone.
  • Diff_TokenCount: Number of tokens in the clone.
  • Max_DirectoryDistance: Number of directories that are traversed from the method containing one sibling to the method containing another sibling of the clone.
  • SiblingCount: Number of clone siblings in the clone.
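
Because every row is one calculation of the importance scores and every column is one metric, a simple first look at the data is to average each column and order the metrics by that average. The sketch below illustrates the idea in Python under several assumptions: the data frame has been exported to CSV text with one column per metric, the three-row excerpt and all of its values are invented for illustration (they are not taken from the real dataset), and the helper names mean_importance and rank_metrics are hypothetical. It produces a plain ranking by mean, not the Scott-Knott ESD grouping that the source package provides.

```python
import csv
import io
from statistics import mean

# Hypothetical three-row excerpt using three of the documented column names.
# The values are invented for illustration and are NOT real maven data.
SAMPLE_CSV = """Avg_Cyclomatic,Diff_FanIn,SiblingCount
0.10,0.40,0.25
0.20,0.50,0.35
0.30,0.60,0.30
"""


def mean_importance(csv_text):
    """Return {metric: mean score} for CSV text with one column per metric."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {name: mean(float(row[name]) for row in rows) for name in rows[0]}


def rank_metrics(means):
    """Order metric names from highest to lowest mean importance score."""
    return sorted(means, key=means.get, reverse=True)


means = mean_importance(SAMPLE_CSV)
ranking = rank_metrics(means)
print(ranking)
```

A ranking by mean ignores the spread of the scores across rows, so two metrics with close means may not differ in any meaningful way.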

Source

https://github.com/klainfo/ScottKnottESD/

maven
  • Maintainer: Chakkrit Tantithamthavorn
  • License: GPL (>= 2)
  • Last published: 2018-05-08

About the dataset

  • Number of rows: 1,000
  • Number of columns: 27
  • Class: data.frame

Column names and types (First 10)

  • Avg_CloneLineCount: numeric
  • Avg_CountLineComment: numeric
  • Avg_Cyclomatic: numeric
  • Avg_ImproveCommitCount: numeric
  • Avg_LineAdded: numeric
  • Avg_LineCodeCount: numeric
  • Avg_MaxNesting: numeric
  • Avg_NewFeatureCommitCount: numeric
  • Avg_RatioCommentToCode: numeric
  • Avg_RatioLineCodeCount: numeric