ge_cluster function

Cluster genotypes or environments

Performs clustering for genotypes or tester environments based on a dissimilarity matrix.

ge_cluster( .data, env = NULL, gen = NULL, resp = NULL, table = FALSE, distmethod = "euclidean", clustmethod = "ward.D", scale = TRUE, cluster = "env", nclust = NULL )

Arguments

  • .data: The dataset containing the columns for environments, genotypes, and the response variable(s). A two-way table with genotypes in the rows and environments in the columns can also be used as input. In that case, set table = TRUE.
  • env: The name of the column that contains the levels of the environments. Defaults to NULL, for the case where the input data is a two-way table.
  • gen: The name of the column that contains the levels of the genotypes. Defaults to NULL, for the case where the input data is a two-way table.
  • resp: The response variable(s). Defaults to NULL, for the case where the input data is a two-way table.
  • table: Logical value indicating whether the input data is a two-way table with genotypes in the rows and environments in the columns. Defaults to FALSE.
  • distmethod: The distance measure to be used. This must be one of 'euclidean', 'maximum', 'manhattan', 'canberra', 'binary', or 'minkowski'.
  • clustmethod: The agglomeration method to be used. This should be one of 'ward.D' (Default), 'ward.D2', 'single', 'complete', 'average' (= UPGMA), 'mcquitty' (= WPGMA), 'median' (= WPGMC) or 'centroid' (= UPGMC).
  • scale: Should the data be scaled before computing the distances? Defaults to TRUE. Let $Y_{ij}$ be the yield of Hybrid $i$ in Location $j$, $\bar{Y}_{.j}$ be the mean yield of Location $j$, and $S_j$ be the standard deviation of Location $j$. The standardized yield ($Z_{ij}$) is computed as (Ouyang et al. 1995): $Z_{ij} = (Y_{ij} - \bar{Y}_{.j}) / S_j$.
  • cluster: What should be clustered? Defaults to cluster = "env" (cluster environments). To cluster the genotypes use cluster = "gen".
  • nclust: The number of clusters to be formed. Defaults to NULL.
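
To illustrate how scale, cluster, distmethod, clustmethod and nclust relate to each other, the following is a minimal sketch in base R. It uses an invented two-way table (the object tab and its values are assumptions made for illustration) and the stats functions scale(), dist(), hclust() and cutree(). It mirrors the steps described by the arguments above and is not necessarily how ge_cluster() is implemented internally.

```r
# Toy two-way table: 4 genotypes (rows) x 3 environments (columns).
# The values are invented for illustration only.
tab <- matrix(
  c(2.1, 3.0, 2.6,
    2.4, 3.4, 2.9,
    1.8, 2.7, 3.1,
    2.9, 3.9, 2.2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("G", 1:4), paste0("E", 1:3))
)

# scale = TRUE: standardize within each environment (column),
# Z_ij = (Y_ij - mean_j) / sd_j
z <- scale(tab, center = TRUE, scale = TRUE)

# cluster = "env": environments become the rows to be clustered.
# For cluster = "gen" the table would be used without transposing.
x <- t(z)

# distmethod = "euclidean", clustmethod = "ward.D"
d  <- dist(x, method = "euclidean")
hc <- hclust(d, method = "ward.D")

# nclust = 2: cut the dendrogram into two groups of environments
groups <- cutree(hc, k = 2)
print(groups)
```

After standardization every environment has mean 0 and standard deviation 1, so environments with large yield variances do not dominate the distances.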

Returns

  • data The data that was used to compute the distances.
  • cutpoint The cutpoint of the dendrogram according to Mojena (1977).
  • distance The matrix with the distances.
  • de The distances in an object of class dist.
  • hc The hierarchical clustering.
  • cophenetic The cophenetic correlation coefficient between the distance matrix and the cophenetic matrix.
  • Sqt The total sum of squares.
  • tab A table with the clusters and similarity.
  • clusters The sum of squares and the mean of the clusters for each genotype (if cluster = "env") or environment (if cluster = "gen").
  • labclust The labels of genotypes/environments within each cluster.
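
The cophenetic and cutpoint elements can be approximated with base R as a rough sketch. The example below uses the built-in USArrests data only to be self-contained, and it assumes the common form of Mojena's stopping rule, mean height plus k standard deviations of the fusion heights, with k = 1.25. Both the data set and the value of k are assumptions made here, not details confirmed by this page.

```r
# Built-in data, used only so that the sketch is self-contained.
x  <- scale(USArrests)
d  <- dist(x, method = "euclidean")
hc <- hclust(d, method = "ward.D")

# Cophenetic correlation: agreement between the original distances
# and the distances implied by the dendrogram.
coph <- cor(as.vector(d), as.vector(cophenetic(hc)))

# Mojena-type cutpoint on the fusion heights.
# k = 1.25 is an assumed constant, not taken from this page.
k <- 1.25
cutpoint <- mean(hc$height) + k * sd(hc$height)

# Each merge above the cutpoint adds one cluster to the final partition.
n_groups <- sum(hc$height > cutpoint) + 1

print(coph)
print(cutpoint)
print(n_groups)
```

A cophenetic correlation close to 1 suggests that the dendrogram preserves the pairwise distances well. The cutpoint gives a data-driven number of clusters when nclust is left as NULL.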

Examples

library(metan)
d1 <- ge_cluster(data_ge, ENV, GEN, GY, nclust = 3)
plot(d1, nclust = 3)

References

Mojena, R. 1977. Hierarchical grouping methods and stopping rules: an evaluation. Comput. J. 20:359-363. doi:10.1093/comjnl/20.4.359

Ouyang, Z., R.P. Mowers, A. Jensen, S. Wang, and S. Zheng. 1995. Cluster analysis for genotype x environment interaction with unbalanced data. Crop Sci. 35:1300-1305. doi:10.2135/cropsci1995.0011183X003500050008x

Author(s)

Tiago Olivoto tiagoolivoto@gmail.com