coupling_similarity function

Calculating the Coupling Similarity Measure for Edges

This function calculates a refined similarity measure of coupling links from a direct citation data frame. It is inspired by Shen et al. (2019). To a certain extent, it combines the coupling_strength() function with the cosine measure of the biblio_coupling() function.

coupling_similarity( dt, source, ref, weight_threshold = 1, output_in_character = TRUE )

Arguments

  • dt: The table with citing and cited documents.
  • source: The column name of the source identifiers, that is the documents that are citing. In bibliographic coupling, these documents are the nodes of the network.
  • ref: The column name of the references that are cited.
  • weight_threshold: The minimum non-normalized weight of an edge, that is, the number of references two documents share. The function keeps only the edges whose non-normalized weight is at least weight_threshold. For example, if you set the parameter to 2, the function keeps only the edges between nodes that share at least two references. In a large bibliographic coupling network, you may consider that sharing a single reference is not sufficient for two articles to be linked together. Raising this parameter also helps to avoid creating intractable networks with too many edges.
  • output_in_character: If TRUE, the function ends by converting the from and to columns to character, to make the creation of a tidygraph network easier.
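The effect of weight_threshold can be illustrated outside the package. The following is a minimal base R sketch on hypothetical toy data, not the package's implementation; the column names, the toy table, and the reading of the threshold as "at least this many shared references" (taken from the example in the argument description) are assumptions.

```r
# Hypothetical toy citation table: one row per (citing document, cited reference).
dt <- data.frame(
  source = c("A", "A", "A", "B", "B", "B", "C", "C"),
  ref    = c("r1", "r2", "r3", "r1", "r2", "r4", "r1", "r5"),
  stringsAsFactors = FALSE
)
weight_threshold <- 2

# Self-join on the cited reference to find every pair of documents sharing it.
pairs <- merge(dt, dt, by = "ref", suffixes = c("_from", "_to"))
# Keep each unordered pair once and drop self-pairs.
pairs <- pairs[pairs$source_from < pairs$source_to, ]

# Non-normalized weight: number of references shared by each pair.
edges <- aggregate(ref ~ source_from + source_to, data = pairs, FUN = length)
names(edges) <- c("from", "to", "shared_refs")

# Assumed reading of the threshold: keep pairs with at least that many shared references.
kept <- edges[edges$shared_refs >= weight_threshold, ]
print(kept)
```

In this toy table only the pair A and B shares two references, so it is the only edge that survives a threshold of 2.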

Returns

A data.table with the article identifiers in the from and to columns, and the similarity measure in another column. It also keeps a copy of from and to in the Source and Target columns. This is useful if you then use the tidygraph package, where the from and to values are modified when creating a graph.

Details

The function uses the following formalisation:

\frac{R_{S}(A) \bullet R_{S}(B)}{\sqrt{R_{S}(A) \cdot R_{S}(B)}}

  1. with

R_{S}(A) \bullet R_{S}(B) = \sum_{j}\sqrt{\log\left(\frac{N}{freq(R_{j})}\right)}

that is a measure similar to the coupling strength measure;

  2. and

R_{S}(A) \cdot R_{S}(B) = \sum_{j}\sqrt{\log\left(\frac{N}{freq(R_{j}(A))}\right)} \cdot \sum_{j}\sqrt{\log\left(\frac{N}{freq(R_{j}(B))}\right)}

which is the product of the separate sums, for each article, of the normalized value of its citations. It is the cosine measure of documents A and B, adapted to the spirit of the coupling strength.
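To make the formalisation concrete, here is a minimal base R sketch that evaluates it by hand on a hypothetical toy table. It is an illustration under stated assumptions, not the code used by coupling_similarity(): N is assumed to be the number of distinct citing documents, freq(R_j) the number of documents citing reference j, the sum in the numerator is assumed to run over the references shared by A and B, and each sum in the denominator over all references of one document.

```r
# Hypothetical toy citation table: one row per (citing document, cited reference).
dt <- data.frame(
  source = c("A", "A", "A", "B", "B", "B", "C", "C", "D"),
  ref    = c("r1", "r2", "r3", "r1", "r2", "r4", "r1", "r5", "r6"),
  stringsAsFactors = FALSE
)

# Assumption: N = number of citing documents, freq = citations received per reference.
N    <- length(unique(dt$source))
freq <- table(dt$ref)

# Weight of one citation: sqrt(log(N / freq(R_j))).
dt$w <- sqrt(log(N / as.numeric(freq[dt$ref])))

# Denominator terms: sum of citation weights over each document's bibliography.
doc_sum <- tapply(dt$w, dt$source, sum)

# Numerator: sum of weights over the references shared by each pair of documents.
pairs <- merge(dt, dt, by = "ref", suffixes = c("_from", "_to"))
pairs <- pairs[pairs$source_from < pairs$source_to, ]
num <- aggregate(w_from ~ source_from + source_to, data = pairs, FUN = sum)
names(num) <- c("from", "to", "numerator")

# Similarity: numerator divided by the square root of the product of the two sums.
num$similarity <- num$numerator /
  sqrt(as.numeric(doc_sum[num$from]) * as.numeric(doc_sum[num$to]))
print(num)
```

Under these assumptions a reference cited by every document gets a weight of zero, so widely shared references contribute little to the similarity, while rarely cited shared references contribute the most.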

Examples

library(biblionetwork)
coupling_similarity(Ref_stagflation, source = "Citing_ItemID_Ref", ref = "ItemID_Ref")

References

Shen S, Zhu D, Rousseau R, Su X, Wang D (2019). "A Refined Method for Computing Bibliographic Coupling Strengths." Journal of Informetrics, 13(2), 605-615.