glue function

AWS Glue

Glue

Defines the public endpoint for the Glue service.

glue(config = list(), credentials = list(), endpoint = NULL, region = NULL)

Arguments

  • config: Optional configuration of credentials, endpoint, and/or region.

    • credentials :

      • creds :

        • access_key_id : AWS access key ID
        • secret_access_key : AWS secret access key
        • session_token : AWS temporary session token
      • profile : The name of a profile to use. If not given, then the default profile is used.

      • anonymous : Set anonymous credentials.

    • endpoint : The complete URL to use for the constructed client.

    • region : The AWS Region used in instantiating the client.

    • close_connection : Immediately close all HTTP connections.

    • timeout : The time in seconds to wait when attempting to make a connection before a timeout exception is thrown. The default is 60 seconds.

    • s3_force_path_style : Set this to TRUE to force the request to use path-style addressing, e.g. http://s3.amazonaws.com/BUCKET/KEY.

    • sts_regional_endpoint : Set the STS regional endpoint resolver to "regional" or "legacy". See https://docs.aws.amazon.com/sdkref/latest/guide/feature-sts-regionalized-endpoints.html

  • credentials: Optional credentials shorthand for the config parameter

    • creds :

      • access_key_id : AWS access key ID
      • secret_access_key : AWS secret access key
      • session_token : AWS temporary session token
    • profile : The name of a profile to use. If not given, then the default profile is used.

    • anonymous : Set anonymous credentials.

  • endpoint: Optional shorthand for complete URL to use for the constructed client.

  • region: Optional shorthand for AWS Region used in instantiating the client.
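As a rough illustration of the arguments above, the sketch below passes the same settings once through the shorthand arguments and once through the nested config list. It assumes the constructor is exported by the paws.analytics package, which this page does not state; the profile name and region are placeholders.

```r
# Settings expressed in the nested `config` form. `timeout` has no shorthand,
# so it can only be set here.
cfg <- list(
  credentials = list(profile = "my-profile"),  # hypothetical profile name
  region = "us-east-1",                        # example region
  timeout = 120
)

# Build the clients only when the package is installed (assumed to be
# paws.analytics; adjust if glue() comes from elsewhere in your setup).
if (requireNamespace("paws.analytics", quietly = TRUE)) {
  # Shorthand form: credentials and region passed at the top level.
  svc_short <- paws.analytics::glue(
    credentials = list(profile = "my-profile"),
    region = "us-east-1"
  )
  # Nested form: everything passed inside `config`.
  svc_full <- paws.analytics::glue(config = cfg)
}
```

The shorthand arguments cover only credentials, endpoint, and region; options such as timeout or close_connection have to go through config.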

Returns

A client for the service. You can call the service's operations using syntax like svc$operation(...), where svc is the name you've assigned to the client. The available operations are listed in the Operations section.
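The svc$operation(...) pattern can be sketched with get_databases. The helper below is hypothetical and assumes the response carries DatabaseList and NextToken fields, as in the Glue GetDatabases API; a small stand-in object is used in place of a real client so the loop can be followed without AWS access.

```r
# Hypothetical helper: collect database names across all result pages.
# `svc` is any object exposing get_databases(), e.g. a client from glue().
list_database_names <- function(svc) {
  names_out <- character(0)
  token <- NULL
  repeat {
    page <- svc$get_databases(NextToken = token)
    page_names <- vapply(page$DatabaseList, function(db) db$Name, character(1))
    names_out <- c(names_out, page_names)
    token <- page$NextToken
    # Stop when the service returns no continuation token.
    if (length(token) == 0 || !nzchar(token)) break
  }
  names_out
}

# Stand-in for a real client, returning two pages of made-up data.
fake_svc <- list(
  get_databases = function(NextToken = NULL) {
    if (is.null(NextToken)) {
      list(DatabaseList = list(list(Name = "sales")), NextToken = "page2")
    } else {
      list(DatabaseList = list(list(Name = "logs")), NextToken = character(0))
    }
  }
)

db_names <- list_database_names(fake_svc)
```

With a real client, the same helper would be called as list_database_names(glue()).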

Service syntax

svc <- glue(
  config = list(
    credentials = list(
      creds = list(
        access_key_id = "string",
        secret_access_key = "string",
        session_token = "string"
      ),
      profile = "string",
      anonymous = "logical"
    ),
    endpoint = "string",
    region = "string",
    close_connection = "logical",
    timeout = "numeric",
    s3_force_path_style = "logical",
    sts_regional_endpoint = "string"
  ),
  credentials = list(
    creds = list(
      access_key_id = "string",
      secret_access_key = "string",
      session_token = "string"
    ),
    profile = "string",
    anonymous = "logical"
  ),
  endpoint = "string",
  region = "string"
)
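In the template above, "string", "logical", and "numeric" name the expected R type of each field rather than literal values. The sketch below shows one way to check a config list against those types before passing it to glue(); the helper mistyped_fields is hypothetical and not part of the package.

```r
# Type predicate for each top-level config field, following the template's
# "string" / "logical" / "numeric" placeholders.
config_checks <- list(
  endpoint = is.character,
  region = is.character,
  close_connection = is.logical,
  timeout = is.numeric,
  s3_force_path_style = is.logical,
  sts_regional_endpoint = is.character
)

# Hypothetical helper: return the names of fields whose value has the wrong type.
mistyped_fields <- function(cfg) {
  fields <- intersect(names(cfg), names(config_checks))
  ok <- vapply(fields, function(f) config_checks[[f]](cfg[[f]]), logical(1))
  fields[!ok]
}

# Here timeout is given as a string instead of a number.
cfg <- list(region = "us-east-1", timeout = "60", close_connection = TRUE)
bad <- mistyped_fields(cfg)
```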

Operations

  • batch_create_partition: Creates one or more partitions in a batch operation
  • batch_delete_connection: Deletes a list of connection definitions from the Data Catalog
  • batch_delete_partition: Deletes one or more partitions in a batch operation
  • batch_delete_table: Deletes multiple tables at once
  • batch_delete_table_version: Deletes a specified batch of versions of a table
  • batch_get_blueprints: Retrieves information about a list of blueprints
  • batch_get_crawlers: Returns a list of resource metadata for a given list of crawler names
  • batch_get_custom_entity_types: Retrieves the details for the custom patterns specified by a list of names
  • batch_get_data_quality_result: Retrieves a list of data quality results for the specified result IDs
  • batch_get_dev_endpoints: Returns a list of resource metadata for a given list of development endpoint names
  • batch_get_jobs: Returns a list of resource metadata for a given list of job names
  • batch_get_partition: Retrieves partitions in a batch request
  • batch_get_table_optimizer: Returns the configuration for the specified table optimizers
  • batch_get_triggers: Returns a list of resource metadata for a given list of trigger names
  • batch_get_workflows: Returns a list of resource metadata for a given list of workflow names
  • batch_put_data_quality_statistic_annotation: Annotate datapoints over time for a specific data quality statistic
  • batch_stop_job_run: Stops one or more job runs for a specified job definition
  • batch_update_partition: Updates one or more partitions in a batch operation
  • cancel_data_quality_rule_recommendation_run: Cancels the specified recommendation run that was being used to generate rules
  • cancel_data_quality_ruleset_evaluation_run: Cancels a run where a ruleset is being evaluated against a data source
  • cancel_ml_task_run: Cancels (stops) a task run
  • cancel_statement: Cancels the statement
  • check_schema_version_validity: Validates the supplied schema
  • create_blueprint: Registers a blueprint with Glue
  • create_catalog: Creates a new catalog in the Glue Data Catalog
  • create_classifier: Creates a classifier in the user's account
  • create_column_statistics_task_settings: Creates settings for a column statistics task
  • create_connection: Creates a connection definition in the Data Catalog
  • create_crawler: Creates a new crawler with specified targets, role, configuration, and optional schedule
  • create_custom_entity_type: Creates a custom pattern that is used to detect sensitive data across the columns and rows of your structured data
  • create_database: Creates a new database in a Data Catalog
  • create_data_quality_ruleset: Creates a data quality ruleset with DQDL rules applied to a specified Glue table
  • create_dev_endpoint: Creates a new development endpoint
  • create_integration: Creates a Zero-ETL integration in the caller's account between two resources with Amazon Resource Names (ARNs): the SourceArn and TargetArn
  • create_integration_resource_property: This API can be used for setting up the ResourceProperty of the Glue connection (for the source) or Glue database ARN (for the target)
  • create_integration_table_properties: This API is used to provide optional override properties for the tables that need to be replicated
  • create_job: Creates a new job definition
  • create_ml_transform: Creates a Glue machine learning transform
  • create_partition: Creates a new partition
  • create_partition_index: Creates a specified partition index in an existing table
  • create_registry: Creates a new registry which may be used to hold a collection of schemas
  • create_schema: Creates a new schema set and registers the schema definition
  • create_script: Transforms a directed acyclic graph (DAG) into code
  • create_security_configuration: Creates a new security configuration
  • create_session: Creates a new session
  • create_table: Creates a new table definition in the Data Catalog
  • create_table_optimizer: Creates a new table optimizer for a specific function
  • create_trigger: Creates a new trigger
  • create_usage_profile: Creates a Glue usage profile
  • create_user_defined_function: Creates a new function definition in the Data Catalog
  • create_workflow: Creates a new workflow
  • delete_blueprint: Deletes an existing blueprint
  • delete_catalog: Removes the specified catalog from the Glue Data Catalog
  • delete_classifier: Removes a classifier from the Data Catalog
  • delete_column_statistics_for_partition: Delete the partition column statistics of a column
  • delete_column_statistics_for_table: Deletes table statistics of columns
  • delete_column_statistics_task_settings: Deletes settings for a column statistics task
  • delete_connection: Deletes a connection from the Data Catalog
  • delete_crawler: Removes a specified crawler from the Glue Data Catalog, unless the crawler state is RUNNING
  • delete_custom_entity_type: Deletes a custom pattern by specifying its name
  • delete_database: Removes a specified database from a Data Catalog
  • delete_data_quality_ruleset: Deletes a data quality ruleset
  • delete_dev_endpoint: Deletes a specified development endpoint
  • delete_integration: Deletes the specified Zero-ETL integration
  • delete_integration_table_properties: Deletes the table properties that have been created for the tables that need to be replicated
  • delete_job: Deletes a specified job definition
  • delete_ml_transform: Deletes a Glue machine learning transform
  • delete_partition: Deletes a specified partition
  • delete_partition_index: Deletes a specified partition index from an existing table
  • delete_registry: Delete the entire registry including schema and all of its versions
  • delete_resource_policy: Deletes a specified policy
  • delete_schema: Deletes the entire schema set, including the schema set and all of its versions
  • delete_schema_versions: Remove versions from the specified schema
  • delete_security_configuration: Deletes a specified security configuration
  • delete_session: Deletes the session
  • delete_table: Removes a table definition from the Data Catalog
  • delete_table_optimizer: Deletes an optimizer and all associated metadata for a table
  • delete_table_version: Deletes a specified version of a table
  • delete_trigger: Deletes a specified trigger
  • delete_usage_profile: Deletes the specified Glue usage profile
  • delete_user_defined_function: Deletes an existing function definition from the Data Catalog
  • delete_workflow: Deletes a workflow
  • describe_connection_type: The DescribeConnectionType API provides full details of the supported options for a given connection type in Glue
  • describe_entity: Provides details regarding the entity used with the connection type, with a description of the data model for each field in the selected entity
  • describe_inbound_integrations: Returns a list of inbound integrations for the specified integration
  • describe_integrations: The API is used to retrieve a list of integrations
  • get_blueprint: Retrieves the details of a blueprint
  • get_blueprint_run: Retrieves the details of a blueprint run
  • get_blueprint_runs: Retrieves the details of blueprint runs for a specified blueprint
  • get_catalog: Retrieves the specified catalog
  • get_catalog_import_status: Retrieves the status of a migration operation
  • get_catalogs: Retrieves all catalogs defined in a catalog in the Glue Data Catalog
  • get_classifier: Retrieve a classifier by name
  • get_classifiers: Lists all classifier objects in the Data Catalog
  • get_column_statistics_for_partition: Retrieves partition statistics of columns
  • get_column_statistics_for_table: Retrieves table statistics of columns
  • get_column_statistics_task_run: Get the associated metadata/information for a task run, given a task run ID
  • get_column_statistics_task_runs: Retrieves information about all runs associated with the specified table
  • get_column_statistics_task_settings: Gets settings for a column statistics task
  • get_connection: Retrieves a connection definition from the Data Catalog
  • get_connections: Retrieves a list of connection definitions from the Data Catalog
  • get_crawler: Retrieves metadata for a specified crawler
  • get_crawler_metrics: Retrieves metrics about specified crawlers
  • get_crawlers: Retrieves metadata for all crawlers defined in the customer account
  • get_custom_entity_type: Retrieves the details of a custom pattern by specifying its name
  • get_database: Retrieves the definition of a specified database
  • get_databases: Retrieves all databases defined in a given Data Catalog
  • get_data_catalog_encryption_settings: Retrieves the security configuration for a specified catalog
  • get_dataflow_graph: Transforms a Python script into a directed acyclic graph (DAG)
  • get_data_quality_model: Retrieve the training status of the model along with more information (CompletedOn, StartedOn, FailureReason)
  • get_data_quality_model_result: Retrieve a statistic's predictions for a given Profile ID
  • get_data_quality_result: Retrieves the result of a data quality rule evaluation
  • get_data_quality_rule_recommendation_run: Gets the specified recommendation run that was used to generate rules
  • get_data_quality_ruleset: Returns an existing ruleset by identifier or name
  • get_data_quality_ruleset_evaluation_run: Retrieves a specific run where a ruleset is evaluated against a data source
  • get_dev_endpoint: Retrieves information about a specified development endpoint
  • get_dev_endpoints: Retrieves all the development endpoints in this Amazon Web Services account
  • get_entity_records: This API is used to query preview data from a given connection type or from a native Amazon S3 based Glue Data Catalog
  • get_integration_resource_property: This API is used for fetching the ResourceProperty of the Glue connection (for the source) or Glue database ARN (for the target)
  • get_integration_table_properties: This API is used to retrieve optional override properties for the tables that need to be replicated
  • get_job: Retrieves an existing job definition
  • get_job_bookmark: Returns information on a job bookmark entry
  • get_job_run: Retrieves the metadata for a given job run
  • get_job_runs: Retrieves metadata for all runs of a given job definition
  • get_jobs: Retrieves all current job definitions
  • get_mapping: Creates mappings
  • get_ml_task_run: Gets details for a specific task run on a machine learning transform
  • get_ml_task_runs: Gets a list of runs for a machine learning transform
  • get_ml_transform: Gets a Glue machine learning transform artifact and all its corresponding metadata
  • get_ml_transforms: Gets a sortable, filterable list of existing Glue machine learning transforms
  • get_partition: Retrieves information about a specified partition
  • get_partition_indexes: Retrieves the partition indexes associated with a table
  • get_partitions: Retrieves information about the partitions in a table
  • get_plan: Gets code to perform a specified mapping
  • get_registry: Describes the specified registry in detail
  • get_resource_policies: Retrieves the resource policies set on individual resources by Resource Access Manager during cross-account permission grants
  • get_resource_policy: Retrieves a specified resource policy
  • get_schema: Describes the specified schema in detail
  • get_schema_by_definition: Retrieves a schema by the SchemaDefinition
  • get_schema_version: Get the specified schema by its unique ID assigned when a version of the schema is created or registered
  • get_schema_versions_diff: Fetches the schema version difference in the specified difference type between two stored schema versions in the Schema Registry
  • get_security_configuration: Retrieves a specified security configuration
  • get_security_configurations: Retrieves a list of all security configurations
  • get_session: Retrieves the session
  • get_statement: Retrieves the statement
  • get_table: Retrieves the Table definition in a Data Catalog for a specified table
  • get_table_optimizer: Returns the configuration of all optimizers associated with a specified table
  • get_tables: Retrieves the definitions of some or all of the tables in a given Database
  • get_table_version: Retrieves a specified version of a table
  • get_table_versions: Retrieves a list of strings that identify available versions of a specified table
  • get_tags: Retrieves a list of tags associated with a resource
  • get_trigger: Retrieves the definition of a trigger
  • get_triggers: Gets all the triggers associated with a job
  • get_unfiltered_partition_metadata: Retrieves partition metadata from the Data Catalog that contains unfiltered metadata
  • get_unfiltered_partitions_metadata: Retrieves partition metadata from the Data Catalog that contains unfiltered metadata
  • get_unfiltered_table_metadata: Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog
  • get_usage_profile: Retrieves information about the specified Glue usage profile
  • get_user_defined_function: Retrieves a specified function definition from the Data Catalog
  • get_user_defined_functions: Retrieves multiple function definitions from the Data Catalog
  • get_workflow: Retrieves resource metadata for a workflow
  • get_workflow_run: Retrieves the metadata for a given workflow run
  • get_workflow_run_properties: Retrieves the workflow run properties which were set during the run
  • get_workflow_runs: Retrieves metadata for all runs of a given workflow
  • import_catalog_to_glue: Imports an existing Amazon Athena Data Catalog to Glue
  • list_blueprints: Lists all the blueprint names in an account
  • list_column_statistics_task_runs: List all task runs for a particular account
  • list_connection_types: The ListConnectionTypes API provides a discovery mechanism to learn available connection types in Glue
  • list_crawlers: Retrieves the names of all crawler resources in this Amazon Web Services account, or the resources with the specified tag
  • list_crawls: Returns all the crawls of a specified crawler
  • list_custom_entity_types: Lists all the custom patterns that have been created
  • list_data_quality_results: Returns all data quality execution results for your account
  • list_data_quality_rule_recommendation_runs: Lists the recommendation runs meeting the filter criteria
  • list_data_quality_ruleset_evaluation_runs: Lists all the runs meeting the filter criteria, where a ruleset is evaluated against a data source
  • list_data_quality_rulesets: Returns a paginated list of rulesets for the specified list of Glue tables
  • list_data_quality_statistic_annotations: Retrieve annotations for a data quality statistic
  • list_data_quality_statistics: Retrieves a list of data quality statistics
  • list_dev_endpoints: Retrieves the names of all DevEndpoint resources in this Amazon Web Services account, or the resources with the specified tag
  • list_entities: Returns the available entities supported by the connection type
  • list_jobs: Retrieves the names of all job resources in this Amazon Web Services account, or the resources with the specified tag
  • list_ml_transforms: Retrieves a sortable, filterable list of existing Glue machine learning transforms in this Amazon Web Services account, or the resources with the specified tag
  • list_registries: Returns a list of registries that you have created, with minimal registry information
  • list_schemas: Returns a list of schemas with minimal details
  • list_schema_versions: Returns a list of schema versions that you have created, with minimal information
  • list_sessions: Retrieve a list of sessions
  • list_statements: Lists statements for the session
  • list_table_optimizer_runs: Lists the history of previous optimizer runs for a specific table
  • list_triggers: Retrieves the names of all trigger resources in this Amazon Web Services account, or the resources with the specified tag
  • list_usage_profiles: List all the Glue usage profiles
  • list_workflows: Lists names of workflows created in the account
  • modify_integration: Modifies a Zero-ETL integration in the caller's account
  • put_data_catalog_encryption_settings: Sets the security configuration for a specified catalog
  • put_data_quality_profile_annotation: Annotate all datapoints for a Profile
  • put_resource_policy: Sets the Data Catalog resource policy for access control
  • put_schema_version_metadata: Puts the metadata key value pair for a specified schema version ID
  • put_workflow_run_properties: Puts the specified workflow run properties for the given workflow run
  • query_schema_version_metadata: Queries for the schema version metadata information
  • register_schema_version: Adds a new version to the existing schema
  • remove_schema_version_metadata: Removes a key value pair from the schema version metadata for the specified schema version ID
  • reset_job_bookmark: Resets a bookmark entry
  • resume_workflow_run: Restarts selected nodes of a previous partially completed workflow run and resumes the workflow run
  • run_statement: Executes the statement
  • search_tables: Searches a set of tables based on properties in the table metadata as well as on the parent database
  • start_blueprint_run: Starts a new run of the specified blueprint
  • start_column_statistics_task_run: Starts a column statistics task run, for a specified table and columns
  • start_column_statistics_task_run_schedule: Starts a column statistics task run schedule
  • start_crawler: Starts a crawl using the specified crawler, regardless of what is scheduled
  • start_crawler_schedule: Changes the schedule state of the specified crawler to SCHEDULED, unless the crawler is already running or the schedule state is already SCHEDULED
  • start_data_quality_rule_recommendation_run: Starts a recommendation run that is used to generate rules when you don't know what rules to write
  • start_data_quality_ruleset_evaluation_run: Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table)
  • start_export_labels_task_run: Begins an asynchronous task to export all labeled data for a particular transform
  • start_import_labels_task_run: Enables you to provide additional labels (examples of truth) to be used to teach the machine learning transform and improve its quality
  • start_job_run: Starts a job run using a job definition
  • start_ml_evaluation_task_run: Starts a task to estimate the quality of the transform
  • start_ml_labeling_set_generation_task_run: Starts the active learning workflow for your machine learning transform to improve the transform's quality by generating label sets and adding labels
  • start_trigger: Starts an existing trigger
  • start_workflow_run: Starts a new run of the specified workflow
  • stop_column_statistics_task_run: Stops a task run for the specified table
  • stop_column_statistics_task_run_schedule: Stops a column statistics task run schedule
  • stop_crawler: If the specified crawler is running, stops the crawl
  • stop_crawler_schedule: Sets the schedule state of the specified crawler to NOT_SCHEDULED, but does not stop the crawler if it is already running
  • stop_session: Stops the session
  • stop_trigger: Stops a specified trigger
  • stop_workflow_run: Stops the execution of the specified workflow run
  • tag_resource: Adds tags to a resource
  • test_connection: Tests a connection to a service to validate the service credentials that you provide
  • untag_resource: Removes tags from a resource
  • update_blueprint: Updates a registered blueprint
  • update_catalog: Updates an existing catalog's properties in the Glue Data Catalog
  • update_classifier: Modifies an existing classifier (a GrokClassifier, an XMLClassifier, a JsonClassifier, or a CsvClassifier, depending on which field is present)
  • update_column_statistics_for_partition: Creates or updates partition statistics of columns
  • update_column_statistics_for_table: Creates or updates table statistics of columns
  • update_column_statistics_task_settings: Updates settings for a column statistics task
  • update_connection: Updates a connection definition in the Data Catalog
  • update_crawler: Updates a crawler
  • update_crawler_schedule: Updates the schedule of a crawler using a cron expression
  • update_database: Updates an existing database definition in a Data Catalog
  • update_data_quality_ruleset: Updates the specified data quality ruleset
  • update_dev_endpoint: Updates a specified development endpoint
  • update_integration_resource_property: This API can be used for updating the ResourceProperty of the Glue connection (for the source) or Glue database ARN (for the target)
  • update_integration_table_properties: This API is used to provide optional override properties for the tables that need to be replicated
  • update_job: Updates an existing job definition
  • update_job_from_source_control: Synchronizes a job from the source control repository
  • update_ml_transform: Updates an existing machine learning transform
  • update_partition: Updates a partition
  • update_registry: Updates an existing registry which is used to hold a collection of schemas
  • update_schema: Updates the description, compatibility setting, or version checkpoint for a schema set
  • update_source_control_from_job: Synchronizes a job to the source control repository
  • update_table: Updates a metadata table in the Data Catalog
  • update_table_optimizer: Updates the configuration for an existing table optimizer
  • update_trigger: Updates a trigger definition
  • update_usage_profile: Updates a Glue usage profile
  • update_user_defined_function: Updates an existing function definition in the Data Catalog
  • update_workflow: Updates an existing workflow
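Operations in this list are often combined. A minimal sketch, assuming the request and response field names of the Glue StartJobRun and GetJobRun APIs (JobName, RunId, JobRunId, JobRun$JobRunState): start a job run with start_job_run, then poll get_job_run until the run reaches a terminal state. The helper name, the job name, and the stand-in client are all hypothetical.

```r
# Hypothetical helper: start a job run and wait for it to finish.
run_job_and_wait <- function(svc, job_name, poll_seconds = 30, max_polls = 120) {
  run_id <- svc$start_job_run(JobName = job_name)$JobRunId
  # States after which the run will not change any further.
  terminal <- c("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED")
  for (i in seq_len(max_polls)) {
    run <- svc$get_job_run(JobName = job_name, RunId = run_id)
    state <- run$JobRun$JobRunState
    if (state %in% terminal) return(state)
    Sys.sleep(poll_seconds)
  }
  stop("Job run did not reach a terminal state within the polling budget")
}

# Stand-in for a real client: reports RUNNING twice, then SUCCEEDED.
fake_svc <- local({
  calls <- 0
  list(
    start_job_run = function(JobName) list(JobRunId = "jr_example"),
    get_job_run = function(JobName, RunId) {
      calls <<- calls + 1
      state <- if (calls < 3) "RUNNING" else "SUCCEEDED"
      list(JobRun = list(JobRunState = state))
    }
  )
})

final_state <- run_job_and_wait(fake_svc, "nightly-etl", poll_seconds = 0)
```

Against a real client, a longer poll_seconds keeps the number of get_job_run requests low while the job runs.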

Examples

## Not run:
svc <- glue()
svc$batch_create_partition(
  Foo = 123
)
## End(Not run)
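Foo = 123 in the example above is a placeholder rather than a real parameter. A somewhat more concrete sketch follows, assuming the DatabaseName, TableName, and PartitionInputList parameters of the Glue BatchCreatePartition API; the helper names, bucket, database, and table are hypothetical.

```r
# Hypothetical helper: build one partition input per date value.
# Only Values and a storage location are set here; real tables usually need
# a fuller StorageDescriptor, typically copied from the table definition.
make_partition_inputs <- function(dates, base_location) {
  lapply(dates, function(d) {
    list(
      Values = list(d),
      StorageDescriptor = list(Location = paste0(base_location, "dt=", d, "/"))
    )
  })
}

# Hypothetical wrapper around the operation; `svc` is a client from glue().
register_partitions <- function(svc, database, table, inputs) {
  svc$batch_create_partition(
    DatabaseName = database,
    TableName = table,
    PartitionInputList = inputs
  )
}

inputs <- make_partition_inputs(
  c("2025-01-01", "2025-01-02"),
  "s3://example-bucket/events/"
)

# With AWS credentials configured, the call would look like:
#   register_partitions(glue(), "analytics", "events", inputs)
```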
  • Maintainer: Dyfan Jones
  • License: Apache License (>= 2.0)
  • Last published: 2025-03-17