bdpar 3.1.0 package

Big Data Preprocessing Architecture

ToLowerCasePipe

Class to convert the data field of an Instance to lower case

AbbreviationPipe

Class to find and/or replace the abbreviations on the data field of an...

bdpar.log

Write messages to the log at a given priority level using the custom b...

bdpar.Options

Object to handle the keys/attributes/options common to all pipeline fl...

Bdpar

Class to manage the preprocess of the files throughout the flow of pip...

Connections

Class to manage the connections with YouTube

ContractionPipe

Class to find and/or replace the contractions on the data field of an I...

DefaultPipeline

Class implementing a default pipelining process

DynamicPipeline

Class implementing a dynamic pipelining process

ExtractorEml

Class to handle email files with eml extension

ExtractorFactory

Class to handle the creation of Instance types

ExtractorSms

Class to handle SMS files with tsms extension

ExtractorYtbid

Class to handle comments of YouTube files with ytbid extension

File2Pipe

Class to obtain the source field of an Instance

FindEmojiPipe

Class to find and/or replace the emoji on the data field of an Instanc...

FindEmoticonPipe

Class to find and/or remove the emoticons on the data field of an Inst...

FindHashtagPipe

Class to find and/or remove the hashtags on the data field of an Insta...

FindUrlPipe

Class to find and/or remove the URLs on the data field of an Instance

FindUserNamePipe

Class to find and/or remove the users on the data field of an Instance

GenericPipe

Abstract super class that handles the management of the Pipes

GenericPipeline

Abstract super class implementing the pipelining process

GuessDatePipe

Class to obtain the date field of an Instance

GuessLanguagePipe

Class to guess the language of an Instance

Instance

Abstract super class that handles the management of the Instances

InterjectionPipe

Class to find and/or remove the interjections on the data field of an ...

MeasureLengthPipe

Class to obtain the length of the data field of an Instance

operator-pipe

bdpar customized forward-pipe operator

ResourceHandler

Class that handles different types of resources

runPipeline

Initiates the pipelining process

SlangPipe

Class to find and/or replace the slangs on the data field of an Instan...

StopWordPipe

Class to find and/or remove the stop words on the data field of an Ins...

StoreFileExtPipe

Class to get the file's extension field of an Instance

TargetAssigningPipe

Class to get the target field of the Instance

TeeCSVPipe

Class to handle a CSV with the properties field of the preprocessed In...

Provides a tool for easily building customized data flows to pre-process large volumes of information from different sources. To this end, 'bdpar' lets users (i) easily use and create new functionalities and (ii) develop new data source extractors according to their needs. Additionally, the package provides a predefined default data flow to extract and pre-process the most relevant information (tokens, dates, ...) from some textual sources (SMS, email, YouTube comments).

  • Maintainer: Miguel Ferreiro-Díaz
  • License: GPL-3
  • Last published: 2023-12-12