logFileRead function

Given a list of file names, read them as log files

This function reads a file, parsing it for the fields specified, and normalises the values that have been read.

The log file is assumed to be space-delimited, as Apache and IIS logs are.
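As a rough illustration of what reading and normalising a space-delimited log involves, the sketch below uses only base R. It is not the package's implementation: the sample lines are made up, the date and time fields are dropped as 'ignore' columns purely to keep the example short, and the output column name `elapsed` is an assumption.

```r
# Hypothetical sample lines in a space-delimited layout (not from a real log)
sampleLines <- c(
  "2023-10-10 13:55:36 192.0.2.10 /index.html 200 15250",
  "2023-10-10 13:55:37 192.0.2.11 /app/login 302 48700"
)

# Column names in file order. The date and time fields are named 'ignore...'
# here only to simplify the sketch; a real call would name a timestamp column.
columnNames <- c("ignoredate", "ignoretime", "userip", "url",
                 "httpcode", "elapsedus")

# read.table splits on whitespace when sep is left at its default
raw <- read.table(text = sampleLines, col.names = columnNames,
                  stringsAsFactors = FALSE)

# Drop every column whose name starts with 'ignore'
kept <- raw[, !grepl("^ignore", names(raw)), drop = FALSE]

# Normalise elapsed time: microseconds rounded to whole milliseconds.
# The name 'elapsed' for the result is an assumption made for this sketch.
kept$elapsed <- round(kept$elapsedus / 1000)
kept$elapsedus <- NULL

print(kept)
```

The same pattern would apply to an `elapseds` column, multiplying by 1000 instead of dividing.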

logFileRead(fileName, columnList=c("MSTimestamp", "clientip", "url", "httpcode", "elapsed"), logTimeZone = "", timeFormat = "")

Arguments

  • fileName: The name, including path, of the file to read

  • columnList: The columns in the file, in order. Columns are:

    Column           Status    Description
    ApacheTimestamp  Optional  Apache log format timestamp
    MSTimestamp      Optional  IIS log format timestamp
    servername       Optional  Name of the web server
    serverip         Optional  IP of the server
    httpop           Optional  HTTP verb
    url              Required  Path part of the request
    parms            Optional  Query string
    port             Optional  TCP/IP port that the request arrived on
    username         Optional  User name logged by the web server
    userip           Optional  IP that the request was seen to originate from
    useragent        Optional  User agent string in the request
    httpcode         Required  HTTP response code
    windowscode      Optional  Windows return code recorded by IIS
    windowssubcode   Optional  Windows sub code recorded by IIS
    responsebytes    Optional  Number of bytes in the HTTP response
    requestbytes     Optional  Number of bytes in the HTTP request
    elapsedms        Optional  Request elapsed time in milliseconds
    elapsedus        Optional  Request elapsed time in microseconds (will be rounded to milliseconds)
    elapseds         Optional  Request elapsed time in seconds (not recommended, will be expanded to milliseconds)
    jsessionid       Optional  User session identifier
    ignore*          Optional  Columns with names starting with 'ignore' are dropped

    One timestamp and one elapsed time column name must be specified.

    The Apache URL is handled partly in the fix data procedure in the config file, because Apache wraps the operation and URL path in one field. The IIS URL does not need this additional parsing.

  • logTimeZone: The timezone to use to adjust the timestamps in the log. This is used primarily for IIS logs where the log may be either UTC or local time.

  • timeFormat: If the timestamp in the log is not in the default IIS or Apache format, this can be used to override the timestamp parsing. The value is an R strptime format string.
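To make the timeFormat and logTimeZone arguments more concrete, here is a minimal base R sketch of a strptime format string and of how the choice of time zone changes the instant that a logged wall-clock value refers to. The timestamp, its layout and the zone name are hypothetical, and the sketch does not show how logFileRead applies these arguments internally.

```r
# Hypothetical timestamp in a day/month/year layout (not from a real log)
stamp <- "10/10/2023 13:55:36"

# A strptime format string of the kind that timeFormat takes
fmt <- "%d/%m/%Y %H:%M:%S"

# Interpret the wall-clock value as UTC
asUTC <- as.POSIXct(strptime(stamp, format = fmt, tz = "UTC"))

# Interpret the same wall-clock value as local time in a named zone
# (zone name chosen for illustration only)
asLocal <- as.POSIXct(strptime(stamp, format = fmt, tz = "Australia/Sydney"))

# The parsed value, rendered back in UTC
print(format(asUTC, "%Y-%m-%d %H:%M:%S", tz = "UTC"))

# Offset between the two interpretations, in hours
print(difftime(asUTC, asLocal, units = "hours"))
```

A format string that does not match the logged text yields NA from strptime, so it is worth testing the format on one sample line before reading a whole file.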

Returns

The function returns a data frame that contains the contents of the file.

Author(s)

Greg Hunt greg@firmansyah.com

Examples

logFileName = logFileNamesGetLast(dataDirectory=datd, directoryNames=c(".", "."), fileNamePattern="*[.]log")[[1]]
cols = logFileFieldsGetIIS(logFileName)
logdf = logFileRead(logFileName, columnList=cols, logTimeZone = "", timeFormat = "")