“The first duty in life is to be as artificial as possible. What the second duty is no one has as yet discovered.” Oscar Wilde
 
R-caching (and scheduling)

This is in preparation for running a custom Shiny server. We want to accelerate the server by using caching, so in this post we take a look at a candidate caching package.

The package we’ll explore is DataCache. It is a very useful package; however, I found that for some reason the provided weather data example was not working. So I wanted to simulate a data feed using a scheduler, preferably within R. The R package tcltk2 provides such a scheduler. It worked for me from the R command line; when running it in RStudio or via Rscript, however, there is a small complication, which we will cover further below.

The data function here outputs the system time, using Sys.time(). When the value comes from the cache, it is the time at which the cache was last refreshed, so it can never be later than the current Sys.time(). In general:

Current time >= Cached time
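
To make the inequality concrete, here is a minimal sketch of a time-based cache in base R. It is not how DataCache is implemented; make_time_cache and its max_age_secs argument are hypothetical names used only for illustration, and the timestamps are fixed so that the behaviour is reproducible.

```r
# Toy time-based cache (illustration only, not DataCache's implementation).
make_time_cache <- function(max_age_secs) {
  cached_value <- NULL  # the value handed out to callers
  cached_at <- NULL     # when that value was stored
  function(now = Sys.time()) {
    stale <- is.null(cached_at) ||
      difftime(now, cached_at, units = "secs") > max_age_secs
    if (stale) {
      # refresh: the "data" is simply the time of the refresh
      cached_value <<- now
      cached_at <<- now
    }
    cached_value
  }
}

get_cached_time <- make_time_cache(max_age_secs = 1)

t0 <- as.POSIXct("2017-12-22 14:58:09", tz = "UTC")
first  <- get_cached_time(t0)        # initial load: stores t0
second <- get_cached_time(t0 + 0.4)  # still fresh: returns t0 again
third  <- get_cached_time(t0 + 1.2)  # older than 1 s: refreshed to t0 + 1.2
```

Whatever the call pattern, the returned value is a time at which the cache was filled, so it never exceeds the time of the call.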

Let’s first look at the output of the finished script:

No cached data found. Loading intial data...
[1] "Current:2017-12-22 14:58:09.2|Cached:2017-12-22 14:58:09.2"
[1] "Current:2017-12-22 14:58:09.5|Cached:2017-12-22 14:58:09.2"
[1] "Current:2017-12-22 14:58:09.7|Cached:2017-12-22 14:58:09.2"
[1] "Current:2017-12-22 14:58:09.9|Cached:2017-12-22 14:58:09.2"
[1] "Current:2017-12-22 14:58:10.1|Cached:2017-12-22 14:58:09.2"
Loading more recent data, returning lastest available.
[1] "Current:2017-12-22 14:58:10.3|Cached:2017-12-22 14:58:09.2"
[1] "Current:2017-12-22 14:58:10.5|Cached:2017-12-22 14:58:10.3"
[1] "Current:2017-12-22 14:58:10.7|Cached:2017-12-22 14:58:10.3"
[1] "Current:2017-12-22 14:58:10.9|Cached:2017-12-22 14:58:10.3"
[1] "Current:2017-12-22 14:58:11.1|Cached:2017-12-22 14:58:10.3"
Loading more recent data, returning lastest available.
[1] "Current:2017-12-22 14:58:11.3|Cached:2017-12-22 14:58:10.3"
[1] "Current:2017-12-22 14:58:11.5|Cached:2017-12-22 14:58:11.3"
[1] "Current:2017-12-22 14:58:11.7|Cached:2017-12-22 14:58:11.3"
[1] "Current:2017-12-22 14:58:11.9|Cached:2017-12-22 14:58:11.3"
[1] "Current:2017-12-22 14:58:12.1|Cached:2017-12-22 14:58:11.3"
Loading more recent data, returning lastest available.
[1] "Current:2017-12-22 14:58:12.4|Cached:2017-12-22 14:58:11.3"
[1] "Current:2017-12-22 14:58:12.6|Cached:2017-12-22 14:58:12.4"
[1] "Current:2017-12-22 14:58:12.8|Cached:2017-12-22 14:58:12.4"
[1] "Current:2017-12-22 14:58:13.0|Cached:2017-12-22 14:58:12.4"
[1] "Current:2017-12-22 14:58:13.2|Cached:2017-12-22 14:58:12.4"

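So we can see that it works. The scheduler runs a cycle roughly every 200 ms, whereas the cached time is only refreshed once it is more than one second old, so a refresh is triggered about every 5 cycles (1000 ms / 200 ms). Note in the output that the refreshed value only shows up on the cycle after the “Loading more recent data” message; the cycle that triggers the refresh still prints the previous cached time.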
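
The cycle arithmetic can be checked with a small simulation. This is a sketch in base R, not part of the original script: simulate_refresh_ticks is a hypothetical helper, time is counted in whole milliseconds, and the 210 ms cycle length is an assumption read off the output above (20 printed lines span about 4.0 s, i.e. 19 intervals of roughly 210 ms each). The staleness test uses the same strict > as customFrequency_nSeconds.

```r
# Simulate which scheduler ticks trigger a cache refresh (hypothetical helper).
simulate_refresh_ticks <- function(cycle_ms, max_age_ms, n_ticks) {
  cached_at <- 0L              # time of the initial load, in ms
  refresh_ticks <- integer(0)  # tick numbers on which a refresh is triggered
  for (i in seq_len(n_ticks)) {
    now <- i * cycle_ms
    if (now - cached_at > max_age_ms) {  # strict '>' as in the post's code
      refresh_ticks <- c(refresh_ticks, i)
      cached_at <- now
    }
  }
  refresh_ticks
}

# Assumed effective cycle of 210 ms, cache lifetime of 1000 ms
ticks_210 <- simulate_refresh_ticks(cycle_ms = 210L, max_age_ms = 1000L, n_ticks = 19L)

# Idealised cycle of exactly 200 ms, for comparison
ticks_200 <- simulate_refresh_ticks(cycle_ms = 200L, max_age_ms = 1000L, n_ticks = 19L)
```

Under these assumptions ticks_210 is 5, 10, 15, which matches the position of the three “Loading more recent data” messages in the output. With a cycle of exactly 200 ms the strict comparison would only trigger at 1200 ms, i.e. on every 6th tick (ticks_200 is 6, 12, 18), so the observed 5-cycle rhythm relies on each cycle taking slightly longer than 200 ms.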

Let’s discuss the code. We have three parts:

  1. Preparations: Loading packages, setting options:
    #!/usr/bin/env Rscript
    
    # load packages
    library(DataCache) # for the caching
    library(tcltk2) # for the scheduler
    
    # set the resolution of printed time values
    #  so instead of 2017-12-22 14:58:12 we now have 2017-12-22 14:58:12.4
    op <- options(digits.secs = 1)
  2. Define the functions for caching: the data feed and a custom frequency function:
    # define getTime function: 
    datafeed_getTime = function(varName) {
      
      timeValue = Sys.time()
      
      out = list(timeValue)
      names(out) = paste0('Mycached.' , varName)
      
      return (out)
    }
    
    # define custom frequency for cache updates
    # nMinutes already exists in the package DataCache, but we want faster updates for this test
    customFrequency_nSeconds <- function(seconds) {
      fun <- function(timestamp) {
        return(difftime(Sys.time(), timestamp, units='secs') > seconds)
      }
      return(fun)
    }
    
    
    varName1 = 'test1' # remark : the cached variable for this varName is Mycached.test1
    
    
  3. Define the scheduler:
    tclTaskDelete(NULL) # delete all running tasks
    
    # run the block every 200 ms, 20 times in total
    tclTaskSchedule(200, {
      cache.timedata1 = data.cache(function() datafeed_getTime(varName1), cache.name = varName1, frequency = customFrequency_nSeconds(1))
      
      print(paste0('Current:', Sys.time(), '|Cached:', Mycached.test1))
    }, id = "ticktock_test1", redo = 20)
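
To see what the custom frequency function does on its own, it can be called directly, without DataCache or the scheduler. The sketch below repeats the definition from step 2 so that it runs standalone; is_stale is a name introduced here for illustration. It assumes, as the code above suggests, that the function receives the timestamp of the last cache update and returns TRUE when a refresh is due.

```r
# Same closure factory as in step 2, repeated so this snippet is self-contained.
customFrequency_nSeconds <- function(seconds) {
  function(timestamp) {
    difftime(Sys.time(), timestamp, units = "secs") > seconds
  }
}

# A refresh test with a one-second lifetime (is_stale is an illustrative name).
is_stale <- customFrequency_nSeconds(1)

fresh_result <- is_stale(Sys.time())      # timestamp is "now": no refresh due
old_result   <- is_stale(Sys.time() - 5)  # timestamp is 5 s old: refresh due
```

Because the threshold is captured in the closure, other update intervals only need another call such as customFrequency_nSeconds(10).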

The final part is only needed when the code is not run from the R command line, i.e. when using RStudio or Rscript; the scheduler needs it in order to work there. There are other, more robust ways to define schedulers, but they are less readable than tclTaskSchedule, so for simplicity’s sake I chose tclTaskSchedule for this post.

# Start : special
  # This part is only necessary for the scheduler to run with Rscript or RStudio. In the R command line it is not necessary
  # helper that keeps the R process busy for totalRunningTime seconds
  runFor = function(totalRunningTime)
  {
    startTime <- Sys.time()
    repeat {
      # explicit units: a plain Sys.time() - startTime switches to minutes after 60 s
      if (difftime(Sys.time(), startTime, units = 'secs') > totalRunningTime) {
        break
      }
    }
  }
  
  
  runFor(totalRunningTime = 7) # totalRunningTime is in seconds
# End : Special

options(op)
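
A side note on the elapsed-time check in a waiting loop like runFor: subtracting two POSIXct values yields a difftime whose units are chosen automatically, so for runs longer than a minute the number being compared is in minutes rather than seconds. The sketch below illustrates this with fixed timestamps; the variable names are illustrative only.

```r
start <- as.POSIXct("2017-12-22 14:58:00", tz = "UTC")
later <- start + 90  # 90 seconds after start

elapsed_auto <- later - start                           # units picked automatically
elapsed_secs <- difftime(later, start, units = "secs")  # units forced to seconds

units(elapsed_auto)       # "mins"
as.numeric(elapsed_auto)  # 1.5
as.numeric(elapsed_secs)  # 90

# Comparing against a limit of 70 "seconds":
auto_exceeds <- elapsed_auto > 70  # FALSE, because 1.5 is compared with 70
secs_exceeds <- elapsed_secs > 70  # TRUE
```

For the 7 seconds used in this post both variants behave the same; the difference only matters once the running time exceeds 60 seconds.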

 

Final comment: The main hurdles to understanding how DataCache works are these two points:

  • data.cache expects a function. If we want more than one cache, we can distinguish them by a variable name such as varName1 and wrap the datafeed_getTime(varName1) call in an anonymous function:
    cache.timedata1 = data.cache(function() datafeed_getTime(varName1) , cache.name = varName1, frequency = customFrequency_nSeconds(1))

  • That variable name is then used in datafeed_getTime to define the name under which the value is saved. This is done here:
    names(out) = paste0('Mycached.' , varName)

    Because we define varName1 = 'test1', the cached variable for this varName is Mycached.test1.
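
The naming step can be tried in isolation. The sketch below uses only base R and repeats datafeed_getTime so that it runs on its own; the second variable name 'test2' and the use of list2env are illustrative assumptions, showing one way a named list can end up as variables, not a description of DataCache’s internals.

```r
# Same data feed as in the post, repeated so this snippet is self-contained.
datafeed_getTime <- function(varName) {
  out <- list(Sys.time())
  names(out) <- paste0("Mycached.", varName)
  out
}

feed1 <- datafeed_getTime("test1")
feed2 <- datafeed_getTime("test2")  # hypothetical second cache

names(feed1)  # "Mycached.test1"
names(feed2)  # "Mycached.test2"

# Copy the list elements into an environment as variables (illustration only).
demo_env <- new.env()
list2env(c(feed1, feed2), envir = demo_env)
cached_names <- sort(ls(demo_env))  # "Mycached.test1" "Mycached.test2"
```

Each variable name therefore maps to its own Mycached.<name> entry, which is what keeps several caches apart.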

So here is the entire code (for easy copy and pasting):

#!/usr/bin/env Rscript

library(DataCache) # for the caching
library(tcltk2) # for the scheduler


# set the resolution of printed time values
#  so instead of 2017-12-22 14:58:12 we now have 2017-12-22 14:58:12.4
op <- options(digits.secs = 1)


# define getTime function: 
datafeed_getTime = function(varName) {
  
  timeValue = Sys.time()
  
  out = list(timeValue)
  names(out) = paste0('Mycached.' , varName)
  
  return (out)
}

# define custom frequency for cache updates
# nMinutes already exists in the package DataCache, but we want faster updates for this test
customFrequency_nSeconds <- function(seconds) {
  fun <- function(timestamp) {
    return(difftime(Sys.time(), timestamp, units='secs') > seconds)
  }
  return(fun)
}


varName1 = 'test1' # remark : the cached variable for this varName is Mycached.test1


tclTaskDelete(NULL) # delete all running tasks

# run the block every 200 ms, 20 times in total
tclTaskSchedule(200, {
  cache.timedata1 = data.cache(function() datafeed_getTime(varName1), cache.name = varName1, frequency = customFrequency_nSeconds(1))
  
  print(paste0('Current:', Sys.time(), '|Cached:', Mycached.test1))
}, id = "ticktock_test1", redo = 20)


# Start : special
  # This part is only necessary for the scheduler to run with Rscript or RStudio. In the R command line it is not necessary
  # helper that keeps the R process busy for totalRunningTime seconds
  runFor = function(totalRunningTime)
  {
    startTime <- Sys.time()
    repeat {
      # explicit units: a plain Sys.time() - startTime switches to minutes after 60 s
      if (difftime(Sys.time(), startTime, units = 'secs') > totalRunningTime) {
        break
      }
    }
  }
  
  
  runFor(totalRunningTime = 7) # totalRunningTime is in seconds
# End : Special

options(op)




 
