R caching with financial data

In the previous post we looked at a simple data caching example, which we used to explore the workings of the R package DataCache.

In this post we continue that exploration. Instead of just using the system time as the data feed, we now use a more realistic example: financial data. This is again in preparation for running a custom Shiny server.

So let’s start.

We start by defining the function get_timeseries, which retrieves stock data and fills in missing values.

We then wrap this function in another function, datafeed_timeseries, which returns the data in a form that DataCache can understand: a named list.

The line

names(out) = paste0('stockdata.', stock_id)

implies that the actual data is saved in the environment under the name paste0('stockdata.', stock_id), i.e. if stock_id = 'ALV.DE' then the data is saved under stockdata.ALV.DE. For another example, see the testing example further below.
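To make the naming rule concrete, here is a minimal base R sketch. It does not reproduce DataCache's internals; it only illustrates how the names of a list can become variable names in an environment. The function fake_datafeed and the three prices are hypothetical stand-ins so that the example runs without a network connection.

```r
# Hypothetical stand-in for datafeed_timeseries(): returns a named list
# with made-up prices instead of downloading anything.
fake_datafeed <- function(stock_id) {
  out <- list(c(93.75, 93.67, 95.46))
  names(out) <- paste0("stockdata.", stock_id)
  out
}

feed <- fake_datafeed("ALV.DE")
names(feed)

# Copy every list element into an environment under its list name.
target_env <- new.env()
list2env(feed, envir = target_env)

# The data is now reachable as a variable called stockdata.ALV.DE.
exists("stockdata.ALV.DE", envir = target_env)
get("stockdata.ALV.DE", envir = target_env)
```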

So here’s the code:

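library(quantmod)              # provides getSymbols()
library(PerformanceAnalytics)
library(DataCache)

get_timeseries = function(stock_id) {
  
  # column 6 of the getSymbols() result holds the adjusted closing price
  AdjustedPrice = 6
  .stockdata = getSymbols(stock_id, warnings = FALSE, auto.assign = FALSE)
  # fill missing values, then keep only the adjusted price column
  stockdata = na.fill(.stockdata, fill = "extend")[, AdjustedPrice, drop=FALSE]
  
  return(stockdata)
}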

datafeed_timeseries = function(stock_id) {
    
  timeseries = get_timeseries(stock_id)
  out = list(timeseries)
  names(out) = paste0('stockdata.', stock_id)
  
  return(out)
  
}
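The hard-coded column index 6 relies on the column order of the downloaded object. As a hedged alternative, the adjusted price column could be looked up by name instead. The sketch below uses a plain matrix with made-up numbers as a stand-in for the real download; only the column naming pattern (symbol followed by .Open, .High, .Low, .Close, .Volume, .Adjusted) is taken from what getSymbols returns. With quantmod loaded, its Ad() helper serves the same purpose.

```r
# Made-up values in the six-column layout; not real market data.
prices <- cbind(
  BAS.DE.Open     = c(93.10, 93.90, 94.00),
  BAS.DE.High     = c(94.20, 94.50, 95.60),
  BAS.DE.Low      = c(92.80, 93.20, 93.80),
  BAS.DE.Close    = c(93.75, 93.67, 95.46),
  BAS.DE.Volume   = c(1.9e6, 2.4e6, 2.1e6),
  BAS.DE.Adjusted = c(93.75, 93.67, 95.46)
)

# Find the column whose name ends in ".Adjusted" instead of assuming position 6.
adjusted_col <- grep("\\.Adjusted$", colnames(prices))
stopifnot(length(adjusted_col) == 1)

# drop = FALSE keeps the result a one-column matrix, as in get_timeseries().
adjusted <- prices[, adjusted_col, drop = FALSE]
colnames(adjusted)
```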

Now we'll run several tests to check that the cache actually speeds up loading the data. In this setup, loading from the cache is in fact roughly 100 times faster.

# do timing tests
  varName1 = 'BAS.DE'

# delete the cache (just in case there are any leftovers) 
  junk <- dir(path="~/cache/", pattern=varName1, full.names = TRUE) # ?dir
  file.remove(junk) # ?file.remove
  
# first time
  start_time <- Sys.time()
  cache.stockdata = data.cache(function() datafeed_timeseries(varName1) , cache.name = varName1, frequency = daily, wait = FALSE)
  end_time <- Sys.time()

  timeTaken1 = end_time - start_time
  # Time difference of 0.3825941 secs

# second time
  start_time <- Sys.time()
  cache.stockdata = data.cache(function() datafeed_timeseries(varName1) , cache.name = varName1, frequency = daily, wait = FALSE)
  end_time <- Sys.time()
  
  timeTaken2 = end_time - start_time
  # Time difference of 0.003516197 secs

# this is the actual data
  tail(stockdata.BAS.DE)
  # BAS.DE.Adjusted
  # 2017-12-14           93.75
  # 2017-12-15           93.67
  # 2017-12-18           95.46
  # 2017-12-19           94.28
  # 2017-12-20           93.13
  # 2017-12-21           93.69

# delete the cache 
  junk <- dir(path="~/cache/", pattern=varName1, full.names = TRUE) # ?dir
  file.remove(junk) # ?file.remove

# third time
  start_time <- Sys.time()
  cache.stockdata = data.cache(function() datafeed_timeseries(varName1) , cache.name = varName1, frequency = daily, wait = FALSE)
  end_time <- Sys.time()
  
  timeTaken3 = end_time - start_time
  # Time difference of 0.4717042 secs

# fourth time
  start_time <- Sys.time()
  cache.stockdata = data.cache(function() datafeed_timeseries(varName1) , cache.name = varName1, frequency = daily, wait = FALSE)
  end_time <- Sys.time()
  
  timeTaken4 = end_time - start_time
  # Time difference of 0.003220558 secs

  # so retrieving the cache is significantly faster ...
  # we assert that it is more than 50 times faster
  assertthat::assert_that(timeTaken1>timeTaken2 * 50)
  assertthat::assert_that(timeTaken3>timeTaken4 * 50)
  
  # # in fact, in this setup the factor is around 100
  # as.numeric(timeTaken1)/as.numeric(timeTaken2)
  # # [1] 146.3878
  # as.numeric(timeTaken3)/as.numeric(timeTaken4)
  # # [1] 94.27676
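One caveat about the ratios above: subtracting two Sys.time() values gives a difftime object whose unit is chosen automatically, so a slow first call could be reported in minutes while the cached call is reported in seconds. In the runs shown here both are in seconds, so the ratios are fine. The sketch below, using made-up durations, shows how to fix the unit explicitly before dividing.

```r
# Made-up durations with different units, to show the pitfall.
t_first  <- as.difftime(1.5, units = "mins")
t_second <- as.difftime(0.9, units = "secs")

# Dividing the raw numbers ignores the units and understates the speed-up.
naive_ratio <- as.numeric(t_first) / as.numeric(t_second)

# Converting both to seconds first gives the intended ratio.
safe_ratio <- as.numeric(t_first, units = "secs") /
  as.numeric(t_second, units = "secs")

naive_ratio
safe_ratio
```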

Hurray, it works!
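Why is the second call so much faster? The following base R sketch shows the general idea of a file-based cache under simplified assumptions. It is not how DataCache is implemented: there is no refresh frequency and no background update. The names cached_call and slow_feed are hypothetical, and Sys.sleep stands in for the download.

```r
# Hypothetical helper: return the cached result if the file exists,
# otherwise compute it and store it for next time.
cached_call <- function(fun, cache_file) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))   # cache hit: read from disk
  }
  result <- fun()                 # cache miss: compute the result ...
  saveRDS(result, cache_file)     # ... and save it
  result
}

# Stand-in for a slow data feed; the values are made up.
slow_feed <- function() {
  Sys.sleep(0.2)
  c(93.75, 93.67, 95.46)
}

cache_file <- tempfile(fileext = ".rds")

# First call misses the cache, second call hits it.
t_miss <- system.time(first  <- cached_call(slow_feed, cache_file))[["elapsed"]]
t_hit  <- system.time(second <- cached_call(slow_feed, cache_file))[["elapsed"]]

identical(first, second)
t_hit < t_miss

# Deleting the file forces the next call to recompute, as in the tests above.
unlink(cache_file)
```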
