In the previous post we looked at a simple data caching example, which we used to explore the workings of the R package DataCache.
In this post we continue this exploration. Instead of just using the system time as the datafeed, we now use a more realistic example: financial data. This is again in preparation for running a custom Shiny server.
So let’s start.
First we define the function get_timeseries to retrieve stock data and fill in missing values.
We put this function inside another function datafeed_timeseries that DataCache can understand.
The line
names(out) = paste0('stockdata.', stock_id)
means that the actual data is saved in the environment under the name paste0('stockdata.', stock_id), i.e. if stock_id = 'ALV.DE' then the data is saved under stockdata.ALV.DE. For another example see the testing example further below.
So here’s the code:
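library(quantmod)              # provides getSymbols()
library(PerformanceAnalytics)
library(DataCache)

# download the price history for one ticker and keep only the adjusted price
get_timeseries = function(stock_id) {
  AdjustedPrice = 6  # column 6 of the getSymbols() result holds the adjusted price
  .stockdata = getSymbols(stock_id, warnings = FALSE, auto.assign = FALSE)
  # fill missing values before selecting the column
  stockdata = na.fill(.stockdata, fill = "extend")[, AdjustedPrice, drop=FALSE]
  return (stockdata)
}

# wrap the time series in a named list so that DataCache can store it
datafeed_timeseries = function(stock_id) {
  timeseries = get_timeseries(stock_id)
  out = list(timeseries)
  names(out) = paste0('stockdata.', stock_id)
  return(out)
}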
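The naming convention described above can be checked without downloading anything. The following is a minimal sketch in base R: fake_feed is a hypothetical stand-in for datafeed_timeseries that returns made-up numbers, and list2env is used only to illustrate how a named list turns into variables. It is not a claim about how DataCache does this internally.

```r
# Hypothetical stand-in for datafeed_timeseries(): no network access,
# the values are made up for illustration only.
fake_feed = function(stock_id) {
  timeseries = c(101.2, 100.8, 102.5)
  out = list(timeseries)
  names(out) = paste0('stockdata.', stock_id)
  return(out)
}

out = fake_feed('ALV.DE')
names(out)
# [1] "stockdata.ALV.DE"

# Copy the list elements into a fresh environment, one variable per name
# (illustration only, not DataCache's actual code).
env = new.env()
invisible(list2env(out, envir = env))
ls(env)
# [1] "stockdata.ALV.DE"
get('stockdata.ALV.DE', envir = env)
# [1] 101.2 100.8 102.5
```

The name of the list element is therefore the only thing that decides which variable name the data ends up under.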
Now we run several timing tests to check that the cache actually speeds up loading the data. In this setup it is in fact roughly 100 times faster.
# do timing tests
varName1 = 'BAS.DE'

# delete the cache (just in case there are any leftovers)
junk <- dir(path="~/cache/", pattern=varName1, full.names = TRUE) # ?dir
file.remove(junk) # ?file.remove

# first time
start_time <- Sys.time()
cache.stockdata = data.cache(function() datafeed_timeseries(varName1),
                             cache.name = varName1, frequency = daily, wait = FALSE)
end_time <- Sys.time()
timeTaken1 = end_time - start_time
# Time difference of 0.3825941 secs

# second time
start_time <- Sys.time()
cache.stockdata = data.cache(function() datafeed_timeseries(varName1),
                             cache.name = varName1, frequency = daily, wait = FALSE)
end_time <- Sys.time()
timeTaken2 = end_time - start_time
# Time difference of 0.003516197 secs

# this is the actual data
tail(stockdata.BAS.DE)
#            BAS.DE.Adjusted
# 2017-12-14           93.75
# 2017-12-15           93.67
# 2017-12-18           95.46
# 2017-12-19           94.28
# 2017-12-20           93.13
# 2017-12-21           93.69

# delete the cache
junk <- dir(path="~/cache/", pattern=varName1, full.names = TRUE) # ?dir
file.remove(junk) # ?file.remove

# third time
start_time <- Sys.time()
cache.stockdata = data.cache(function() datafeed_timeseries(varName1),
                             cache.name = varName1, frequency = daily, wait = FALSE)
end_time <- Sys.time()
timeTaken3 = end_time - start_time
# Time difference of 0.4717042 secs

# fourth time
start_time <- Sys.time()
cache.stockdata = data.cache(function() datafeed_timeseries(varName1),
                             cache.name = varName1, frequency = daily, wait = FALSE)
end_time <- Sys.time()
timeTaken4 = end_time - start_time
# Time difference of 0.003220558 secs

# so retrieving the cache is significantly faster ...
# we assert that it is more than 50 times faster
assertthat::assert_that(timeTaken1>timeTaken2 * 50)
assertthat::assert_that(timeTaken3>timeTaken4 * 50)

# # in fact in this set up it is around 100
# as.numeric(timeTaken1)/as.numeric(timeTaken2)
# # [1] 146.3878
# as.numeric(timeTaken3)/as.numeric(timeTaken4)
# # [1] 94.27676
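The four start/stop blocks above repeat the same pattern, and end_time - start_time lets R pick the unit of the result automatically. A small helper can make the unit explicit. This is only a sketch: time_secs, slow_call and fast_call are hypothetical names, and the Sys.sleep calls merely simulate a slow download and a fast cache read.

```r
# Hypothetical helper: run a function and return the elapsed time in seconds.
time_secs = function(f) {
  start_time = Sys.time()
  f()
  as.numeric(difftime(Sys.time(), start_time, units = "secs"))
}

# Simulated workloads (assumptions, not real data feeds).
slow_call = function() Sys.sleep(0.2)    # stands in for the first, uncached load
fast_call = function() Sys.sleep(0.01)   # stands in for reading the cache

t_slow = time_secs(slow_call)
t_fast = time_secs(fast_call)

# Both values are plain numbers in seconds, so the ratio is unambiguous.
speedup = t_slow / t_fast
stopifnot(t_slow > t_fast)
```

Before reusing such a helper with the real feed, it is worth checking where the cached variables end up, since wrapping data.cache in another function may change the environment they are loaded into.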
Hurray, it works!