2.1 More Than a Statistical Language

R is used in many industries: academic research, clinical trials in pharmaceuticals, risk management in finance, natural language processing and sentiment analysis in social media, demand and market-trend forecasting in manufacturing, and many more.

Below is a reproducible example of building a simple stock analysis in R, which can be combined with Shiny. We will look at NIO and SPY with a starting date of Jan 01, 2019.

# Libraries used throughout this chapter
library(plotly)
library(tidyquant)
library(ggplot2)
library(dplyr)
library(dygraphs)
library(echarts4r)
library(timetk)
library(glue)
library(tidyr)
library(knitr)      # kable()
library(kableExtra) # kable_styling()

# You can change ticker to whatever ticker you want
ticker <- c("NIO", "SPY")

# tidyquant package for tidy format
StockData <- tq_get(ticker, 
                    from = "2019-01-01")

# table format
StockData[1:6,] %>% 
  kable(caption = 'NIO vs. SPY') %>% 
  kable_styling()
Table 2.1: NIO vs. SPY
symbol date open high low close volume adjusted
NIO 2019-01-02 6.13 6.24 6.00 6.20 8823600 6.20
NIO 2019-01-03 6.10 6.15 6.02 6.05 7562900 6.05
NIO 2019-01-04 6.19 6.40 6.13 6.36 9405600 6.36
NIO 2019-01-07 6.41 6.59 6.31 6.50 9709000 6.50
NIO 2019-01-08 6.57 6.58 6.17 6.40 9603800 6.40
NIO 2019-01-09 6.41 6.69 6.35 6.63 11489900 6.63

The code below draws a candlestick chart of NIO’s daily prices.

NIOData <- StockData %>% filter(symbol=="NIO")

NIOData$date <- factor(NIOData$date)

# using echarts4r package to draw the plot
NIOData %>% 
  e_charts(date) %>%
  # candlestick
  e_candle(open, close, low, high, name= "NIO") %>%
  e_axis_labels(y="Price", x="Date") %>% 
  e_datazoom(type="slider") %>% 
  e_title("NIO Price History") %>% 
  e_tooltip("axis")

Figure 2.1: NIO Price

Imagine communicating these data points to your client without any visual aids. It would take forever. Visualization is a powerful tool that helps us understand large data sets. With just a quick glance at Figure 2.1, we can see the x axis, “Date”, ranging from Jan 01, 2019 to Aug 04, 2020, the y axis, “Price”, ranging from 0 to 18, and the ticker symbol NIO. Each candlestick shows the open, close, low, and high price of the stock on a specific date. We can also use the slider to zoom in on specific dates.

Visualization amplifies a message while simplifying its communication to an audience, and it provides insight for making better decisions.

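Let’s create another chart that compares the movement of NIO and SPY to see whether there is a relationship between them. We make an imaginary investment in each with a starting date of Jan 01, 2019 and track its percentage gain. The code holds 100 shares of each ticker; the percentage gain would be the same for a 100 dollar stake. Let’s look at the code below: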

# clear everything
rm(list = ls())

# ticker symbols
ticker <- c("NIO", "SPY")

# Grab Data using tidyquant package
StockData <- tq_get(ticker, from = "2019-01-01")

# position of 100 shares; ratio is the percentage gain since the first close
NIO <- StockData %>% 
  filter(symbol =="NIO") %>%  
  select(symbol, date, close) %>% 
  mutate(init=100*close[[1]], 
         actual = close*100, 
         ratio =round(((actual/init)-1)*100, digits=3))
NIO[1:6,] %>% 
  kable(caption = "NIO $100 Investment") %>% 
  kable_styling()
Table 2.2: NIO $100 Investment
symbol date close init actual ratio
NIO 2019-01-02 6.20 620 620 0.000
NIO 2019-01-03 6.05 620 605 -2.419
NIO 2019-01-04 6.36 620 636 2.581
NIO 2019-01-07 6.50 620 650 4.839
NIO 2019-01-08 6.40 620 640 3.226
NIO 2019-01-09 6.63 620 663 6.935
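
As a quick check on Table 2.2, the ratio column can be reproduced by hand in base R. The sketch below is illustrative only: it hard-codes the first three closing prices from the table, and the pct_gain helper and its shares argument are names introduced here, not part of the chapter's code. It also suggests why the position size does not matter: the number of shares cancels in actual/init.

```r
# Hypothetical helper (not from tidyquant): percentage gain of a position
# relative to its value at the first close, mirroring the mutate() call above.
pct_gain <- function(close, shares = 100) {
  init   <- shares * close[[1]]   # value of the position on the first day
  actual <- shares * close        # value of the position on each day
  round((actual / init - 1) * 100, digits = 3)
}

# First three NIO closing prices, copied from Table 2.2
nio_close <- c(6.20, 6.05, 6.36)

pct_gain(nio_close)              # 100 shares; matches the ratio column of Table 2.2
pct_gain(nio_close, shares = 1)  # a different position size gives the same percentages
```
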
SPY <- StockData %>% 
  filter(symbol =="SPY") %>%  
  select(symbol, date, close) %>% 
  mutate(init=100*close[[1]], 
         actual = close*100, 
         ratio =round(((actual/init)-1)*100, digits=3))

SPY[1:6,] %>% 
  kable(caption = "SPY $100 Investment") %>% 
  kable_styling()
Table 2.3: SPY $100 Investment
symbol date close init actual ratio
SPY 2019-01-02 250.18 25018 25018 0.000
SPY 2019-01-03 244.21 25018 24421 -2.386
SPY 2019-01-04 252.39 25018 25239 0.883
SPY 2019-01-07 254.38 25018 25438 1.679
SPY 2019-01-08 256.77 25018 25677 2.634
SPY 2019-01-09 257.97 25018 25797 3.114
# Data Wrangling
NIO <- NIO %>% 
  filter(symbol == "NIO") %>% 
  select(symbol, date, ratio) %>%  
  rename(NIO = ratio)

SPY <- SPY %>% 
  filter(symbol == "SPY") %>% 
  select(symbol, date, ratio) %>%  
  rename(SPY = ratio)

# Combining data frame
StockData <- 
  left_join(NIO, SPY, by="date") %>% 
  select(date, NIO, SPY)

# Highlight days where NIO gained more than 100% or SPY more than 30%
NIO_h <- StockData %>% 
  filter(NIO > 100)

SPY_h <- StockData %>% 
  filter(SPY > 30)

# color = "grey" belongs in geom_point(); as an argument to ggplot() it is ignored
ggplot(StockData, 
       aes(x = NIO, 
           y = SPY)) +
  geom_point(color = "grey", alpha = .5) +
  geom_smooth(method = "lm") +
  labs(title = "Percentage Gain") +
  geom_point(data = NIO_h, 
             aes(x = NIO, y = SPY), 
             color = "red", 
             alpha = .3) +
  geom_point(data = SPY_h, 
             aes(x = NIO, y = SPY), 
             color = "blue",
             alpha = .3)

Figure 2.2: Beta

We can see from Figure 2.2 that the blue regression line is almost horizontal, with only a slightly positive slope, which points to a weak relationship between the two. The correlation between NIO and SPY is 0.3137118.
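
The slope of the fitted line and the correlation can also be computed directly rather than read off the chart. The sketch below is a minimal illustration, assuming only the six percentage gains shown in Tables 2.2 and 2.3, so its numbers will not match the 0.3137118 reported for the full series. The vector names are placeholders introduced here.

```r
# Percentage gains copied from the first six rows of Tables 2.2 and 2.3
nio <- c(0, -2.419, 2.581, 4.839, 3.226, 6.935)
spy <- c(0, -2.386, 0.883, 1.679, 2.634, 3.114)

# Pearson correlation between the two series
r <- cor(nio, spy)

# Same kind of model that geom_smooth(method = "lm") fits in Figure 2.2:
# SPY on the y axis, NIO on the x axis
fit   <- lm(spy ~ nio)
slope <- unname(coef(fit)["nio"])

# In a simple linear regression the slope equals r * sd(y) / sd(x),
# so a nearly flat line can still come with a non-zero correlation
c(correlation = r, slope = slope, check = r * sd(spy) / sd(nio))
```

The slope depends on the scales of the two series as well as on the correlation, which is why it helps to report both.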

StockData$date<-factor(StockData$date)

# using echarts4r package to draw plot
StockData[-1,] %>%  
  e_charts(date) %>% 
  e_line(NIO) %>% 
  e_line(SPY) %>%  
  e_datazoom(type="slider") %>% 
  e_title("NIO vs. SPY") %>%
  e_tooltip("axis")

Figure 2.3: NIO vs. SPY

Stock Price Distribution

# clear everything
rm(list = ls())

# ticker symbols
ticker <- c("NIO", "SPY")

# Grab Data using tidyquant package
StockData <- tq_get(ticker, from = "2019-01-01")

# if needed, a binning function based on the Freedman-Diaconis rule [@fd]
# fd = function(x) { n = length(x); r = IQR(x); 2 * r/n^(1/3) }

Stock_Dist <- StockData %>% 
    filter(symbol=="NIO")
  
Stock_Dist %>%
  e_charts() %>% 
  e_histogram(close, 
              name = "NIO Price Distribution") %>% 
  e_density(close, 
            name = "density", 
            areaStyle = list(
              opacity = .4), 
            smooth = TRUE, y_index = 1) %>% 
  e_tooltip()

Figure 2.4: plotly and ggplot2 package

The same distribution plotted with the base R graphics hist function:

hist(Stock_Dist$close, 
     breaks=30,
     main = "NIO Price Distribution", 
     xlab = "Price"
     )

Figure 2.5: base r visuals

Figures 2.4 and 2.5 show a bimodal distribution, which suggests two groups of traders: the first bought and sold at around 7 and the other at around 13. Many variables do not appear in these two charts, such as date and time, current news, the trade environment, and many more; all of them end up in the error of any model built on price alone. The goal is to minimize that error.
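
The shape of a histogram, including whether two modes are visible, depends on the bin width. The Freedman-Diaconis rule mentioned in the code comments picks a width of 2 * IQR / n^(1/3). The sketch below applies it to simulated prices so that it runs without downloading data; the two clusters around 7 and 13 and their spreads are assumptions chosen to resemble Figure 2.5, not NIO data.

```r
# Freedman-Diaconis bin width: 2 * IQR(x) / n^(1/3)
fd <- function(x) {
  n <- length(x)
  r <- IQR(x)
  2 * r / n^(1/3)
}

# Simulated, made-up prices with two clusters (an assumption, not real NIO data)
set.seed(42)
prices <- c(rnorm(200, mean = 7, sd = 1), rnorm(200, mean = 13, sd = 1.5))

# Build equally spaced breaks of width fd(prices) that cover the whole range
bw     <- fd(prices)
breaks <- seq(floor(min(prices)), ceiling(max(prices)) + bw, by = bw)

# plot = FALSE returns the bin counts without drawing; drop it to see the chart
h <- hist(prices, breaks = breaks, plot = FALSE)
length(h$counts)   # number of bins implied by the rule
```

Base R also accepts hist(x, breaks = "FD"), which starts from the same rule but then rounds to "pretty" break points, so its bins may differ slightly from the ones built here.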

Below is another approach that turns your ggplot into a 3D plot using the rayshader and rayrender packages:

library(rayrender)
library(rayshader)
fd = function(x) {
    n = length(x)
    r = IQR(x)
    2 * r/n^(1/3)
}

# after_stat(density) replaces the deprecated ..density.. (ggplot2 >= 3.3.0)
p2 <- ggplot(Stock_Dist, aes(x = close)) + 
      geom_histogram(aes(y = after_stat(density)), 
                     color = "black", 
                     fill = "grey") + 
      geom_density(
        alpha = 0.2, 
        fill = "#FF6666") + 
      labs(x = "NIO", 
           y = "Density", 
           title = "NIO Price Distribution")


plot_gg(p2, width = 3.5, multicore = TRUE, windowsize = c(800, 800), 
        zoom = .65, phi = 50, theta = 0, sunangle = 225, soliddepth = -100)

Sys.sleep(0.2)

render_snapshot(clear = TRUE)

Figure 2.6: fancy plot (Morgan-Wall 2020)

Once we combine all of these graphs, we can build a minimal reproducible product using Shiny. Click to see.

Now that you’ve seen the power of R, what do you think?

References

Morgan-Wall, Tyler. 2020. “Rayrender.” https://github.com/tylermorganwall/rayrender.