2.4 What is Data?

An enormous amount of data is generated every day. Take me, for example: today I sent five emails, reviewed and created a trading strategy for next week, checked Reddit, StockTwits, Webull, and Twitter for sentiment analysis, shared several photos with my family, checked my Tesla app to monitor my solar panels and battery, and reviewed my Nest video monitoring. All of this was part of my early-morning routine.

Data comes in many forms: typical emails, social media, and Internet of Things devices connected to the internet, such as Tesla solar panels and Powerwall or Google’s Nest home monitoring. These data are a combination of numerical and categorical values, and in some cases they are large binary files, which fall under binary large objects, or “BLOBs”.
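As a minimal sketch of how these three kinds of data might look in R, the snippet below stores a numerical vector, a categorical factor, and a handful of raw bytes standing in for a binary file. The variable names and values are hypothetical and chosen only for illustration.

```r
# Hypothetical sample values, for illustration only

# Numerical: battery charge readings in percent
battery_pct <- c(82.5, 79.0, 91.2)

# Categorical: sentiment labels stored as a factor
sentiment <- factor(c("bullish", "bearish", "bullish"))

# Binary: a few raw bytes standing in for the contents of an image file
photo_bytes <- as.raw(c(0xff, 0xd8, 0xff, 0xe0))

# Inspect how R classifies each object
class(battery_pct)
class(sentiment)
class(photo_bytes)
```

In practice, a BLOB would be far larger than four bytes and would usually be read from a file or a database column rather than typed in by hand.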

# Assumes ggplot2 and ggforce are installed; geom_circle() comes from ggforce
library(ggplot2)
library(ggforce)

# VennDS is assumed to be defined earlier, with columns x, y, and cat
# geom_circle() requires the x0, y0, and r aesthetics
v2 <- ggplot(VennDS, aes(
  x0 = x,
  y0 = y,
  r = 1.5,
  fill = cat
)) + geom_circle(alpha = 0.25,
                 size = 1,
                 color = "transparent",
                 show.legend = FALSE) + 
# annotate() draws each label once, rather than once per row of VennDS
  annotate("text", x = -1.5, y = 1, label = "Volume", size = 5) +
  annotate("text", x = 1.5, y = 1, label = "Velocity", size = 5) +
  annotate("text", x = 0, y = -1, label = "Variety", size = 5) +
  annotate("text", x = 0, y = .75, label = "Big Data", size = 5) +
# remove axis titles, tick labels, and tick marks
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())

# display the plot
v2
Figure 2.8: The Vs of Big Data

The data science field is growing because of the massive amount of data being generated. Figure 2.8 shows the three Vs of big data: Volume, Variety, and Velocity. Data is everywhere: in 2020, every person generated an average of 1.7 MB of data every second, and humans produced about 2.5 quintillion bytes of data every day. There were 4.57 billion active internet users around the world, and that number will continue to grow (Bulao 2020).
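To get a feel for the per-person figure, the rough calculation below converts 1.7 MB per second into a daily total. It is only a back-of-the-envelope sketch: it assumes the rate holds constant for a full 24 hours and uses decimal units (1 GB = 1,000 MB). The variable names are hypothetical.

```r
# Rate quoted in the text: 1.7 MB per person per second
mb_per_second <- 1.7

# Number of seconds in one day
seconds_per_day <- 24 * 60 * 60

# Daily volume per person, assuming a constant rate
mb_per_day <- mb_per_second * seconds_per_day

# Convert to gigabytes using decimal units (1 GB = 1000 MB)
gb_per_day <- mb_per_day / 1000

mb_per_day
gb_per_day
```

Under these assumptions, the daily total comes to roughly 147 GB per person.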

References

Bulao, Jacquelyn. 2020. “How Much Data Is Created Every Day in 2020.” https://techjury.net/blog/how-much-data-is-created-every-day/.