ARIMA Time Series predictions using Box-Jenkins approach with R

This document shows the way to use R to make predictions with ARIMA models. The data used in this document, can be downloaded in the following link: http://www.robhyndman.info/TSDL/data/abraham4.dat
The file contains Monthly car sales in Quebec 1960-1968. Source: Abraham & Ledolter (1983).

Box-Jenkins Methodology has the following steps to make ARIMA models:

  1. Identification:
    • To find the appropriate values of p and q
    • Correlogram and partial correlogram
  2. Estimation:
    • Parameters of the AR and MA
    • Linear (OLS) models
    • Nonlinear models
  3. Diagnostic checking:
    • Check if the residuals from this model are white noise
    • Forecasting

Note: The piece of code shown in this document appears in a red box while the simulation of a R session is shown in a blue box.

Plot Series Model Selection

I. Model identification:

If you want to know if your data is stationary, you will look up your data.

Load this piece of code to see the form of your data.

###################################
# DATA ANALYSIS SCRIPT            #
# Autor: Juan Antonio Breña Moral #
# Email: bren@juanantonio.info    #
###################################

###################################
# R ENGINE SETTINGS               #
###################################

#Sentence written to set working directory
WD <- "C://Documents and Settings/JAB/Escritorio";
setwd(WD);
# make sure working directory is set
getwd();

# set digits
options("digits"=3)

###################################
# ARIMA MODEL SCRIPT              #
###################################

INSTALL_LIBRARIES <- function(){
	install.packages("forecast");
	install.packages("nortest");
	install.packages("quadprog");
	install.packages("zoo");
	install.packages("tseries");
}

#Function used to load libraries
#FORECAST: www-personal.buseco.monash.edu.au/~hyndman/Rlibrary/forecast/
#NORMAL TEST: http://cran.r-project.org/src/contrib/Descriptions/nortest.html
#TIME SERIES: //cran.r-project.org/src/contrib/Descriptions/tseries.html
LOAD_LIBRARIES <- function(){
	library("forecast");
	library("nortest");
	library("tseries");
}
  
# Funcion que genera el mensaje de cabecera.
HEADER <-function(){
cat("\n::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::");
cat("\n:: SALES PREDICTION | AUTHOR: Juan Antonio Breña Moral          ::");
cat("\n:: VERSION: 2006-06-17 | DOWNLOADS: http://www.juanantonio.info ::");
cat("\n:: EMAIL: bren@juanantonio.info                                 ::");
cat("\n::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::");
cat("\n:: DESCRIPTION:                                                 ::"):
cat("\n:: Sales Prediction, is a Script to make prediccions. It was    ::");
cat("\n:: designed to control a sales budget in any company  using     ::");
cat("\n:: ARIMA Time Series statistical techniques. The script needs   ::");
cat("\n:: a sales data to start to build a ARIMA model to make a       ::");
cat("\n:: predictions. This script is very useful to design a sales    ::");
cat("\n:: budget in the following exercise and control it. The         ::");
cat("\n:: methodology used to build ARIMA model is Box-Jenkins.        ::");
cat("\n::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::\n");
}

TIME_SERIES <- function(DATA,F,S){
    DATOS.TS <- ts(DATA,frequency = F, start = S);
    return(DATOS.TS);
}

JAB.PREDICTION.EDA <- function(DATOS){

	FILE_NAME <- format(Sys.time(), "%X");
	FILE_EXTENSION <- "jpeg";

	#Graphical Output
    par(mfrow= c(2,2));
    plot(DATOS,type="l")
    hist(DATOS, probability =T);
    lines(density(DATOS), col="red");
    qqnorm(DATOS);
    abline(0,1,col="red");
    boxplot(DATOS, horizontal=T);
    par(mfrow= c(1,1)); 
    
}

#FROM: http://tolstoy.newcastle.edu.au/R/help/99b/0171.html
mykurtosis <- function(x) { 
    m4 <- mean((x-mean(x))^4) 
    kurt <- m4/(sd(x)^4)-3 
    kurt 
} 

myskewness <- function(x) { 
    m3 <- mean((x-mean(x))^3) 
    skew <- m3/(sd(x)^3) 
    skew 
} 

JAB.PREDICTION.EDA_TABLE <- function(DATAS){
	COLUMN_NAMES <- c("MEAN","MEDIAN","SD","KURTOSIS","SKEWNESS");
	DATA <- c();
	DATA[1] <- mean(DATAS);
	DATA[2] <- median(DATAS);
	DATA[3] <- sd(DATAS);
	DATA[4] <- mykurtosis(DATAS);
	DATA[5] <- myskewness(DATAS);
	names(DATA) <- COLUMN_NAMES;
	
	print("JAB: EDA VIEW 1.0")
	print(DATA);
	stem(DATAS);
}

JAB.PREDICTION.NORMAL_TESTS <- function(DATAS){
COLUMN_NAMES <- c("SHAPIRO","ANDERSON-DARLING",
"CRAMER-VON MISES","KS","JARQUE-BERA");
	DATA <- c();
	DATA[1] <- shapiro.test(DATAS)$p.value;
	DATA[2] <- ad.test(DATAS)$p.value;
	DATA[3] <- cvm.test(DATAS)$p.value;
	DATA[4] <- lillie.test(DATAS)$p.value;
	DATA[5] <- jarque.bera.test(DATAS)$p.value;
	names(DATA) <- COLUMN_NAMES;
	
	print("JAB: NORMAL TEST TABLE 1.0")
	print(DATA);
}

Enter the data in R system using JGR as R GUI or using RClass.

# To predict anything, you have to get the data
SALES <- scan();
  6550  8728 12026 14395 14587 13791  9498  8251  7049  9545  9364  8456
  7237  9374 11837 13784 15926 13821 11143  7975  7610 10015 12759  8816
 10677 10947 15200 17010 20900 16205 12143  8997  5568 11474 12256 10583
 10862 10965 14405 20379 20128 17816 12268  8642  7962 13932 15936 12628
 12267 12470 18944 21259 22015 18581 15175 10306 10792 14752 13754 11738
 12181 12965 19990 23125 23541 21247 15189 14767 10895 17130 17697 16611
 12674 12760 20249 22135 20677 19933 15388 15113 13401 16135 17562 14720
 12225 11608 20985 19692 24081 22114 14220 13434 13598 17187 16119 13713
 13210 14251 20139 21725 26099 21084 18024 16722 14385 21342 17180 14577

LOAD_LIBRARIES();
JAB.PREDICTION.EDA(SALES);
JAB.PREDICTION.EDA_TABLE(SALES);
JAB.PREDICTION.NORMAL_TESTS(SALES);
[1] "JAB: EDA VIEW 1.0"
         MEAN        MEDIAN            SD      KURTOSIS      SKEWNESS 
14595.1111111 14076.0000000  4525.2139131    -0.6600995     0.3106385 


   5 | 6
   6 | 6
   7 | 026
   8 | 0035678
   9 | 04455
  10 | 03678999
  11 | 015678
  12 | 012233356788
  13 | 02446788889
  14 | 2344466788
  15 | 1222499
  16 | 11267
  17 | 0122678
  18 | 069
  19 | 79
  20 | 0112479
  21 | 012337
  22 | 011
  23 | 15
  24 | 1
  25 | 
  26 | 1

[1] "JAB: NORMAL TEST TABLE 1.0"
    SHAPIRO ANDERSON-DARLING CRAMER-VON MISES               KS      JARQUE-BERA 
 0.06335382       0.03812055       0.04695266       0.11516182       0.17425073

If you notice, the sales are asymetric. Data fails different tests for normality.
If you test it with normal data sample, you would observe different results.

NORMAL_SAMPLE <- rnorm(5000);
JAB.PREDICTION.EDA(NORMAL_SAMPLE);
JAB.PREDICTION.EDA_TABLE(NORMAL_SAMPLE);
JAB.PREDICTION.NORMAL_TESTS(NORMAL_SAMPLE);
[1] "JAB: EDA VIEW 1.0"
         MEAN        MEDIAN            SD      KURTOSIS      SKEWNESS 
-0.0001047225  0.0006451141  0.9922556060  0.0825848178 -0.0086063633 

  The decimal point is at the |

  -4 | 5
  -4 | 
  -3 | 65
  -3 | 3000
  -2 | 999998877777666666666665
  -2 | 44444444444444333333333322222222222222222221111111111111111111111111+8
  -1 | 99999999999999999999999999999999999888888888888888888888888888888888+173
  -1 | 44444444444444444444444444444444444444444444444444444444444444444444+384
  -0 | 99999999999999999999999999999999999999999999999999999999999999999999+715
  -0 | 44444444444444444444444444444444444444444444444444444444444444444444+785
   0 | 00000000000000000000000000000000000000000000000000000000000000000000+819
   0 | 55555555555555555555555555555555555555555555555555555555555555555555+650
   1 | 00000000000000000000000000000000000000000000000000000000000000000000+439
   1 | 55555555555555555555555555555555555555555555555555555555555555555555+166
   2 | 00000000000000000000011111111111111111222222222222222223333333333344
   2 | 555555555566666666677889999
   3 | 00011114
   3 | 55
   4 | 1

[1] "JAB: NORMAL TEST TABLE 1.0"
    SHAPIRO ANDERSON-DARLING CRAMER-VON MISES               KS      JARQUE-BERA 
  0.4513131        0.5120592        0.6798911        0.6469248        0.4664040 

II. Model estimation and verification:

TIME_SERIES <- function(DATA,F,S){
    DATOS.TS <- ts(DATA,frequency = F, start = S);
    return(DATOS.TS);
}

JAB.PREDICTION.BEST_ARIMA <- function(TIME_SERIES){
    DATOS.BEST_ARIMA <- best.arima(DATOS.TS)
    return(DATOS.BEST_ARIMA)
}
SALES.TS <- TIME_SERIES(SALES,12,1968);
SALES.ARIMA <- JAB.PREDICTION.BEST_ARIMA(SALES.TS);

III. Forecasting:

Forecasting the future is always difficult...

"Computers in the future may weigh no more than 1.5 tons."
Popular Mechanics, forecasting the relentless march of science, 1949

"I think there is a world market for maybe five computers."
Chairman of IBM, 1943

"I have traveled the length and breadth of this country and talked with the best people, and I can assure you that data processing is a fad that won't last out the year."
The editor in charge of business books for Prentice Hall, 1957

"But what ... is it good for?"
Engineer at the Advanced Computing Systems Division of IBM, 1968, commenting on the microchip.

"There is no reason anyone would want a computer in their home."
President, Chairman and founder of Digital Equipment Corp., 1977

"This 'telephone' has too many shortcomings to be seriously considered as a means of communication. The device is inherently of no value to us."
Western Union internal memo, 1876.

"The wireless music box has no imaginable commercial value. Who would pay for a message sent to nobody in particular?"
David Sarnoff's associates in response to his urgings for investment in the radio in the 1920s.

"The concept is interesting and well-formed, but in order to earn better than a 'C,' the idea must be feasible."
A Yale University management professor in response to Fred Smith's paper proposing reliable overnight delivery service. (Smith went on to found Federal Express Corp.)

"Who the hell wants to hear actors talk?"
Warner Brothers, 1927.

"I'm just glad it'll be Clark Gable who's falling on his face and not Gary Cooper."
Gary Cooper on his decision not to take the leading role in "Gone With The Wind."

"A cookie store is a bad idea. Besides, the market research reports say America likes crispy cookies, not soft and chewy cookies like you make."
Response to Debbi Fields' idea of starting Mrs. Fields' Cookies.

"We don't like their sound, and guitar music is on the way out."
Decca Recording Co. rejecting the Beatles, 1962.

"Heavier-than-air flying machines are impossible."
President, Royal Society, 1895.

"If I had thought about it, I wouldn't have done the experiment. The literature was full of examples that said you can't do this."
Spencer Silver on the work that led to the unique adhesives for 3-M "Post-It" Notepads.

"So we went to Atari and said, 'Hey, we've got this amazing thing, even built with some of your parts, and what do you think about funding us? Or we' ll give it to you. We just want to do it. Pay our salary, we'll come work for you.' And they said, 'No.' So then we went to Hewlett-Packard, and they said, 'Hey, we don't need you. You haven't got through college yet.'"
Apple Computer Inc. founder Steve Jobs on attempts to get Atari and H-P interested in his and Steve Wozniak's personal computer.

"Professor Goddard does not know the relation between action and reaction and the need to have something better than a vacuum against which to react. He seems to lack the basic knowledge ladled out daily in high schools."
1921 New York Times editorial about Robert Goddard's revolutionary rocket work.

"You want to have consistent and uniform muscle development across all of your muscles? It can't be done. It's just a fact of life. You just have to accept inconsistent muscle development as an unalterable condition of weight training."
Response to Arthur Jones, who solved the "unsolvable" problem by inventing Nautilus.

"Drill for oil? You mean drill into the ground to try and find oil? You're crazy."
Drillers who Edwin L. Drake tried to enlist to his project to drill for oil in 1859.

"Stocks have reached what looks like a permanently high plateau."
Professor of Economics, Yale University, 1929.

"Airplanes are interesting toys but of no military value."
Professor of Strategy, Ecole Superieure de Guerre.

"Everything that can be invented has been invented."
Commissioner, U.S. Office of Patents, 1899.

"Louis Pasteur's theory of germs is ridiculous fiction".
Professor of Physiology at Toulouse, 1872

"The abdomen, the chest, and the brain will forever be shut from the intrusion of the wise and humane surgeon."
British surgeon, appointed Surgeon-Extraordinary to Queen Victoria 1873.

"640K ought to be enough for anybody."
Bill Gates, 1981