Daily electricity price forecasting (report)

From MachineLearning.

(Difference between revisions)
(Set of algorithm variants and modifications)
Line 1: Line 1:
-
'''Project introduction'''
+
==Computing experiment report==
-
== Project description ==
+
===Visual analysis===
-
=== Goal ===
+
====Model data====
-
The goal is to forecast the average daily spot price of electricity. The forecasting horizon
+
-
(the maximal time segment where the forecast error does not exceed a given value)
+
-
is supposed to be one month.
+
-
=== Motivation ===
+
First we have to test our implementation of LARS to be sure
-
For example, one needs a precise forecast of electricity consumption for each hour
+
that it works correctly. LARS should work better with a
-
during the next day to avoid transactions on the balancing market.
+
well-conditioned matrix, as do many linear algorithms, such as LASSO.
 +
For an ill-conditioned matrix, the values of the regression coefficients
 +
<tex>\beta</tex> appear to increase and decrease during each step of
 +
algorithm. We generate model data with the code below.
-
=== Data ===
+
<code>
-
Daily time series from 1/1/2003 until now. Time series are weather (average-,
+
n = 100;
-
low-, and high daily temperature, relative humidity, precipitation, wind speed,
+
m = 20;
-
heating-, and cooling degree day) and average daily price of electricity. There is
+
X = e*ones(n,m)+(1-e)*[diag(ones(m,1));zeros(n-m,m)];
-
no data on electricity consumption and energy units prices. The data on sunrise
+
beta = (rand(m,1)-0.5);
-
and sunset come from the internet.
+
Y = X*beta;
-
+
X = X + (rand(n,m) - 0.5)/10;
-
=== Quality ===
+
-
The time series is split into the whole history except the last month, and the last
+
-
month. The model is created using the first part and tested with the second.
+
-
The procedure must be repeated for each month of the last year. The target
+
-
function is MAPE (mean absolute percentage error) for the given month.
+
-
=== Requirements ===
+
</code>
-
The monthly error of obtained model must not exceed the error of the existing
+
-
model, same to customers. It’s LASSO with some modifications.
+
-
=== Feasibility ===
 
-
The average daily price of electricity contains peaks and it seems there is no
 
-
information about these peaks in the given data. There is no visible correlation
 
-
between data and responses.
 
-
=== Methods ===
+
For different values of <tex>e</tex> we get ill- and well-conditioned
-
The model to be generated is a linear combination of selected features. Each
+
matrices. We use <tex>e=0.1k, k=\overline{1,10}</tex>. If <tex>e</tex> is small, the
-
primitive feature (of index j) is a set of j+nk-th samples of time series, k is a
+
matrix will be well-conditioned with nearly orthogonal correlation
-
period. The set of the features includes primitive features and their superpositions.
+
vectors; if <tex>e</tex> is close to 1, we get a matrix of ones with a
 +
little noise. This matrix is ill-conditioned. Directions of
 +
correlation vectors are nearly the same.
-
== Problem definition ==
+
[[Изображение:EPF_10e=0.png|400px]]
 +
[[Изображение:EPF_10e=1.png|400px]]
 +
[[Изображение:EPF_10e=5.png|400px]]
 +
[[Изображение:EPF_10e=8.png|400px]]
 +
[[Изображение:EPF_10e=9.png|400px]]
 +
[[Изображение:EPF_10e=10.png|400px]]
-
We have a variables matrix <tex>X</tex> and a responses vector <tex>Y</tex> for this matrix. This is a
 
-
time series. Our goal is to recover regression <tex>\hat{Y}</tex> for variables matrix <tex>\hat{X}</tex>. This
 
-
data immediately follows the initial data in time. Our goal is to find the vector <tex>\beta</tex> of linear
 
-
coefficients between <tex>\hat{X}</tex> and <tex>\hat{Y}</tex>, <tex>Y = X\beta^T</tex> .
 
-
As the quality functional we use MAPE (mean absolute percentage error).
 
-
<tex>Q(\hat{Y}) = \frac{1}{n}\sum_{i=1}^n \frac{|y_i-\hat{y}_i|}{|y_i|}</tex>.
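A minimal sketch of this quality functional, assuming that the relative errors are averaged over the objects (as the name MAPE implies) and that no true response is zero. The function name `mape` is ours and is used only for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.15 means 15%)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the relative errors |y_i - yhat_i| / |y_i| over all objects.
    return sum(abs(y - f) / abs(y) for y, f in zip(actual, forecast)) / len(actual)
```

For a monthly forecast, `actual` would hold the daily prices of the control month and `forecast` the model output for the same days.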
+
So, our LARS implementation behaves as expected with respect to this property.
-
+
-
== Algorithm description ==
+
====Data description====
-
=== State of art ===
+
As real data we use the data set from our problem. It consists
-
The main task of this subsection is to describe ways of daily electricity price
+
of the variables matrix <tex>xRegression</tex> and the responses vector
-
forecasting and the sort of data needed for this task. We also deal with the
+
<tex>yRegression</tex>. We denote them by <tex>X</tex> and <tex>Y</tex>. The size of <tex>X</tex>
-
German electricity price market, so a brief survey of it is given.
+
is <tex>n\times m</tex>, where <tex>n</tex> is number of objects (time series from
-
Let’s start from methods of forecasting daily electricity price. There are
+
days) and <tex>m</tex> is number of variables for each object. Size of <tex>Y</tex>
-
many ways to solve this problem. It can be ARIMA models or autoregression
+
is column <tex>n\times 1</tex>.
-
[1], [4]. Artificial neural networks are also used in combination with some serious
+
-
improvements like wavelet techniques or ARIMA models [2]. SVM can be used
+
-
in a close problem of price spikes forecasting [2]. Models of noise and jump can
+
-
be constructed in some other ways [3].
+
-
Sets of data can be rather different. For neural networks it can be only
+
-
time series [1], but in most works some additional data is required. Weather is
+
-
an important factor in price forecasting [5]. It can be median day temperature,
+
-
HDD, CDD [4] or wind speed [6]. Dates of sunsets and sunrises can be useful too.
+
-
Energy consumption and system load have an important impact on daily electricity
+
-
price [4]. An interesting feature is predicting <tex>\log(P_t)</tex> instead of <tex>P_t</tex>, the electricity
+
-
price in €.
+
-
Our goal is forecasting daily electricity price for German electricity price
+
-
market EEX, so let’s represent some information about it. Germany has free
+
-
+
-
electricity market, so models for free electricity market can be applied to it.
+
-
The energy production market changes every year; the main goals are phasing out
+
-
nuclear energy and building new renewable generation capacity. Germany is
+
-
one of the largest consumers of energy in the world. In 2008, it consumed energy
+
-
from the following sources: oil (34.8%), coal including lignite (24.2%), natural
+
-
gas (22.1%), nuclear (11.6%), renewables (1.6%), and other (5.8%), whereas
+
-
renewable energy is far more present in produced energy, since Germany imports
+
-
about two thirds of its energy. This country is the world’s largest operator
+
-
of non-hydro renewables capacity, including the world’s largest
+
-
operator of wind generation [7].
+
-
References
+
First column for both <tex>X</tex> and <tex>Y</tex> is time series. For <tex>Y</tex> second
 +
column is responses for each object from <tex>X</tex>. They are normalized
 +
by year mean. Number of variables in <tex>X</tex> is 26. They are described
 +
in the table below.
-
[1] Hsiao-Tien Pao. A Neural Network Approach to m-Daily-Ahead Electricity Price Prediction
+
{| class="wikitable" style="text-align: center;"
 +
|- bgcolor="#cccccc"
 +
! width=30 % |#
 +
! width=160 % | Description
-
[2] Wei Wu, Jianzhong Zhou, Li Mo and Chengjun Zhu. Forecasting Electricity Market Price
 
-
Spikes Based on Bayesian Expert with Support Vector Machines
 
-
 
-
[3] S.Borovkova, J.Permana. Modelling electricity prices by the potential jump-functions
 
-
 
-
[4] R.Weron, A.Misiorek. Forecasting spot electricity prices: A comparison of parametric and
 
-
semiparametric time series models
 
-
 
-
[5] J.Cherry, H.Cullen, M.Vissbeck, A.Small and C.Uvo. Impacts of the North Atlantic Oscillation
 
-
on Scandinavian Hydropower Production and Energy Markets
 
-
 
-
[6] Yuji Yamada. Optimal Hedging of Prediction Errors Using Prediction Errors
 
-
 
-
[7] http://en.wikipedia.org/wiki/Energy_in_Germany
 
-
 
-
=== Basic hypotheses and estimations ===
 
-
 
-
In the case of linear regression we assume that the vector of responses <tex>Y</tex> is a linear
 
-
combination of the columns of the modified input feature matrix <tex>\tilde{X}</tex>, which we denote by
 
-
<tex>X</tex>:
 
-
 
-
<tex>Y = X\beta^T</tex>.
 
-
This modified matrix we can get from superposition, smoothing and autoregression
+
|-
-
in initial data set. LARS[8] seems suitable for this kind of problem, so we
+
| '''1''' || time series
-
use it in our work.
+
|-
-
From the state of the art and the data we get some additional hypotheses. We
+
| '''2-6''' || day of week
-
deal with a data set with two periodicities, weekly and yearly. A possible
+
|-
-
use of this is generating new features and creating a separate model for each day of the
+
| '''7-18''' || month
-
week.
+
|-
-
 
+
| '''19''' || mean temperature
-
=== Mathematical algorithm description ===
+
|-
-
==== LARS overview ====
+
| '''20''' || HDD
-
 
+
|-
-
LARS (Least Angle Regression [9]) is a recent (2002) model
+
| '''21''' || CDD
-
selection algorithm. It has several advantages:
+
|-
-
 
+
| '''22''' || high temperature
-
* It takes fewer steps than the older LASSO and Forward Selection.
+
|-
-
* This algorithm gives the same results as LASSO and Forward Selection after a simple modification.
+
| '''23''' || low temperature
-
* It creates a connection between LASSO and Forward selection.
+
|-
-
 
+
| '''24''' || relative humidity
-
 
+
|-
-
Let's describe LARS. We have initial data set <tex>X [ n \times m ] </tex>, where <tex>m</tex> is number of variables, <tex>n</tex> is number of objects. <tex>Y</tex> is responses
+
| '''25''' || precipitation
-
vector. Our goal is to recover the regression and to estimate
+
|-
-
<tex> {\beta}</tex>, where <tex> {\hat{\mu}}=X{\hat{\beta}}'</tex>.
+
| '''26''' || wind speed
-
At each step of the algorithm we add a new variable to the model and estimate the parameters for the current
+
|-
-
subset. Finally we get the set of parameter estimates <tex>\{ {\beta_1,\beta_2,\dots,\beta_m}\}</tex>. The algorithm assumes that the columns <tex> \{ x_1,...,x_m \} =X</tex> are linearly independent.
+
|}
-
 
+
-
Let us describe a single step of the algorithm. To the subset of vectors with the largest correlations we add one vector
+
-
and get indexes set <tex> { A } </tex>. Then
+
-
 
+
-
<tex>
+
-
X_{A}=(\cdots s_j x_j \cdots)_{j \in A},
+
-
</tex>
+
-
 
+
-
where <tex>s_j=\pm{1}</tex>. Introduce
+
-
 
+
-
<tex>
+
-
{ G_{A} }=X'_{A} X_{A}
+
-
</tex>
+
-
 
+
-
<tex>
+
-
A_{A}=( {1}'_{A} {G}^{-1}_{A} {1}_{A})^{-1/2},
+
-
</tex>
+
-
 
+
-
where <tex> {1}_{A}</tex> is a vector of ones of length <tex>|{A}|</tex>. Then calculate the equiangular vector for the columns of <tex>X_{A}</tex>:
+
-
 
+
-
<tex>
+
-
{u_A}=X_{A}\omega_{A} \qquad \omega_{A}=A_{A}
+
-
{G}^{-1}_{A} {1}_{A}
+
-
</tex>
+
-
 
+
-
Vector <tex>{u_A}</tex> has equal angles with all vectors from <tex>X_{A}.</tex>
+
-
 
+
-
Having introduced all the necessary notation, we can describe LARS. Like
+
-
Stagewise, our algorithm starts from the response estimate
+
-
<tex> {\hat{\mu}}_0=0</tex>. After that it makes consecutive
+
-
steps in the direction of the equiangular vector. At each step we have
+
-
<tex> {\hat \mu}_{A} </tex> with the current correlations:
+
-
 
+
-
<tex> {\hat{c}}=X'({ y} - {\hat {\mu}}_{A} ). </tex>
+
-
 
+
-
Get active vectors indexes set <tex>A</tex> from
+
-
<tex>\hat{C}=\max_j(|\hat{c}_j|) \qquad {A}=\{j : |\hat{c}_j|=\hat{C} \} </tex>
 
-
<tex> s_j=\mathrm{sign}\{ \hat{c}_j \}</tex>
 
-
<tex>j \in A. </tex>
 
-
Introduce additional vector:
 
-
<tex> a=X'{u}_{A}. </tex>
 
-
Write next approximation for estimation <tex>\hat{\mu}_{A}</tex>
+
====Experiments with simple LARS====
 +
We do a number of computational experiments with
 +
real data. For each experiment we calculate MAPE value.
-
<tex> {\hat{\mu}}_{A_+}= {\hat{\mu}}_{A}+\hat{\gamma}{{u_A}}, </tex>
+
In first case we use simple LARS algorithm without any additional
 +
improvements for matrix <tex>X</tex> and column <tex>Y</tex>.
-
where
+
<code>
 +
model = [1,1,0,0,0,0,0,0,0]; % model parameters
 +
</code>
-
<tex>
+
[[Изображение:EPF SC LARSresults.png|600px]]
-
\hat{\gamma}=\min_{j \in
+
[[Изображение:EPF SC MAPEresults.png|600px]]
-
{A}^c}{}^{+}(\frac{\hat{C}-\hat{c}_j}{A_{A}-a_j},\frac{\hat{C}+\hat{c}_j}{A_{A}+a_j}).
+
-
</tex>
+
-
This step reduces the correlations <tex>\hat{c}</tex> for all variables with
+
-
indices from the set <tex>A</tex>. In addition, <tex>\hat{\gamma}</tex> is the least
+
-
positive <tex>\gamma</tex> on condition that we add new index to <tex>A</tex> on
+
-
the next step: <tex>{A}_{+}={A} \cup \{j\}.</tex>
+
-
This algorithm takes only <tex>m</tex> steps. The experiment in article
+
Mean MAPE for all months equals 16.64%.
-
[1] confirms that LARS works better than LASSO and Forward
+
-
Selection, and takes fewer iterations. The last-step
+
-
<tex> {\beta}_m</tex> estimation gives least-squares solution.
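The first part of each LARS step (the current correlations, their maximum, the active set and the signs) can be sketched as follows. This is only an illustration in plain Python, not the implementation used in the report; the function name `active_set` and the tolerance `tol` are our assumptions.

```python
def active_set(X, y, mu, tol=1e-12):
    """Return correlations c = X'(y - mu), their maximum C, the active set A and signs s."""
    n, m = len(X), len(X[0])
    residual = [yi - mi for yi, mi in zip(y, mu)]
    # c_j is the inner product of column j with the current residual.
    c = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(m)]
    C = max(abs(v) for v in c)
    # Active set: columns whose absolute correlation attains the maximum (up to tol).
    A = [j for j in range(m) if abs(abs(c[j]) - C) <= tol]
    s = {j: (1 if c[j] >= 0 else -1) for j in A}
    return c, C, A, s
```

Starting from the zero estimate, `active_set(X, y, [0.0] * len(y))` picks the column most correlated with the responses, which is where the equiangular direction is then computed.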
+
-
==== Apply LARS ====
+
-
We use three steps in this algorithm.
+
For second experiment we use some additional variables. They are
 +
squares and square roots from basic variables
-
* Modify the initial data set <tex>X</tex> and the set to forecast <tex>\hat{X}</tex> to get <tex>\tilde{X}, \hat{\tilde{X}}</tex>.
+
<code>
 +
model = [1,1,1,0,0,0,0,0,0]; % model parameters
 +
</code>
-
* Apply the LARS algorithm.
+
[[Изображение:EPF AV LARSresults.png|600px]]
 +
[[Изображение:EPF AV MAPEresults.png|600px]]
-
- We split our initial sample into two parts: for learning <tex>\{
+
Mean MAPE for all months equals 17.20%.
-
X^1,Y^1 \}</tex> and for control <tex>\{ X^2,Y^2\}</tex>.
+
-
- For learning sample we apply procedure of removing spikes from this part of
+
In third experiment we use smoothed and indicator variables.
-
sample.
+
-
- From <tex>\{X^1,Y^1 \} </tex> we get a set of weights vectors
+
<code>
-
<tex>\{ \beta_1, \beta_2, \dots , \beta_m \}</tex> for each step of LARS
+
model = [1,0,0,1,0,0,0,0,0]; % model parameters
-
algorithm.
+
</code>
-
- By using control sample <tex>\{ X^2,Y^2\}</tex> we choose <tex>\beta_i</tex> with best MAPE rate.
+
[[Изображение:EPF SS LARSresults.png|600px]]
 +
[[Изображение:EPF SS MAPEresults.png|600px]]
-
* Calculate <tex>\hat{Y}=\hat{X}\beta^T</tex>.
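The selection step in the list above, choosing among the coefficient vectors from the LARS path the one with the best MAPE on the control sample, might look like the sketch below. The LARS path itself is taken as given; the names `predict`, `mape` and `select_beta` are hypothetical.

```python
def predict(X, beta):
    """Linear forecast: one inner product per object (row of X)."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def mape(actual, forecast):
    """Mean absolute percentage error as a fraction; assumes non-zero responses."""
    return sum(abs(y - f) / abs(y) for y, f in zip(actual, forecast)) / len(actual)

def select_beta(betas, X_control, y_control):
    """Pick the coefficient vector with the smallest MAPE on the control sample."""
    return min(betas, key=lambda b: mape(y_control, predict(X_control, b)))
```

The chosen vector is then applied to the modified matrix of the forecast period with `predict`.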
+
Mean MAPE equals 17.16%.
 +
For the next experiment we choose smoothed and squared smoothed
 +
variables.
-
==== Local algorithm Fedorovoy description ====
+
<code>
 +
model = [1,0,0,1,1,0,0,0,0]; % model parameters
 +
</code>
-
This algorithm is variation of k-means for time series. We have
+
[[Изображение:EPF CS LARSresults.png|600px]]
-
time series <tex>\tilde{f}=(f_1,\dots,f_n)</tex>. Suppose continuation of
+
[[Изображение:EPF CS MAPEresults.png|600px]]
-
time series depends on preceding events
+
-
<tex> {F}_n=(f_{n-l+1},\dots,f_n)</tex>. Select in our set subsets
+
-
<tex> {F}_r=(f_{r-l+1},\dots,f_r), r=\overline{l,n-\max(l,t)}.</tex>
+
MAPE equals 19.2%.
-
Introduce way to calculate distance <tex>\rho({F}_i,{F}_j)</tex>
+
In the last experiment of this subsection we choose combination of
-
between <tex>{F}_i</tex> and <tex>{F}_j</tex>.
+
nonsmoothed and smoothed variables.
-
Define set <tex>{A}</tex> of linear transformations for <tex>F</tex>:
+
<code>
 +
model = [1,1,1,1,0,0,0,0,0]; % model parameters
 +
</code>
-
<tex>
+
[[Изображение:EPF CSS LARSresults.png|600px]]
-
{A(F)}=\{ \check{F}^{a,b}|
+
[[Изображение:EPF CSS MAPEresults.png|600px]]
-
\check{F}^{a,b}=a{F}+b, a,b\in \mathbb{R} \}
+
-
</tex>
+
-
Note that for our problem importance of objects increases to the
+
We get MAPE 16.89% in this experiment.
-
end of vector <tex>{F}_i</tex> because of forecasting straightforward
+
-
vector <tex>{F}_{i+l}</tex>. Introduce parameter <tex>\lambda < 1</tex>. So,
+
-
our distance <tex>\rho_{u,v}</tex> is
+
-
<tex> \rho({F}_u,{F}_v)=\min_{a,b} \sum_{i=1}^l \lambda_i^2 (f_{u+i}-
+
From experiments above we draw a conclusion, that for simple
-
\check{f}_{v+i}^{a,b})^2 ,
+
linear method LARS MAPE is in bounds of 16-18%. Adding smoothed
-
</tex>
+
variables can't correct situation. From plots one can see problem
 +
of spikes prediction. MAPE for unexpected jumps is bigger, then
 +
MAPE for stationary zones. To increase accuracy of our forecast we
 +
need another methods.
-
where
+
====Removing spikes====
-
<tex>
+
To create better forecast we need to remove outliers from learning
-
\lambda_i=\lambda^{l-i+1},
+
sample. This procedure helps to get better regression
-
\check{F}_v^{a,b}=\{\check{f}_{v+1}^{a,b},\check{f}_{v+2}^{a,b},
+
coefficients. In our case remove spikes procedure depends on our
-
\dots,\check{f}_{v+l}^{a,b} \}.
+
basic algorithm model and two parameters. To get optimal value of
-
</tex>
+
this parameters we use 2 dimensional minimization MAPE procedure.
-
From that distance definition find <tex>k</tex> closest elements to
+
First experiment uses initial variables and removing spikes
-
<tex>{F}_n</tex>. We use this set of elements
+
procedure.
-
<tex>\{{F}_{(1)},\dots,{F}_{(k)} \}</tex>, sorted by ascending of
+
-
<tex>\rho_{n,(i)}</tex>, to create forecast by equation
+
-
<tex>
+
<code>
-
\hat{f}_{n+l+i+1}=\hat{f}_{n+l+i}+\sum_{j=1}^k \frac{\omega_j
+
model = [1,1,0,0,0,0,0,0,0]; % model parameters
-
({f}_{(j)+l+i+1}^{a,b} - {f}_{(j)+l+i}^{a,b})}{\sum_{s=1}^k
+
</code>
-
\omega_s},
+
-
</tex>
+
-
for each <tex>i \in \{ 0,1,2,\cdots, l-1\}</tex>, where <tex>\omega_j</tex>
+
[[Изображение:EPF P simplePicks.png|600px]]
-
proportional to distances <tex>\rho_{n,(j)}</tex>
+
-
<tex> \omega_j=\Big( 1-\frac{\rho_{n,(j)}}{\rho_{n,(k+1)}} \Big)^2. </tex>
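The weights and the weighted increment can be sketched as below. We assume that the distances to at least k+1 nearest segments are already computed and that the increment is normalised by the sum of the k neighbour weights; both function names are ours.

```python
def neighbour_weights(distances, k):
    """Weights (1 - rho_j / rho_{k+1})^2 for the k nearest segments."""
    ordered = sorted(distances)
    if len(ordered) < k + 1:
        raise ValueError("need at least k + 1 distances")
    rho_next = ordered[k]  # distance to the (k+1)-th nearest segment
    if rho_next == 0:
        raise ValueError("the (k+1)-th distance must be positive")
    return [(1.0 - rho / rho_next) ** 2 for rho in ordered[:k]]

def weighted_increment(weights, increments):
    """Weighted mean of the neighbours' one-step increments."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero")
    return sum(w * d for w, d in zip(weights, increments)) / total
```

The forecast for the next day is then the previous forecast plus `weighted_increment(...)`, where the increments are taken in the same nearest-first order as the weights.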
+
The minimum MAPE equals 15.45%. It is attained at the point
 +
<tex>\{r_1,r_2\}=\{0.5,0.7\}</tex>. This MAPE is smaller than the one obtained in the
 +
previous subsection.
-
So, we use an integrated local averaging method to create the forecast.
+
In second experiment we use another model. We add modified initial
-
This algorithm needs optimization by several parameters and it's
+
variables and smoothed variables.
-
not useful without this procedure [11]. Our model requires
+
-
<tex>\{ k,\lambda\}</tex> (<tex>l</tex> is fixed and equals one month length ). To
+
-
define them we use two step iteration process from <tex>\{ k_0,\lambda_0\}
+
-
= \{ 5,0.5 \}</tex>:
+
-
* Optimization by <tex>\lambda</tex>.
+
<code>
-
* Optimization by <tex>k</tex>.
+
model = [1,1,1,1,0,0,0,0,0]; % model parameters
 +
</code>
-
It's necessary to keep in mind, that we use specified model of
+
[[Изображение:EPF P compliPicks.png|600px]]
-
general algorithm from [11]. The distance function and
+
-
weights for each element from set of <tex>{F}_{(i)}</tex> can be chosen
+
-
from another basic assumptions.
+
-
==== Apply Local algorithm Fedorovoy ====
+
The MAPE function has a local minimum at the point <tex>\{r_1,r_2\}=\{0.8,1.0\}</tex>
 +
and equals 15.29% there. Let's look at the plots for the optimal <tex>r_1</tex> and
 +
<tex>r_2</tex>.
-
For Local algorithm Fedorovoy we use only responses vector <tex>Y</tex>. To
+
[[Изображение:EPF P LARSresults.png|600px]]
-
forecast future responses we have to initialize and optimize
+
[[Изображение:EPF P MAPEresults.png|600px]]
-
parameters of this algorithm (see section above).
+
-
* Optimize parameters <tex>k</tex>,<tex>\lambda</tex> for algorithm
+
Removing outliers doesn't help us to get better result for months
-
* Apply algorithm for last <tex>l</tex> elements of our responses set <tex>Y</tex>
+
with unexpected behavior. To conclude this subsection, we note
 +
that this procedure gives us an advantage over the
 +
models and algorithms obtained in the previous subsection.
 +
The best MAPE in this section's experiments is 15.29%.
-
=== Set of algorithm variants and modifications ===
+
====Autoregression====
 +
Adding autoregression as described in subsection
 +
'''3.4.3''' is a sound idea for many forecasting algorithms. We use
 +
autoregression in two ways.
-
LARS algorithm works <tex>m</tex> steps, where <tex>m</tex> is number of variables
+
The first way is to add to each day in the sample the responses for days in the
-
in matrix <tex>X</tex>. Expanding number of variables linear increases time
+
past and to forecast responses with the extended variables matrix. As in the
-
for the algorithm to work. Let's introduce construction of new
+
previous subsection we use 2-dimensional discrete optimization for
-
variables generation.
+
removing spikes parameters.
-
<tex>\Xi = \{ \xi^i \}_{i=1}^m</tex> - initial set of variables.
 
-
 
-
<tex>{G} = \{ g_k \}_{k=1}^t</tex> - set of primitive functions so
 
-
called primitives with argument from <tex>\Xi</tex> and subset of <tex>\Xi</tex>.
 
-
 
-
<tex>\Xi'=\{ {\xi'}^i \}_{i=1}^{m'} </tex> - new set of variables. For
+
\includegraphics[width=0.80\textwidth]{Report/auto/autoPicks.png}
-
each variable
+
-
<tex>{\xi'}^i=f_i(\xi^{i^1},\xi^{i^2},\dots,\xi^{i^s}),</tex>
+
We change the default viewing angle to get a clearer view of the
 +
surface. Minimum MAPE value is in point <tex>\{r_1,r_2\}=\{0.7,0.8\}</tex>.
 +
It equals 13.31%.
-
where <tex>f_i</tex> is superposition of functions from <tex>{G}</tex>.
+
<code>
 +
model = [1,1,1,1,0,0,0,0.7,0.8]; % model parameters
 +
</code>
-
For our algorithm initial set of variables is <tex>X</tex>. We introduce
+
\includegraphics[width=0.40\textwidth]{Report/auto/LARSresults.png}
-
set of primitive functions from several components.
+
\includegraphics[width=0.40\textwidth]{Report/auto/MAPEresults.png}
-
==== Numericals operations ====
+
And for each month:
 +
\newline
 +
\includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults1.png}
 +
\newline
 +
\includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults2.png}
 +
\newline
 +
\includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults3.png}
 +
\newline
 +
\includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults4.png}
 +
\newline
 +
\includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults5.png}
 +
\newline
 +
\includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults6.png}
 +
\newline
 +
\includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults7.png}
 +
\newline
 +
\includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults8.png}
 +
\newline
 +
\includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults9.png}
 +
\newline
 +
\includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults10.png}
 +
\newline
 +
\includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults11.png}
 +
\newline
 +
\includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults12.png}
-
First component consists of simple numerical operations. For
 
-
example, square rooting or cubing. We add arithmetical operations
 
-
like addition and multiplication. Superposition of this functions
 
-
can be used in our new variables matrix <tex>X'</tex>.
 
-
==== Smoothing function ====
+
This experiment gives the best results among our set of different
 +
models and algorithms. But to complete the article it's necessary to
 +
analyze the other algorithms and models we have used during the
 +
work.
-
Second component consists of smoothing functions. We use Parzen
+
====Set of models for each day of week====
-
window smoothing with two parameters: width of Parzen window <tex>h</tex>
+
For a data set with periodicity it can be suitable to create its
-
and periodic <tex>t</tex>. For each element from sample we calculate
+
own model for each element of the period. For example, we can
-
<tex>
+
create different regression models for different days of the week.
-
{\xi'}_i^k = \sum_{j=-h}^h \omega_{j} \xi_{i+tj}^l,
+
It is interesting to see how this heuristic works in our case.
-
</tex>
+
-
for each object, where <tex>k</tex>,<tex>l</tex> is indexes of elements from data
+
-
set. Only condition for <tex>\{\omega_j\}_{j=-h}^h</tex> is <tex> \sum_{j=-h}^h
+
-
\omega_{j} =1</tex>. In our case we use one from set of different kernel
+
-
functions. It is Epanechnikov kernel function. <tex>\omega_{\pm j}=\frac{3}{4k}(1-(\frac{j}{h})^2)</tex>, where <tex>k</tex> is normalization
+
-
coefficient, <tex>k=\sum_{j=-h}^h \frac{3}{4}(1-(\frac{j}{h})^2)</tex>. There is a periodicity with
+
-
step 7 in data set. For responses data set it seems reasonable
+
-
apply kernel smoothing several times for windows with different
+
-
periodic <tex>t \in \{ 1,7 \}</tex>.
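A minimal sketch of this smoothing, assuming the Epanechnikov weights are normalised to sum to one and that window positions falling outside the series are skipped with the remaining weights renormalised. The edge handling and the function names are our assumptions, not part of the original description.

```python
def epanechnikov_weights(h):
    """Weights for j = -h..h, proportional to 3/4 * (1 - (j/h)^2) and summing to 1."""
    if h < 1:
        raise ValueError("window half-width h must be at least 1")
    raw = [0.75 * (1.0 - (j / h) ** 2) for j in range(-h, h + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def smooth(series, h, t=1):
    """Kernel smoothing over the points i + t*j, j = -h..h (t is the period)."""
    weights = epanechnikov_weights(h)
    n = len(series)
    result = []
    for i in range(n):
        acc, used = 0.0, 0.0
        for w, j in zip(weights, range(-h, h + 1)):
            idx = i + t * j
            if 0 <= idx < n:  # skip positions outside the sample
                acc += w * series[idx]
                used += w
        result.append(acc / used)  # used > 0: the centre point is always in range
    return result
```

Applying `smooth(y, h, 1)` and then `smooth(..., h, 7)` corresponds to smoothing the responses over neighbouring days and over the same weekday in neighbouring weeks.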
+
-
==== Autoregression ====
+
In the experiments we split the sample into two parts: for control and for test. We
 +
forecast each day of the week 4 weeks ahead. So, we create for
 +
each cross-validation run a forecast for 28 days.
-
In our model we add autoregression to set of primitive functions
+
In first case we create model from standard variables. We test our
-
from variables and responses. For each variable adding we use
+
sample with different size of test sample (by what we choose best
-
parameters <tex>h\in H, H\subset \mathbb{N}</tex> -- shift of data.
+
<tex>\beta</tex>). Motivation of this step is small number of elements in
 +
control sample. It equals 4. But for different sizes of test set
 +
the MAPE does not decrease.
-
<tex>
+
<code>
-
{\xi'}_j^k=\xi_{j-h}^l,
+
model = [1,1,1,1,0,0,0,0,0,s]; % model parameters
-
</tex>
+
</code>
-
for each object, where <tex>k,l</tex> are indexes from data matrix. For
+
<tex>s</tex> is a multiplier that sets the size of the test sample. The size of the test set
-
objects in sample to classify we have to calculate it step by step
+
equals <tex>5\cdot s</tex>.
-
for responses. If there is no value for <tex>\xi_{i-h}^l</tex>, <tex>i<h</tex>, we
+
-
have to assign this value to be zero. It decreases correlation
+
-
value for this autoregression variable in LARS algorithm. But our
+
-
sample has about three thousand elements and for small values of
+
-
<tex>h</tex>, <tex>h<10</tex>, this factor decreases correlation in rate about
+
-
<tex>0.1-1\%</tex> or smaller in average.
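The shifted variables with zero filling can be sketched as follows; `lagged` and `lag_features` are hypothetical helper names.

```python
def lagged(series, h):
    """Shift the series by h steps; positions without a past value are set to zero."""
    if h < 0:
        raise ValueError("shift h must be non-negative")
    return [series[i - h] if i - h >= 0 else 0.0 for i in range(len(series))]

def lag_features(series, shifts):
    """One row per object, one column per shift h in shifts."""
    columns = [lagged(series, h) for h in shifts]
    return [list(row) for row in zip(*columns)]
```

Only the first h rows of each column contain the artificial zeros, which is why their influence on the correlation is small for a sample of about three thousand elements.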
+
-
As alternative we use another way to create autoregression matrix.
+
\begin{tabular}{|c|c|c|c|c|c|}
-
There are periodicals in our data set. To reshape our answers
+
-
matrix we create the set of new variables from responses. For
+
-
periodical length <tex>t</tex> choose <tex>H=\{1,2,\dots,t-1 \}</tex>. Then create
+
-
model for each part of periodical. From full sample of row indexes
+
-
<tex>{I}_t</tex> select <tex>I_{k,t}=\{ k, k+t, k+2t \dots \} </tex>. In set of
+
-
matrixes for each <tex>k\in\{1,2,\dots,t\}</tex> we get from LARS its own
+
-
model of linear regression. Otherwise, we modify our variables
+
-
matrix according to scheme
+
-
<tex>\begin{pmatrix}k&\xi_{k-1}^l&\xi_{k-2}^l&\cdots&\xi_{k-(t-1)}^l \\
+
-
k+t&\xi_{k+t-1}^l&\xi_{k+t-2}^l&\cdots&\xi_{k+t-(t-1)}^l \\
+
-
k+2t&\xi_{k+2t-1}^l&\xi_{k+2t-2}^l&\cdots&\xi_{k+2t-(t-1)}^l \\
+
-
\cdots&\cdots&\cdots&\cdots&\cdots \\ \end{pmatrix}</tex>
+
-
First row is indexes, what we leave in our variables matrix. To
+
\hline Control set size&4&8&12&16&20 \\
-
the right of them is variables, which we add to new variables
+
\hline MAPE rate&0.1740&0.1860&0.1790&0.1897&0.1846 \\
-
matrix.
+
\hline Control set size&24&28&32&36&40 \\
 +
\hline MAPE rate&0.1826&0.1882&0.1831& 0.1846&0.1952 \\
 +
\hline
 +
\end{tabular}
-
==== Removing spikes ====
+
From this table one can see, that results for this experiment
 +
aren't better than the results for the model without an individual
 +
model for each day of week.
-
Another primitive function in our set is procedure of removing
+
Let's add the autoregression matrix to this model in the way we
-
spikes from our sample. This function uses two parameters. It's
+
describe in '''3.4.3'''. We get <tex>\{ r_1,r_2 \}=\{ 0.3,0.3\}</tex> from
-
maximum rate of first and second type errors <tex>r_1,r_2</tex>. At first
+
two-dimensional optimization over these parameters.
-
step our algorithm classify all sample by using this initial
+
-
sample <tex>X</tex>. At the second step we remove objects from the sample if the ratio
+
-
of error to true value
+
-
<tex>\frac{|y_i-\hat{y}_i|}{|y_i|} > r_1</tex>
+
\includegraphics[width=0.80\textwidth]{Report/autoDOW/autoPicks.png}
-
or if rate of error to responses mean value
+
We got the results below. MAPE equals 14.79%.
-
<tex>\frac{|y_i-\hat{y}_i|}{\overline{|y_i|}} > r_2.</tex>
+
<code>
 +
model = [1,1,1,0,0,0,6,0.3,0.3,1]; % model parameters
 +
</code>
-
We get <tex>r_1,r_2</tex> from crossvalidation: 12 times we split sample
+
\includegraphics[width=0.40\textwidth]{Report/autoDOW/LARSresults.png}
-
to control and learning sample. For learning sample we create
+
\includegraphics[width=0.40\textwidth]{Report/autoDOW/MAPEresults.png}
-
forecast to control sample. By optimization <tex>r_1,r_2</tex> we get minimum of
+
-
mean MAPE for control set.
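The two removal criteria can be sketched as below, assuming the first-pass forecasts `y_hat` for the learning sample are already available and the true responses are non-zero. The function name `remove_spikes` is ours.

```python
def remove_spikes(X, y, y_hat, r1, r2):
    """Drop objects whose error exceeds r1 relative to |y_i| or r2 relative to mean |y|."""
    mean_abs = sum(abs(v) for v in y) / len(y)
    keep = [
        i for i, (yi, fi) in enumerate(zip(y, y_hat))
        if abs(yi - fi) / abs(yi) <= r1 and abs(yi - fi) / mean_abs <= r2
    ]
    return [X[i] for i in keep], [y[i] for i in keep]
```

The pair `r1, r2` would then be tuned on a grid, refitting the model on the filtered sample and keeping the pair with the smallest mean MAPE on the control sets.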
+
 +
To conclude this section, we note that for our data set this approach
 +
can't give us results as precise as the method from the previous section.
-
==== Normalization ====
+
==== Local algorithm Fedorovoy====
 +
For algorithm Fedorovoy we don't use any additional
 +
hypotheses except those described in [11], [12].
 +
The results for this algorithm are worse than the results for the different
 +
LARS variations. Main complications in our data for this algorithm
 +
are big noises and spikes in learning and test data set and weak
 +
dependence of future data on the past. The motivation to use this algorithm
 +
is the periodicity in our data.
-
For some variables we need normalization. This primitive function
+
To get the parameters for this algorithm we have used an iterative
-
is necessary in our sample. It can be
+
method. At the first iteration we got a local minimum at the point <tex>\{ k,\lambda
 +
\}=\{49, 0.9\}</tex>, where <tex>k</tex> is the number of nearest neighbours for our
 +
algorithm and <tex>\lambda</tex> defines the way to calculate the distance between
 +
different segments of our sample.
-
<tex>{\xi'}_j^k=\frac{\xi_{j}^l-\xi_{min}^l}{\xi_{max}^l -
+
\includegraphics[width=0.40\textwidth]{Report/Local/kNNresults.png}
-
\xi_{min}^l}.</tex>
+
\includegraphics[width=0.40\textwidth]{Report/Local/MAPEresults.png}
-
for each <tex>\xi_j^l</tex> from <tex>\xi^l</tex>. This normalization makes nearly
+
MAPE equals 20.43%.
-
equal initial conditions for each variable in LARS, which is what we
+
-
need. This primitive is useful for responses and variables.
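A short sketch of this min-max normalization for one variable. Returning zeros for a constant column is our assumption, since the formula is undefined in that case.

```python
def min_max_normalize(column):
    """Scale one variable linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: assumed convention
    return [(v - lo) / (hi - lo) for v in column]
```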
+
-
==== Additional notes ====
+
===Criterion analysis===
-
It's necessary to make some additional notes. Some functions can
+
We use the MAPE quality functional. In our case the target function to
-
be useful only for subset of variables and we have to take into
+
minimize is the deviation of the forecast responses from the actual ones,
-
account this fact. Another thing to pay attention is a possible
+
so this functional looks suitable. For many variations and 2
-
usage primitives superposition.
+
algorithms we got a set of different results. Let's briefly summarize
 +
them in the table.
-
All this statements need additional researching and computing
+
{| class="wikitable" style="text-align: center;"
-
experiments in other parts of this article to create a good model
+
! Algorithm !! LARS !! RS LARS !! RS and autoLARS !! DOW LARS !! LAF
-
for our problem.
+
|-
 +
| MAPE || 16.64% || 15.29% || '''13.31%''' || 14.79% || 20.43%
 +
|}
-
== System description ==
+
''LARS'' is the simple LARS implementation with the best set of variables.
-
* Link to the file system.docs
+
''RS LARS'' is LARS with the spike-removal procedure.
-
* Link to the system files
+
''RS and autoLARS'' is LARS with spike removal and added autoregression
 +
variables. ''DOW LARS'' is LARS with a separate model created
 +
for each element of the period (in our case, each day of the week). For this
 +
algorithm we also add autoregression variables and remove spikes
 +
procedure. ''LAF'' is the implementation of the local algorithm Fedorovoy. For
 +
our set of algorithm variants we get the best result with
 +
''RS and autoLARS''.
-
== Computing experiment report ==
+
The main complication for most of the algorithms was spikes and outliers in the
 +
data set. Another complication for the LARS-based algorithms was the weak
 +
correlation between initial variables and responses. To this
 +
problem we also add the high noise level. The reason for it may be
 +
events that are not observable in our data. Some of our algorithms take into
 +
account these complications. They gave us better results than the
 +
simple algorithms. But we gain only 4-5% in MAPE by applying
 +
our best algorithm in comparison with simple LARS.
-
=== Визуальный анализ работы алгоритма ===
+
===Dependency from parameters analysis===
-
=== Анализ качества работы алгоритма ===
+
In different sections of our work we use a number of optimization
 +
procedures. Frow plots and 3d plots one can see, that all
 +
optimizations are stabile and give us close to optimal parameters.
 +
In most algorithms we use simple discrete one- or two-dimensional
 +
optimization. For local algorithm Fedorovoy we use iterations
 +
method, but he gives us local optimum result at first step. May
 +
be, it's suitable to apply randomization search in this case.
-
=== Анализ зависимости работы алгоритма от параметров ===
+
==Results report==
-
== Отчет о полученных результатах ==
+
For our best algorithm we decrease MAPE rate by 5\% from initial
 +
algorithm (We get 13.31\% MAPE rate vs 17\% in start). To solve
 +
this problem we applied big set of heuristics and modifications to
 +
LARS. We also created realization of local algorithm
 +
k-nearest-neighbours to compare results.
== Список литературы ==
== Список литературы ==

Revision as of 08:17, 14 February 2010

Computing experiment report

Visual analysis

Model data

First we test our implementation of LARS to make sure that it works correctly. Like many linear algorithms, such as LASSO, LARS should work better with a well-conditioned matrix. For an ill-conditioned matrix the values of the regression coefficients \beta tend to increase and decrease at each step of the algorithm. We generate model data with the code below.

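e = 0.1;  % mixing parameter; the experiment uses e = 0.1k, k = 1..10
n = 100;  % number of objects
m = 20;   % number of variables
X = e*ones(n,m)+(1-e)*[diag(ones(m,1));zeros(n-m,m)];  % blend of an all-ones matrix and an identity block
beta = (rand(m,1)-0.5);        % true coefficients, uniform on [-0.5, 0.5]
Y = X*beta;                    % noise-free responses
X = X + (rand(n,m) - 0.5)/10;  % add uniform noise to the variables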


For different values of e we get ill- and well-conditioned matrices. We use e=0.1k, k=\overline{1,10}. If e is small, the matrix is well-conditioned, with correlation vectors close to orthogonal. If e is close to 1, we get a matrix of ones with a little noise. Such a matrix is ill-conditioned: the directions of its correlation vectors are nearly the same.
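The effect of e on conditioning can be checked numerically. The following is a minimal sketch, assuming the same construction of X as in the model data code; the names eValues and condX are introduced here for illustration only.

```matlab
% Sketch: condition number of the model matrix for e = 0.1k, k = 1..10.
n = 100;
m = 20;
eValues = (1:10) / 10;             % e = 0.1, 0.2, ..., 1.0
condX = zeros(size(eValues));      % hypothetical name for the results
for k = 1:numel(eValues)
    e = eValues(k);
    X = e*ones(n,m) + (1-e)*[diag(ones(m,1)); zeros(n-m,m)];
    X = X + (rand(n,m) - 0.5)/10;  % same uniform noise as in the model data
    condX(k) = cond(X);            % ratio of largest to smallest singular value
end
disp([eValues', condX']);          % columns: e, condition number
```

One would expect the condition number for e = 1 to be much larger than for e = 0.1, since for e = 1 all columns differ only by noise.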


So our LARS implementation behaves as expected with respect to this property.

Data description

As real data we use the set from our problem. It consists of the variables matrix xRegression and the responses vector yRegression, which we denote by X and Y. The size of X is n\times m, where n is the number of objects (days of the time series) and m is the number of variables for each object. Y is a column of size n\times 1.

The first column of both X and Y is the time series. The second column of Y holds the responses for each object from X. They are normalized by the yearly mean. The number of variables in X is 26. They are described in the table below.

# Description


1 time series
2-6 day of week
7-18 month
19 mean temperature
20 HDD
21 CDD
22 high temperature
23 low temperature
24 relative humidity
25 precipitation
26 wind speed



Experiments with simple LARS

We run a number of computational experiments on real data. For each experiment we calculate the MAPE value.
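For reference, MAPE (mean absolute percentage error) can be computed as in the sketch below. The vectors y and yhat are toy values chosen for illustration, not taken from the data set.

```matlab
% Sketch: MAPE as a fraction; multiply by 100 to get percent.
mape = @(y, yhat) mean(abs((y - yhat) ./ y));

y    = [100; 200; 400];   % toy responses (illustrative only)
yhat = [110; 180; 400];   % toy forecasts (illustrative only)
err  = mape(y, yhat);     % (0.10 + 0.10 + 0) / 3
fprintf('MAPE = %.2f%%\n', 100 * err);   % prints "MAPE = 6.67%"
```

Note that this definition requires all responses y to be nonzero.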

In the first case we use the simple LARS algorithm without any additional improvements for matrix X and column Y.

model = [1,1,0,0,0,0,0,0,0]; % model parameters

Mean MAPE for all months equals 16.64%.

In the second experiment we use some additional variables: squares and square roots of the basic variables.

model = [1,1,1,0,0,0,0,0,0]; % model parameters

Mean MAPE for all months equals 17.20%.
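Extending the variables matrix with squares and square roots can be sketched as follows. The matrix X here is a toy example, and taking the root of the absolute value is an assumption: the text does not say how negative values (for example, temperatures) are handled.

```matlab
% Sketch: append squares and square roots of the basic variables.
X = [1 4; 9 16; 25 36];           % toy matrix of basic variables (illustrative)
Xext = [X, X.^2, sqrt(abs(X))];   % abs() is an assumption for negative inputs
disp(size(Xext));                 % three times as many columns as X
```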

In the third experiment we use smoothed and indicator variables.

model = [1,0,0,1,0,0,0,0,0]; % model parameters

Mean MAPE equals 17.16%.

For the next experiment we choose smoothed and squared smoothed variables.

model = [1,0,0,1,1,0,0,0,0]; % model parameters

MAPE equals 19.2%.

In the last experiment of this subsection we choose a combination of non-smoothed and smoothed variables.

model = [1,1,1,1,0,0,0,0,0]; % model parameters

We get MAPE 16.89% in this experiment.

From the experiments above we conclude that for the simple linear method LARS, MAPE lies between 16.64% and 19.2%. Adding smoothed variables does not improve the situation. The plots show the problem of spike prediction: MAPE for unexpected jumps is larger than MAPE for stationary zones. To increase the accuracy of our forecast we need other methods.

Removing spikes

To create a better forecast we need to remove outliers from the learning sample. This procedure helps to obtain better regression coefficients. In our case the spike removal procedure depends on our basic algorithm model and two parameters. To find the optimal values of these parameters we use a two-dimensional MAPE minimization procedure.
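A two-dimensional discrete minimization of this kind can be sketched as a plain grid search. Everything below is illustrative: mapeFun is a hypothetical stand-in surface (in the real experiment it would run spike removal with the given parameters, train LARS, and return the mean MAPE), and the grid 0.1, ..., 1.0 is an assumption.

```matlab
% Sketch: exhaustive search over a grid of (r1, r2).
% mapeFun is a toy surface, NOT the real MAPE of the experiment.
mapeFun = @(r1, r2) 0.15 + (r1 - 0.4).^2 + (r2 - 0.6).^2;

rGrid = (1:10) / 10;          % assumed candidate values 0.1, ..., 1.0
best = Inf;
for i = 1:numel(rGrid)
    for j = 1:numel(rGrid)
        val = mapeFun(rGrid(i), rGrid(j));
        if val < best         % keep the smallest value seen so far
            best = val;
            bestR1 = rGrid(i);
            bestR2 = rGrid(j);
        end
    end
end
fprintf('r1 = %.1f, r2 = %.1f, value = %.4f\n', bestR1, bestR2, best);
```

With only two parameters and a coarse grid, exhaustive search is cheap and also produces the full surface for plotting.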

The first experiment uses the initial variables and the spike removal procedure.

model = [1,1,0,0,0,0,0,0,0]; % model parameters

Image: EPF P simplePicks.png

The minimum MAPE equals 15.45\% and is attained at the point \{r_1,r_2\}=\{0.5,0.7\}. This is smaller than the MAPE obtained in the previous subsection.

In the second experiment we use another model: we add modified initial variables and smoothed variables.

model = [1,1,1,1,0,0,0,0,0]; % model parameters

Image: EPF P compliPicks.png

The MAPE function has a local minimum at the point \{r_1,r_2\}=\{0.8,1.0\}, where it equals 15.29\%. Let us look at the plots for the optimal r_1 and r_2.

Removing outliers does not help us get better results for months with unexpected behavior. To conclude this subsection, we note that this procedure gives an advantage over the models and algorithms obtained in the previous subsection. The best MAPE for the experiments in this section is 15.29%.

Autoregression

Adding autoregression according to subsection {\bf 3.4.3} is a sound idea for many forecasting algorithms. We use autoregression in two ways.

The first way is to add to each day of the sample the responses for past days and to forecast the responses using the extended variables matrix. As in the previous subsection, we use two-dimensional discrete optimization for the spike removal parameters.
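Adding past responses as extra columns can be sketched as below. The number of lags p and the toy data are assumptions made for illustration; the text does not state how many past days are used.

```matlab
% Sketch: extend the variables matrix with lagged responses.
y = (1:10)';                     % toy response series (illustrative)
X = [(1:10)', (10:-1:1)'];       % toy variables (illustrative)
p = 2;                           % assumed number of past days to add
n = numel(y);
L = zeros(n - p, p);             % column k holds the response k days earlier
for k = 1:p
    L(:, k) = y(p + 1 - k : n - k);
end
Xar = [X(p + 1:end, :), L];      % first p days are dropped: they lack full history
yar = y(p + 1:end);              % responses aligned with the rows of Xar
```

Each row of Xar then describes day t by its own variables together with the responses of days t-1, ..., t-p.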


\includegraphics[width=0.80\textwidth]{Report/auto/autoPicks.png}

We change the standard viewpoint to make the surface easier to view. The minimum MAPE value is attained at the point \{r_1,r_2\}=\{0.7,0.8\} and equals 13.31\%.

model = [1,1,1,1,0,0,0,0.7,0.8]; % model parameters

\includegraphics[width=0.40\textwidth]{Report/auto/LARSresults.png} \includegraphics[width=0.40\textwidth]{Report/auto/MAPEresults.png}

And for each month:
\newline \includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults1.png}
\newline \includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults2.png}
\newline \includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults3.png}
\newline \includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults4.png}
\newline \includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults5.png}
\newline \includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults6.png}
\newline \includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults7.png}
\newline \includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults8.png}
\newline \includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults9.png}
\newline \includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults10.png}
\newline \includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults11.png}
\newline \includegraphics[width=0.90\textwidth]{Report/EachMonth/LARSresults12.png}


This experiment gives the best results among our set of models and algorithms. But to complete the article it is necessary to analyze the other algorithms and models we used during the work.

Set of models for each day of week

In a data set with periodicity it can be suitable to create a separate model for each element of the period. For example, we can create different regression models for different days of the week. How this heuristic works in our case is of interest.
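The idea of one model per day of the week can be sketched as follows. This is only an illustration under stated assumptions: ordinary least squares (the backslash operator) is used in place of LARS to keep the example short, and the data, the vector dow and the coefficients are all synthetic.

```matlab
% Sketch: fit a separate linear model for each day of the week.
% Ordinary least squares stands in for LARS here (an assumption).
n = 70;
dow = mod((0:n-1)', 7) + 1;          % synthetic day-of-week index, 1..7
X = [ones(n,1), rand(n,2)];          % intercept plus two toy variables
y = X*[1; 2; 3] + dow;               % toy responses with a day-specific shift
beta = zeros(size(X,2), 7);          % column d holds the model for day d
for d = 1:7
    idx = (dow == d);                % rows that fall on day d
    beta(:, d) = X(idx, :) \ y(idx); % least-squares fit on those rows only
end
disp(beta(1, :));                    % intercepts absorb the day-specific shift
```

Each model sees only one seventh of the sample, so the smaller training set is the price paid for capturing day-specific behavior.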

In the experiments we split the sample into two parts: control and test. For each day of the week we forecast 4 occurrences ahead, so each cross-validation run produces a forecast for 28 days.

In the first case we create the model from the standard variables. We test our sample with different sizes of the test sample (on which we choose the best \beta). The motivation for this step is the small number of elements in the control sample: it equals 4. But varying the size of the test set does not decrease MAPE.

model = [1,1,1,1,0,0,0,0,0,s]; % model parameters

Here s is the multiplier that sets the size of the test sample. The size of the test set equals 5\cdot s.

\begin{tabular}{|c|c|c|c|c|c|}
\hline Control set size&4&8&12&16&20 \\
\hline MAPE rate&0.1740&0.1860&0.1790&0.1897&0.1846 \\
\hline Control set size&24&28&32&36&40 \\
\hline MAPE rate&0.1826&0.1882&0.1831&0.1846&0.1952 \\
\hline
\end{tabular}

This table shows that the results of this experiment are no better than the results for the model without an individual model for each day of the week.

Let us add the autoregression matrix to this model in the way described in {\bf 3.4.3}. Two-dimensional optimization over these parameters gives \{ r_1,r_2 \}=\{ 0.3,0.3\}.

\includegraphics[width=0.80\textwidth]{Report/autoDOW/autoPicks.png}

The results are shown below. MAPE equals 14.79\%.

model = [1,1,1,0,0,0,6,0.3,0.3,1]; % model parameters

\includegraphics[width=0.40\textwidth]{Report/autoDOW/LARSresults.png} \includegraphics[width=0.40\textwidth]{Report/autoDOW/MAPEresults.png}

To conclude this section, for our data set this approach cannot give results as precise as the method from the previous section.

Local algorithm Fedorovoy

For the algorithm Fedorovoy we use no additional hypotheses beyond those described in \cite{[11]},\cite{[12]}. The results of this algorithm are worse than the results of the different LARS variations. The main complications in our data for this algorithm are heavy noise and spikes in the learning and test data sets and the weak dependence of future data on the past. The motivation for using this algorithm is the periodicity in our data.

To obtain the parameters of this algorithm we used an iteration method. At the first iteration we got a local minimum at the point \{ k,\omega \}=\{49, 0.9\}, where k is the number of nearest neighbours for our algorithm and \omega defines the way the distance between different segments of our sample is calculated.
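A nearest-neighbour forecast over segments of a series can be sketched as below. This is not the algorithm from the cited works: the weighted Euclidean distance with weights \omega^{age}, the segment length len, and the toy series are all assumptions made for illustration.

```matlab
% Sketch: forecast the next value from the k most similar past segments.
% The distance form and all parameter values here are assumptions.
y = sin(2*pi*(1:200)'/7);          % toy series with weekly periodicity
len = 7;                           % assumed segment length
k = 3;                             % number of nearest neighbours
omega = 0.9;                       % assumed decay of weights with age
w = omega .^ (len-1:-1:0)';        % most recent point gets weight 1
query = y(end-len+1:end);          % the latest segment
nCand = numel(y) - len;            % candidates that have a known next value
d = zeros(nCand, 1);
for s = 1:nCand
    seg = y(s:s+len-1);
    d(s) = sqrt(sum(w .* (seg - query).^2));   % weighted Euclidean distance
end
[~, order] = sort(d);              % ascending: closest segments first
nn = order(1:k);                   % start indices of the k nearest segments
forecast = mean(y(nn + len));      % average of their continuations
```

On a purely periodic toy series the nearest segments are exact repeats, so the forecast coincides with the true continuation; on noisy data with spikes such matches are much weaker.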

\includegraphics[width=0.40\textwidth]{Report/Local/kNNresults.png} \includegraphics[width=0.40\textwidth]{Report/Local/MAPEresults.png}

MAPE equals 20.43\%.

Criterion analysis

We use the MAPE quality functional. In our case the target function to minimize is the deviation of the responses produced by the algorithm from the initial responses, so this functional looks suitable. For many variations and 2 algorithms we obtained a set of different results. Let us briefly summarize them in the table.

\begin{tabular}{|c|c|c|c|c|c|}

\hline Algorithm&LARS&RS LARS& RS and autoLARS& DOW LARS& LAF  \\
\hline MAPE&16.64\%&15.29\%&{\bf 13.31\%}&14.79\%&20.43\% \\
\hline

\end{tabular}

{\it LARS} is the simple LARS implementation with the best set of variables. {\it RS LARS} is LARS with the spike removal procedure. {\it RS and autoLARS} is LARS with spike removal and added autoregression variables. {\it DOW LARS} is LARS with a separate model created for each element of the period (in our case, the weekly period). For this algorithm we also add autoregression variables and the spike removal procedure. {\it LAF} is the implementation of the local algorithm Fedorovoy. Among our variants of this set of algorithms, the best result is obtained with {\it RS and autoLARS}.

The main complication for most of the algorithms was spikes and outliers in the data set. Another complication for the LARS-based algorithms was the weak correlation between the initial variables and the responses. On top of this problem comes a high level of noise, which may be caused by events that are not observable in our data. Some of our algorithms take these complications into account, and they gave better results than the simple algorithms. But we gain only about 3.3 percentage points of MAPE (13.31\% versus 16.64\%) by applying our best algorithm in comparison with simple LARS.
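The gain of each variant over simple LARS follows directly from the MAPE values in the table above. A short sketch of the arithmetic (variable names are introduced here for illustration):

```matlab
% Sketch: MAPE gains relative to simple LARS, using the table values (percent).
names    = {'LARS', 'RS LARS', 'RS and autoLARS', 'DOW LARS', 'LAF'};
mapeVals = [16.64, 15.29, 13.31, 14.79, 20.43];
gain = mapeVals(1) - mapeVals;        % percentage points gained over simple LARS
[bestVal, iBest] = min(mapeVals);     % best variant
fprintf('%s: MAPE %.2f%%, gain %.2f points\n', names{iBest}, bestVal, gain(iBest));
```

A negative gain, as for LAF, means the variant is worse than simple LARS.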

Analysis of dependence on parameters

In different sections of our work we use a number of optimization procedures. From the plots and 3D plots one can see that all optimizations are stable and give close to optimal parameters. In most algorithms we use simple discrete one- or two-dimensional optimization. For the local algorithm Fedorovoy we use an iteration method, but it gives a local optimum at the first step. Perhaps a randomized search would be suitable in this case.

Results report

With our best algorithm we reduce MAPE by 3-4 percentage points relative to the initial algorithm (13.31\% versus about 17\% at the start). To solve this problem we applied a large set of heuristics and modifications to LARS. We also created an implementation of the local k-nearest-neighbours algorithm to compare results.

References

This article is an unverified student assignment.
Student: User:Зайцев Алексей
Instructor: User:В.В. Стрижов
Deadline: 15 December 2009
