Scripting Tutorial

Start by importing the relevant objects:

>>> from orangecontrib.timeseries import *

Let’s load a Timeseries, for example:

>>> data = Timeseries('airpassengers')

A Timeseries object is just an Orange.data.Table object with a few time-series-specific extensions.

Find more info and function docstrings in the reference.

Periodicity

You can compute periodogram values using periodogram(); for non-uniformly spaced time series, use periodogram_nonequispaced() (Lomb-Scargle).

With our air passengers example, let’s calculate the periodogram on the only data-bearing column, which also happens to be the class variable:

>>> periods, pgram_values = periodogram(data.Y, detrend='diff')
>>> periods
array([  2.38333333,   3.04255319,   3.97222222,   5.95833333,  11.91666667])
>>> pgram_values
array([ 0.14585092,  0.17023564,  0.23531016,  1.        ,  0.90136873])

Evidently, periods of about 6 and 12 are important for this data set.
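To pick out the dominant period programmatically, you can simply take the argmax of the periodogram values. A minimal NumPy sketch, using the arrays from the output above:

```python
import numpy as np

# Values copied from the periodogram output above.
periods = np.array([2.38333333, 3.04255319, 3.97222222, 5.95833333, 11.91666667])
pgram_values = np.array([0.14585092, 0.17023564, 0.23531016, 1.0, 0.90136873])

# The dominant period is the one with the largest normalized power.
dominant = periods[np.argmax(pgram_values)]
print(round(dominant, 2))  # → 5.96, i.e. roughly a half-year cycle
```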

Autocorrelation

Compute autocorrelation or partial autocorrelation coefficients using autocorrelation() or partial_autocorrelation() functions. For example:

>>> acf = autocorrelation(data.Y)
>>> acf[:4]
array([[  12.        ,    0.82952186],
       [  24.        ,    0.6386278 ],
       [  36.        ,    0.4493648 ],
       [  48.        ,    0.19895185]])
>>> pacf = partial_autocorrelation(data.Y)
>>> pacf[:4]
array([[  9.        ,   0.23248854],
       [ 13.        ,  -0.53969124],
       [ 25.        ,  -0.16274616],
       [ 40.        ,  -0.08833969]])
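If you are curious what these coefficients mean: the sample autocorrelation at lag k is just the normalized dot product of the series with a k-shifted copy of itself. A sketch in plain NumPy (one common definition; the library's estimator may normalize slightly differently):

```python
import numpy as np

def acf_at_lag(y, k):
    # Sample autocorrelation at lag k: covariance of the series with a
    # k-shifted copy of itself, normalized by the overall variance.
    y = np.asarray(y, dtype=float) - np.mean(y)
    return np.dot(y[:-k], y[k:]) / np.dot(y, y)

# A noiseless pattern repeating every 12 steps autocorrelates strongly at lag 12.
season = np.tile(np.arange(12, dtype=float), 10)
print(round(acf_at_lag(season, 12), 3))  # → 0.9 (the (n-k)/n taper of a perfect match)
```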

Interpolation

Let’s say your data is missing some values:

>>> data.Y[7:11]
array([ 148.,  136.,  119.,  104.])
>>> data.Y[7:11] = np.nan

You can fill in those values with one of the supported interpolation methods using the interpolate_timeseries() function:

>>> interpolated = interpolate_timeseries(data, method='cubic')
>>> interpolated[7:11].Y
array([ 151.22663433,  146.80661022,  137.77326894,  127.15995178])
>>> data = interpolated
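Under the hood, interpolation fills the NaN positions from the observed neighbours. A minimal sketch of the linear case with np.interp (the 'cubic' method used above fits splines instead, so its values differ):

```python
import numpy as np

y = np.array([112., 118., 132., np.nan, np.nan, 121., 135.])
idx = np.arange(len(y))
missing = np.isnan(y)

# Fill NaNs by linear interpolation between the surrounding observed points.
y[missing] = np.interp(idx[missing], idx[~missing], y[~missing])
print(y[3].round(3), y[4].round(3))  # → 128.333 124.667
```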

Seasonal decomposition

To decompose a time series into trend, seasonal, and residual components, use the seasonal_decompose() function:

>>> passengers = Timeseries(Domain(['Air passengers'], source=data.domain), data)
>>> decomposed = seasonal_decompose(passengers, model='multiplicative', period=12)
>>> decomposed.domain
[Air passengers (season. adj.), Air passengers (seasonal), Air passengers (trend), Air passengers (residual)]

To use this decomposed time series effectively, we just have to add back the time variable that was stripped in the first step above:

>>> ts = Timeseries(Timeseries.concatenate((data, decomposed)))
>>> ts.time_variable = data.time_variable

Just kidding. Use statsmodels.tsa.seasonal.seasonal_decompose() instead.
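For intuition, a multiplicative decomposition estimates the trend with a moving average over one full period and the seasonal factors as the average detrended ratio at each within-period position. A rough NumPy sketch on synthetic monthly data (statsmodels handles edge effects and residuals properly; this shows only the idea):

```python
import numpy as np

period = 12
t = np.arange(120, dtype=float)
y = (100 + t) * (1 + 0.1 * np.sin(2 * np.pi * t / period))  # trend * seasonal

# Trend: moving average over one full period (seasonality averages out).
trend = np.convolve(y, np.ones(period) / period, mode='valid')

# Seasonal factors: mean detrended ratio at each within-period position.
detrended = y[period // 2 : period // 2 + len(trend)] / trend
factors = np.array([detrended[i::period].mean() for i in range(period)])
print(factors.round(2))  # hovers around 1, peaking near 1.1
```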

Moving transform

It’s easy enough to apply moving-window transforms over raw data in Python. In Orange3-Timeseries, you can use the moving_transform() function. It accepts a time series object and a transform specification: a list of (Variable, window length, aggregation function) tuples. For example:

>>> spec = [(data.domain['Air passengers'], 10, np.nanmean), ]  # Just a 10-month SMA
>>> transformed = moving_transform(data, spec)
>>> transformed.domain
[Month, Air passengers (10; nanmean) | Air passengers]
>>> transformed
[[1949-01-01, 112.000 | 112.000],
 [1949-02-01, 115.000 | 118.000],
 [1949-03-01, 120.667 | 132.000],
 [1949-04-01, 122.750 | 129.000],
 [1949-05-01, 122.400 | 121.000],
 ...
]

There are a couple of nan-safe aggregation functions available in the orangecontrib.timeseries.agg_funcs module.
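Note the expanding window at the start of the output above: each value is the mean of the last up-to-10 observations. A sketch of those mechanics in plain NumPy (not the library's implementation):

```python
import numpy as np

def moving_mean(y, window):
    # Mean over the trailing window; shorter (expanding) at the start.
    return np.array([np.nanmean(y[max(0, i - window + 1): i + 1])
                     for i in range(len(y))])

values = np.array([112., 118., 132., 129., 121.])
smoothed = moving_mean(values, 10)
print(smoothed.round(3))  # matches the transformed column above
```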

Time series modelling and forecast

There are, as yet, two models available: ARIMA and VAR. Both share a common interface, so using one is much like using the other. Let’s look at an example. The data we model must have a defined class variable:

>>> data = Timeseries('airpassengers')
>>> data.domain
[Month | Air passengers]
>>> data.domain.class_var
ContinuousVariable(name='Air passengers', number_of_decimals=3)

We define the model with its parameters (see the reference for what arguments each model accepts):

>>> model = ARIMA((2, 1, 1))

Now we fit the data:

>>> model.fit(data)
<...ARIMA object at 0x...>

After fitting, we can get the forecast along with the desired confidence intervals:

>>> forecast, ci95_low, ci95_high = model.predict(steps=10, alpha=.05)

We can also output the prediction as a Timeseries object:

>>> forecast = model.predict(10, as_table=True)
>>> forecast.domain
[Air passengers (forecast), Air passengers (95%CI low), Air passengers (95%CI high)]
>>> np.set_printoptions(precision=1)
>>> forecast.X
array([[ 470.5,  417.8,  523.2],
       [ 492.6,  414.1,  571.1],
       [ 498.5,  411.5,  585.4],
       ...
       [ 492.7,  403. ,  582.4],
       [ 497.1,  407.3,  586.8]])

We can examine the model’s fitted values and residuals with the appropriately-named methods:

>>> model.fittedvalues(as_table=False)
array([ 114.7,  121.7,  ..., 440.4,  386.8])
>>> model.residuals(as_table=False)
array([ 3.3,  10.3, ..., -50.4,  45.2])

We can evaluate the model on in-sample, fitted values:

>>> for measure, error in sorted(model.errors().items()):
...     print('{:7s} {:>6.2f}'.format(measure.upper(), error))
MAE      19.67
MAPE      0.08
POCID    58.45
R2        0.95
RMSE     27.06
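These measures derive from the residuals in the usual way. A sketch with made-up numbers (the library's exact definitions of MAPE and POCID may differ slightly):

```python
import numpy as np

actual = np.array([112., 118., 132., 129., 121.])
fitted = np.array([114.7, 121.7, 128.0, 130.5, 119.0])  # hypothetical fitted values
resid = actual - fitted

rmse = np.sqrt(np.mean(resid ** 2))     # root mean squared error
mae = np.mean(np.abs(resid))            # mean absolute error
mape = np.mean(np.abs(resid / actual))  # mean absolute percentage error
print(round(rmse, 2), round(mae, 2), round(mape, 3))  # → 2.94 2.78 0.023
```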

Finally, you should evaluate your models more robustly using cross-validation. An example, edited for clarity:

>>> models = [ARIMA((1, 1, 0)), ARIMA((2, 1, 2)), VAR(1), VAR(3)]
>>> model_evaluation(data, models, n_folds=10, forecast_steps=3)  
[['Model',                    'RMSE', 'MAE', 'MAPE', 'POCID', 'R²', 'AIC', 'BIC'],
 ['ARIMA(1,1,0)',             47.318, 36.803, 0.093, 68.965, 0.625, 1059.3, 1067.4],
 ['ARIMA(1,1,0) (in-sample)', 32.040, 20.340, 0.089, 58.450, 0.927, 1403.4, 1412.3],
 ['ARIMA(2,1,2)',             44.659, 28.332, 0.075, 72.413, 0.666, 1032.8, 1049.2],
 ['ARIMA(2,1,2) (in-sample)', 25.057, 16.159, 0.070, 59.859, 0.955, 1344.0, 1361.8],
 ['VAR(1)',                   63.185, 45.553, 0.118, 68.965, 0.332, 28.704, 28.849],
 ['VAR(1) (in-sample)',       31.316, 19.001, 0.084, 54.929, 0.930, 29.131, 29.255],
 ['VAR(3)',                   46.210, 28.526, 0.085, 82.758, 0.643, 28.140, 28.482],
 ['VAR(3) (in-sample)',       25.642, 18.010, 0.072, 61.428, 0.953, 28.406, 28.698]]
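The cross-validated rows above come from a rolling-origin scheme: each fold fits on a growing prefix of the series and forecasts the next few steps. The split logic can be sketched as follows (assumed mechanics, for illustration only):

```python
# Rolling-origin splits: train on a growing prefix, test on the next `steps`.
n, n_folds, steps = 30, 4, 3  # toy sizes

folds = []
for fold in range(n_folds):
    train_end = n - (n_folds - fold) * steps
    folds.append((train_end, train_end + steps))
print(folds)  # → [(18, 21), (21, 24), (24, 27), (27, 30)]
```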

Granger Causality

Use granger_causality() to estimate causality between series. A synthetic example:

>>> series = np.arange(100)
>>> X = np.column_stack((series, np.roll(series, 1), np.roll(series, 3)))
>>> threecol = Timeseries(Domain.from_numpy(X), X)
>>> for lag, ante, cons in granger_causality(threecol, 10):
...     if lag > 1:
...         print('Series {cons} lags behind {ante} by {lag} lags.'.format(**locals()))
...
Series Feature 1 lags behind Feature 2 by 5 lags.
Series Feature 1 lags behind Feature 3 by 4 lags.
Series Feature 2 lags behind Feature 3 by 2 lags.
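The intuition behind the result: when one series is (close to) a shifted copy of another, information flows from the leading series to the lagging one. A crude sketch of lag detection via correlation (granger_causality() uses proper VAR-based tests, not raw correlation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.empty_like(x)
y[3:] = x[:-3]                 # y is x delayed by 3 steps
y[:3] = rng.normal(size=3)

# Correlation between x and y shifted back by k peaks at the true delay.
lags = list(range(1, 11))
corrs = [np.corrcoef(x[:-k], y[k:])[0, 1] for k in lags]
best = lags[int(np.argmax(corrs))]
print(best)  # → 3
```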

Use this knowledge wisely.