Scripting Tutorial
Start by importing the relevant objects:
>>> from orangecontrib.timeseries import *
Let’s load a Timeseries, for example:
>>> data = Timeseries.from_file('airpassengers')
>>> np.set_printoptions(precision=1)
A Timeseries object is just an Orange.data.Table object with some extensions.
Find more info and function docstrings in the reference.
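In particular, the designated time column is available directly on the object, and the usual Table operations keep working (a small illustration):
>>> time_var = data.time_variable   # the time variable, 'Month' in this data set
>>> n_rows = len(data)              # plain Orange.data.Table operations still apply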
Periodicity
You can compute periodogram values using periodogram(), or periodogram_nonequispaced() (Lomb-Scargle) for non-uniformly spaced time series.
With our air passengers example, calculate the periodogram on the only data-bearing column, which also happens to be a class variable:
>>> periods, pgram_values = periodogram(data.Y, detrend='diff')
>>> periods
array([ 2.4, 3. , 4. , 6. , 11.9])
>>> pgram_values
array([0.1, 0.2, 0.2, 1. , 0.9])
Evidently, 6 and 12 (printed as 11.9 above) are the important periods for this data set.
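To pick these out programmatically rather than by eye, sort by periodogram power (using only the arrays computed above):
>>> strongest = periods[np.argsort(pgram_values)[::-1][:2]]   # the two strongest periods: 6 and ~12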
Autocorrelation
Compute autocorrelation or partial autocorrelation coefficients using the autocorrelation() or partial_autocorrelation() functions. For example:
>>> acf = autocorrelation(data.Y)
>>> acf[:4]
array([[12. , 0.8],
       [24. , 0.6],
       [36. , 0.4],
       [48. , 0.2]])
>>> pacf = partial_autocorrelation(data.Y)
>>> pacf[:4]
array([[ 9. , 0.2],
       [13. , -0.5],
       [25. , -0.2],
       [40. , -0.1]])
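In both results the first column is the lag and the second the coefficient, so, for instance, the dominant autocorrelation lag can be read off directly:
>>> strongest_lag = acf[np.argmax(acf[:, 1]), 0]   # lag with the largest autocorrelation (12 above)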
Interpolation
Let’s say your data is missing some values:
>>> data.Y[7:11]
array([148., 136., 119., 104.])
>>> data.Y[7:11] = np.nan
You can interpolate those values with one of the supported interpolation methods, using the interpolate_timeseries() function:
>>> interpolated = interpolate_timeseries(data, method='cubic')
>>> interpolated[7:11].Y
array([151.2, 146.8, 137.8, 127.2])
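Other methods plug in the same way; for instance a linear fill ('linear' is an assumed method name here; check the reference for the supported set). We keep the cubic result below:
>>> linear = interpolate_timeseries(data, method='linear')   # assumed alternative method name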
>>> data = interpolated
Seasonal decomposition
To decompose the time series into trend, seasonal, and residual components, use the seasonal_decompose() function:
>>> from Orange.data import Domain
>>> passengers = Timeseries.from_table(Domain(['Air passengers'], source=data.domain), data)
>>> decomposed = seasonal_decompose(passengers, model='multiplicative', period=12)
>>> decomposed.domain
[Air passengers (season. adj.), Air passengers (seasonal), Air passengers (trend), Air passengers (residual)]
To use this decomposed time series effectively, we just have to add back the time variable that was stripped in the first step above:
>>> ts = Timeseries.concatenate((data, decomposed))
>>> ts.time_variable = data.time_variable
Just kidding. Use statsmodels.tsa.seasonal.seasonal_decompose() instead.
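A minimal sketch of that route, working on the raw values (the variable names here are ours):
>>> from statsmodels.tsa.seasonal import seasonal_decompose as sm_seasonal_decompose
>>> result = sm_seasonal_decompose(data.Y, model='multiplicative', period=12)
>>> trend, seasonal, resid = result.trend, result.seasonal, result.resid   # arrays aligned with data.Y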
Moving transform
It’s easy enough to apply moving-window transforms over any raw data in Python. In Orange3-Timeseries, you can use the moving_transform() function. It accepts a time series object and a transform specification (a list of (Variable, window length, aggregation function) tuples). For example:
>>> spec = [(data.domain['Air passengers'], 10, np.nanmean), ]  # just a 10-month simple moving average (SMA)
>>> transformed = moving_transform(data, spec)
>>> transformed.domain
[Month, Air passengers (10; nanmean) | Air passengers]
>>> transformed
[[1949-01-01, 112.000 | 112],
 [1949-02-01, 115.000 | 118],
 [1949-03-01, 120.667 | 132],
 [1949-04-01, 122.750 | 129],
 [1949-05-01, 122.400 | 121],
 ...
]
There are a couple of nan-safe aggregation functions available in the orangecontrib.timeseries.agg_funcs module.
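Several windows and aggregations can be combined in one spec; here is a sketch using numpy's nan-safe functions (the callables in agg_funcs serve the same purpose; see the reference for their exact names):
>>> spec = [(data.domain['Air passengers'], 12, np.nanmean),
...         (data.domain['Air passengers'], 12, np.nanstd)]
>>> rolling = moving_transform(data, spec)   # adds a 12-month mean and a 12-month std column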
Time series modelling and forecast
There are, as yet, two models available: ARIMA and VAR. Both share a common interface, so using one is much like using the other. Let’s look at an example. The data we model must have a defined class variable:
>>> data = Timeseries.from_file('airpassengers')
>>> data.domain
[Month | Air passengers]
>>> data.domain.class_var
ContinuousVariable(name='Air passengers', number_of_decimals=0)
We define the model with its parameters (see the reference for what arguments each model accepts):
>>> model = ARIMA((2, 1, 1))
Now we fit the data:
>>> model.fit(data)
<...ARIMA object at 0x...>
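Since VAR exposes the same interface, fitting it looks just the same (a quick sketch; the order of 2 is arbitrary):
>>> var_model = VAR(2)        # vector autoregression of order 2
>>> _ = var_model.fit(data)   # same fit()/predict() interface as ARIMA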
After fitting, we can get the forecast along with desired confidence intervals:
>>> forecast, ci95_low, ci95_high = model.predict(steps=10, alpha=.05)
We can also output the prediction as a Timeseries object:
>>> forecast = model.predict(10, as_table=True)
>>> forecast.domain
[Air passengers (forecast), Air passengers (95%CI low), Air passengers (95%CI high)]
>>> forecast.X
array([[470.5, 417.8, 523.1],
       [492.6, 414.1, 571.1],
       [498.5, 411.5, 585.4],
       ...
       [492.7, 403. , 582.4],
       [497.1, 407.3, 586.8]])
We can examine the model’s fitted values and residuals with the appropriately named methods:
>>> model.fittedvalues(as_table=False)
array([114.7, 121.7, ..., 440.4, 386.8])
>>> model.residuals(as_table=False)
array([ 3.3, 10.3, ..., -50.4, 45.2])
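These residuals are what the in-sample error measures in the next step are computed from; for example, the RMSE can be reproduced by hand (minor numerical differences are possible):
>>> rmse = np.sqrt(np.nanmean(model.residuals(as_table=False) ** 2))   # ≈ the RMSE reported by model.errors()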
We can evaluate the model on in-sample, fitted values:
>>> for measure, error in sorted(model.errors().items()):
...     print('{:7s} {:>6.2f}'.format(measure.upper(), error))
MAE 19.66
MAPE 0.08
POCID 58.45
R2 0.95
RMSE 27.06
Finally, models should be evaluated more robustly, using cross-validation. An example, edited for clarity:
>>> models = [ARIMA((1, 1, 0)), ARIMA((2, 1, 2)), VAR(1), VAR(3)]
>>> model_evaluation(data, models, n_folds=10, forecast_steps=3)
[['Model', 'RMSE', 'MAE', 'MAPE', 'POCID', 'R²', 'AIC', 'BIC'],
 ['ARIMA(1,1,0)', 47.318, 36.803, 0.093, 68.965, 0.625, 1059.3, 1067.4],
 ['ARIMA(1,1,0) (in-sample)', 32.040, 20.340, 0.089, 58.450, 0.927, 1403.4, 1412.3],
 ['ARIMA(2,1,2)', 44.659, 28.332, 0.075, 72.413, 0.666, 1032.8, 1049.2],
 ['ARIMA(2,1,2) (in-sample)', 25.057, 16.159, 0.070, 59.859, 0.955, 1344.0, 1361.8],
 ['VAR(1)', 63.185, 45.553, 0.118, 68.965, 0.332, 28.704, 28.849],
 ['VAR(1) (in-sample)', 31.316, 19.001, 0.084, 54.929, 0.930, 29.131, 29.255],
 ['VAR(3)', 46.210, 28.526, 0.085, 82.758, 0.643, 28.140, 28.482],
 ['VAR(3) (in-sample)', 25.642, 18.010, 0.072, 61.428, 0.953, 28.406, 28.698]]
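Since the result is a plain list of rows with the header first, picking the winning model programmatically is straightforward; for instance, by cross-validated RMSE:
>>> results = model_evaluation(data, models, n_folds=10, forecast_steps=3)
>>> header, rows = results[0], results[1:]
>>> cv_rows = [row for row in rows if not row[0].endswith('(in-sample)')]   # drop the in-sample rows
>>> best = min(cv_rows, key=lambda row: row[header.index('RMSE')])          # ARIMA(2,1,2) in the run above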
Granger Causality
Use granger_causality() to estimate causality between series. A synthetic example:
>>> series = np.arange(100)
>>> X = np.column_stack((series, np.roll(series, 1), np.roll(series, 3)))
>>> threecol = Timeseries.from_numpy(Domain.from_numpy(X), X)
>>> for lag, ante, cons in granger_causality(threecol, 10):
...     if lag > 1:
...         print('Series {cons} lags behind series {ante} by {lag} lags.'.format(**locals()))
...
Series Feature 1 lags behind series Feature 2 by 3 lags.
Series Feature 2 lags behind series Feature 3 by 4 lags.
Use this knowledge wisely.