Welcome to PECUZAL attractor reconstruction for MATLAB
PECUZAL is an automatic approach for embedding time series in MATLAB. It is based on the paper Kraemer et al. 2021 (Open Source), where the functionality is explained in detail. Here we give an introduction to its easy usage in three examples. Enjoy Embedding! Note, that you are welcome to use this Live-Script on your local machine. Download the file from the git repository.
Installation
There are two ways of using the proposed PECUZAL method:
- Install as Toolbox. This is the easiest way and allows the usage of the the function pecuzal_embedding independently from your current working directory. It is available like a built-in MATLAB-function and you do not have to copy any files etc. For this simply download the 'pecuzal-embedding.mltbx' from this repository or from MATLAB-Central and double-click pecuzal-embedding.mltbx for installation. That's it.
- You can also download this repository and copy the folder into the MATLAB user's directory. This is usually the user's "Documents" folder appended with "MATLAB" (you can find out using the function userpath). Add the toolbox by the addpath command, e.g., addpath ~/Documents/MATLAB/PECUZAL_Matlab on a Linux system. For everyday use, copy this command to a startup.m file in the MATLAB user's directory.
Citing and reference
If you enjoy this tool and find it valuable for your research please cite
- Kraemer et al., "A unified and automated approach to attractor reconstruction", New Journal of Physics 23(3), 033017. doi:10.1088/1367-2630/abe336 (2021).
or as BiBTeX-entry:
@article{Kraemer2021,
doi = {10.1088/1367-2630/abe336},
url = {https://doi.org/10.1088/1367-2630/abe336},
year = 2021,
month = {mar},
publisher = {{IOP} Publishing},
volume = {23},
number = {3},
pages = {033017},
author = {K H Kraemer and G Datseris and J Kurths and I Z Kiss and J L Ocampo-Espindola and N Marwan},
title = {A unified and automated approach to attractor reconstruction},
journal = {New Journal of Physics}
}
Basic usage
Let us first take a look at the help of the pecuzal_embedding function:
help pecuzal_embedding
pecuzal_embedding automated phase space reconstruction method.
Y = pecuzal_embedding(X) reconstructs fully automatically the
(N-by-M)-matrix Y of phase space vectors from the time series matrix X
of size (N0-by-M0) where M0 could be 1 (for time series) or larger for
multivariate time series.
Y = pecuzal_embedding(X, TAUS) reconstructs the phase space vectors
by using vector TAU for which possible delay times the algorithm shall
look (default [0:50]).
[Y, TAU_VALS, TS_VALS] = pecuzal_embedding(...) provides a vector of (different)
time delays TAU_VALS chosen by the algorithm and a vector TS_VALS with the
indices between 1 and M0 of chosen time series from matrix X, corresponding to
the delays TAU_VALS.
[Y, TAU_VALS, TS_VALS, L, E] = pecuzal_embedding(...) provides a vector L
of L-statistic values for the reconstruction in each encountered embedding cycle.
This also includes the L-value of the very last embedding cycle, which does
not contribute to the final embedding, due to an increasing L-value. The cell
array E contains all epsilon-mins of the continuity statistic as a function of
TS_VALS for each encountered time series in X.
... = pecuzal_embedding(..., Name, Value) specifies further optional parameters
for the embedding algorithm using one or more Name, Value pair arguments.
Optional name-value-arguments:
'sample_size' - (default 1) size of the random phase space vector sample the
algorithm considers for each delay value, in order to compute
the continuity statistic. This is a float from the intervall
(0 1]. The size of the considered sample is SAMPLE_SIZE*N
(where N is length of the current phase space trajectory).
'theiler' - (default 1) the temporal correlation window (Theiler window)
for which no nearest neighbours are considered, because they
could be direct predecessors or successors of the fiducial point.
When input X is a multivariate dataset, the Theiler window needs
to be chosen as the maximum from each of the time series.
'alpha' - (default 0.05) significance level for the continuity statistic.
'p' - (default 0.5) binomial parameter for the continuity statistic.
'KNN' - (default 13) maximal number of points in the delta-neighbourhood.
For a range of points from [8:KNN] the minimial epsilon-scale
according to the significance level 'alpha' and binomial parameter
'p' is computed and the minimum of all considered delta-neighborhood
sizes is finally chosen. Must be at least 8.
'k' - (default 3) considered number of nearest neighbors for Uzal's
L-statistic.
'L_thres' - (default 0) The algorithm breaks, when this threshold is exceeded
by `ΔL` in an embedding cycle (set as a positive number, i.e. an
absolute value of `ΔL`).
'max_cycles' - (default 10) maximum number of embedding cycles the algorithm
performs after which it stops.
'econ' - (default False) Economy-mode for L-statistic computation. Instead of
computing L-statistics for time horizons `2:Tw`, here we only compute them for
`2:2:Tw`.
Example:
% Reconstruct phase space vectors from the 2nd component of the Roessler system
roessler = @(t,x) [-x(2)-x(3); x(1)+0.2*x(2); 0.2+x(3)*(x(1)-5.7)]; % Roessler system
[t, x] = ode45(roessler, [0:.2:1500], [1.; .5; 0.5]); % solve ODE
x(1:2501,:) = []; % remove transients
plot3(x(:,1), x(:,2), x(:,3))
y = pecuzal_embedding(x(:,2), 0:100, 'theiler', 7); % reconstruct using PECUZAL
plot3(y(:,1), y(:,2), y(:,3))
Further reading:
H. K. Kraemer et al., … 2021
See also pecora_embedding_cycle, uzal_cost_pecuzal, uzal_cost
The function expects at least one input argument (a uni- or multivariate time series) and returns the phase space vectors. Several optional parameters, some as (Name, Value)-pairs, can be used fine tune the algorithm. We recommend to stick to the default values first, as described in [kraemer2020], and only parse an adequate Theiler-window, in order to exclude serially correlated time series values from the nearest neighbour search. We will see in the following how that could be done.
Univariate example (Rössler system)
We exemplify the proposed embedding method by embedding the y-component of the Rössler system with standard parameters
set(0, 'DefaultAxesFontSize', 14, 'DefaultLineLineWidth', 1) % increase font size and line width for the figures
% set parameters for standard Roessler
% Set time series length and sampling
tf = (N+transient_time) * dt;
roe = @(t,x) [-x(2)-x(3); x(1)+a*x(2); b+x(3)*(x(1)-c)];
x0 = [1, -2, 0.1]; % initial conditions
[t, x] = ode45(roe, t, x0);
x(1:transient_time+1,:) = [];
t = t(1:end-transient_time-1);
We focus on the y-component and compute the auto mutual information (MI), in order to estimate an appropriate Theiler window. This is especially important when dealing with highly sampled datasets. Let us also plot the data and the MI.
y = x(:,2); % bind y-component
mi = mutualinformation(y,50); % compute MI
figure('Units', 'normalized', 'Position', [.3 .3 .7 .5])
plot(t(1:1000), y(1:1000))
title('y-component of Roessler time series')
plot(mi(:,1), mi(:,2), '.-', 'MarkerSize',15)
xlabel('time delay \tau')
ylabel('mutual information [nats]')
title('Mutual information for y-component of Roessler time series')
Now we are ready to go and call the PECUZAL algorithm pecuzal_embedding with a Theiler window determined from the first minimum of the mutual information shown in the above Figure, i.e. 'theiler'=7, and possible delays ranging in the interval [0:50]. We also use the 'econ' option for an accelerated computation.
NOTE: The following computation will take approximately 2 to 3 minutes (depending on the machine you are running the code on).
[Y_reconstruct, tau_vals, ts_vals, Ls, E] = pecuzal_embedding(y, 0:50, 'theiler', 7, 'econ', true);
Algorithm stopped due to minimum L-value reached. VALID embedding achieved.
After the computation has finished, the function returns the following note in the console:
Algorithm stopped due to minimum L-value reached. VALID embedding achieved.
The matrix Y_reconstruct contains the reconstructed trajectory. Since in this example matrix Y_reconstruct is a three-column matrix, it corresponds to a three-dimensional phase space trajectory which we can visualize in a 3D plot.
plot3(Y_reconstruct(:,1), Y_reconstruct(:,2), Y_reconstruct(:,3))
xlabel(sprintf('y(t+%i)', tau_vals(1)))
ylabel(sprintf('y(t+%i)', tau_vals(2)))
zlabel(sprintf('y(t+%i)', tau_vals(3)))
title('PECUZAL reconstructed Roessler system')
plot3(x(:,1), x(:,2), x(:,3))
title('Original Roessler system')
For the correct axis labels we used the delay values the PECUZAL algorithm used and which are returned by the output-variable tau_vals.
This means, that the reconstructed trajectory consists of the unlagged time series (here the y-component) and two more components with the time series lagged by 7 and 15 samples, respectively. Note the coincidence with the first minimum of the mutual information. The output variable ts_vals contains the indices regarding the chosen time series for each delay value stored in tau_vals. Because there is only one time series we have used, it is simply
We can look at the continuity statistic the algorithm has used to find the embedding. These statistics for each embedding cycle are provided by the cell array E.
plot(0:length(E{1})-1, E{1}(:,1), 'LineWidth',3), hold on
scatter(tau_vals(2), E{1}(tau_vals(2)+1,1), 'k', 'filled','HandleVisibility',"off")
plot(0:length(E{2})-1, E{2}(:,1), 'LineWidth',3)
scatter(tau_vals(3), E{2}(tau_vals(3)+1,1), 'k', 'filled','HandleVisibility',"off")
plot(0:length(E{3})-1, E{3}(:,1), 'LineWidth',3)
legend('1st embedding cycle', '2nd embedding cycle', '3rd embedding cycle', 'Location', 'northeastoutside')
title('PECUZAL continuity statistics for Roessler y-component')
ylabel('$\langle \varepsilon^\star \rangle$', 'Interpreter', 'latex')
The black markers point at the postitions, where the algorithm picked the delays for the reconstruction from. In the 3rd embedding cycle there is no delay value picked and the algorithm stops, because it cannot minimize ΔL further. Its values for each embedding cycle are stored in vector Ls:
Ls
-0.671367774405204 -0.676055155971805
Thus, the total decrease of ΔL over all encountered embedding cycles is
L_total_uni = sum(Ls)
L_total_uni =
-1.347422930377009
Multivariate example (Rössler system)
Similar to the approach in the preceeding section, we highlight the capability of the proposed embedding method for a multivariate input. The idea is now to use all three time series to the algorithm, even though this is a very far-from-reality example. We already have an adequate representation of the system we want to reconstruct, namely the three time series from the numerical integration. But let us see what PECUZAL suggests for a reconstruction. Since we have with three time series now, we estimate the Theiler window as the maximum of all Theiler windows of each time series. Again, we estimate the Theiler window by taking the first minimum of the auto mutual information.
mis = zeros(length(mi),3);
Z = mutualinformation(x(:,i), 50, 0);
figure('Units', 'normalized', 'Position', [.3 .3 .7 .7])
plot(t(1:1000), x(1:1000,i))
title([char(119+i), '-component of Roessler time series'])
xlabel('time lag [in sampling units]')
title(['Mutual information for ', char(119+i), '-component of Roessler time series'])
Due to the spikyness of the z-component the according auto mutual information yields no meaningful result here. So we stick to the choice of theiler = 7 and call the PECUZAL algorithm with the 'econ' option and possible delays in the interval [0:50].
NOTE: The following computation will take approximately 10-12 minutes (depending on the machine you are running the code on).
[Y_reconstruct_m, tau_vals_m, ts_vals_m, Ls_m, E_m] = pecuzal_embedding(x, 0:50, 'theiler', 7, 'econ', true);
Algorithm stopped due to minimum L-value reached. VALID embedding achieved.
The computation end with the following note in the console:
Algorithm stopped due to minimum L-value reached. VALID embedding achieved.
The suggested embedding parameters
reveal that PECUZAL reconstructs the trajectory Y_reconstruct_m from the unlagged y-component time series (index 0), the unlagged x-component and finally again the x-component but now lagged by 3 samples. As expected the total ΔL-value is smaller here than in the univariate case, but surprisingly it is not much of a difference: L_total_multi = sum(Ls_m)
L_total_multi =
-1.363109645925171
The reconstructed attractor looks also quite similar to the original one, even though that is not a proper evaluation criterion for the goodness of a reconstruction, see [kraemer2020]. figure('Units','normalized','Position',[.3 .3 .7 .5])
plot3(Y_reconstruct_m(:,1), Y_reconstruct_m(:,2), Y_reconstruct_m(:,3))
xlabel(sprintf('%s(t+%i)', char(119+ts_vals_m(1)), tau_vals(1)))
ylabel(sprintf('%s(t+%i)', char(119+ts_vals_m(2)), tau_vals(2)))
zlabel(sprintf('%s(t+%i)', char(119+ts_vals_m(3)), tau_vals(3)))
title('PECUZAL reconstructed Roessler system')
plot3(x(:,1), x(:,2), x(:,3))
title('Original Roessler system')
Stochastic source example
Finally we demonstrate how the PECUZAL method works with non-deterministic input data. Therefore, we create a simple AR(1)-process:
M = 2000; % length of the time series
x(1) = .2; % intial value
alpha = .9; % auto-correlation parameter
p = .2; % noise amplitude
x(i) = alpha*x(i-1) + p*randn;
x = x(10:end); % remove transients
When we now call the PECUZAL algorithm
[Y_reconstruct_s, tau_vals_s, ts_vals_s, Ls_s, eps_s] = pecuzal_embedding(x, 'econ', true);
Algorithm stopped due toincreasing L-values in the first embedding cycle. NO valid embedding achieved.
This means, the algorithm could not obtain any valid embedding, thus, it values the input data as a non-deterministic source.