Welcome to PECUZAL attractor reconstruction for MATLAB

Table of Contents
PECUZAL is an automatic approach for embedding time series in MATLAB. It is based on the paper Kraemer et al. 2021 (Open Source), where the functionality is explained in detail. Here we give an introduction to its easy usage in three examples. Enjoy Embedding!
Note, that you are welcome to use this Live-Script on your local machine. Download the file from the git repository.

Installation

There are two ways of using the proposed PECUZAL method:

Citing and reference

If you enjoy this tool and find it valuable for your research please cite
or as BiBTeX-entry:
@article{Kraemer2021,
doi = {10.1088/1367-2630/abe336},
url = {https://doi.org/10.1088/1367-2630/abe336},
year = 2021,
month = {mar},
publisher = {{IOP} Publishing},
volume = {23},
number = {3},
pages = {033017},
author = {K H Kraemer and G Datseris and J Kurths and I Z Kiss and J L Ocampo-Espindola and N Marwan},
title = {A unified and automated approach to attractor reconstruction},
journal = {New Journal of Physics}
}

Basic usage

Let us first take a look at the help of the pecuzal_embedding function:
help pecuzal_embedding
pecuzal_embedding automated phase space reconstruction method. Y = pecuzal_embedding(X) reconstructs fully automatically the (N-by-M)-matrix Y of phase space vectors from the time series matrix X of size (N0-by-M0) where M0 could be 1 (for time series) or larger for multivariate time series. Y = pecuzal_embedding(X, TAUS) reconstructs the phase space vectors by using vector TAU for which possible delay times the algorithm shall look (default [0:50]). [Y, TAU_VALS, TS_VALS] = pecuzal_embedding(...) provides a vector of (different) time delays TAU_VALS chosen by the algorithm and a vector TS_VALS with the indices between 1 and M0 of chosen time series from matrix X, corresponding to the delays TAU_VALS. [Y, TAU_VALS, TS_VALS, L, E] = pecuzal_embedding(...) provides a vector L of L-statistic values for the reconstruction in each encountered embedding cycle. This also includes the L-value of the very last embedding cycle, which does not contribute to the final embedding, due to an increasing L-value. The cell array E contains all epsilon-mins of the continuity statistic as a function of TS_VALS for each encountered time series in X. ... = pecuzal_embedding(..., Name, Value) specifies further optional parameters for the embedding algorithm using one or more Name, Value pair arguments. Optional name-value-arguments: 'sample_size' - (default 1) size of the random phase space vector sample the algorithm considers for each delay value, in order to compute the continuity statistic. This is a float from the intervall (0 1]. The size of the considered sample is SAMPLE_SIZE*N (where N is length of the current phase space trajectory). 'theiler' - (default 1) the temporal correlation window (Theiler window) for which no nearest neighbours are considered, because they could be direct predecessors or successors of the fiducial point. When input X is a multivariate dataset, the Theiler window needs to be chosen as the maximum from each of the time series. 'alpha' - (default 0.05) significance level for the continuity statistic. 'p' - (default 0.5) binomial parameter for the continuity statistic. 'KNN' - (default 13) maximal number of points in the delta-neighbourhood. For a range of points from [8:KNN] the minimial epsilon-scale according to the significance level 'alpha' and binomial parameter 'p' is computed and the minimum of all considered delta-neighborhood sizes is finally chosen. Must be at least 8. 'k' - (default 3) considered number of nearest neighbors for Uzal's L-statistic. 'L_thres' - (default 0) The algorithm breaks, when this threshold is exceeded by `ΔL` in an embedding cycle (set as a positive number, i.e. an absolute value of `ΔL`). 'max_cycles' - (default 10) maximum number of embedding cycles the algorithm performs after which it stops. 'econ' - (default False) Economy-mode for L-statistic computation. Instead of computing L-statistics for time horizons `2:Tw`, here we only compute them for `2:2:Tw`. Example: % Reconstruct phase space vectors from the 2nd component of the Roessler system roessler = @(t,x) [-x(2)-x(3); x(1)+0.2*x(2); 0.2+x(3)*(x(1)-5.7)]; % Roessler system [t, x] = ode45(roessler, [0:.2:1500], [1.; .5; 0.5]); % solve ODE x(1:2501,:) = []; % remove transients plot3(x(:,1), x(:,2), x(:,3)) y = pecuzal_embedding(x(:,2), 0:100, 'theiler', 7); % reconstruct using PECUZAL plot3(y(:,1), y(:,2), y(:,3)) Further reading: H. K. Kraemer et al., … 2021 See also pecora_embedding_cycle, uzal_cost_pecuzal, uzal_cost
The function expects at least one input argument (a uni- or multivariate time series) and returns the phase space vectors. Several optional parameters, some as (Name, Value)-pairs, can be used fine tune the algorithm. We recommend to stick to the default values first, as described in [kraemer2020], and only parse an adequate Theiler-window, in order to exclude serially correlated time series values from the nearest neighbour search. We will see in the following how that could be done.

Univariate example (Rössler system)

We exemplify the proposed embedding method by embedding the y-component of the Rössler system with standard parameters
clear
set(0, 'DefaultAxesFontSize', 14, 'DefaultLineLineWidth', 1) % increase font size and line width for the figures
% set parameters for standard Roessler
a = 0.2;
b = 0.2;
c = 5.7;
% Set time series length and sampling
N = 2500;
dt = 0.2;
transient_time = 2500;
tf = (N+transient_time) * dt;
t = 0:dt:tf;
We define the Rössler ODE-System and solve it to obtain the time series.
% Roessler ODE
roe = @(t,x) [-x(2)-x(3); x(1)+a*x(2); b+x(3)*(x(1)-c)];
% solve the ODE
x0 = [1, -2, 0.1]; % initial conditions
[t, x] = ode45(roe, t, x0);
% remove transients
x(1:transient_time+1,:) = [];
t = t(1:end-transient_time-1);
We focus on the y-component and compute the auto mutual information (MI), in order to estimate an appropriate Theiler window. This is especially important when dealing with highly sampled datasets. Let us also plot the data and the MI.
y = x(:,2); % bind y-component
mi = mutualinformation(y,50); % compute MI
figure('Units', 'normalized', 'Position', [.3 .3 .7 .5])
subplot(1,2,1)
plot(t(1:1000), y(1:1000))
title('y-component of Roessler time series')
xlabel('time')
grid on
subplot(1,2,2)
plot(mi(:,1), mi(:,2), '.-', 'MarkerSize',15)
xlabel('time delay \tau')
ylabel('mutual information [nats]')
title('Mutual information for y-component of Roessler time series')
grid on
Now we are ready to go and call the PECUZAL algorithm pecuzal_embedding with a Theiler window determined from the first minimum of the mutual information shown in the above Figure, i.e. 'theiler'=7, and possible delays ranging in the interval [0:50]. We also use the 'econ' option for an accelerated computation.
NOTE: The following computation will take approximately 2 to 3 minutes (depending on the machine you are running the code on).
[Y_reconstruct, tau_vals, ts_vals, Ls, E] = pecuzal_embedding(y, 0:50, 'theiler', 7, 'econ', true);
Algorithm stopped due to minimum L-value reached. VALID embedding achieved.
After the computation has finished, the function returns the following note in the console:
Algorithm stopped due to minimum L-value reached. VALID embedding achieved.
The matrix Y_reconstruct contains the reconstructed trajectory. Since in this example matrix Y_reconstruct is a three-column matrix, it corresponds to a three-dimensional phase space trajectory which we can visualize in a 3D plot.
subplot(1,2,1)
plot3(Y_reconstruct(:,1), Y_reconstruct(:,2), Y_reconstruct(:,3))
xlabel(sprintf('y(t+%i)', tau_vals(1)))
ylabel(sprintf('y(t+%i)', tau_vals(2)))
zlabel(sprintf('y(t+%i)', tau_vals(3)))
title('PECUZAL reconstructed Roessler system')
grid on
view([-34.6 5.2])
subplot(1,2,2)
plot3(x(:,1), x(:,2), x(:,3))
xlabel('x(t)')
ylabel('y(t)')
zlabel('z(t)')
title('Original Roessler system')
grid on
For the correct axis labels we used the delay values the PECUZAL algorithm used and which are returned by the output-variable tau_vals.
tau_vals
tau_vals = 1×3
0 7 15
This means, that the reconstructed trajectory consists of the unlagged time series (here the y-component) and two more components with the time series lagged by 7 and 15 samples, respectively. Note the coincidence with the first minimum of the mutual information. The output variable ts_vals contains the indices regarding the chosen time series for each delay value stored in tau_vals. Because there is only one time series we have used, it is simply
ts_vals
ts_vals = 1×3
1 1 1
This output is only needed for the multivariate case, see section Multivariate example (Rössler system).
We can look at the continuity statistic the algorithm has used to find the embedding. These statistics for each embedding cycle are provided by the cell array E.
clf
plot(0:length(E{1})-1, E{1}(:,1), 'LineWidth',3), hold on
scatter(tau_vals(2), E{1}(tau_vals(2)+1,1), 'k', 'filled','HandleVisibility',"off")
plot(0:length(E{2})-1, E{2}(:,1), 'LineWidth',3)
scatter(tau_vals(3), E{2}(tau_vals(3)+1,1), 'k', 'filled','HandleVisibility',"off")
plot(0:length(E{3})-1, E{3}(:,1), 'LineWidth',3)
legend('1st embedding cycle', '2nd embedding cycle', '3rd embedding cycle', 'Location', 'northeastoutside')
title('PECUZAL continuity statistics for Roessler y-component')
xlabel('delay')
ylabel('$\langle \varepsilon^\star \rangle$', 'Interpreter', 'latex')
set(gca,'LineWidth',2)
set(gca,'FontSize',18)
grid on
The black markers point at the postitions, where the algorithm picked the delays for the reconstruction from. In the 3rd embedding cycle there is no delay value picked and the algorithm stops, because it cannot minimize ΔL further. Its values for each embedding cycle are stored in vector Ls:
Ls
Ls = 1×2
-0.671367774405204 -0.676055155971805
Thus, the total decrease of ΔL over all encountered embedding cycles is
L_total_uni = sum(Ls)
L_total_uni =
-1.347422930377009

Multivariate example (Rössler system)

Similar to the approach in the preceeding section, we highlight the capability of the proposed embedding method for a multivariate input. The idea is now to use all three time series to the algorithm, even though this is a very far-from-reality example. We already have an adequate representation of the system we want to reconstruct, namely the three time series from the numerical integration. But let us see what PECUZAL suggests for a reconstruction.
Since we have with three time series now, we estimate the Theiler window as the maximum of all Theiler windows of each time series. Again, we estimate the Theiler window by taking the first minimum of the auto mutual information.
mis = zeros(length(mi),3);
for i = 1:3
Z = mutualinformation(x(:,i), 50, 0);
mis(:,i) = Z(:,2);
end
figure('Units', 'normalized', 'Position', [.3 .3 .7 .7])
tiledlayout(3,2)
for i = 1:3
nexttile
plot(t(1:1000), x(1:1000,i))
ylabel(char(119+i))
if i == 3
xlabel('time')
end
title([char(119+i), '-component of Roessler time series'])
grid on
nexttile
plot(Z(:,1), mis(:,i))
ylabel('MI')
if i == 3
xlabel('time lag [in sampling units]')
end
title(['Mutual information for ', char(119+i), '-component of Roessler time series'])
grid on
end
Due to the spikyness of the z-component the according auto mutual information yields no meaningful result here. So we stick to the choice of theiler = 7 and call the PECUZAL algorithm with the 'econ' option and possible delays in the interval [0:50].
NOTE: The following computation will take approximately 10-12 minutes (depending on the machine you are running the code on).
[Y_reconstruct_m, tau_vals_m, ts_vals_m, Ls_m, E_m] = pecuzal_embedding(x, 0:50, 'theiler', 7, 'econ', true);
Algorithm stopped due to minimum L-value reached. VALID embedding achieved.
The computation end with the following note in the console:
Algorithm stopped due to minimum L-value reached. VALID embedding achieved.
The suggested embedding parameters
tau_vals_m
tau_vals_m = 1×3
0 0 3
ts_vals_m
ts_vals_m = 1×3
2 1 1
reveal that PECUZAL reconstructs the trajectory Y_reconstruct_m from the unlagged y-component time series (index 0), the unlagged x-component and finally again the x-component but now lagged by 3 samples. As expected the total ΔL-value is smaller here than in the univariate case, but surprisingly it is not much of a difference:
L_total_multi = sum(Ls_m)
L_total_multi =
-1.363109645925171
The reconstructed attractor looks also quite similar to the original one, even though that is not a proper evaluation criterion for the goodness of a reconstruction, see [kraemer2020].
figure('Units','normalized','Position',[.3 .3 .7 .5])
subplot(1,2,1)
plot3(Y_reconstruct_m(:,1), Y_reconstruct_m(:,2), Y_reconstruct_m(:,3))
xlabel(sprintf('%s(t+%i)', char(119+ts_vals_m(1)), tau_vals(1)))
ylabel(sprintf('%s(t+%i)', char(119+ts_vals_m(2)), tau_vals(2)))
zlabel(sprintf('%s(t+%i)', char(119+ts_vals_m(3)), tau_vals(3)))
title('PECUZAL reconstructed Roessler system')
grid on
view([94 -6])
subplot(1,2,2)
plot3(x(:,1), x(:,2), x(:,3))
xlabel('x(t)')
ylabel('y(t)')
zlabel('z(t)')
title('Original Roessler system')
grid on
view([-37.5 30.0])

Stochastic source example

Finally we demonstrate how the PECUZAL method works with non-deterministic input data. Therefore, we create a simple AR(1)-process:
M = 2000; % length of the time series
x = zeros(1,M+10);
x(1) = .2; % intial value
alpha = .9; % auto-correlation parameter
p = .2; % noise amplitude
for i = 2:N+10
x(i) = alpha*x(i-1) + p*randn;
end
x = x(10:end); % remove transients
figure
plot(1:1000, x(1:1000))
title('AR(1) process')
xlabel('sample')
grid on
When we now call the PECUZAL algorithm
[Y_reconstruct_s, tau_vals_s, ts_vals_s, Ls_s, eps_s] = pecuzal_embedding(x, 'econ', true);
Algorithm stopped due toincreasing L-values in the first embedding cycle. NO valid embedding achieved.
This means, the algorithm could not obtain any valid embedding, thus, it values the input data as a non-deterministic source.