# Statistics
Statistical features summarize the distribution of a price series over a
rolling window. All have `forward_period = 0` and are safe for live streaming.
## Adf
Streaming <3µs/update Research
The Augmented Dickey-Fuller test measures whether a rolling window of prices behaves like a stationary (mean-reverting) process or a unit-root (random walk) process. Two outputs are produced per bar: the ADF test statistic and its approximate p-value.
The regression model estimated by OLS on each window is

\(\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{k} \delta_i \Delta y_{t-i} + \varepsilon_t\)

where \(k\) is the number of lagged differences and the trend term \(\beta t\) is included only for `regression='ct'`.
The test statistic is \(\hat{\gamma} / \text{SE}(\hat{\gamma})\). Under H0 (unit root) it follows the Dickey-Fuller distribution, not Student's t. P-values are computed via linear interpolation over a 45-point table derived from MacKinnon (2010).
- H0: \(\gamma = 0\) (unit root - non-stationary)
- H1: \(\gamma < 0\) (no unit root - stationary)
| Name | Type | Constraint | Description |
|---|---|---|---|
| `inputs` | `list[str]` | len = 1 | Input column, e.g. `["close"]` |
| `window` | `int` | `> 3 + 2 * lags` | Rolling window length |
| `outputs` | `list[str]` | len = 2 | Output columns `[adf_stat_col, adf_pval_col]` |
| `lags` | `int \| None` | `>= 0` | Lagged differences. `None` applies Schwert's rule |
| `regression` | `str` | `'c'` or `'ct'` | `'c'`: constant only. `'ct'`: constant + trend |
Schwert's rule (default when `lags=None`): \(k = \lfloor 12 \cdot (n/100)^{0.25} \rfloor\).
Applied once at construction. For `window=100` this gives `k=12`.
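The rule is a one-liner to reproduce; a quick pure-Python transcription (`schwert_lags` is an illustrative name, not a library function):

```python
import math

def schwert_lags(window: int) -> int:
    """Schwert's rule: k = floor(12 * (n/100)^0.25)."""
    return math.floor(12 * (window / 100) ** 0.25)

print(schwert_lags(100))  # 12, as stated above
```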
| Column | When valid | Description |
|---|---|---|
| `outputs[0]` | `t >= window - 1`, no NaN in buffer, OLS non-singular | ADF test statistic |
| `outputs[1]` | Same as above | Approximate p-value (MacKinnon 2010, asymptotic) |
- **Warm-up.** The first `window - 1` bars return `[NaN, NaN]`.
- **NaN propagation.** A `NaN` input enters the buffer. Both outputs are `NaN` until the `NaN` is evicted after `window` consecutive valid bars.
- **Singular OLS.** When the OLS system is degenerate (e.g. all values in the buffer are identical), both outputs return `NaN`.
- **`reset()`.** Clears the buffer entirely. Call it between backtest folds (CPCV, walk-forward) to avoid state leaking across splits.
- **P-value accuracy.** P-values use the asymptotic (large-sample) MacKinnon distribution. For `window < 100` the asymptotic approximation becomes less accurate - the true significance level may differ by a few percent from the reported p-value. Prefer `window >= 100` for reliable inference.
- **Implementation.** Full OLS via Gaussian elimination on every `update()`, `O(window)` per bar.
| Situation | Output |
|---|---|
| `t < window - 1` (buffer not full) | `[NaN, NaN]` |
| Buffer full, all values valid, OLS non-singular | `[stat, pvalue]` |
| Any NaN in the buffer | `[NaN, NaN]` |
| OLS singular (e.g. constant series) | `[NaN, NaN]` |
| After `reset()` | `[NaN, NaN]` until buffer refills |
- **Stat below -3.5 (`'c'`) or -4.0 (`'ct'`).** Strong evidence of stationarity. The series is likely mean-reverting in this window.
- **P-value below 0.05.** Reject H0 (unit root) at the 5% level.
- **Rolling use.** A series that switches from high p-values to low p-values across time is transitioning from a trending/random-walk regime to a mean-reverting regime - a common signal in pairs trading and stat-arb.
- **Regression choice.** Use `'c'` when the series oscillates around a non-zero level. Use `'ct'` when you expect a linear trend and want to test stationarity around that trend.
```python
from statsmodels.tsa.stattools import adfuller
from oryon.features import Adf

# Reference: adfuller(x, regression='c', maxlag=0, autolag=None)[0] = -5.656
x = [0.0, 0.5087, -0.1558, 0.2507, 0.8633, 0.3085, 0.5017, 0.4578,
     -0.2826, 0.1437, 0.4694, -0.0066, 0.2960, 0.6133, -0.1088,
     0.3521, 0.3786, 0.1477, 0.5707, 0.1324]

adf = Adf(inputs=["close"], window=20, outputs=["adf_stat", "adf_pval"],
          lags=0, regression="c")

for v in x[:-1]:
    adf.update([v])  # returns [NaN, NaN] during warm-up

stat, pval = adf.update([x[-1]])
print(f"stat={stat:.4f}, pval={pval:.2e}")
# stat=-5.6565, pval=1.04e-06 -> strong evidence of stationarity
```
- **Additional test series.** The current Rust test suite validates `adf_stat` against statsmodels on a single 20-bar reference series. Adding 2-3 more series (random walk, strong mean-reversion, high-volatility) would increase confidence across the full range of the statistic.
- **Finite-sample p-values.** P-values currently use the asymptotic MacKinnon distribution (`N=1` in `mackinnonp`). A finite-sample correction (separate lookup tables for `N=30, 50, 100, 250`) would improve accuracy for short windows. This requires Monte Carlo simulation or MacKinnon (1994) coefficient tables not available through statsmodels.
- **Performance: incremental XtX update.** Each `update()` currently rebuilds the full OLS system from scratch - `O(window * p^2)` per bar, where `p = 3 + lags` is the number of regressors. At `window=200` with `lags=0` this costs ~3µs, and grows further with larger windows or more lags. The bottleneck is avoidable: the `Adf` struct could maintain `XtX` and `Xty` as running state, updating them incrementally in `O(p^2)` per bar by adding the new row and subtracting the evicted row. This would reduce per-bar cost to sub-1µs regardless of window size.
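The incremental idea in the last bullet can be illustrated in Python (names like `IncrementalOLSState` and `push_row` are hypothetical, not part of the library): keep \(X^\top X\) and \(X^\top y\) as running sums of row contributions, so a window slide costs `O(p^2)` instead of `O(window * p^2)`.

```python
def outer_add(XtX, Xty, row, y, sign):
    # Add (sign=+1) or remove (sign=-1) one observation's contribution
    # to the normal equations: XtX += row * row^T, Xty += row * y.
    p = len(row)
    for i in range(p):
        Xty[i] += sign * row[i] * y
        for j in range(p):
            XtX[i][j] += sign * row[i] * row[j]

class IncrementalOLSState:
    """Sketch of a sliding-window normal-equation accumulator."""
    def __init__(self, p):
        self.XtX = [[0.0] * p for _ in range(p)]
        self.Xty = [0.0] * p

    def push_row(self, row, y):   # new bar enters the window
        outer_add(self.XtX, self.Xty, row, y, +1.0)

    def evict_row(self, row, y):  # oldest bar leaves the window
        outer_add(self.XtX, self.Xty, row, y, -1.0)
```

The Gaussian-elimination solve on the accumulated `XtX`/`Xty` stays the same; only the accumulation step becomes incremental.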
## Skewness
Streaming <1µs/update Research
Fisher-Pearson corrected skewness over a rolling window, identical to pandas .skew().
Positive values indicate a long right tail (rare large gains), negative values a long left
tail (rare large losses). Useful for detecting regime shifts and tail risk.
| Name | Type | Constraint | Description |
|---|---|---|---|
| `inputs` | `list[str]` | len = 1 | Input column, e.g. `["close"]` |
| `window` | `int` | `>= 3` | Rolling window length |
| `outputs` | `list[str]` | len = 1 | Output column, e.g. `["close_skew_20"]` |
| Column | When valid | Description |
|---|---|---|
| `outputs[0]` | `t >= window - 1`, no NaN in buffer, not all values equal | Fisher-Pearson sample skewness |
- **Warm-up.** The first `window - 1` bars return `NaN`. A full buffer of `window` values is required.
- **NaN propagation.** A `NaN` input contaminates the buffer. Output stays `NaN` until the `NaN` is evicted after `window` consecutive valid bars.
- **All values equal.** If all `window` values are identical, the standard deviation is zero and the output is `NaN`.
- **`reset()`.** Clears the buffer entirely. Call it between backtest folds (CPCV, walk-forward) to avoid state leaking across splits.
- **Implementation.** Recomputes over the full buffer on every `update()` (`O(N)` per bar). Uses sample standard deviation (N-1 denominator) for the standardization step.
| Situation | Output |
|---|---|
| `t < window - 1` (buffer not full) | `NaN` |
| Buffer full, all values valid, not all equal | Skewness value |
| Any NaN in the buffer | `NaN` |
| All values in the buffer are equal | `NaN` |
| After `reset()` | `NaN` until buffer refills |
- **Signal.** Positive: right tail dominates - extreme positive deviations are farther from the mean than extreme negative ones. Negative: left tail dominates. Zero: the distribution is symmetric over the window.
- **Rolling.** Changes in sign or magnitude capture distributional shifts in the series. A transition from positive to negative skewness signals that the left tail is growing relative to the right.
```python
import pandas as pd
from oryon.features import Skewness
from oryon import FeaturePipeline
from oryon.adapters import run_features_pipeline_pandas

sk = Skewness(["close"], window=3, outputs=["close_skew_3"])
fp = FeaturePipeline(features=[sk], input_columns=["close"])

df = pd.DataFrame({"close": [1.0, 2.0, 4.0, 6.0, 8.0]})
out = run_features_pipeline_pandas(fp, df)
print(out)
#    close_skew_3
# 0           NaN
# 1           NaN
# 2          0.94
# 3          0.00
# 4          0.00
```
skew([1, 2, 4]) = 0.935 (right-skewed). skew([2, 4, 6]) = 0.0 (symmetric -
evenly spaced values always give zero skewness). Results match pandas .skew() exactly.
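The estimator both pandas and this feature use is the bias-corrected Fisher-Pearson statistic \(g_1 = \frac{n}{(n-1)(n-2)} \sum_i \left(\frac{x_i - \bar{x}}{s}\right)^3\), with \(s\) the sample standard deviation. A minimal pure-Python check (illustrative, not the library's code):

```python
import math

def sample_skew(xs):
    # Bias-corrected Fisher-Pearson skewness, matching pandas .skew()
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in xs) / (n - 1))  # sample std (N-1)
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in xs)

print(round(sample_skew([1.0, 2.0, 4.0]), 3))  # 0.935, matching the example above
```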
## Kurtosis
Streaming <1µs/update Research
Fisher excess kurtosis over a rolling window, identical to pandas .kurt().
Values above 0 indicate fat tails (leptokurtic). A normal distribution gives 0.
Uniformly spaced returns give negative kurtosis (platykurtic).
| Name | Type | Constraint | Description |
|---|---|---|---|
| `inputs` | `list[str]` | len = 1 | Input column, e.g. `["close"]` |
| `window` | `int` | `>= 4` | Rolling window length |
| `outputs` | `list[str]` | len = 1 | Output column, e.g. `["close_kurt_20"]` |
| Column | When valid | Description |
|---|---|---|
| `outputs[0]` | `t >= window - 1`, no NaN in buffer, not all values equal | Fisher excess kurtosis |
- **Warm-up.** The first `window - 1` bars return `NaN`. A full buffer of `window` values is required.
- **NaN propagation.** A `NaN` input contaminates the buffer. Output stays `NaN` until the `NaN` is evicted after `window` consecutive valid bars.
- **All values equal.** If all `window` values are identical, the standard deviation is zero and the output is `NaN`.
- **`reset()`.** Clears the buffer entirely. Call it between backtest folds (CPCV, walk-forward) to avoid state leaking across splits.
- **Implementation.** Recomputes over the full buffer on every `update()` (`O(N)` per bar). Uses sample standard deviation (N-1 denominator) for the standardization step.
| Situation | Output |
|---|---|
| `t < window - 1` (buffer not full) | `NaN` |
| Buffer full, all values valid, not all equal | Excess kurtosis value |
| Any NaN in the buffer | `NaN` |
| All values in the buffer are equal | `NaN` |
| After `reset()` | `NaN` until buffer refills |
- **Signal.** Excess kurtosis > 0: more probability mass in the tails than a normal distribution (leptokurtic). = 0: consistent with normal. < 0: lighter tails than normal (platykurtic).
- **Rolling.** Spikes in excess kurtosis indicate a concentration of extreme observations within the window - the tail behavior is changing.
```python
import pandas as pd
from oryon.features import Kurtosis
from oryon import FeaturePipeline
from oryon.adapters import run_features_pipeline_pandas

ku = Kurtosis(["close"], window=4, outputs=["close_kurt_4"])
fp = FeaturePipeline(features=[ku], input_columns=["close"])

df = pd.DataFrame({"close": [1.0, 2.0, 4.0, 8.0, 6.0]})
out = run_features_pipeline_pandas(fp, df)
print(out)
#    close_kurt_4
# 0           NaN
# 1           NaN
# 2           NaN
# 3          0.76
# 4          2.24
```
kurt([1, 2, 4, 8]) = 0.758 (fat tails from the jump to 8). Results match
pandas .kurt() exactly.
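The estimator behind pandas `.kurt()` is the bias-corrected Fisher excess kurtosis \(g_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_i \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}\). A minimal pure-Python check (illustrative, not the library's code):

```python
import math

def sample_kurt(xs):
    # Bias-corrected Fisher excess kurtosis, matching pandas .kurt()
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in xs) / (n - 1))  # sample std (N-1)
    m4 = sum(((v - mean) / s) ** 4 for v in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(round(sample_kurt([1.0, 2.0, 4.0, 8.0]), 3))  # 0.758, matching the example above
```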
## Median Moving Average
Streaming <1µs/update Research
Rolling median over the last `window` bars. More robust to outliers than the SMA -
a single spike does not shift the output, making it useful as a pre-filter before
applying trend or signal detection indicators.
| Name | Type | Constraint | Description |
|---|---|---|---|
| `inputs` | `list[str]` | len = 1 | Input column, e.g. `["close"]` |
| `window` | `int` | `>= 1` | Rolling window length |
| `outputs` | `list[str]` | len = 1 | Output column, e.g. `["close_mma_20"]` |
| Column | When valid | Description |
|---|---|---|
| `outputs[0]` | `t >= window - 1`, no NaN in buffer | Rolling median of the last `window` values |
| Property | Value |
|---|---|
| `warm_up_period` | `window - 1` |
| `forward_period` | `0` |
- **Warm-up.** The first `window - 1` bars return `NaN`.
- **NaN propagation.** A `NaN` input enters the buffer and contaminates the output. Output stays `NaN` until the `NaN` is evicted after `window` consecutive valid bars.
- **`reset()`.** Clears the buffer entirely.
| Situation | Output |
|---|---|
| `t < window - 1` (buffer not full) | `NaN` |
| Buffer full, all values valid | Median value |
| Any NaN in the buffer | `NaN` |
| After `reset()` | `NaN` until buffer refills |
- **Signal.** Tracks the central value of the series while ignoring extreme bars. A single spike does not shift the output, unlike the SMA or EMA.
- **Window size.** Small windows (3-5) remove isolated outliers while preserving local structure. Large windows smooth too aggressively and lag significantly behind real price moves.
- **Use case.** Pre-filter before applying trend or signal detection indicators when the raw series contains frequent outliers (bad ticks, gaps, illiquid bars).
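The outlier robustness is easy to see in a pure-Python sketch (standalone reference, not the library API):

```python
from collections import deque
from statistics import median

def rolling_median(values, window):
    # Reference behavior on NaN-free data: None during warm-up,
    # then the median of the last `window` values.
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(median(buf) if len(buf) == window else None)
    return out

# A single spike (100.0) barely moves the median, unlike an SMA
print(rolling_median([1.0, 2.0, 3.0, 100.0, 4.0, 5.0], window=3))
# [None, None, 2.0, 3.0, 4.0, 5.0]
```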
## ShannonEntropy
Streaming <1µs/update Research
Rolling Shannon entropy over the last `window` bars. Values are discretized into
\(k\) equal-width bins; \(p_i\) is the fraction of observations in bin \(i\).
High entropy means the distribution is spread across bins (disordered market).
Low entropy means mass is concentrated in a few bins (directional or calm regime).
When all values in the window are identical, range is zero and entropy is `0.0` (not `NaN`).
| Name | Type | Constraint | Description |
|---|---|---|---|
| `inputs` | `list[str]` | len = 1 | Input column, e.g. `["returns"]` |
| `window` | `int` | `>= 2` | Rolling window length |
| `outputs` | `list[str]` | len = 1 | Output column, e.g. `["returns_entropy_20"]` |
| `bins` | `int \| None` | `>= 2` or `None` | Number of bins. `None` applies Sturges' rule. Default: `None` |
| `normalize` | `bool` | - | If `True`, output is `H / ln(bins)` in `[0, 1]`. Default: `True` |
Sturges' rule (default when `bins=None`): \(k = \lceil 1 + \log_2(\text{window}) \rceil\).
Computed once at construction. For `window=20` this gives `k=6`, for `window=200`, `k=9`.
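Sturges' rule is a direct transcription of the formula above (illustrative; the library's edge handling may differ):

```python
import math

def sturges_bins(window: int) -> int:
    # Sturges' rule: k = ceil(1 + log2(window))
    return math.ceil(1 + math.log2(window))

print(sturges_bins(200))  # 9
```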
| Column | When valid | Description |
|---|---|---|
| `outputs[0]` | `t >= window - 1`, no NaN in buffer | Shannon entropy in nats (`normalize=False`) or `[0, 1]` (`normalize=True`) |
- **Warm-up.** The first `window - 1` bars return `NaN`. A full buffer of `window` values is required before the first output.
- **NaN propagation.** A `NaN` input enters the buffer. Output stays `NaN` until the `NaN` is evicted after `window` consecutive valid bars.
- **Identical values.** When all `window` values are equal, range is zero and all mass falls in the first bin - entropy is `0.0`, not `NaN`. This represents maximum certainty (minimum disorder).
- **`reset()`.** Clears the buffer entirely. Call it between backtest folds (CPCV, walk-forward) to avoid state leaking across splits.
- **Implementation.** Rebuilds bin counts over the full buffer on every `update()` (`O(window)` per bar). Bins use equal-width partitioning of `[min, max]`.
| Situation | Output |
|---|---|
| `t < window - 1` (buffer not full) | `NaN` |
| Buffer full, all values valid | Entropy value |
| Any NaN in the buffer | `NaN` |
| All values identical (range = 0) | `0.0` |
| After `reset()` | `NaN` until buffer refills |
- **High entropy (near 1.0).** Returns are spread across the value range - no dominant direction, diffuse distribution. Associated with choppy or transitional regimes.
- **Low entropy (near 0.0).** Returns are concentrated in a narrow region - the distribution is peaked. Associated with trending or calm regimes where most observations cluster together.
- **Bin choice.** `bins=2` captures a simple high/low split and is the most stable. More bins (Sturges or explicit) resolve finer structure but add noise for small windows. As a rule: `bins <= window / 5` avoids empty bins on most real distributions.
- **Normalize.** Use `normalize=True` (default) when comparing across assets or across different `bins` configs. Use `normalize=False` when you want the raw value in nats for downstream math (e.g. KL divergence).
```python
from oryon.features import ShannonEntropy

se = ShannonEntropy(inputs=["x"], window=4, outputs=["entropy_4"],
                    bins=2, normalize=True)
se.update([1.0])  # [nan] - warm-up
se.update([1.0])  # [nan]
se.update([4.0])  # [nan]
se.update([4.0])  # [1.0]   - window=[1,1,4,4]: 2 low, 2 high -> max entropy
se.update([4.0])  # [0.811] - window=[1,4,4,4]: 1 low, 3 high -> entropy drops
se.update([4.0])  # [0.0]   - window=[4,4,4,4]: range=0 -> minimum entropy
```
Entropy falls from 1.0 (uniform split) to 0.811 (skewed 1/4 vs 3/4) to 0.0
(constant series). The 0.811 value equals \(H^*([0.25, 0.75]) = 0.5623 / \ln 2\).
```python
import pandas as pd
from oryon.features import ShannonEntropy
from oryon import FeaturePipeline
from oryon.adapters import run_features_pipeline_pandas

se = ShannonEntropy(["returns"], window=4, outputs=["returns_entropy_4"],
                    bins=2, normalize=True)
fp = FeaturePipeline(features=[se], input_columns=["returns"])

df = pd.DataFrame({"returns": [0.01, -0.02, 0.03, -0.01, 0.02, 0.03, 0.01]})
out = run_features_pipeline_pandas(fp, df)
print(out)
#    returns_entropy_4
# 0                NaN
# 1                NaN
# 2                NaN
# 3              1.000   counts=[2,2] -> uniform split -> max entropy
# 4              1.000   counts=[2,2]
# 5              0.811   counts=[1,3] -> skewed split -> entropy drops
# 6              0.811   counts=[1,3]
```
- **Incremental range tracking.** Each `update()` scans the full buffer to find `min` and `max` (`O(window)` extra). A sliding-window min/max structure (e.g. monotone deque) would reduce this to `O(1)` amortized, cutting the constant factor roughly in half for large windows.
- **Additional bin methods.** Freedman-Diaconis (`h = 2 * IQR * n^{-1/3}`) and equal-frequency (quantile) bins are natural extensions. Equal-frequency bins in particular avoid empty bins and produce more stable entropy estimates on fat-tailed financial series. The `BinMethod` enum in Rust is already structured to accommodate new variants without changing existing behavior.
## Correlation
Streaming <18µs/update Research
Rolling pairwise correlation between two series over a sliding window. Three methods are supported, differing in what relationship they measure and their computational cost.
Pearson - product-moment correlation, measures linear co-movement:

\(r = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}\)

Spearman - Pearson correlation applied to the ranks of each series (average rank for ties):

\(\rho = r\big(\operatorname{rank}(x), \operatorname{rank}(y)\big)\)

Kendall tau-b - fraction of concordant minus discordant pairs, adjusted for ties:

\(\tau_b = \dfrac{C - D}{\sqrt{(n_0 - n_1)(n_0 - n_2)}}\)
where \(n_0 = n(n-1)/2\), \(n_1\) = pairs tied in \(x\), \(n_2\) = pairs tied in \(y\), \(C\) = concordant pairs (same ordering in both series), \(D\) = discordant pairs.
| Name | Type | Constraint | Description |
|---|---|---|---|
| `inputs` | `list[str]` | len >= 2 | Two input columns `[x, y]` |
| `window` | `int` | `>= 2` | Rolling window length |
| `outputs` | `list[str]` | len = 1 | Output column, e.g. `["xy_corr_20"]` |
| `method` | `str` | `'pearson'`, `'spearman'`, `'kendall'` | Correlation method. Default: `'pearson'` |
| Column | When valid | Description |
|---|---|---|
| `outputs[0]` | `t >= window - 1`, no NaN in buffer, neither series constant | Correlation coefficient in `[-1, 1]` |
- **Warm-up.** The first `window - 1` bars return `NaN`. A full buffer of `window` values is required for the first output.
- **NaN propagation.** A `NaN` input enters the buffer. Output stays `NaN` until the `NaN` is evicted after `window` consecutive valid bars.
- **Constant series.** If either series has zero variance over the window, the correlation is mathematically undefined and the output is `NaN`.
- **`reset()`.** Clears both buffers entirely. Call between backtest folds (CPCV, walk-forward) to avoid state leaking across splits.
- **Performance.** Measured at `w200` on Apple M-series:
| Method | Complexity | w20 | w200 |
|---|---|---|---|
| `'pearson'` | O(n) | 39 ns | 384 ns |
| `'spearman'` | O(n log n) | 279 ns | 1 687 ns |
| `'kendall'` | O(n^2) | 248 ns | 17 779 ns |
| Situation | Output |
|---|---|
| `t < window - 1` (buffer not full) | `NaN` |
| Buffer full, all values valid, neither series constant | Correlation value in `[-1, 1]` |
| Any NaN in the buffer | `NaN` |
| Either series constant over the window | `NaN` |
| After `reset()` | `NaN` until buffer refills |
- **Pearson.** Measures linear co-movement. Use for pairs trading (do two price series move together linearly?), factor exposure (is a return series linearly related to a risk factor?). Sensitive to outliers.
- **Spearman.** Measures monotonic co-movement regardless of linearity. Ranks the values before correlating, so a consistently higher value in one series paired with a consistently higher value in the other scores +1, even if the relationship is exponential rather than linear. More robust to outliers than Pearson.
- **Kendall.** Measures concordance: what fraction of pairs agree on ordering? Conceptually simpler than Spearman, has better statistical properties at small samples and is more robust to outliers, but is significantly slower. Prefer Spearman for large windows.
- **Choosing a method.** For most quant use cases, Pearson is sufficient and fastest. Use Spearman when the relationship may be non-linear or the series are fat-tailed. Use Kendall for small windows when you need concordance-based statistics (e.g. comparing strategy rankings).
```python
from oryon.features import Correlation

# Pearson: close prices vs volume - do they move together linearly?
corr = Correlation(inputs=["close", "volume"], window=20, outputs=["cv_corr_20"])
corr.update([100.0, 1_200_000.0])  # -> [NaN] (warm-up)
# ... feed 20 bars ...

# Spearman: rank correlation between two return series
corr_sp = Correlation(inputs=["ret_a", "ret_b"], window=60,
                      outputs=["ret_corr_60"], method="spearman")

# Kendall: concordance for small windows only
corr_k = Correlation(inputs=["x", "y"], window=20,
                     outputs=["xy_tau_20"], method="kendall")
```
Manual verification (window=3, method='pearson'):
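As an illustrative hand-check of the Pearson formula on one full window (the inputs below are assumed values, not the library's reference data):

```python
import math

# Hand-check of the Pearson formula: x = [1, 2, 3], y = [2, 1, 4]
x, y = [1.0, 2.0, 3.0], [2.0, 1.0, 4.0]
mx, my = sum(x) / 3, sum(y) / 3
num = sum((a - mx) * (b - my) for a, b in zip(x, y))
den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
print(round(num / den, 4))  # 0.6547
```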
Pearson - incremental O(1) is possible.
The current implementation recomputes over the full buffer on every bar (O(n)).
An incremental version would maintain five running sums for the sliding window:
\(S_x\), \(S_y\), \(S_{xx}\), \(S_{yy}\), \(S_{xy}\). When bar \(t\) enters and bar \(t-n\)
is evicted, each sum is updated in O(1):

\(S_x \leftarrow S_x + x_t - x_{t-n}, \qquad S_{xy} \leftarrow S_{xy} + x_t y_t - x_{t-n} y_{t-n},\)

and similarly for \(S_y\), \(S_{xx}\), \(S_{yy}\); the correlation then follows as \(r = \frac{n S_{xy} - S_x S_y}{\sqrt{(n S_{xx} - S_x^2)(n S_{yy} - S_y^2)}}\).
This would reduce Pearson from ~384 ns to sub-10 ns at w200 - roughly on par with
EMA. The main caveat is numerical stability: when \(n \cdot S_{xx} \approx S_x^2\)
(near-constant series), catastrophic cancellation can occur. A Welford-style
compensated accumulator mitigates this but adds implementation complexity. The
two-pass approach currently used is unconditionally stable, which is why it was
chosen first.
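The five-sum scheme described above can be sketched as follows (illustrative, not the library's implementation; `IncrementalPearson` is a hypothetical name):

```python
import math
from collections import deque

class IncrementalPearson:
    """Sliding-window Pearson via five running sums, O(1) per update."""
    def __init__(self, window):
        self.window = window
        self.buf = deque()
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def update(self, x, y):
        if len(self.buf) == self.window:  # evict the oldest pair
            ox, oy = self.buf.popleft()
            self.sx -= ox; self.sy -= oy
            self.sxx -= ox * ox; self.syy -= oy * oy; self.sxy -= ox * oy
        self.buf.append((x, y))
        self.sx += x; self.sy += y
        self.sxx += x * x; self.syy += y * y; self.sxy += x * y
        n = len(self.buf)
        if n < self.window:
            return float("nan")
        num = n * self.sxy - self.sx * self.sy
        den = (n * self.sxx - self.sx ** 2) * (n * self.syy - self.sy ** 2)
        # den near zero is exactly the cancellation hazard noted above
        return num / math.sqrt(den) if den > 0 else float("nan")
```

This is the numerically naive variant; a production version would add the compensated accumulation discussed above.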
Spearman - no sub-O(n) approach known for a sliding window.
Every time one bar enters and one leaves, all ranks in the window can shift.
There is no way to avoid touching all n ranks on each update. With an
order-statistic tree (e.g. a Fenwick tree on compressed rank indices), individual
rank lookups become O(log n) each, reducing the ranking step to O(n log n) with
better cache behavior - but the asymptotic complexity stays O(n log n). The
downstream Pearson step on ranks is still O(n). Bottom line: Spearman at w200
is bounded below by O(n) and is unlikely to drop below a few hundred nanoseconds.
Kendall - incremental O(n) is possible (vs current O(n^2)).
When the window slides by one bar, only O(n) pairs change: the n-1 pairs
involving the evicted element are removed, and n-1 new pairs involving the
incoming element are added. A sliding-window implementation would maintain
running concordance and tie counts, updating them in O(n) rather than
recomputing all \(n(n-1)/2\) pairs from scratch. This should significantly
reduce the per-bar cost at large windows, with the goal of approaching the
1-2 µs range at w200.
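For the tie-free case, the O(n) sliding update can be sketched as follows (illustrative; a full tau-b version would also maintain the \(n_1\), \(n_2\) tie counts incrementally):

```python
from collections import deque

def _sign(a, b):
    # +1 concordant, -1 discordant (assumes no ties in either series)
    return 1 if (a[0] - b[0]) * (a[1] - b[1]) > 0 else -1

class SlidingKendall:
    """Tie-free Kendall tau over a sliding window, O(window) per update."""
    def __init__(self, window):
        self.window = window
        self.buf = deque()
        self.s = 0  # running C - D over all pairs in the buffer

    def update(self, x, y):
        if len(self.buf) == self.window:
            old = self.buf.popleft()  # remove the n-1 pairs of the evicted bar
            self.s -= sum(_sign(old, p) for p in self.buf)
        new = (x, y)
        self.s += sum(_sign(new, p) for p in self.buf)  # add the n-1 new pairs
        self.buf.append(new)
        n = len(self.buf)
        if n < self.window:
            return float("nan")
        return self.s / (n * (n - 1) / 2)
```

Each update touches only the pairs involving the evicted and incoming bars, never the full \(n(n-1)/2\) pair set.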
## AutoCorrelation
Streaming <19µs/update Research
Rolling autocorrelation of a single series at a fixed lag. Uses the same Pearson,
Spearman, and Kendall formulas as `Correlation`, with \(x = x_t\) and \(y = x_{t-\text{lag}}\)
over a sliding window of `window` observations.
| Name | Type | Constraint | Description |
|---|---|---|---|
| `inputs` | `list[str]` | len = 1 | Input column, e.g. `["close"]` |
| `window` | `int` | `>= 2` | Number of bars in each sub-window |
| `outputs` | `list[str]` | len = 1 | Output column, e.g. `["close_autocorr_20_1"]` |
| `lag` | `int` | `>= 1` | Lag in bars. Default: `1` |
| `method` | `str` | `'pearson'`, `'spearman'`, `'kendall'` | Correlation method. Default: `'pearson'` |
| Column | When valid | Description |
|---|---|---|
| `outputs[0]` | `t >= window + lag - 1`, no NaN in buffer, neither sub-window constant | Autocorrelation coefficient in `[-1, 1]` |
- **Warm-up.** The first `window + lag - 1` bars return `NaN`. A combined buffer of `window + lag` values is required for the first output.
- **NaN propagation.** A `NaN` input enters the buffer. Output stays `NaN` until the `NaN` is evicted after `window + lag` consecutive valid bars.
- **Constant series.** If either sub-window has zero variance over the window, the correlation is mathematically undefined and the output is `NaN`.
- **`reset()`.** Clears the buffer entirely. Call between backtest folds (CPCV, walk-forward) to avoid state leaking across splits.
- **Performance.** Same complexity and throughput as `Correlation` - the only difference is that both sub-windows are slices of the same buffer rather than two separate buffers.
| Situation | Output |
|---|---|
| `t < window + lag - 1` (buffer not full) | `NaN` |
| Buffer full, all values valid, neither sub-window constant | Autocorrelation value in `[-1, 1]` |
| Any NaN in the buffer | `NaN` |
| Either sub-window constant over the window | `NaN` |
| After `reset()` | `NaN` until buffer refills |
- **Lag 1.** Measures bar-to-bar persistence. A high positive value means the series tends to continue in the same direction (trending). A high negative value means the series tends to reverse (mean-reverting on a one-bar horizon).
- **Multiple lags.** Running `AutoCorrelation` at several lags simultaneously (e.g. 1, 5, 20 bars) produces a rolling ACF profile. Jumps in the ACF at periodic lags may reveal seasonality or microstructure patterns.
- **Spearman / Kendall.** Robust alternatives when the relationship is non-linear or when the series contains outliers. Same tradeoffs as in `Correlation`.
```python
from oryon.features import AutoCorrelation

# Lag-1 autocorrelation of close prices
ac = AutoCorrelation(inputs=["close"], window=20, outputs=["close_ac1_20"], lag=1)
ac.update([100.0])  # -> [NaN] (warm-up, need window + lag = 21 bars)
# ... feed 21 bars total for first valid output ...

# Lag-5 autocorrelation: weekly persistence in daily data
ac5 = AutoCorrelation(inputs=["close"], window=20, outputs=["close_ac5_20"], lag=5)
```
Manual verification (window=3, lag=1, method='pearson'):
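As an illustrative hand-check with assumed values (not the library's reference data): with `window=3` and `lag=1`, the last 3 values are correlated against the 3 values one bar earlier.

```python
import math

# Assumed series x = [1, 2, 4, 3]; current sub-window vs lagged sub-window
x = [1.0, 2.0, 4.0, 3.0]
cur, lagged = x[1:], x[:-1]  # [2, 4, 3] vs [1, 2, 4]
ma, mb = sum(cur) / 3, sum(lagged) / 3
num = sum((a - ma) * (b - mb) for a, b in zip(cur, lagged))
den = math.sqrt(sum((a - ma) ** 2 for a in cur) * sum((b - mb) ** 2 for b in lagged))
print(round(num / den, 4))  # 0.3273
```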