RollingKurt

Description

The RollingKurt class computes the excess kurtosis of a data sequence over a moving window of a specified size. Kurtosis measures the “tailedness” of the distribution of the values in the window. The excess form is shifted so that a normal distribution scores zero: positive values indicate heavier tails than a normal distribution, negative values lighter tails. A bias (sample) correction is applied, which makes the estimate more accurate when the window contains few samples.
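To make the definition concrete, here is a minimal reference sketch of a bias-corrected excess kurtosis for a single window. It assumes the correction is the standard sample estimator (often written G2), which is also what pandas uses for kurt; this page does not state the exact formula, and the function name sample_excess_kurtosis is hypothetical rather than part of the library.

```python
def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (G2 estimator, assumed here)."""
    n = len(values)
    if n < 4:
        return float("nan")  # the correction factor is undefined below 4 samples
    mean = sum(values) / n
    # Second and fourth central moments (population form, divided by n)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0.0:
        return float("nan")  # constant window: kurtosis is undefined
    g2 = m4 / (m2 * m2) - 3.0  # uncorrected excess kurtosis
    # Small-sample correction
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


print(sample_excess_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0]))
```

For the evenly spaced values 1 to 5 this gives approximately -1.2, a lighter-tailed result than a normal distribution.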

Initial values: The constructor requires a positive integer window_size parameter to define the rolling window.
NaN handling: NaN values are not handled natively and should be preprocessed if necessary.
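Two common ways to preprocess NaN values before passing data to RollingKurt are sketched below. The helper names drop_nans and forward_fill are hypothetical and not part of the library; which strategy is appropriate depends on the data.

```python
import math


def drop_nans(values):
    """Remove NaN entries. Note that this shortens the sequence."""
    return [x for x in values if not math.isnan(x)]


def forward_fill(values):
    """Replace each NaN with the last valid value seen before it.

    Leading NaNs have no earlier value and are left as NaN.
    """
    filled = []
    last = math.nan
    for x in values:
        if not math.isnan(x):
            last = x
        filled.append(last)
    return filled


raw = [1.0, math.nan, 3.0, math.nan, math.nan, 6.0]
print(drop_nans(raw))      # [1.0, 3.0, 6.0]
print(forward_fill(raw))   # [1.0, 1.0, 3.0, 3.0, 3.0, 6.0]
```

Dropping values changes which observations share a window, while forward filling keeps the original indexing at the cost of repeating values, which can distort the kurtosis estimate.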

Usage Example and Plot

Below is an example of using RollingKurt to calculate the rolling kurtosis of a random dataset, along with a plot illustrating its output.

import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from screamer import RollingKurt

# Generate example data
data = np.cumsum(np.random.normal(size=300))

# Create subplots with specified row heights and shared x-axis
fig = make_subplots(
    rows=2, cols=1,
    shared_xaxes=True,
    row_heights=[2/3, 1/3],
    vertical_spacing=0.1
)

# Add traces for each subplot
fig.add_trace(go.Scatter(y=data, mode='lines', name='Input Data'), row=1, col=1)
fig.add_trace(go.Scatter(y=RollingKurt(30)(data), mode='lines', name='Rolling Kurtosis', line=dict(color='red')), row=2, col=1)

# Update layout with titles and axis labels
fig.update_layout(
    title="Rolling Kurtosis with Window Size 30",
    xaxis_title="Index",
    yaxis=dict(title="Input Data"),
    yaxis2=dict(title="Rolling Kurtosis", range=[-2, 4]),
    margin=dict(l=20, r=20, t=80, b=20),
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1)
)

fig.show()

Implementation Details

Algorithm

RollingKurt keeps the values of the current window in a cyclic buffer and uses it to accumulate the windowed statistics as new elements arrive.
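The idea of a cyclic buffer feeding accumulated statistics can be illustrated with the sketch below. It is not the library's actual implementation: it assumes running sums of the first four powers, the standard bias-corrected estimator, and NaN output until the window is full. The class name RollingKurtSketch is hypothetical.

```python
import math


class RollingKurtSketch:
    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be a positive integer")
        self.window_size = window_size
        self.buffer = [0.0] * window_size  # cyclic buffer of window values
        self.index = 0                     # next slot to overwrite (oldest value once full)
        self.count = 0                     # number of values currently stored
        self.s1 = self.s2 = self.s3 = self.s4 = 0.0  # running power sums

    def update(self, x):
        if self.count == self.window_size:
            # Window is full: remove the oldest value from the running sums
            old = self.buffer[self.index]
            self.s1 -= old
            self.s2 -= old ** 2
            self.s3 -= old ** 3
            self.s4 -= old ** 4
        else:
            self.count += 1
        self.buffer[self.index] = x
        self.index = (self.index + 1) % self.window_size
        self.s1 += x
        self.s2 += x ** 2
        self.s3 += x ** 3
        self.s4 += x ** 4

        n = self.count
        if n < self.window_size or n < 4:
            return math.nan  # assumed warm-up behaviour
        mean = self.s1 / n
        # Central moments recovered from the raw power sums
        m2 = self.s2 / n - mean ** 2
        m4 = (self.s4 / n - 4 * mean * self.s3 / n
              + 6 * mean ** 2 * self.s2 / n - 3 * mean ** 4)
        if m2 <= 0.0:
            return math.nan
        g2 = m4 / (m2 * m2) - 3.0
        return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


roller = RollingKurtSketch(5)
outputs = [roller.update(v) for v in [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]]
print(outputs[-1])
```

Each update does a constant amount of work regardless of the window size. Raw power sums are simple but can lose precision when values are large relative to their spread, so a production implementation may use a more numerically stable update.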

Complexity

  • Time Complexity: O(1) per new element, since each update only writes one slot of the cyclic buffer and adjusts the accumulated statistics.

  • Space Complexity: O(window_size), as only elements within the current window are stored.

Performance

  • Short streams (n=1,000): 120% faster than pandas rolling kurt

  • Longer streams (n=1,000,000): 400% faster than pandas rolling kurt