Statistics.

The postfix complex calculator can compute simple descriptive statistics, both univariate and bivariate.

[Control panel: stats, weight (n, n − 1), data count, univar, bivar]

Some preliminaries:

♦ For univariate calculations, n stands for the number of points, and a_k for any individual point.

♦ For bivariate calculations, n represents the number of pairs of points; within each pair, a_k is one member and b_k the other. It is possible to regard the a_k's as independent variables, and the b_k's as dependent — or vice versa. The letters a and b were chosen to correspond to their data sources, stack registers a and b respectively.

♦ The subscript k varies from 1 to n inclusive.

♦ Each complex number can be broken into real and imaginary parts: a_k = u_k + iv_k and b_k = x_k + iy_k; where u_k, v_k, x_k and y_k are real numbers.

♦ The Greek majuscule sigma is used for summation. For instance, the sum of all the a_k's is written Σ a_k. In every case, the index of summation runs between 1 and n. Note the difference between Σ (x_k²) and (Σ x_k)².

♦ The Greek minuscule mu, suitably subscripted, is used for means. For instance μ_a is the mean of the a_k's, and μ_x is the mean of the x_k's. Often when the statistics of real numbers is discussed, the mean of the a_k's is denoted a̅, but the overbar is also the symbol for complex conjugation, so it is avoided here.

♦ The superscript c is used for complex conjugation: a_k^c = (u_k + iv_k)^c = u_k − iv_k.

♦ The Greek minuscule sigma, subscripted and squared, is used for variances, which turn out to be real numbers even with complex data. For example, σ_a² is the variance of the a_k's. The standard deviation, not directly provided by the calculator, is the nonnegative square root of the variance: for instance, σ_a is the standard deviation of the a_k's. Meanwhile, σ_ab (not squared) stands for the covariance between the a_k's and b_k's.

♦ Variances and covariances can be weighted according to n or n − 1. If an entire population is measured, n weighting is the usual choice; if only a sample of the population is measured, n − 1 is typically preferred.

♦ In the bivariate case, ρ_ab is used for the correlation of the a_k's and b_k's.

♦ An additional feature is that the covariance and correlation between a_k's real and imaginary parts u_k and v_k are available; similarly between b_k's parts x_k and y_k. Statistics between radius and angle are not attempted, one reason being that there are multiple plausible ways to define the mean of a set of angles.

♦ The notation aa^c denotes the square of the magnitude of a (magnitude being a synonym for radius); this product is a real number. It is read as a(a^c), not (aa)^c, and it equals a^ca.
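The notation above maps directly onto the complex type of most programming languages. The following minimal Python sketch illustrates the conjugate, the product aa^c, and the difference between Σ (x_k²) and (Σ x_k)². The function name squared_magnitude and the sample values are hypothetical, chosen only for illustration; nothing here is taken from the calculator's own code.

```python
def squared_magnitude(a: complex) -> float:
    """Return a(a^c), the square of the magnitude (radius) of a."""
    product = a * a.conjugate()      # (u + iv)(u - iv) = u*u + v*v
    return product.real


a = 3 + 4j                           # a = u + iv with u = 3, v = 4
u, v = a.real, a.imag
a_conj = a.conjugate()               # a^c = u - iv

print(squared_magnitude(a))          # 25.0, that is u*u + v*v
print((a * a).conjugate())           # (aa)^c, a different and non-real quantity

xs = [1.0, 2.0, 3.0]
sum_of_squares = sum(x * x for x in xs)   # Σ (x_k²) = 14.0
square_of_sum = sum(xs) ** 2              # (Σ x_k)² = 36.0
```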

Here are the statistical controls.

(unlabeled clear button)
• clears all statistical data so that a fresh calculation can begin
• enables the univar and bivar radio buttons
• does not affect the stack
weight: n, n − 1
• selects n or n − 1 weighting
data count
• in univariate mode, tells how many points are stored
• in bivariate mode, tells how many pairs of data points are stored
univar
• selects univariate mode
• disables buttons specific to bivariate mode
(unlabeled univariate insert button)
• inserts a data point popped from stack register a
• disables the univar and bivar radio buttons
• operates only in univariate mode
bivar
• selects bivariate mode
• disables buttons specific to univariate mode
(unlabeled bivariate insert button)
• inserts a pair of data points popped from stack registers a and b
• disables the univar and bivar radio buttons
• operates only in bivariate mode

The following operations produce a result and push it into register a of the stack. To calculate anything, at least two points are required.

mean of the a values	mean of the b values
variance of the a values	variance of the b values
variance of a's real parts	variance of b's real parts
variance of a's imaginary parts	variance of b's imaginary parts
covariance between a's real and imaginary parts	covariance between b's real and imaginary parts
correlation between a's real and imaginary parts	correlation between b's real and imaginary parts

a independent, b dependent	b independent, a dependent
slope of the regression line	slope of the regression line
b-intercept of the regression line	a-intercept of the regression line

covariance between the a and b data sets	correlation between the a and b data sets

On the buttons, the notation b(a) is meant to suggest b as a function of a, hence a independent and b dependent. Regression is linear least squares.

Should the mean of a's real parts be required, it can be found as the real part of the mean of the (complex) a values; similarly for the imaginaries; further similarly for the b's.

It is not possible to change between the univariate and bivariate modes while statistical data is stored; this is to prevent the garbling of data.

The calculator employs procedures equivalent to the following "textbook" formulas:

item	n weighting	n − 1 weighting	field
a mean	μ_a = Σ a_k ÷ n	same formula	complex
a variance	σ_a² = Σ (a_k − μ_a) (a_k − μ_a)^c ÷ n	σ_a² = Σ (a_k − μ_a) (a_k − μ_a)^c ÷ (n − 1)	real
a real part variance	σ_u² = Σ (u_k − μ_u)² ÷ n	σ_u² = Σ (u_k − μ_u)² ÷ (n − 1)	real
a imag part variance	σ_v² = Σ (v_k − μ_v)² ÷ n	σ_v² = Σ (v_k − μ_v)² ÷ (n − 1)	real
a real-imag covariance	σ_uv = Σ (u_k − μ_u) (v_k − μ_v) ÷ n	σ_uv = Σ (u_k − μ_u) (v_k − μ_v) ÷ (n − 1)	real
a real-imag correlation	ρ_uv = σ_uv ÷ (σ_uσ_v)	same formula	real
b mean	μ_b = Σ b_k ÷ n	same formula	complex
b variance	σ_b² = Σ (b_k − μ_b) (b_k − μ_b)^c ÷ n	σ_b² = Σ (b_k − μ_b) (b_k − μ_b)^c ÷ (n − 1)	real
b real part variance	σ_x² = Σ (x_k − μ_x)² ÷ n	σ_x² = Σ (x_k − μ_x)² ÷ (n − 1)	real
b imag part variance	σ_y² = Σ (y_k − μ_y)² ÷ n	σ_y² = Σ (y_k − μ_y)² ÷ (n − 1)	real
b real-imag covariance	σ_xy = Σ (x_k − μ_x) (y_k − μ_y) ÷ n	σ_xy = Σ (x_k − μ_x) (y_k − μ_y) ÷ (n − 1)	real
b real-imag correlation	ρ_xy = σ_xy ÷ (σ_xσ_y)	same formula	real
a-b covariance	σ_ab = Σ (a_k − μ_a) (b_k − μ_b)^c ÷ n	σ_ab = Σ (a_k − μ_a) (b_k − μ_b)^c ÷ (n − 1)	complex
a-b correlation	ρ_ab = σ_ab ÷ (σ_aσ_b)	same formula	complex

An equally valid a-b covariance definition would have conjugated (a_k − μ_a) rather than (b_k − μ_b). Such a change would have the effect of conjugating the covariance itself, and consequently the correlation. Another way to conjugate the covariance and correlation is to exchange the a and b data sets.

The correlation will have a magnitude that is no greater than one, and will have a magnitude of exactly unity whenever the regression line fits the data perfectly. Note that in the correlation formulas, the σ's are not squared.
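As a rough illustration of the "textbook" definitions, here is a minimal Python sketch that evaluates them directly from a list of data. The function names, the sample flag (True for n − 1 weighting) and the data values are all assumptions made for this example; the calculator's own procedures are only stated to be equivalent to these formulas.

```python
import math


def mean(zs):
    """μ = Σ z_k ÷ n."""
    return sum(zs) / len(zs)


def covariance(a_data, b_data, sample=False):
    """σ_ab = Σ (a_k − μ_a)(b_k − μ_b)^c ÷ n, or ÷ (n − 1) if sample is True."""
    n = len(a_data)
    mu_a, mu_b = mean(a_data), mean(b_data)
    total = sum((a - mu_a) * (b - mu_b).conjugate()
                for a, b in zip(a_data, b_data))
    return total / (n - 1 if sample else n)


def variance(zs, sample=False):
    """σ_a² is the covariance of a data set with itself, which is real."""
    return covariance(zs, zs, sample).real


def correlation(a_data, b_data, sample=False):
    """ρ_ab = σ_ab ÷ (σ_a σ_b); the σ's in the divisor are not squared."""
    sigma_a = math.sqrt(variance(a_data, sample))
    sigma_b = math.sqrt(variance(b_data, sample))
    return covariance(a_data, b_data, sample) / (sigma_a * sigma_b)


# Hypothetical data, for illustration only.
a_data = [1 + 2j, 2 - 1j, -1 + 0.5j, 3 + 3j]
b_data = [0.5 - 1j, 2 + 2j, 1 + 1j, -2 + 0.5j]

rho = correlation(a_data, b_data)
rho_swapped = correlation(b_data, a_data)   # conjugate of rho, up to rounding

# Data lying exactly on a line b = t + s*a gives a magnitude of one.
line_b = [(2 - 1j) + (0.5 + 3j) * a for a in a_data]
rho_line = correlation(a_data, line_b)
```

With these definitions the mean of a's real parts is simply mean(a_data).real, and variance(a_data) equals the sum of the variances of the real and imaginary parts.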

The table below gives formulas for the two a-b regression lines, which in general do not coincide, but are often close.

a independent, b dependent
slope	s′ = (σ_ab)^c ÷ σ_a²
intercept	t′ = μ_b − s′μ_a
regression line	b = t′ + s′a

b independent, a dependent
slope	s″ = σ_ab ÷ σ_b²
intercept	t″ = μ_a − s″μ_b
regression line	a = t″ + s″b

Contrast the conjugation of σ_ab in s′ with the lack of conjugation in s″. Because of it, the product s′s″ = σ_ab (σ_ab)^c ÷ (σ_a²σ_b²) is a real number, equal to the squared magnitude of ρ_ab.
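The two regression lines can be sketched in Python as follows. This is an illustration under stated assumptions, not the calculator's implementation: the function name regression_lines and the data are hypothetical, and n weighting is used throughout (the weighting cancels in the slopes).

```python
def mean(zs):
    return sum(zs) / len(zs)


def regression_lines(a_data, b_data):
    """Return ((s1, t1), (s2, t2)) for b = t1 + s1*a and a = t2 + s2*b."""
    n = len(a_data)
    mu_a, mu_b = mean(a_data), mean(b_data)
    var_a = sum(abs(a - mu_a) ** 2 for a in a_data) / n     # σ_a², real
    var_b = sum(abs(b - mu_b) ** 2 for b in b_data) / n     # σ_b², real
    cov_ab = sum((a - mu_a) * (b - mu_b).conjugate()
                 for a, b in zip(a_data, b_data)) / n       # σ_ab, complex
    s1 = cov_ab.conjugate() / var_a      # s′: a independent, b dependent
    t1 = mu_b - s1 * mu_a                # t′
    s2 = cov_ab / var_b                  # s″: b independent, a dependent
    t2 = mu_a - s2 * mu_b                # t″
    return (s1, t1), (s2, t2)


# Hypothetical data, for illustration only.
a_data = [1 + 2j, 2 - 1j, -1 + 0.5j, 3 + 3j]
b_data = [0.5 - 1j, 2 + 2j, 1 + 1j, -2 + 0.5j]

(s1, t1), (s2, t2) = regression_lines(a_data, b_data)
product = s1 * s2                        # imaginary part is zero up to rounding

# Data generated from an exact line are recovered by the first regression.
line_b = [(2 - 1j) + (0.5 + 3j) * a for a in a_data]
(ls1, lt1), (ls2, lt2) = regression_lines(a_data, line_b)
```

In the exact-line case the two regression lines coincide, so ls1 * ls2 is 1 up to rounding.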

For reasons of efficiency, some formulas used by the implementation differ from the "textbook" formulas above.

Each time a data item is inserted, the calculator updates the accumulators listed in the table below; the data item itself is not stored.

Internal accumulators (all real)
univariate and bivariate	Σ u_k	Σ v_k	Σ (u_k²)	Σ (v_k²)	Σ (u_kv_k)	n
bivariate only	Σ x_k	Σ y_k	Σ (x_k²)	Σ (y_k²)	Σ (x_ky_k)
bivariate only	Σ (u_ky_k)	Σ (v_ky_k)	Σ (u_kx_k)	Σ (v_kx_k)

Using values in the accumulators, the calculator can produce various statistics on demand, with no need to iterate through every data item. In the table below are formulas for the variances and covariances; the real and imaginary parts of σ_ab are given separately for ease of reading.

n weighting	n − 1 weighting	field
σ_u² = Σ (u_k²) ÷ n − (Σ u_k)² ÷ n²	σ_u² = Σ (u_k²) ÷ (n − 1) − (Σ u_k)² ÷ (n² − n)	real
σ_v² = Σ (v_k²) ÷ n − (Σ v_k)² ÷ n²	σ_v² = Σ (v_k²) ÷ (n − 1) − (Σ v_k)² ÷ (n² − n)	real
σ_a² = σ_u² + σ_v²	σ_a² = σ_u² + σ_v²	real
σ_x² = Σ (x_k²) ÷ n − (Σ x_k)² ÷ n²	σ_x² = Σ (x_k²) ÷ (n − 1) − (Σ x_k)² ÷ (n² − n)	real
σ_y² = Σ (y_k²) ÷ n − (Σ y_k)² ÷ n²	σ_y² = Σ (y_k²) ÷ (n − 1) − (Σ y_k)² ÷ (n² − n)	real
σ_b² = σ_x² + σ_y²	σ_b² = σ_x² + σ_y²	real
σ_uv = Σ (u_kv_k) ÷ n − (Σ u_k) (Σ v_k) ÷ n²	σ_uv = Σ (u_kv_k) ÷ (n − 1) − (Σ u_k) (Σ v_k) ÷ (n² − n)	real
σ_xy = Σ (x_ky_k) ÷ n − (Σ x_k) (Σ y_k) ÷ n²	σ_xy = Σ (x_ky_k) ÷ (n − 1) − (Σ x_k) (Σ y_k) ÷ (n² − n)	real

σ_ab (complex), n weighting:
real (σ_ab) = Σ (u_kx_k) ÷ n + Σ (v_ky_k) ÷ n − (Σ u_k) (Σ x_k) ÷ n² − (Σ v_k) (Σ y_k) ÷ n²
imag (σ_ab) = Σ (v_kx_k) ÷ n − Σ (u_ky_k) ÷ n − (Σ v_k) (Σ x_k) ÷ n² + (Σ u_k) (Σ y_k) ÷ n²

σ_ab (complex), n − 1 weighting:
real (σ_ab) = Σ (u_kx_k) ÷ (n − 1) + Σ (v_ky_k) ÷ (n − 1) − (Σ u_k) (Σ x_k) ÷ (n² − n) − (Σ v_k) (Σ y_k) ÷ (n² − n)
imag (σ_ab) = Σ (v_kx_k) ÷ (n − 1) − Σ (u_ky_k) ÷ (n − 1) − (Σ v_k) (Σ x_k) ÷ (n² − n) + (Σ u_k) (Σ y_k) ÷ (n² − n)

The correlations, and the slopes and intercepts of the regression lines, are calculated from these.
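The accumulator scheme can be sketched in Python as below. The class name StatAccumulators, its attribute names and the sample flag (True for n − 1 weighting) are hypothetical; the sketch only mirrors the sums and formulas listed above and makes no claim about how the calculator is actually coded.

```python
class StatAccumulators:
    """Running real-valued sums for pairs (a_k, b_k); no data items are kept."""

    def __init__(self):
        self.n = 0
        self.su = self.sv = self.suu = self.svv = self.suv = 0.0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0
        self.suy = self.svy = self.sux = self.svx = 0.0

    def insert(self, a, b=0j):
        """Add one point (univariate) or one pair (bivariate) to the sums."""
        a, b = complex(a), complex(b)
        u, v, x, y = a.real, a.imag, b.real, b.imag
        self.n += 1
        self.su += u
        self.sv += v
        self.suu += u * u
        self.svv += v * v
        self.suv += u * v
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        self.suy += u * y
        self.svy += v * y
        self.sux += u * x
        self.svx += v * x

    def _cov(self, s_pq, s_p, s_q, sample):
        """Covariance of two real series from Σ(p_k q_k), Σ p_k and Σ q_k."""
        n = self.n
        if sample:
            return s_pq / (n - 1) - s_p * s_q / (n * n - n)
        return s_pq / n - s_p * s_q / (n * n)

    def var_a(self, sample=False):
        """σ_a² = σ_u² + σ_v²."""
        return (self._cov(self.suu, self.su, self.su, sample)
                + self._cov(self.svv, self.sv, self.sv, sample))

    def cov_ab(self, sample=False):
        """σ_ab assembled from its real and imaginary parts."""
        real = (self._cov(self.sux, self.su, self.sx, sample)
                + self._cov(self.svy, self.sv, self.sy, sample))
        imag = (self._cov(self.svx, self.sv, self.sx, sample)
                - self._cov(self.suy, self.su, self.sy, sample))
        return complex(real, imag)


# Hypothetical data, for illustration only.
a_data = [1 + 2j, 2 - 1j, -1 + 0.5j, 3 + 3j]
b_data = [0.5 - 1j, 2 + 2j, 1 + 1j, -2 + 0.5j]

acc = StatAccumulators()
for a, b in zip(a_data, b_data):
    acc.insert(a, b)

# Direct evaluation of the textbook formula, for comparison.
n = len(a_data)
mu_a, mu_b = sum(a_data) / n, sum(b_data) / n
direct_cov = sum((a - mu_a) * (b - mu_b).conjugate()
                 for a, b in zip(a_data, b_data)) / n
direct_var_a = sum(abs(a - mu_a) ** 2 for a in a_data) / n
```

Both routes agree to within rounding for data like these. Sum-based formulas of this kind can lose precision when the mean is large compared with the spread of the data, which is a general property of the approach rather than anything specific to this calculator.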

These statistics formulas are extended from those used for real numbers by the TI-59 calculator. They are presented here in detail because very few sources cover regression with complex variables.

The formulas for slope and intercept are consistent with the matrix solution given by whuber:

β̂ = (X^ctX)⁻¹X^ctz

where:

β̂ is a vector with two components:
- β̂₀ = intercept;
- β̂₁ = slope.
X is a matrix with two columns and one row for each pair of data points:
- every X_k,1 is 1;
- each X_k,2 is the kth member of the independent data set.
X^ct is the conjugate transpose of X.
(X^ctX)⁻¹ is the matrix inverse of X^ctX.
z is a vector containing the dependent data set, where z_k is the kth member of the dependent data set.

Although many authors give the corresponding matrix formula for real numbers, whuber is one of the few to develop it for the complex case.
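The agreement between the matrix formula and the slope and intercept formulas can be checked numerically. The following Python sketch writes out the two-column case by hand, so no linear algebra library is needed. The function names and data are hypothetical, and the code is an illustration of the formula as stated here, not a transcription of whuber's derivation.

```python
def normal_equations(indep, dep):
    """Solve β̂ = (X^ct X)⁻¹ X^ct z, where row k of X is [1, indep_k]."""
    n = len(indep)
    # X^ct X is the 2-by-2 matrix [[m00, m01], [m10, m11]].
    m00 = n
    m01 = sum(indep)
    m10 = sum(x.conjugate() for x in indep)
    m11 = sum((x.conjugate() * x).real for x in indep)
    # X^ct z is the 2-vector [r0, r1].
    r0 = sum(dep)
    r1 = sum(x.conjugate() * z for x, z in zip(indep, dep))
    # Invert the 2-by-2 matrix explicitly and apply it to [r0, r1].
    det = m00 * m11 - m01 * m10
    beta0 = (m11 * r0 - m01 * r1) / det      # intercept
    beta1 = (m00 * r1 - m10 * r0) / det      # slope
    return beta0, beta1


def slope_intercept(indep, dep):
    """Slope and intercept from the covariance-based formulas."""
    n = len(indep)
    mu_i, mu_d = sum(indep) / n, sum(dep) / n
    cov = sum((x - mu_i) * (z - mu_d).conjugate()
              for x, z in zip(indep, dep)) / n
    var_i = sum(abs(x - mu_i) ** 2 for x in indep) / n
    s = cov.conjugate() / var_i
    return mu_d - s * mu_i, s


# Hypothetical data, for illustration only.
a_data = [1 + 2j, 2 - 1j, -1 + 0.5j, 3 + 3j]
b_data = [0.5 - 1j, 2 + 2j, 1 + 1j, -2 + 0.5j]

beta0, beta1 = normal_equations(a_data, b_data)   # a independent, b dependent
t1, s1 = slope_intercept(a_data, b_data)
```

Calling both functions with b_data as the independent set and a_data as the dependent set gives the second regression line in the same way.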
