Statistics.

The postfix complex calculator has facilities to figure simple descriptive statistics, both univariate and bivariate.

(Figure: statistics controls, labeled stats, univar, bivar, weight n / n − 1 and data count.)

Some preliminaries:

♦ For univariate calculations, n stands for the number of points, and $a_k$ for any individual point.

♦ For bivariate calculations, n represents the number of pairs of points; within each pair, $a_k$ is one member and $b_k$ the other. It is possible to regard the $a_k$'s as independent variables and the $b_k$'s as dependent, or vice versa. The letters a and b were chosen to correspond to their data sources, stack registers a and b respectively.

♦ The subscript k varies from 1 to n inclusive.

♦ Each complex number can be broken into real and imaginary parts: $a_k = u_k + i v_k$ and $b_k = x_k + i y_k$, where $u_k$, $v_k$, $x_k$ and $y_k$ are real numbers.

♦ The Greek majuscule sigma is used for summation. For instance, the sum of all the $a_k$'s is written $\sum a_k$. In every case, the index of summation runs between 1 and n. Note the difference between $\sum (x_k^2)$ and $(\sum x_k)^2$.

♦ The Greek minuscule mu, suitably subscripted, is used for means. For instance, $\mu_a$ is the mean of the $a_k$'s, and $\mu_x$ is the mean of the $x_k$'s. Often when the statistics of real numbers are discussed, the mean of the $a_k$'s is denoted $\bar{a}$, but the overbar is also the symbol for complex conjugation, so it is avoided here.

♦ The superscript c is used for complex conjugation: $a_k^c = (u_k + i v_k)^c = u_k - i v_k$.

♦ The Greek minuscule sigma, subscripted and squared, is used for variances, which turn out to be real numbers even with complex data. For example, $\sigma_a^2$ is the variance of the $a_k$'s. The standard deviation, not directly provided by the calculator, is the nonnegative square root of the variance: for instance, $\sigma_a$ is the standard deviation of the $a_k$'s. Meanwhile, $\sigma_{ab}$ (not squared) stands for the covariance between the $a_k$'s and $b_k$'s.

♦ Variances and covariances can be weighted according to n or n − 1. If an entire population is measured, n weighting is the usual choice; if only a sample of the population is measured, n − 1 is typically preferred.

♦ In the bivariate case, $\rho_{ab}$ is used for the correlation of the $a_k$'s and $b_k$'s.

♦ An additional feature is that the covariance and correlation between the real and imaginary parts $u_k$ and $v_k$ of the $a_k$'s are available; similarly between the parts $x_k$ and $y_k$ of the $b_k$'s. Statistics between radius and angle are not attempted, one reason being that there are multiple plausible ways to define the mean of a set of angles.

♦ The notation $a a^c$ denotes the square of the magnitude of a (magnitude being a synonym for radius); it is a real number. The interpretation is $a(a^c)$, not $(aa)^c$. It equals $a^c a$.
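A quick check of this notation with Python's built-in complex type, where `conjugate()` plays the role of the superscript c. The values are arbitrary and chosen only for illustration.

```python
# Notation check: a^c, a a^c = |a|^2, and Σ (x_k^2) versus (Σ x_k)^2.
a = 3 + 4j
a_c = a.conjugate()                 # a^c = u - iv
mag_sq = a * a_c                    # a(a^c): real, the squared magnitude
print(a_c)                          # (3-4j)
print(mag_sq, abs(a) ** 2)          # (25+0j) 25.0
print(a_c * a == mag_sq)            # True: a^c a gives the same value

xs = [1.0, 2.0, 3.0]
sum_of_squares = sum(x * x for x in xs)   # Σ (x_k^2)
square_of_sum = sum(xs) ** 2              # (Σ x_k)^2
print(sum_of_squares, square_of_sum)      # 14.0 36.0
```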

Here are the statistical controls.

 • clear control: clears all statistical data so that a fresh calculation can begin; enables the univar and bivar radio buttons; does not affect the stack.
 • weight n / n − 1: selects n or n − 1 weighting.
 • data count: in univariate mode, tells how many points are stored; in bivariate mode, tells how many pairs of data points are stored.
 • univar: selects univariate mode; disables buttons specific to bivariate mode.
 • univariate insert control: inserts a data point popped from stack register a; disables the univar and bivar radio buttons; operates only in univariate mode.
 • bivar: selects bivariate mode; disables buttons specific to univariate mode.
 • bivariate insert control: inserts a pair of data points popped from stack registers a and b; disables the univar and bivar radio buttons; operates only in bivariate mode.

The following operations produce a result and push it into register a of the stack. To calculate anything, at least two points are required.

 • mean of the a values
 • mean of the b values
 • variance of the a values
 • variance of the b values
 • variance of a's real parts
 • variance of b's real parts
 • variance of a's imaginary parts
 • variance of b's imaginary parts
 • covariance between a's real and imaginary parts
 • covariance between b's real and imaginary parts
 • correlation between a's real and imaginary parts
 • correlation between b's real and imaginary parts
 • a independent, b dependent: slope of the regression line; b-intercept of the regression line
 • b independent, a dependent: slope of the regression line; a-intercept of the regression line
 • covariance between the a and b data sets
 • correlation between the a and b data sets

On the buttons, the notation b(a) is meant to suggest b as a function of a, hence a independent and b dependent. Regression is by linear least squares.

Should the mean of a's real parts be required, it can be found as the real part of the mean of the (complex) a values; similarly for the imaginary parts, and likewise for the b values.

It is not possible to change between the univariate and bivariate modes while statistical data is stored; this is to prevent the garbling of data.
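The mode rules can be pictured as a small state machine. The Python class below is a hypothetical model for illustration only, not the calculator's code: the names `StatsControls`, `select_mode`, `insert_point`, `insert_pair` and `clear` are invented, the starting mode is an assumption, and only the data count is tracked.

```python
class StatsControls:
    """Hypothetical model of the statistics controls (illustrative names)."""

    def __init__(self) -> None:
        self.mode = "univar"   # "univar" or "bivar"; the default is an assumption
        self.count = 0         # what the data count control reports

    @property
    def mode_buttons_enabled(self) -> bool:
        # Inserting data disables the univar/bivar radio buttons; clearing re-enables them.
        return self.count == 0

    def select_mode(self, mode: str) -> None:
        if mode not in ("univar", "bivar"):
            raise ValueError(f"unknown mode: {mode}")
        if not self.mode_buttons_enabled:
            raise RuntimeError("mode cannot change while statistical data is stored")
        self.mode = mode

    def insert_point(self, a: complex) -> None:
        # Univariate insertion (a would be popped from stack register a).
        if self.mode != "univar":
            raise RuntimeError("point insertion operates only in univariate mode")
        self.count += 1

    def insert_pair(self, a: complex, b: complex) -> None:
        # Bivariate insertion (a and b would be popped from registers a and b).
        if self.mode != "bivar":
            raise RuntimeError("pair insertion operates only in bivariate mode")
        self.count += 1

    def clear(self) -> None:
        # Discards all statistical data; the stack is not touched.
        self.count = 0


panel = StatsControls()
panel.select_mode("bivar")
panel.insert_pair(1 + 2j, 3 - 1j)
try:
    panel.select_mode("univar")
except RuntimeError as err:
    print(err)        # mode cannot change while statistical data is stored
panel.clear()
panel.select_mode("univar")
print(panel.mode)     # univar
```

Tying the lock to the stored count, rather than to a separate flag, keeps the rule "no mode change while data is stored" true by construction.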

The calculator employs procedures equivalent to the following "textbook" formulas:

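$$
\begin{array}{l|l|l|l}
\text{item} & n \text{ weighting} & n-1 \text{ weighting} & \text{field} \\ \hline
\text{a mean} & \mu_a = \sum a_k \div n & \mu_a = \sum a_k \div n & \text{complex} \\
\text{a variance} & \sigma_a^2 = \sum (a_k-\mu_a)(a_k-\mu_a)^c \div n & \sigma_a^2 = \sum (a_k-\mu_a)(a_k-\mu_a)^c \div (n-1) & \text{real} \\
\text{a real part variance} & \sigma_u^2 = \sum (u_k-\mu_u)^2 \div n & \sigma_u^2 = \sum (u_k-\mu_u)^2 \div (n-1) & \text{real} \\
\text{a imag part variance} & \sigma_v^2 = \sum (v_k-\mu_v)^2 \div n & \sigma_v^2 = \sum (v_k-\mu_v)^2 \div (n-1) & \text{real} \\
\text{a real-imag covariance} & \sigma_{uv} = \sum (u_k-\mu_u)(v_k-\mu_v) \div n & \sigma_{uv} = \sum (u_k-\mu_u)(v_k-\mu_v) \div (n-1) & \text{real} \\
\text{a real-imag correlation} & \rho_{uv} = \sigma_{uv} \div (\sigma_u \sigma_v) & \rho_{uv} = \sigma_{uv} \div (\sigma_u \sigma_v) & \text{real} \\ \hline
\text{b mean} & \mu_b = \sum b_k \div n & \mu_b = \sum b_k \div n & \text{complex} \\
\text{b variance} & \sigma_b^2 = \sum (b_k-\mu_b)(b_k-\mu_b)^c \div n & \sigma_b^2 = \sum (b_k-\mu_b)(b_k-\mu_b)^c \div (n-1) & \text{real} \\
\text{b real part variance} & \sigma_x^2 = \sum (x_k-\mu_x)^2 \div n & \sigma_x^2 = \sum (x_k-\mu_x)^2 \div (n-1) & \text{real} \\
\text{b imag part variance} & \sigma_y^2 = \sum (y_k-\mu_y)^2 \div n & \sigma_y^2 = \sum (y_k-\mu_y)^2 \div (n-1) & \text{real} \\
\text{b real-imag covariance} & \sigma_{xy} = \sum (x_k-\mu_x)(y_k-\mu_y) \div n & \sigma_{xy} = \sum (x_k-\mu_x)(y_k-\mu_y) \div (n-1) & \text{real} \\
\text{b real-imag correlation} & \rho_{xy} = \sigma_{xy} \div (\sigma_x \sigma_y) & \rho_{xy} = \sigma_{xy} \div (\sigma_x \sigma_y) & \text{real} \\ \hline
\text{a-b covariance} & \sigma_{ab} = \sum (a_k-\mu_a)(b_k-\mu_b)^c \div n & \sigma_{ab} = \sum (a_k-\mu_a)(b_k-\mu_b)^c \div (n-1) & \text{complex} \\
\text{a-b correlation} & \rho_{ab} = \sigma_{ab} \div (\sigma_a \sigma_b) & \rho_{ab} = \sigma_{ab} \div (\sigma_a \sigma_b) & \text{complex}
\end{array}
$$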
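The textbook formulas translate almost line for line into code. The Python sketch below is an illustration, not the calculator's implementation; the function names and the `sample` flag (True for n − 1 weighting) are assumptions. Real-valued statistics such as $\sigma_u^2$ or $\rho_{uv}$ come from passing lists of real or imaginary parts.

```python
import math


def mean(zs):
    """mu = Σ z_k ÷ n; complex when the data are complex."""
    return sum(zs) / len(zs)


def covariance(ps, qs, sample=False):
    """Σ (p_k − μ_p)(q_k − μ_q)^c ÷ w, where w is n, or n − 1 if sample is True."""
    n = len(ps)
    if n < 2:
        raise ValueError("at least two points are required")
    mp, mq = mean(ps), mean(qs)
    total = sum((p - mp) * (q - mq).conjugate() for p, q in zip(ps, qs))
    return total / (n - 1 if sample else n)


def variance(zs, sample=False):
    """Σ (z_k − μ)(z_k − μ)^c ÷ w; the imaginary part is zero, so only the real part is kept."""
    return covariance(zs, zs, sample).real


def correlation(ps, qs, sample=False):
    """σ_pq ÷ (σ_p σ_q), with σ the nonnegative square root of the variance."""
    return covariance(ps, qs, sample) / (math.sqrt(variance(ps, sample)) * math.sqrt(variance(qs, sample)))


a = [1 + 2j, 2 - 1j, 4 + 0j, 3 + 3j]
u = [z.real for z in a]   # real parts u_k
v = [z.imag for z in a]   # imaginary parts v_k

print(mean(a))                                   # (2.5+1j)
print(variance(a), variance(u) + variance(v))    # 3.75 3.75
print(variance(a, sample=True))                  # 5.0
print(correlation(u, v))                         # ρ_uv, a real number
```

The second line of output reflects the identity $\sigma_a^2 = \sigma_u^2 + \sigma_v^2$, and `mean(u)` equals `mean(a).real`, matching the remark about recovering the mean of the real parts.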

An equally valid a-b covariance definition would have conjugated $(a_k - \mu_a)$ rather than $(b_k - \mu_b)$. Such a change would have the effect of conjugating the covariance itself, and consequently the correlation. Another way to conjugate the covariance and correlation is to exchange the a and b data sets.
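Both claims are easy to confirm numerically. This Python sketch (function names hypothetical, n weighting, arbitrary data) computes the covariance with the definition in the table, with the alternative definition, and with the data sets exchanged.

```python
def cov_conj_b(ps, qs):
    """Definition in the table (n weighting): Σ (p_k − μ_p)(q_k − μ_q)^c ÷ n."""
    n = len(ps)
    mp, mq = sum(ps) / n, sum(qs) / n
    return sum((p - mp) * (q - mq).conjugate() for p, q in zip(ps, qs)) / n


def cov_conj_a(ps, qs):
    """Alternative definition: conjugate the first deviation instead."""
    n = len(ps)
    mp, mq = sum(ps) / n, sum(qs) / n
    return sum((p - mp).conjugate() * (q - mq) for p, q in zip(ps, qs)) / n


a = [1 + 2j, 2 - 1j, 4 + 0j]
b = [0 + 1j, 3 + 3j, 6 - 2j]

s_ab = cov_conj_b(a, b)
print(s_ab)
print(cov_conj_a(a, b))   # the conjugate of s_ab
print(cov_conj_b(b, a))   # also the conjugate of s_ab
```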

The correlation will have a magnitude that is no greater than one, and will have a magnitude of exactly unity whenever the regression line fits the data perfectly. Note that in the correlation formulas, the σ's are not squared.
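A small Python sketch (function name hypothetical, data arbitrary) illustrates both cases: data lying exactly on a complex line give $|\rho_{ab}| = 1$, while scattered data give a smaller magnitude. Because the weighting factor cancels in the correlation, raw sums are used.

```python
import math


def correlation(ps, qs):
    """ρ_pq = σ_pq ÷ (σ_p σ_q). The 1/n or 1/(n − 1) factor cancels, so raw sums are used."""
    n = len(ps)
    mp, mq = sum(ps) / n, sum(qs) / n
    dp = [p - mp for p in ps]
    dq = [q - mq for q in qs]
    cov = sum(x * y.conjugate() for x, y in zip(dp, dq))
    var_p = sum(abs(x) ** 2 for x in dp)
    var_q = sum(abs(y) ** 2 for y in dq)
    return cov / math.sqrt(var_p * var_q)


a = [1 + 2j, 2 - 1j, 4 + 0j, 3 + 3j]
b_exact = [(2 - 1j) + (0.5 + 1.5j) * z for z in a]   # lies exactly on a line
b_noisy = [0 + 1j, 3 + 3j, 6 - 2j, 1 + 1j]           # does not

print(abs(correlation(a, b_exact)))   # 1, up to rounding
print(abs(correlation(a, b_noisy)))   # strictly less than 1
```

For the exact line $b = t + s a$, the covariance works out to $s^c \sigma_a^2$ and $\sigma_b = |s|\,\sigma_a$, so $\rho_{ab}$ itself is the unit complex number $s^c \div |s|$, not necessarily 1.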

The table below gives formulas for the two a-b regression lines, which in general do not coincide, but are often close.

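$$
\begin{array}{l|l|l}
 & a \text{ independent, } b \text{ dependent} & b \text{ independent, } a \text{ dependent} \\ \hline
\text{slope} & s' = (\sigma_{ab})^c \div \sigma_a^2 & s'' = \sigma_{ab} \div \sigma_b^2 \\
\text{intercept} & t' = \mu_b - s' \mu_a & t'' = \mu_a - s'' \mu_b \\
\text{regression line} & b = t' + s' a & a = t'' + s'' b
\end{array}
$$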

Contrast the conjugation of $\sigma_{ab}$ in $s'$ with the lack of conjugation in $s''$. As a result, the product $s' s'' = (\sigma_{ab})^c \sigma_{ab} \div (\sigma_a^2 \sigma_b^2) = \rho_{ab}^c \rho_{ab}$ is a real number.
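The two regression lines can be sketched in Python directly from the slope and intercept formulas. This is an illustration under stated assumptions (hypothetical function name, n weighting, arbitrary data), not the calculator's code; the weighting cancels in each slope anyway.

```python
def regression_lines(a, b):
    """Both least-squares lines from the slope and intercept formulas.

    Returns (s1, t1, s2, t2) for b = t1 + s1*a and a = t2 + s2*b.
    n weighting is used; the weighting cancels in each slope.
    """
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    sigma_ab = sum((p - mu_a) * (q - mu_b).conjugate() for p, q in zip(a, b)) / n
    var_a = sum(abs(p - mu_a) ** 2 for p in a) / n
    var_b = sum(abs(q - mu_b) ** 2 for q in b) / n
    s1 = sigma_ab.conjugate() / var_a   # s' : a independent, b dependent
    t1 = mu_b - s1 * mu_a               # t'
    s2 = sigma_ab / var_b               # s'': b independent, a dependent
    t2 = mu_a - s2 * mu_b               # t''
    return s1, t1, s2, t2


a = [1 + 2j, 2 - 1j, 4 + 0j, 3 + 3j]
b = [0 + 1j, 3 + 3j, 6 - 2j, 1 + 1j]
s1, t1, s2, t2 = regression_lines(a, b)
print(s1 * s2)   # real (imaginary part zero up to rounding), equal to |ρ_ab|^2

# On data lying exactly on b = (2 - i) + (0.5 + 1.5i) a, the first line recovers it.
b_line = [(2 - 1j) + (0.5 + 1.5j) * z for z in a]
e1, f1, _, _ = regression_lines(a, b_line)
print(e1, f1)    # about (0.5+1.5j) and (2-1j)
```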

For reasons of efficiency, some formulas used by the implementation differ from the "textbook" formulas above.

Each time a data item is inserted, the calculator adds to the accumulators listed in the table below; the data item itself is not stored.

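Internal accumulators (all real):
 • univariate and bivariate: $\sum u_k$, $\sum v_k$, $\sum (u_k^2)$, $\sum (v_k^2)$, $\sum (u_k v_k)$, n
 • bivariate only: $\sum x_k$, $\sum y_k$, $\sum (x_k^2)$, $\sum (y_k^2)$, $\sum (x_k y_k)$, $\sum (u_k y_k)$, $\sum (v_k y_k)$, $\sum (u_k x_k)$, $\sum (v_k x_k)$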
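A Python sketch of such an accumulator record, assuming hypothetical field names (`su` for $\sum u_k$, `svx` for $\sum (v_k x_k)$, and so on). Each insertion only updates the running sums; nothing else about the data item is kept.

```python
from dataclasses import dataclass


@dataclass
class Accumulators:
    """Running sums from the accumulator table; all are real (field names hypothetical)."""
    n: int = 0
    # univariate and bivariate
    su: float = 0.0    # Σ u_k
    sv: float = 0.0    # Σ v_k
    suu: float = 0.0   # Σ (u_k^2)
    svv: float = 0.0   # Σ (v_k^2)
    suv: float = 0.0   # Σ (u_k v_k)
    # bivariate only
    sx: float = 0.0    # Σ x_k
    sy: float = 0.0    # Σ y_k
    sxx: float = 0.0   # Σ (x_k^2)
    syy: float = 0.0   # Σ (y_k^2)
    sxy: float = 0.0   # Σ (x_k y_k)
    suy: float = 0.0   # Σ (u_k y_k)
    svy: float = 0.0   # Σ (v_k y_k)
    sux: float = 0.0   # Σ (u_k x_k)
    svx: float = 0.0   # Σ (v_k x_k)

    def insert_point(self, a: complex) -> None:
        """Univariate insertion: a = u + iv."""
        u, v = a.real, a.imag
        self.n += 1
        self.su += u
        self.sv += v
        self.suu += u * u
        self.svv += v * v
        self.suv += u * v

    def insert_pair(self, a: complex, b: complex) -> None:
        """Bivariate insertion: a = u + iv, b = x + iy."""
        self.insert_point(a)           # updates n and the a-only sums
        u, v = a.real, a.imag
        x, y = b.real, b.imag
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        self.suy += u * y
        self.svy += v * y
        self.sux += u * x
        self.svx += v * x


acc = Accumulators()
for a, b in [(1 + 2j, 3 + 0j), (2 - 1j, 1 + 1j)]:
    acc.insert_pair(a, b)
print(acc.n, acc.su, acc.svx)   # 2 3.0 5.0
```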

Using values in the accumulators, the calculator can produce various statistics on demand, with no need to iterate through every data item. In the table below are formulas for the variances and covariances; the real and imaginary parts of σab are given separately for ease of reading.

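$$
\begin{array}{l|l|l|l}
 & n \text{ weighting} & n-1 \text{ weighting} & \text{field} \\ \hline
\sigma_u^2 & \sum (u_k^2) \div n - (\sum u_k)^2 \div n^2 & \sum (u_k^2) \div (n-1) - (\sum u_k)^2 \div (n^2-n) & \text{real} \\
\sigma_v^2 & \sum (v_k^2) \div n - (\sum v_k)^2 \div n^2 & \sum (v_k^2) \div (n-1) - (\sum v_k)^2 \div (n^2-n) & \text{real} \\
\sigma_a^2 & \sigma_u^2 + \sigma_v^2 & \sigma_u^2 + \sigma_v^2 & \text{real} \\ \hline
\sigma_x^2 & \sum (x_k^2) \div n - (\sum x_k)^2 \div n^2 & \sum (x_k^2) \div (n-1) - (\sum x_k)^2 \div (n^2-n) & \text{real} \\
\sigma_y^2 & \sum (y_k^2) \div n - (\sum y_k)^2 \div n^2 & \sum (y_k^2) \div (n-1) - (\sum y_k)^2 \div (n^2-n) & \text{real} \\
\sigma_b^2 & \sigma_x^2 + \sigma_y^2 & \sigma_x^2 + \sigma_y^2 & \text{real} \\ \hline
\sigma_{uv} & \sum (u_k v_k) \div n - (\sum u_k)(\sum v_k) \div n^2 & \sum (u_k v_k) \div (n-1) - (\sum u_k)(\sum v_k) \div (n^2-n) & \text{real} \\
\sigma_{xy} & \sum (x_k y_k) \div n - (\sum x_k)(\sum y_k) \div n^2 & \sum (x_k y_k) \div (n-1) - (\sum x_k)(\sum y_k) \div (n^2-n) & \text{real}
\end{array}
$$

The a-b covariance $\sigma_{ab}$ is complex. With n weighting:

$$
\begin{aligned}
\text{real}(\sigma_{ab}) &= \sum (u_k x_k) \div n + \sum (v_k y_k) \div n - (\sum u_k)(\sum x_k) \div n^2 - (\sum v_k)(\sum y_k) \div n^2 \\
\text{imag}(\sigma_{ab}) &= \sum (v_k x_k) \div n - \sum (u_k y_k) \div n - (\sum v_k)(\sum x_k) \div n^2 + (\sum u_k)(\sum y_k) \div n^2
\end{aligned}
$$

With n − 1 weighting:

$$
\begin{aligned}
\text{real}(\sigma_{ab}) &= \sum (u_k x_k) \div (n-1) + \sum (v_k y_k) \div (n-1) - (\sum u_k)(\sum x_k) \div (n^2-n) - (\sum v_k)(\sum y_k) \div (n^2-n) \\
\text{imag}(\sigma_{ab}) &= \sum (v_k x_k) \div (n-1) - \sum (u_k y_k) \div (n-1) - (\sum v_k)(\sum x_k) \div (n^2-n) + (\sum u_k)(\sum y_k) \div (n^2-n)
\end{aligned}
$$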
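The accumulator formulas can be checked numerically against the textbook definition. In this Python sketch the dictionary keys and function names are hypothetical: `accumulate` builds the running sums, `stats_from_sums` applies the accumulator formulas, and the result is compared with the two-pass covariance.

```python
SUM_KEYS = ("su", "sv", "suu", "svv", "suv", "sx", "sy", "sxx",
            "syy", "sxy", "suy", "svy", "sux", "svx")


def accumulate(a, b):
    """Running sums for paired complex data a_k = u_k + i v_k, b_k = x_k + i y_k."""
    acc = dict.fromkeys(SUM_KEYS, 0.0)
    acc["n"] = 0
    for p, q in zip(a, b):
        u, v, x, y = p.real, p.imag, q.real, q.imag
        terms = (u, v, u * u, v * v, u * v, x, y, x * x,
                 y * y, x * y, u * y, v * y, u * x, v * x)
        for key, term in zip(SUM_KEYS, terms):
            acc[key] += term
        acc["n"] += 1
    return acc


def stats_from_sums(acc, sample=False):
    """Variances and covariances from the sums alone (n or n − 1 weighting)."""
    n = acc["n"]
    d1 = n - 1 if sample else n            # divisor for sums of products
    d2 = n * n - n if sample else n * n    # divisor for products of sums
    var_u = acc["suu"] / d1 - acc["su"] ** 2 / d2
    var_v = acc["svv"] / d1 - acc["sv"] ** 2 / d2
    var_x = acc["sxx"] / d1 - acc["sx"] ** 2 / d2
    var_y = acc["syy"] / d1 - acc["sy"] ** 2 / d2
    return {
        "var_a": var_u + var_v,
        "var_b": var_x + var_y,
        "cov_uv": acc["suv"] / d1 - acc["su"] * acc["sv"] / d2,
        "cov_xy": acc["sxy"] / d1 - acc["sx"] * acc["sy"] / d2,
        "cov_ab": complex(
            (acc["sux"] + acc["svy"]) / d1 - (acc["su"] * acc["sx"] + acc["sv"] * acc["sy"]) / d2,
            (acc["svx"] - acc["suy"]) / d1 - (acc["sv"] * acc["sx"] - acc["su"] * acc["sy"]) / d2,
        ),
    }


a = [1 + 2j, 2 - 1j, 4 + 0j, 3 + 3j]
b = [0 + 1j, 3 + 3j, 6 - 2j, 1 + 1j]
fast = stats_from_sums(accumulate(a, b), sample=True)

# Cross-check against the two-pass "textbook" covariance with n − 1 weighting.
mu_a, mu_b = sum(a) / len(a), sum(b) / len(b)
direct = sum((p - mu_a) * (q - mu_b).conjugate() for p, q in zip(a, b)) / (len(a) - 1)
print(fast["cov_ab"], direct)   # equal up to rounding
```

One trade-off worth knowing: these one-pass formulas subtract two quantities that can be nearly equal, so they may lose precision when the data's mean is large compared with its spread, where the two-pass textbook forms are more robust.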

The correlations, and the slopes and intercepts of the regression lines, are calculated from these.

These formulas extend those used for real numbers by the TI-59 calculator. They are presented here in detail because very few sources cover regression with complex variables.

The formulas for slope and intercept are consistent with the matrix solution given by whuber:

$$\hat\beta = (X^{ct} X)^{-1} X^{ct} z$$

where:

 • $\hat\beta$ is a vector with two components:
    • $\hat\beta_0$ = intercept;
    • $\hat\beta_1$ = slope.
 • X is a matrix with two columns and one row for each pair of data points:
    • every $X_{k,1}$ is 1;
    • each $X_{k,2}$ is the kth member of the independent data set.
 • $X^{ct}$ is the conjugate transpose of X.
 • $(X^{ct} X)^{-1}$ is the matrix inverse of $X^{ct} X$.
 • z is a vector containing the dependent data set, where $z_k$ is the kth member of the dependent data set.
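For comparison, the matrix formula can be evaluated directly. This Python sketch (function name hypothetical, data arbitrary) forms the two-column X implicitly, inverts the 2 × 2 matrix $X^{ct} X$ by hand, and checks that the slope and intercept agree with $s'$ and $t'$ computed from the covariance formulas.

```python
def matrix_regression(indep, dep):
    """β̂ = (X^ct X)^(-1) X^ct z, with rows of X equal to [1, p_k].

    Returns (β̂0, β̂1) = (intercept, slope). The 2×2 inverse is written out
    by hand so that no linear-algebra library is needed.
    """
    n = len(indep)
    # Entries of X^ct X = [[A, B], [C, D]]
    A = n
    B = sum(indep)                                  # Σ p_k
    C = sum(p.conjugate() for p in indep)           # Σ p_k^c
    D = sum(abs(p) ** 2 for p in indep)             # Σ p_k^c p_k
    # Entries of X^ct z
    r1 = sum(dep)                                   # Σ z_k
    r2 = sum(p.conjugate() * z for p, z in zip(indep, dep))  # Σ p_k^c z_k
    det = A * D - B * C      # real and positive unless all p_k are equal
    beta0 = (D * r1 - B * r2) / det
    beta1 = (A * r2 - C * r1) / det
    return beta0, beta1


a = [1 + 2j, 2 - 1j, 4 + 0j, 3 + 3j]
b = [0 + 1j, 3 + 3j, 6 - 2j, 1 + 1j]
intercept, slope = matrix_regression(a, b)   # a independent, b dependent

# Same line from s' = (σ_ab)^c ÷ σ_a^2 and t' = μ_b − s' μ_a (n weighting).
n = len(a)
mu_a, mu_b = sum(a) / n, sum(b) / n
sigma_ab = sum((p - mu_a) * (q - mu_b).conjugate() for p, q in zip(a, b)) / n
var_a = sum(abs(p - mu_a) ** 2 for p in a) / n
s_prime = sigma_ab.conjugate() / var_a
t_prime = mu_b - s_prime * mu_a
print(slope, s_prime)
print(intercept, t_prime)
```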

Although many authors give the corresponding matrix formula for real numbers, whuber is one of the few to develop it for the complex case.