• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# We did not author this file nor mantain it. Skip linting it.
2#pylint: skip-file
3# Copyright (c) 1999-2008 Gary Strangman; All Rights Reserved.
4#
5# Permission is hereby granted, free of charge, to any person obtaining a copy
6# of this software and associated documentation files (the "Software"), to deal
7# in the Software without restriction, including without limitation the rights
8# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9# copies of the Software, and to permit persons to whom the Software is
10# furnished to do so, subject to the following conditions:
11#
12# The above copyright notice and this permission notice shall be included in
13# all copies or substantial portions of the Software.
14#
15# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21# THE SOFTWARE.
22#
23# Comments and/or additions are welcome (send e-mail to:
24# strang@nmr.mgh.harvard.edu).
25#
26"""stats.py module
27
28(Requires pstat.py module.)
29
30#################################################
31#######  Written by:  Gary Strangman  ###########
32#######  Last modified:  Oct 31, 2008 ###########
33#################################################
34
35A collection of basic statistical functions for python.  The function
36names appear below.
37
38IMPORTANT:  There are really *3* sets of functions.  The first set has an 'l'
39prefix, which can be used with list or tuple arguments.  The second set has
40an 'a' prefix, which can accept NumPy array arguments.  These latter
41functions are defined only when NumPy is available on the system.  The third
42type has NO prefix (i.e., has the name that appears below).  Functions of
43this set are members of a "Dispatch" class, c/o David Ascher.  This class
44allows different functions to be called depending on the type of the passed
45arguments.  Thus, stats.mean is a member of the Dispatch class and
46stats.mean(range(20)) will call stats.lmean(range(20)) while
47stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)).
48This is a handy way to keep consistent function names when different
49argument types require different functions to be called.  Having
50implementated the Dispatch class, however, means that to get info on
51a given function, you must use the REAL function name ... that is
52"print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine,
53while "print stats.mean.__doc__" will print the doc for the Dispatch
54class.  NUMPY FUNCTIONS ('a' prefix) generally have more argument options
55but should otherwise be consistent with the corresponding list functions.
56
57Disclaimers:  The function list is obviously incomplete and, worse, the
58functions are not optimized.  All functions have been tested (some more
59so than others), but they are far from bulletproof.  Thus, as with any
60free software, no warranty or guarantee is expressed or implied. :-)  A
61few extra functions that don't appear in the list below can be found by
62interested treasure-hunters.  These functions don't necessarily have
63both list and array versions but were deemed useful
64
65CENTRAL TENDENCY:  geometricmean
66                   harmonicmean
67                   mean
68                   median
69                   medianscore
70                   mode
71
72MOMENTS:  moment
73          variation
74          skew
75          kurtosis
76          skewtest   (for Numpy arrays only)
77          kurtosistest (for Numpy arrays only)
78          normaltest (for Numpy arrays only)
79
80ALTERED VERSIONS:  tmean  (for Numpy arrays only)
81                   tvar   (for Numpy arrays only)
82                   tmin   (for Numpy arrays only)
83                   tmax   (for Numpy arrays only)
84                   tstdev (for Numpy arrays only)
85                   tsem   (for Numpy arrays only)
86                   describe
87
88FREQUENCY STATS:  itemfreq
89                  scoreatpercentile
90                  percentileofscore
91                  histogram
92                  cumfreq
93                  relfreq
94
95VARIABILITY:  obrientransform
96              samplevar
97              samplestdev
98              signaltonoise (for Numpy arrays only)
99              var
100              stdev
101              sterr
102              sem
103              z
104              zs
105              zmap (for Numpy arrays only)
106
107TRIMMING FCNS:  threshold (for Numpy arrays only)
108                trimboth
109                trim1
110                round (round all vals to 'n' decimals; Numpy only)
111
112CORRELATION FCNS:  covariance  (for Numpy arrays only)
113                   correlation (for Numpy arrays only)
114                   paired
115                   pearsonr
116                   spearmanr
117                   pointbiserialr
118                   kendalltau
119                   linregress
120
121INFERENTIAL STATS:  ttest_1samp
122                    ttest_ind
123                    ttest_rel
124                    chisquare
125                    ks_2samp
126                    mannwhitneyu
127                    ranksums
128                    wilcoxont
129                    kruskalwallish
130                    friedmanchisquare
131
132PROBABILITY CALCS:  chisqprob
133                    erfcc
134                    zprob
135                    ksprob
136                    fprob
137                    betacf
138                    gammln
139                    betai
140
141ANOVA FUNCTIONS:  F_oneway
142                  F_value
143
144SUPPORT FUNCTIONS:  writecc
145                    incr
146                    sign  (for Numpy arrays only)
147                    sum
148                    cumsum
149                    ss
150                    summult
151                    sumdiffsquared
152                    square_of_sums
153                    shellsort
154                    rankdata
155                    outputpairedstats
156                    findwithin
157"""
158## CHANGE LOG:
159## ===========
160## 09-07-21 ... added capability for getting the 'proportion' out of l/amannwhitneyu (but comment-disabled)
161## 08-10-31 ... fixed import LinearAlgebra bug before glm fcns
162## 07-11-26 ... conversion for numpy started
163## 07-05-16 ... added Lin's Concordance Correlation Coefficient (alincc) and acov
164## 05-08-21 ... added "Dice's coefficient"
165## 04-10-26 ... added ap2t(), an ugly fcn for converting p-vals to T-vals
166## 04-04-03 ... added amasslinregress() function to do regression on N-D arrays
167## 03-01-03 ... CHANGED VERSION TO 0.6
168##              fixed atsem() to properly handle limits=None case
169##              improved histogram and median functions (estbinwidth) and
170##                   fixed atvar() function (wrong answers for neg numbers?!?)
171## 02-11-19 ... fixed attest_ind and attest_rel for div-by-zero Overflows
172## 02-05-10 ... fixed lchisqprob indentation (failed when df=even)
173## 00-12-28 ... removed aanova() to separate module, fixed licensing to
174##                   match Python License, fixed doc string & imports
175## 00-04-13 ... pulled all "global" statements, except from aanova()
176##              added/fixed lots of documentation, removed io.py dependency
177##              changed to version 0.5
178## 99-11-13 ... added asign() function
179## 99-11-01 ... changed version to 0.4 ... enough incremental changes now
180## 99-10-25 ... added acovariance and acorrelation functions
181## 99-10-10 ... fixed askew/akurtosis to avoid divide-by-zero errors
182##              added aglm function (crude, but will be improved)
183## 99-10-04 ... upgraded acumsum, ass, asummult, asamplevar, avar, etc. to
184##                   all handle lists of 'dimension's and keepdims
185##              REMOVED ar0, ar2, ar3, ar4 and replaced them with around
186##              reinserted fixes for abetai to avoid math overflows
187## 99-09-05 ... rewrote achisqprob/aerfcc/aksprob/afprob/abetacf/abetai to
188##                   handle multi-dimensional arrays (whew!)
189## 99-08-30 ... fixed l/amoment, l/askew, l/akurtosis per D'Agostino (1990)
190##              added anormaltest per same reference
191##              re-wrote azprob to calc arrays of probs all at once
192## 99-08-22 ... edited attest_ind printing section so arrays could be rounded
193## 99-08-19 ... fixed amean and aharmonicmean for non-error(!) overflow on
194##                   short/byte arrays (mean of #s btw 100-300 = -150??)
195## 99-08-09 ... fixed asum so that the None case works for Byte arrays
196## 99-08-08 ... fixed 7/3 'improvement' to handle t-calcs on N-D arrays
197## 99-07-03 ... improved attest_ind, attest_rel (zero-division errortrap)
198## 99-06-24 ... fixed bug(?) in attest_ind (n1=a.shape[0])
199## 04/11/99 ... added asignaltonoise, athreshold functions, changed all
200##                   max/min in array section to N.maximum/N.minimum,
201##                   fixed square_of_sums to prevent integer overflow
202## 04/10/99 ... !!! Changed function name ... sumsquared ==> square_of_sums
203## 03/18/99 ... Added ar0, ar2, ar3 and ar4 rounding functions
204## 02/28/99 ... Fixed aobrientransform to return an array rather than a list
205## 01/15/99 ... Essentially ceased updating list-versions of functions (!!!)
206## 01/13/99 ... CHANGED TO VERSION 0.3
207##              fixed bug in a/lmannwhitneyu p-value calculation
208## 12/31/98 ... fixed variable-name bug in ldescribe
209## 12/19/98 ... fixed bug in findwithin (fcns needed pstat. prefix)
210## 12/16/98 ... changed amedianscore to return float (not array) for 1 score
211## 12/14/98 ... added atmin and atmax functions
212##              removed umath from import line (not needed)
213##              l/ageometricmean modified to reduce chance of overflows (take
214##                   nth root first, then multiply)
215## 12/07/98 ... added __version__variable (now 0.2)
216##              removed all 'stats.' from anova() fcn
217## 12/06/98 ... changed those functions (except shellsort) that altered
218##                   arguments in-place ... cumsum, ranksort, ...
219##              updated (and fixed some) doc-strings
220## 12/01/98 ... added anova() function (requires NumPy)
221##              incorporated Dispatch class
222## 11/12/98 ... added functionality to amean, aharmonicmean, ageometricmean
223##              added 'asum' function (added functionality to N.add.reduce)
224##              fixed both moment and amoment (two errors)
225##              changed name of skewness and askewness to skew and askew
226##              fixed (a)histogram (which sometimes counted points <lowerlimit)
227
228import pstat  # required 3rd party module
229import math, string, copy  # required python modules
230from types import *
231
232__version__ = 0.6
233
234############# DISPATCH CODE ##############
235
236
237class Dispatch:
238  """
239The Dispatch class, care of David Ascher, allows different functions to
240be called depending on the argument types.  This way, there can be one
241function name regardless of the argument type.  To access function doc
242in stats.py module, prefix the function with an 'l' or 'a' for list or
243array arguments, respectively.  That is, print stats.lmean.__doc__ or
244print stats.amean.__doc__ or whatever.
245"""
246
247  def __init__(self, *tuples):
248    self._dispatch = {}
249    for func, types in tuples:
250      for t in types:
251        if t in self._dispatch.keys():
252          raise ValueError, "can't have two dispatches on " + str(t)
253        self._dispatch[t] = func
254    self._types = self._dispatch.keys()
255
256  def __call__(self, arg1, *args, **kw):
257    if type(arg1) not in self._types:
258      raise TypeError, "don't know how to dispatch %s arguments" % type(arg1)
259    return apply(self._dispatch[type(arg1)], (arg1,) + args, kw)
260
261##########################################################################
262########################   LIST-BASED FUNCTIONS   ########################
263##########################################################################
264
265### Define these regardless
266
267####################################
268#######  CENTRAL TENDENCY  #########
269####################################
270
271
272def lgeometricmean(inlist):
273  """
274Calculates the geometric mean of the values in the passed list.
275That is:  n-th root of (x1 * x2 * ... * xn).  Assumes a '1D' list.
276
277Usage:   lgeometricmean(inlist)
278"""
279  mult = 1.0
280  one_over_n = 1.0 / len(inlist)
281  for item in inlist:
282    mult = mult * pow(item, one_over_n)
283  return mult
284
285
286def lharmonicmean(inlist):
287  """
288Calculates the harmonic mean of the values in the passed list.
289That is:  n / (1/x1 + 1/x2 + ... + 1/xn).  Assumes a '1D' list.
290
291Usage:   lharmonicmean(inlist)
292"""
293  sum = 0
294  for item in inlist:
295    sum = sum + 1.0 / item
296  return len(inlist) / sum
297
298
299def lmean(inlist):
300  """
301Returns the arithematic mean of the values in the passed list.
302Assumes a '1D' list, but will function on the 1st dim of an array(!).
303
304Usage:   lmean(inlist)
305"""
306  sum = 0
307  for item in inlist:
308    sum = sum + item
309  return sum / float(len(inlist))
310
311
312def lmedian(inlist, numbins=1000):
313  """
314Returns the computed median value of a list of numbers, given the
315number of bins to use for the histogram (more bins brings the computed value
316closer to the median score, default number of bins = 1000).  See G.W.
317Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics.
318
319Usage:   lmedian (inlist, numbins=1000)
320"""
321  (hist, smallest, binsize, extras) = histogram(
322      inlist, numbins, [min(inlist), max(inlist)])  # make histog
323  cumhist = cumsum(hist)  # make cumulative histogram
324  for i in range(len(cumhist)):  # get 1st(!) index holding 50%ile score
325    if cumhist[i] >= len(inlist) / 2.0:
326      cfbin = i
327      break
328  LRL = smallest + binsize * cfbin  # get lower read limit of that bin
329  cfbelow = cumhist[cfbin - 1]
330  freq = float(hist[cfbin])  # frequency IN the 50%ile bin
331  median = LRL + (
332      (len(inlist) / 2.0 - cfbelow) / float(freq)) * binsize  # median formula
333  return median
334
335
336def lmedianscore(inlist):
337  """
338Returns the 'middle' score of the passed list.  If there is an even
339number of scores, the mean of the 2 middle scores is returned.
340
341Usage:   lmedianscore(inlist)
342"""
343
344  newlist = copy.deepcopy(inlist)
345  newlist.sort()
346  if len(newlist) % 2 == 0:  # if even number of scores, average middle 2
347    index = len(newlist) / 2  # integer division correct
348    median = float(newlist[index] + newlist[index - 1]) / 2
349  else:
350    index = len(newlist) / 2  # int divsion gives mid value when count from 0
351    median = newlist[index]
352  return median
353
354
355def lmode(inlist):
356  """
357Returns a list of the modal (most common) score(s) in the passed
358list.  If there is more than one such score, all are returned.  The
359bin-count for the mode(s) is also returned.
360
361Usage:   lmode(inlist)
362Returns: bin-count for mode(s), a list of modal value(s)
363"""
364
365  scores = pstat.unique(inlist)
366  scores.sort()
367  freq = []
368  for item in scores:
369    freq.append(inlist.count(item))
370  maxfreq = max(freq)
371  mode = []
372  stillmore = 1
373  while stillmore:
374    try:
375      indx = freq.index(maxfreq)
376      mode.append(scores[indx])
377      del freq[indx]
378      del scores[indx]
379    except ValueError:
380      stillmore = 0
381  return maxfreq, mode
382
383####################################
384############  MOMENTS  #############
385####################################
386
387
388def lmoment(inlist, moment=1):
389  """
390Calculates the nth moment about the mean for a sample (defaults to
391the 1st moment).  Used to calculate coefficients of skewness and kurtosis.
392
393Usage:   lmoment(inlist,moment=1)
394Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r)
395"""
396  if moment == 1:
397    return 0.0
398  else:
399    mn = mean(inlist)
400    n = len(inlist)
401    s = 0
402    for x in inlist:
403      s = s + (x - mn)**moment
404    return s / float(n)
405
406
407def lvariation(inlist):
408  """
409Returns the coefficient of variation, as defined in CRC Standard
410Probability and Statistics, p.6.
411
412Usage:   lvariation(inlist)
413"""
414  return 100.0 * samplestdev(inlist) / float(mean(inlist))
415
416
417def lskew(inlist):
418  """
419Returns the skewness of a distribution, as defined in Numerical
420Recipies (alternate defn in CRC Standard Probability and Statistics, p.6.)
421
422Usage:   lskew(inlist)
423"""
424  return moment(inlist, 3) / pow(moment(inlist, 2), 1.5)
425
426
427def lkurtosis(inlist):
428  """
429Returns the kurtosis of a distribution, as defined in Numerical
430Recipies (alternate defn in CRC Standard Probability and Statistics, p.6.)
431
432Usage:   lkurtosis(inlist)
433"""
434  return moment(inlist, 4) / pow(moment(inlist, 2), 2.0)
435
436
437def ldescribe(inlist):
438  """
439Returns some descriptive statistics of the passed list (assumed to be 1D).
440
441Usage:   ldescribe(inlist)
442Returns: n, mean, standard deviation, skew, kurtosis
443"""
444  n = len(inlist)
445  mm = (min(inlist), max(inlist))
446  m = mean(inlist)
447  sd = stdev(inlist)
448  sk = skew(inlist)
449  kurt = kurtosis(inlist)
450  return n, mm, m, sd, sk, kurt
451
452####################################
453#######  FREQUENCY STATS  ##########
454####################################
455
456
457def litemfreq(inlist):
458  """
459Returns a list of pairs.  Each pair consists of one of the scores in inlist
460and it's frequency count.  Assumes a 1D list is passed.
461
462Usage:   litemfreq(inlist)
463Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
464"""
465  scores = pstat.unique(inlist)
466  scores.sort()
467  freq = []
468  for item in scores:
469    freq.append(inlist.count(item))
470  return pstat.abut(scores, freq)
471
472
473def lscoreatpercentile(inlist, percent):
474  """
475Returns the score at a given percentile relative to the distribution
476given by inlist.
477
478Usage:   lscoreatpercentile(inlist,percent)
479"""
480  if percent > 1:
481    print '\nDividing percent>1 by 100 in lscoreatpercentile().\n'
482    percent = percent / 100.0
483  targetcf = percent * len(inlist)
484  h, lrl, binsize, extras = histogram(inlist)
485  cumhist = cumsum(copy.deepcopy(h))
486  for i in range(len(cumhist)):
487    if cumhist[i] >= targetcf:
488      break
489  score = binsize * (
490      (targetcf - cumhist[i - 1]) / float(h[i])) + (lrl + binsize * i)
491  return score
492
493
494def lpercentileofscore(inlist, score, histbins=10, defaultlimits=None):
495  """
496Returns the percentile value of a score relative to the distribution
497given by inlist.  Formula depends on the values used to histogram the data(!).
498
499Usage:   lpercentileofscore(inlist,score,histbins=10,defaultlimits=None)
500"""
501
502  h, lrl, binsize, extras = histogram(inlist, histbins, defaultlimits)
503  cumhist = cumsum(copy.deepcopy(h))
504  i = int((score - lrl) / float(binsize))
505  pct = (cumhist[i - 1] + (
506      (score -
507       (lrl + binsize * i)) / float(binsize)) * h[i]) / float(len(inlist)) * 100
508  return pct
509
510
511def lhistogram(inlist, numbins=10, defaultreallimits=None, printextras=0):
512  """
513Returns (i) a list of histogram bin counts, (ii) the smallest value
514of the histogram binning, and (iii) the bin width (the last 2 are not
515necessarily integers).  Default number of bins is 10.  If no sequence object
516is given for defaultreallimits, the routine picks (usually non-pretty) bins
517spanning all the numbers in the inlist.
518
519Usage:   lhistogram (inlist, numbins=10,
520defaultreallimits=None,suppressoutput=0)
521Returns: list of bin values, lowerreallimit, binsize, extrapoints
522"""
523  if (defaultreallimits <> None):
524    if type(defaultreallimits) not in [ListType, TupleType] or len(
525        defaultreallimits) == 1:  # only one limit given, assumed to be lower one & upper is calc'd
526      lowerreallimit = defaultreallimits
527      upperreallimit = 1.000001 * max(inlist)
528    else:  # assume both limits given
529      lowerreallimit = defaultreallimits[0]
530      upperreallimit = defaultreallimits[1]
531    binsize = (upperreallimit - lowerreallimit) / float(numbins)
532  else:  # no limits given for histogram, both must be calc'd
533    estbinwidth = (max(inlist) -
534                   min(inlist)) / float(numbins) + 1e-6  #1=>cover all
535    binsize = ((max(inlist) - min(inlist) + estbinwidth)) / float(numbins)
536    lowerreallimit = min(inlist) - binsize / 2  #lower real limit,1st bin
537  bins = [0] * (numbins)
538  extrapoints = 0
539  for num in inlist:
540    try:
541      if (num - lowerreallimit) < 0:
542        extrapoints = extrapoints + 1
543      else:
544        bintoincrement = int((num - lowerreallimit) / float(binsize))
545        bins[bintoincrement] = bins[bintoincrement] + 1
546    except:
547      extrapoints = extrapoints + 1
548  if (extrapoints > 0 and printextras == 1):
549    print '\nPoints outside given histogram range =', extrapoints
550  return (bins, lowerreallimit, binsize, extrapoints)
551
552
553def lcumfreq(inlist, numbins=10, defaultreallimits=None):
554  """
555Returns a cumulative frequency histogram, using the histogram function.
556
557Usage:   lcumfreq(inlist,numbins=10,defaultreallimits=None)
558Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
559"""
560  h, l, b, e = histogram(inlist, numbins, defaultreallimits)
561  cumhist = cumsum(copy.deepcopy(h))
562  return cumhist, l, b, e
563
564
565def lrelfreq(inlist, numbins=10, defaultreallimits=None):
566  """
567Returns a relative frequency histogram, using the histogram function.
568
569Usage:   lrelfreq(inlist,numbins=10,defaultreallimits=None)
570Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
571"""
572  h, l, b, e = histogram(inlist, numbins, defaultreallimits)
573  for i in range(len(h)):
574    h[i] = h[i] / float(len(inlist))
575  return h, l, b, e
576
577####################################
578#####  VARIABILITY FUNCTIONS  ######
579####################################
580
581
582def lobrientransform(*args):
583  """
584Computes a transform on input data (any number of columns).  Used to
585test for homogeneity of variance prior to running one-way stats.  From
586Maxwell and Delaney, p.112.
587
588Usage:   lobrientransform(*args)
589Returns: transformed data for use in an ANOVA
590"""
591  TINY = 1e-10
592  k = len(args)
593  n = [0.0] * k
594  v = [0.0] * k
595  m = [0.0] * k
596  nargs = []
597  for i in range(k):
598    nargs.append(copy.deepcopy(args[i]))
599    n[i] = float(len(nargs[i]))
600    v[i] = var(nargs[i])
601    m[i] = mean(nargs[i])
602  for j in range(k):
603    for i in range(n[j]):
604      t1 = (n[j] - 1.5) * n[j] * (nargs[j][i] - m[j])**2
605      t2 = 0.5 * v[j] * (n[j] - 1.0)
606      t3 = (n[j] - 1.0) * (n[j] - 2.0)
607      nargs[j][i] = (t1 - t2) / float(t3)
608  check = 1
609  for j in range(k):
610    if v[j] - mean(nargs[j]) > TINY:
611      check = 0
612  if check <> 1:
613    raise ValueError, 'Problem in obrientransform.'
614  else:
615    return nargs
616
617
618def lsamplevar(inlist):
619  """
620Returns the variance of the values in the passed list using
621N for the denominator (i.e., DESCRIBES the sample variance only).
622
623Usage:   lsamplevar(inlist)
624"""
625  n = len(inlist)
626  mn = mean(inlist)
627  deviations = []
628  for item in inlist:
629    deviations.append(item - mn)
630  return ss(deviations) / float(n)
631
632
633def lsamplestdev(inlist):
634  """
635Returns the standard deviation of the values in the passed list using
636N for the denominator (i.e., DESCRIBES the sample stdev only).
637
638Usage:   lsamplestdev(inlist)
639"""
640  return math.sqrt(samplevar(inlist))
641
642
643def lcov(x, y, keepdims=0):
644  """
645Returns the estimated covariance of the values in the passed
646array (i.e., N-1).  Dimension can equal None (ravel array first), an
647integer (the dimension over which to operate), or a sequence (operate
648over multiple dimensions).  Set keepdims=1 to return an array with the
649same number of dimensions as inarray.
650
651Usage:   lcov(x,y,keepdims=0)
652"""
653
654  n = len(x)
655  xmn = mean(x)
656  ymn = mean(y)
657  xdeviations = [0] * len(x)
658  ydeviations = [0] * len(y)
659  for i in range(len(x)):
660    xdeviations[i] = x[i] - xmn
661    ydeviations[i] = y[i] - ymn
662  ss = 0.0
663  for i in range(len(xdeviations)):
664    ss = ss + xdeviations[i] * ydeviations[i]
665  return ss / float(n - 1)
666
667
668def lvar(inlist):
669  """
670Returns the variance of the values in the passed list using N-1
671for the denominator (i.e., for estimating population variance).
672
673Usage:   lvar(inlist)
674"""
675  n = len(inlist)
676  mn = mean(inlist)
677  deviations = [0] * len(inlist)
678  for i in range(len(inlist)):
679    deviations[i] = inlist[i] - mn
680  return ss(deviations) / float(n - 1)
681
682
683def lstdev(inlist):
684  """
685Returns the standard deviation of the values in the passed list
686using N-1 in the denominator (i.e., to estimate population stdev).
687
688Usage:   lstdev(inlist)
689"""
690  return math.sqrt(var(inlist))
691
692
693def lsterr(inlist):
694  """
695Returns the standard error of the values in the passed list using N-1
696in the denominator (i.e., to estimate population standard error).
697
698Usage:   lsterr(inlist)
699"""
700  return stdev(inlist) / float(math.sqrt(len(inlist)))
701
702
703def lsem(inlist):
704  """
705Returns the estimated standard error of the mean (sx-bar) of the
706values in the passed list.  sem = stdev / sqrt(n)
707
708Usage:   lsem(inlist)
709"""
710  sd = stdev(inlist)
711  n = len(inlist)
712  return sd / math.sqrt(n)
713
714
715def lz(inlist, score):
716  """
717Returns the z-score for a given input score, given that score and the
718list from which that score came.  Not appropriate for population calculations.
719
720Usage:   lz(inlist, score)
721"""
722  z = (score - mean(inlist)) / samplestdev(inlist)
723  return z
724
725
726def lzs(inlist):
727  """
728Returns a list of z-scores, one for each score in the passed list.
729
730Usage:   lzs(inlist)
731"""
732  zscores = []
733  for item in inlist:
734    zscores.append(z(inlist, item))
735  return zscores
736
737####################################
738#######  TRIMMING FUNCTIONS  #######
739####################################
740
741
742def ltrimboth(l, proportiontocut):
743  """
744Slices off the passed proportion of items from BOTH ends of the passed
745list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost'
74610% of scores.  Assumes list is sorted by magnitude.  Slices off LESS if
747proportion results in a non-integer slice index (i.e., conservatively
748slices off proportiontocut).
749
750Usage:   ltrimboth (l,proportiontocut)
751Returns: trimmed version of list l
752"""
753  lowercut = int(proportiontocut * len(l))
754  uppercut = len(l) - lowercut
755  return l[lowercut:uppercut]
756
757
758def ltrim1(l, proportiontocut, tail='right'):
759  """
760Slices off the passed proportion of items from ONE end of the passed
761list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
76210% of scores).  Slices off LESS if proportion results in a non-integer
763slice index (i.e., conservatively slices off proportiontocut).
764
765Usage:   ltrim1 (l,proportiontocut,tail='right')  or set tail='left'
766Returns: trimmed version of list l
767"""
768  if tail == 'right':
769    lowercut = 0
770    uppercut = len(l) - int(proportiontocut * len(l))
771  elif tail == 'left':
772    lowercut = int(proportiontocut * len(l))
773    uppercut = len(l)
774  return l[lowercut:uppercut]
775
776####################################
777#####  CORRELATION FUNCTIONS  ######
778####################################
779
780
781def lpaired(x, y):
782  """
783Interactively determines the type of data and then runs the
784appropriated statistic for paired group data.
785
786Usage:   lpaired(x,y)
787Returns: appropriate statistic name, value, and probability
788"""
789  samples = ''
790  while samples not in ['i', 'r', 'I', 'R', 'c', 'C']:
791    print '\nIndependent or related samples, or correlation (i,r,c): ',
792    samples = raw_input()
793
794  if samples in ['i', 'I', 'r', 'R']:
795    print '\nComparing variances ...',
796    # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112
797    r = obrientransform(x, y)
798    f, p = F_oneway(pstat.colex(r, 0), pstat.colex(r, 1))
799    if p < 0.05:
800      vartype = 'unequal, p=' + str(round(p, 4))
801    else:
802      vartype = 'equal'
803    print vartype
804    if samples in ['i', 'I']:
805      if vartype[0] == 'e':
806        t, p = ttest_ind(x, y, 0)
807        print '\nIndependent samples t-test:  ', round(t, 4), round(p, 4)
808      else:
809        if len(x) > 20 or len(y) > 20:
810          z, p = ranksums(x, y)
811          print '\nRank Sums test (NONparametric, n>20):  ', round(z, 4), round(
812              p, 4)
813        else:
814          u, p = mannwhitneyu(x, y)
815          print '\nMann-Whitney U-test (NONparametric, ns<20):  ', round(
816              u, 4), round(p, 4)
817
818    else:  # RELATED SAMPLES
819      if vartype[0] == 'e':
820        t, p = ttest_rel(x, y, 0)
821        print '\nRelated samples t-test:  ', round(t, 4), round(p, 4)
822      else:
823        t, p = ranksums(x, y)
824        print '\nWilcoxon T-test (NONparametric):  ', round(t, 4), round(p, 4)
825  else:  # CORRELATION ANALYSIS
826    corrtype = ''
827    while corrtype not in ['c', 'C', 'r', 'R', 'd', 'D']:
828      print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
829      corrtype = raw_input()
830    if corrtype in ['c', 'C']:
831      m, b, r, p, see = linregress(x, y)
832      print '\nLinear regression for continuous variables ...'
833      lol = [['Slope', 'Intercept', 'r', 'Prob', 'SEestimate'],
834             [round(m, 4), round(b, 4), round(r, 4), round(p, 4), round(see, 4)]
835            ]
836      pstat.printcc(lol)
837    elif corrtype in ['r', 'R']:
838      r, p = spearmanr(x, y)
839      print '\nCorrelation for ranked variables ...'
840      print "Spearman's r: ", round(r, 4), round(p, 4)
841    else:  # DICHOTOMOUS
842      r, p = pointbiserialr(x, y)
843      print '\nAssuming x contains a dichotomous variable ...'
844      print 'Point Biserial r: ', round(r, 4), round(p, 4)
845  print '\n\n'
846  return None
847
848
849def lpearsonr(x, y):
850  """
851Calculates a Pearson correlation coefficient and the associated
852probability value.  Taken from Heiman's Basic Statistics for the Behav.
853Sci (2nd), p.195.
854
855Usage:   lpearsonr(x,y)      where x and y are equal-length lists
856Returns: Pearson's r value, two-tailed p-value
857"""
858  TINY = 1.0e-30
859  if len(x) <> len(y):
860    raise ValueError, 'Input values not paired in pearsonr.  Aborting.'
861  n = len(x)
862  x = map(float, x)
863  y = map(float, y)
864  xmean = mean(x)
865  ymean = mean(y)
866  r_num = n * (summult(x, y)) - sum(x) * sum(y)
867  r_den = math.sqrt((n * ss(x) - square_of_sums(x)) *
868                    (n * ss(y) - square_of_sums(y)))
869  r = (r_num / r_den)  # denominator already a float
870  df = n - 2
871  t = r * math.sqrt(df / ((1.0 - r + TINY) * (1.0 + r + TINY)))
872  prob = betai(0.5 * df, 0.5, df / float(df + t * t))
873  return r, prob
874
875
876def llincc(x, y):
877  """
878Calculates Lin's concordance correlation coefficient.
879
880Usage:   alincc(x,y)    where x, y are equal-length arrays
881Returns: Lin's CC
882"""
883  covar = lcov(x, y) * (len(x) - 1) / float(len(x))  # correct denom to n
884  xvar = lvar(x) * (len(x) - 1) / float(len(x))  # correct denom to n
885  yvar = lvar(y) * (len(y) - 1) / float(len(y))  # correct denom to n
886  lincc = (2 * covar) / ((xvar + yvar) + ((amean(x) - amean(y))**2))
887  return lincc
888
889
890def lspearmanr(x, y):
891  """
892Calculates a Spearman rank-order correlation coefficient.  Taken
893from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
894
895Usage:   lspearmanr(x,y)      where x and y are equal-length lists
896Returns: Spearman's r, two-tailed p-value
897"""
898  TINY = 1e-30
899  if len(x) <> len(y):
900    raise ValueError, 'Input values not paired in spearmanr.  Aborting.'
901  n = len(x)
902  rankx = rankdata(x)
903  ranky = rankdata(y)
904  dsq = sumdiffsquared(rankx, ranky)
905  rs = 1 - 6 * dsq / float(n * (n**2 - 1))
906  t = rs * math.sqrt((n - 2) / ((rs + 1.0) * (1.0 - rs)))
907  df = n - 2
908  probrs = betai(0.5 * df, 0.5, df / (df + t * t))  # t already a float
909  # probability values for rs are from part 2 of the spearman function in
910  # Numerical Recipies, p.510.  They are close to tables, but not exact. (?)
911  return rs, probrs
912
913
914def lpointbiserialr(x, y):
915  """
916Calculates a point-biserial correlation coefficient and the associated
917probability value.  Taken from Heiman's Basic Statistics for the Behav.
918Sci (1st), p.194.
919
920Usage:   lpointbiserialr(x,y)      where x,y are equal-length lists
921Returns: Point-biserial r, two-tailed p-value
922"""
923  TINY = 1e-30
924  if len(x) <> len(y):
925    raise ValueError, 'INPUT VALUES NOT PAIRED IN pointbiserialr.  ABORTING.'
926  data = pstat.abut(x, y)
927  categories = pstat.unique(x)
928  if len(categories) <> 2:
929    raise ValueError, 'Exactly 2 categories required for pointbiserialr().'
930  else:  # there are 2 categories, continue
931    codemap = pstat.abut(categories, range(2))
932    recoded = pstat.recode(data, codemap, 0)
933    x = pstat.linexand(data, 0, categories[0])
934    y = pstat.linexand(data, 0, categories[1])
935    xmean = mean(pstat.colex(x, 1))
936    ymean = mean(pstat.colex(y, 1))
937    n = len(data)
938    adjust = math.sqrt((len(x) / float(n)) * (len(y) / float(n)))
939    rpb = (ymean - xmean) / samplestdev(pstat.colex(data, 1)) * adjust
940    df = n - 2
941    t = rpb * math.sqrt(df / ((1.0 - rpb + TINY) * (1.0 + rpb + TINY)))
942    prob = betai(0.5 * df, 0.5, df / (df + t * t))  # t already a float
943    return rpb, prob
944
945
946def lkendalltau(x, y):
947  """
948Calculates Kendall's tau ... correlation of ordinal data.  Adapted
949from function kendl1 in Numerical Recipies.  Needs good test-routine.@@@
950
951Usage:   lkendalltau(x,y)
952Returns: Kendall's tau, two-tailed p-value
953"""
954  n1 = 0
955  n2 = 0
956  iss = 0
957  for j in range(len(x) - 1):
958    for k in range(j, len(y)):
959      a1 = x[j] - x[k]
960      a2 = y[j] - y[k]
961      aa = a1 * a2
962      if (aa):  # neither list has a tie
963        n1 = n1 + 1
964        n2 = n2 + 1
965        if aa > 0:
966          iss = iss + 1
967        else:
968          iss = iss - 1
969      else:
970        if (a1):
971          n1 = n1 + 1
972        else:
973          n2 = n2 + 1
974  tau = iss / math.sqrt(n1 * n2)
975  svar = (4.0 * len(x) + 10.0) / (9.0 * len(x) * (len(x) - 1))
976  z = tau / math.sqrt(svar)
977  prob = erfcc(abs(z) / 1.4142136)
978  return tau, prob
979
980
981def llinregress(x, y):
982  """
983Calculates a regression line on x,y pairs.
984
985Usage:   llinregress(x,y)      x,y are equal-length lists of x-y coordinates
986Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate
987"""
988  TINY = 1.0e-20
989  if len(x) <> len(y):
990    raise ValueError, 'Input values not paired in linregress.  Aborting.'
991  n = len(x)
992  x = map(float, x)
993  y = map(float, y)
994  xmean = mean(x)
995  ymean = mean(y)
996  r_num = float(n * (summult(x, y)) - sum(x) * sum(y))
997  r_den = math.sqrt((n * ss(x) - square_of_sums(x)) *
998                    (n * ss(y) - square_of_sums(y)))
999  r = r_num / r_den
1000  z = 0.5 * math.log((1.0 + r + TINY) / (1.0 - r + TINY))
1001  df = n - 2
1002  t = r * math.sqrt(df / ((1.0 - r + TINY) * (1.0 + r + TINY)))
1003  prob = betai(0.5 * df, 0.5, df / (df + t * t))
1004  slope = r_num / float(n * ss(x) - square_of_sums(x))
1005  intercept = ymean - slope * xmean
1006  sterrest = math.sqrt(1 - r * r) * samplestdev(y)
1007  return slope, intercept, r, prob, sterrest
1008
1009####################################
1010#####  INFERENTIAL STATISTICS  #####
1011####################################
1012
1013
1014def lttest_1samp(a, popmean, printit=0, name='Sample', writemode='a'):
1015  """
1016Calculates the t-obtained for the independent samples T-test on ONE group
1017of scores a, given a population mean.  If printit=1, results are printed
1018to the screen.  If printit='filename', the results are output to 'filename'
1019using the given writemode (default=append).  Returns t-value, and prob.
1020
1021Usage:   lttest_1samp(a,popmean,Name='Sample',printit=0,writemode='a')
1022Returns: t-value, two-tailed prob
1023"""
1024  x = mean(a)
1025  v = var(a)
1026  n = len(a)
1027  df = n - 1
1028  svar = ((n - 1) * v) / float(df)
1029  t = (x - popmean) / math.sqrt(svar * (1.0 / n))
1030  prob = betai(0.5 * df, 0.5, float(df) / (df + t * t))
1031
1032  if printit <> 0:
1033    statname = 'Single-sample T-test.'
1034    outputpairedstats(printit, writemode, 'Population', '--', popmean, 0, 0, 0,
1035                      name, n, x, v, min(a), max(a), statname, t, prob)
1036  return t, prob
1037
1038
1039def lttest_ind(a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
1040  """
1041Calculates the t-obtained T-test on TWO INDEPENDENT samples of
1042scores a, and b.  From Numerical Recipies, p.483.  If printit=1, results
1043are printed to the screen.  If printit='filename', the results are output
1044to 'filename' using the given writemode (default=append).  Returns t-value,
1045and prob.
1046
1047Usage:   lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a')
1048Returns: t-value, two-tailed prob
1049"""
1050  x1 = mean(a)
1051  x2 = mean(b)
1052  v1 = stdev(a)**2
1053  v2 = stdev(b)**2
1054  n1 = len(a)
1055  n2 = len(b)
1056  df = n1 + n2 - 2
1057  svar = ((n1 - 1) * v1 + (n2 - 1) * v2) / float(df)
1058  if not svar:
1059    svar = 1.0e-26
1060  t = (x1 - x2) / math.sqrt(svar * (1.0 / n1 + 1.0 / n2))
1061  prob = betai(0.5 * df, 0.5, df / (df + t * t))
1062
1063  if printit <> 0:
1064    statname = 'Independent samples T-test.'
1065    outputpairedstats(printit, writemode, name1, n1, x1, v1, min(a), max(a),
1066                      name2, n2, x2, v2, min(b), max(b), statname, t, prob)
1067  return t, prob
1068
1069
1070def lttest_rel(a,
1071               b,
1072               printit=0,
1073               name1='Sample1',
1074               name2='Sample2',
1075               writemode='a'):
1076  """
1077Calculates the t-obtained T-test on TWO RELATED samples of scores,
1078a and b.  From Numerical Recipies, p.483.  If printit=1, results are
1079printed to the screen.  If printit='filename', the results are output to
1080'filename' using the given writemode (default=append).  Returns t-value,
1081and prob.
1082
1083Usage:   lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a')
1084Returns: t-value, two-tailed prob
1085"""
1086  if len(a) <> len(b):
1087    raise ValueError, 'Unequal length lists in ttest_rel.'
1088  x1 = mean(a)
1089  x2 = mean(b)
1090  v1 = var(a)
1091  v2 = var(b)
1092  n = len(a)
1093  cov = 0
1094  for i in range(len(a)):
1095    cov = cov + (a[i] - x1) * (b[i] - x2)
1096  df = n - 1
1097  cov = cov / float(df)
1098  sd = math.sqrt((v1 + v2 - 2.0 * cov) / float(n))
1099  t = (x1 - x2) / sd
1100  prob = betai(0.5 * df, 0.5, df / (df + t * t))
1101
1102  if printit <> 0:
1103    statname = 'Related samples T-test.'
1104    outputpairedstats(printit, writemode, name1, n, x1, v1, min(a), max(a),
1105                      name2, n, x2, v2, min(b), max(b), statname, t, prob)
1106  return t, prob
1107
1108
1109def lchisquare(f_obs, f_exp=None):
1110  """
1111Calculates a one-way chi square for list of observed frequencies and returns
1112the result.  If no expected frequencies are given, the total N is assumed to
1113be equally distributed across all groups.
1114
1115Usage:   lchisquare(f_obs, f_exp=None)   f_obs = list of observed cell freq.
1116Returns: chisquare-statistic, associated p-value
1117"""
1118  k = len(f_obs)  # number of groups
1119  if f_exp == None:
1120    f_exp = [sum(f_obs) / float(k)] * len(f_obs)  # create k bins with = freq.
1121  chisq = 0
1122  for i in range(len(f_obs)):
1123    chisq = chisq + (f_obs[i] - f_exp[i])**2 / float(f_exp[i])
1124  return chisq, chisqprob(chisq, k - 1)
1125
1126
1127def lks_2samp(data1, data2):
1128  """
1129Computes the Kolmogorov-Smirnof statistic on 2 samples.  From
1130Numerical Recipies in C, page 493.
1131
1132Usage:   lks_2samp(data1,data2)   data1&2 are lists of values for 2 conditions
1133Returns: KS D-value, associated p-value
1134"""
1135  j1 = 0
1136  j2 = 0
1137  fn1 = 0.0
1138  fn2 = 0.0
1139  n1 = len(data1)
1140  n2 = len(data2)
1141  en1 = n1
1142  en2 = n2
1143  d = 0.0
1144  data1.sort()
1145  data2.sort()
1146  while j1 < n1 and j2 < n2:
1147    d1 = data1[j1]
1148    d2 = data2[j2]
1149    if d1 <= d2:
1150      fn1 = (j1) / float(en1)
1151      j1 = j1 + 1
1152    if d2 <= d1:
1153      fn2 = (j2) / float(en2)
1154      j2 = j2 + 1
1155    dt = (fn2 - fn1)
1156    if math.fabs(dt) > math.fabs(d):
1157      d = dt
1158  try:
1159    en = math.sqrt(en1 * en2 / float(en1 + en2))
1160    prob = ksprob((en + 0.12 + 0.11 / en) * abs(d))
1161  except:
1162    prob = 1.0
1163  return d, prob
1164
1165
1166def lmannwhitneyu(x, y):
1167  """
1168Calculates a Mann-Whitney U statistic on the provided scores and
1169returns the result.  Use only when the n in each condition is < 20 and
1170you have 2 independent samples of ranks.  NOTE: Mann-Whitney U is
1171significant if the u-obtained is LESS THAN or equal to the critical
1172value of U found in the tables.  Equivalent to Kruskal-Wallis H with
1173just 2 groups.
1174
1175Usage:   lmannwhitneyu(data)
1176Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
1177"""
1178  n1 = len(x)
1179  n2 = len(y)
1180  ranked = rankdata(x + y)
1181  rankx = ranked[0:n1]  # get the x-ranks
1182  ranky = ranked[n1:]  # the rest are y-ranks
1183  u1 = n1 * n2 + (n1 * (n1 + 1)) / 2.0 - sum(rankx)  # calc U for x
1184  u2 = n1 * n2 - u1  # remainder is U for y
1185  bigu = max(u1, u2)
1186  smallu = min(u1, u2)
1187  proportion = bigu / float(n1 * n2)
1188  T = math.sqrt(tiecorrect(ranked))  # correction factor for tied scores
1189  if T == 0:
1190    raise ValueError, 'All numbers are identical in lmannwhitneyu'
1191  sd = math.sqrt(T * n1 * n2 * (n1 + n2 + 1) / 12.0)
1192  z = abs((bigu - n1 * n2 / 2.0) / sd)  # normal approximation for prob calc
1193  return smallu, 1.0 - zprob(z)  #, proportion
1194
1195
1196def ltiecorrect(rankvals):
1197  """
1198Corrects for ties in Mann Whitney U and Kruskal Wallis H tests.  See
1199Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences.
1200New York: McGraw-Hill.  Code adapted from |Stat rankind.c code.
1201
1202Usage:   ltiecorrect(rankvals)
1203Returns: T correction factor for U or H
1204"""
1205  sorted, posn = shellsort(rankvals)
1206  n = len(sorted)
1207  T = 0.0
1208  i = 0
1209  while (i < n - 1):
1210    if sorted[i] == sorted[i + 1]:
1211      nties = 1
1212      while (i < n - 1) and (sorted[i] == sorted[i + 1]):
1213        nties = nties + 1
1214        i = i + 1
1215      T = T + nties**3 - nties
1216    i = i + 1
1217  T = T / float(n**3 - n)
1218  return 1.0 - T
1219
1220
1221def lranksums(x, y):
1222  """
1223Calculates the rank sums statistic on the provided scores and
1224returns the result.  Use only when the n in each condition is > 20 and you
1225have 2 independent samples of ranks.
1226
1227Usage:   lranksums(x,y)
1228Returns: a z-statistic, two-tailed p-value
1229"""
1230  n1 = len(x)
1231  n2 = len(y)
1232  alldata = x + y
1233  ranked = rankdata(alldata)
1234  x = ranked[:n1]
1235  y = ranked[n1:]
1236  s = sum(x)
1237  expected = n1 * (n1 + n2 + 1) / 2.0
1238  z = (s - expected) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
1239  prob = 2 * (1.0 - zprob(abs(z)))
1240  return z, prob
1241
1242
1243def lwilcoxont(x, y):
1244  """
1245Calculates the Wilcoxon T-test for related samples and returns the
1246result.  A non-parametric T-test.
1247
1248Usage:   lwilcoxont(x,y)
1249Returns: a t-statistic, two-tail probability estimate
1250"""
1251  if len(x) <> len(y):
1252    raise ValueError, 'Unequal N in wilcoxont.  Aborting.'
1253  d = []
1254  for i in range(len(x)):
1255    diff = x[i] - y[i]
1256    if diff <> 0:
1257      d.append(diff)
1258  count = len(d)
1259  absd = map(abs, d)
1260  absranked = rankdata(absd)
1261  r_plus = 0.0
1262  r_minus = 0.0
1263  for i in range(len(absd)):
1264    if d[i] < 0:
1265      r_minus = r_minus + absranked[i]
1266    else:
1267      r_plus = r_plus + absranked[i]
1268  wt = min(r_plus, r_minus)
1269  mn = count * (count + 1) * 0.25
1270  se = math.sqrt(count * (count + 1) * (2.0 * count + 1.0) / 24.0)
1271  z = math.fabs(wt - mn) / se
1272  prob = 2 * (1.0 - zprob(abs(z)))
1273  return wt, prob
1274
1275
1276def lkruskalwallish(*args):
1277  """
1278The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
1279groups, requiring at least 5 subjects in each group.  This function
1280calculates the Kruskal-Wallis H-test for 3 or more independent samples
1281and returns the result.
1282
1283Usage:   lkruskalwallish(*args)
1284Returns: H-statistic (corrected for ties), associated p-value
1285"""
1286  args = list(args)
1287  n = [0] * len(args)
1288  all = []
1289  n = map(len, args)
1290  for i in range(len(args)):
1291    all = all + args[i]
1292  ranked = rankdata(all)
1293  T = tiecorrect(ranked)
1294  for i in range(len(args)):
1295    args[i] = ranked[0:n[i]]
1296    del ranked[0:n[i]]
1297  rsums = []
1298  for i in range(len(args)):
1299    rsums.append(sum(args[i])**2)
1300    rsums[i] = rsums[i] / float(n[i])
1301  ssbn = sum(rsums)
1302  totaln = sum(n)
1303  h = 12.0 / (totaln * (totaln + 1)) * ssbn - 3 * (totaln + 1)
1304  df = len(args) - 1
1305  if T == 0:
1306    raise ValueError, 'All numbers are identical in lkruskalwallish'
1307  h = h / float(T)
1308  return h, chisqprob(h, df)
1309
1310
1311def lfriedmanchisquare(*args):
1312  """
1313Friedman Chi-Square is a non-parametric, one-way within-subjects
1314ANOVA.  This function calculates the Friedman Chi-square test for repeated
1315measures and returns the result, along with the associated probability
1316value.  It assumes 3 or more repeated measures.  Only 3 levels requires a
1317minimum of 10 subjects in the study.  Four levels requires 5 subjects per
1318level(??).
1319
1320Usage:   lfriedmanchisquare(*args)
1321Returns: chi-square statistic, associated p-value
1322"""
1323  k = len(args)
1324  if k < 3:
1325    raise ValueError, 'Less than 3 levels.  Friedman test not appropriate.'
1326  n = len(args[0])
1327  data = apply(pstat.abut, tuple(args))
1328  for i in range(len(data)):
1329    data[i] = rankdata(data[i])
1330  ssbn = 0
1331  for i in range(k):
1332    ssbn = ssbn + sum(args[i])**2
1333  chisq = 12.0 / (k * n * (k + 1)) * ssbn - 3 * n * (k + 1)
1334  return chisq, chisqprob(chisq, k - 1)
1335
1336####################################
1337####  PROBABILITY CALCULATIONS  ####
1338####################################
1339
1340
1341def lchisqprob(chisq, df):
1342  """
1343Returns the (1-tailed) probability value associated with the provided
1344chi-square value and df.  Adapted from chisq.c in Gary Perlman's |Stat.
1345
1346Usage:   lchisqprob(chisq,df)
1347"""
1348  BIG = 20.0
1349
1350  def ex(x):
1351    BIG = 20.0
1352    if x < -BIG:
1353      return 0.0
1354    else:
1355      return math.exp(x)
1356
1357  if chisq <= 0 or df < 1:
1358    return 1.0
1359  a = 0.5 * chisq
1360  if df % 2 == 0:
1361    even = 1
1362  else:
1363    even = 0
1364  if df > 1:
1365    y = ex(-a)
1366  if even:
1367    s = y
1368  else:
1369    s = 2.0 * zprob(-math.sqrt(chisq))
1370  if (df > 2):
1371    chisq = 0.5 * (df - 1.0)
1372    if even:
1373      z = 1.0
1374    else:
1375      z = 0.5
1376    if a > BIG:
1377      if even:
1378        e = 0.0
1379      else:
1380        e = math.log(math.sqrt(math.pi))
1381      c = math.log(a)
1382      while (z <= chisq):
1383        e = math.log(z) + e
1384        s = s + ex(c * z - a - e)
1385        z = z + 1.0
1386      return s
1387    else:
1388      if even:
1389        e = 1.0
1390      else:
1391        e = 1.0 / math.sqrt(math.pi) / math.sqrt(a)
1392      c = 0.0
1393      while (z <= chisq):
1394        e = e * (a / float(z))
1395        c = c + e
1396        z = z + 1.0
1397      return (c * y + s)
1398  else:
1399    return s
1400
1401
1402def lerfcc(x):
1403  """
1404Returns the complementary error function erfc(x) with fractional
1405error everywhere less than 1.2e-7.  Adapted from Numerical Recipies.
1406
1407Usage:   lerfcc(x)
1408"""
1409  z = abs(x)
1410  t = 1.0 / (1.0 + 0.5 * z)
1411  ans = t * math.exp(-z * z - 1.26551223 + t * (1.00002368 + t * (
1412      0.37409196 + t * (0.09678418 + t * (-0.18628806 + t * (0.27886807 + t * (
1413          -1.13520398 + t * (1.48851587 + t * (-0.82215223 + t * 0.17087277)))))
1414                       ))))
1415  if x >= 0:
1416    return ans
1417  else:
1418    return 2.0 - ans
1419
1420
1421def lzprob(z):
1422  """
1423Returns the area under the normal curve 'to the left of' the given z value.
1424Thus,
1425    for z<0, zprob(z) = 1-tail probability
1426    for z>0, 1.0-zprob(z) = 1-tail probability
1427    for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
1428Adapted from z.c in Gary Perlman's |Stat.
1429
1430Usage:   lzprob(z)
1431"""
1432  Z_MAX = 6.0  # maximum meaningful z-value
1433  if z == 0.0:
1434    x = 0.0
1435  else:
1436    y = 0.5 * math.fabs(z)
1437    if y >= (Z_MAX * 0.5):
1438      x = 1.0
1439    elif (y < 1.0):
1440      w = y * y
1441      x = ((
1442          ((((((0.000124818987 * w - 0.001075204047) * w + 0.005198775019) * w -
1443              0.019198292004) * w + 0.059054035642) * w - 0.151968751364) * w +
1444           0.319152932694) * w - 0.531923007300) * w + 0.797884560593) * y * 2.0
1445    else:
1446      y = y - 2.0
1447      x = (((((((
1448          ((((((-0.000045255659 * y + 0.000152529290) * y - 0.000019538132) * y
1449              - 0.000676904986) * y + 0.001390604284) * y - 0.000794620820) * y
1450           - 0.002034254874) * y + 0.006549791214) * y - 0.010557625006) * y +
1451               0.011630447319) * y - 0.009279453341) * y + 0.005353579108) * y -
1452            0.002141268741) * y + 0.000535310849) * y + 0.999936657524
1453  if z > 0.0:
1454    prob = ((x + 1.0) * 0.5)
1455  else:
1456    prob = ((1.0 - x) * 0.5)
1457  return prob
1458
1459
1460def lksprob(alam):
1461  """
1462Computes a Kolmolgorov-Smirnov t-test significance level.  Adapted from
1463Numerical Recipies.
1464
1465Usage:   lksprob(alam)
1466"""
1467  fac = 2.0
1468  sum = 0.0
1469  termbf = 0.0
1470  a2 = -2.0 * alam * alam
1471  for j in range(1, 201):
1472    term = fac * math.exp(a2 * j * j)
1473    sum = sum + term
1474    if math.fabs(term) <= (0.001 * termbf) or math.fabs(term) < (1.0e-8 * sum):
1475      return sum
1476    fac = -fac
1477    termbf = math.fabs(term)
1478  return 1.0  # Get here only if fails to converge; was 0.0!!
1479
1480
1481def lfprob(dfnum, dfden, F):
1482  """
1483Returns the (1-tailed) significance level (p-value) of an F
1484statistic given the degrees of freedom for the numerator (dfR-dfF) and
1485the degrees of freedom for the denominator (dfF).
1486
1487Usage:   lfprob(dfnum, dfden, F)   where usually dfnum=dfbn, dfden=dfwn
1488"""
1489  p = betai(0.5 * dfden, 0.5 * dfnum, dfden / float(dfden + dfnum * F))
1490  return p
1491
1492
1493def lbetacf(a, b, x):
1494  """
1495This function evaluates the continued fraction form of the incomplete
1496Beta function, betai.  (Adapted from: Numerical Recipies in C.)
1497
1498Usage:   lbetacf(a,b,x)
1499"""
1500  ITMAX = 200
1501  EPS = 3.0e-7
1502
1503  bm = az = am = 1.0
1504  qab = a + b
1505  qap = a + 1.0
1506  qam = a - 1.0
1507  bz = 1.0 - qab * x / qap
1508  for i in range(ITMAX + 1):
1509    em = float(i + 1)
1510    tem = em + em
1511    d = em * (b - em) * x / ((qam + tem) * (a + tem))
1512    ap = az + d * am
1513    bp = bz + d * bm
1514    d = -(a + em) * (qab + em) * x / ((qap + tem) * (a + tem))
1515    app = ap + d * az
1516    bpp = bp + d * bz
1517    aold = az
1518    am = ap / bpp
1519    bm = bp / bpp
1520    az = app / bpp
1521    bz = 1.0
1522    if (abs(az - aold) < (EPS * abs(az))):
1523      return az
1524  print 'a or b too big, or ITMAX too small in Betacf.'
1525
1526
1527def lgammln(xx):
1528  """
1529Returns the gamma function of xx.
1530    Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
1531(Adapted from: Numerical Recipies in C.)
1532
1533Usage:   lgammln(xx)
1534"""
1535
1536  coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, 0.120858003e-2,
1537           -0.536382e-5]
1538  x = xx - 1.0
1539  tmp = x + 5.5
1540  tmp = tmp - (x + 0.5) * math.log(tmp)
1541  ser = 1.0
1542  for j in range(len(coeff)):
1543    x = x + 1
1544    ser = ser + coeff[j] / x
1545  return -tmp + math.log(2.50662827465 * ser)
1546
1547
1548def lbetai(a, b, x):
1549  """
1550Returns the incomplete beta function:
1551
1552    I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
1553
1554where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
1555function of a.  The continued fraction formulation is implemented here,
1556using the betacf function.  (Adapted from: Numerical Recipies in C.)
1557
1558Usage:   lbetai(a,b,x)
1559"""
1560  if (x < 0.0 or x > 1.0):
1561    raise ValueError, 'Bad x in lbetai'
1562  if (x == 0.0 or x == 1.0):
1563    bt = 0.0
1564  else:
1565    bt = math.exp(gammln(a + b) - gammln(a) - gammln(b) + a * math.log(x) + b *
1566                  math.log(1.0 - x))
1567  if (x < (a + 1.0) / (a + b + 2.0)):
1568    return bt * betacf(a, b, x) / float(a)
1569  else:
1570    return 1.0 - bt * betacf(b, a, 1.0 - x) / float(b)
1571
1572####################################
1573#######  ANOVA CALCULATIONS  #######
1574####################################
1575
1576
1577def lF_oneway(*lists):
1578  """
1579Performs a 1-way ANOVA, returning an F-value and probability given
1580any number of groups.  From Heiman, pp.394-7.
1581
1582Usage:   F_oneway(*lists)    where *lists is any number of lists, one per
1583                                  treatment group
1584Returns: F value, one-tailed p-value
1585"""
1586  a = len(lists)  # ANOVA on 'a' groups, each in it's own list
1587  means = [0] * a
1588  vars = [0] * a
1589  ns = [0] * a
1590  alldata = []
1591  tmp = map(N.array, lists)
1592  means = map(amean, tmp)
1593  vars = map(avar, tmp)
1594  ns = map(len, lists)
1595  for i in range(len(lists)):
1596    alldata = alldata + lists[i]
1597  alldata = N.array(alldata)
1598  bign = len(alldata)
1599  sstot = ass(alldata) - (asquare_of_sums(alldata) / float(bign))
1600  ssbn = 0
1601  for list in lists:
1602    ssbn = ssbn + asquare_of_sums(N.array(list)) / float(len(list))
1603  ssbn = ssbn - (asquare_of_sums(alldata) / float(bign))
1604  sswn = sstot - ssbn
1605  dfbn = a - 1
1606  dfwn = bign - a
1607  msb = ssbn / float(dfbn)
1608  msw = sswn / float(dfwn)
1609  f = msb / msw
1610  prob = fprob(dfbn, dfwn, f)
1611  return f, prob
1612
1613
1614def lF_value(ER, EF, dfnum, dfden):
1615  """
1616Returns an F-statistic given the following:
1617        ER  = error associated with the null hypothesis (the Restricted model)
1618        EF  = error associated with the alternate hypothesis (the Full model)
1619        dfR-dfF = degrees of freedom of the numerator
1620        dfF = degrees of freedom associated with the denominator/Full model
1621
1622Usage:   lF_value(ER,EF,dfnum,dfden)
1623"""
1624  return ((ER - EF) / float(dfnum) / (EF / float(dfden)))
1625
1626####################################
1627########  SUPPORT FUNCTIONS  #######
1628####################################
1629
1630
1631def writecc(listoflists, file, writetype='w', extra=2):
1632  """
1633Writes a list of lists to a file in columns, customized by the max
1634size of items within the columns (max size of items in col, +2 characters)
1635to specified file.  File-overwrite is the default.
1636
1637Usage:   writecc (listoflists,file,writetype='w',extra=2)
1638Returns: None
1639"""
1640  if type(listoflists[0]) not in [ListType, TupleType]:
1641    listoflists = [listoflists]
1642  outfile = open(file, writetype)
1643  rowstokill = []
1644  list2print = copy.deepcopy(listoflists)
1645  for i in range(len(listoflists)):
1646    if listoflists[i] == [
1647        '\n'
1648    ] or listoflists[i] == '\n' or listoflists[i] == 'dashes':
1649      rowstokill = rowstokill + [i]
1650  rowstokill.reverse()
1651  for row in rowstokill:
1652    del list2print[row]
1653  maxsize = [0] * len(list2print[0])
1654  for col in range(len(list2print[0])):
1655    items = pstat.colex(list2print, col)
1656    items = map(pstat.makestr, items)
1657    maxsize[col] = max(map(len, items)) + extra
1658  for row in listoflists:
1659    if row == ['\n'] or row == '\n':
1660      outfile.write('\n')
1661    elif row == ['dashes'] or row == 'dashes':
1662      dashes = [0] * len(maxsize)
1663      for j in range(len(maxsize)):
1664        dashes[j] = '-' * (maxsize[j] - 2)
1665      outfile.write(pstat.lineincustcols(dashes, maxsize))
1666    else:
1667      outfile.write(pstat.lineincustcols(row, maxsize))
1668    outfile.write('\n')
1669  outfile.close()
1670  return None
1671
1672
1673def lincr(l, cap):  # to increment a list up to a max-list of 'cap'
1674  """
1675Simulate a counting system from an n-dimensional list.
1676
1677Usage:   lincr(l,cap)   l=list to increment, cap=max values for each list pos'n
1678Returns: next set of values for list l, OR -1 (if overflow)
1679"""
1680  l[0] = l[0] + 1  # e.g., [0,0,0] --> [2,4,3] (=cap)
1681  for i in range(len(l)):
1682    if l[i] > cap[i] and i < len(l) - 1:  # if carryover AND not done
1683      l[i] = 0
1684      l[i + 1] = l[i + 1] + 1
1685    elif l[i] > cap[i] and i == len(
1686        l) - 1:  # overflow past last column, must be finished
1687      l = -1
1688  return l
1689
1690
1691def lsum(inlist):
1692  """
1693Returns the sum of the items in the passed list.
1694
1695Usage:   lsum(inlist)
1696"""
1697  s = 0
1698  for item in inlist:
1699    s = s + item
1700  return s
1701
1702
1703def lcumsum(inlist):
1704  """
1705Returns a list consisting of the cumulative sum of the items in the
1706passed list.
1707
1708Usage:   lcumsum(inlist)
1709"""
1710  newlist = copy.deepcopy(inlist)
1711  for i in range(1, len(newlist)):
1712    newlist[i] = newlist[i] + newlist[i - 1]
1713  return newlist
1714
1715
1716def lss(inlist):
1717  """
1718Squares each value in the passed list, adds up these squares and
1719returns the result.
1720
1721Usage:   lss(inlist)
1722"""
1723  ss = 0
1724  for item in inlist:
1725    ss = ss + item * item
1726  return ss
1727
1728
1729def lsummult(list1, list2):
1730  """
1731Multiplies elements in list1 and list2, element by element, and
1732returns the sum of all resulting multiplications.  Must provide equal
1733length lists.
1734
1735Usage:   lsummult(list1,list2)
1736"""
1737  if len(list1) <> len(list2):
1738    raise ValueError, 'Lists not equal length in summult.'
1739  s = 0
1740  for item1, item2 in pstat.abut(list1, list2):
1741    s = s + item1 * item2
1742  return s
1743
1744
1745def lsumdiffsquared(x, y):
1746  """
1747Takes pairwise differences of the values in lists x and y, squares
1748these differences, and returns the sum of these squares.
1749
1750Usage:   lsumdiffsquared(x,y)
1751Returns: sum[(x[i]-y[i])**2]
1752"""
1753  sds = 0
1754  for i in range(len(x)):
1755    sds = sds + (x[i] - y[i])**2
1756  return sds
1757
1758
1759def lsquare_of_sums(inlist):
1760  """
1761Adds the values in the passed list, squares the sum, and returns
1762the result.
1763
1764Usage:   lsquare_of_sums(inlist)
1765Returns: sum(inlist[i])**2
1766"""
1767  s = sum(inlist)
1768  return float(s) * s
1769
1770
1771def lshellsort(inlist):
1772  """
1773Shellsort algorithm.  Sorts a 1D-list.
1774
1775Usage:   lshellsort(inlist)
1776Returns: sorted-inlist, sorting-index-vector (for original list)
1777"""
1778  n = len(inlist)
1779  svec = copy.deepcopy(inlist)
1780  ivec = range(n)
1781  gap = n / 2  # integer division needed
1782  while gap > 0:
1783    for i in range(gap, n):
1784      for j in range(i - gap, -1, -gap):
1785        while j >= 0 and svec[j] > svec[j + gap]:
1786          temp = svec[j]
1787          svec[j] = svec[j + gap]
1788          svec[j + gap] = temp
1789          itemp = ivec[j]
1790          ivec[j] = ivec[j + gap]
1791          ivec[j + gap] = itemp
1792    gap = gap / 2  # integer division needed
1793# svec is now sorted inlist, and ivec has the order svec[i] = vec[ivec[i]]
1794  return svec, ivec
1795
1796
1797def lrankdata(inlist):
1798  """
1799Ranks the data in inlist, dealing with ties appropritely.  Assumes
1800a 1D inlist.  Adapted from Gary Perlman's |Stat ranksort.
1801
1802Usage:   lrankdata(inlist)
1803Returns: a list of length equal to inlist, containing rank scores
1804"""
1805  n = len(inlist)
1806  svec, ivec = shellsort(inlist)
1807  sumranks = 0
1808  dupcount = 0
1809  newlist = [0] * n
1810  for i in range(n):
1811    sumranks = sumranks + i
1812    dupcount = dupcount + 1
1813    if i == n - 1 or svec[i] <> svec[i + 1]:
1814      averank = sumranks / float(dupcount) + 1
1815      for j in range(i - dupcount + 1, i + 1):
1816        newlist[ivec[j]] = averank
1817      sumranks = 0
1818      dupcount = 0
1819  return newlist
1820
1821
1822def outputpairedstats(fname, writemode, name1, n1, m1, se1, min1, max1, name2,
1823                      n2, m2, se2, min2, max2, statname, stat, prob):
1824  """
1825Prints or write to a file stats for two groups, using the name, n,
1826mean, sterr, min and max for each group, as well as the statistic name,
1827its value, and the associated p-value.
1828
1829Usage:   outputpairedstats(fname,writemode,
1830                           name1,n1,mean1,stderr1,min1,max1,
1831                           name2,n2,mean2,stderr2,min2,max2,
1832                           statname,stat,prob)
1833Returns: None
1834"""
1835  suffix = ''  # for *s after the p-value
1836  try:
1837    x = prob.shape
1838    prob = prob[0]
1839  except:
1840    pass
1841  if prob < 0.001:
1842    suffix = '  ***'
1843  elif prob < 0.01:
1844    suffix = '  **'
1845  elif prob < 0.05:
1846    suffix = '  *'
1847  title = [['Name', 'N', 'Mean', 'SD', 'Min', 'Max']]
1848  lofl = title + [[name1, n1, round(m1, 3), round(
1849      math.sqrt(se1), 3), min1, max1], [name2, n2, round(m2, 3), round(
1850          math.sqrt(se2), 3), min2, max2]]
1851  if type(fname) <> StringType or len(fname) == 0:
1852    print
1853    print statname
1854    print
1855    pstat.printcc(lofl)
1856    print
1857    try:
1858      if stat.shape == ():
1859        stat = stat[0]
1860      if prob.shape == ():
1861        prob = prob[0]
1862    except:
1863      pass
1864    print 'Test statistic = ', round(stat, 3), '   p = ', round(prob, 3), suffix
1865    print
1866  else:
1867    file = open(fname, writemode)
1868    file.write('\n' + statname + '\n\n')
1869    file.close()
1870    writecc(lofl, fname, 'a')
1871    file = open(fname, 'a')
1872    try:
1873      if stat.shape == ():
1874        stat = stat[0]
1875      if prob.shape == ():
1876        prob = prob[0]
1877    except:
1878      pass
1879    file.write(pstat.list2string(['\nTest statistic = ', round(stat, 4),
1880                                  '   p = ', round(prob, 4), suffix, '\n\n']))
1881    file.close()
1882  return None
1883
1884
1885def lfindwithin(data):
1886  """
1887Returns an integer representing a binary vector, where 1=within-
1888subject factor, 0=between.  Input equals the entire data 2D list (i.e.,
1889column 0=random factor, column -1=measured values (those two are skipped).
1890Note: input data is in |Stat format ... a list of lists ("2D list") with
1891one row per measured value, first column=subject identifier, last column=
1892score, one in-between column per factor (these columns contain level
1893designations on each factor).  See also stats.anova.__doc__.
1894
1895Usage:   lfindwithin(data)     data in |Stat format
1896"""
1897
1898  numfact = len(data[0]) - 1
1899  withinvec = 0
1900  for col in range(1, numfact):
1901    examplelevel = pstat.unique(pstat.colex(data, col))[0]
1902    rows = pstat.linexand(data, col, examplelevel)  # get 1 level of this factor
1903    factsubjs = pstat.unique(pstat.colex(rows, 0))
1904    allsubjs = pstat.unique(pstat.colex(data, 0))
1905    if len(factsubjs) == len(allsubjs):  # fewer Ss than scores on this factor?
1906      withinvec = withinvec + (1 << col)
1907  return withinvec
1908
1909#########################################################
1910#########################################################
1911####### DISPATCH LISTS AND TUPLES TO ABOVE FCNS #########
1912#########################################################
1913#########################################################
1914
1915## CENTRAL TENDENCY:
1916geometricmean = Dispatch((lgeometricmean, (ListType, TupleType)),)
1917harmonicmean = Dispatch((lharmonicmean, (ListType, TupleType)),)
1918mean = Dispatch((lmean, (ListType, TupleType)),)
1919median = Dispatch((lmedian, (ListType, TupleType)),)
1920medianscore = Dispatch((lmedianscore, (ListType, TupleType)),)
1921mode = Dispatch((lmode, (ListType, TupleType)),)
1922
1923## MOMENTS:
1924moment = Dispatch((lmoment, (ListType, TupleType)),)
1925variation = Dispatch((lvariation, (ListType, TupleType)),)
1926skew = Dispatch((lskew, (ListType, TupleType)),)
1927kurtosis = Dispatch((lkurtosis, (ListType, TupleType)),)
1928describe = Dispatch((ldescribe, (ListType, TupleType)),)
1929
1930## FREQUENCY STATISTICS:
1931itemfreq = Dispatch((litemfreq, (ListType, TupleType)),)
1932scoreatpercentile = Dispatch((lscoreatpercentile, (ListType, TupleType)),)
1933percentileofscore = Dispatch((lpercentileofscore, (ListType, TupleType)),)
1934histogram = Dispatch((lhistogram, (ListType, TupleType)),)
1935cumfreq = Dispatch((lcumfreq, (ListType, TupleType)),)
1936relfreq = Dispatch((lrelfreq, (ListType, TupleType)),)
1937
1938## VARIABILITY:
1939obrientransform = Dispatch((lobrientransform, (ListType, TupleType)),)
1940samplevar = Dispatch((lsamplevar, (ListType, TupleType)),)
1941samplestdev = Dispatch((lsamplestdev, (ListType, TupleType)),)
1942var = Dispatch((lvar, (ListType, TupleType)),)
1943stdev = Dispatch((lstdev, (ListType, TupleType)),)
1944sterr = Dispatch((lsterr, (ListType, TupleType)),)
1945sem = Dispatch((lsem, (ListType, TupleType)),)
1946z = Dispatch((lz, (ListType, TupleType)),)
1947zs = Dispatch((lzs, (ListType, TupleType)),)
1948
1949## TRIMMING FCNS:
1950trimboth = Dispatch((ltrimboth, (ListType, TupleType)),)
1951trim1 = Dispatch((ltrim1, (ListType, TupleType)),)
1952
1953## CORRELATION FCNS:
1954paired = Dispatch((lpaired, (ListType, TupleType)),)
1955pearsonr = Dispatch((lpearsonr, (ListType, TupleType)),)
1956spearmanr = Dispatch((lspearmanr, (ListType, TupleType)),)
1957pointbiserialr = Dispatch((lpointbiserialr, (ListType, TupleType)),)
1958kendalltau = Dispatch((lkendalltau, (ListType, TupleType)),)
1959linregress = Dispatch((llinregress, (ListType, TupleType)),)
1960
1961## INFERENTIAL STATS:
1962ttest_1samp = Dispatch((lttest_1samp, (ListType, TupleType)),)
1963ttest_ind = Dispatch((lttest_ind, (ListType, TupleType)),)
1964ttest_rel = Dispatch((lttest_rel, (ListType, TupleType)),)
1965chisquare = Dispatch((lchisquare, (ListType, TupleType)),)
1966ks_2samp = Dispatch((lks_2samp, (ListType, TupleType)),)
1967mannwhitneyu = Dispatch((lmannwhitneyu, (ListType, TupleType)),)
1968ranksums = Dispatch((lranksums, (ListType, TupleType)),)
1969tiecorrect = Dispatch((ltiecorrect, (ListType, TupleType)),)
1970wilcoxont = Dispatch((lwilcoxont, (ListType, TupleType)),)
1971kruskalwallish = Dispatch((lkruskalwallish, (ListType, TupleType)),)
1972friedmanchisquare = Dispatch((lfriedmanchisquare, (ListType, TupleType)),)
1973
1974## PROBABILITY CALCS:
1975chisqprob = Dispatch((lchisqprob, (IntType, FloatType)),)
1976zprob = Dispatch((lzprob, (IntType, FloatType)),)
1977ksprob = Dispatch((lksprob, (IntType, FloatType)),)
1978fprob = Dispatch((lfprob, (IntType, FloatType)),)
1979betacf = Dispatch((lbetacf, (IntType, FloatType)),)
1980betai = Dispatch((lbetai, (IntType, FloatType)),)
1981erfcc = Dispatch((lerfcc, (IntType, FloatType)),)
1982gammln = Dispatch((lgammln, (IntType, FloatType)),)
1983
1984## ANOVA FUNCTIONS:
1985F_oneway = Dispatch((lF_oneway, (ListType, TupleType)),)
1986F_value = Dispatch((lF_value, (ListType, TupleType)),)
1987
1988## SUPPORT FUNCTIONS:
1989incr = Dispatch((lincr, (ListType, TupleType)),)
1990sum = Dispatch((lsum, (ListType, TupleType)),)
1991cumsum = Dispatch((lcumsum, (ListType, TupleType)),)
1992ss = Dispatch((lss, (ListType, TupleType)),)
1993summult = Dispatch((lsummult, (ListType, TupleType)),)
1994square_of_sums = Dispatch((lsquare_of_sums, (ListType, TupleType)),)
1995sumdiffsquared = Dispatch((lsumdiffsquared, (ListType, TupleType)),)
1996shellsort = Dispatch((lshellsort, (ListType, TupleType)),)
1997rankdata = Dispatch((lrankdata, (ListType, TupleType)),)
1998findwithin = Dispatch((lfindwithin, (ListType, TupleType)),)
1999
2000#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2001#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2002#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2003#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2004#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2005#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2006#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2007#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2008#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2009#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2010#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2011#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2012#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2013#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2014#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2015#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2016#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2017#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2018#=============  THE ARRAY-VERSION OF THE STATS FUNCTIONS  ===============
2019
2020try:  # DEFINE THESE *ONLY* IF NUMERIC IS AVAILABLE
2021  import numpy as N
2022  import numpy.linalg as LA
2023
2024  #####################################
2025  ########  ACENTRAL TENDENCY  ########
2026  #####################################
2027
2028
2029  def ageometricmean(inarray, dimension=None, keepdims=0):
2030    """
2031Calculates the geometric mean of the values in the passed array.
2032That is:  n-th root of (x1 * x2 * ... * xn).  Defaults to ALL values in
2033the passed array.  Use dimension=None to flatten array first.  REMEMBER: if
2034dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
2035if dimension is a sequence, it collapses over all specified dimensions.  If
2036keepdims is set to 1, the resulting array will have as many dimensions as
2037inarray, with only 1 'level' per dim that was collapsed over.
2038
2039Usage:   ageometricmean(inarray,dimension=None,keepdims=0)
2040Returns: geometric mean computed over dim(s) listed in dimension
2041"""
2042    inarray = N.array(inarray, N.float_)
2043    if dimension == None:
2044      inarray = N.ravel(inarray)
2045      size = len(inarray)
2046      mult = N.power(inarray, 1.0 / size)
2047      mult = N.multiply.reduce(mult)
2048    elif type(dimension) in [IntType, FloatType]:
2049      size = inarray.shape[dimension]
2050      mult = N.power(inarray, 1.0 / size)
2051      mult = N.multiply.reduce(mult, dimension)
2052      if keepdims == 1:
2053        shp = list(inarray.shape)
2054        shp[dimension] = 1
2055        sum = N.reshape(sum, shp)
2056    else:  # must be a SEQUENCE of dims to average over
2057      dims = list(dimension)
2058      dims.sort()
2059      dims.reverse()
2060      size = N.array(N.multiply.reduce(N.take(inarray.shape, dims)), N.float_)
2061      mult = N.power(inarray, 1.0 / size)
2062      for dim in dims:
2063        mult = N.multiply.reduce(mult, dim)
2064      if keepdims == 1:
2065        shp = list(inarray.shape)
2066        for dim in dims:
2067          shp[dim] = 1
2068        mult = N.reshape(mult, shp)
2069    return mult
2070
2071  def aharmonicmean(inarray, dimension=None, keepdims=0):
2072    """
2073Calculates the harmonic mean of the values in the passed array.
2074That is:  n / (1/x1 + 1/x2 + ... + 1/xn).  Defaults to ALL values in
2075the passed array.  Use dimension=None to flatten array first.  REMEMBER: if
2076dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
2077if dimension is a sequence, it collapses over all specified dimensions.  If
2078keepdims is set to 1, the resulting array will have as many dimensions as
2079inarray, with only 1 'level' per dim that was collapsed over.
2080
2081Usage:   aharmonicmean(inarray,dimension=None,keepdims=0)
2082Returns: harmonic mean computed over dim(s) in dimension
2083"""
2084    inarray = inarray.astype(N.float_)
2085    if dimension == None:
2086      inarray = N.ravel(inarray)
2087      size = len(inarray)
2088      s = N.add.reduce(1.0 / inarray)
2089    elif type(dimension) in [IntType, FloatType]:
2090      size = float(inarray.shape[dimension])
2091      s = N.add.reduce(1.0 / inarray, dimension)
2092      if keepdims == 1:
2093        shp = list(inarray.shape)
2094        shp[dimension] = 1
2095        s = N.reshape(s, shp)
2096    else:  # must be a SEQUENCE of dims to average over
2097      dims = list(dimension)
2098      dims.sort()
2099      nondims = []
2100      for i in range(len(inarray.shape)):
2101        if i not in dims:
2102          nondims.append(i)
2103      tinarray = N.transpose(inarray, nondims + dims)  # put keep-dims first
2104      idx = [0] * len(nondims)
2105      if idx == []:
2106        size = len(N.ravel(inarray))
2107        s = asum(1.0 / inarray)
2108        if keepdims == 1:
2109          s = N.reshape([s], N.ones(len(inarray.shape)))
2110      else:
2111        idx[0] = -1
2112        loopcap = N.array(tinarray.shape[0:len(nondims)]) - 1
2113        s = N.zeros(loopcap + 1, N.float_)
2114        while incr(idx, loopcap) <> -1:
2115          s[idx] = asum(1.0 / tinarray[idx])
2116        size = N.multiply.reduce(N.take(inarray.shape, dims))
2117        if keepdims == 1:
2118          shp = list(inarray.shape)
2119          for dim in dims:
2120            shp[dim] = 1
2121          s = N.reshape(s, shp)
2122    return size / s
2123
2124  def amean(inarray, dimension=None, keepdims=0):
2125    """
2126Calculates the arithmatic mean of the values in the passed array.
2127That is:  1/n * (x1 + x2 + ... + xn).  Defaults to ALL values in the
2128passed array.  Use dimension=None to flatten array first.  REMEMBER: if
2129dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
2130if dimension is a sequence, it collapses over all specified dimensions.  If
2131keepdims is set to 1, the resulting array will have as many dimensions as
2132inarray, with only 1 'level' per dim that was collapsed over.
2133
2134Usage:   amean(inarray,dimension=None,keepdims=0)
2135Returns: arithematic mean calculated over dim(s) in dimension
2136"""
2137    if inarray.dtype in [N.int_, N.short, N.ubyte]:
2138      inarray = inarray.astype(N.float_)
2139    if dimension == None:
2140      inarray = N.ravel(inarray)
2141      sum = N.add.reduce(inarray)
2142      denom = float(len(inarray))
2143    elif type(dimension) in [IntType, FloatType]:
2144      sum = asum(inarray, dimension)
2145      denom = float(inarray.shape[dimension])
2146      if keepdims == 1:
2147        shp = list(inarray.shape)
2148        shp[dimension] = 1
2149        sum = N.reshape(sum, shp)
2150    else:  # must be a TUPLE of dims to average over
2151      dims = list(dimension)
2152      dims.sort()
2153      dims.reverse()
2154      sum = inarray * 1.0
2155      for dim in dims:
2156        sum = N.add.reduce(sum, dim)
2157      denom = N.array(N.multiply.reduce(N.take(inarray.shape, dims)), N.float_)
2158      if keepdims == 1:
2159        shp = list(inarray.shape)
2160        for dim in dims:
2161          shp[dim] = 1
2162        sum = N.reshape(sum, shp)
2163    return sum / denom
2164
2165  def amedian(inarray, numbins=1000):
2166    """
2167Calculates the COMPUTED median value of an array of numbers, given the
2168number of bins to use for the histogram (more bins approaches finding the
2169precise median value of the array; default number of bins = 1000).  From
2170G.W. Heiman's Basic Stats, or CRC Probability & Statistics.
2171NOTE:  THIS ROUTINE ALWAYS uses the entire passed array (flattens it first).
2172
2173Usage:   amedian(inarray,numbins=1000)
2174Returns: median calculated over ALL values in inarray
2175"""
2176    inarray = N.ravel(inarray)
2177    (hist, smallest, binsize, extras) = ahistogram(inarray, numbins,
2178                                                   [min(inarray), max(inarray)])
2179    cumhist = N.cumsum(hist)  # make cumulative histogram
2180    otherbins = N.greater_equal(cumhist, len(inarray) / 2.0)
2181    otherbins = list(otherbins)  # list of 0/1s, 1s start at median bin
2182    cfbin = otherbins.index(1)  # get 1st(!) index holding 50%ile score
2183    LRL = smallest + binsize * cfbin  # get lower read limit of that bin
2184    cfbelow = N.add.reduce(hist[0:cfbin])  # cum. freq. below bin
2185    freq = hist[cfbin]  # frequency IN the 50%ile bin
2186    median = LRL + (
2187        (len(inarray) / 2.0 - cfbelow) / float(freq)) * binsize  # MEDIAN
2188    return median
2189
2190  def amedianscore(inarray, dimension=None):
2191    """
2192Returns the 'middle' score of the passed array.  If there is an even
2193number of scores, the mean of the 2 middle scores is returned.  Can function
2194with 1D arrays, or on the FIRST dimension of 2D arrays (i.e., dimension can
2195be None, to pre-flatten the array, or else dimension must equal 0).
2196
2197Usage:   amedianscore(inarray,dimension=None)
2198Returns: 'middle' score of the array, or the mean of the 2 middle scores
2199"""
2200    if dimension == None:
2201      inarray = N.ravel(inarray)
2202      dimension = 0
2203    inarray = N.sort(inarray, dimension)
2204    if inarray.shape[dimension] % 2 == 0:  # if even number of elements
2205      indx = inarray.shape[dimension] / 2  # integer division correct
2206      median = N.asarray(inarray[indx] + inarray[indx - 1]) / 2.0
2207    else:
2208      indx = inarray.shape[dimension] / 2  # integer division correct
2209      median = N.take(inarray, [indx], dimension)
2210      if median.shape == (1,):
2211        median = median[0]
2212    return median
2213
2214  def amode(a, dimension=None):
2215    """
2216Returns an array of the modal (most common) score in the passed array.
2217If there is more than one such score, ONLY THE FIRST is returned.
2218The bin-count for the modal values is also returned.  Operates on whole
2219array (dimension=None), or on a given dimension.
2220
2221Usage:   amode(a, dimension=None)
2222Returns: array of bin-counts for mode(s), array of corresponding modal values
2223"""
2224
2225    if dimension == None:
2226      a = N.ravel(a)
2227      dimension = 0
2228    scores = pstat.aunique(N.ravel(a))  # get ALL unique values
2229    testshape = list(a.shape)
2230    testshape[dimension] = 1
2231    oldmostfreq = N.zeros(testshape)
2232    oldcounts = N.zeros(testshape)
2233    for score in scores:
2234      template = N.equal(a, score)
2235      counts = asum(template, dimension, 1)
2236      mostfrequent = N.where(counts > oldcounts, score, oldmostfreq)
2237      oldcounts = N.where(counts > oldcounts, counts, oldcounts)
2238      oldmostfreq = mostfrequent
2239    return oldcounts, mostfrequent
2240
2241  def atmean(a, limits=None, inclusive=(1, 1)):
2242    """
2243Returns the arithmetic mean of all values in an array, ignoring values
2244strictly outside the sequence passed to 'limits'.   Note: either limit
2245in the sequence, or the value of limits itself, can be set to None.  The
2246inclusive list/tuple determines whether the lower and upper limiting bounds
2247(respectively) are open/exclusive (0) or closed/inclusive (1).
2248
2249Usage:   atmean(a,limits=None,inclusive=(1,1))
2250"""
2251    if a.dtype in [N.int_, N.short, N.ubyte]:
2252      a = a.astype(N.float_)
2253    if limits == None:
2254      return mean(a)
2255    assert type(limits) in [ListType, TupleType, N.ndarray
2256                           ], 'Wrong type for limits in atmean'
2257    if inclusive[0]:
2258      lowerfcn = N.greater_equal
2259    else:
2260      lowerfcn = N.greater
2261    if inclusive[1]:
2262      upperfcn = N.less_equal
2263    else:
2264      upperfcn = N.less
2265    if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(
2266        N.ravel(a)):
2267      raise ValueError, 'No array values within given limits (atmean).'
2268    elif limits[0] == None and limits[1] <> None:
2269      mask = upperfcn(a, limits[1])
2270    elif limits[0] <> None and limits[1] == None:
2271      mask = lowerfcn(a, limits[0])
2272    elif limits[0] <> None and limits[1] <> None:
2273      mask = lowerfcn(a, limits[0]) * upperfcn(a, limits[1])
2274    s = float(N.add.reduce(N.ravel(a * mask)))
2275    n = float(N.add.reduce(N.ravel(mask)))
2276    return s / n
2277
2278  def atvar(a, limits=None, inclusive=(1, 1)):
2279    """
2280Returns the sample variance of values in an array, (i.e., using N-1),
2281ignoring values strictly outside the sequence passed to 'limits'.
2282Note: either limit in the sequence, or the value of limits itself,
2283can be set to None.  The inclusive list/tuple determines whether the lower
2284and upper limiting bounds (respectively) are open/exclusive (0) or
2285closed/inclusive (1). ASSUMES A FLAT ARRAY (OR ELSE PREFLATTENS).
2286
2287Usage:   atvar(a,limits=None,inclusive=(1,1))
2288"""
2289    a = a.astype(N.float_)
2290    if limits == None or limits == [None, None]:
2291      return avar(a)
2292    assert type(limits) in [ListType, TupleType, N.ndarray
2293                           ], 'Wrong type for limits in atvar'
2294    if inclusive[0]:
2295      lowerfcn = N.greater_equal
2296    else:
2297      lowerfcn = N.greater
2298    if inclusive[1]:
2299      upperfcn = N.less_equal
2300    else:
2301      upperfcn = N.less
2302    if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(
2303        N.ravel(a)):
2304      raise ValueError, 'No array values within given limits (atvar).'
2305    elif limits[0] == None and limits[1] <> None:
2306      mask = upperfcn(a, limits[1])
2307    elif limits[0] <> None and limits[1] == None:
2308      mask = lowerfcn(a, limits[0])
2309    elif limits[0] <> None and limits[1] <> None:
2310      mask = lowerfcn(a, limits[0]) * upperfcn(a, limits[1])
2311
2312    a = N.compress(mask, a)  # squish out excluded values
2313    return avar(a)
2314
2315  def atmin(a, lowerlimit=None, dimension=None, inclusive=1):
2316    """
2317Returns the minimum value of a, along dimension, including only values less
2318than (or equal to, if inclusive=1) lowerlimit.  If the limit is set to None,
2319all values in the array are used.
2320
2321Usage:   atmin(a,lowerlimit=None,dimension=None,inclusive=1)
2322"""
2323    if inclusive:
2324      lowerfcn = N.greater
2325    else:
2326      lowerfcn = N.greater_equal
2327    if dimension == None:
2328      a = N.ravel(a)
2329      dimension = 0
2330    if lowerlimit == None:
2331      lowerlimit = N.minimum.reduce(N.ravel(a)) - 11
2332    biggest = N.maximum.reduce(N.ravel(a))
2333    ta = N.where(lowerfcn(a, lowerlimit), a, biggest)
2334    return N.minimum.reduce(ta, dimension)
2335
2336  def atmax(a, upperlimit, dimension=None, inclusive=1):
2337    """
2338Returns the maximum value of a, along dimension, including only values greater
2339than (or equal to, if inclusive=1) upperlimit.  If the limit is set to None,
2340a limit larger than the max value in the array is used.
2341
2342Usage:   atmax(a,upperlimit,dimension=None,inclusive=1)
2343"""
2344    if inclusive:
2345      upperfcn = N.less
2346    else:
2347      upperfcn = N.less_equal
2348    if dimension == None:
2349      a = N.ravel(a)
2350      dimension = 0
2351    if upperlimit == None:
2352      upperlimit = N.maximum.reduce(N.ravel(a)) + 1
2353    smallest = N.minimum.reduce(N.ravel(a))
2354    ta = N.where(upperfcn(a, upperlimit), a, smallest)
2355    return N.maximum.reduce(ta, dimension)
2356
2357  def atstdev(a, limits=None, inclusive=(1, 1)):
2358    """
2359Returns the standard deviation of all values in an array, ignoring values
2360strictly outside the sequence passed to 'limits'.   Note: either limit
2361in the sequence, or the value of limits itself, can be set to None.  The
2362inclusive list/tuple determines whether the lower and upper limiting bounds
2363(respectively) are open/exclusive (0) or closed/inclusive (1).
2364
2365Usage:   atstdev(a,limits=None,inclusive=(1,1))
2366"""
2367    return N.sqrt(tvar(a, limits, inclusive))
2368
2369  def atsem(a, limits=None, inclusive=(1, 1)):
2370    """
2371Returns the standard error of the mean for the values in an array,
2372(i.e., using N for the denominator), ignoring values strictly outside
2373the sequence passed to 'limits'.   Note: either limit in the sequence,
2374or the value of limits itself, can be set to None.  The inclusive list/tuple
2375determines whether the lower and upper limiting bounds (respectively) are
2376open/exclusive (0) or closed/inclusive (1).
2377
2378Usage:   atsem(a,limits=None,inclusive=(1,1))
2379"""
2380    sd = tstdev(a, limits, inclusive)
2381    if limits == None or limits == [None, None]:
2382      n = float(len(N.ravel(a)))
2383      limits = [min(a) - 1, max(a) + 1]
2384    assert type(limits) in [ListType, TupleType, N.ndarray
2385                           ], 'Wrong type for limits in atsem'
2386    if inclusive[0]:
2387      lowerfcn = N.greater_equal
2388    else:
2389      lowerfcn = N.greater
2390    if inclusive[1]:
2391      upperfcn = N.less_equal
2392    else:
2393      upperfcn = N.less
2394    if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(
2395        N.ravel(a)):
2396      raise ValueError, 'No array values within given limits (atsem).'
2397    elif limits[0] == None and limits[1] <> None:
2398      mask = upperfcn(a, limits[1])
2399    elif limits[0] <> None and limits[1] == None:
2400      mask = lowerfcn(a, limits[0])
2401    elif limits[0] <> None and limits[1] <> None:
2402      mask = lowerfcn(a, limits[0]) * upperfcn(a, limits[1])
2403    term1 = N.add.reduce(N.ravel(a * a * mask))
2404    n = float(N.add.reduce(N.ravel(mask)))
2405    return sd / math.sqrt(n)
2406
2407#####################################
2408############  AMOMENTS  #############
2409#####################################
2410
2411  def amoment(a, moment=1, dimension=None):
2412    """
2413Calculates the nth moment about the mean for a sample (defaults to the
24141st moment).  Generally used to calculate coefficients of skewness and
2415kurtosis.  Dimension can equal None (ravel array first), an integer
2416(the dimension over which to operate), or a sequence (operate over
2417multiple dimensions).
2418
2419Usage:   amoment(a,moment=1,dimension=None)
2420Returns: appropriate moment along given dimension
2421"""
2422    if dimension == None:
2423      a = N.ravel(a)
2424      dimension = 0
2425    if moment == 1:
2426      return 0.0
2427    else:
2428      mn = amean(a, dimension, 1)  # 1=keepdims
2429      s = N.power((a - mn), moment)
2430      return amean(s, dimension)
2431
2432  def avariation(a, dimension=None):
2433    """
2434Returns the coefficient of variation, as defined in CRC Standard
2435Probability and Statistics, p.6. Dimension can equal None (ravel array
2436first), an integer (the dimension over which to operate), or a
2437sequence (operate over multiple dimensions).
2438
2439Usage:   avariation(a,dimension=None)
2440"""
2441    return 100.0 * asamplestdev(a, dimension) / amean(a, dimension)
2442
2443  def askew(a, dimension=None):
2444    """
2445Returns the skewness of a distribution (normal ==> 0.0; >0 means extra
2446weight in left tail).  Use askewtest() to see if it's close enough.
2447Dimension can equal None (ravel array first), an integer (the
2448dimension over which to operate), or a sequence (operate over multiple
2449dimensions).
2450
2451Usage:   askew(a, dimension=None)
2452Returns: skew of vals in a along dimension, returning ZERO where all vals equal
2453"""
2454    denom = N.power(amoment(a, 2, dimension), 1.5)
2455    zero = N.equal(denom, 0)
2456    if type(denom) == N.ndarray and asum(zero) <> 0:
2457      print 'Number of zeros in askew: ', asum(zero)
2458    denom = denom + zero  # prevent divide-by-zero
2459    return N.where(zero, 0, amoment(a, 3, dimension) / denom)
2460
2461  def akurtosis(a, dimension=None):
2462    """
2463Returns the kurtosis of a distribution (normal ==> 3.0; >3 means
2464heavier in the tails, and usually more peaked).  Use akurtosistest()
2465to see if it's close enough.  Dimension can equal None (ravel array
2466first), an integer (the dimension over which to operate), or a
2467sequence (operate over multiple dimensions).
2468
2469Usage:   akurtosis(a,dimension=None)
2470Returns: kurtosis of values in a along dimension, and ZERO where all vals equal
2471"""
2472    denom = N.power(amoment(a, 2, dimension), 2)
2473    zero = N.equal(denom, 0)
2474    if type(denom) == N.ndarray and asum(zero) <> 0:
2475      print 'Number of zeros in akurtosis: ', asum(zero)
2476    denom = denom + zero  # prevent divide-by-zero
2477    return N.where(zero, 0, amoment(a, 4, dimension) / denom)
2478
2479  def adescribe(inarray, dimension=None):
2480    """
2481Returns several descriptive statistics of the passed array.  Dimension
2482can equal None (ravel array first), an integer (the dimension over
2483which to operate), or a sequence (operate over multiple dimensions).
2484
2485Usage:   adescribe(inarray,dimension=None)
2486Returns: n, (min,max), mean, standard deviation, skew, kurtosis
2487"""
2488    if dimension == None:
2489      inarray = N.ravel(inarray)
2490      dimension = 0
2491    n = inarray.shape[dimension]
2492    mm = (N.minimum.reduce(inarray), N.maximum.reduce(inarray))
2493    m = amean(inarray, dimension)
2494    sd = astdev(inarray, dimension)
2495    skew = askew(inarray, dimension)
2496    kurt = akurtosis(inarray, dimension)
2497    return n, mm, m, sd, skew, kurt
2498
2499#####################################
2500########  NORMALITY TESTS  ##########
2501#####################################
2502
2503  def askewtest(a, dimension=None):
2504    """
2505Tests whether the skew is significantly different from a normal
2506distribution.  Dimension can equal None (ravel array first), an
2507integer (the dimension over which to operate), or a sequence (operate
2508over multiple dimensions).
2509
2510Usage:   askewtest(a,dimension=None)
2511Returns: z-score and 2-tail z-probability
2512"""
2513    if dimension == None:
2514      a = N.ravel(a)
2515      dimension = 0
2516    b2 = askew(a, dimension)
2517    n = float(a.shape[dimension])
2518    y = b2 * N.sqrt(((n + 1) * (n + 3)) / (6.0 * (n - 2)))
2519    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) *
2520             (n + 3)) / ((n - 2.0) * (n + 5) * (n + 7) * (n + 9))
2521    W2 = -1 + N.sqrt(2 * (beta2 - 1))
2522    delta = 1 / N.sqrt(N.log(N.sqrt(W2)))
2523    alpha = N.sqrt(2 / (W2 - 1))
2524    y = N.where(y == 0, 1, y)
2525    Z = delta * N.log(y / alpha + N.sqrt((y / alpha)**2 + 1))
2526    return Z, (1.0 - zprob(Z)) * 2
2527
2528  def akurtosistest(a, dimension=None):
2529    """
2530Tests whether a dataset has normal kurtosis (i.e.,
2531kurtosis=3(n-1)/(n+1)) Valid only for n>20.  Dimension can equal None
2532(ravel array first), an integer (the dimension over which to operate),
2533or a sequence (operate over multiple dimensions).
2534
2535Usage:   akurtosistest(a,dimension=None)
2536Returns: z-score and 2-tail z-probability, returns 0 for bad pixels
2537"""
2538    if dimension == None:
2539      a = N.ravel(a)
2540      dimension = 0
2541    n = float(a.shape[dimension])
2542    if n < 20:
2543      print 'akurtosistest only valid for n>=20 ... continuing anyway, n=', n
2544    b2 = akurtosis(a, dimension)
2545    E = 3.0 * (n - 1) / (n + 1)
2546    varb2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) * (n + 1) * (n + 3) *
2547                                            (n + 5))
2548    x = (b2 - E) / N.sqrt(varb2)
2549    sqrtbeta1 = 6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9)) * N.sqrt(
2550        (6.0 * (n + 3) * (n + 5)) / (n * (n - 2) * (n - 3)))
2551    A = 6.0 + 8.0 / sqrtbeta1 * (2.0 / sqrtbeta1 +
2552                                 N.sqrt(1 + 4.0 / (sqrtbeta1**2)))
2553    term1 = 1 - 2 / (9.0 * A)
2554    denom = 1 + x * N.sqrt(2 / (A - 4.0))
2555    denom = N.where(N.less(denom, 0), 99, denom)
2556    term2 = N.where(
2557        N.equal(denom, 0), term1, N.power(
2558            (1 - 2.0 / A) / denom, 1 / 3.0))
2559    Z = (term1 - term2) / N.sqrt(2 / (9.0 * A))
2560    Z = N.where(N.equal(denom, 99), 0, Z)
2561    return Z, (1.0 - zprob(Z)) * 2
2562
2563  def anormaltest(a, dimension=None):
2564    """
2565Tests whether skew and/OR kurtosis of dataset differs from normal
2566curve.  Can operate over multiple dimensions.  Dimension can equal
2567None (ravel array first), an integer (the dimension over which to
2568operate), or a sequence (operate over multiple dimensions).
2569
2570Usage:   anormaltest(a,dimension=None)
2571Returns: z-score and 2-tail probability
2572"""
2573    if dimension == None:
2574      a = N.ravel(a)
2575      dimension = 0
2576    s, p = askewtest(a, dimension)
2577    k, p = akurtosistest(a, dimension)
2578    k2 = N.power(s, 2) + N.power(k, 2)
2579    return k2, achisqprob(k2, 2)
2580
2581#####################################
2582######  AFREQUENCY FUNCTIONS  #######
2583#####################################
2584
2585  def aitemfreq(a):
2586    """
2587Returns a 2D array of item frequencies.  Column 1 contains item values,
2588column 2 contains their respective counts.  Assumes a 1D array is passed.
2589@@@sorting OK?
2590
2591Usage:   aitemfreq(a)
2592Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
2593"""
2594    scores = pstat.aunique(a)
2595    scores = N.sort(scores)
2596    freq = N.zeros(len(scores))
2597    for i in range(len(scores)):
2598      freq[i] = N.add.reduce(N.equal(a, scores[i]))
2599    return N.array(pstat.aabut(scores, freq))
2600
2601  def ascoreatpercentile(inarray, percent):
2602    """
2603Usage:   ascoreatpercentile(inarray,percent)   0<percent<100
2604Returns: score at given percentile, relative to inarray distribution
2605"""
2606    percent = percent / 100.0
2607    targetcf = percent * len(inarray)
2608    h, lrl, binsize, extras = histogram(inarray)
2609    cumhist = cumsum(h * 1)
2610    for i in range(len(cumhist)):
2611      if cumhist[i] >= targetcf:
2612        break
2613    score = binsize * (
2614        (targetcf - cumhist[i - 1]) / float(h[i])) + (lrl + binsize * i)
2615    return score
2616
2617  def apercentileofscore(inarray, score, histbins=10, defaultlimits=None):
2618    """
2619Note: result of this function depends on the values used to histogram
2620the data(!).
2621
2622Usage:   apercentileofscore(inarray,score,histbins=10,defaultlimits=None)
2623Returns: percentile-position of score (0-100) relative to inarray
2624"""
2625    h, lrl, binsize, extras = histogram(inarray, histbins, defaultlimits)
2626    cumhist = cumsum(h * 1)
2627    i = int((score - lrl) / float(binsize))
2628    pct = (cumhist[i - 1] + ((score - (lrl + binsize * i)) / float(binsize)) *
2629           h[i]) / float(len(inarray)) * 100
2630    return pct
2631
2632  def ahistogram(inarray, numbins=10, defaultlimits=None, printextras=1):
2633    """
2634Returns (i) an array of histogram bin counts, (ii) the smallest value
2635of the histogram binning, and (iii) the bin width (the last 2 are not
2636necessarily integers).  Default number of bins is 10.  Defaultlimits
2637can be None (the routine picks bins spanning all the numbers in the
2638inarray) or a 2-sequence (lowerlimit, upperlimit).  Returns all of the
2639following: array of bin values, lowerreallimit, binsize, extrapoints.
2640
2641Usage:   ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1)
2642Returns: (array of bin counts, bin-minimum, min-width, #-points-outside-range)
2643"""
2644    inarray = N.ravel(inarray)  # flatten any >1D arrays
2645    if (defaultlimits <> None):
2646      lowerreallimit = defaultlimits[0]
2647      upperreallimit = defaultlimits[1]
2648      binsize = (upperreallimit - lowerreallimit) / float(numbins)
2649    else:
2650      Min = N.minimum.reduce(inarray)
2651      Max = N.maximum.reduce(inarray)
2652      estbinwidth = float(Max - Min) / float(numbins) + 1e-6
2653      binsize = (Max - Min + estbinwidth) / float(numbins)
2654      lowerreallimit = Min - binsize / 2.0  #lower real limit,1st bin
2655    bins = N.zeros(numbins)
2656    extrapoints = 0
2657    for num in inarray:
2658      try:
2659        if (num - lowerreallimit) < 0:
2660          extrapoints = extrapoints + 1
2661        else:
2662          bintoincrement = int((num - lowerreallimit) / float(binsize))
2663          bins[bintoincrement] = bins[bintoincrement] + 1
2664      except:  # point outside lower/upper limits
2665        extrapoints = extrapoints + 1
2666    if (extrapoints > 0 and printextras == 1):
2667      print '\nPoints outside given histogram range =', extrapoints
2668    return (bins, lowerreallimit, binsize, extrapoints)
2669
2670  def acumfreq(a, numbins=10, defaultreallimits=None):
2671    """
2672Returns a cumulative frequency histogram, using the histogram function.
2673Defaultreallimits can be None (use all data), or a 2-sequence containing
2674lower and upper limits on values to include.
2675
2676Usage:   acumfreq(a,numbins=10,defaultreallimits=None)
2677Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints
2678"""
2679    h, l, b, e = histogram(a, numbins, defaultreallimits)
2680    cumhist = cumsum(h * 1)
2681    return cumhist, l, b, e
2682
2683  def arelfreq(a, numbins=10, defaultreallimits=None):
2684    """
2685Returns a relative frequency histogram, using the histogram function.
2686Defaultreallimits can be None (use all data), or a 2-sequence containing
2687lower and upper limits on values to include.
2688
2689Usage:   arelfreq(a,numbins=10,defaultreallimits=None)
2690Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints
2691"""
2692    h, l, b, e = histogram(a, numbins, defaultreallimits)
2693    h = N.array(h / float(a.shape[0]))
2694    return h, l, b, e
2695
2696#####################################
2697######  AVARIABILITY FUNCTIONS  #####
2698#####################################
2699
2700  def aobrientransform(*args):
2701    """
2702Computes a transform on input data (any number of columns).  Used to
2703test for homogeneity of variance prior to running one-way stats.  Each
2704array in *args is one level of a factor.  If an F_oneway() run on the
2705transformed data and found significant, variances are unequal.   From
2706Maxwell and Delaney, p.112.
2707
2708Usage:   aobrientransform(*args)    *args = 1D arrays, one per level of factor
2709Returns: transformed data for use in an ANOVA
2710"""
2711    TINY = 1e-10
2712    k = len(args)
2713    n = N.zeros(k, N.float_)
2714    v = N.zeros(k, N.float_)
2715    m = N.zeros(k, N.float_)
2716    nargs = []
2717    for i in range(k):
2718      nargs.append(args[i].astype(N.float_))
2719      n[i] = float(len(nargs[i]))
2720      v[i] = var(nargs[i])
2721      m[i] = mean(nargs[i])
2722    for j in range(k):
2723      for i in range(n[j]):
2724        t1 = (n[j] - 1.5) * n[j] * (nargs[j][i] - m[j])**2
2725        t2 = 0.5 * v[j] * (n[j] - 1.0)
2726        t3 = (n[j] - 1.0) * (n[j] - 2.0)
2727        nargs[j][i] = (t1 - t2) / float(t3)
2728    check = 1
2729    for j in range(k):
2730      if v[j] - mean(nargs[j]) > TINY:
2731        check = 0
2732    if check <> 1:
2733      raise ValueError, 'Lack of convergence in obrientransform.'
2734    else:
2735      return N.array(nargs)
2736
2737  def asamplevar(inarray, dimension=None, keepdims=0):
2738    """
2739Returns the sample standard deviation of the values in the passed
2740array (i.e., using N).  Dimension can equal None (ravel array first),
2741an integer (the dimension over which to operate), or a sequence
2742(operate over multiple dimensions).  Set keepdims=1 to return an array
2743with the same number of dimensions as inarray.
2744
2745Usage:   asamplevar(inarray,dimension=None,keepdims=0)
2746"""
2747    if dimension == None:
2748      inarray = N.ravel(inarray)
2749      dimension = 0
2750    if dimension == 1:
2751      mn = amean(inarray, dimension)[:, N.NewAxis]
2752    else:
2753      mn = amean(inarray, dimension, keepdims=1)
2754    deviations = inarray - mn
2755    if type(dimension) == ListType:
2756      n = 1
2757      for d in dimension:
2758        n = n * inarray.shape[d]
2759    else:
2760      n = inarray.shape[dimension]
2761    svar = ass(deviations, dimension, keepdims) / float(n)
2762    return svar
2763
2764  def asamplestdev(inarray, dimension=None, keepdims=0):
2765    """
2766Returns the sample standard deviation of the values in the passed
2767array (i.e., using N).  Dimension can equal None (ravel array first),
2768an integer (the dimension over which to operate), or a sequence
2769(operate over multiple dimensions).  Set keepdims=1 to return an array
2770with the same number of dimensions as inarray.
2771
2772Usage:   asamplestdev(inarray,dimension=None,keepdims=0)
2773"""
2774    return N.sqrt(asamplevar(inarray, dimension, keepdims))
2775
2776  def asignaltonoise(instack, dimension=0):
2777    """
2778Calculates signal-to-noise.  Dimension can equal None (ravel array
2779first), an integer (the dimension over which to operate), or a
2780sequence (operate over multiple dimensions).
2781
2782Usage:   asignaltonoise(instack,dimension=0):
2783Returns: array containing the value of (mean/stdev) along dimension,
2784         or 0 when stdev=0
2785"""
2786    m = mean(instack, dimension)
2787    sd = stdev(instack, dimension)
2788    return N.where(sd == 0, 0, m / sd)
2789
2790  def acov(x, y, dimension=None, keepdims=0):
2791    """
2792Returns the estimated covariance of the values in the passed
2793array (i.e., N-1).  Dimension can equal None (ravel array first), an
2794integer (the dimension over which to operate), or a sequence (operate
2795over multiple dimensions).  Set keepdims=1 to return an array with the
2796same number of dimensions as inarray.
2797
2798Usage:   acov(x,y,dimension=None,keepdims=0)
2799"""
2800    if dimension == None:
2801      x = N.ravel(x)
2802      y = N.ravel(y)
2803      dimension = 0
2804    xmn = amean(x, dimension, 1)  # keepdims
2805    xdeviations = x - xmn
2806    ymn = amean(y, dimension, 1)  # keepdims
2807    ydeviations = y - ymn
2808    if type(dimension) == ListType:
2809      n = 1
2810      for d in dimension:
2811        n = n * x.shape[d]
2812    else:
2813      n = x.shape[dimension]
2814    covar = N.sum(xdeviations * ydeviations) / float(n - 1)
2815    return covar
2816
2817  def avar(inarray, dimension=None, keepdims=0):
2818    """
2819Returns the estimated population variance of the values in the passed
2820array (i.e., N-1).  Dimension can equal None (ravel array first), an
2821integer (the dimension over which to operate), or a sequence (operate
2822over multiple dimensions).  Set keepdims=1 to return an array with the
2823same number of dimensions as inarray.
2824
2825Usage:   avar(inarray,dimension=None,keepdims=0)
2826"""
2827    if dimension == None:
2828      inarray = N.ravel(inarray)
2829      dimension = 0
2830    mn = amean(inarray, dimension, 1)
2831    deviations = inarray - mn
2832    if type(dimension) == ListType:
2833      n = 1
2834      for d in dimension:
2835        n = n * inarray.shape[d]
2836    else:
2837      n = inarray.shape[dimension]
2838    var = ass(deviations, dimension, keepdims) / float(n - 1)
2839    return var
2840
2841  def astdev(inarray, dimension=None, keepdims=0):
2842    """
2843Returns the estimated population standard deviation of the values in
2844the passed array (i.e., N-1).  Dimension can equal None (ravel array
2845first), an integer (the dimension over which to operate), or a
2846sequence (operate over multiple dimensions).  Set keepdims=1 to return
2847an array with the same number of dimensions as inarray.
2848
2849Usage:   astdev(inarray,dimension=None,keepdims=0)
2850"""
2851    return N.sqrt(avar(inarray, dimension, keepdims))
2852
2853  def asterr(inarray, dimension=None, keepdims=0):
2854    """
2855Returns the estimated population standard error of the values in the
2856passed array (i.e., N-1).  Dimension can equal None (ravel array
2857first), an integer (the dimension over which to operate), or a
2858sequence (operate over multiple dimensions).  Set keepdims=1 to return
2859an array with the same number of dimensions as inarray.
2860
2861Usage:   asterr(inarray,dimension=None,keepdims=0)
2862"""
2863    if dimension == None:
2864      inarray = N.ravel(inarray)
2865      dimension = 0
2866    return astdev(inarray, dimension,
2867                  keepdims) / float(N.sqrt(inarray.shape[dimension]))
2868
2869  def asem(inarray, dimension=None, keepdims=0):
2870    """
2871Returns the standard error of the mean (i.e., using N) of the values
2872in the passed array.  Dimension can equal None (ravel array first), an
2873integer (the dimension over which to operate), or a sequence (operate
2874over multiple dimensions).  Set keepdims=1 to return an array with the
2875same number of dimensions as inarray.
2876
2877Usage:   asem(inarray,dimension=None, keepdims=0)
2878"""
2879    if dimension == None:
2880      inarray = N.ravel(inarray)
2881      dimension = 0
2882    if type(dimension) == ListType:
2883      n = 1
2884      for d in dimension:
2885        n = n * inarray.shape[d]
2886    else:
2887      n = inarray.shape[dimension]
2888    s = asamplestdev(inarray, dimension, keepdims) / N.sqrt(n - 1)
2889    return s
2890
2891  def az(a, score):
2892    """
2893Returns the z-score of a given input score, given thearray from which
2894that score came.  Not appropriate for population calculations, nor for
2895arrays > 1D.
2896
2897Usage:   az(a, score)
2898"""
2899    z = (score - amean(a)) / asamplestdev(a)
2900    return z
2901
2902  def azs(a):
2903    """
2904Returns a 1D array of z-scores, one for each score in the passed array,
2905computed relative to the passed array.
2906
2907Usage:   azs(a)
2908"""
2909    zscores = []
2910    for item in a:
2911      zscores.append(z(a, item))
2912    return N.array(zscores)
2913
2914  def azmap(scores, compare, dimension=0):
2915    """
2916Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to
2917array passed to compare (e.g., [time,x,y]).  Assumes collapsing over dim 0
2918of the compare array.
2919
2920Usage:   azs(scores, compare, dimension=0)
2921"""
2922    mns = amean(compare, dimension)
2923    sstd = asamplestdev(compare, 0)
2924    return (scores - mns) / sstd
2925
2926#####################################
2927#######  ATRIMMING FUNCTIONS  #######
2928#####################################
2929
2930## deleted around() as it's in numpy now
2931
2932  def athreshold(a, threshmin=None, threshmax=None, newval=0):
2933    """
2934Like Numeric.clip() except that values <threshmid or >threshmax are replaced
2935by newval instead of by threshmin/threshmax (respectively).
2936
2937Usage:   athreshold(a,threshmin=None,threshmax=None,newval=0)
2938Returns: a, with values <threshmin or >threshmax replaced with newval
2939"""
2940    mask = N.zeros(a.shape)
2941    if threshmin <> None:
2942      mask = mask + N.where(a < threshmin, 1, 0)
2943    if threshmax <> None:
2944      mask = mask + N.where(a > threshmax, 1, 0)
2945    mask = N.clip(mask, 0, 1)
2946    return N.where(mask, newval, a)
2947
2948  def atrimboth(a, proportiontocut):
2949    """
2950Slices off the passed proportion of items from BOTH ends of the passed
2951array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND
2952'rightmost' 10% of scores.  You must pre-sort the array if you want
2953"proper" trimming.  Slices off LESS if proportion results in a
2954non-integer slice index (i.e., conservatively slices off
2955proportiontocut).
2956
2957Usage:   atrimboth (a,proportiontocut)
2958Returns: trimmed version of array a
2959"""
2960    lowercut = int(proportiontocut * len(a))
2961    uppercut = len(a) - lowercut
2962    return a[lowercut:uppercut]
2963
2964  def atrim1(a, proportiontocut, tail='right'):
2965    """
2966Slices off the passed proportion of items from ONE end of the passed
2967array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
296810% of scores).  Slices off LESS if proportion results in a non-integer
2969slice index (i.e., conservatively slices off proportiontocut).
2970
2971Usage:   atrim1(a,proportiontocut,tail='right')  or set tail='left'
2972Returns: trimmed version of array a
2973"""
2974    if string.lower(tail) == 'right':
2975      lowercut = 0
2976      uppercut = len(a) - int(proportiontocut * len(a))
2977    elif string.lower(tail) == 'left':
2978      lowercut = int(proportiontocut * len(a))
2979      uppercut = len(a)
2980    return a[lowercut:uppercut]
2981
2982#####################################
2983#####  ACORRELATION FUNCTIONS  ######
2984#####################################
2985
2986  def acovariance(X):
2987    """
2988Computes the covariance matrix of a matrix X.  Requires a 2D matrix input.
2989
2990Usage:   acovariance(X)
2991Returns: covariance matrix of X
2992"""
2993    if len(X.shape) <> 2:
2994      raise TypeError, 'acovariance requires 2D matrices'
2995    n = X.shape[0]
2996    mX = amean(X, 0)
2997    return N.dot(N.transpose(X), X) / float(n) - N.multiply.outer(mX, mX)
2998
2999  def acorrelation(X):
3000    """
3001Computes the correlation matrix of a matrix X.  Requires a 2D matrix input.
3002
3003Usage:   acorrelation(X)
3004Returns: correlation matrix of X
3005"""
3006    C = acovariance(X)
3007    V = N.diagonal(C)
3008    return C / N.sqrt(N.multiply.outer(V, V))
3009
3010  def apaired(x, y):
3011    """
3012Interactively determines the type of data in x and y, and then runs the
3013appropriated statistic for paired group data.
3014
3015Usage:   apaired(x,y)     x,y = the two arrays of values to be compared
3016Returns: appropriate statistic name, value, and probability
3017"""
3018    samples = ''
3019    while samples not in ['i', 'r', 'I', 'R', 'c', 'C']:
3020      print '\nIndependent or related samples, or correlation (i,r,c): ',
3021      samples = raw_input()
3022
3023    if samples in ['i', 'I', 'r', 'R']:
3024      print '\nComparing variances ...',
3025      # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112
3026      r = obrientransform(x, y)
3027      f, p = F_oneway(pstat.colex(r, 0), pstat.colex(r, 1))
3028      if p < 0.05:
3029        vartype = 'unequal, p=' + str(round(p, 4))
3030      else:
3031        vartype = 'equal'
3032      print vartype
3033      if samples in ['i', 'I']:
3034        if vartype[0] == 'e':
3035          t, p = ttest_ind(x, y, None, 0)
3036          print '\nIndependent samples t-test:  ', round(t, 4), round(p, 4)
3037        else:
3038          if len(x) > 20 or len(y) > 20:
3039            z, p = ranksums(x, y)
3040            print '\nRank Sums test (NONparametric, n>20):  ', round(
3041                z, 4), round(p, 4)
3042          else:
3043            u, p = mannwhitneyu(x, y)
3044            print '\nMann-Whitney U-test (NONparametric, ns<20):  ', round(
3045                u, 4), round(p, 4)
3046
3047      else:  # RELATED SAMPLES
3048        if vartype[0] == 'e':
3049          t, p = ttest_rel(x, y, 0)
3050          print '\nRelated samples t-test:  ', round(t, 4), round(p, 4)
3051        else:
3052          t, p = ranksums(x, y)
3053          print '\nWilcoxon T-test (NONparametric):  ', round(t, 4), round(p, 4)
3054    else:  # CORRELATION ANALYSIS
3055      corrtype = ''
3056      while corrtype not in ['c', 'C', 'r', 'R', 'd', 'D']:
3057        print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
3058        corrtype = raw_input()
3059      if corrtype in ['c', 'C']:
3060        m, b, r, p, see = linregress(x, y)
3061        print '\nLinear regression for continuous variables ...'
3062        lol = [
3063            ['Slope', 'Intercept', 'r', 'Prob', 'SEestimate'],
3064            [round(m, 4), round(b, 4), round(r, 4), round(p, 4), round(see, 4)]
3065        ]
3066        pstat.printcc(lol)
3067      elif corrtype in ['r', 'R']:
3068        r, p = spearmanr(x, y)
3069        print '\nCorrelation for ranked variables ...'
3070        print "Spearman's r: ", round(r, 4), round(p, 4)
3071      else:  # DICHOTOMOUS
3072        r, p = pointbiserialr(x, y)
3073        print '\nAssuming x contains a dichotomous variable ...'
3074        print 'Point Biserial r: ', round(r, 4), round(p, 4)
3075    print '\n\n'
3076    return None
3077
3078  def dices(x, y):
3079    """
3080Calculates Dice's coefficient ... (2*number of common terms)/(number of terms in
3081x +
3082number of terms in y). Returns a value between 0 (orthogonal) and 1.
3083
3084Usage:  dices(x,y)
3085"""
3086    import sets
3087    x = sets.Set(x)
3088    y = sets.Set(y)
3089    common = len(x.intersection(y))
3090    total = float(len(x) + len(y))
3091    return 2 * common / total
3092
3093  def icc(x, y=None, verbose=0):
3094    """
3095Calculates intraclass correlation coefficients using simple, Type I sums of
3096squares.
3097If only one variable is passed, assumed it's an Nx2 matrix
3098
3099Usage:   icc(x,y=None,verbose=0)
3100Returns: icc rho, prob ####PROB IS A GUESS BASED ON PEARSON
3101"""
3102    TINY = 1.0e-20
3103    if y:
3104      all = N.concatenate([x, y], 0)
3105    else:
3106      all = x + 0
3107      x = all[:, 0]
3108      y = all[:, 1]
3109    totalss = ass(all - mean(all))
3110    pairmeans = (x + y) / 2.
3111    withinss = ass(x - pairmeans) + ass(y - pairmeans)
3112    withindf = float(len(x))
3113    betwdf = float(len(x) - 1)
3114    withinms = withinss / withindf
3115    betweenms = (totalss - withinss) / betwdf
3116    rho = (betweenms - withinms) / (withinms + betweenms)
3117    t = rho * math.sqrt(betwdf / ((1.0 - rho + TINY) * (1.0 + rho + TINY)))
3118    prob = abetai(0.5 * betwdf, 0.5, betwdf / (betwdf + t * t), verbose)
3119    return rho, prob
3120
3121  def alincc(x, y):
3122    """
3123Calculates Lin's concordance correlation coefficient.
3124
3125Usage:   alincc(x,y)    where x, y are equal-length arrays
3126Returns: Lin's CC
3127"""
3128    x = N.ravel(x)
3129    y = N.ravel(y)
3130    covar = acov(x, y) * (len(x) - 1) / float(len(x))  # correct denom to n
3131    xvar = avar(x) * (len(x) - 1) / float(len(x))  # correct denom to n
3132    yvar = avar(y) * (len(y) - 1) / float(len(y))  # correct denom to n
3133    lincc = (2 * covar) / ((xvar + yvar) + ((amean(x) - amean(y))**2))
3134    return lincc
3135
3136  def apearsonr(x, y, verbose=1):
3137    """
3138Calculates a Pearson correlation coefficient and returns p.  Taken
3139from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195.
3140
3141Usage:   apearsonr(x,y,verbose=1)      where x,y are equal length arrays
3142Returns: Pearson's r, two-tailed p-value
3143"""
3144    TINY = 1.0e-20
3145    n = len(x)
3146    xmean = amean(x)
3147    ymean = amean(y)
3148    r_num = n * (N.add.reduce(x * y)) - N.add.reduce(x) * N.add.reduce(y)
3149    r_den = math.sqrt((n * ass(x) - asquare_of_sums(x)) *
3150                      (n * ass(y) - asquare_of_sums(y)))
3151    r = (r_num / r_den)
3152    df = n - 2
3153    t = r * math.sqrt(df / ((1.0 - r + TINY) * (1.0 + r + TINY)))
3154    prob = abetai(0.5 * df, 0.5, df / (df + t * t), verbose)
3155    return r, prob
3156
3157  def aspearmanr(x, y):
3158    """
3159Calculates a Spearman rank-order correlation coefficient.  Taken
3160from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
3161
3162Usage:   aspearmanr(x,y)      where x,y are equal-length arrays
3163Returns: Spearman's r, two-tailed p-value
3164"""
3165    TINY = 1e-30
3166    n = len(x)
3167    rankx = rankdata(x)
3168    ranky = rankdata(y)
3169    dsq = N.add.reduce((rankx - ranky)**2)
3170    rs = 1 - 6 * dsq / float(n * (n**2 - 1))
3171    t = rs * math.sqrt((n - 2) / ((rs + 1.0) * (1.0 - rs)))
3172    df = n - 2
3173    probrs = abetai(0.5 * df, 0.5, df / (df + t * t))
3174    # probability values for rs are from part 2 of the spearman function in
3175    # Numerical Recipies, p.510.  They close to tables, but not exact.(?)
3176    return rs, probrs
3177
3178  def apointbiserialr(x, y):
3179    """
3180Calculates a point-biserial correlation coefficient and the associated
3181probability value.  Taken from Heiman's Basic Statistics for the Behav.
3182Sci (1st), p.194.
3183
3184Usage:   apointbiserialr(x,y)      where x,y are equal length arrays
3185Returns: Point-biserial r, two-tailed p-value
3186"""
3187    TINY = 1e-30
3188    categories = pstat.aunique(x)
3189    data = pstat.aabut(x, y)
3190    if len(categories) <> 2:
3191      raise ValueError, ('Exactly 2 categories required (in x) for '
3192                         'pointbiserialr().')
3193    else:  # there are 2 categories, continue
3194      codemap = pstat.aabut(categories, N.arange(2))
3195      recoded = pstat.arecode(data, codemap, 0)
3196      x = pstat.alinexand(data, 0, categories[0])
3197      y = pstat.alinexand(data, 0, categories[1])
3198      xmean = amean(pstat.acolex(x, 1))
3199      ymean = amean(pstat.acolex(y, 1))
3200      n = len(data)
3201      adjust = math.sqrt((len(x) / float(n)) * (len(y) / float(n)))
3202      rpb = (ymean - xmean) / asamplestdev(pstat.acolex(data, 1)) * adjust
3203      df = n - 2
3204      t = rpb * math.sqrt(df / ((1.0 - rpb + TINY) * (1.0 + rpb + TINY)))
3205      prob = abetai(0.5 * df, 0.5, df / (df + t * t))
3206      return rpb, prob
3207
3208  def akendalltau(x, y):
3209    """
3210Calculates Kendall's tau ... correlation of ordinal data.  Adapted
3211from function kendl1 in Numerical Recipies.  Needs good test-cases.@@@
3212
3213Usage:   akendalltau(x,y)
3214Returns: Kendall's tau, two-tailed p-value
3215"""
3216    n1 = 0
3217    n2 = 0
3218    iss = 0
3219    for j in range(len(x) - 1):
3220      for k in range(j, len(y)):
3221        a1 = x[j] - x[k]
3222        a2 = y[j] - y[k]
3223        aa = a1 * a2
3224        if (aa):  # neither array has a tie
3225          n1 = n1 + 1
3226          n2 = n2 + 1
3227          if aa > 0:
3228            iss = iss + 1
3229          else:
3230            iss = iss - 1
3231        else:
3232          if (a1):
3233            n1 = n1 + 1
3234          else:
3235            n2 = n2 + 1
3236    tau = iss / math.sqrt(n1 * n2)
3237    svar = (4.0 * len(x) + 10.0) / (9.0 * len(x) * (len(x) - 1))
3238    z = tau / math.sqrt(svar)
3239    prob = erfcc(abs(z) / 1.4142136)
3240    return tau, prob
3241
3242  def alinregress(*args):
3243    """
3244Calculates a regression line on two arrays, x and y, corresponding to x,y
3245pairs.  If a single 2D array is passed, alinregress finds dim with 2 levels
3246and splits data into x,y pairs along that dim.
3247
3248Usage:   alinregress(*args)    args=2 equal-length arrays, or one 2D array
3249Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate, n
3250"""
3251    TINY = 1.0e-20
3252    if len(args) == 1:  # more than 1D array?
3253      args = args[0]
3254      if len(args) == 2:
3255        x = args[0]
3256        y = args[1]
3257      else:
3258        x = args[:, 0]
3259        y = args[:, 1]
3260    else:
3261      x = args[0]
3262      y = args[1]
3263    n = len(x)
3264    xmean = amean(x)
3265    ymean = amean(y)
3266    r_num = n * (N.add.reduce(x * y)) - N.add.reduce(x) * N.add.reduce(y)
3267    r_den = math.sqrt((n * ass(x) - asquare_of_sums(x)) *
3268                      (n * ass(y) - asquare_of_sums(y)))
3269    r = r_num / r_den
3270    z = 0.5 * math.log((1.0 + r + TINY) / (1.0 - r + TINY))
3271    df = n - 2
3272    t = r * math.sqrt(df / ((1.0 - r + TINY) * (1.0 + r + TINY)))
3273    prob = abetai(0.5 * df, 0.5, df / (df + t * t))
3274    slope = r_num / (float(n) * ass(x) - asquare_of_sums(x))
3275    intercept = ymean - slope * xmean
3276    sterrest = math.sqrt(1 - r * r) * asamplestdev(y)
3277    return slope, intercept, r, prob, sterrest, n
3278
3279  def amasslinregress(*args):
3280    """
3281Calculates a regression line on one 1D array (x) and one N-D array (y).
3282
3283Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate, n
3284"""
3285    TINY = 1.0e-20
3286    if len(args) == 1:  # more than 1D array?
3287      args = args[0]
3288      if len(args) == 2:
3289        x = N.ravel(args[0])
3290        y = args[1]
3291      else:
3292        x = N.ravel(args[:, 0])
3293        y = args[:, 1]
3294    else:
3295      x = args[0]
3296      y = args[1]
3297    x = x.astype(N.float_)
3298    y = y.astype(N.float_)
3299    n = len(x)
3300    xmean = amean(x)
3301    ymean = amean(y, 0)
3302    shp = N.ones(len(y.shape))
3303    shp[0] = len(x)
3304    x.shape = shp
3305    print x.shape, y.shape
3306    r_num = n * (N.add.reduce(x * y, 0)) - N.add.reduce(x) * N.add.reduce(y, 0)
3307    r_den = N.sqrt((n * ass(x) - asquare_of_sums(x)) *
3308                   (n * ass(y, 0) - asquare_of_sums(y, 0)))
3309    zerodivproblem = N.equal(r_den, 0)
3310    r_den = N.where(zerodivproblem, 1, r_den
3311                   )  # avoid zero-division in 1st place
3312    r = r_num / r_den  # need to do this nicely for matrix division
3313    r = N.where(zerodivproblem, 0.0, r)
3314    z = 0.5 * N.log((1.0 + r + TINY) / (1.0 - r + TINY))
3315    df = n - 2
3316    t = r * N.sqrt(df / ((1.0 - r + TINY) * (1.0 + r + TINY)))
3317    prob = abetai(0.5 * df, 0.5, df / (df + t * t))
3318
3319    ss = float(n) * ass(x) - asquare_of_sums(x)
3320    s_den = N.where(ss == 0, 1, ss)  # avoid zero-division in 1st place
3321    slope = r_num / s_den
3322    intercept = ymean - slope * xmean
3323    sterrest = N.sqrt(1 - r * r) * asamplestdev(y, 0)
3324    return slope, intercept, r, prob, sterrest, n
3325
3326#####################################
3327#####  AINFERENTIAL STATISTICS  #####
3328#####################################
3329
3330  def attest_1samp(a, popmean, printit=0, name='Sample', writemode='a'):
3331    """
3332Calculates the t-obtained for the independent samples T-test on ONE group
3333of scores a, given a population mean.  If printit=1, results are printed
3334to the screen.  If printit='filename', the results are output to 'filename'
3335using the given writemode (default=append).  Returns t-value, and prob.
3336
3337Usage:   attest_1samp(a,popmean,Name='Sample',printit=0,writemode='a')
3338Returns: t-value, two-tailed prob
3339"""
3340    if type(a) != N.ndarray:
3341      a = N.array(a)
3342    x = amean(a)
3343    v = avar(a)
3344    n = len(a)
3345    df = n - 1
3346    svar = ((n - 1) * v) / float(df)
3347    t = (x - popmean) / math.sqrt(svar * (1.0 / n))
3348    prob = abetai(0.5 * df, 0.5, df / (df + t * t))
3349
3350    if printit <> 0:
3351      statname = 'Single-sample T-test.'
3352      outputpairedstats(printit, writemode, 'Population', '--', popmean, 0, 0,
3353                        0, name, n, x, v, N.minimum.reduce(N.ravel(a)),
3354                        N.maximum.reduce(N.ravel(a)), statname, t, prob)
3355    return t, prob
3356
3357  def attest_ind(a,
3358                 b,
3359                 dimension=None,
3360                 printit=0,
3361                 name1='Samp1',
3362                 name2='Samp2',
3363                 writemode='a'):
3364    """
3365Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores
3366a, and b.  From Numerical Recipies, p.483.  If printit=1, results are
3367printed to the screen.  If printit='filename', the results are output
3368to 'filename' using the given writemode (default=append).  Dimension
3369can equal None (ravel array first), or an integer (the dimension over
3370which to operate on a and b).
3371
3372Usage:   attest_ind (a,b,dimension=None,printit=0,
3373                     Name1='Samp1',Name2='Samp2',writemode='a')
3374Returns: t-value, two-tailed p-value
3375"""
3376    if dimension == None:
3377      a = N.ravel(a)
3378      b = N.ravel(b)
3379      dimension = 0
3380    x1 = amean(a, dimension)
3381    x2 = amean(b, dimension)
3382    v1 = avar(a, dimension)
3383    v2 = avar(b, dimension)
3384    n1 = a.shape[dimension]
3385    n2 = b.shape[dimension]
3386    df = n1 + n2 - 2
3387    svar = ((n1 - 1) * v1 + (n2 - 1) * v2) / float(df)
3388    zerodivproblem = N.equal(svar, 0)
3389    svar = N.where(zerodivproblem, 1, svar)  # avoid zero-division in 1st place
3390    t = (x1 - x2) / N.sqrt(svar *
3391                           (1.0 / n1 + 1.0 / n2))  # N-D COMPUTATION HERE!!!!!!
3392    t = N.where(zerodivproblem, 1.0, t)  # replace NaN/wrong t-values with 1.0
3393    probs = abetai(0.5 * df, 0.5, float(df) / (df + t * t))
3394
3395    if type(t) == N.ndarray:
3396      probs = N.reshape(probs, t.shape)
3397    if probs.shape == (1,):
3398      probs = probs[0]
3399
3400    if printit <> 0:
3401      if type(t) == N.ndarray:
3402        t = t[0]
3403      if type(probs) == N.ndarray:
3404        probs = probs[0]
3405      statname = 'Independent samples T-test.'
3406      outputpairedstats(printit, writemode, name1, n1, x1, v1,
3407                        N.minimum.reduce(N.ravel(a)),
3408                        N.maximum.reduce(N.ravel(a)), name2, n2, x2, v2,
3409                        N.minimum.reduce(N.ravel(b)),
3410                        N.maximum.reduce(N.ravel(b)), statname, t, probs)
3411      return
3412    return t, probs
3413
3414  def ap2t(pval, df):
3415    """
3416Tries to compute a t-value from a p-value (or pval array) and associated df.
3417SLOW for large numbers of elements(!) as it re-computes p-values 20 times
3418(smaller step-sizes) at which point it decides it's done. Keeps the signs
3419of the input array. Returns 1000 (or -1000) if t>100.
3420
3421Usage:  ap2t(pval,df)
3422Returns: an array of t-values with the shape of pval
3423    """
3424    pval = N.array(pval)
3425    signs = N.sign(pval)
3426    pval = abs(pval)
3427    t = N.ones(pval.shape, N.float_) * 50
3428    step = N.ones(pval.shape, N.float_) * 25
3429    print 'Initial ap2t() prob calc'
3430    prob = abetai(0.5 * df, 0.5, float(df) / (df + t * t))
3431    print 'ap2t() iter: ',
3432    for i in range(10):
3433      print i, ' ',
3434      t = N.where(pval < prob, t + step, t - step)
3435      prob = abetai(0.5 * df, 0.5, float(df) / (df + t * t))
3436      step = step / 2
3437    print
3438    # since this is an ugly hack, we get ugly boundaries
3439    t = N.where(t > 99.9, 1000, t)  # hit upper-boundary
3440    t = t + signs
3441    return t  #, prob, pval
3442
3443  def attest_rel(a,
3444                 b,
3445                 dimension=None,
3446                 printit=0,
3447                 name1='Samp1',
3448                 name2='Samp2',
3449                 writemode='a'):
3450    """
3451Calculates the t-obtained T-test on TWO RELATED samples of scores, a
3452and b.  From Numerical Recipies, p.483.  If printit=1, results are
3453printed to the screen.  If printit='filename', the results are output
3454to 'filename' using the given writemode (default=append).  Dimension
3455can equal None (ravel array first), or an integer (the dimension over
3456which to operate on a and b).
3457
3458Usage:   attest_rel(a,b,dimension=None,printit=0,
3459                    name1='Samp1',name2='Samp2',writemode='a')
3460Returns: t-value, two-tailed p-value
3461"""
3462    if dimension == None:
3463      a = N.ravel(a)
3464      b = N.ravel(b)
3465      dimension = 0
3466    if len(a) <> len(b):
3467      raise ValueError, 'Unequal length arrays.'
3468    x1 = amean(a, dimension)
3469    x2 = amean(b, dimension)
3470    v1 = avar(a, dimension)
3471    v2 = avar(b, dimension)
3472    n = a.shape[dimension]
3473    df = float(n - 1)
3474    d = (a - b).astype('d')
3475
3476    denom = N.sqrt(
3477        (n * N.add.reduce(d * d, dimension) - N.add.reduce(d, dimension)**2) /
3478        df)
3479    zerodivproblem = N.equal(denom, 0)
3480    denom = N.where(zerodivproblem, 1, denom
3481                   )  # avoid zero-division in 1st place
3482    t = N.add.reduce(d, dimension) / denom  # N-D COMPUTATION HERE!!!!!!
3483    t = N.where(zerodivproblem, 1.0, t)  # replace NaN/wrong t-values with 1.0
3484    probs = abetai(0.5 * df, 0.5, float(df) / (df + t * t))
3485    if type(t) == N.ndarray:
3486      probs = N.reshape(probs, t.shape)
3487    if probs.shape == (1,):
3488      probs = probs[0]
3489
3490    if printit <> 0:
3491      statname = 'Related samples T-test.'
3492      outputpairedstats(printit, writemode, name1, n, x1, v1,
3493                        N.minimum.reduce(N.ravel(a)),
3494                        N.maximum.reduce(N.ravel(a)), name2, n, x2, v2,
3495                        N.minimum.reduce(N.ravel(b)),
3496                        N.maximum.reduce(N.ravel(b)), statname, t, probs)
3497      return
3498    return t, probs
3499
3500  def achisquare(f_obs, f_exp=None):
3501    """
3502Calculates a one-way chi square for array of observed frequencies and returns
3503the result.  If no expected frequencies are given, the total N is assumed to
3504be equally distributed across all groups.
3505@@@NOT RIGHT??
3506
3507Usage:   achisquare(f_obs, f_exp=None)   f_obs = array of observed cell freq.
3508Returns: chisquare-statistic, associated p-value
3509"""
3510
3511    k = len(f_obs)
3512    if f_exp == None:
3513      f_exp = N.array([sum(f_obs) / float(k)] * len(f_obs), N.float_)
3514    f_exp = f_exp.astype(N.float_)
3515    chisq = N.add.reduce((f_obs - f_exp)**2 / f_exp)
3516    return chisq, achisqprob(chisq, k - 1)
3517
3518  def aks_2samp(data1, data2):
3519    """
3520Computes the Kolmogorov-Smirnof statistic on 2 samples.  Modified from
3521Numerical Recipies in C, page 493.  Returns KS D-value, prob.  Not ufunc-
3522like.
3523
3524Usage:   aks_2samp(data1,data2)  where data1 and data2 are 1D arrays
3525Returns: KS D-value, p-value
3526"""
3527    j1 = 0  # N.zeros(data1.shape[1:]) TRIED TO MAKE THIS UFUNC-LIKE
3528    j2 = 0  # N.zeros(data2.shape[1:])
3529    fn1 = 0.0  # N.zeros(data1.shape[1:],N.float_)
3530    fn2 = 0.0  # N.zeros(data2.shape[1:],N.float_)
3531    n1 = data1.shape[0]
3532    n2 = data2.shape[0]
3533    en1 = n1 * 1
3534    en2 = n2 * 1
3535    d = N.zeros(data1.shape[1:], N.float_)
3536    data1 = N.sort(data1, 0)
3537    data2 = N.sort(data2, 0)
3538    while j1 < n1 and j2 < n2:
3539      d1 = data1[j1]
3540      d2 = data2[j2]
3541      if d1 <= d2:
3542        fn1 = (j1) / float(en1)
3543        j1 = j1 + 1
3544      if d2 <= d1:
3545        fn2 = (j2) / float(en2)
3546        j2 = j2 + 1
3547      dt = (fn2 - fn1)
3548      if abs(dt) > abs(d):
3549        d = dt
3550#    try:
3551    en = math.sqrt(en1 * en2 / float(en1 + en2))
3552    prob = aksprob((en + 0.12 + 0.11 / en) * N.fabs(d))
3553    #    except:
3554    #        prob = 1.0
3555    return d, prob
3556
3557  def amannwhitneyu(x, y):
3558    """
3559Calculates a Mann-Whitney U statistic on the provided scores and
3560returns the result.  Use only when the n in each condition is < 20 and
3561you have 2 independent samples of ranks.  REMEMBER: Mann-Whitney U is
3562significant if the u-obtained is LESS THAN or equal to the critical
3563value of U.
3564
3565Usage:   amannwhitneyu(x,y)     where x,y are arrays of values for 2 conditions
3566Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
3567"""
3568    n1 = len(x)
3569    n2 = len(y)
3570    ranked = rankdata(N.concatenate((x, y)))
3571    rankx = ranked[0:n1]  # get the x-ranks
3572    ranky = ranked[n1:]  # the rest are y-ranks
3573    u1 = n1 * n2 + (n1 * (n1 + 1)) / 2.0 - sum(rankx)  # calc U for x
3574    u2 = n1 * n2 - u1  # remainder is U for y
3575    bigu = max(u1, u2)
3576    smallu = min(u1, u2)
3577    proportion = bigu / float(n1 * n2)
3578    T = math.sqrt(tiecorrect(ranked))  # correction factor for tied scores
3579    if T == 0:
3580      raise ValueError, 'All numbers are identical in amannwhitneyu'
3581    sd = math.sqrt(T * n1 * n2 * (n1 + n2 + 1) / 12.0)
3582    z = abs((bigu - n1 * n2 / 2.0) / sd)  # normal approximation for prob calc
3583    return smallu, 1.0 - azprob(z), proportion
3584
3585  def atiecorrect(rankvals):
3586    """
3587Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests.
3588See Siegel, S. (1956) Nonparametric Statistics for the Behavioral
3589Sciences.  New York: McGraw-Hill.  Code adapted from |Stat rankind.c
3590code.
3591
3592Usage:   atiecorrect(rankvals)
3593Returns: T correction factor for U or H
3594"""
3595    sorted, posn = ashellsort(N.array(rankvals))
3596    n = len(sorted)
3597    T = 0.0
3598    i = 0
3599    while (i < n - 1):
3600      if sorted[i] == sorted[i + 1]:
3601        nties = 1
3602        while (i < n - 1) and (sorted[i] == sorted[i + 1]):
3603          nties = nties + 1
3604          i = i + 1
3605        T = T + nties**3 - nties
3606      i = i + 1
3607    T = T / float(n**3 - n)
3608    return 1.0 - T
3609
3610  def aranksums(x, y):
3611    """
3612Calculates the rank sums statistic on the provided scores and returns
3613the result.
3614
3615Usage:   aranksums(x,y)     where x,y are arrays of values for 2 conditions
3616Returns: z-statistic, two-tailed p-value
3617"""
3618    n1 = len(x)
3619    n2 = len(y)
3620    alldata = N.concatenate((x, y))
3621    ranked = arankdata(alldata)
3622    x = ranked[:n1]
3623    y = ranked[n1:]
3624    s = sum(x)
3625    expected = n1 * (n1 + n2 + 1) / 2.0
3626    z = (s - expected) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
3627    prob = 2 * (1.0 - azprob(abs(z)))
3628    return z, prob
3629
3630  def awilcoxont(x, y):
3631    """
3632Calculates the Wilcoxon T-test for related samples and returns the
3633result.  A non-parametric T-test.
3634
3635Usage:   awilcoxont(x,y)     where x,y are equal-length arrays for 2 conditions
3636Returns: t-statistic, two-tailed p-value
3637"""
3638    if len(x) <> len(y):
3639      raise ValueError, 'Unequal N in awilcoxont.  Aborting.'
3640    d = x - y
3641    d = N.compress(N.not_equal(d, 0), d)  # Keep all non-zero differences
3642    count = len(d)
3643    absd = abs(d)
3644    absranked = arankdata(absd)
3645    r_plus = 0.0
3646    r_minus = 0.0
3647    for i in range(len(absd)):
3648      if d[i] < 0:
3649        r_minus = r_minus + absranked[i]
3650      else:
3651        r_plus = r_plus + absranked[i]
3652    wt = min(r_plus, r_minus)
3653    mn = count * (count + 1) * 0.25
3654    se = math.sqrt(count * (count + 1) * (2.0 * count + 1.0) / 24.0)
3655    z = math.fabs(wt - mn) / se
3656    z = math.fabs(wt - mn) / se
3657    prob = 2 * (1.0 - zprob(abs(z)))
3658    return wt, prob
3659
3660  def akruskalwallish(*args):
3661    """
3662The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
3663groups, requiring at least 5 subjects in each group.  This function
3664calculates the Kruskal-Wallis H and associated p-value for 3 or more
3665independent samples.
3666
3667Usage:   akruskalwallish(*args)     args are separate arrays for 3+ conditions
3668Returns: H-statistic (corrected for ties), associated p-value
3669"""
3670    assert len(args) == 3, 'Need at least 3 groups in stats.akruskalwallish()'
3671    args = list(args)
3672    n = [0] * len(args)
3673    n = map(len, args)
3674    all = []
3675    for i in range(len(args)):
3676      all = all + args[i].tolist()
3677    ranked = rankdata(all)
3678    T = tiecorrect(ranked)
3679    for i in range(len(args)):
3680      args[i] = ranked[0:n[i]]
3681      del ranked[0:n[i]]
3682    rsums = []
3683    for i in range(len(args)):
3684      rsums.append(sum(args[i])**2)
3685      rsums[i] = rsums[i] / float(n[i])
3686    ssbn = sum(rsums)
3687    totaln = sum(n)
3688    h = 12.0 / (totaln * (totaln + 1)) * ssbn - 3 * (totaln + 1)
3689    df = len(args) - 1
3690    if T == 0:
3691      raise ValueError, 'All numbers are identical in akruskalwallish'
3692    h = h / float(T)
3693    return h, chisqprob(h, df)
3694
3695  def afriedmanchisquare(*args):
3696    """
3697Friedman Chi-Square is a non-parametric, one-way within-subjects
3698ANOVA.  This function calculates the Friedman Chi-square test for
3699repeated measures and returns the result, along with the associated
3700probability value.  It assumes 3 or more repeated measures.  Only 3
3701levels requires a minimum of 10 subjects in the study.  Four levels
3702requires 5 subjects per level(??).
3703
3704Usage:   afriedmanchisquare(*args)   args are separate arrays for 2+ conditions
3705Returns: chi-square statistic, associated p-value
3706"""
3707    k = len(args)
3708    if k < 3:
3709      raise ValueError, ('\nLess than 3 levels.  Friedman test not '
3710                         'appropriate.\n')
3711    n = len(args[0])
3712    data = apply(pstat.aabut, args)
3713    data = data.astype(N.float_)
3714    for i in range(len(data)):
3715      data[i] = arankdata(data[i])
3716    ssbn = asum(asum(args, 1)**2)
3717    chisq = 12.0 / (k * n * (k + 1)) * ssbn - 3 * n * (k + 1)
3718    return chisq, achisqprob(chisq, k - 1)
3719
3720#####################################
3721####  APROBABILITY CALCULATIONS  ####
3722#####################################
3723
3724  def achisqprob(chisq, df):
3725    """
3726Returns the (1-tail) probability value associated with the provided chi-square
3727value and df.  Heavily modified from chisq.c in Gary Perlman's |Stat.  Can
3728handle multiple dimensions.
3729
3730Usage:   achisqprob(chisq,df)    chisq=chisquare stat., df=degrees of freedom
3731"""
3732    BIG = 200.0
3733
3734    def ex(x):
3735      BIG = 200.0
3736      exponents = N.where(N.less(x, -BIG), -BIG, x)
3737      return N.exp(exponents)
3738
3739    if type(chisq) == N.ndarray:
3740      arrayflag = 1
3741    else:
3742      arrayflag = 0
3743      chisq = N.array([chisq])
3744    if df < 1:
3745      return N.ones(chisq.shape, N.float)
3746    probs = N.zeros(chisq.shape, N.float_)
3747    probs = N.where(
3748        N.less_equal(chisq, 0), 1.0, probs)  # set prob=1 for chisq<0
3749    a = 0.5 * chisq
3750    if df > 1:
3751      y = ex(-a)
3752    if df % 2 == 0:
3753      even = 1
3754      s = y * 1
3755      s2 = s * 1
3756    else:
3757      even = 0
3758      s = 2.0 * azprob(-N.sqrt(chisq))
3759      s2 = s * 1
3760    if (df > 2):
3761      chisq = 0.5 * (df - 1.0)
3762      if even:
3763        z = N.ones(probs.shape, N.float_)
3764      else:
3765        z = 0.5 * N.ones(probs.shape, N.float_)
3766      if even:
3767        e = N.zeros(probs.shape, N.float_)
3768      else:
3769        e = N.log(N.sqrt(N.pi)) * N.ones(probs.shape, N.float_)
3770      c = N.log(a)
3771      mask = N.zeros(probs.shape)
3772      a_big = N.greater(a, BIG)
3773      a_big_frozen = -1 * N.ones(probs.shape, N.float_)
3774      totalelements = N.multiply.reduce(N.array(probs.shape))
3775      while asum(mask) <> totalelements:
3776        e = N.log(z) + e
3777        s = s + ex(c * z - a - e)
3778        z = z + 1.0
3779        #            print z, e, s
3780        newmask = N.greater(z, chisq)
3781        a_big_frozen = N.where(newmask * N.equal(mask, 0) * a_big, s,
3782                               a_big_frozen)
3783        mask = N.clip(newmask + mask, 0, 1)
3784      if even:
3785        z = N.ones(probs.shape, N.float_)
3786        e = N.ones(probs.shape, N.float_)
3787      else:
3788        z = 0.5 * N.ones(probs.shape, N.float_)
3789        e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape, N.float_)
3790      c = 0.0
3791      mask = N.zeros(probs.shape)
3792      a_notbig_frozen = -1 * N.ones(probs.shape, N.float_)
3793      while asum(mask) <> totalelements:
3794        e = e * (a / z.astype(N.float_))
3795        c = c + e
3796        z = z + 1.0
3797        #            print '#2', z, e, c, s, c*y+s2
3798        newmask = N.greater(z, chisq)
3799        a_notbig_frozen = N.where(newmask * N.equal(mask, 0) * (1 - a_big),
3800                                  c * y + s2, a_notbig_frozen)
3801        mask = N.clip(newmask + mask, 0, 1)
3802      probs = N.where(
3803          N.equal(probs, 1), 1, N.where(
3804              N.greater(a, BIG), a_big_frozen, a_notbig_frozen))
3805      return probs
3806    else:
3807      return s
3808
3809  def aerfcc(x):
3810    """
3811Returns the complementary error function erfc(x) with fractional error
3812everywhere less than 1.2e-7.  Adapted from Numerical Recipies.  Can
3813handle multiple dimensions.
3814
3815Usage:   aerfcc(x)
3816"""
3817    z = abs(x)
3818    t = 1.0 / (1.0 + 0.5 * z)
3819    ans = t * N.exp(-z * z - 1.26551223 + t * (1.00002368 + t * (
3820        0.37409196 + t * (0.09678418 + t * (-0.18628806 + t * (
3821            0.27886807 + t * (-1.13520398 + t * (1.48851587 + t * (
3822                -0.82215223 + t * 0.17087277)))))))))
3823    return N.where(N.greater_equal(x, 0), ans, 2.0 - ans)
3824
3825  def azprob(z):
3826    """
3827Returns the area under the normal curve 'to the left of' the given z value.
3828Thus,
3829    for z<0, zprob(z) = 1-tail probability
3830    for z>0, 1.0-zprob(z) = 1-tail probability
3831    for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
3832Adapted from z.c in Gary Perlman's |Stat.  Can handle multiple dimensions.
3833
3834Usage:   azprob(z)    where z is a z-value
3835"""
3836
3837    def yfunc(y):
3838      x = (((((((
3839          ((((((-0.000045255659 * y + 0.000152529290) * y - 0.000019538132) * y
3840              - 0.000676904986) * y + 0.001390604284) * y - 0.000794620820) * y
3841           - 0.002034254874) * y + 0.006549791214) * y - 0.010557625006) * y +
3842               0.011630447319) * y - 0.009279453341) * y + 0.005353579108) * y -
3843            0.002141268741) * y + 0.000535310849) * y + 0.999936657524
3844      return x
3845
3846    def wfunc(w):
3847      x = ((((((((0.000124818987 * w - 0.001075204047) * w + 0.005198775019) * w
3848                - 0.019198292004) * w + 0.059054035642) * w - 0.151968751364) *
3849             w + 0.319152932694) * w - 0.531923007300) * w +
3850           0.797884560593) * N.sqrt(w) * 2.0
3851      return x
3852
3853    Z_MAX = 6.0  # maximum meaningful z-value
3854    x = N.zeros(z.shape, N.float_)  # initialize
3855    y = 0.5 * N.fabs(z)
3856    x = N.where(N.less(y, 1.0), wfunc(y * y), yfunc(y - 2.0))  # get x's
3857    x = N.where(N.greater(y, Z_MAX * 0.5), 1.0, x)  # kill those with big Z
3858    prob = N.where(N.greater(z, 0), (x + 1) * 0.5, (1 - x) * 0.5)
3859    return prob
3860
3861  def aksprob(alam):
3862    """
3863Returns the probability value for a K-S statistic computed via ks_2samp.
3864Adapted from Numerical Recipies.  Can handle multiple dimensions.
3865
3866Usage:   aksprob(alam)
3867"""
3868    if type(alam) == N.ndarray:
3869      frozen = -1 * N.ones(alam.shape, N.float64)
3870      alam = alam.astype(N.float64)
3871      arrayflag = 1
3872    else:
3873      frozen = N.array(-1.)
3874      alam = N.array(alam, N.float64)
3875      arrayflag = 1
3876    mask = N.zeros(alam.shape)
3877    fac = 2.0 * N.ones(alam.shape, N.float_)
3878    sum = N.zeros(alam.shape, N.float_)
3879    termbf = N.zeros(alam.shape, N.float_)
3880    a2 = N.array(-2.0 * alam * alam, N.float64)
3881    totalelements = N.multiply.reduce(N.array(mask.shape))
3882    for j in range(1, 201):
3883      if asum(mask) == totalelements:
3884        break
3885      exponents = (a2 * j * j)
3886      overflowmask = N.less(exponents, -746)
3887      frozen = N.where(overflowmask, 0, frozen)
3888      mask = mask + overflowmask
3889      term = fac * N.exp(exponents)
3890      sum = sum + term
3891      newmask = N.where(
3892          N.less_equal(
3893              abs(term), (0.001 * termbf)) + N.less(
3894                  abs(term), 1.0e-8 * sum), 1, 0)
3895      frozen = N.where(newmask * N.equal(mask, 0), sum, frozen)
3896      mask = N.clip(mask + newmask, 0, 1)
3897      fac = -fac
3898      termbf = abs(term)
3899    if arrayflag:
3900      return N.where(
3901          N.equal(frozen, -1), 1.0, frozen)  # 1.0 if doesn't converge
3902    else:
3903      return N.where(
3904          N.equal(frozen, -1), 1.0, frozen)[0]  # 1.0 if doesn't converge
3905
3906  def afprob(dfnum, dfden, F):
3907    """
3908Returns the 1-tailed significance level (p-value) of an F statistic
3909given the degrees of freedom for the numerator (dfR-dfF) and the degrees
3910of freedom for the denominator (dfF).  Can handle multiple dims for F.
3911
3912Usage:   afprob(dfnum, dfden, F)   where usually dfnum=dfbn, dfden=dfwn
3913"""
3914    if type(F) == N.ndarray:
3915      return abetai(0.5 * dfden, 0.5 * dfnum, dfden / (1.0 * dfden + dfnum * F))
3916    else:
3917      return abetai(0.5 * dfden, 0.5 * dfnum, dfden / float(dfden + dfnum * F))
3918
3919  def abetacf(a, b, x, verbose=1):
3920    """
3921Evaluates the continued fraction form of the incomplete Beta function,
3922betai.  (Adapted from: Numerical Recipies in C.)  Can handle multiple
3923dimensions for x.
3924
3925Usage:   abetacf(a,b,x,verbose=1)
3926"""
3927    ITMAX = 200
3928    EPS = 3.0e-7
3929
3930    arrayflag = 1
3931    if type(x) == N.ndarray:
3932      frozen = N.ones(x.shape,
3933                      N.float_) * -1  #start out w/ -1s, should replace all
3934    else:
3935      arrayflag = 0
3936      frozen = N.array([-1])
3937      x = N.array([x])
3938    mask = N.zeros(x.shape)
3939    bm = az = am = 1.0
3940    qab = a + b
3941    qap = a + 1.0
3942    qam = a - 1.0
3943    bz = 1.0 - qab * x / qap
3944    for i in range(ITMAX + 1):
3945      if N.sum(N.ravel(N.equal(frozen, -1))) == 0:
3946        break
3947      em = float(i + 1)
3948      tem = em + em
3949      d = em * (b - em) * x / ((qam + tem) * (a + tem))
3950      ap = az + d * am
3951      bp = bz + d * bm
3952      d = -(a + em) * (qab + em) * x / ((qap + tem) * (a + tem))
3953      app = ap + d * az
3954      bpp = bp + d * bz
3955      aold = az * 1
3956      am = ap / bpp
3957      bm = bp / bpp
3958      az = app / bpp
3959      bz = 1.0
3960      newmask = N.less(abs(az - aold), EPS * abs(az))
3961      frozen = N.where(newmask * N.equal(mask, 0), az, frozen)
3962      mask = N.clip(mask + newmask, 0, 1)
3963    noconverge = asum(N.equal(frozen, -1))
3964    if noconverge <> 0 and verbose:
3965      print 'a or b too big, or ITMAX too small in Betacf for ', noconverge, ' elements'
3966    if arrayflag:
3967      return frozen
3968    else:
3969      return frozen[0]
3970
3971  def agammln(xx):
3972    """
3973Returns the gamma function of xx.
3974    Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
3975Adapted from: Numerical Recipies in C.  Can handle multiple dims ... but
3976probably doesn't normally have to.
3977
3978Usage:   agammln(xx)
3979"""
3980    coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
3981             0.120858003e-2, -0.536382e-5]
3982    x = xx - 1.0
3983    tmp = x + 5.5
3984    tmp = tmp - (x + 0.5) * N.log(tmp)
3985    ser = 1.0
3986    for j in range(len(coeff)):
3987      x = x + 1
3988      ser = ser + coeff[j] / x
3989    return -tmp + N.log(2.50662827465 * ser)
3990
3991  def abetai(a, b, x, verbose=1):
3992    """
3993Returns the incomplete beta function:
3994
3995    I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
3996
3997where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
3998function of a.  The continued fraction formulation is implemented
3999here, using the betacf function.  (Adapted from: Numerical Recipies in
4000C.)  Can handle multiple dimensions.
4001
4002Usage:   abetai(a,b,x,verbose=1)
4003"""
4004    TINY = 1e-15
4005    if type(a) == N.ndarray:
4006      if asum(N.less(x, 0) + N.greater(x, 1)) <> 0:
4007        raise ValueError, 'Bad x in abetai'
4008    x = N.where(N.equal(x, 0), TINY, x)
4009    x = N.where(N.equal(x, 1.0), 1 - TINY, x)
4010
4011    bt = N.where(N.equal(x, 0) + N.equal(x, 1), 0, -1)
4012    exponents = (gammln(a + b) - gammln(a) - gammln(b) + a * N.log(x) + b *
4013                 N.log(1.0 - x))
4014    # 746 (below) is the MAX POSSIBLE BEFORE OVERFLOW
4015    exponents = N.where(N.less(exponents, -740), -740, exponents)
4016    bt = N.exp(exponents)
4017    if type(x) == N.ndarray:
4018      ans = N.where(
4019          N.less(x, (a + 1) / (a + b + 2.0)), bt * abetacf(a, b, x, verbose) /
4020          float(a), 1.0 - bt * abetacf(b, a, 1.0 - x, verbose) / float(b))
4021    else:
4022      if x < (a + 1) / (a + b + 2.0):
4023        ans = bt * abetacf(a, b, x, verbose) / float(a)
4024      else:
4025        ans = 1.0 - bt * abetacf(b, a, 1.0 - x, verbose) / float(b)
4026    return ans
4027
4028#####################################
4029#######  AANOVA CALCULATIONS  #######
4030#####################################
4031
4032  import numpy.linalg, operator
4033  LA = numpy.linalg
4034
4035  def aglm(data, para):
4036    """
4037Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken
4038from:
4039    Peterson et al. Statistical limitations in functional neuroimaging
4040    I. Non-inferential methods and statistical models.  Phil Trans Royal Soc
4041    Lond B 354: 1239-1260.
4042
4043Usage:   aglm(data,para)
4044Returns: statistic, p-value ???
4045"""
4046    if len(para) <> len(data):
4047      print 'data and para must be same length in aglm'
4048      return
4049    n = len(para)
4050    p = pstat.aunique(para)
4051    x = N.zeros((n, len(p)))  # design matrix
4052    for l in range(len(p)):
4053      x[:, l] = N.equal(para, p[l])
4054    b = N.dot(
4055        N.dot(
4056            LA.inv(N.dot(
4057                N.transpose(x), x)),  # i.e., b=inv(X'X)X'Y
4058            N.transpose(x)),
4059        data)
4060    diffs = (data - N.dot(x, b))
4061    s_sq = 1. / (n - len(p)) * N.dot(N.transpose(diffs), diffs)
4062
4063    if len(p) == 2:  # ttest_ind
4064      c = N.array([1, -1])
4065      df = n - 2
4066      fact = asum(1.0 / asum(x, 0))  # i.e., 1/n1 + 1/n2 + 1/n3 ...
4067      t = N.dot(c, b) / N.sqrt(s_sq * fact)
4068      probs = abetai(0.5 * df, 0.5, float(df) / (df + t * t))
4069      return t, probs
4070
4071  def aF_oneway(*args):
4072    """
4073Performs a 1-way ANOVA, returning an F-value and probability given
4074any number of groups.  From Heiman, pp.394-7.
4075
4076Usage:   aF_oneway (*args)    where *args is 2 or more arrays, one per
4077                                  treatment group
4078Returns: f-value, probability
4079"""
4080    na = len(args)  # ANOVA on 'na' groups, each in it's own array
4081    means = [0] * na
4082    vars = [0] * na
4083    ns = [0] * na
4084    alldata = []
4085    tmp = map(N.array, args)
4086    means = map(amean, tmp)
4087    vars = map(avar, tmp)
4088    ns = map(len, args)
4089    alldata = N.concatenate(args)
4090    bign = len(alldata)
4091    sstot = ass(alldata) - (asquare_of_sums(alldata) / float(bign))
4092    ssbn = 0
4093    for a in args:
4094      ssbn = ssbn + asquare_of_sums(N.array(a)) / float(len(a))
4095    ssbn = ssbn - (asquare_of_sums(alldata) / float(bign))
4096    sswn = sstot - ssbn
4097    dfbn = na - 1
4098    dfwn = bign - na
4099    msb = ssbn / float(dfbn)
4100    msw = sswn / float(dfwn)
4101    f = msb / msw
4102    prob = fprob(dfbn, dfwn, f)
4103    return f, prob
4104
4105  def aF_value(ER, EF, dfR, dfF):
4106    """
4107Returns an F-statistic given the following:
4108        ER  = error associated with the null hypothesis (the Restricted model)
4109        EF  = error associated with the alternate hypothesis (the Full model)
4110        dfR = degrees of freedom the Restricted model
4111        dfF = degrees of freedom associated with the Restricted model
4112"""
4113    return ((ER - EF) / float(dfR - dfF) / (EF / float(dfF)))
4114
4115  def outputfstats(Enum, Eden, dfnum, dfden, f, prob):
4116    Enum = round(Enum, 3)
4117    Eden = round(Eden, 3)
4118    dfnum = round(Enum, 3)
4119    dfden = round(dfden, 3)
4120    f = round(f, 3)
4121    prob = round(prob, 3)
4122    suffix = ''  # for *s after the p-value
4123    if prob < 0.001:
4124      suffix = '  ***'
4125    elif prob < 0.01:
4126      suffix = '  **'
4127    elif prob < 0.05:
4128      suffix = '  *'
4129    title = [['EF/ER', 'DF', 'Mean Square', 'F-value', 'prob', '']]
4130    lofl = title + [[Enum, dfnum, round(Enum / float(dfnum), 3), f, prob, suffix
4131                    ], [Eden, dfden, round(Eden / float(dfden), 3), '', '', '']]
4132    pstat.printcc(lofl)
4133    return
4134
4135  def F_value_multivariate(ER, EF, dfnum, dfden):
4136    """
4137Returns an F-statistic given the following:
4138        ER  = error associated with the null hypothesis (the Restricted model)
4139        EF  = error associated with the alternate hypothesis (the Full model)
4140        dfR = degrees of freedom the Restricted model
4141        dfF = degrees of freedom associated with the Restricted model
4142where ER and EF are matrices from a multivariate F calculation.
4143"""
4144    if type(ER) in [IntType, FloatType]:
4145      ER = N.array([[ER]])
4146    if type(EF) in [IntType, FloatType]:
4147      EF = N.array([[EF]])
4148    n_um = (LA.det(ER) - LA.det(EF)) / float(dfnum)
4149    d_en = LA.det(EF) / float(dfden)
4150    return n_um / d_en
4151
4152#####################################
4153#######  ASUPPORT FUNCTIONS  ########
4154#####################################
4155
4156  def asign(a):
4157    """
4158Usage:   asign(a)
4159Returns: array shape of a, with -1 where a<0 and +1 where a>=0
4160"""
4161    a = N.asarray(a)
4162    if ((type(a) == type(1.4)) or (type(a) == type(1))):
4163      return a - a - N.less(a, 0) + N.greater(a, 0)
4164    else:
4165      return N.zeros(N.shape(a)) - N.less(a, 0) + N.greater(a, 0)
4166
4167  def asum(a, dimension=None, keepdims=0):
4168    """
4169An alternative to the Numeric.add.reduce function, which allows one to
4170(1) collapse over multiple dimensions at once, and/or (2) to retain
4171all dimensions in the original array (squashing one down to size.
4172Dimension can equal None (ravel array first), an integer (the
4173dimension over which to operate), or a sequence (operate over multiple
4174dimensions).  If keepdims=1, the resulting array will have as many
4175dimensions as the input array.
4176
4177Usage:   asum(a, dimension=None, keepdims=0)
4178Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1
4179"""
4180    if type(a) == N.ndarray and a.dtype in [N.int_, N.short, N.ubyte]:
4181      a = a.astype(N.float_)
4182    if dimension == None:
4183      s = N.sum(N.ravel(a))
4184    elif type(dimension) in [IntType, FloatType]:
4185      s = N.add.reduce(a, dimension)
4186      if keepdims == 1:
4187        shp = list(a.shape)
4188        shp[dimension] = 1
4189        s = N.reshape(s, shp)
4190    else:  # must be a SEQUENCE of dims to sum over
4191      dims = list(dimension)
4192      dims.sort()
4193      dims.reverse()
4194      s = a * 1.0
4195      for dim in dims:
4196        s = N.add.reduce(s, dim)
4197      if keepdims == 1:
4198        shp = list(a.shape)
4199        for dim in dims:
4200          shp[dim] = 1
4201        s = N.reshape(s, shp)
4202    return s
4203
4204  def acumsum(a, dimension=None):
4205    """
4206Returns an array consisting of the cumulative sum of the items in the
4207passed array.  Dimension can equal None (ravel array first), an
4208integer (the dimension over which to operate), or a sequence (operate
4209over multiple dimensions, but this last one just barely makes sense).
4210
4211Usage:   acumsum(a,dimension=None)
4212"""
4213    if dimension == None:
4214      a = N.ravel(a)
4215      dimension = 0
4216    if type(dimension) in [ListType, TupleType, N.ndarray]:
4217      dimension = list(dimension)
4218      dimension.sort()
4219      dimension.reverse()
4220      for d in dimension:
4221        a = N.add.accumulate(a, d)
4222      return a
4223    else:
4224      return N.add.accumulate(a, dimension)
4225
4226  def ass(inarray, dimension=None, keepdims=0):
4227    """
4228Squares each value in the passed array, adds these squares & returns
4229the result.  Unfortunate function name. :-) Defaults to ALL values in
4230the array.  Dimension can equal None (ravel array first), an integer
4231(the dimension over which to operate), or a sequence (operate over
4232multiple dimensions).  Set keepdims=1 to maintain the original number
4233of dimensions.
4234
4235Usage:   ass(inarray, dimension=None, keepdims=0)
4236Returns: sum-along-'dimension' for (inarray*inarray)
4237"""
4238    if dimension == None:
4239      inarray = N.ravel(inarray)
4240      dimension = 0
4241    return asum(inarray * inarray, dimension, keepdims)
4242
4243  def asummult(array1, array2, dimension=None, keepdims=0):
4244    """
4245Multiplies elements in array1 and array2, element by element, and
4246returns the sum (along 'dimension') of all resulting multiplications.
4247Dimension can equal None (ravel array first), an integer (the
4248dimension over which to operate), or a sequence (operate over multiple
4249dimensions).  A trivial function, but included for completeness.
4250
4251Usage:   asummult(array1,array2,dimension=None,keepdims=0)
4252"""
4253    if dimension == None:
4254      array1 = N.ravel(array1)
4255      array2 = N.ravel(array2)
4256      dimension = 0
4257    return asum(array1 * array2, dimension, keepdims)
4258
4259  def asquare_of_sums(inarray, dimension=None, keepdims=0):
4260    """
4261Adds the values in the passed array, squares that sum, and returns the
4262result.  Dimension can equal None (ravel array first), an integer (the
4263dimension over which to operate), or a sequence (operate over multiple
4264dimensions).  If keepdims=1, the returned array will have the same
4265NUMBER of dimensions as the original.
4266
4267Usage:   asquare_of_sums(inarray, dimension=None, keepdims=0)
4268Returns: the square of the sum over dim(s) in dimension
4269"""
4270    if dimension == None:
4271      inarray = N.ravel(inarray)
4272      dimension = 0
4273    s = asum(inarray, dimension, keepdims)
4274    if type(s) == N.ndarray:
4275      return s.astype(N.float_) * s
4276    else:
4277      return float(s) * s
4278
4279  def asumdiffsquared(a, b, dimension=None, keepdims=0):
4280    """
4281Takes pairwise differences of the values in arrays a and b, squares
4282these differences, and returns the sum of these squares.  Dimension
4283can equal None (ravel array first), an integer (the dimension over
4284which to operate), or a sequence (operate over multiple dimensions).
4285keepdims=1 means the return shape = len(a.shape) = len(b.shape)
4286
4287Usage:   asumdiffsquared(a,b)
4288Returns: sum[ravel(a-b)**2]
4289"""
4290    if dimension == None:
4291      inarray = N.ravel(a)
4292      dimension = 0
4293    return asum((a - b)**2, dimension, keepdims)
4294
4295  def ashellsort(inarray):
4296    """
4297Shellsort algorithm.  Sorts a 1D-array.
4298
4299Usage:   ashellsort(inarray)
4300Returns: sorted-inarray, sorting-index-vector (for original array)
4301"""
4302    n = len(inarray)
4303    svec = inarray * 1.0
4304    ivec = range(n)
4305    gap = n / 2  # integer division needed
4306    while gap > 0:
4307      for i in range(gap, n):
4308        for j in range(i - gap, -1, -gap):
4309          while j >= 0 and svec[j] > svec[j + gap]:
4310            temp = svec[j]
4311            svec[j] = svec[j + gap]
4312            svec[j + gap] = temp
4313            itemp = ivec[j]
4314            ivec[j] = ivec[j + gap]
4315            ivec[j + gap] = itemp
4316      gap = gap / 2  # integer division needed
4317#    svec is now sorted input vector, ivec has the order svec[i] = vec[ivec[i]]
4318    return svec, ivec
4319
4320  def arankdata(inarray):
4321    """
4322Ranks the data in inarray, dealing with ties appropritely.  Assumes
4323a 1D inarray.  Adapted from Gary Perlman's |Stat ranksort.
4324
4325Usage:   arankdata(inarray)
4326Returns: array of length equal to inarray, containing rank scores
4327"""
4328    n = len(inarray)
4329    svec, ivec = ashellsort(inarray)
4330    sumranks = 0
4331    dupcount = 0
4332    newarray = N.zeros(n, N.float_)
4333    for i in range(n):
4334      sumranks = sumranks + i
4335      dupcount = dupcount + 1
4336      if i == n - 1 or svec[i] <> svec[i + 1]:
4337        averank = sumranks / float(dupcount) + 1
4338        for j in range(i - dupcount + 1, i + 1):
4339          newarray[ivec[j]] = averank
4340        sumranks = 0
4341        dupcount = 0
4342    return newarray
4343
4344  def afindwithin(data):
4345    """
4346Returns a binary vector, 1=within-subject factor, 0=between.  Input
4347equals the entire data array (i.e., column 1=random factor, last
4348column = measured values.
4349
4350Usage:   afindwithin(data)     data in |Stat format
4351"""
4352    numfact = len(data[0]) - 2
4353    withinvec = [0] * numfact
4354    for col in range(1, numfact + 1):
4355      rows = pstat.linexand(data, col, pstat.unique(pstat.colex(data, 1))[0]
4356                           )  # get 1 level of this factor
4357      if len(pstat.unique(pstat.colex(rows, 0))) < len(
4358          rows):  # if fewer subjects than scores on this factor
4359        withinvec[col - 1] = 1
4360    return withinvec
4361
4362  #########################################################
4363  #########################################################
4364  ######  RE-DEFINE DISPATCHES TO INCLUDE ARRAYS  #########
4365  #########################################################
4366  #########################################################
4367
4368  ## CENTRAL TENDENCY:
4369  geometricmean = Dispatch(
4370      (lgeometricmean, (ListType, TupleType)), (ageometricmean, (N.ndarray,)))
4371  harmonicmean = Dispatch(
4372      (lharmonicmean, (ListType, TupleType)), (aharmonicmean, (N.ndarray,)))
4373  mean = Dispatch((lmean, (ListType, TupleType)), (amean, (N.ndarray,)))
4374  median = Dispatch((lmedian, (ListType, TupleType)), (amedian, (N.ndarray,)))
4375  medianscore = Dispatch(
4376      (lmedianscore, (ListType, TupleType)), (amedianscore, (N.ndarray,)))
4377  mode = Dispatch((lmode, (ListType, TupleType)), (amode, (N.ndarray,)))
4378  tmean = Dispatch((atmean, (N.ndarray,)))
4379  tvar = Dispatch((atvar, (N.ndarray,)))
4380  tstdev = Dispatch((atstdev, (N.ndarray,)))
4381  tsem = Dispatch((atsem, (N.ndarray,)))
4382
4383  ## VARIATION:
4384  moment = Dispatch((lmoment, (ListType, TupleType)), (amoment, (N.ndarray,)))
4385  variation = Dispatch(
4386      (lvariation, (ListType, TupleType)), (avariation, (N.ndarray,)))
4387  skew = Dispatch((lskew, (ListType, TupleType)), (askew, (N.ndarray,)))
4388  kurtosis = Dispatch(
4389      (lkurtosis, (ListType, TupleType)), (akurtosis, (N.ndarray,)))
4390  describe = Dispatch(
4391      (ldescribe, (ListType, TupleType)), (adescribe, (N.ndarray,)))
4392
4393  ## DISTRIBUTION TESTS
4394
4395  skewtest = Dispatch(
4396      (askewtest, (ListType, TupleType)), (askewtest, (N.ndarray,)))
4397  kurtosistest = Dispatch(
4398      (akurtosistest, (ListType, TupleType)), (akurtosistest, (N.ndarray,)))
4399  normaltest = Dispatch(
4400      (anormaltest, (ListType, TupleType)), (anormaltest, (N.ndarray,)))
4401
4402  ## FREQUENCY STATS:
4403  itemfreq = Dispatch(
4404      (litemfreq, (ListType, TupleType)), (aitemfreq, (N.ndarray,)))
4405  scoreatpercentile = Dispatch(
4406      (lscoreatpercentile, (ListType, TupleType)), (ascoreatpercentile,
4407                                                    (N.ndarray,)))
4408  percentileofscore = Dispatch(
4409      (lpercentileofscore, (ListType, TupleType)), (apercentileofscore,
4410                                                    (N.ndarray,)))
4411  histogram = Dispatch(
4412      (lhistogram, (ListType, TupleType)), (ahistogram, (N.ndarray,)))
4413  cumfreq = Dispatch(
4414      (lcumfreq, (ListType, TupleType)), (acumfreq, (N.ndarray,)))
4415  relfreq = Dispatch(
4416      (lrelfreq, (ListType, TupleType)), (arelfreq, (N.ndarray,)))
4417
4418  ## VARIABILITY:
4419  obrientransform = Dispatch(
4420      (lobrientransform, (ListType, TupleType)), (aobrientransform,
4421                                                  (N.ndarray,)))
4422  samplevar = Dispatch(
4423      (lsamplevar, (ListType, TupleType)), (asamplevar, (N.ndarray,)))
4424  samplestdev = Dispatch(
4425      (lsamplestdev, (ListType, TupleType)), (asamplestdev, (N.ndarray,)))
4426  signaltonoise = Dispatch((asignaltonoise, (N.ndarray,)),)
4427  var = Dispatch((lvar, (ListType, TupleType)), (avar, (N.ndarray,)))
4428  stdev = Dispatch((lstdev, (ListType, TupleType)), (astdev, (N.ndarray,)))
4429  sterr = Dispatch((lsterr, (ListType, TupleType)), (asterr, (N.ndarray,)))
4430  sem = Dispatch((lsem, (ListType, TupleType)), (asem, (N.ndarray,)))
4431  z = Dispatch((lz, (ListType, TupleType)), (az, (N.ndarray,)))
4432  zs = Dispatch((lzs, (ListType, TupleType)), (azs, (N.ndarray,)))
4433
4434  ## TRIMMING FCNS:
4435  threshold = Dispatch((athreshold, (N.ndarray,)),)
4436  trimboth = Dispatch(
4437      (ltrimboth, (ListType, TupleType)), (atrimboth, (N.ndarray,)))
4438  trim1 = Dispatch((ltrim1, (ListType, TupleType)), (atrim1, (N.ndarray,)))
4439
4440  ## CORRELATION FCNS:
4441  paired = Dispatch((lpaired, (ListType, TupleType)), (apaired, (N.ndarray,)))
4442  lincc = Dispatch((llincc, (ListType, TupleType)), (alincc, (N.ndarray,)))
4443  pearsonr = Dispatch(
4444      (lpearsonr, (ListType, TupleType)), (apearsonr, (N.ndarray,)))
4445  spearmanr = Dispatch(
4446      (lspearmanr, (ListType, TupleType)), (aspearmanr, (N.ndarray,)))
4447  pointbiserialr = Dispatch(
4448      (lpointbiserialr, (ListType, TupleType)), (apointbiserialr, (N.ndarray,)))
4449  kendalltau = Dispatch(
4450      (lkendalltau, (ListType, TupleType)), (akendalltau, (N.ndarray,)))
4451  linregress = Dispatch(
4452      (llinregress, (ListType, TupleType)), (alinregress, (N.ndarray,)))
4453
4454  ## INFERENTIAL STATS:
4455  ttest_1samp = Dispatch(
4456      (lttest_1samp, (ListType, TupleType)), (attest_1samp, (N.ndarray,)))
4457  ttest_ind = Dispatch(
4458      (lttest_ind, (ListType, TupleType)), (attest_ind, (N.ndarray,)))
4459  ttest_rel = Dispatch(
4460      (lttest_rel, (ListType, TupleType)), (attest_rel, (N.ndarray,)))
4461  chisquare = Dispatch(
4462      (lchisquare, (ListType, TupleType)), (achisquare, (N.ndarray,)))
4463  ks_2samp = Dispatch(
4464      (lks_2samp, (ListType, TupleType)), (aks_2samp, (N.ndarray,)))
4465  mannwhitneyu = Dispatch(
4466      (lmannwhitneyu, (ListType, TupleType)), (amannwhitneyu, (N.ndarray,)))
4467  tiecorrect = Dispatch(
4468      (ltiecorrect, (ListType, TupleType)), (atiecorrect, (N.ndarray,)))
4469  ranksums = Dispatch(
4470      (lranksums, (ListType, TupleType)), (aranksums, (N.ndarray,)))
4471  wilcoxont = Dispatch(
4472      (lwilcoxont, (ListType, TupleType)), (awilcoxont, (N.ndarray,)))
4473  kruskalwallish = Dispatch(
4474      (lkruskalwallish, (ListType, TupleType)), (akruskalwallish, (N.ndarray,)))
4475  friedmanchisquare = Dispatch(
4476      (lfriedmanchisquare, (ListType, TupleType)), (afriedmanchisquare,
4477                                                    (N.ndarray,)))
4478
4479  ## PROBABILITY CALCS:
4480  chisqprob = Dispatch(
4481      (lchisqprob, (IntType, FloatType)), (achisqprob, (N.ndarray,)))
4482  zprob = Dispatch((lzprob, (IntType, FloatType)), (azprob, (N.ndarray,)))
4483  ksprob = Dispatch((lksprob, (IntType, FloatType)), (aksprob, (N.ndarray,)))
4484  fprob = Dispatch((lfprob, (IntType, FloatType)), (afprob, (N.ndarray,)))
4485  betacf = Dispatch((lbetacf, (IntType, FloatType)), (abetacf, (N.ndarray,)))
4486  betai = Dispatch((lbetai, (IntType, FloatType)), (abetai, (N.ndarray,)))
4487  erfcc = Dispatch((lerfcc, (IntType, FloatType)), (aerfcc, (N.ndarray,)))
4488  gammln = Dispatch((lgammln, (IntType, FloatType)), (agammln, (N.ndarray,)))
4489
4490  ## ANOVA FUNCTIONS:
4491  F_oneway = Dispatch(
4492      (lF_oneway, (ListType, TupleType)), (aF_oneway, (N.ndarray,)))
4493  F_value = Dispatch(
4494      (lF_value, (ListType, TupleType)), (aF_value, (N.ndarray,)))
4495
4496  ## SUPPORT FUNCTIONS:
4497  incr = Dispatch((lincr, (ListType, TupleType, N.ndarray)),)
4498  sum = Dispatch((lsum, (ListType, TupleType)), (asum, (N.ndarray,)))
4499  cumsum = Dispatch((lcumsum, (ListType, TupleType)), (acumsum, (N.ndarray,)))
4500  ss = Dispatch((lss, (ListType, TupleType)), (ass, (N.ndarray,)))
4501  summult = Dispatch(
4502      (lsummult, (ListType, TupleType)), (asummult, (N.ndarray,)))
4503  square_of_sums = Dispatch(
4504      (lsquare_of_sums, (ListType, TupleType)), (asquare_of_sums, (N.ndarray,)))
4505  sumdiffsquared = Dispatch(
4506      (lsumdiffsquared, (ListType, TupleType)), (asumdiffsquared, (N.ndarray,)))
4507  shellsort = Dispatch(
4508      (lshellsort, (ListType, TupleType)), (ashellsort, (N.ndarray,)))
4509  rankdata = Dispatch(
4510      (lrankdata, (ListType, TupleType)), (arankdata, (N.ndarray,)))
4511  findwithin = Dispatch(
4512      (lfindwithin, (ListType, TupleType)), (afindwithin, (N.ndarray,)))
4513
4514######################  END OF NUMERIC FUNCTION BLOCK  #####################
4515
4516######################  END OF STATISTICAL FUNCTIONS  ######################
4517
4518except ImportError:
4519  pass
4520