scipy.stats.chi2.ppf() without scipy

Allantools has a dependence on scipy because it uses scipy.stats.chi2.ppf() (inverse of the chi-squared cumulative distribution function) in the code for confidence intervals.

When testing scipy takes quite a long time to install and the whole package seems a bit overkill for just this one function.

So I tried implementing it using just numpy. It sort of works, but not for corner cases where p is very close to 1.0 or k is a big number. I get for example:

Maybe there's something trick when numerically evaluating lower_gamma() using the series expansion that I am missing...

4 Comments

  1. Try computing dg using log-math. not sure if this will render in a comment or not:

    however, this has some failures of the 'isclose' test e.g., at
    190 0.9999 False 271.179788314 271.221549805
    (relative error .00015)

    full test program adapted from yours at https://emergent.unpythonic.net/files/sandbox/chi2_ppf_jepler.py

    boost discusses their implementation: http://www.boost.org/doc/libs/1_35_0/libs/math/doc/sf_and_dist/html/math_toolkit/special/sf_gamma/igamma.html#math_toolkit.special.sf_gamma.igamma.implementation using several different methods depending on the values of (what they call) x and a, may be worth a gander.

    • admin

      May 5, 2016 at 04:52

      Thanks! I will take a look at the links.

      HTML tags 'pre lang="python"' around the code makes it render using the WP-Syntax plugin.

  2. You could maybe get away with exploiting the fact that the gamma function in the denominator of the pdf of a chi-squared random variable is just a normalizing constant. So, e.g., these three functions get me pretty close to the output of scipy.stats.chi2.ppf():

    The "binsize" of x matters, both for speed and accuracy. And if you don't know what k is ahead of time, the fact that you have to define x is kind of a pain, though you can probably also exploit the fact that the expected value of a chi-squared random variable is k and the variance is 2k to define x over an appropriate range once k is known.

    Here's hoping I used the "pre" tag correctly...

  3. admin

    May 6, 2016 at 10:02

    I have a feeling that the cdf(x,k) function is very flat at high x as we get closer to p=1.0. Conversely ppf(p,k) is very steep as we get closer to p=1.0

    One improvement could be to change the bisection() function that looks for the right value of x to instead stop searching with a tolerance check on p (instead of x). Maybe?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

© 2019 anderswallin.net

Theme by Anders NorénUp ↑