
For new functions, update:
  .h, .c, XS, export, t/02, Changes, doc, test

- prime_count - needs the pc(#) option as well as pc(#,#)

- Do a GMP version of LMO prime_count.  Possible versions:
    - 32-bit main, 16-bit support
    - 64-bit main, 32-bit support   (using __uint64_t if necessary)
    - 128-bit main, 64-bit support  (gcc only)
    - GMP main, 32-bit support      (portable)
    - GMP main, 64-bit support      (mostly portable)

- nth_prime

- GMP SQUFOF could use a better implementation, though this is low priority
  since it just isn't going to be the right algorithm for numbers > 2^64.
  Mainly it needs to pay attention to the rounds argument.  Perhaps use a
  racing variant.

- Tune and improve SIMPQS for our uses.  Check FLINT 2.3 for improvements.

- Write our own QS.

- The statics in ecm and QS won't play well with threading.

- ECPP: Perhaps more HCPs/WCPs could be loaded if needed?

- ECPP: Another idea is related to Atkin/Morain's EAS.  When we have a large
  number, we can process more Ds, delaying the downrun.  We then use the
  smallest q we found.  Combine with lightened stage 1 factoring as above.
  This drops our q sizes faster, at the expense of more up-front time.
  I have this running, but for small numbers it doesn't matter much, and for
  large numbers it just highlights how much nicer FAS would be.

- ECPP: All discriminants with D % 8 != 3 have been converted to Weber.  We're
  still left with lots of those D values.  Figure out a different invariant
  that will make smaller polynomials, along with a root conversion.

- ECPP: Add a fast BLS5 to downrun?

- Add BLS17 proof.  Merge into BLS5 code since the end is the same.

- Add tests for proofs, similar to MPU t/23.

- Handle objects of type:
     Math::GMP
     Math::GMP::Fast
     Math::GMPz
  We should parse their mpz_t directly, do our processing, and output the
  result as one of these types.

- Recognize Math::BigInt / Math::Pari objects.  Shortcut validation.
  Create results as new objects of their type.

- These functions should be added:
     legendre_phi
     znlog

- Any fast primality pretest would be nice.  I've tested:
    - Colin Plumb's Euler Criterion test
    - Fermat base 210, which is done in GMP's internal millerrabin.c.
    - Fermat base 2, which is also no faster than SPRP-2, though some
      claim it is:
      mpz_t e, r;  int composite;
      mpz_init(e);
      mpz_init_set_ui(r, 2);              /* r = 2 */
      mpz_sub_ui(e, n, 1);                /* e = n-1 */
      mpz_powm(r, r, e, n);               /* r = 2^(n-1) mod n */
      composite = mpz_cmp_ui(r, 1) != 0;  /* a prime n gives r == 1 */
      mpz_clear(r); mpz_clear(e);
      if (composite) return 0;
  None of these are faster on average than just doing BPSW.

- merge the two Frobenius tests.  cp is faster but needs the deterministic
  version, and we should switch to the two-input version (allow GMP), etc.

- tests for sieve_primes.

- speed up range sieving.

- fast prime printing routine.  The following could be trivially 2x faster:
    $n = 10**20; say for Math::Prime::Util::GMP::sieve_primes($n,$n+8e9,0);
  About 3 minutes vs. 7-8 just by using gmp_printf.
  Using sieve_range doesn't help, as the issue is the massive return array.

- Consider ranged ramanujan_tau.  See:
  https://cs.uwaterloo.ca/journals/JIS/VOL13/Lygeros/lygeros5.pdf
  We could compute a number of hclassno values, then generate the
  tau values from them.  This might be more efficient.

- We could do LLR and Proth in prob_prime and return 1 instead of 2, leaving
  certs possible.

- consider probabilistic is_primitive_root for large inputs
  [2024:  what does this mean?]

- Verify speed and memory use of GMP's two binomials for various versions
  and compare.  Looks like Luschny sent his changes after 5.0.0.
  https://gmplib.org/list-archives/gmp-discuss/2010-February/004036.html

- Identify places where 32-bit GMP on 64-bit Perl will trip us up.

- BLS75 methods: check if we can return 0 instead of 1 in many cases.

- Make a trial factor routine that returns all factors.  This can be done
  with a single treesieve.  ECPP could make use of this (since it is doing
  many calls, keeping the product tree around would be useful).
  All current calls are testing primality, so don't need multiple returns.
  Changing the XS call to return multiple values might be useful.

- zeta below -20 or so is increasingly wrong.  We should use the reflection
  formula, but that requires some new functions.  The MPU interface doesn't
  allow negative inputs at all, so it isn't critical.

- random prime should be more efficient (use A2 instead of A1).

- gamma, lngamma
  See http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.527.8519
  https://www.researchgate.net/publication/272039164_A_new_fast_asymptotic_series_for_the_gamma_function
  https://arxiv.org/pdf/2301.09699.pdf

- Euler using:
   = binary splitting
   = B3 from Brent/McMillan (1980)
   = FLINT's B3 (b6ebd880cea55e6e6cf99d6c454607a5872b2c96)

- Functions used in MPU that were previously done with MPFR.
    = (++) Pi is about 2x faster than MPFR
    = (--) li(10^17) is 2x slower than MPFR
    = (--) ei(0.5) is much slower than MPFR (20x and worse exponent)
    = (--) Euler is much slower than MPFR (10x and worse exponent)
    = (--) zeta is much slower than MPFR (10x and worse exponent)
    =      harmreal, riemannr, prime_count_lower, prime_count_upper

- tinyqs needs a destroy to clear all the init vars

- clean up dependency links.  We now have to include *every* C file to
  compile SIMPQS standalone.  Completely unnecessary, but some utility
  functions are pulling in other things.

- add sumliouville
  almost_prime_count(k,n)
  nth_almost_prime(k,n)
  smooth_count
  rough_count
  qnr

- make the Bernoulli number cache incremental

- fromdigits input digit array arguably should be signed bigints.

- optimize prime_omega, prime_bigomega

- Rademacher partitions

- overpartitions
  https://arxiv.org/pdf/2303.15895

- rootmod, rootmodp
- allsqrtmod
- allrootmod

- znlog is needed by rootmod
  simple rho
  distinguished point rho

- next_perfect_power, prev_perfect_power
  https://github.com/trizen/sidef/commit/81af8f018cca4ab7813d4cfaa46fd2e45796a52d

- make is_carmichael deterministic test go faster

- have trial factor use treesieve static function.

- the 2021 BPSW test
  https://arxiv.org/pdf/2006.14425
  https://community.wolfram.com/groups/-/m/t/2344199?p_p_auth=UZ0BHnEu

- Look into FLINT's poly ramanujan tau method by Fredrik Johansson.

- Fubini

- subfactorial for large enough n should use floor((n!+1)/e).  See Sidef.
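  As a rough illustration of the identity above (not the Sidef or GMP code),
  the sketch below compares the exact recurrence with floor((n!+1)/e).  The
  function names are hypothetical, and a double stands in for the bigint and
  real arithmetic, so it is only meaningful for small n (n <= 15 here).

```c
#include <stdint.h>

/* Exact subfactorial via the recurrence !n = n * !(n-1) + (-1)^n, !0 = 1.
 * Hypothetical helper name; fits in 64 bits for n <= 20. */
static uint64_t subfactorial_recurrence(unsigned n)
{
  uint64_t d = 1;                       /* !0 */
  unsigned i;
  for (i = 1; i <= n; i++)
    d = (i & 1) ? i * d - 1 : i * d + 1;
  return d;
}

/* Closed form floor((n!+1)/e), valid for n >= 1.
 * Assumption: double precision is enough, which limits this to n <= 15. */
static uint64_t subfactorial_closed_form(unsigned n)
{
  const double e = 2.718281828459045;
  double f = 1.0;                       /* accumulates n! */
  unsigned i;
  for (i = 2; i <= n; i++)
    f *= i;
  return (uint64_t)((f + 1.0) / e);     /* cast truncates: floor for x >= 0 */
}
```

  A real version would compute n! with mpz and divide by e at enough
  precision for the result size; the point is only that one division can
  replace the n-step recurrence.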

- factorial_valuation.  See MPU's util.c and
  https://github.com/trizen/sidef/commit/e25e9b8429837fb32642bb77275592700f577829
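  For reference, the usual way to get the exponent of a prime p in n! is
  Legendre's formula, sum of floor(n/p^i).  The sketch below uses native
  64-bit integers and a hypothetical name and signature; it is not copied
  from MPU's util.c or from Sidef.

```c
#include <stdint.h>

/* Exponent of the prime p in n!, by Legendre's formula:
 *   v_p(n!) = floor(n/p) + floor(n/p^2) + floor(n/p^3) + ...
 * Repeated division avoids forming p^i, so nothing can overflow.
 * Hypothetical name; p is assumed prime (p < 2 returns 0). */
static uint64_t factorial_valuation_sketch(uint64_t n, uint64_t p)
{
  uint64_t v = 0;
  if (p < 2) return 0;
  while (n >= p) {
    n /= p;        /* n is now floor(original n / p^i) */
    v += n;
  }
  return v;
}
```

  For example, v_2(10!) = 5 + 2 + 1 = 8.  A GMP version would do the same
  loop with mpz_fdiv_q_ui when n does not fit in a native word.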

- Look at Sidef's factor_upto(n,limit)

- better hclassno

- consider exporting a sieve segment or sieve from/to routine.  Then our
  trial iterator can sieve segments, vecprod them, do gcd.  Maybe a
  prime_iterator_next64 to get 64 values.

- many files cache small primes.  Do it once.  primes under 64k.

- faster next/prev perfect power, look for odd powers, no loop needed.

- addmod, mulmod, powmod, divmod -- work with GMP version 36 if n > 0

- finish looking at all mp*_ui functions to check for UV/IV vs long use.

- check overflow in prime iterator / trial division for 32-bit unsigned long.
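  The kind of problem meant here can be sketched as follows.  A loop bound
  written as p*p <= n wraps around when p is near 2^16 and the type is 32
  bits wide; comparing p <= n/p does not.  uint32_t stands in for a 32-bit
  unsigned long, and the function name is hypothetical, not one of ours.

```c
#include <stdint.h>

/* Trial-division primality check for 32-bit inputs.
 * The bound is written as p <= n / p rather than p * p <= n:
 * for n close to 2^32 the product p * p would wrap to a small value
 * once p reaches 65536, and the loop would run far past sqrt(n). */
static int is_prime_trial32(uint32_t n)
{
  uint32_t p;
  if (n < 2) return 0;
  if (n < 4) return 1;                 /* 2 and 3 */
  if ((n & 1) == 0) return 0;
  for (p = 3; p <= n / p; p += 2)
    if (n % p == 0) return 0;
  return 1;
}
```

  The same care applies where a UV is passed to an mp*_ui function that
  takes an unsigned long narrower than the UV.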

- binomialmod would be better using factorialmod:
  factorial_valuation and _binoval, then use factorialmod
  also maybe better factorialmod
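  One way to get the binomial valuation without touching factorials is
  Kummer's theorem: the exponent of a prime p in binomial(n,k) equals the
  number of carries when adding k and n-k in base p.  This is a swapped-in
  technique for illustration, not necessarily what _binoval does, and the
  name and signature below are hypothetical.

```c
#include <stdint.h>

/* Exponent of the prime p in binomial(n,k), by Kummer's theorem:
 * count the carries when adding k and n-k in base p.
 * Assumptions: p is prime and p < 2^63 so the digit sum cannot overflow.
 * Returns 0 for k > n (the binomial is 0, so the valuation is undefined). */
static uint64_t binomial_valuation_sketch(uint64_t n, uint64_t k, uint64_t p)
{
  uint64_t m, carry = 0, carries = 0;
  if (p < 2 || k > n) return 0;
  m = n - k;
  while (k > 0 || m > 0) {
    uint64_t s = (k % p) + (m % p) + carry;  /* digit sum plus carry in */
    carry = (s >= p);                        /* carry out of this digit */
    carries += carry;
    k /= p;
    m /= p;
  }
  return carries;
}
```

  If the valuation is nonzero and the modulus is p, the result is 0 right
  away; otherwise the p-free parts of the three factorials can be handled
  by factorialmod.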

- real functions should have a settable default precision
