C Implementation of Minimum Kolmogorov-Smirnov Estimator

Obtain Source Code
Using the Code as is
Modifying Code to Solve One's Own Problem
Control Parameter Guidelines
 

To obtain the source code for this implementation:

Step 1:  Download the gzipped tar file of the source code: mksfitter.tar.gz

Step 2:  From the command line, do
          gunzip  mksfitter.tar.gz
            This decompresses the archive, creating a new file named mksfitter.tar

Step 3:  From the command line, do
          tar -xvf mksfitter.tar
      This extracts these files from the archive:
            bb.dat
            bcb.c
            bcb.h
            bcbparams.dat
            rngs.c
            rngs.h
            runbcb.c
            rvgs.c
            rvgs.h
            rvms.c
            rvms.h
 
 
 

To use this implementation as is:

As it is configured, the code reads a set of data points from standard input.  Then
for each of these families of parametric probability distributions:
    Exponential
    Normal
    Weibull
    Lognormal
    Log logistic
    Gompertz
    Gamma
    Exponential Power
the program uses the BCB algorithm to solve for parameter estimates that minimize
the Kolmogorov-Smirnov goodness-of-fit statistic.  The famous ball-bearing data
set is included in the file bb.dat.
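
   The quantity being minimized is the largest vertical distance between the empirical
   CDF of the sorted data and the CDF of the candidate distribution.  The sketch below
   (not part of the distributed code) computes this statistic for a hypothetical
   exponential CDF with a placeholder parameter named rate; the actual computation in
   bcb.c may differ in its details.

       #include <math.h>

       /* Minimal sketch: K-S statistic for a sorted sample x[0..n-1] under a
          candidate CDF, illustrated with a hypothetical exponential CDF.      */
       double expo_cdf(double x, double rate)
       {
           return 1.0 - exp(-rate * x);
       }

       double ks_statistic(const double *x, int n, double rate)
       {
           double d = 0.0;
           for (int i = 0; i < n; i++) {
               double F  = expo_cdf(x[i], rate);
               double d1 = (i + 1.0) / n - F;    /* empirical CDF just above x[i] */
               double d2 = F - (double) i / n;   /* empirical CDF just below x[i] */
               if (d1 > d) d = d1;
               if (d2 > d) d = d2;
           }
           return d;
       }
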
   To compile and run the program:
    Step 1:  From the command line, compile the source by doing
                   gcc bcb.c runbcb.c rngs.c rvgs.c rvms.c -lm -o mksfitter

    Step 2:   From the command line, run the program by doing
                   mksfitter < bb.dat

               As the program runs, the list of distribution families will be
               displayed along with the best parameter estimates that BCB identifies and
               the value of the K-S statistic that these estimates yield.
 
 

To modify this implementation to solve a different problem:

The user should not need to modify any of the source code to find parameter estimates for
a different data set.  Rather, the user should sort the new data set, store it in a file,
say dataset.dat, and then invoke the program as
mksfitter < dataset.dat
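
For example, if the raw observations are in a file named raw.dat (a hypothetical name used
here only for illustration), they can be sorted numerically and passed to the program with
    sort -n raw.dat > dataset.dat
    mksfitter < dataset.dat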

Currently, the scale parameters are allowed to range from 0.01 to 100.0, and shape parameters
are allowed to range from 0.0001 to 5.0.  To change these bounds, edit the UBOUNDS and
LBOUNDS entries in the input file bcbparams.dat.  For example, to allow the scale parameters
to range over [0.1, 200.0] and the shape parameters to range over [0.05, 10.0], the entries
should be:
UBOUNDS     200.0  10.0
LBOUNDS     0.1    0.05

Finally, the user may wish to incorporate additional probability models into
MKSFitter.  This requires editing the files bcb.c and runbcb.c.  The user should examine
the function evalobj() in the file bcb.c, as well as the function print_results() in runbcb.c.
These contain similar case statements which loop over the available models.  It should be
clear how to extend these to consider another model.
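
   As a rough sketch only (the actual case labels, parameter layout, and function
   signatures in bcb.c will differ), adding a hypothetical Pareto model amounts to
   coding its CDF and wiring it into those case statements:

       #include <math.h>

       /* Hypothetical CDF for an additional model (Pareto); not part of the
          shipped code, shown only to illustrate the extension point.        */
       double pareto_cdf(double x, double scale, double shape)
       {
           return (x <= scale) ? 0.0 : 1.0 - pow(scale / x, shape);
       }

       /* Inside the switch over models in evalobj(), a new branch along these
          lines would evaluate the candidate CDF at each data point (all names
          are illustrative):

              case PARETO:
                  F = pareto_cdf(x[i], theta[0], theta[1]);
                  break;

          A matching case in print_results() in runbcb.c then labels and prints
          the estimated parameters for the new model.                          */
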
 

Control Parameter Guidelines:

In addition to the parameters we have seen in the example above, the input file
bcbparams.dat provides eight other control parameters to the user:

POPSIZE  the number of solutions maintained in the population; increasing this quantity
                tends to improve solution quality at the cost of extra computation.

NUMGENS  the number of generations through which the population is evolved; again,
                  increasing the value of this parameter tends to improve solution quality at
                  the cost of extra computation.

SIGMA_M    this parameter influences the variability of the normal distribution which
                  shifts the weighted midpoint of the line between two parents.

SIGMA_R     this parameter influences the variability of the normal distribution which
                   governs the length of the radius of the hypersphere during recombination.

PENALTY1   this is a factor by which a solution's maximum violated constraint is multiplied;
                   the solution's fitness is then penalized by this amount.  This factor applies
                   to all solutions when the best solution so far is feasible.  Increasing this value
                   makes infeasible solutions less attractive once a good feasible solution has been
                   found.
 

PENALTY2   this is a factor by which a solution's maximum violated constraint is multiplied;
                   the solution's fitness is then penalized by this amount.  This factor applies
                   to all solutions when the best solution so far is infeasible.  Increasing this value
                   makes infeasible solutions less attractive once a good infeasible solution has been
                   found.
 
 

PARCHOOSE     this parameter toggles between two methods for selecting parents to mate.
                        When its value is 5, Baker's SUS method is used to select parents based
                        on relative fitness alone.  When its value is 6, a two-criteria version of
                        SUS is used to select parents (one parent is selected based on objective
                        value; the other is selected based on max. constraint value).

NUMDV        this parameter reflects how many parameters are associated with the
                   probability distributions;  we consider each to be a decision variable.
                   If the user codes a new probability distribution which has more than
                   two parameters, this value should be increased accordingly.
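
For reference, a hypothetical bcbparams.dat that puts these keywords together might look
as follows; the numeric values are illustrative only, and the distributed file may order
or format its entries differently:

POPSIZE     100
NUMGENS     500
SIGMA_M     0.1
SIGMA_R     0.1
PENALTY1    10.0
PENALTY2    1.0
PARCHOOSE   5
NUMDV       2
UBOUNDS     100.0   5.0
LBOUNDS     0.01    0.0001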