Histogram Class Reference

Flexible statistical analyses of the values in any AForm object.

#include <histogram.h>

Visible Fields
Macro Constants
Routines
Detailed Description

Histogram objects capture the distribution of values in an Array, Slice, or Frame, i.e. an AForm. From a given histogram one can compute a variety of statistics about the values in an array form, as well as foreground/background thresholds that are a function of its distribution. Almost all of the fields of a histogram are visible to the user, but they should be treated as read-only: changing a visible field may create a semantic conflict with the private fields. A histogram can be read from and written to a file.

A histogram is created (a) for an AForm by calling Histogram_Array, (b) for a Region, Partition P_Vertex, or Level_Tree Level_Set by calling Histogram_Region, Histogram_P_Vertex, or Histogram_Level_Set, respectively, or (c) by reforming the bin range of an existing histogram with Histogram_Slice. An initially empty histogram can also be created directly by calling Make_Histogram.

Every bin boundary corresponds to a value, which in turn corresponds to a percentile of the data points greater than said value. There are conversion routines to map between bins, values, and percentiles (Bin2Value, Value2Bin, Bin2Percentile, Percentile2Bin, Value2Percentile, Percentile2Value). The routine Print_Histogram provides a flexible set of options for producing an ASCII display of a histogram using the set of flags given in the Macro Constants section.

Histograms, once computed, provide a convenient way to deliver statistics about the distribution of values in the array form or region the histogram was computed from. For example, if one builds a 256 bin histogram for a UINT8_TYPE array form, then thereafter one can deliver the mean, standard deviation, variance, and any central moment with Histogram_Mean, Histogram_Sigma, Histogram_Variance, and Histogram_Central_Moment in O(nbins) time, versus time proportional to the number of elements in the array form.
Of course, the accuracy of a statistic, relative to the same statistic computed over the set of values in the array, depends on the bin size and the total number of elements. In the example above the statistics are perfectly accurate as the bin size is 1. But if a histogram of 256 bins were built for a UINT16_TYPE array form, then each bin covers 256 values and, depending on the distribution of values in the array, the computed statistic could deviate from the true value. As long as the total number of data points is large, the accuracy of distributional statistics is generally acceptable.

In addition to distributional statistics, one can compute entropy, and the cross entropy and relative entropy between two histograms (Histogram_Entropy, Histogram_Cross_Entropy, Histogram_Relative_Entropy). The class also currently offers three distribution-based methods for computing a foreground/background threshold (Otsu_Threshold, Triangle_Threshold, Intermeans_Threshold).

A common scenario is to use a histogram as the means to compute one or more of the above quantities with respect to a Frame that is passed over every element in an array, implying that a histogram of the same type and range is needed thousands if not millions of times. Rather than create and free a histogram for each frame placement, the routine Histagain_Array allows one to accumulate the counts for a given array form into the bins of an existing histogram. Note carefully that the counts accumulate. To get a histogram for just the object at hand, one must explicitly reset the buckets to zero with Empty_Histogram. Please see the example included with the description of Histagain_Array. Moreover, one can also refill a histogram with the pixel values in a Region, Partition P_Vertex, or Level_Tree Level_Set by calling Histagain_Region, Histagain_P_Vertex, or Histagain_Level_Set, respectively.

Visible Fields Documentation
A histogram can be over values of any of the three Value_Kinds. This field specifies which one.
The number of bins in the histogram.
The width or size of each bin in the histogram.
The offset or lower boundary of the first bin, bin 0, in the histogram. Thus, the range covered by bin b is [offset+b*binsize,offset+(b+1)*binsize).
The total number of data points in the histogram or equivalently the sum of all the counts.
A vector of nbins counts, one for each bin 0 to nbins−1, i.e. counts[b] is the number of data points whose value is in the range covered by bin b.

Routine Documentation
Return the lower border of bin b, i.e. offset + b*binsize. The parameter b does not need to be between 0 and nbins−1 in which case the return value will not be in the domain of the histogram.
Return the bin number b that contains the value v, i.e. max b s.t. Bin2Value(b) ≤ v. The parameter v does not need to be in the domain of the histogram, in which case the return value will not be a bin number between 0 and nbins−1.
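The two mappings described for Bin2Value and Value2Bin can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions: the function names `bin_to_value` and `value_to_bin` are hypothetical, the histogram is taken to be real-valued, and the library's own routines operate on Histogram objects and Value arguments rather than bare doubles.

```c
#include <assert.h>

/* Hypothetical stand-ins for Bin2Value and Value2Bin (not the library's code). */

/* Lower boundary of bin b; b may lie outside 0..nbins-1. */
double bin_to_value(int b, double binsize, double offset)
{
    return offset + b * binsize;
}

/* Largest b with bin_to_value(b) <= v; the result may lie outside 0..nbins-1. */
int value_to_bin(double v, double binsize, double offset)
{
    assert(binsize > 0.0);
    double q = (v - offset) / binsize;
    int b = (int) q;      /* truncates toward zero */
    if (q < b)            /* turn truncation into a floor for negative q */
        b -= 1;
    return b;
}
```

With binsize 0.5 and offset 10, the value 11.7 lands in bin 3, whose range is [11.5, 12.0), while 9.9 maps to bin −1, which is outside the histogram's domain.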
Return the fraction of values in the histogram in bins b and higher, i.e. $\sum_{c \ge b} \mathrm{counts}[c] / \mathrm{total}$. The parameter b does not need to be between 0 and nbins−1, but the return value will always be between 0 and 1.
Return the largest bin number b for which the tail of the histogram starting at b contains at least fraction f of the data points, i.e. max b s.t. Bin2Percentile(b) ≥ f. The parameter f must be between 0 and 1.
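As a rough sketch of the tail-fraction logic behind Bin2Percentile and Percentile2Bin, the following standalone C works directly on a bare array of counts. The names `bin_to_percentile` and `percentile_to_bin` are hypothetical, and restricting the search to bins 0..nbins−1 is an assumption of this sketch.

```c
#include <assert.h>

/* Fraction of all data points that fall in bins b and higher. */
double bin_to_percentile(const long *counts, int nbins, int b)
{
    assert(nbins > 0);
    long tail = 0, total = 0;
    for (int c = 0; c < nbins; c++) {
        total += counts[c];
        if (c >= b)
            tail += counts[c];
    }
    return total > 0 ? (double) tail / (double) total : 0.0;
}

/* Largest bin in 0..nbins-1 whose tail holds at least fraction f of the points. */
int percentile_to_bin(const long *counts, int nbins, double f)
{
    assert(nbins > 0 && f >= 0.0 && f <= 1.0);
    long total = 0, tail = 0;
    for (int c = 0; c < nbins; c++)
        total += counts[c];
    for (int b = nbins - 1; b >= 0; b--) {   /* grow the tail from the top bin down */
        tail += counts[b];
        if ((double) tail >= f * (double) total)
            return b;
    }
    return 0;
}
```

For counts {10, 20, 30, 40}, the tail starting at bin 2 holds 70 of 100 points, so a request for fraction 0.5 returns bin 2: the tail starting at bin 3 holds only 40 points.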
Return the percentile of values in the histogram that are estimated to be not less than v, i.e. Bin2Percentile(b) − ((v − Bin2Value(b))/binsize)*counts[b]/total where b = Value2Bin(v). Note that linear interpolation is used to estimate the number of values in bin b not less than v.
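The interpolation step can be made concrete with a short standalone sketch. It assumes that the share of bin b lying below v is (v − lower boundary of b)/binsize, which is what linear interpolation within a bin implies. The name `value_to_percentile` and the handling of out-of-domain values are assumptions of this sketch, not the library's documented behavior.

```c
#include <assert.h>

/* Estimated fraction of data points whose value is not less than v. */
double value_to_percentile(const long *counts, int nbins,
                           double binsize, double offset, double v)
{
    assert(nbins > 0 && binsize > 0.0);
    long total = 0, tail = 0;
    for (int c = 0; c < nbins; c++)
        total += counts[c];
    if (total == 0)
        return 0.0;

    double q = (v - offset) / binsize;   /* position of v in bin units */
    if (q <= 0.0)
        return 1.0;                      /* v at or below the domain: everything is >= v */
    if (q >= (double) nbins)
        return 0.0;                      /* v at or above the domain: nothing is >= v */

    int b = (int) q;                     /* q >= 0, so truncation is a floor */
    for (int c = b; c < nbins; c++)
        tail += counts[c];
    double below = (q - b) * (double) counts[b];  /* interpolated part of bin b under v */
    return ((double) tail - below) / (double) total;
}
```

With counts {10, 20, 30, 40}, unit bins, and offset 0, the value 2.5 sits half-way through bin 2, so half of that bin's 30 points are subtracted from the 70-point tail, giving 0.55.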
Return the value v for which the tail of the histogram not less than v contains fraction f of the data points, i.e. max v s.t. Value2Percentile(v) ≥ f.
Generate a histogram of the given kind with nbins bins of size binsize covering the range starting at offset. The histogram is initialized to be empty, i.e. total is zero as is every element of counts.
Empty all buckets of the histogram, i.e. set all buckets to 0.
Generate a histogram of AForm a with nbins bins of width binsize where the first bin's lower boundary is offset (see field descriptor for offset). The type of values given for binsize and offset should be congruent with the type of a. When nbins or binsize or both are zero, then the value of the parameter(s) with value 0, and the value of offset, are selected according to the rules given in the following table, where the range of values in a is assumed to be [min,max]:
Note that when either of the two parameters is 0 then the value provided for offset is ignored as it is irrelevant. Moreover, in the case where a is of one of the small integer types (UINT8_TYPE, UINT16_TYPE, INT8_TYPE, INT16_TYPE), if one specifies nbins as the number of values in the range of the type, offset as the smallest value in the range of the type, and binsize as 1, then the histogram module is smart enough to know that it does not need to check that values are in range, and consequently fills such histograms with the efficiency of hand-tailored code. Concretely, the calls below make for very efficient histogram constructions:

    Histogram_Array(a,256,VALU(1),VALU(0))          // for UINT8_TYPE
    Histogram_Array(a,0x10000,VALU(1),VALU(0))      // for UINT16_TYPE
    Histogram_Array(a,256,VALI(1),VALI(-128))       // for INT8_TYPE
    Histogram_Array(a,0x10000,VALI(1),VALI(-32768)) // for INT16_TYPE
Generate a histogram of the pixel values in the respective Region, P_Vertex region v of Partition p, or Level_Set r of Level_Tree t. The interpretation of the arguments nbins, binsize, and offset is exactly as for Histogram_Array above.
Accumulate into the bins of histogram h the counts for the values in a. One must explicitly reset the histogram with Empty_Histogram if one wants just the histogram of a. The type of array a must be compatible with the type of h, and one should only call this routine with clip set to 0 (false) if one is certain that the values in a are within the domain covered by h, i.e. [offset,offset+nbins*binsize). Otherwise one must call it with clip set to true, in which case out of range values will be detected and clipped (and therefore the routine will not refill the histogram as efficiently).

The idea of this routine is to allow one to more efficiently accumulate thousands of histograms, over for example frame windows, where the domain and type of the histogram are fixed. This saves the expense of creating and freeing every histogram. For example, suppose A is a 3D, UINT8_TYPE array. The code snippet below computes the entropy in a 5x5x5 window centered at every pixel of A:

    Frame     *f;
    Histogram *h;
    Indx_Type  p;
    double     ent;

    h = Make_Histogram(UVAL,256,VALU(1),VALU(0));
    f = Make_Frame(A,Coord3(5,5,5),Coord3(2,2,2));
    Place_Frame(f,0);
    for (p = 0; p < A->size; p++)
      { Histagain_Array(Empty_Histogram(h),f,0);   // reset, then refill from the frame
        ent = Histogram_Entropy(h);
        // ... do something with ent = the entropy in a 5x5x5 shape about p
        Move_Frame_Forward(f);
      }
    Free_Frame(f);
    Free_Histogram(h);
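The reset-and-refill pattern can also be illustrated without the library. The sketch below is a simplified stand-in, not the library's implementation: it slides a 1D window over a byte array and reuses a single 256-bin count buffer, with `memset` playing the role of Empty_Histogram and the inner loop playing the role of Histagain_Array with clip set to 0. The function name `window_distinct` and the choice of statistic (the number of distinct values in each window) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* For every position p, count the distinct values in a[p-radius .. p+radius]
   (truncated at the array ends), reusing one 256-bin buffer throughout. */
void window_distinct(const unsigned char *a, size_t n, size_t radius, int *out)
{
    assert(a != NULL && out != NULL);
    long counts[256];
    for (size_t p = 0; p < n; p++) {
        memset(counts, 0, sizeof counts);            /* reset: counts would otherwise accumulate */
        size_t lo = p >= radius ? p - radius : 0;
        size_t hi = p + radius < n ? p + radius : n - 1;
        for (size_t i = lo; i <= hi; i++)
            counts[a[i]] += 1;                       /* every byte value is in range, no check needed */
        int distinct = 0;
        for (int b = 0; b < 256; b++)
            if (counts[b] > 0)
                distinct += 1;
        out[p] = distinct;
    }
}
```

Omitting the `memset` would make each window's counts include those of all earlier windows, which is exactly the accumulation behavior the text warns about.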
Accumulate into the bins of histogram h, the counts for pixels in the respective Region, P_Vertex region v of Partition p, or Level_Set r of Level_Tree t. Note carefully, that you must explicitly reset the histogram with Empty_Histogram if you want just the histogram of the given object. The interpretation of the argument clip is exactly as for Histagain_Array above.
Generate a histogram based on the histogram h consisting of the bins whose indices are in the interval [min,max-1]. min and max need not be between 0 and h->nbins but it must be that min < max. If min < 0 or max > h->nbins then the new histogram's domain will be expanded as necessary to cover the implied range of [Bin2Value(min),Bin2Value(max)].
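A minimal sketch of the slicing rule, assuming that bins outside the source histogram are simply empty in the result. The helper `slice_counts` is hypothetical and handles only the count vector. The new histogram's offset would be Bin2Value(min) and its bin size unchanged.

```c
#include <assert.h>
#include <stdlib.h>

/* Return a newly allocated vector of max-min counts covering source bins
   [min, max-1]; bins outside 0..nbins-1 are zero. The caller frees the result. */
long *slice_counts(const long *counts, int nbins, int min, int max)
{
    assert(min < max);
    long *out = calloc((size_t) (max - min), sizeof *out);
    if (out == NULL)
        return NULL;
    for (int b = min; b < max; b++)
        if (b >= 0 && b < nbins)
            out[b - min] = counts[b];
    return out;
}
```

Slicing counts {1, 2, 3} with min = −1 and max = 2 yields {0, 1, 2}: one empty bin added below the original domain, followed by original bins 0 and 1.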
Return the requested statistics for the histogram h. Histogram_Mean returns the mean or average. Histogram_Sigma returns the standard deviation. Histogram_Variance returns the variance, or square of the standard deviation. Histogram_Central_Moment returns the nth central moment of the distribution, i.e. $\sum_i (x_i - \mu)^n$ where $\mu$ is the mean of the data series $\{ x_1, x_2, \ldots, x_N \}$. Note carefully that the accuracy of a statistic computed from the histogram, in relation to the same statistic computed directly over the series of values in the AForm from which the histogram was derived, depends on the binning of values into the histogram. The general rule of thumb is that the smaller the binsize the more accurate the statistic.
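To show why these statistics cost O(nbins) rather than time proportional to the number of elements, here is a standalone sketch that computes a mean and central moments from the counts alone. Several things are assumptions of the sketch and not confirmed library behavior: the names `hist_mean` and `hist_central_moment`, the parameter `rep` that chooses the representative point within a bin (0 for unit-width integer bins, binsize/2 for real-valued bins), and the normalization of each moment by the total count.

```c
#include <assert.h>

/* Representative value for bin b: lower boundary plus a shift rep. */
static double bin_point(int b, double binsize, double offset, double rep)
{
    return offset + b * binsize + rep;
}

/* Mean of the binned data, one pass over the bins. */
double hist_mean(const long *counts, int nbins,
                 double binsize, double offset, double rep)
{
    assert(nbins > 0);
    double total = 0.0, sum = 0.0;
    for (int b = 0; b < nbins; b++) {
        total += (double) counts[b];
        sum += (double) counts[b] * bin_point(b, binsize, offset, rep);
    }
    return total > 0.0 ? sum / total : 0.0;
}

/* n-th central moment, normalized by the total count (n = 2 gives the variance). */
double hist_central_moment(const long *counts, int nbins,
                           double binsize, double offset, double rep, int n)
{
    assert(n >= 0);
    double mu = hist_mean(counts, nbins, binsize, offset, rep);
    double total = 0.0, acc = 0.0;
    for (int b = 0; b < nbins; b++) {
        double d = bin_point(b, binsize, offset, rep) - mu;
        double pw = 1.0;
        for (int i = 0; i < n; i++)      /* d raised to the n-th power */
            pw *= d;
        total += (double) counts[b];
        acc += (double) counts[b] * pw;
    }
    return total > 0.0 ? acc / total : 0.0;
}
```

For counts {1, 2, 1} over the values 0, 1, 2 the mean is 1, the second central moment is 0.5, and the third is 0 because the distribution is symmetric. The standard deviation is the square root of the second moment.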
Histogram_Entropy returns the entropy of the distribution of h, i.e. $-\sum_b p(b) \log_2 p(b)$ where $p(b)$ is counts[b]/total for each bin b. The cross entropy between two histograms, h and g, returned by Histogram_Cross_Entropy is $-\sum_b p(b) \log_2 q(b)$ where $q(b)$, analogous to $p(b)$, is the distribution for g. The histograms h and g must have the same binsize and, while their offsets can be different, the difference must be a multiple of the bin size. Finally, Histogram_Relative_Entropy returns the relative entropy between h and g, i.e. $\sum_b p(b) \log_2 (p(b)/q(b))$. Note carefully that the accuracy of a statistic computed from the histogram, in relation to the same statistic computed directly over the series of values in the AForm from which the histogram was derived, depends on the binning of values into the histogram. The general rule of thumb is that the smaller the binsize the more accurate the statistic.
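A small worked example may help relate the three quantities. Suppose, purely for illustration, that h has counts {2, 1, 1} and g has counts {1, 1, 2} over the same three bins, so that p = (1/2, 1/4, 1/4) and q = (1/4, 1/4, 1/2). The relative entropy is taken here in its standard Kullback-Leibler form, which is an assumption about the sign convention:

```latex
\begin{align*}
H(p)     &= -\sum_b p(b)\log_2 p(b)
          = \tfrac12\cdot 1 + \tfrac14\cdot 2 + \tfrac14\cdot 2 = 1.5 \\
H(p,q)   &= -\sum_b p(b)\log_2 q(b)
          = \tfrac12\cdot 2 + \tfrac14\cdot 2 + \tfrac14\cdot 1 = 1.75 \\
D(p\,\|\,q) &= \sum_b p(b)\log_2\frac{p(b)}{q(b)}
          = \tfrac12\cdot 1 + \tfrac14\cdot 0 + \tfrac14\cdot(-1) = 0.25
          = H(p,q) - H(p)
\end{align*}
```

All values are in bits. Under this convention the relative entropy equals the cross entropy minus the entropy of h and is never negative.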
Return the Otsu threshold (IEEE Trans. Sys., Man, & Cybernetics, 9, 1 (1979), 62-66) for the AForm over which the histogram h was built. The routine returns the index of the smallest bucket that should be considered foreground as the threshold is only accurate to the nearest multiple of the binsize of the histogram. Use Bin2Value in order to convert the index into a threshold value. In brief, the method picks the threshold that produces the greatest inter-class variance between the induced foreground and background classes.
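The selection rule can be sketched in standalone C over a bare count vector. This is a textbook rendering of Otsu's criterion in bin-index units, not the library's code, and the name `otsu_bin` is hypothetical. It maximizes w0·w1·(m0 − m1)², which is proportional to the inter-class variance.

```c
#include <assert.h>

/* Return the smallest bin index t to treat as foreground: bins 0..t-1 are
   background, bins t..nbins-1 are foreground. */
int otsu_bin(const long *counts, int nbins)
{
    assert(nbins > 0);
    double total = 0.0, sum = 0.0;
    for (int b = 0; b < nbins; b++) {
        total += (double) counts[b];
        sum += (double) b * (double) counts[b];
    }

    double w0 = 0.0, s0 = 0.0;       /* background weight and weighted index sum */
    double best = -1.0;
    int best_t = 0;
    for (int t = 1; t < nbins; t++) {
        w0 += (double) counts[t - 1];
        s0 += (double) (t - 1) * (double) counts[t - 1];
        double w1 = total - w0;      /* foreground weight */
        if (w0 == 0.0 || w1 == 0.0)
            continue;                /* one class empty: no valid split here */
        double d = s0 / w0 - (sum - s0) / w1;   /* difference of class means */
        double between = w0 * w1 * d * d;       /* proportional to inter-class variance */
        if (between > best) {
            best = between;
            best_t = t;
        }
    }
    return best_t;
}
```

For counts {8, 2, 1, 9} the best split puts bins 0 and 1 in the background and bins 2 and 3 in the foreground, so the function returns 2. Ties are resolved in favor of the smaller index.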
Apply the "triangle method" of Zack, Rogers, and Latt (J. Histochem. Cytochem. 25, 7 (1977), 741-753) to compute and return the smallest bin index that should be considered foreground in the image array the histogram was derived from. Use Bin2Value in order to convert the index into a threshold value. In brief, the method picks the threshold whose bin top is the furthest from the line that runs from the top of the bin with the highest count to the empty bin one greater than the non-empty bin with the greatest index.
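As an illustration of the triangle construction, the sketch below draws the line from the top of the highest bin to the first empty position past the last non-empty bin, and picks the bin whose top lies furthest below that line. It is a simplified standalone version with the hypothetical name `triangle_bin`. It searches only to the right of the peak, and returning the furthest bin itself (rather than its neighbor) is an assumption.

```c
#include <assert.h>

/* Return the bin right of the peak whose top is furthest below the line from
   (peak, counts[peak]) to (end, 0), where end is one past the last non-empty bin. */
int triangle_bin(const long *counts, int nbins)
{
    assert(nbins > 0);
    int peak = 0, end = 0;
    for (int b = 0; b < nbins; b++) {
        if (counts[b] > counts[peak])
            peak = b;
        if (counts[b] > 0)
            end = b + 1;
    }

    long best = -1;
    int best_b = peak;
    for (int b = peak + 1; b < end; b++) {
        /* Proportional to the perpendicular distance of (b, counts[b]) below
           the line; the constant line length is dropped from the comparison. */
        long d = counts[peak] * (long) (end - b) - (long) (end - peak) * counts[b];
        if (d > best) {
            best = d;
            best_b = b;
        }
    }
    return best_b;
}
```

For counts {100, 60, 10, 8, 6, 4, 2, 1} the peak is bin 0 and the line ends at position 8. Bin 2 is where the histogram drops furthest below the line, so the function returns 2.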
Return the inter-means threshold (Ridler and Calvard, IEEE Trans. Sys., Man, & Cybernetics, 8, 8 (1978), 630-632) for the image array the histogram was derived from, to the nearest bin index. Use Bin2Value in order to convert the index into a threshold value. In brief, the method converges to the threshold that is half-way between the means of the foreground and background classes. Initially the threshold is placed half-way between the min and the max value in the histogram.
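The inter-means iteration is short enough to sketch directly. The version below works in bin-index units on a bare count vector. The name `intermeans_bin`, the iteration cap, and the convention of returning the first bin above the converged threshold are assumptions of this sketch.

```c
#include <assert.h>

/* Iterate: split at k (background = bins 0..k), compute both class means,
   move k to the floor of their midpoint, and stop when k no longer changes.
   Returns k+1, the first bin counted as foreground. */
int intermeans_bin(const long *counts, int nbins)
{
    assert(nbins > 0);
    int lo = -1, hi = -1;
    for (int b = 0; b < nbins; b++)
        if (counts[b] > 0) {
            if (lo < 0)
                lo = b;
            hi = b;
        }
    if (lo < 0)
        return 0;                          /* empty histogram */

    int k = (lo + hi) / 2;                 /* start half-way between min and max */
    for (int iter = 0; iter < nbins; iter++) {
        double n0 = 0.0, s0 = 0.0, n1 = 0.0, s1 = 0.0;
        for (int b = 0; b < nbins; b++) {
            if (b <= k) {
                n0 += (double) counts[b];
                s0 += (double) b * (double) counts[b];
            } else {
                n1 += (double) counts[b];
                s1 += (double) b * (double) counts[b];
            }
        }
        if (n0 == 0.0 || n1 == 0.0)
            break;                         /* one class is empty: cannot refine */
        int next = (int) (0.5 * (s0 / n0 + s1 / n1));  /* midpoint of means, non-negative */
        if (next == k)
            break;                         /* converged */
        k = next;
    }
    return k + 1;
}
```

For counts {50, 30, 10, 5, 3, 1, 0, 1} the split starts at k = 3, moves to k = 2 after one pass, and stays there, so bin 3 is the first foreground bin.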
Print an ASCII display of histogram h on stream output, indented by indent spaces. If binsize is not 0 then the histogram will be displayed in bins of the given size, with the bin boundaries being multiples of binsize. If binsize is not a multiple of the histogram's binsize (h->binsize), then the counts of spanning bins in the underlying histogram are interpolated. The parameter flag is the bitwise or of one or more of the constants introduced at the top of this page, which determine what is displayed and how it is displayed, as outlined in the following table:
As an example, suppose A is a UINT8_TYPE array with a million elements that are randomly generated numbers between 1 and 6 inclusive. Suppose that we make a histogram h with the call Histogram_Array(A,8,VALU(1),VALU(0)). Then calling Print_Histogram with the flags BIN_COUNT|CUMULATIVE_COUNT|CUMULATIVE_PERCENT and binsize set to 0 (i.e. no effect) produces the following output:

    7:      0       0   0.0%
    6: 166678  166678  16.7%
    5: 167038  333716  33.4%
    4: 166591  500307  50.0%
    3: 166329  666636  66.7%
    2: 166517  833153  83.3%
    1: 166847 1000000 100.0%
    0:      0 1000000 100.0%

Calling Print_Histogram with the flags CUMULATIVE_COUNT|ASCENDING_HGRAM|CLIP_HGRAM and binsize set to 0 (i.e. no effect) on the same histogram would produce:

    1:  166847
    2:  333364
    3:  499693
    4:  666284
    5:  833322
    6: 1000000

As another example, suppose A is a FLOAT64_TYPE array with a million elements that are randomly generated numbers between .2 and .3 inclusive. Suppose that we make a histogram h with the call Histogram_Array(A,8,VALF(0),VALF(0)). Recall that with such parameter settings we are asking Histogram_Array to pick the largest number of bins not greater than 8 such that bins whose size is of the form [1,2,5]×10^a cover the range of elements in A. In this instance it is 5 bins of size .02. Then calling Print_Histogram with the flags BIN_COUNT|CUMULATIVE_PERCENT and binsize set to 0 (i.e. no effect) produces the following output:

    0.28 - 0.30: 200167  20.0%
    0.26 - 0.28: 200357  40.1%
    0.24 - 0.26: 199802  60.0%
    0.22 - 0.24: 199738  80.0%
    0.20 - 0.22: 199936 100.0%

Calling Print_Histogram with the flags BIN_COUNT|CUMULATIVE_PERCENT and binsize set to VALF(.0125) on the same histogram produces the output below. Note carefully that the numbers for each "pseudo-bin" of size .0125 are obtained by linearly interpolating fractional parts of the .02 bins recorded in the histogram.

    0.2875 - 0.3000: 125105  12.5%
    0.2750 - 0.2875: 125152  25.0%
    0.2625 - 0.2750: 125223  37.5%
    0.2500 - 0.2625: 124946  50.0%
    0.2375 - 0.2500: 124868  62.5%
    0.2250 - 0.2375: 124836  75.0%
    0.2125 - 0.2250: 124911  87.5%
    0.2000 - 0.2125: 124959 100.0%

Finally, calling Print_Histogram with the flags BIN_COUNT|CUMULATIVE_PERCENT|BIN_MIDDLE and binsize set to 0 (i.e. no effect) on the same histogram would produce:

    0.29: 200167  20.0%
    0.27: 200357  40.1%
    0.25: 199802  60.0%
    0.23: 199738  80.0%
    0.21: 199936 100.0%