Probability distribution fitting or simply distribution fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon. The aim of distribution fitting is to predict the probability or to forecast the frequency of occurrence of the magnitude of the phenomenon in a certain interval.
Distribution fitting is the procedure of selecting a statistical distribution that best fits a data set generated by some random process. In other words, given a sample of random data, distribution fitting answers the question of which distribution best describes it.
Another common application of distribution fitting is verifying the assumption of normality before applying a parametric test.
In most cases, two or more distributions need to be fitted, their results compared, and the best-fitting model selected. The candidate distributions should be chosen according to the nature of the data. For example, to analyze the time between failures of technical devices, one should fit distributions defined only for non-negative values, such as the Exponential or Weibull, since a failure time cannot be negative.
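The fit-and-compare workflow described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a reference implementation: the failure times are simulated, the helper names `fit_exponential` and `fit_weibull` are hypothetical, both fits use maximum likelihood, and the comparison uses the Akaike information criterion (AIC), which is one common choice that the text itself does not prescribe.

```python
import math
import random


def fit_exponential(data):
    """Maximum likelihood fit of the exponential: rate = 1 / sample mean."""
    n = len(data)
    total = sum(data)
    rate = n / total
    log_lik = n * math.log(rate) - rate * total
    # AIC = 2 * (number of parameters) - 2 * log-likelihood; lower is better.
    return {"rate": rate, "log_lik": log_lik, "aic": 2 * 1 - 2 * log_lik}


def fit_weibull(data, lo=0.05, hi=20.0, tol=1e-10):
    """Maximum likelihood fit of the two-parameter Weibull.

    The shape is the root of the likelihood equation, found by bisection
    on the assumed bracket [lo, hi]; the scale then has a closed form.
    """
    n = len(data)
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / n

    def score(k):
        # Increasing in k; zero at the maximum likelihood shape.
        powers = [x ** k for x in data]
        weighted = sum(p * lg for p, lg in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = (lo + hi) / 2
    scale = (sum(x ** shape for x in data) / n) ** (1.0 / shape)
    log_lik = sum(
        math.log(shape / scale)
        + (shape - 1) * math.log(x / scale)
        - (x / scale) ** shape
        for x in data
    )
    return {"shape": shape, "scale": scale, "log_lik": log_lik,
            "aic": 2 * 2 - 2 * log_lik}


# Simulated times between failures (an assumption made for the demo only).
random.seed(42)
times = [random.weibullvariate(100.0, 1.5) for _ in range(300)]

exp_fit = fit_exponential(times)
wei_fit = fit_weibull(times)
best = min([("Exponential", exp_fit), ("Weibull", wei_fit)],
           key=lambda item: item[1]["aic"])
print("Exponential AIC:", round(exp_fit["aic"], 2))
print("Weibull AIC:", round(wei_fit["aic"], 2))
print("Preferred candidate by AIC:", best[0])
```

Because the exponential is the special case of the Weibull with shape 1, the Weibull log-likelihood can never be lower. The AIC penalty for the extra parameter is what keeps the comparison from always favoring the larger model.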
The selection of the appropriate distribution also depends on the presence or absence of symmetry of the data set with respect to the mean value:
- Symmetrical distributions – When the data are symmetrically distributed around the mean while the frequency of occurrence of data farther away from the mean diminishes, one may for example select the normal distribution, the logistic distribution, or the Student’s t-distribution.
- Skew distributions to the right – When the larger values tend to be farther away from the mean than the smaller values, one has a skew distribution to the right (i.e. there is positive skewness).
- Skew distributions to the left – When the smaller values tend to be farther away from the mean than the larger values, one has a skew distribution to the left (i.e. there is negative skewness).
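The three cases above can be told apart numerically with the sample skewness. The sketch below uses the moment coefficient g1 = m3 / m2^1.5. The function names are hypothetical, and the cut-off of 0.5 is an illustrative rule of thumb, not a value taken from the text.

```python
def sample_skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (biased form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


def describe_shape(data, threshold=0.5):
    """Classify the data; `threshold` is an assumed rule-of-thumb cut-off."""
    g1 = sample_skewness(data)
    if g1 > threshold:
        return "skewed to the right"
    if g1 < -threshold:
        return "skewed to the left"
    return "approximately symmetrical"


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]               # one large value far above the mean
left_skewed = [-x for x in right_skewed]         # mirror image of the above

for name, values in [("symmetric", symmetric),
                     ("right_skewed", right_skewed),
                     ("left_skewed", left_skewed)]:
    print(name, "->", describe_shape(values))
```

A positive g1 points toward candidates with a long right tail, a negative g1 toward candidates with a long left tail, and a value near zero toward the symmetrical candidates.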
Techniques of fitting
The following techniques of distribution fitting exist:
- Parametric methods, by which the parameters of the distribution are calculated from the data series. These include the method of moments, the method of L-moments, and the maximum likelihood method.
- Regression method, using a transformation of the cumulative distribution function so that a linear relation is found between the cumulative probability and the values of the data, which may also need to be transformed, depending on the selected probability distribution.
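As an illustration of the regression method, consider the Weibull distribution, whose cumulative distribution function F(x) = 1 - exp(-(x/scale)^shape) becomes linear after the transformation ln(-ln(1 - F)) = shape * ln(x) - shape * ln(scale). The sketch below fits that line by ordinary least squares. The function name, the sample values, and the plotting-position formula (i - 0.3) / (n + 0.4) used to estimate the cumulative probabilities are assumptions made for the example.

```python
import math


def fit_weibull_by_regression(data):
    """Estimate Weibull shape and scale from a linearized CDF."""
    xs = sorted(data)
    n = len(xs)
    # Empirical cumulative probabilities (assumed median-rank approximation).
    probs = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    u = [math.log(x) for x in xs]                      # transformed data values
    v = [math.log(-math.log(1.0 - p)) for p in probs]  # transformed probabilities

    mean_u = sum(u) / n
    mean_v = sum(v) / n
    sxx = sum((a - mean_u) ** 2 for a in u)
    sxy = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))

    slope = sxy / sxx                      # slope of the fitted line = shape
    intercept = mean_v - slope * mean_u    # intercept = -shape * ln(scale)
    shape = slope
    scale = math.exp(-intercept / slope)
    return shape, scale


# Hypothetical failure times, for demonstration only.
sample = [12.0, 25.0, 31.0, 47.0, 58.0, 76.0, 90.0, 120.0, 155.0, 210.0]
shape, scale = fit_weibull_by_regression(sample)
print("shape:", round(shape, 3), "scale:", round(scale, 3))
```

Other distributions need a different transformation, and some, such as the exponential, require transforming only the probabilities and not the data values.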
The chi-square goodness-of-fit test is used to test whether a sample of data came from a population with a specific distribution. Put another way, it asks whether the observed frequency distribution fits a specific pattern.
Assumptions – The data are obtained from a random sample, and the expected frequency of each category must be at least 5. The second requirement comes from a normal approximation, not from any requirement that the data themselves be normally distributed. The observed category counts follow a multinomial (discrete) distribution, whereas the goodness-of-fit statistic is compared against the chi-square (continuous) distribution. When each expected frequency is at least five, the counts are close enough to normal for that approximation to be reasonable.
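A minimal sketch of the test, including the check on expected frequencies, might look as follows. The die-roll counts are made up for illustration, the function name `chi_square_gof` is hypothetical, and the decision uses tabulated critical values at the 5% significance level for up to five degrees of freedom in place of a computed p-value.

```python
# Upper 5% critical values of the chi-square distribution, by degrees of freedom.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_gof(observed, expected, fitted_params=0):
    """Return the chi-square statistic and its degrees of freedom.

    `fitted_params` is the number of distribution parameters estimated
    from the same data; each one costs a degree of freedom.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e < 5 for e in expected):
        raise ValueError("an expected frequency is below 5; merge categories")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - fitted_params
    return statistic, df


# Hypothetical data: 120 rolls of a die, tested against a fair die.
observed = [18, 22, 20, 25, 15, 20]
expected = [sum(observed) / 6] * 6

statistic, df = chi_square_gof(observed, expected)
reject = statistic > CRITICAL_5_PERCENT[df]
print("statistic:", round(statistic, 3), "df:", df, "reject fit:", reject)
```

When a category falls below the expected-frequency threshold, the usual remedy is to merge it with a neighboring category before computing the statistic, which is why the sketch raises an error instead of silently continuing.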