Statistical dispersion (scattering)

Looking at a huge data set value by value is not enough to get an idea of what values we actually have. Most of the time we need to break it down to a single value, or at least a few values, that we can check to get an idea of the whole data set. Each of these summary values is called a parameter in statistics, and every parameter describes one property of the data set. The most common parameter is the average (arithmetic mean):

x_average = sum(x_i) / x_count
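A minimal Python sketch of the average, assuming the data is a plain list of numbers. The function name `average` and the sample values are hypothetical, chosen only for illustration:

```python
def average(values):
    """Arithmetic mean: sum(x_i) / x_count."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data
print(average(data))  # 5.0
```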

But the average does not tell us how widely the values are scattered. The easiest way to describe this scattering (dispersion) is to take the maximum and the minimum value and subtract the minimum from the maximum. This is called the range (R):

R = x_max - x_min
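A small Python sketch of the range, using hypothetical sample data. The name `value_range` is made up (it avoids shadowing Python's built-in `range`). The second call shows why looking at only two values is a problem: one added outlier changes R completely.

```python
def value_range(values):
    """Range R = x_max - x_min; only the two extreme values matter."""
    return max(values) - min(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data
print(value_range(data))  # 7

# Adding a single outlier changes the range completely:
print(value_range(data + [100]))  # 98
```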

But the range uses only 2 of all the values, which is a weakness. We want to express the scattering of all values. So the next idea is to compare every single value with the average and sum up these differences. But this sum is always zero, because the positive and negative differences from the average cancel each other out. So we need the absolute value of each difference (written with absolute value bars |...|). At the end we divide the sum by the number of values. This is called the mean absolute deviation:

sum(|x_i - x_average|) / x_count
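The following Python sketch, with hypothetical function names and sample data, first shows that the plain differences from the average sum to zero (for this data exactly; in general only up to floating-point rounding) and then computes the mean absolute deviation:

```python
def average(values):
    return sum(values) / len(values)


def mean_absolute_deviation(values):
    """sum(|x_i - x_average|) / x_count"""
    avg = average(values)
    return sum(abs(x - avg) for x in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data
avg = average(data)

# Without absolute values, positive and negative differences cancel out:
print(sum(x - avg for x in data))  # 0.0

# With absolute values, every difference counts:
print(mean_absolute_deviation(data))  # 1.5
```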

It is often easier to calculate with squares (^2) instead of absolute value bars, because squaring avoids the case distinctions (positive or negative difference) that absolute values require. This is called the variance:

sum((x_i - x_average)^2) / x_count
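A minimal Python sketch of the variance formula above, with a hypothetical function name and sample data. Because it divides by x_count, this is the population variance; it should match `statistics.pvariance` from Python's standard library, whereas `statistics.variance` divides by x_count - 1 (sample variance).

```python
def variance(values):
    """Population variance: sum((x_i - x_average)^2) / x_count."""
    n = len(values)
    avg = sum(values) / n
    return sum((x - avg) ** 2 for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data
print(variance(data))  # 4.0
```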

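Because the differences are squared, the variance is measured in squared units and is often even larger than the range, which does not make much sense for a measure of spread. So we take the square root, which brings the value back to the unit of the data. This is called the standard deviation: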

squareroot(sum((x_i - x_average)^2) / x_count)
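A minimal Python sketch of the standard deviation, again dividing by x_count (population formula, like `statistics.pstdev`; `statistics.stdev` divides by x_count - 1). The function name and data are hypothetical. For this data the variance is 4.0 (squared units), and the standard deviation is back in the unit of the data:

```python
import math


def standard_deviation(values):
    """Population standard deviation: squareroot(sum((x_i - x_average)^2) / x_count)."""
    n = len(values)
    avg = sum(values) / n
    return math.sqrt(sum((x - avg) ** 2 for x in values) / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample data
print(standard_deviation(data))  # 2.0
```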

However, we can still improve on this, because the standard deviation alone does not tell us anything about the average. A standard deviation of 1k with an average of 1k is clearly different from a standard deviation of 1k with an average of 100k. To handle this, we make the value relative by dividing the standard deviation by the average. This is called the coefficient of variation:

squareroot(sum((x_i - x_average)^2) / x_count) / (sum(x_i) / x_count)
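The 1k vs. 100k comparison can be sketched in Python with two hypothetical data sets that both have a standard deviation of 1000. The function name is made up, and the check for an average of 0 is an added assumption, since the division is undefined there:

```python
import math


def coefficient_of_variation(values):
    """Standard deviation divided by the average (population formulas)."""
    n = len(values)
    avg = sum(values) / n
    if avg == 0:
        raise ValueError("undefined for an average of 0")
    std = math.sqrt(sum((x - avg) ** 2 for x in values) / n)
    return std / avg


# Hypothetical data sets, both with a standard deviation of 1000:
small = [0, 2000]          # average 1000
large = [99000, 101000]    # average 100000

print(coefficient_of_variation(small))  # 1.0
print(coefficient_of_variation(large))  # 0.01
```

The same absolute spread is large relative to an average of 1000, but small relative to an average of 100000.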
