 | Shattering: Encyclopedia - Shattering
Shattering
The concept of shattering of a set of points plays an important role in Vapnik Chervonenkis theory, also known as VC-theory. Shattering and VC-theory are used in the study of empirical processes as well as in statistical computational learning theory.
Shattering - Definition
A class of sets, C, shatters a set A if and only if, for all there exists some such that , that is, if and only if
For example, the class C of all discs in the plane (two-dimensional space) cannot shatter every set F of four points, yet the class of all convex sets in the plane shatters every finite set. (For the collection of all convex sets, connect the dots!)
We employ the letter C to refer to "class" or "collection" of sets, as in a Vapnik-Chervonenkis class (VC-class). The set A is often assumed to be finite, because, in empirical processes, we are interested in the shattering of finite sets of data points.
Shattering plays a role in the probability theory; hence in statistics. In probability theory, the cumulative distribution function (cdf), often called simply the distribution function, is of great importance, and is defined as
F ( x ) = P { Y ≤ x },
where P is the measure of probability andY is a random variable associated with the cdf F(x). In other words, F(x) represents the probability that we shall observe a value less than or equal to the quantity x when we perform an experiment. An experiment can be anything that involves a measure of uncertainty, such as throwing dice, playing blackjack, or measuring the effects of medication. When playing blackjack, for example, we might be interested in the event that dealer has a value less than or equal to 17; in this case, x = 17. Since we are interested in the event that our observation falls below a value x, we want to study sets on the real line of the form { v : v ≤ x }, that is, sets of values that are less than or equal to x. Let C be the collection of all such sets on the real line, that is, of the form , { v : v ≤ x } for all real numbers x. This class C cannot shatter a set with more than one point.
To quantify the richness of a collection C of sets, we use the concept of shattering coefficients (also known as shatter coefficients). For a collection C of sets Ω, Ω being any space, often a probability space, we define: The nth shattering coefficient of C as
where card denotes the cardinality, that is the number of elements of a set.
SC(n) is equal to the largest number of subsets of any set A of n points that can be formed by intersecting A with the sets in collection C.
Other related archivesVapnik Chervonenkis theory, cardinality, class, computational learning theory, convex sets, cumulative distribution function, discs, finite, if and only if, plane, probability space, probability theory, random variable, sets, statistics
 Adapted from the Wikipedia article "Shattering", under the G.N U Free Docmentation License. Please also see http://en.wikipedia.org/wiki |