Data mining is a process that allows businesses and other organizations to use the information consumers provide to reveal more than information than they might realize.
Data mining is used to simplify and summarize data in a manner that can be understood, and then allow us to infer things about specific cases based on the patterns we have observed. Of course, specific applications of data mining methods are normally tailored for specific needs and goals. However, there are a few main types of pattern detection that are frequently used.
Anomaly detection: In a large data set it is possible to get a picture of what the data tends to look like in a typical case. Statistics can be used to determine if something is notably different from this pattern.
Association learning: This is the type of data mining that drives the Amazon recommendation system. For instance, this might reveal that customers who bought a cocktail shaker and a cocktail recipe book also often buy martini glasses. These types of findings are often used for targeting advertising.
Cluster detection: One type of pattern recognition that is particularly useful is recognizing distinct clusters or sub-categories within the data. Without data mining, an analyst would have to look at the data and decide on a set of categories which they believe captures the relevant distinctions between apparent groups in the data. This would risk missing important categories. With data mining it is possible to let the data itself determine the groups.
Classification: If an existing structure is already known, data mining can be used to classify new cases into these pre-determined categories. Learning from a large set of pre-classified examples, algorithms can detect persistent systemic differences between items in each group and apply these rules to new classification problems.
Regression: Data mining can be used to construct predictive models based on many variables. Facebook, for example, might be interested in predicting future engagement for a user based on past behavior. Factors like the amount of personal information shared, number of photos tagged, friend requests initiated or accepted, comments, likes etc. could all be included in such a model.
Data mining, in this way, can grant immense inferential power. If an algorithm can correctly classify a case into known category based on limited data, it is possible to estimate a wide-range of other information about that case based on the properties of all the other cases in that category. This may sound dry, but it is how most successful Internet companies make their money and from where they draw their power.