INTRODUCTION TO DATA MINING
By Adesile Ajisafe, PhD CEng MIMechE
In today’s data-driven world, an enormous amount of data is available across industries and organisations. A huge quantity of data is of no use, however, unless it is transformed into valuable information; otherwise, we are sleeping on data while starving ourselves of knowledge. The solution to this problem is data mining, which is the extraction of useful information from large amounts of data. Data mining involves discovering new patterns in massive datasets using machine learning, statistics, and database systems to generate new insights about the data.
Uses of data mining
Data mining can be used to automate the process of finding predictive information in large databases. Questions that once required extensive hands-on analysis can now be answered directly from the data. Targeted marketing is a typical example of a predictive problem: data mining is applied to past promotional mailings to identify the targets most likely to maximise return on investment in future mailings. Data mining can also be used to discover previously hidden patterns. For example, it could be used to analyse retail sales data and identify seemingly unrelated products that are often purchased together. Data mining can be applied in areas such as the following:
- Weather forecasting.
- Self-driving cars.
- Fraud detection.
- Stock trade analysis.
- Business forecasting.
- Social networks.
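The retail sales example mentioned earlier, finding products that are often purchased together, can be illustrated with a minimal Python sketch. The transactions, the function name pair_support, and the 0.4 support threshold are all assumptions made up for illustration; real market basket analysis would typically use far larger data and a dedicated algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set holds the items in one purchase.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "nappies"},
    {"bread", "milk"},
    {"beer", "nappies", "bread"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

support = pair_support(transactions)

# Keep only pairs bought together in at least 40% of baskets
# (the threshold is an arbitrary choice for this sketch).
frequent_pairs = {pair: s for pair, s in support.items() if s >= 0.4}
print(frequent_pairs)
```

Counting pairs directly is only practical for small item catalogues, since the number of possible pairs grows quickly with the number of distinct items.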
Steps involved in data mining
Business understanding- Understanding every aspect of the business objectives and needs.
Data understanding- Choosing the best data set from which information can be extracted.
Data preparation- Once the data set has been identified, the data is cleaned, constructed, and formatted into the desired form.
Data modelling- Models are built from the prepared data according to the user’s requirements and carefully assessed to ensure they support the business initiatives.
Evaluation- Checking the results for any possible faults and for data leakage.
Deployment- Presenting the knowledge gained and making it accessible to stakeholders.
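The preparation, modelling, and evaluation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the helper names prepare and train_test_split, and the deliberately trivial "predict the mean" model are all hypothetical, and a real project would use a proper modelling library.

```python
import random

# Hypothetical raw records: (age, annual_spend); None marks a missing value.
raw_records = [
    (25, 1200.0), (31, None), (47, 3100.0), (None, 800.0),
    (38, 2500.0), (52, 4000.0), (29, 1500.0), (44, 2900.0),
]

def prepare(records):
    """Data preparation: drop records that contain a missing value."""
    return [r for r in records if None not in r]

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle and split so that evaluation uses data the model never saw."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

clean = prepare(raw_records)
train, test = train_test_split(clean)

# Modelling: a trivial model that always predicts the mean spend seen in
# the training set. The test set is not touched here, which is one simple
# way to guard against data leakage.
mean_spend = sum(spend for _, spend in train) / len(train)

# Evaluation: mean absolute error on the held-out test set.
mae = sum(abs(spend - mean_spend) for _, spend in test) / len(test)
print(f"train={len(train)} test={len(test)} MAE={mae:.1f}")
```

Splitting the data before computing any statistics keeps the evaluation honest: anything learned from the test records would make the model look better than it really is.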
Techniques used in data mining
Cluster analysis- Cluster analysis enables us to identify groups of users that share common features in a database. These features could include age, geographic location, education level, and so on.
Anomaly detection- This is used to determine when something is noticeably different from the regular pattern. It helps to identify database inconsistencies or anomalies at the source so that they can be eliminated.
Regression analysis- This is used to make predictions based on relationships within the data set. For example, one can predict a particular product’s stock rate by analysing its past rates and considering the different factors that determine the stock rate.
Classification- This approach normally uses a training set in which all objects are already associated with known class labels. The classification algorithm learns from the training set and builds a model, which is then used to classify new objects. Analysing which things tend to occur together, either in pairs or in larger groups, is a related but separate task usually called association analysis.
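Three of the techniques above (anomaly detection, regression, and classification) can be sketched with Python's standard library alone. The function names, the z-score rule, the nearest-neighbour classifier, and all of the data below are assumptions chosen for illustration; they are simple stand-ins, not the only or the best method for each technique.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Anomaly detection: flag values more than `threshold` standard
    deviations away from the mean (a simple z-score rule)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def classify(training_set, point):
    """Classification: give `point` the label of its nearest labelled
    neighbour in the training set (1-nearest-neighbour)."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    _, label = min(training_set,
                   key=lambda item: squared_distance(item[0], point))
    return label

# Hypothetical data, chosen only to make the behaviour easy to follow.
daily_orders = [100, 102, 98, 101, 99, 100, 250]
print(find_anomalies(daily_orders))       # the value 250 is flagged

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope * 5 + intercept)              # prediction for x = 5

# Each training example is ((age, income), class label).
labelled = [((25, 20_000), "low"), ((45, 80_000), "high"), ((30, 25_000), "low")]
print(classify(labelled, (40, 70_000)))   # nearest neighbour is "high"
```

In the classification example the income feature dominates the distance because of its larger scale; in practice, features are usually rescaled during data preparation before a distance-based method is applied.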
Data Mining Pros and Cons
Pros:
- Predicts future trends and customer purchase habits
- Helps with decision making
- Improves company revenue and lowers costs
- Enables market basket analysis
- Supports fraud detection
Cons:
- Risks to user privacy and security
- The amount of data can be overwhelming
- High cost at the implementation stage
- Possible misuse of information
- Possible inaccuracy of data
The biggest impediment to effective data mining is poor data quality, such as incomplete data, missing or incorrect values, poor representation in data sampling, or noisy data. It can also be immensely difficult to integrate conflicting or redundant data from multiple sources and forms, such as combining structured and unstructured data. Nevertheless, data mining helps deliver tremendous insights for businesses into the problems they face and aids in identifying new opportunities. It further helps businesses to solve more complex problems and make smarter decisions.