Hello everyone. As a statistician, I can say that most statisticians dream of becoming data miners, but the road there is long and bumpy. According to Google Trends, searches for “data mining” and “data miner” on Google Web Search are popular around the world. So what makes data mining so attractive?
Today, the sheer volume of data, and the difficulty of extracting useful information from it, has increased the need for data mining.
Data mining is an automatic or semi-automatic process used to analyze and interpret large amounts of scattered data and turn it into useful information. It is frequently used in marketing, retail, banking, healthcare, and e-commerce.
Stages of Data Mining
We can basically break the data mining process down into the following stages:
- Obtaining and securing the data
- Data reduction
- Applying the relevant data mining algorithms
- Training and testing models in the relevant software languages (R, Python, Java)
- Evaluating and presenting the results
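To make the stages above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data is simulated, the field names (`visits`, `basket`, `bought_again`) and the function names are hypothetical, and the "algorithm" is a deliberately simple threshold rule rather than a real data mining method.

```python
import random


def obtain_data(seed=0, n=200):
    """Stage 1: obtain the data (simulated here; a real project would read a database or files)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        visits = rng.randint(1, 20)
        basket = round(rng.uniform(5, 100), 2)
        # Synthetic rule with noise: frequent visitors tend to buy again.
        label = 1 if visits + rng.gauss(0, 2) > 10 else 0
        # Occasionally simulate a missing measurement.
        if rng.random() < 0.05:
            visits = None
        rows.append({"visits": visits, "basket": basket, "bought_again": label})
    return rows


def reduce_data(rows):
    """Stage 2: data reduction: drop incomplete rows and keep only the columns we need."""
    return [(r["visits"], r["bought_again"]) for r in rows if r["visits"] is not None]


def fit_threshold(train):
    """Stage 3: apply an algorithm (here: a cut-off halfway between the two class means)."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, data):
    """Share of rows where 'visits above the threshold' matches the label."""
    hits = sum((x > threshold) == bool(y) for x, y in data)
    return hits / len(data)


def run_pipeline(seed=0):
    rows = reduce_data(obtain_data(seed))
    # Stage 4: train on 70% of the rows, test on the remaining 30%.
    random.Random(seed).shuffle(rows)
    split = int(0.7 * len(rows))
    train, test = rows[:split], rows[split:]
    threshold = fit_threshold(train)
    # Stage 5: evaluate and report the results.
    return {
        "threshold": threshold,
        "train_size": len(train),
        "test_size": len(test),
        "test_accuracy": accuracy(threshold, test),
    }


if __name__ == "__main__":
    print(run_pipeline())
```

In a real project each function would be replaced by something much heavier (SQL queries, feature selection, a proper model), but the order of the steps stays the same.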
Becoming a data miner requires programming, mathematics, statistics, machine learning, and some personal skills. Let’s examine these requirements in a little more detail together.
1) Programming:
- Algorithmic approach
- Programming logic
- Big data technologies (Spark, Hive, Impala, DBS, etc.)
- SQL (databases), NoSQL, Bash scripting, R, Python, Scala, SPSS, SAS, MATLAB, etc.
- Cloud technologies (AWS, Google Cloud, Microsoft Azure, IBM, etc.)
2) Statistical Learning (SL):
- Data tidying and data preprocessing
- Regression Models
- Linearity and causality
- Inferential Statistics
- Multivariate Statistical Methods
3) Machine Learning (ML):
- Association Rule Learning
- Text Mining, NLP
- Reinforcement Learning
- Deep Learning
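Of the topics listed above, association rule learning is probably the easiest to show in a few lines. The sketch below computes the two basic measures behind a rule such as "diapers → beer": support (how often the items appear together) and confidence (how often the rule holds when the left-hand side is present). The shopping baskets and the function names are made up for illustration, and real tools use smarter algorithms such as Apriori instead of checking every pair.

```python
from itertools import combinations

# Hypothetical toy shopping baskets, invented for this example.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


def frequent_pairs(transactions, min_support):
    """Brute force: check every pair of items and keep those with enough support."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result


if __name__ == "__main__":
    print(support({"diapers", "beer"}, TRANSACTIONS))
    print(confidence({"diapers"}, {"beer"}, TRANSACTIONS))
    print(frequent_pairs(TRANSACTIONS, min_support=0.6))
```

Checking every pair is fine for five baskets, but it does not scale to real retail data, which is exactly why dedicated algorithms exist.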
4) Personal Skills:
- Being able to ask the right questions
- Analytical Perspective
- Problem Solving Ability
- Storytelling and presentation ability
In summary, this post briefly covered the definition, stages, and requirements of data mining. Hope to see you in the next post.