Data Mining

London, 24th April 2016

[Image: Data Mining word cloud]


Data pours in at unprecedented speeds and volumes from everywhere. But making fact-based decisions does not depend solely on the amount of data we have. In fact, having so much data can sometimes be paralyzing, because it is hard to know where to even begin.

Success depends on how quickly we can discover insights in all that data and use those insights to drive better actions across the entire organization.

So much data and so many possible decisions! It seems that the majority of organizations everywhere struggle with this dilemma. The data is growing, but what about the ability to make decisions based on those huge volumes of data? Is that growing too? For many, unfortunately, the answer is no.

That’s where predictive analytics, data mining, machine learning and decision management come into play.

Predictive analytics helps assess what will happen in the future.

Data mining looks for hidden patterns in data that can be used to predict future behaviour.

Businesses, scientists and governments have used this approach for years to transform data into proactive insights.

Decision management turns those insights into actions that are used in your operational processes.

The same approaches can still be applied today, but they need to happen faster and at a larger scale, using the most modern techniques available.

Forward-thinking organizations such as Facebook, Walmart, Amazon and Pfizer use data mining and predictive analytics to detect fraud and cybersecurity issues, manage risk, anticipate resource demands, increase response rates for marketing campaigns, generate next-best offers, curb customer attrition and identify adverse drug effects during clinical trials, among many other things.

Because they can produce predictive insights from large and diverse data, the technologies of data mining, machine learning and advanced analytical modelling are essential for identifying the factors that can improve organizational performance and, when automated in everyday decisions, create competitive advantage. And with more of everything these days (data, computing power, business questions, risks and consumers), the ability to scale analytical power is essential for staying ahead of your competitors.

Deploying analytical insights quickly ensures that the timeliness of models is not lost to the slow process of writing code. If we can rapidly deploy analytical models, their context and relevance are not lost and competitive advantage is retained. So how do we create an environment that helps an organization deal with all of the data being collected, all of the models being created and all of the decisions that need to be made, all at an increasing scale? The answer is an iterative analytical life cycle that brings together:

• Data – the foundation for decisions.

• Discovery – the process of identifying new insights in data.

• Deployment – the process of using newly found insights to drive improved actions.


Even though most of this blog focuses on using data mining for insight discovery, let’s look at the entire iterative analytical life cycle, because that’s what makes predictive discovery achievable and the actions from it more valuable.

  • Ask a business question. It all starts here. The discovery process is driven by asking business questions that produce innovation. This step focuses on exploring what needs to be known, and how predictive analytics can be applied to the data to solve a problem or improve a process.
  • Prepare data. Collecting data certainly isn’t a problem these days – it’s streaming in from everywhere. Technologies like Hadoop and faster, cheaper computers have made it possible to store and use more data, and more types of data, than ever before. But there is still the issue of joining data in different forms and formats from different sources, and the need to transform raw data into data that can be used as input for data mining. By some estimates, data scientists still spend much of their time, up to 90%, dealing with the completeness of data.
  • Explore the data. Interactive, self-service visualization tools need to serve a wide range of users in an organization (from the business analyst with no analytical knowledge to the data scientist) so that they can search for relationships, trends and patterns and gain a deeper understanding of the information captured by variables in the data. In this step, the hypothesis formed in the initial phase of the project is refined, and ideas on how to address the business problem from an analytical perspective are developed and tested.
  • Model the data. In this stage, the data scientist applies numerous analytical modelling algorithms to the data to find a robust representation of the relationships in the data that helps answer the business question. Analytical tools search for a combination of data and modelling techniques that reliably predicts a desired outcome. Experimentation is key to finding the most reliable answer, and automated model building can help minimize the time to results and boost the productivity of analytical teams. In the past, with manual model-building tools, data miners and data scientists were able to create several models in a week or month. Today, they can create hundreds or even thousands. But how can they quickly and reliably find the one model (out of many) that performs best? With automated tournaments of machine-learning algorithms and a clearly defined champion model, this has become a fairly easy process. Analysts and data scientists can now spend their time focusing on more strategic questions and investigations.
  • Implement the models. This is the transition from the discovery phase to the deployment phase – taking the insights learned and putting them into action using repeatable, automated processes. The faster the business can use the answers generated by predictive analytics for better decision making, the more value will be generated. A transparent process is also important for everyone – especially auditors.
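
To make the "tournament" idea from the modelling step a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the data is synthetic, the candidate "models" are toy one-rule classifiers, and the names `make_threshold_model`, `accuracy` and `run_tournament` are hypothetical helpers, not part of any particular product or library. The principle is simply that every candidate is scored on the same holdout data and the best scorer becomes the champion.

```python
import random


def make_threshold_model(feature_index, threshold):
    """Return a toy one-rule classifier: predict 1 when a feature exceeds a threshold."""
    def predict(row):
        return 1 if row[feature_index] > threshold else 0
    return predict


def accuracy(model, rows, labels):
    """Share of rows for which the model's prediction matches the label."""
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(rows)


def run_tournament(candidates, rows, labels):
    """Score every candidate on the same holdout data; return the champion's name and all scores."""
    scores = {name: accuracy(model, rows, labels) for name, model in candidates.items()}
    champion = max(scores, key=scores.get)
    return champion, scores


# Synthetic holdout data (made up for illustration): the label is 1 when feature 0 > 0.5.
rng = random.Random(42)
holdout_rows = [(rng.random(), rng.random()) for _ in range(200)]
holdout_labels = [1 if row[0] > 0.5 else 0 for row in holdout_rows]

# Hypothetical candidate models competing in the tournament.
candidates = {
    "always_zero": lambda row: 0,
    "feature0_gt_0.5": make_threshold_model(0, 0.5),
    "feature1_gt_0.5": make_threshold_model(1, 0.5),
}

champion, scores = run_tournament(candidates, holdout_rows, holdout_labels)
print(champion, scores[champion])
```

In a real project the candidates would be properly trained models and the metric would be chosen to match the business question, but the selection logic stays the same.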

Act on the new information. There are two types of decisions that can be made based on analytical results. Strategic decisions are made by humans who look at results and take action. Operational decisions are often automated – like credit scores or recommended best offers – and require very little human intervention, if any.

Evaluate your results. The next – and perhaps most important – step is to evaluate the outcome of the actions produced by the analytical model. Did the predictive models produce tangible results, such as increased revenue or decreased costs? Continuous monitoring and measurement of the models’ performance makes it possible to verify that they continue to produce the desired results.

More and more organizations are looking to automate operational decisions and provide real-time answers and results to reduce decision latency. Basing operational decisions on answers from analytical models also makes decisions more objective, repeatable and measurable. Integration with enterprise decision management tools enables organizations to build comprehensive operational decision flows that combine data-driven analytics and business rules for optimal automated decisions.
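
As a rough illustration of a decision flow that combines business rules with a model score, consider the sketch below. Everything in it is an assumption made for the example: the `Applicant` record, the `decide` function, the thresholds and the idea that a predictive model has already produced a `risk_score` between 0 and 1. It does not describe any specific decision management tool.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    """Hypothetical input record for an automated credit decision."""
    age: int
    requested_amount: float
    risk_score: float  # assumed output of a predictive model: 0.0 (safe) to 1.0 (risky)


def decide(applicant, approve_below=0.3, decline_above=0.7, max_auto_amount=10_000):
    """Combine business rules with a model score and return a decision label."""
    # Business rules run first: hard constraints that no score can override.
    if applicant.age < 18:
        return "decline"
    if applicant.requested_amount > max_auto_amount:
        return "refer_to_human"
    # Data-driven part: act automatically only when the score is clear-cut.
    if applicant.risk_score < approve_below:
        return "approve"
    if applicant.risk_score > decline_above:
        return "decline"
    # Anything in between is left to a human reviewer.
    return "refer_to_human"


decisions = [
    decide(Applicant(age=35, requested_amount=5_000, risk_score=0.1)),
    decide(Applicant(age=35, requested_amount=5_000, risk_score=0.9)),
    decide(Applicant(age=35, requested_amount=50_000, risk_score=0.1)),
    decide(Applicant(age=35, requested_amount=5_000, risk_score=0.5)),
]
print(decisions)
```

Keeping the rules and the score in one small function makes each decision repeatable and easy to audit: the same inputs always give the same answer, and the branch that produced it can be traced.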

Ask again. Because the data is always growing and continuously changing, the relationships in data that models use for predictions also change over time. Constant evaluation of analytical results should identify any degradation of model accuracy. Even the most accurate models will have to be refreshed over time, and organizations will need to go through the discovery and deployment steps again. It’s a constant and evolving process.
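
One simple way to picture this kind of ongoing evaluation is sketched below. It assumes that predictions can later be compared with observed outcomes period by period; the function names `window_accuracy` and `needs_refresh`, the tolerance, the patience setting and the monthly figures are all made up for illustration.

```python
def window_accuracy(predictions, outcomes):
    """Fraction of predictions that matched the observed outcomes in one period."""
    matches = sum(1 for p, o in zip(predictions, outcomes) if p == o)
    return matches / len(outcomes)


def needs_refresh(baseline_accuracy, recent_accuracies, tolerance=0.05, patience=2):
    """Flag a model for refresh when accuracy stays more than `tolerance`
    below the baseline for `patience` consecutive periods."""
    below = 0
    for acc in recent_accuracies:
        # Count consecutive bad periods; a good period resets the counter.
        below = below + 1 if acc < baseline_accuracy - tolerance else 0
        if below >= patience:
            return True
    return False


# One period's accuracy computed from (made-up) predictions and outcomes.
one_period = window_accuracy([1, 0, 1, 1], [1, 0, 0, 1])

# Made-up monthly accuracies for two scenarios.
baseline = 0.90
stable = [0.91, 0.89, 0.84, 0.90]    # a single dip, then recovery
drifting = [0.90, 0.86, 0.84, 0.80]  # sustained decline

print(one_period, needs_refresh(baseline, stable), needs_refresh(baseline, drifting))
```

Requiring several consecutive bad periods before flagging a model is a design choice that avoids reacting to a single noisy month; a stricter or looser policy may suit other situations better.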