Can CRISP-DM and Scrum Unite for Agile Data Science Success?

December 13, 2024

Over the past two years, artificial intelligence has leaped from research and development labs, where data science experts crafted powerful yet often unheralded solutions, to the forefront of every product conversation. To excel at building intelligent products, we must critically evaluate the methodologies we use to build them. According to Forbes, product development teams’ use of AI surged from 40 percent in 2023 to 78 percent in 2024. There is a significant difference, however, between using AI and mastering the art of building AI-infused products.

Traditional product development has long been engineering-centric, frequently overlooking the pivotal role of data science. Data scientists often find themselves at odds with agile purists, and with Scrum purists in particular, because the exploratory nature of data science fits poorly with Scrum’s structured sprints and predictable cycles. If you believe that the future hinges on artificial intelligence and solutions powered by data science, then Scrum needs to adapt to stay relevant, or perhaps be replaced altogether. Could a hybrid approach, blending Scrum with CRISP-DM, be the answer to this conundrum?

Business Comprehension

The first phase of the CRISP-DM process, business comprehension (called Business Understanding in the standard CRISP-DM terminology), emphasizes a clear grasp of the project’s goals and requirements from a business standpoint. This ensures that data scientists align their efforts with organizational objectives, laying a solid groundwork for the project. An accurate understanding of business needs helps define the scope of data science endeavors and ensures that all stakeholders are on the same page. The outcome of this phase forms the foundation for the subsequent steps and drives the project’s direction.

Establishing a shared vision between data scientists and business stakeholders is critical during this phase. Without a coherent understanding, the project faces the risk of misalignment, leading to ineffective outcomes. Business comprehension involves collaborative discussions, extensive documentation, and creating a structured roadmap that everyone can follow. Teams must thoroughly understand business goals to avoid any missteps that could derail the project. The better the initial understanding, the more effectively the data science team can tailor their approach to meet those goals, setting the stage for success.

Data Comprehension

In the data comprehension phase (Data Understanding in the standard CRISP-DM terminology), the focus shifts to gathering data and familiarizing the team with its intricacies. This involves exploratory data analysis to uncover initial insights, assess data quality, and identify underlying patterns or anomalies. The team also records data quality problems and known data limitations here, so that later decisions about preparation and modeling rest on what the data can actually support.

Effective data comprehension requires diligence and a keen analytical eye. By diving deeply into data sets, data scientists can unlock key patterns that inform the upcoming modeling and analysis phases. The insights drawn during this stage enable teams to anticipate potential challenges and adjust their strategies accordingly. Being meticulous in data comprehension not only ensures the accuracy of the analysis but also helps in setting realistic expectations. Teams work together to annotate these initial findings and use them to guide the project’s next phases, forming a critical base for the entire project.
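As a rough illustration of the kind of data quality check described above, the sketch below profiles each numeric column for missing values, range, and spread. The records, the column names, and the profile helper are all hypothetical, and only the Python standard library is used; a real project would more likely rely on a dedicated data analysis library.

```python
from statistics import mean, stdev

# Hypothetical records for illustration; None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": 29, "income": None},
    {"age": 41, "income": 61000},
    {"age": None, "income": 58000},
    {"age": 38, "income": 950000},  # suspiciously large, worth a closer look
]

def profile(rows, column):
    """Summarize one numeric column: missing count, range, mean, and spread."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "stdev": stdev(values),
    }

report = {column: profile(rows, column) for column in ("age", "income")}
for column, stats in report.items():
    print(column, stats)
```

A summary like this is usually enough to start the conversation about which columns need attention before any modeling begins.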

Data Preparation

The data preparation phase is often the most time-consuming step in the CRISP-DM process. It involves cleaning and transforming raw data into a suitable format for modeling. In this stage, the team handles missing values, treats outliers, and normalizes the data, all of which are critical for the success of subsequent modeling efforts. A well-prepared dataset is essential to ensure that the modeling phase proceeds smoothly without any significant hurdles.

Addressing data quality issues is crucial for the project’s overall success. Data preparation may include techniques such as data cleaning, integration, and transformation. These processes help to remove inconsistencies and enhance data quality, creating a robust dataset. Data scientists must meticulously vet each aspect of the data to avoid skewed results that could misinform the project’s trajectory. This step, albeit tedious, forms the backbone of the project’s analytical efforts, enabling accurate and reliable analysis in later stages. By dedicating ample time and resources to data preparation, teams can avoid potential pitfalls that may emerge during modeling.
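The three preparation tasks named in this section (missing values, outliers, normalization) can be sketched in a few lines. This is only one possible treatment, chosen for illustration: median imputation, clipping to 1.5 times the interquartile range, and min-max scaling. The function names and the sample values are hypothetical, and the right choices depend on the data and the model.

```python
from statistics import median, quantiles

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def clip_outliers(values, k=1.5):
    """Clip values lying more than k interquartile ranges outside Q1..Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]

def min_max_scale(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw column with one gap and one extreme value.
raw = [52000, None, 61000, 58000, 950000]
prepared = min_max_scale(clip_outliers(impute_median(raw)))
print(prepared)
```

The order matters in this sketch: imputing first keeps the quartile calculation free of gaps, and clipping before scaling stops a single extreme value from compressing every other value toward zero.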

Modeling

During the modeling phase, the team selects and applies various modeling techniques using the prepared data. This experimental phase may involve trying multiple algorithms, tuning parameters, and iteratively refining models to improve performance. Modeling is at the heart of data science, where theoretical concepts meet practical application. The success of this phase depends largely on the quality of the data prepared earlier and the expertise of the data science team.

Effective modeling necessitates a blend of creativity and technical rigor. Data scientists experiment with different approaches, continuously seeking the best fit for their data. Open communication and iterative feedback are essential in this phase, allowing the team to refine their models for optimal performance. Models are rigorously tested and evaluated to ensure they meet the project’s objectives and align with the overall business goals. This rigorous approach to modeling facilitates the discovery of the most effective solutions, laying the groundwork for successful deployment and application.
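To make the idea of tuning parameters and comparing candidates concrete, here is a minimal sketch that tries several values of k for a tiny nearest-neighbour classifier and keeps the one that scores best on a held-out validation set. The data, the model, and the candidate values are assumptions made for illustration, not a recommendation for any particular project.

```python
# Hypothetical 1-D data as (feature, label) pairs; two training labels are noisy.
train = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 1), (3.0, 0),
         (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 0), (8.0, 1)]
valid = [(2.4, 0), (2.8, 0), (7.2, 1), (7.6, 1), (1.2, 0), (6.2, 1)]

def knn_predict(x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(k):
    """Fraction of validation points classified correctly with this k."""
    hits = sum(knn_predict(x, k) == label for x, label in valid)
    return hits / len(valid)

# Try each candidate setting and keep the best-scoring one.
scores = {k: accuracy(k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

The same loop structure extends to comparing entirely different algorithms: each candidate is fitted, scored on the same validation data, and the results are reviewed together.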

Assessment

Before deployment, the team must rigorously evaluate the models to ensure they meet the business objectives established in the first phase. This phase, called Evaluation in the standard CRISP-DM terminology, includes validating model performance, assessing whether all critical business issues have been sufficiently addressed, and determining the next steps. Comprehensive evaluation ensures that the models are robust, reliable, and ready for deployment in real-world scenarios.

Assessment plays a critical role in mitigating risks and ensuring the solution’s effectiveness. Evaluation involves cross-validation, performance metrics analysis, and stakeholder review to confirm that the model meets the established criteria. Only after passing these evaluations should a model be considered ready for deployment. Stakeholders are engaged once more to verify that the project aligns with the initial business goals. This phase ensures a seamless transition into deployment, avoiding any last-minute surprises and ensuring all parties are prepared for the final implementation. Rigorous assessment is a safeguard for maintaining the project’s integrity and value.
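The cross-validation and acceptance check described above might look roughly like the following sketch. It runs 3-fold cross-validation on a small nearest-centroid classifier and compares the mean accuracy with an acceptance threshold. The data, the model, and the 0.80 threshold are hypothetical; in practice the threshold would come from the criteria agreed with stakeholders in the first phase.

```python
from statistics import mean

# Hypothetical labelled data as (feature, label) pairs.
data = [(1.0, 0), (6.0, 1), (1.5, 0), (6.5, 1), (2.0, 0), (7.0, 1),
        (2.5, 0), (7.5, 1), (3.0, 0), (8.0, 1), (5.0, 0), (4.0, 1)]

ACCEPTANCE_THRESHOLD = 0.80  # assumed business criterion, for illustration only

def fit_centroids(rows):
    """Compute the mean feature value of each class."""
    return {label: mean(x for x, y in rows if y == label) for label in (0, 1)}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def cross_validate(rows, folds=3):
    """Return one accuracy score per fold, holding out every folds-th row."""
    scores = []
    for i in range(folds):
        test = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        centroids = fit_centroids(train)
        hits = sum(predict(centroids, x) == y for x, y in test)
        scores.append(hits / len(test))
    return scores

fold_scores = cross_validate(data)
mean_accuracy = mean(fold_scores)
ready = mean_accuracy >= ACCEPTANCE_THRESHOLD
print(fold_scores, round(mean_accuracy, 3), "ready for deployment:", ready)
```

Reporting the per-fold scores alongside the mean gives stakeholders a sense of how stable the model is, not just how well it performs on average.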

Implementation

In the past two years, artificial intelligence (AI) has moved rapidly from research labs, where data science experts had been crafting powerful but often unrecognized solutions, into the spotlight of product discussions. To truly excel in creating intelligent products, evaluating our methods critically is essential. According to Forbes, AI usage in product development teams surged from 40 percent in 2023 to 78 percent in 2024. However, there’s a significant gap between merely using AI and mastering the development of AI-driven products.

Traditionally, product development has been heavily engineering-centric, often neglecting the crucial role of data science. Data scientists frequently clash with agile purists—particularly Scrum purists—because the exploratory nature of data science doesn’t always align well with Scrum’s structured sprints and predictable cycles. If you believe that the future depends on AI and data science solutions, it’s clear that Scrum needs either an upgrade or a complete rethinking. A hybrid approach, blending Scrum with CRISP-DM methods, might just be the solution we need to bridge this gap.
