Data Mesh: Understanding Its Applications, Opportunities, and Constraints 


Data has experienced a metamorphosis in its perceived value and management within the corporate sphere. Previously underestimated and frequently discarded, data was often relegated to basic reports or neglected due to a lack of understanding and governance. This limited vision, combined with emerging technologies, led to an overwhelming influx of data with nowhere for it to go. Organizations had little to no governance over their data and little understanding of what data they held or how long they had held it.

In the early 2000s, enterprises primarily used siloed databases: isolated data sets with limited accessibility. The 2010s saw the rise of Data Warehouses, which brought together disparate datasets but often led to bottlenecks. Data Lakes emerged as a solution to store vast quantities of raw data, but without adequate governance they quickly became swamps. Monolithic IT and data engineering groups struggled to document, catalog, and secure the growing stockpile of data. Product owners and teams that wanted or needed data had to request access and wait. Sometimes those requests ended up in a backlog and were forgotten.

In this new dawn of data awareness, the Data Mesh emerges as a revolutionary concept, enabling organizations to efficiently manage, process, and gain insights from their data. As organizations realize data’s pivotal role in digital transformation, it becomes imperative to shift from legacy architectures to more adaptive solutions, making Data Mesh an attractive option.  

 

The Basics of a Data Mesh 

The importance of personalized customer experiences should not be understated. More than ever, consumers are faced with endless options. To stand out from competitors, businesses must use data and customer behavior insights to curate tailored and dynamic customer journeys that both delight and command their audience. Analyze purchasing history, demographics, web activity, and other data to understand your customer, as well as their likes and dislikes. Use these insights to design customized customer experiences that increase conversion, retention, and ultimately, satisfaction.  

When discussing data architecture, the terms “legacy” or “traditional” imply centralized data management, characterized by monolithic architectures developed and maintained by a data engineering organization within the company. Business units outside of IT would often feel left in the dark, waiting for the data team to address their specific needs, which led to inefficiencies.

First coined in 2019, the Data Mesh paradigm is a decentralized, self-service approach to data architecture. Data Mesh is based on four central principles: domain ownership, treating data as a product, self-service infrastructure, and federated computational governance.

With Data Mesh, teams (Domains) are empowered to own and manage their data (Product). This requires stewardship at the team level to effectively manage their own resources to ingest, persist and serve data to their end users. Data stewards are responsible for the quality, reliability, security, and accessibility of the data. Data stewards bridge the gap between decentralized teams and enterprise-level governance and oversight. 

While teams enjoy autonomy, chaos would ensue without a federated governance approach. Federated governance ensures that standards, policies, and best practices are followed across all product owners and data stewards.
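To make the relationship between domain ownership and federated governance more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `DataProduct` fields, the allowed classification labels, and the three rules are hypothetical and do not come from any particular Data Mesh tool or standard. The idea is only that each domain describes its own data product, while a small set of centrally agreed rules is applied to every product in the same way.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor that a domain team publishes for its data product."""
    name: str
    domain: str          # owning team (domain ownership)
    steward: str         # person or group accountable for quality and access
    classification: str  # sensitivity label agreed across the organization
    schema: dict = field(default_factory=dict)  # column name -> type name


# Federated governance: rules agreed centrally and applied to every domain.
# These labels are illustrative assumptions, not an industry standard.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "restricted"}


def governance_violations(product: DataProduct) -> list:
    """Return the global policy violations found for one data product."""
    problems = []
    if not product.steward:
        problems.append("missing steward")
    if product.classification not in ALLOWED_CLASSIFICATIONS:
        problems.append("unknown classification")
    if not product.schema:
        problems.append("schema not published")
    return problems


# A product that follows the shared rules, and a draft that does not.
orders = DataProduct(
    name="orders_daily",
    domain="sales",
    steward="sales-data-steward",
    classification="internal",
    schema={"order_id": "string", "amount": "decimal"},
)
draft = DataProduct(name="headcount", domain="hr", steward="", classification="secret")

print(governance_violations(orders))  # []
print(governance_violations(draft))
```

In this sketch the sales team decides what `orders_daily` contains and who stewards it, while the checks in `governance_violations` stay the same for every domain. A real platform would likely run checks like these automatically when a product is registered.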

Implementing a Data Mesh requires significant investment in both infrastructure and enhancing teams with the resources and expertise required to manage their own resources. It requires a fundamental change in companies’ mindset of how they treat data.  

While a Lakehouse aims to combine the best of Data Lakes and Data Warehouses, Data Mesh ventures further by decentralizing ownership and control of data. While Data Fabric focuses on seamless data access and integration across disparate sources, Data Mesh emphasizes domain-based ownership. Event-driven architectures, on the other hand, prioritize real-time data flow and reactions, which can be complementary to Data Mesh.


When and Where to Implement Data Mesh 

  1. Large Organizations with Data-Rich Domains: In large organizations, departments often deal with a deluge of data. From Human Resources to Sales, each team has its own requirements for how its data is used, stored, and accessed. As teams consume more data, time to market and development efficiency suffer in centralized architectures. Dependence on external resources and time constraints are often the biggest issues. By implementing Data Mesh, teams can work independently and take control of their data, increasing efficiency and quality. As a result, teams can optimize and enrich their product offering and cut costs by streamlining ELT/ETL processes and workflows.

With direct control over their data, teams can tune and tailor their data solutions to better meet customer needs.  

  2. Complex Ecosystem: Organizations, especially those operating in dynamic environments with intricate interdependencies, often face challenges in centralized data structures. In such architectures, there’s limited control over resource allocation, utilization, and management, which can hinder teams from maximizing the potential of their data. Centralized approaches can curtail innovation due to rigid schemas, inflexible data pipelines, and lack of domain-specific customization. Data Mesh offers organizations the flexibility to adapt to evolving data needs and utilize domain-specific expertise to curate, process, and consume data tailored to their unique requirements.
  3. Rapidly Growing Data Environments: Today’s digital age sees organizations collecting data at an unprecedented scale. The sheer volume of data can be overwhelming with the influx of IoT devices, vendor integrations, user interactions, and digital transactions. Centralized teams often grapple with scaling issues, processing delays, and the challenge of timely data delivery. Data Mesh addresses this by distributing the data responsibility across different domains or teams. As data inflow increases, multiple decentralized units handle the influx, ensuring timely processing and reducing system downtime. The result is a more resilient data infrastructure ready to meet both current demands and future needs.

When Not to Implement Data Mesh 

  1. Small to Medium-sized Enterprises (SMEs): While Data Mesh presents numerous advantages, it may not be suitable for all organizations or projects. Smaller organizations typically handle lower data volumes and may not possess the resources needed to manage their data independently. In these cases, a centralized data architecture would be more suitable, since it minimizes complications in design and maintenance and takes fewer resources to manage.
  2. Mature and Stable Centralized Architectures: Organizations usually only turn to new solutions when they are experiencing problems. If a well-established centralized architecture is performing well and fitting the needs of the company, there isn’t necessarily a need for Data Mesh adoption. Introducing a fundamental change in how data is managed is an expensive and disruptive undertaking. Building new infrastructure, expanding team capabilities, and changing organizational culture all take time.
  3. Short-term Projects: Implementing a Data Mesh requires significant time and resource investment. The benefits of a Data Mesh won’t be seen when building or designing a limited-lifespan project or proof of concept. If a project’s duration doesn’t justify the investment of a Data Mesh, or the scope doesn’t require domain-specific data solutions, then the benefits of a Data Mesh won’t be realized. Traditional data architectures are usually more appropriate for these applications and don’t need the oversight and governance that a Data Mesh requires.

  

Opportunities Offered by Data Mesh 

  1. Scalability: Data Mesh enables organizations to scale their data processing capabilities more effectively by letting teams control how and when their data is processed, optimizing resource use and costs, and ensuring they remain agile amidst expanding data sources and consumer bases.
  2. Enhanced Data Ownership: Treating data as a product rather than a byproduct or a secondary asset is revolutionary. By doing so, Data Mesh promotes a culture with a clear sense of ownership and accountability. Domains or teams that “own” their data are more inclined to ensure its quality, accuracy, and relevance. This fosters an environment where data isn’t just accumulated but is curated, refined, and optimized for its intended purpose. Over time, this leads to richer, more valuable data sets that genuinely serve the organization’s needs.
  3. Speed and Innovation: Decentralization is synonymous with autonomy. When teams have the tools and the mandate to manage their data, they are not bogged down by cross-team dependencies or bureaucratic delays. They can innovate, experiment, and iterate at a faster pace, resulting in expanded data collection and richer data sets. This agility accelerates data product development, enabling organizations to adapt to changing needs quickly, capitalize on new opportunities, and stay ahead of the curve in the competitive market.
  4. Improved Alignment with Modern Architectures: Decentralization isn’t just a trend in data management; it’s a broader shift seen in modern organizational architectures, especially with the rise of microservices. Data Mesh naturally aligns with these contemporary structures, creating a cohesive environment where data and services coexist harmoniously. This alignment reduces friction, simplifies integrations, and ensures that the entire organizational machinery, both services and data, operates in a unified, streamlined manner.
  5. Enhanced Collaboration: As domains take ownership of their data, there’s an inclination to collaborate with other domains. This cross-functional collaboration fosters knowledge sharing, best practices, and a unified approach to data challenges, driving more holistic insights.

Constraints and Challenges 

  1. Cultural Shift: Teams may not want to own their own data or have the experience to take on the responsibility. Training initiatives, workshops, and even hiring external experts might be necessary to bridge these skill gaps. 
  2. Increased Complexity: Developing an environment that supports a Data Mesh architecture is not without its challenges. As the Data Mesh model expands, managing the growing number of interconnected resources and solving integration issues to ensure smooth communication between various domains can be a considerable obstacle. Planning appropriately to support teams with access, training, and management of a Data Mesh is critical to its evolution and success. This includes well-defined requirements for APIs, data exchange, and interface protocols.
  3. Cost Implications: Transitioning to a Data Mesh could entail substantial upfront costs, including hiring additional resources, training personnel, investing in new infrastructure, and possibly overhauling existing systems.
  4. Governance: Data Governance has become a hot topic as data architectures grow and mature. Ensuring a consistent view of data across all domains can be challenging, especially when multiple teams update or alter their datasets independently. Tools to manage integrity, security, and compliance are a requirement in a Data Mesh architecture. The need for teams to have autonomy in a decentralized environment must be balanced with a flexible but controlled governance model, which is the foundation of federated governance. This can be a challenge when initially designing the model based on team requirements, but it’s an important step to take as early as possible when building a data platform.
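One way to picture the “well-defined requirements” for data exchange mentioned above is a compatibility check that a domain runs before changing a published dataset. The sketch below is illustrative only: the function name `breaking_changes`, the dictionary-based schema format, and the rule that added columns are safe while removed or retyped columns are breaking are all assumptions, not part of any specific Data Mesh platform.

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List changes in new_schema that would break consumers of old_schema.

    Schemas are plain dicts mapping column name -> type name (an assumed,
    simplified format for this sketch).
    """
    changes = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            changes.append(f"removed column: {column}")
        elif new_schema[column] != old_type:
            changes.append(
                f"type changed: {column} ({old_type} -> {new_schema[column]})"
            )
    # Columns that exist only in new_schema are additive and treated as safe.
    return changes


# Version 1 is what other domains consume today; version 2 is the proposal.
v1 = {"customer_id": "string", "signup_date": "date", "region": "string"}
v2 = {"customer_id": "string", "signup_date": "timestamp", "segment": "string"}

for change in breaking_changes(v1, v2):
    print(change)
```

Here the proposed version changes the type of `signup_date` and drops `region`, so both are reported, while the new `segment` column is not. A check of this kind lets teams alter their datasets independently without silently breaking the consistent view that other domains rely on.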

Skillset: Evolving with the Data Mesh Paradigm

With an evolved mindset, the Data Mesh paradigm demands expertise that may not have previously been cultivated within traditional data teams. This transition from central data lakes to domain-oriented data products introduces complexities requiring a deep understanding of the data and the specific use cases it serves, both internally and externally. Skills such as collaboration, domain-specific knowledge translation, and data stewardship become vital. As data responsibility becomes decentralized, each team member’s role becomes more critical in ensuring data integrity, relevance, and security. As data solutions evolve, teams must adopt a mindset of perpetual learning, keeping pace with the latest methodologies, tools, and best practices related to managing their data effectively. 

Embracing the Data Mesh

In the evolving landscape of data management, the Data Mesh presents a promising alternative to traditional architectures. It’s a journey of empowerment, efficiency, and decentralization. The burgeoning community support for Data Mesh, evident from the increasing number of case studies, forums, and tools developed around it, underscores its pivotal role in the future of data management. However, its success hinges on an organization’s readiness to embrace the cultural and operational shifts it demands. As with all significant transformations, due diligence, meticulous planning, and an understanding of the underlying principles are crucial for its fruitful adoption. Embracing the Data Mesh is more than just a technological shift; it’s a paradigm transformation. Organizations willing to make this leap will find themselves not just keeping up with the rapid pace of data evolution but leading the charge in innovative, data-driven solutions.  

Data Trends: Six Ways Data Will Change Business in 2023 and Beyond


By Kristina Scott

Data is big and getting bigger. We’ve tracked six major data-driven trends for the coming year.


Data is one of the fastest-growing and most innovative opportunities today to shape the way we work and lead. IDC predicts that by 2024, the inability to perform data- and AI-driven strategy will negatively affect 75% of the world’s largest public companies. And by 2025, 50% of those companies will promote data-informed decision-making by embedding analytics in their enterprise software (up from 33% in 2022), boosting demand for more data solutions and data-savvy employees.

Here is how data trends will shift in 2023 and beyond:

1. Data Democratization Drives Data Culture

If you think data is only relevant to analysts with advanced knowledge of data science, we’ve got news for you.  Data democratization is one of the most important trends in data. Gartner research forecasts that 80% of data-driven initiatives that are focused on business outcomes will become essential business functions by 2025.

Organizations are creating a data culture by attracting data-savvy talent and promoting data use and education for employees at all levels. To support data democratization, data must be exact, easily digestible, and accessible.

Research by McKinsey found that high-performing companies have a data leader in the C-suite and make data and self-service tools universally accessible to frontline employees.

2. Hyper-Automation and Real-Time Data Lower Costs

Real-time data and its automation will be the most valuable big data tools for businesses in the coming years. Gartner forecasts that by 2024, rapid hyper-automation will allow organizations to lower operational costs by 30%. And by 2025, the market for hyper-automation software will hit nearly $860 billion.

3. Artificial Intelligence and Machine Learning (AI & ML) Continue to Revolutionize Operations

The ability to implement AI and ML in operations will be a significant differentiator. Verta Insights found that industry leaders that outperform their peers financially are more than 2x as likely to ship AI projects, products, or features, and have made AI/ML investments at a higher level than their peers.

AI and ML technologies will boost the Natural Language Processing (NLP) market. NLP enables machines to understand and communicate with us in spoken and written human languages. The NLP market size will grow from $15.7 billion in 2022 to $49.4 billion by 2027, according to research from MarketsandMarkets.

We have seen the wave of interest in OpenAI’s ChatGPT, a conversational language-generation software. This highly scalable technology could revolutionize a range of use cases, from summarizing changes to legal documents to completely changing how we research information through dialogue-like interactions, says CNBC.

This can have implications in many industries. For example, the healthcare sector already employs AI for diagnosis and treatment recommendations, patient engagement, and administrative tasks. 

4. Data Architecture Leads to Modernization

Data architecture accelerates digital transformation because it solves complex data problems through the automation of baseline data processes, increases data quality, and minimizes silos and manual errors. Companies modernize by leaning on data architecture to connect data across platforms and users. Companies will adopt new software, streamline operations, find better ways to use data, and discover new technological needs.

According to MuleSoft, organizations are ready to automate decision-making, dynamically improve data usage, and cut data management efforts by up to 70% by embedding real-time analytics in their data architecture.

5. Multi-Cloud Solutions Optimize Data Storage

Cloud use is accelerating. Companies will increasingly opt for a hybrid cloud, which combines the best aspects of private and public clouds.

Companies can access data collected by third-party cloud services. This reduces the need to build custom data collection and storage systems, which are often complex and expensive.

In the Flexera State of Cloud Report, 89% of respondents have a multi-cloud strategy, and 80% are taking a hybrid approach.

6. Enhanced Data Governance and Regulation Protect Users

Effective data governance will become the foundation for impactful and valuable data. 

As more countries introduce laws to regulate the use of various types of data, data governance comes to the forefront of data practices. European GDPR, Canadian PIPEDA, and Chinese PIPL won’t be the last laws that are introduced to protect citizen data.

Gartner has predicted that by 2023, 65% of the world’s population will be covered by regulations like GDPR. In turn, users will be more likely to trust companies with their data if they know it is more regulated.

Valence works with clients to implement a governance framework, find sources of data and data risk, and activate the organization around this innovative approach to data and process governance, including education, training, and process development.

What these data trends add up to

As we step into 2023, organizations that understand current data trends can harness data to become more innovative, strategic, and adaptable. Our team helps clients with data assessments, by designing and structuring data assets, and by building modern data management solutions. We strategically integrate data into client businesses, use machine learning and artificial intelligence to create proactive insights, and create data visualizations and dashboards to make data meaningful.  

We help clients to develop a solution and create a modern data architecture that supports differentiated, cloud-enabled scalability, self-service capability, and faster time-to-market for new data products and solutions.



Retail Technology and Innovation – a Conversation with Michael Guzzetta


We recently spent some time with Michael Guzzetta, a seasoned retail technology and innovation executive and consultant who has worked with brands such as The Walt Disney Company, Microsoft, See’s Candies, and H-E-B.

Tell me about your background. What brought you to retail?

Like many people, I launched my retail career in high school when I worked in the men’s department at Robinsons-May. I also worked for The Warehouse (music retailer) and was a CSR at Blockbuster Video. Strangely, I still miss the satisfaction of organizing tapes on shelves.

I ignited my tech career in 2001 when I started working in payment processing and cloud-based tech, and then I returned to retail in 2009 when I joined Disney Store North America, one of the world’s strongest retail brands.

During my tenure at Disney, I had the privilege of working at the intersection of creative, marketing, and mobile/digital innovation. And this is where the innovation bug bit me and kicked off my decades-long work on omnichannel innovation projects. I seek opportunities to test and deploy in-store technology to simplify experiences for customers and employees, increase sales, and drive demand. Since jump-starting this journey at Disney Store, I’ve also helped See’s Candies, Microsoft, and H-E-B to advance their digital transformation through retail innovation.

What are some of the retail technologies that got you started?

I’ve seen it all! I’ve re-platformed eCommerce sites, deployed beacons and push notifications, deployed in-store traffic counting, worked on warehouse efficiency, automated and integrated buyer journeys and omnichannel programs, and more. I recently built a 20,000-square-foot innovation lab space to run proofs of concept that validate, test, and deploy tech in live environments. Smart checkout, supply chain, inventory management, eCommerce… you name it.

What are the biggest innovation challenges in retail today?

Some questions that keep certain retailers up at night are, “How can we simplify the shopping experience for customers and make it easier for them to check out?”, “How can we optimize our supply chain and inventory operations?”, “How can we improve accuracy for customers shopping online and reduce substitutions and shorts in fulfillment?” and “How can we make it easier and more efficient for personal shoppers to shop curbside and home delivery orders?” Not to mention, “What is the future of retail, and which technologies can help us stay competitive?”

I see potential in several trends to address those challenges, but my top three are:

Artificial Intelligence/Machine Learning. AI will continue to revolutionize retail. It has permeated most of the technology we use today, whether it’s SaaS or hardware, like smart self-checkout. You can use AI, computer vision, and machine learning to identify products and immediately put them in your basket. AI is embedded in our everyday lives: it powers the smart assistants we use daily, monitors our social media activity, helps us book our travel, and runs self-driving cars, among dozens of other applications. And as a subset of AI, Machine Learning allows models to continue learning and improving, further advancing AI capabilities. I could go on, but suffice it to say that the retailer that nails AI first wins.

Computer vision. Computer vision has a sizable opportunity to solve inventory issues, especially for grocery brands. Today, there’s a gap between online inventory and what’s on the shelf, since the inventory system can’t keep pace with what’s stocked and on the shelves for personal shoppers. That is frustrating for customers who don’t expect substitutions or out-of-stock deliveries. With the advent of computer vision cameras, you can reconcile those differences and see what is on the shelf in real time to accurately inform what is available online. Computer vision-supported inventory management will be vital to creating a truly omnichannel experience. Computer vision also enables smart shopping carts, self-checkout kiosks, loss prevention, and theft prevention. Not to mention Amazon’s use of CV cameras with their Just Walk Out tech in Amazon Go, Amazon Fresh, and specific Whole Foods locations. It has endless applications for retail and gives you the eyes online that you can’t get in stores today.

Robotics. In the last five years, robotics has taken a seismic leap, and a shift has happened, which you can see in massive, automated fulfillment centers like those operated by Amazon, Kroger, and Walmart. A brand can deliver groceries in a region without having a physical store, thanks to robotic fulfillment centers and distribution centers. It’s a game-changer. Robotics has many functions beyond fulfillment in retail, but this application truly stands out.

What is a missed opportunity that more retail brands should take advantage of?

Data. Data is huge, and its importance can’t be overstated. It’s a big, missed opportunity for retailers today. Improving data management, governance, and sanitation is a massive opportunity for retailers that want to innovate.

Key opportunity areas around data in retail include customer experience (know your customer), understanding trends related to customer buying habits, and innovation. You can’t innovate at any speed with dirty data.

There’s a massive digital transformation revolution underway among retailers, and they are trying to innovate with data, but they have so much data that it can be overwhelming. They are trying to create data lakes and a single source of truth, and sometimes those efforts don’t work because of disparate data networks. I believe that some of the more prominent retailers will have their data act together in a few years.

“Dirty data” results from companies being around for a long time, so they’ve accrued multiple data sets and cloud providers, and their data hasn’t been merged and cleaned. If you don’t have the right data, you are making decisions based on bad or old data, which could hurt you strategically or literally.

What do you wish more people understood about retail technology and innovation?

Technology will not replace people. In my experience, technology is meant to enhance the human experience, which includes employees. If technology simplifies the process so much that the employees become idle, they are typically trained to manage the technology or cross-trained to grow their careers. Technology isn’t replacing the human experience any time soon, although it is undoubtedly changing the existing work experience – ideally for the better, both for the employees and the bottom line.

Technology doesn’t always lower costs for retailers. Hardware innovation requires significant capital expenses when it’s deployed chain-wide. Amazon’s “Just Walk Out” is impressive technology, but the infrastructure, cloud computing costs, and computer vision cameras are insanely expensive. In 5 years, that may be different, but today it is a loss leader. It’s worth it for Amazon because they can get positive press, demonstrate innovation, and show industry leadership. But Amazon has not lowered its operating costs with “Just Walk Out.” This is just one example, but there are many out there.

Online shopping will not eliminate brick-and-mortar shopping. If the pandemic has taught us anything, it’s that online shopping is here to stay, and convenience is extremely attractive to consumers. But I think people will never stop going to stores because people love shopping. The experience you get by tangibly picking something up and engaging with employees in a store location will always be around, even with the advent of the Metaverse.

What are some brands that excite you right now because of how they use technology?

Amazon. What they have been doing with Just Walk Out technology, dash carts, smart shelves, and other IoT technology puts Amazon at the front of the innovation pack. Let’s not forget that they’ve led the way in same- or next-day delivery by innovating with their automated fulfillment centers! They have the desire, the resources, and the talent to be the frontrunner for years to come.

Alibaba. This Chinese company is another retailer that uses technology in incredible ways. Their HEMA retail grocery stores are packed with innovation and technology. They have IoT sensors across the stores, electronic shelf labels, facial recognition cameras so you can check out with your face, and robotic kitchens where your order is made and delivered on conveyor belts. They also have conveyors throughout the store, so a personal shopper can shop by zone, then hook bags to be carried to the wareroom for sortation and delivery prep – it’s impressive.

Walmart and Kroger. Both brands’ use of automated fulfillment centers (AFCs) and drone technology (among many others) is pushing the boundaries of grocery retail today. Their AFCs cast a much wider net and have expanded their existing markets, so, for example, we may see Kroger trucks in neighborhoods that don’t have a store in sight.

Home Depot. They have a smart app with 3D augmented reality and robust in-store mapping/wayfinding. Their use of machine learning is also impressive. For example, it helps them better understand what type of projects a customer might be working on based on their browsing and shopping habits.

Sephora. They use beacon technology to bring people with the Sephora app into the store and engage them. They have smart mirrors that help customers pick the right makeup for their skin tone and provide tutorials. Customers can shop directly through smart mirrors or work with an in-store makeup artist.

What advice do you have for retailers that want to invest in technology innovation?

My first piece of advice is to include change management in the project planning from the start.

There are inherent challenges in retail innovation, often due to change management issues. When a company has been around for decades or even more than a century, they operate with well-known, trusted, and often outdated infrastructure. While that infrastructure can’t uphold the company for the next several decades or centuries, there can be a fear of significant change and a deeply rooted preference for existing systems. There can be a fear of job loss because of the misconception that technology will replace people in retail.

Bring those change-resistant people into the innovation process early and often and invite them to be part of the idea generation. Any technology solution needs to be designed with the user’s needs in mind, and this audience is a core user group. Think “lean startup” approach.

My second piece of advice is to devote enough resources to innovation and give the innovation team the power to make decisions. The innovation team should still operate with lean resources, focusing on minimum viable products and proofs of concept, so failures aren’t cost-prohibitive. The innovation team performs best when it has the autonomy to test, learn, and fail as it explores innovative solutions. Then, it reports its findings and recommendations to higher-ups to calibrate and pivot where needed.

In closing, I’d say the key to innovation success is embracing the notion of failure. Failure has value! Put another way, failure is the fast track to learning. Learning what not to do and what to try next can help a retail company to accelerate faster than the competition. Think MVP, stay lean, get validated feedback quickly, and iterate until you have a breakthrough. And always maintain a growth mindset: never stop learning and growing.



3 Reasons Companies Advance Their Data Journey to Combat Economic Pressure


By Danny Vally

Have you updated your organization’s data journey lately? We are living in the Zettabyte Era, because the volume, velocity, and variety of data assets being managed by companies are big and getting bigger.

Data is getting more complicated and siloed. Today’s data is more complex than the data a typical business managed just twenty years ago. Even small companies deal with large data sets from disparate sources that can be complicated to process. Each data set may have its own unique structure, size, query language, and type.

The types of data are also changing quickly. What used to be managed in spreadsheets now demands automated systems that can handle machine data, social network data, IoT data, customer data, and more.

There are real economic advantages for companies that take advantage of the data opportunity by investing in digital transformation (often starting by moving data to the cloud). Companies that take control of data outperform the competition:

  • 40% more revenue per employee
  • 50% higher average net income on revenue
  • $100M in additional operating income annually

Common data journey scenarios that motivate data-driven investments include:

  • Understand and predict customer behavior in real-time
  • Cut costs and free up resources with simplified data analysis
  • Explore new business models by finding new relationships in data
  • Eliminate surprise and unnecessary expenses
  • Gather and unify data to better understand your business

A data strategy is more than a single tool, dashboard, or report. A mature data strategy for any business includes a roadmap to plan the company’s data architecture, migration, integration, and management. Building in governance planning to ensure data security, integrity, access, quality, and protection will empower a business to scale.

That roadmap may also include incorporating artificial intelligence and machine learning, which opens the door to predictive analytics, deep learning, and neural networks. While these were once understood to be tools available only to the world’s largest businesses, AI and ML are now being deployed even at small and midsized businesses, with much success.

We work with organizations throughout their data journey by helping to establish where they are, where they want to go, and what they want to achieve.

A data journey usually starts by understanding data sources and organizing the data. Many organizations have multiple data sources, so creating a common data store is an important starting point. Once the data is organized, we can harness insights from the data using reporting and visualization, which enables a real-time understanding of key metrics. Ensuring data governance and trust in shared data is another important step, and it is often supported by security controls. Lastly, advanced data programs can use artificial intelligence and machine learning to look for data trends, predict behaviors, and extract new insights. By understanding where your organization is in its data journey, you can begin to visualize its next step.


Data Mesh Architecture in Cloud-Based Data Warehouses


Data is the new black gold in business. In this post, we explore how shifts in technology, organization processes, and people are critical to achieving the vision for a data-driven company that deploys data mesh architecture in cloud-based warehouses like Snowflake and Azure Synapse.

The true value of data comes from the insights gained from data that is often siloed and spans structured, semi-structured, and unstructured storage formats, in volumes of terabytes and petabytes. Data mining helps companies gather reliable information, make informed decisions, reduce churn, and increase revenue.

Every company could benefit from a data-first strategy, but without effective data architecture in place, companies fail to achieve data-first status.

For example, a company’s Sales & Marketing team needs data to optimize cross-sell and up-sell channels, while its product teams want cross-domain data exchange for analytics purposes. The entire organization wishes there were a better way to source and manage data for needs like real-time streaming and near-real-time analytics. To address the data needs of the various teams, the company needs a paradigm shift toward Data Mesh Architecture, which should be scalable and elastic.

Data Mesh architecture is a shift both in technology as well as in organization, processes, and people.

Before we dive into Data Mesh Architecture, let’s understand its 4 core principles:

  1. Domain-oriented decentralized data ownership and architecture
  2. Data as a product
  3. Self-serve data infrastructure as a platform
  4. Federated computational governance

Big data is about volume, velocity, variety, and veracity. The first principle of Data Mesh is founded on decentralization: responsibility is distributed to the subject-matter experts (SMEs) and domain experts who own the data for their domain.

The diagrams for Azure and Snowflake articulate the 4 core principles of Data Mesh and the distribution of responsibility at a high level. In both, each team is responsible for its own domain, and data is decentralized and shared with other domains for data exchange and data as a product.

Each domain’s data is decentralized in its own cloud data warehouse. This model applies to all cloud data warehouses, such as Snowflake, Azure Synapse, and AWS Redshift.

A cloud data warehouse is built on top of cloud infrastructure such as AWS, Azure, or Google Cloud Platform (GCP), which allows compute and storage to scale independently. These data warehouse products are fully managed and provide a single platform for data warehousing, data lakes, data science teams, and data sharing with external consumers.

Data storage is backed by cloud object storage from AWS S3, Azure Blob Storage, and Google Cloud Storage, which makes Snowflake highly scalable and reliable. Snowflake is distinctive in its architecture and data sharing capabilities. Like Synapse, Snowflake is elastic and can scale up or down as the need arises.

By moving from legacy monolithic data architecture to more scalable and elastic data modeling, organizations can connect decentralized, enriched, and curated data to make informed decisions across departments. With Data Mesh implementation on Snowflake, Azure Synapse, AWS Redshift, etc., organizations can strike the right balance between allowing domain owners to easily define and apply their own fine-grained policies and having centrally managed governance processes.
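
The balance described above, between policies defined by domain owners and centrally managed governance, can be sketched in a few lines of Python. This is a minimal illustration only, not how Snowflake, Azure Synapse, or Redshift implement access control. The class names, the `pii` column tag, the example domains, and the rule that every central and domain policy must agree are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class AccessRequest:
    consumer_domain: str        # the domain asking for data
    columns: FrozenSet[str]     # the columns it wants to read


# A policy inspects a request and returns True to allow it.
Policy = Callable[[AccessRequest], bool]


@dataclass
class DataProduct:
    name: str
    owner_domain: str                   # the domain team that owns the product
    column_tags: Dict[str, Set[str]]    # hypothetical tags, e.g. {"email": {"pii"}}
    domain_policies: List[Policy] = field(default_factory=list)


def central_no_pii_across_domains(product: DataProduct) -> Policy:
    """Central rule (assumed): PII-tagged columns never leave the owning domain."""
    def policy(req: AccessRequest) -> bool:
        if req.consumer_domain == product.owner_domain:
            return True
        return not any(
            "pii" in product.column_tags.get(col, set()) for col in req.columns
        )
    return policy


# Central governance publishes policy factories that apply to every product.
CENTRAL_POLICIES: List[Callable[[DataProduct], Policy]] = [
    central_no_pii_across_domains,
]


def is_allowed(product: DataProduct, req: AccessRequest) -> bool:
    """Allow a request only if every central and domain policy allows it."""
    policies = [make(product) for make in CENTRAL_POLICIES]
    policies += product.domain_policies
    return all(policy(req) for policy in policies)


# Example: the sales domain owns this product and adds its own rule.
sales_orders = DataProduct(
    name="sales.orders",
    owner_domain="sales",
    column_tags={"email": {"pii"}, "order_total": set()},
    domain_policies=[lambda req: req.consumer_domain in {"sales", "marketing"}],
)
```

In this sketch, marketing can read `order_total` but not `email`, sales can read both, and a domain that the owner has not listed (for example, finance) is refused by the domain policy even though the central rule would allow it.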


How to Develop a Data Retention Policy


by Steven Fiore

We help organizations implement a unified data governance solution that helps them manage and govern their on-premises, multi-cloud, and SaaS data. The data governance solution will always include a data retention policy.

When planning a data retention policy, you must be relentless in asking the right questions that will guide your team toward actionable and measurable results. By approaching data retention policies as part of the unified data governance effort, you can easily create a holistic, up-to-date approach to data retention and disposal. 

Steps to Creating an Effective Data Retention Policy

Ideally, any group that creates, uses, or disposes of data in any way will be involved in data planning. Field workers collecting data, back-office workers processing it, IT staff responsible for transmitting and destroying it, Legal, HR, Public Relations, Security (cyber and physical), and anyone in between who has a stake in the data should be involved in planning data retention and disposal.

Data Inventory

The first step is to understand what data you have today. Thanks to decades of organizational silos, many organizations don’t understand all the data they have amassed. Conducting a data inventory or unified data discovery is a critical first step.  

Review Data Retention Regulations

Next, you need to understand the requirements of the applicable regulation or regulations in your industry and geographical region so that your data planning and retention policy addresses compliance requirements. No matter your organization’s values, compliance is required and needs to be understood.

Recognize Your Data Risks

Then, businesses should identify where data retention may be costing the business or introducing risk. Understanding the risk and inefficiencies in current data processes may help identify what should be retained and for how long, and how to dispose of the data when the retention expires.
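
One way to make "what should be retained and for how long" concrete is to write the schedule down as data and compute disposal dates from it. The sketch below is illustrative only. The categories, retention periods, and disposal methods are hypothetical placeholders, not legal or regulatory guidance; real values must come from the regulations reviewed in the previous step.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional


@dataclass(frozen=True)
class RetentionRule:
    category: str
    retain_days: Optional[int]   # None means "retain indefinitely"
    disposal_method: str         # e.g. "delete", "anonymize", "archive"


# Hypothetical schedule, for illustration only.
SCHEDULE: Dict[str, RetentionRule] = {
    "web_logs": RetentionRule("web_logs", 90, "delete"),
    "customer_orders": RetentionRule("customer_orders", 365 * 7, "anonymize"),
    "audit_trail": RetentionRule("audit_trail", None, "archive"),
}


def disposal_date(category: str, created: date) -> Optional[date]:
    """Return the date a record becomes due for disposal, or None if never."""
    rule = SCHEDULE[category]
    if rule.retain_days is None:
        return None
    return created + timedelta(days=rule.retain_days)


def is_due_for_disposal(category: str, created: date, today: date) -> bool:
    """True once the retention period for the record's category has elapsed."""
    due = disposal_date(category, created)
    return due is not None and today >= due
```

Keeping the schedule in one place makes it easier to review with Legal and other stakeholders, and to change a retention period without touching the code that enforces it.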

If the goal is to increase revenue or contribute to social goals, then you must understand which data affords that possibility, and how much data you need to make the analysis worthwhile. Machine learning generally requires large amounts of data collected over extended periods of time to improve the accuracy of its models, so if machine learning and artificial intelligence outcomes are key to your revenue opportunity, you will require more data than you would need for traditional business intelligence dashboards and decision making.

What Types of Data Should be Included in the Data Retention Policy?

The types of data included in the data retention policy will depend on the goals of the business. Businesses need to be thoughtful about what data they don’t need to include in their policies. Retaining and managing unneeded data costs organizations time and money – so identifying the data that can be disposed of is important and too often overlooked.

Businesses should consider which innovation technologies are included in their digital roadmap. If machine learning, artificial intelligence, robotic process automation, and/or intelligent process automation are in your technology roadmap, you will want a strategy for data retention and disposal that will feed the learning models when you are ready to build them. Machine learning could influence how long data is retained, while the Internet of Things can affect what data is included, since it tends to create enormous amounts of data. Robotic or intelligent process automation is another example where understanding which data is most essential to highly repeatable processes could dictate what data is held and for how long.

One final consideration is whether non-traditional data sources should be included. Do voice mails or meeting recordings need to be included? What about pictures that may be stored along with documents? Security camera footage? IoT or server logs? Metadata? Audit trails? The list goes on, and the earlier these types of data are considered, the easier they will be to manage.

Common Data Retention Strategy Pitfalls

The paradox is that the two biggest mistakes organizations make when building a data retention policy are either not taking enough time to plan or taking too much time to plan. Spending too much time planning can lead to analysis paralysis, letting a data catastrophe occur before a solution can be implemented. One way to mitigate this risk is to take an iterative approach so you can learn from small issues before they become big ones.

A typical misstep by organizations when building a data retention policy is that they don’t understand their objectives from the outset. Organizations need to start by clearly stating the goals of their data policy, and then build a policy that supports those goals. We talk about the link between company goals and data policies in a related post.

One other major pitfall organizations fall into when building a data retention policy is that they don’t understand their data, where it lives, and how it’s interrelated. Keeping data unnecessarily is as bad as disposing of data you need, and in highly siloed organizations, data interdependencies might not surface until needed data is suddenly missing or data that should have been disposed of surfaces in a legal discovery. This is partially mitigated by bringing the right people to the planning process so that you can understand the full picture of data implications in your organization.

Data Retention Policy Solutions by Kopius

The future of enterprise effectiveness is driven by advanced data analytics and insights. Businesses of all sizes are including data strategies in their digital transformation roadmap, which must include data governance, data management, business planning and analysis, and intelligent forecasting. Understand your business goals and values, and then build the data retention policies that are right for you.

We are here to help. Contact us today to learn more about our services.

The Right Data Retention Policy for Your Organization


by Steven Fiore

Every business needs a strategy to manage its data, and that strategy should include a plan for data retention. Before setting a data retention policy, it’s important to understand the purpose of the policy and how it can contribute to organizational goals. 

There are four values that drive most businesses to do anything:  

  • To make money and increase revenue
  • To save money by decreasing costs
  • Because they must comply with regulations
  • Because they want to use the business as a platform for social good

While each of these values will be represented in any organization, some investigation will usually reveal that one or two of these values outshine the rest. Which values are most important will vary from one organization to another. 

Organizations need to start by clearly stating the goals of their data policy, and then build a policy that supports those goals. We help companies unearth business drivers so data policies can contribute to the company values and goals rather than compete with them. 

In this post, we explore best practices in establishing and maintaining a data retention policy through the lens of these business drivers.  

What are the goals of your data retention policy?

Value: Make Money

Companies that rely on advertising revenue like Google and Facebook want to keep as much data as necessary to maximize revenue opportunities.  

Companies that mine their data can spot trends in their data that inform product enhancements, improve customer experience (driving brand loyalty), and reveal revenue opportunities that would have otherwise been hidden. 

In both cases, the data retention policy should focus on what data can contribute to revenue, and how much of it is needed. Balancing aggregate data versus more granular data is the key so you retain enough data to achieve your objectives without retaining unneeded data that adds cost, complexity, and security or privacy risks.   
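
The trade-off between granular and aggregate data can be illustrated with a simple rollup: recent records are kept in full detail, while older records are reduced to per-day totals before the detail is disposed of. This is a sketch under assumptions; the `Event` fields and the 30-day cutoff are hypothetical, and the right cutoff depends on the analysis the data needs to support.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    day: date       # when the event happened
    amount: float   # e.g. an order value (hypothetical field)


def roll_up(
    events: List[Event], today: date, keep_days: int = 30
) -> Tuple[List[Event], Dict[date, Dict[str, float]]]:
    """Keep recent events as-is; summarize older ones into per-day totals."""
    cutoff = today - timedelta(days=keep_days)
    recent: List[Event] = []
    daily: Dict[date, Dict[str, float]] = {}
    for event in events:
        if event.day >= cutoff:
            recent.append(event)            # still inside the detail window
        else:
            summary = daily.setdefault(event.day, {"count": 0, "total": 0.0})
            summary["count"] += 1
            summary["total"] += event.amount
    return recent, daily


events = [
    Event(date(2024, 1, 1), 10.0),
    Event(date(2024, 1, 1), 5.0),
    Event(date(2024, 3, 1), 7.0),
]
recent, daily = roll_up(events, today=date(2024, 3, 10))
```

Here the two January events collapse into one daily summary, while the March event stays available in full. The summary preserves trend information at a fraction of the storage and privacy exposure, at the cost of no longer being able to answer questions about individual records.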

Value: Save Money

Many businesses focus on the bottom line and prioritize efficiency to avoid wasting time, money, and energy. 

Businesses that want to save money can use data retention to make the organization more efficient. While data storage is inexpensive, it isn’t free – and access can be more expensive than storage. So, for an organization that wants its data policies to help save money, the policy might focus on retaining only the data that is necessary to avoid extra storage and management overhead. 

Further, retaining more data than you need to can be a legal liability. Having a data retention and disposal policy can reduce legal expenses in the event of a legal discovery process.  

There’s also an efficiency cost to data – the more data you have, the slower the process will be to search and use that data. So, data retention policies can and should be part of a data governance strategy aimed at making the data that is retained as efficient to manage and use as possible. 

Value: Comply with Regulations

Many industries have their own regulations, while some regulations cross industries. Businesses that must have a data retention policy may need it to comply with laws and rules that govern data retention, such as the Sarbanes-Oxley Act, the Health Insurance Portability and Accountability Act (HIPAA), or IRS Publication 1075. Even US-based companies may be subject to international legislation such as the European Union’s General Data Protection Regulation (GDPR), and companies that have customers in California need to understand how the California Consumer Privacy Act (CCPA) can impact data retention. Government agencies in the US are also bound by the Freedom of Information Act, and some states have “Sunshine” laws that go even further.

Businesses that are motivated to comply with regulations will need their data retention policy to reflect federal, state, and local requirements, and will need to document compliance with those requirements. 

Value: Business as a Platform for Social Good

Whether an organization was established as an activist brand or has been drawn to social responsibility as investor demand for it has risen, many companies are finding ways to use data to understand their social and environmental impact. This impact is often also reported through Environmental, Social, and Governance (ESG) reporting, Carbon Disclosure Projects, and reporting structures like GRESB (Global Real Estate Sustainability Benchmark).

In these cases, organizations that use their business as a platform for social good may identify key metrics, such as energy consumption or hiring data, that can be used to inform reports on social responsibility.

In closing

By understanding your organization’s values and priorities, you can ensure that its policies support those values. Every company has data to collect, manage, and dispose of, so it’s critical to have a roadmap for how to address data requirements today and into the future. This framework is a starting point to that effort because there’s nothing worse than going through the effort to implement a complex policy, only to discover that it moves the business further from its goals.  
