Data Mesh Architecture in Cloud-Based Data Warehouses


Data is the new black gold in business. In this post, we explore how shifts in technology, organizational processes, and people are critical to achieving the vision of a data-driven company that deploys a data mesh architecture on cloud-based data warehouses like Snowflake and Azure Synapse.

The true value of data comes from the insights it yields, yet that data is often siloed and spread across structured, semi-structured, and unstructured storage formats at terabyte and petabyte scale. Data mining helps companies gather reliable information, make informed decisions, reduce churn, and increase revenue.

Every company could benefit from a data-first strategy, but without effective data architecture in place, companies fail to achieve data-first status.

For example, a company’s Sales & Marketing team needs data to optimize cross-sell and up-sell channels, while its product teams want cross-domain data exchange for analytics. The entire organization wishes there were a better way to source and manage data for needs like real-time streaming and near-real-time analytics. To address the data needs of these teams, the company needs a paradigm shift: rapid adoption of a scalable and elastic Data Mesh Architecture.

Data Mesh Architecture is a shift in technology as well as in organization, processes, and people.

Before we dive into Data Mesh Architecture, let’s understand its 4 core principles:

  1. Domain-oriented decentralized data ownership and architecture
  2. Data as a product
  3. Self-serve data infrastructure as a platform
  4. Federated computational governance
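
To make the "data as a product" and "self-serve platform" principles more concrete, here is a minimal Python sketch of a domain-owned data product and a catalog where domains publish products and other teams discover them. The `DataProduct` and `DataProductCatalog` names, their fields, and the ownership check are hypothetical illustrations, not part of Snowflake, Azure Synapse, or any other product’s API.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product owned by one domain team."""
    name: str
    owner_domain: str         # domain team accountable for quality and access
    schema: dict[str, str]    # column name -> type, published as a contract
    freshness_sla_hours: int  # maximum staleness consumers should expect
    tags: frozenset[str] = field(default_factory=frozenset)


class DataProductCatalog:
    """Hypothetical self-serve catalog: domains publish, other teams discover."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Decentralized ownership: only the owning domain may update a product.
        existing = self._products.get(product.name)
        if existing is not None and existing.owner_domain != product.owner_domain:
            raise PermissionError(
                f"{product.name} is owned by {existing.owner_domain}"
            )
        self._products[product.name] = product

    def find_by_domain(self, domain: str) -> list[DataProduct]:
        return [p for p in self._products.values() if p.owner_domain == domain]

    def find_by_tag(self, tag: str) -> list[DataProduct]:
        return [p for p in self._products.values() if tag in p.tags]


# Example: the Sales domain publishes a product that Marketing can discover.
catalog = DataProductCatalog()
catalog.publish(
    DataProduct(
        name="customer_orders",
        owner_domain="sales",
        schema={"order_id": "int", "customer_id": "int", "amount": "decimal"},
        freshness_sla_hours=24,
        tags=frozenset({"revenue"}),
    )
)
print([p.name for p in catalog.find_by_tag("revenue")])  # ['customer_orders']
```

The ownership check in `publish` is one simple way to express domain-oriented ownership; a real platform would enforce it through its own roles and permissions.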

Big data is about Volume, Velocity, Variety, and Veracity. The first principle of Data Mesh is founded on decentralization: responsibility is distributed to the subject matter experts (SMEs) and domain experts who own their domain’s data.

This diagram illustrates the four core principles of Data Mesh and the distribution of responsibility at a high level.

Azure: Each team is responsible for its own domain, and data is decentralized and shared with other domains for data exchange and data as a product.
Snowflake: Each team is responsible for its own domain, and data is decentralized and shared with other domains for data exchange and data as a product.

Each domain’s data is decentralized in its own cloud data warehouse. This model applies equally to cloud data warehouses such as Snowflake, Azure Synapse, and AWS Redshift.

A cloud data warehouse is built on top of public cloud infrastructure such as AWS, Azure, and Google Cloud Platform (GCP), which allows compute and storage to scale independently. These data warehouse products are fully managed and provide a single platform for data warehousing, data lakes, and data science teams, as well as data sharing with external consumers.

As shown below, Snowflake’s data storage is backed by cloud object storage from AWS S3, Azure Blob Storage, and Google Cloud Storage, which makes Snowflake highly scalable and reliable. Snowflake is unique in its architecture and data sharing capabilities. Like Synapse, Snowflake is elastic and can scale up or down as the need arises.

By moving from a legacy monolithic data architecture to a more scalable and elastic data model, organizations can connect decentralized, enriched, and curated data to make informed decisions across departments. With a Data Mesh implementation on Snowflake, Azure Synapse, AWS Redshift, or similar platforms, organizations can strike the right balance between letting domain owners easily define and apply their own fine-grained policies and maintaining centrally managed governance processes.
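
The balance between domain-level policies and central governance (the fourth principle, federated computational governance) can be sketched as a simple rule check: a request must pass every centrally managed rule and every rule defined by the domain that owns the data. This Python sketch is a hypothetical illustration; the rule conventions (such as the `pii_` column prefix) and the domain names are assumptions, and real platforms express such policies through their own features, for example Snowflake’s masking and row access policies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AccessRequest:
    user_domain: str     # domain of the team asking for the data
    product_domain: str  # domain that owns the data product
    column: str
    purpose: str


Rule = Callable[[AccessRequest], bool]

# Centrally managed rules that apply to every domain (hypothetical examples).
GLOBAL_RULES: list[Rule] = [
    # Every request must state a purpose.
    lambda r: r.purpose.strip() != "",
    # Assumed convention: columns prefixed "pii_" stay inside the owning domain.
    lambda r: not r.column.startswith("pii_") or r.user_domain == r.product_domain,
]

# Fine-grained rules each domain defines for its own products (hypothetical).
DOMAIN_RULES: dict[str, list[Rule]] = {
    "sales": [lambda r: r.user_domain in {"sales", "marketing"}],
}


def is_allowed(request: AccessRequest) -> bool:
    # A request must satisfy every global rule and every rule of the owning domain.
    rules = GLOBAL_RULES + DOMAIN_RULES.get(request.product_domain, [])
    return all(rule(request) for rule in rules)


print(is_allowed(AccessRequest("marketing", "sales", "order_total", "campaign analysis")))  # True
print(is_allowed(AccessRequest("marketing", "sales", "pii_email", "campaign analysis")))  # False
```

Keeping global and domain rules in separate collections mirrors the split of responsibility: the central team owns one list, and each domain owns its own entry.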


How to Develop a Data Retention Policy


by Steven Fiore

We help organizations implement a unified data governance solution that helps them manage and govern their on-premises, multi-cloud, and SaaS data. The data governance solution will always include a data retention policy.

When planning a data retention policy, you must be relentless in asking the right questions that will guide your team toward actionable and measurable results. By approaching data retention policies as part of the unified data governance effort, you can easily create a holistic, up-to-date approach to data retention and disposal. 

Steps to Creating an Effective Data Retention Policy

Ideally, any group that creates, uses, or disposes of data in any way should be involved in data planning. Field workers collecting data, back-office workers processing it, IT staff responsible for transmitting and destroying it, Legal, HR, Public Relations, Security (cyber and physical), and anyone else with a stake in the data should be involved in planning data retention and disposal.

Data Inventory

The first step is to understand what data you have today. Thanks to decades of organizational silos, many organizations don’t understand all the data they have amassed. Conducting a data inventory or unified data discovery is a critical first step.  

Review Data Retention Regulations

Next, you need to understand the requirements of the applicable regulation or regulations in your industry and geographical region so that your data planning and retention policy addresses compliance requirements. No matter your organization’s values, compliance is required and needs to be understood.

Recognize Your Data Risks

Then, businesses should identify where current data retention practices may be costing the business money or introducing risk. Understanding the risks and inefficiencies in current data processes can help identify what data should be retained, for how long, and how to dispose of it when the retention period expires.
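
A retention schedule is one common way to record the answers to "what, for how long, and how to dispose of it." The Python sketch below maps hypothetical data categories to retention periods and disposal actions, then returns the disposition for a given record. The categories, periods, and action names are placeholders rather than legal or regulatory guidance; anything missing from the schedule is flagged for review, since unclassified data is itself a risk.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    period: timedelta  # how long to keep data after it is created
    disposal: str      # action to take once the period has expired


# Hypothetical schedule: categories and periods are placeholders, not legal advice.
SCHEDULE: dict[str, RetentionRule] = {
    "web_logs": RetentionRule(timedelta(days=90), "delete"),
    "invoices": RetentionRule(timedelta(days=7 * 365), "archive_then_delete"),
    "marketing_leads": RetentionRule(timedelta(days=2 * 365), "anonymize"),
}


def disposition(category: str, created: date, today: date) -> str:
    """Return 'retain', the scheduled disposal action, or 'review'."""
    rule = SCHEDULE.get(category)
    if rule is None:
        # Data with no retention category is a risk in itself; flag it.
        return "review"
    if today - created >= rule.period:
        return rule.disposal
    return "retain"


print(disposition("web_logs", date(2024, 1, 1), date(2024, 4, 1)))  # delete
```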

If the goal is to increase revenue or contribute to social goals, then you must understand which data affords that possibility, and how much data you need to make the analysis worthwhile. Machine Learning requires massive amounts of data over extended periods of time to increase the accuracy of the learning, so if machine learning and artificial intelligence outcomes are key to your revenue opportunity, you will require more data than you would need to use traditional Business Intelligence for dashboards and decision making.


What Types of Data Should Be Included in the Data Retention Policy?

The types of data included in the data retention policy will depend on the goals of the business. Businesses need to be thoughtful about what data they don’t need to include in their policies. Retaining and managing unneeded data costs organizations time and money – so identifying the data that can be disposed of is important and too often overlooked.

Businesses should consider which innovation technologies are included in their digital roadmap. If machine learning, artificial intelligence, robotic process automation, and/or intelligent process automation are on your technology roadmap, you will want a data retention and disposal strategy that will feed the learning models when you are ready to build them. Machine learning can influence data retention policies, and the Internet of Things can affect what data is included, since IoT devices tend to generate enormous amounts of data. Robotic or intelligent process automation is another example: understanding which data is most essential to highly repeatable processes can dictate what data is held and for how long.

One final consideration is whether non-traditional data sources should be included. Do voicemails or meeting recordings need to be included? What about pictures stored alongside documents? Security camera footage? IoT or server logs? Metadata? Audit trails? The list goes on, and the earlier these types of data are considered, the easier they will be to manage.

Common Data Retention Strategy Pitfalls

Paradoxically, the two biggest mistakes organizations make when building a data retention policy are not taking enough time to plan and taking too much time to plan. Spending too much time planning can lead to analysis paralysis, letting a data catastrophe occur before a solution can be implemented. One way to mitigate this risk is to take an iterative approach so you can learn from small issues before they become big ones.

A typical misstep by organizations when building a data retention policy is that they don’t understand their objectives from the outset. Organizations need to start by clearly stating the goals of their data policy, and then build a policy that supports those goals. We discuss the link between company goals and data policies in a separate post.

Another major pitfall organizations fall into when building a data retention policy is not understanding their data: where it lives and how it is interrelated. Keeping data unnecessarily is as bad as disposing of data you need; in highly siloed organizations, data interdependencies might not surface until needed data is suddenly missing or data that should have been disposed of surfaces in legal discovery. This risk is partially mitigated by bringing the right people into the planning process so that you understand the full picture of data implications in your organization.
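
Where lineage information is available (for example, exported from a data catalog), interdependencies can be checked before anything is disposed of. This Python sketch assumes a hypothetical `lineage` mapping of each dataset to the datasets it is derived from; it walks downstream from a disposal candidate and reports any retained datasets that depend on it. The dataset names are illustrative only.

```python
from collections import defaultdict, deque


def downstream_of(dataset: str, lineage: dict[str, set[str]]) -> set[str]:
    """All datasets derived, directly or indirectly, from `dataset`.

    `lineage` maps each dataset to the datasets it is built from
    (hypothetical input, e.g. exported from a data catalog).
    """
    # Invert the lineage so we can walk from a source to its dependents.
    children = defaultdict(set)
    for child, parents in lineage.items():
        for parent in parents:
            children[parent].add(child)

    seen: set[str] = set()
    queue = deque([dataset])
    while queue:  # breadth-first walk; `seen` guards against cycles
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def disposal_conflicts(dataset: str, lineage: dict[str, set[str]], retained: set[str]) -> set[str]:
    """Retained datasets that would lose an upstream source if `dataset` were disposed of."""
    return downstream_of(dataset, lineage) & retained


lineage = {
    "orders_clean": {"orders_raw"},
    "sales_report": {"orders_clean"},
    "churn_model": {"orders_clean", "support_tickets"},
}
print(disposal_conflicts("orders_raw", lineage, retained={"sales_report"}))  # {'sales_report'}
```

An empty result does not prove disposal is safe; it only means the recorded lineage shows no retained dependents, which is why involving the right people still matters.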

Data Retention Policy Solutions by Kopius

The future of enterprise effectiveness is driven by advanced data analytics and insights. Businesses of all sizes are including data strategies in their digital transformation roadmap, which must include data governance, data management, business planning and analysis, and intelligent forecasting. Understand your business goals and values, and then build the data retention policies that are right for you.

We are here to help. Contact us today to learn more about our services.

The Right Data Retention Policy for Your Organization


by Steven Fiore

Every business needs a strategy to manage its data, and that strategy should include a plan for data retention. Before setting a data retention policy, it’s important to understand the purpose of the policy and how it can contribute to organizational goals. 

There are four values that drive most businesses to do anything:  

  • To make money and increase revenue
  • To save money by decreasing costs
  • Because they must comply with regulations
  • Because they want to use the business as a platform for social good

While each of these values will be represented in any organization, some investigation will usually reveal that one or two of these values outshine the rest. Which values are most important will vary from one organization to another. 

Organizations need to start by clearly stating the goals of their data policy, and then build a policy that supports those goals. We help companies unearth business drivers so data policies can contribute to the company values and goals rather than compete with them. 

In this post, we explore best practices in establishing and maintaining a data retention policy through the lens of these business drivers.  

What are the goals of your data retention policy?

Value: Make Money

Companies that rely on advertising revenue like Google and Facebook want to keep as much data as necessary to maximize revenue opportunities.  

Companies that mine their data can spot trends that inform product enhancements, improve customer experience (driving brand loyalty), and reveal revenue opportunities that would otherwise have remained hidden.

In both cases, the data retention policy should focus on what data can contribute to revenue and how much of it is needed. The key is balancing aggregate data against more granular data, so you retain enough data to achieve your objectives without retaining unneeded data that adds cost, complexity, and security or privacy risks.
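
One way to strike that balance is to keep granular records only for a recent window and roll older records up into aggregates. In this minimal Python sketch, the event shape (a date plus an amount in cents) and the `granular_days` window are assumptions chosen for illustration; the right window depends on your objectives.

```python
from datetime import date, timedelta


def compact(
    events: list[tuple[date, int]], today: date, granular_days: int
) -> tuple[list[tuple[date, int]], dict[date, tuple[int, int]]]:
    """Keep recent events as-is; reduce older ones to per-day (count, total).

    Each event is assumed to be (day, amount_in_cents); `granular_days` is a
    hypothetical retention window for detail-level records.
    """
    cutoff = today - timedelta(days=granular_days)
    recent = [event for event in events if event[0] >= cutoff]

    # Older events survive only as daily counts and totals.
    daily: dict[date, tuple[int, int]] = {}
    for day, amount_cents in events:
        if day < cutoff:
            count, total = daily.get(day, (0, 0))
            daily[day] = (count + 1, total + amount_cents)
    return recent, daily


events = [(date(2024, 1, 1), 500), (date(2024, 1, 1), 250), (date(2024, 3, 10), 100)]
recent, daily = compact(events, today=date(2024, 3, 15), granular_days=30)
print(recent)
print(daily)
```

The aggregates still support trend analysis while the individual records, and the cost and privacy exposure that come with them, are no longer retained.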

Value: Save Money

Many businesses focus on the bottom line and prioritize efficiency to avoid wasting time, money, and energy. 

Businesses that want to save money can use data retention to make the organization more efficient. While data storage is inexpensive, it isn’t free – and access can be more expensive than storage. So, for an organization that wants its data policies to help save money, the policy might focus on retaining only the data that is necessary to avoid extra storage and management overhead. 

Further, retaining more data than you need to can be a legal liability. Having a data retention and disposal policy can reduce legal expenses in the event of a legal discovery process.  

There’s also an efficiency cost to data – the more data you have, the slower the process will be to search and use that data. So, data retention policies can and should be part of a data governance strategy aimed at making the data that is retained as efficient to manage and use as possible. 

Value: Comply with Regulations

Many industries have their own regulations, while some regulations cross industries. Businesses that must have a data retention policy may need it to comply with laws that govern data retention, such as the Sarbanes-Oxley Act, the Health Insurance Portability and Accountability Act (HIPAA), or IRS Publication 1075. Even US-based companies may be subject to international legislation such as the European Union’s General Data Protection Regulation (GDPR), and companies with customers in California need to understand how the California Consumer Privacy Act (CCPA) can affect data retention. Government agencies in the US are also bound by the Freedom of Information Act, and some states have “Sunshine” laws that go even further.

Businesses that are motivated to comply with regulations will need their data retention policy to reflect federal, state, and local requirements, and will need to document compliance with those requirements. 
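
When several requirements apply to the same data, they can pull in different directions: some set a minimum retention period, while privacy rules may effectively cap how long data can be kept. The Python sketch below resolves a set of hypothetical requirements by taking the longest minimum, checking it against the shortest maximum, and recording which requirement drives the result so compliance can be documented. The requirement names and day counts are placeholders, not actual regulatory periods, and the resolution rule is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    source: str                     # regulation or policy name (hypothetical here)
    min_days: Optional[int] = None  # keep the data at least this long
    max_days: Optional[int] = None  # keep the data no longer than this


def resolve(requirements: list[Requirement]) -> tuple[Optional[int], Optional[str]]:
    """Return (retention_days, driving_source) under a simplified rule:
    the longest minimum wins, as long as it fits under the shortest maximum."""
    longest_min = max(
        (r for r in requirements if r.min_days is not None),
        key=lambda r: r.min_days,
        default=None,
    )
    shortest_max = min(
        (r for r in requirements if r.max_days is not None),
        key=lambda r: r.max_days,
        default=None,
    )
    if longest_min is not None and shortest_max is not None:
        if longest_min.min_days > shortest_max.max_days:
            # Conflicting obligations need legal review, not an automatic choice.
            raise ValueError(
                f"{longest_min.source} requires {longest_min.min_days} days, "
                f"but {shortest_max.source} allows at most {shortest_max.max_days}"
            )
    if longest_min is not None:
        return longest_min.min_days, longest_min.source
    if shortest_max is not None:
        # Only a cap applies: retain no longer than it allows.
        return shortest_max.max_days, shortest_max.source
    return None, None  # no external requirement: a business decision


reqs = [
    Requirement("Hypothetical tax rule", min_days=7 * 365),
    Requirement("Hypothetical contract clause", min_days=3 * 365),
    Requirement("Hypothetical privacy policy", max_days=10 * 365),
]
print(resolve(reqs))  # (2555, 'Hypothetical tax rule')
```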

Value: Business as a Platform for Social Good

Whether an organization was established as an activist brand or has been drawn to social responsibility as investor demand for it has risen, many companies are finding ways to use data to understand their social and environmental impact. This impact is often reported through Environmental, Social, and Governance (ESG) reporting, the Carbon Disclosure Project (CDP), and reporting structures like GRESB (Global Real Estate Sustainability Benchmark).

In these cases, organizations that use their business as a platform for social good may identify key metrics, such as energy consumption or hiring data, that can inform reports on social responsibility.

In closing

By understanding your organization’s values and priorities, you can ensure that your policies support those values. Every company has data to collect, manage, and dispose of, so it’s critical to have a roadmap for addressing data requirements today and into the future. This framework is a starting point for that effort, because there’s nothing worse than going through the effort of implementing a complex policy only to discover that it moves the business further from its goals.
