In a previous blog post, I talked about the importance of data agility in an organization. As mentioned in that post, data agility is defined by TDWI as: “how fast can you extract value from your mountains of data and how quickly can you translate that information into action?”
For business, this translates into getting your data into the hands of your analysts and business stakeholders in a way that lets them quickly and easily find insights about your business. I also mentioned utilizing platforms like Snowflake to help your organization achieve maximum data agility. However, Snowflake is only one part of the overall infrastructure, and having it alone does not mean your organization will be able to extract value from your data in an agile fashion.
In this post, I’d like to help you understand some of the key concepts of a highly agile data infrastructure. We’ll discuss some of the common pitfalls of existing infrastructures, and how your organization can avoid or fix these issues. We’ll also discuss how you can enable people in your organization to become data engineers and enable your organization to increase its data agility.
But to understand where we need to go, let’s discuss how we got here.
The early days of organization structure and data infrastructure
Since the mid-2000s, data infrastructure has been moving toward monolithic data warehouses. These warehouses were typically controlled by a single data team and were seen as holding the “golden records.” Getting data into or out of the warehouse was often complex, requiring many teams to agree on what, when, where, and how. As the world changed, businesses evolved to keep up with the faster pace, and the monolithic data warehouse became a bottleneck to innovation and operation. The data engineering team that guarded this system became part of that bottleneck as well.
Data engineering is defined as the process of preparing data for analytical and operational use. In practice, this usually means the acquisition, shaping/transformation, and loading of data into systems of intelligence. In the monolithic data warehouse architecture, this work is handled by a centralized team that fields requests from the different business units within the organization. The problem with this approach is that the centralized data engineering group rarely has the business knowledge to understand the data or how to categorize and secure it. And while they can work with the business unit to build this understanding, many details are typically lost in translation.
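To make the acquire, transform, and load steps concrete, here is a minimal sketch of that pattern. The data source, field names, and in-memory “warehouse” are hypothetical stand-ins for illustration, not a specific product’s API:

```python
def acquire():
    """Acquisition: pull raw records from a source system (stubbed here)."""
    return [
        {"customer": " Alice ", "amount": "120.50"},
        {"customer": "Bob", "amount": "75.00"},
    ]

def transform(rows):
    """Shaping/transformation: clean up strings and convert types."""
    return [
        {"customer": r["customer"].strip(), "amount": float(r["amount"])}
        for r in rows
    ]

def load(rows, warehouse):
    """Loading: write prepared records into the system of intelligence."""
    warehouse.extend(rows)
    return len(rows)

warehouse = []  # stand-in for a real warehouse table
loaded = load(transform(acquire()), warehouse)
print(loaded)  # 2 records loaded
```

In a centralized team, every new source means a new request for a pipeline like this; the point of the sections that follow is who should own and maintain these steps, not the mechanics themselves.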
The current landscape of organization structure and data infrastructure
This evolution of the business led business units to hire or train members of their teams to become amateur data experts. These folks, at first, employed user-friendly systems such as Excel or Access to develop and curate data for their teams. These less sophisticated, rogue systems quickly became essential data sources for reporting and operations. While this allowed the business to move faster and keep up with the pace of innovation, it created a gap in trust between what the IT teams had blessed as the golden records and what the business units produced as operational and analytical data. Without anyone realizing it, it also created a new kind of data engineer.
These new data engineers have created both a unique opportunity and a unique set of problems. Since they live inside the business unit, they have both the knowledge and the understanding of how to categorize and secure the data. This obviously helps the business as these data engineers can adapt to quickly changing business needs. The main problem, however, is that since these folks are typically confined to the less sophisticated systems mentioned above, accessing the data they curate can be a challenge for other business units.
Modernizing organization structure and data infrastructure for the future
As the ownership of source data moves into individual business units, it is important that IT teams recognize and enable this ownership model to ensure the business can continue to innovate and grow. Part of that enablement is looking toward systems that offer elastic scale and storage, but also offer the ability for teams to identify and classify information as well. With this architecture, the IT team can focus on ensuring system availability and performance, while individual data domain owners focus on the type and classification of data. Individual business units should be given an area of ownership within the system and the ability to find and connect to other business units’ data.
Recognizing the importance of having individual business units take ownership of the data they produce means rethinking the kinds of roles those teams need. As mentioned above, if you look at your current business units, there is most likely a person or persons already acting as a data engineer. Formalizing this role gives your teams confidence that they own their output and allows you to set rules and requirements for their participation. However, be careful not to dictate what data they can provide; rather, define the rules for onboarding data they wish to own.
Learn more about our services and how we help businesses unlock the power of their data.