Many of us interact with data as part of our day-to-day work. If you are a college employee, for example, you handle data every day. Whether you are an administrator, classified staff, faculty, a confidential or supervisory employee, an irregular wage worker, or even a work-study student, data handling is part of your working life. This is just one example, but data management is a crucial part of daily work regardless of profession. In this article, we discuss what data management is and why it is important.
What is Data Management?
The formal definition of data management comes from the Data Management Association International, which states that “Data management refers to the development and execution of architectures, policies, practices, and procedures, in order to manage the information cycle of an enterprise in an effective manner”. This may sound vague and hard to follow. Put simply, data management is the management of information, and data is information in its many incarnations. For example, the kinds of data a college often works with include student and employee records such as registration and enrolment information, grades, addresses, legal names, payroll and tax documents, emergency contact information, and much more. You might not have realized how often you interact with data while you work, and that is why data management is so important. It means that the information we work with should be accurate, consistent, and secure:
- Accurate means that the data is correct.
- Consistent means that the data is interoperable and flows between systems and departments without any issue.
- Secure means that the data is safe, both from malicious intent and the occasionally unavoidable human error.
Better data leads to better decision making, strengthens planning and assessment efforts, increases understanding of students and employees (in a college setting), allows for successful participation in state and federal efforts, strengthens accountability, and reduces the risk to the institution from weak decision making and inaccurate reporting.
Top 10 mistakes we commonly make while managing data
Let’s review the ten data management mistakes we see most often:
- Flaky Data Management Plan: If there is no strategy in place for managing your data, you are essentially a ship without a rudder. A plan needs to be in place to manage the movement, lifecycle, security, availability, and quality of your data.
- Tools are used in place of the Data Management Plan:
- Unfortunately, we see this happen a lot. Data management tools are just that: tools. If you do not have a long-term plan in place, you will either underutilize or over-utilize your tools.
- Recall Maslow’s Hammer: if all you have is a hammer, everything looks like a nail.
- There is a time and place for every tool, and that is part of what a Data Management Plan outlines. Take your ETL tool as an example. ETL stands for “Extract, Transform, Load,” the three processes that, in combination, move data from one database, multiple databases, or other sources to a unified repository, typically a data warehouse. Using an ETL tool for orchestration and scheduling is possible, but is it ideal?
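To make the three ETL steps concrete, here is a minimal sketch using only the Python standard library. The CSV content, the `purchases` table, and the function names are assumptions made for illustration; an in-memory SQLite database stands in for the data warehouse, and a real ETL tool would do far more than this.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from files or source databases.
RAW_CSV = """customer_id,name,amount
1, Asha ,10
2,Ravi,100
1, Asha ,25
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim stray whitespace and convert strings to proper types."""
    return [
        (int(r["customer_id"]), r["name"].strip(), float(r["amount"]))
        for r in rows
    ]

def load(rows, conn):
    """Load: write the cleaned rows into the target repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(customer_id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(RAW_CSV)), warehouse)

total = warehouse.execute("SELECT SUM(amount) FROM purchases").fetchone()[0]
print(total)
```

Note that nothing here decides when the job runs or what happens if it fails. That is the orchestration and scheduling question raised above, and it is a separate concern from the three ETL steps themselves.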
- Lack of Metadata Management:
With any data integration solution in place, you are going to have data moving all over the place. But can you say where it is going, how it got there, or how many transformations it went through? You are kidding yourself if you think you won’t have to answer these questions many times over. You need both a plan and the tools to address this challenge.
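One way to picture this is a dataset that carries its own lineage record. The sketch below is a toy illustration, not a metadata management tool: the `TrackedDataset` class, the `crm.customers` source name, and the step names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TrackedDataset:
    """A dataset that remembers where it came from (hypothetical helper)."""
    rows: list
    source: str
    steps: list = field(default_factory=list)

    def apply(self, step_name, func):
        """Run a transformation on every row and append it to the lineage log."""
        return TrackedDataset(
            rows=[func(r) for r in self.rows],
            source=self.source,
            steps=self.steps + [step_name],
        )

raw = TrackedDataset(rows=[" asha ", "RAVI"], source="crm.customers")
clean = raw.apply("trim whitespace", str.strip).apply("title-case names", str.title)

print(clean.rows)        # the transformed data
print(clean.source)      # where it came from
print(len(clean.steps))  # how many transformations it went through
```

With the lineage stored next to the data, the questions of origin and transformation count can be answered by inspection rather than from memory.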
- Master Data is not Mastered (lives in applications, ETL, etc.):
If you did an exhaustive search for one of your customers across all your systems, you would probably find several versions of that customer, and it would be difficult to determine which one is right. That customer information needs to be stored and managed centrally, and a plan needs to be put together with the business to do so.
Let us dig a little deeper to understand the difference between Master Data and Transaction Data:
- Master Data represents the people, places, or things that an organization cares about. Suppose you, as a customer, purchase some cheese from a store. In this case, the Master Data would be you (the customer), the cheese (the product), the employee, and the store. Transaction Data, on the other hand, is an event in which the Master Data participates: in this case, the purchase of the cheese. Examples would be the price, the discount or coupon, and the method of payment.
- You can almost compare Master Data to nouns and Transaction Data to verbs: one describes a person, place, or thing, and the other describes an action or event that those nouns participate in.
- Another way of distinguishing master data from transaction data is how often the data changes. Master data like you, the customer, should be consistent whether you are checking out on the company website or at the cash register. You are you no matter where the company interacts with you. Transaction data, by contrast, changes every time you purchase something from the store. One day you might spend 10 Rupees, and the next day 100 Rupees. This measure is called volatility. If the data is highly volatile, it is most likely transaction data.
- How we manage Master Data and Transaction Data is important because each has its own challenges. Master data is challenged by consistency issues. For example, you may use your customer loyalty card when you check out at the cash register, but not when you buy online. This creates two customer profiles, even though it is the same person. It also makes it difficult for the retailer to evaluate you as a customer, and it inflates the number of customers the retailer thinks it has.
- Transaction data is challenged by the sheer amount of data and the decentralized ways that departments roll it up. Just imagine how much data is captured in a single day in any local grocery shop. The amount can be staggering, but everybody wants that data because it represents a snapshot of how the organization and its departments are performing. Because of this, multiple departments might each have their own way of rolling up that data, which creates inconsistencies in logic and ultimately leads to bad or stalled decisions.
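The cheese example and the duplicate-profile problem can be sketched in a few lines of Python. Everything here is illustrative: the class names, the IDs, the e-mail addresses, and especially the matching rule (same e-mail after trimming and lower-casing) are assumptions, and real master data matching is usually far more involved.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    """Master data: a 'noun' that should look the same everywhere."""
    customer_id: str
    name: str
    email: str

@dataclass(frozen=True)
class Purchase:
    """Transaction data: a 'verb' that the master data takes part in."""
    customer_id: str
    product: str
    amount: float
    channel: str

# The same person ended up with two profiles: one in store, one online.
customers = [
    Customer("store-17", "Asha Rao", "Asha.Rao@example.com"),
    Customer("web-902", "asha rao", "asha.rao@example.com "),
]
purchases = [
    Purchase("store-17", "cheese", 10.0, "cash register"),
    Purchase("web-902", "cheese", 100.0, "website"),
]

def match_key(customer):
    """Assumed matching rule: same e-mail after trimming and lower-casing."""
    return customer.email.strip().lower()

# Map every profile id to a single master key.
master_key_by_id = {c.customer_id: match_key(c) for c in customers}

# Roll up transactions per real customer instead of per profile.
spend = {}
for p in purchases:
    key = master_key_by_id[p.customer_id]
    spend[key] = spend.get(key, 0.0) + p.amount

print(len(customers), "profiles ->", len(spend), "customer")
print(spend)
```

Without the shared key, the retailer would count two customers who each bought once, rather than one customer who bought twice.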
- Data Quality is believed to be an IT function:
This is perhaps one of the most challenging issues that IT groups have to deal with. The perception that data is an IT issue can get in the way of an organization making any progress on its data quality challenges. Since IT does not create the data, it is nearly impossible for IT to determine whether the data is correct, so the business must be involved.
- Data Warehouse does not equal Big Database:
We find both large and small organizations falling into the trap of assuming that the data warehouse is a dumping ground for report tables. Huge opportunities are left behind with this mentality. So how do you know if you have a real Data Warehouse? Let’s look for the answer in a couple of short stories:
- The first story is about a report developer who is fed up with having to draw data from multiple locations to get information for the business. To fix this, the organization creates a database where all the tables needed for reporting can be found in one place, and this new database is updated regularly with the latest data through scheduled refreshes. Somewhere down the line, this dumping ground of report tables gets officially dubbed the Data Warehouse.
- The second story is about an astute DBA (Database Administrator) who is very good at creating views. You can think of a view as a data set that the database computes on the fly from multiple tables. These views supply the data to the reports, and all the logic for those views sits in code that only the DBA understands. Somewhere down the line, this bunch of tables calculated on the fly gets deemed the Data Warehouse.
- Neither of these examples represents a real Data Warehouse. But, in both of these examples, we see important requirements being met. In the first example, we see the importance of the data residing in a single location to simplify access to the tables. In the second example, we see the importance of simplifying query logic so that the report writer can focus on building content.
- A real Data Warehouse combines the best qualities of these two scenarios by physically centralizing the data and simplifying the logic for its consumption. The key to the success of a data warehouse is the business stakeholders, not the tools or technicalities.
- Building a Data Warehouse is much like peeling an onion: if you don’t start with the first layer, you are likely to miss the big picture. The temptation to dive right into the detailed data and piece together what the business needs will ultimately cost you time and rework. There are 8 layers of the Data Warehousing onion that need to be peeled for a successful deployment. Once these layers are conquered, the organization has a pattern, or rather a program, for acquiring raw data and turning it into a shared decision-making asset.
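For readers unfamiliar with the mechanics behind the two stories, the sketch below contrasts a view (computed on the fly) with a physically stored summary table (refreshed on a schedule), using SQLite from the Python standard library. The table and view names are made up for illustration, and this shows only the storage mechanics; it is not a data warehouse design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (store TEXT, amount REAL);
    INSERT INTO sales VALUES ('north', 10), ('north', 100), ('south', 40);

    -- Story two: a view, recomputed by the database on every query.
    CREATE VIEW sales_by_store_view AS
        SELECT store, SUM(amount) AS total FROM sales GROUP BY store;

    -- Story one: a physically stored report table, filled by a refresh job.
    CREATE TABLE sales_by_store AS
        SELECT store, SUM(amount) AS total FROM sales GROUP BY store;
""")

query = "SELECT store, total FROM {} ORDER BY store"

# A new sale arrives after the last scheduled refresh.
conn.execute("INSERT INTO sales VALUES ('south', 5)")

from_view = conn.execute(query.format("sales_by_store_view")).fetchall()
from_table = conn.execute(query.format("sales_by_store")).fetchall()

print(from_view)   # reflects the new sale immediately
print(from_table)  # stays as it was until the next refresh
```

The view is always current but hides its logic in the database, while the stored table is simple to query but only as fresh as its last refresh. Neither property alone makes a warehouse, which is the point of the two stories.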
So what are the signs that you do not have a real Data Warehouse?
- Data Disparity: Acquiring your data involves scouring multiple databases.
- Views Everywhere: Acquiring your data relies heavily on database views.
- IT Owned: The data warehouse was created as an IT project and did not require Business Stakeholder attention.
- Wanting a new Business Intelligence tool: You want to replace your BI tool because you are not getting the data you need.
- Tribal knowledge: All the logic for gathering data sits in the heads of your report developers.
- Shadow IT: Business Analysts are independently creating their own analytical environments to get information out of the data.
- Multiple Truths: The management meetings are more about whose data is right than what the actual performance gaps are.
- Heavy BI query logic
- Excel-based integration: Analysts have to use spreadsheets with dozens of tabs to get the data to look right.
- Resource intensive: Management reports or dashboards appear simple, but on the back end they require a large team doing repetitive data integration work.
- Business Intelligence and Data Warehousing are separated by a management wall:
We often see this occur in large organizations where the need to insert process controls begins to erode the agility of Business Intelligence. The Data Warehousing and BI teams need as much cohesion as possible to ensure that both tactical and strategic data requests are handled appropriately.
- Self-Service Business Intelligence = Lack of Understanding/Responsibility:
With many of the tools on the market today, business users can simply import Excel spreadsheets and do their own analysis. This is a good thing, as it enables very tactical questions to be asked and answered. However, it can also create an environment where there is no shared or governed data for the larger organization. Often the result is that neither IT nor the Business takes ownership of the strategic data integration initiatives needed to feed information to a larger audience.
- Big Data is the new panacea (it’s not):
If you have followed the business intelligence industry, you know that it is ruled by buzzwords. Big Data is the buzzword that every technology vendor now uses to describe its product features. While there are some very valid innovations, like Hadoop and cloud-based services, the message is largely a new angle on existing methodologies. There is still no pixie-dust solution out there.
- Assuming goodwill when it comes to the security of your data:
You might have firewalls in place to keep outsiders from accessing your sensitive data, but what about within the four walls of your own company? It is estimated that 88% of all data breaches involve insider negligence.
These are just a few of the common challenges we see organizations dealing with, and they are part of a larger set of topics addressed during the Data Management Health Check. These health checks help organizations evaluate how they score on Technology Landscape, Data Usage, Enterprise Governance, and Business Culture.