In today’s fast-paced business environment, organizations are continuously evolving, often resulting in multiple Enterprise Resource Planning (ERP) systems coexisting. As organizations transition from legacy to current and future ERP solutions, one of the most challenging tasks is unifying data across these platforms for consistent analytics and reporting. This blog post explores how to handle unified data modeling during digital transformation and offers insights for data modelers and engineers.
Why Unified Data Modeling Matters?
Unified data modeling is crucial for:
Consistent Reporting: Ensures that business decisions are based on a single source of truth.
Data Quality: Reduces data redundancy and inconsistencies.
Agility: Enhances the organization’s ability to adapt to new business requirements and technologies.
Challenges in Unified Data Modeling
Data Redundancy: Duplicate records across systems can lead to inaccuracies.
Inconsistent Data Definitions: Different ERPs may have varied data definitions and formats.
Data Integration: Seamlessly integrating data from disparate systems is complex.
Maintaining Traceability: Keeping track of the origin and lineage of data is vital.
Example:
1. Data Landscape Assessment
Suppose a company uses the following ERP systems:
Legacy ERP: Used for historical data and some active customer accounts.
Current ERP: The primary system for order processing and inventory management.
Future ERP: Under implementation, intended to replace the current ERP.
Key Entities:
Customer: Exists in all ERPs with different IDs and attributes.
Service Cases: Recorded in both current and future ERPs.
Legacy ERP:
Customer_ID | Name | Address | Phone | Service_Case_ID | Product_ID | Creation_Date | Description |
LEGACY_001 | Müller GmbH | Berliner Str. 123, Berlin | 030123456 | L_SC_1001 | PROD_A | 2024-01-15 | Issue with software A |
LEGACY_002 | Dupont S.A. | Rue de Rivoli 45, Paris | 014567890 | L_SC_1002 | PROD_B | 2024-01-20 | Hardware malfunction B |
Current ERP:
Customer_ID | Name | Address | Service_Case_ID | Product_ID | Creation_Date | Description | |
CURR_001 | Müller GmbH | Berliner Str. 123, Berlin | C_SC_2001 | PROD_A | 2024-01-16 | Software error A | |
CURR_003 | Smith Ltd. | Baker St. 221B, London | C_SC_2002 | PROD_C | 2024-01-18 | Maintenance request C |
Future ERP:
Customer_ID | Name | Address | Contact | Service_Case_ID | Product_ID | Creation_Date | Description |
FUTURE_001 | Müller GmbH | Berliner Str. 123, Berlin | +4930123456 | F_SC_3001 | PROD_A | 2024-01-17 | Update issue A |
FUTURE_004 | Gonzalez Corp. | Calle Gran Via 22, Madrid | F_SC_3002 | PROD_D | 2024-01-19 | Installation problem D |
2. Design a Centralized Data Model with Hash Keys
In the centralized data model, generate a Customer_Key as a hash of the concatenated Name, Address, and Contact_Info attributes. Here, we'll use a hash function to create the keys.
Use Surrogate Keys: Employ surrogate keys to uniquely identify entities.
Define Common Attributes: Standardize attributes across ERPs to maintain consistency.
Maintain Source System References: Include fields to track the origin of each record.
Customer Dimension Table:
Customer_Key | Customer_IDs | Name | Address | Contact_Info | Source_Systems | Source_Record_IDs |
9e107d9d372bb6826bd81d3542a419d6 | LEGACY_001, CURR_001, FUTURE_001 | Müller GmbH | Berliner Str. 123, Berlin | info@mueller.de, +4930123456 | Legacy, Current, Future | L_SC_1001, C_SC_2001, F_SC_3001 |
e4d909c290d0fb1ca068ffaddf22cbd0 | LEGACY_002 | Dupont S.A. | Rue de Rivoli 45, Paris | 014567890 | Legacy | L_SC_1002 |
68b329da9893e34099c7d8ad5cb9c940 | CURR_003 | Smith Ltd. | Baker St. 221B, London | Current | C_SC_2002 | |
a9f0e61a137d86aa9db53465e0801612 | FUTURE_004 | Gonzalez Corp. | Calle Gran Via 22, Madrid | Future | F_SC_3002 |
Note: The Customer_Key values shown above are mock hash keys generated for illustration. In practice, you would compute these using an actual hash function (e.g., SHA-256).
Service Case Fact Table:
Service_Case_ID | Customer_Key | Product_Key | Creation_Date_Key | Resolution_Date_Key | Status_Key | Response_Time | Resolution_Time | Cost | Number_of_Interactions | Source_Systems | Source_Record_IDs |
1001 | 9e107d9d372bb6826bd81d3542a419d6 | 101 | 20240115 | 20240118 | 1 | 4 | 72 | 150 | 2 | Legacy, Current, Future | L_SC_1001, C_SC_2001, F_SC_3001 |
1002 | e4d909c290d0fb1ca068ffaddf22cbd0 | 102 | 20240120 | 20240122 | 2 | 3 | 48 | 200 | 1 | Legacy | L_SC_1002 |
1003 | 68b329da9893e34099c7d8ad5cb9c940 | 103 | 20240118 | 20240121 | 1 | 5 | 72 | 180 | 2 | Current | C_SC_2002 |
1004 | a9f0e61a137d86aa9db53465e0801612 | 104 | 20240119 | 20240123 | 3 | 6 | 96 | 250 | 3 | Future | F_SC_3002 |
Generating Hash Keys
To generate these hash keys, we concatenate the significant attributes (e.g., Name, Address, Contact_Info) and use a hash function to produce a consistent hash value for each unique customer record. Here's a Python snippet to illustrate this process:
This script will produce unique hash keys for each customer based on their details, ensuring consistency across different systems.
3. Implement Data Harmonization Techniques
Data Matching Techniques:
Fuzzy Matching: Utilize fuzzy logic to detect similar names and addresses across systems, such as "Müller GmbH" appearing with different contact details.
Exact Matching: Leverage exact matches on fields like email addresses to identify duplicates.
Consolidation Rules:
Primary Source: Prioritize data from the Future ERP, assuming it has the most accurate and up-to-date information.
Merged Attributes: Combine comments, descriptions, and customer feedback from all systems into a single view.
4. Ensure Data Quality and Governance
Validation Rules:
Ensure that all contact information is in the correct format and that addresses conform to postal standards.
Implement consistency checks to verify that customer names and identifiers are accurate.
Data Profiling and Monitoring:
Utilize tools like SAP Data Intelligence to profile data regularly, identifying anomalies and ensuring data quality.
Set up alerts for data discrepancies or anomalies in customer records.
5. Leverage Modern Data Platforms
SAP Datasphere | Microsoft Fabric | Snowflake | Databricks Implementations:
SAP Datasphere can be used to facilitate the integration
and management of data across multiple ERP systems:
Data Integration: Seamlessly connect and integrate data from Legacy, Current, and Future ERPs into a unified platform.
Real-Time Analytics: Enable real-time insights into customer interactions and service case resolutions.
Scalability: Handle growing data volumes efficiently
Comments