top of page
Search

Design a centralized data model in the Harmonization Layer of D&A Platform

In today’s fast-paced business environment, organizations are continuously evolving, often resulting in multiple Enterprise Resource Planning (ERP) systems coexisting. As organizations transition from legacy to current and future ERP solutions, one of the most challenging tasks is unifying data across these platforms for consistent analytics and reporting. This blog post explores how to handle unified data modeling during digital transformation and offers insights for data modelers and engineers.



Why Unified Data Modeling Matters?


Unified data modeling is crucial for:

  • Consistent Reporting: Ensures that business decisions are based on a single source of truth.

  • Data Quality: Reduces data redundancy and inconsistencies.

  • Agility: Enhances the organization’s ability to adapt to new business requirements and technologies.


Challenges in Unified Data Modeling


  • Data Redundancy: Duplicate records across systems can lead to inaccuracies.

  • Inconsistent Data Definitions: Different ERPs may have varied data definitions and formats.

  • Data Integration: Seamlessly integrating data from disparate systems is complex.

  • Maintaining Traceability: Keeping track of the origin and lineage of data is vital.



Example:


1. Data Landscape Assessment


Suppose a company uses the following ERP systems:

  • Legacy ERP: Used for historical data and some active customer accounts.

  • Current ERP: The primary system for order processing and inventory management.

  • Future ERP: Under implementation, intended to replace the current ERP.


Key Entities:

  • Customer: Exists in all ERPs with different IDs and attributes.

  • Service Cases: Recorded in both current and future ERPs.



Legacy ERP:

Customer_ID

Name

Address

Phone

Service_Case_ID

Product_ID

Creation_Date

Description

LEGACY_001

Müller GmbH

Berliner Str. 123, Berlin

030123456

L_SC_1001

PROD_A

2024-01-15

Issue with software A

LEGACY_002

Dupont S.A.

Rue de Rivoli 45, Paris

014567890

L_SC_1002

PROD_B

2024-01-20

Hardware malfunction B


Current ERP:

Customer_ID

Name

Address

Email

Service_Case_ID

Product_ID

Creation_Date

Description

CURR_001

Müller GmbH

Berliner Str. 123, Berlin

C_SC_2001

PROD_A

2024-01-16

Software error A

CURR_003

Smith Ltd.

Baker St. 221B, London

C_SC_2002

PROD_C

2024-01-18

Maintenance request C


Future ERP:

Customer_ID

Name

Address

Contact

Service_Case_ID

Product_ID

Creation_Date

Description

FUTURE_001

Müller GmbH

Berliner Str. 123, Berlin

+4930123456

F_SC_3001

PROD_A

2024-01-17

Update issue A

FUTURE_004

Gonzalez Corp.

Calle Gran Via 22, Madrid

F_SC_3002

PROD_D

2024-01-19

Installation problem D



2. Design a Centralized Data Model with Hash Keys


In the centralized data model, generate a Customer_Key as a hash of the concatenated Name, Address, and Contact_Info attributes. Here, we'll use a hash function to create the keys.


  • Use Surrogate Keys: Employ surrogate keys to uniquely identify entities.

  • Define Common Attributes: Standardize attributes across ERPs to maintain consistency.

  • Maintain Source System References: Include fields to track the origin of each record.


Customer Dimension Table:

Customer_Key

Customer_IDs

Name

Address

Contact_Info

Source_Systems

Source_Record_IDs

9e107d9d372bb6826bd81d3542a419d6

LEGACY_001, CURR_001, FUTURE_001

Müller GmbH

Berliner Str. 123, Berlin

info@mueller.de, +4930123456

Legacy, Current, Future

L_SC_1001, C_SC_2001, F_SC_3001

e4d909c290d0fb1ca068ffaddf22cbd0

LEGACY_002

Dupont S.A.

Rue de Rivoli 45, Paris

014567890

Legacy

L_SC_1002

68b329da9893e34099c7d8ad5cb9c940

CURR_003

Smith Ltd.

Baker St. 221B, London

Current

C_SC_2002

a9f0e61a137d86aa9db53465e0801612

FUTURE_004

Gonzalez Corp.

Calle Gran Via 22, Madrid

Future

F_SC_3002

Note: The Customer_Key values shown above are mock hash keys generated for illustration. In practice, you would compute these using an actual hash function (e.g., SHA-256).

Service Case Fact Table:

Service_Case_ID

Customer_Key

Product_Key

Creation_Date_Key

Resolution_Date_Key

Status_Key

Response_Time

Resolution_Time

Cost

Number_of_Interactions

Source_Systems

Source_Record_IDs

1001

9e107d9d372bb6826bd81d3542a419d6

101

20240115

20240118

1

4

72

150

2

Legacy, Current, Future

L_SC_1001, C_SC_2001, F_SC_3001

1002

e4d909c290d0fb1ca068ffaddf22cbd0

102

20240120

20240122

2

3

48

200

1

Legacy

L_SC_1002

1003

68b329da9893e34099c7d8ad5cb9c940

103

20240118

20240121

1

5

72

180

2

Current

C_SC_2002

1004

a9f0e61a137d86aa9db53465e0801612

104

20240119

20240123

3

6

96

250

3

Future

F_SC_3002

Generating Hash Keys

To generate these hash keys, we concatenate the significant attributes (e.g., Name, Address, Contact_Info) and use a hash function to produce a consistent hash value for each unique customer record. Here's a Python snippet to illustrate this process:


This script will produce unique hash keys for each customer based on their details, ensuring consistency across different systems.


3. Implement Data Harmonization Techniques


Data Matching Techniques:

  • Fuzzy Matching: Utilize fuzzy logic to detect similar names and addresses across systems, such as "Müller GmbH" appearing with different contact details.

  • Exact Matching: Leverage exact matches on fields like email addresses to identify duplicates.


Consolidation Rules:

  • Primary Source: Prioritize data from the Future ERP, assuming it has the most accurate and up-to-date information.

  • Merged Attributes: Combine comments, descriptions, and customer feedback from all systems into a single view.


4. Ensure Data Quality and Governance


Validation Rules:

  • Ensure that all contact information is in the correct format and that addresses conform to postal standards.

  • Implement consistency checks to verify that customer names and identifiers are accurate.

Data Profiling and Monitoring:

  • Utilize tools like SAP Data Intelligence to profile data regularly, identifying anomalies and ensuring data quality.

  • Set up alerts for data discrepancies or anomalies in customer records.


5. Leverage Modern Data Platforms


SAP Datasphere | Microsoft Fabric | Snowflake | Databricks Implementations:

SAP Datasphere can be used to facilitate the integration

and management of data across multiple ERP systems:

  • Data Integration: Seamlessly connect and integrate data from Legacy, Current, and Future ERPs into a unified platform.

  • Real-Time Analytics: Enable real-time insights into customer interactions and service case resolutions.

  • Scalability: Handle growing data volumes efficiently


68 views

Comments


bottom of page