Developing a Data Lakehouse for a South African Government-Sector Training Authority: Governance Framework Design Through Systematic Literature Review

Zamani Khulekani Mthembu, Sulaiman Saleem Patel, Nalindren Naicker, Seena Joseph, Lavanya Madamshetty, Devraj Moonsamy, Ayotuyi Tosin Akinola, Thamotharan Prinavin Govender
DOI: 10.4018/978-1-6684-9716-6.ch007


The Durban University of Technology is undertaking a project to develop a data lakehouse system for a South African government-sector training authority. This system is considered critical to enhancing the monitoring and evaluation capabilities of the training authority and ensuring service delivery. Key to the successful deployment of the data lakehouse is the implementation of suitable data governance for the system. This chapter identifies the key components of data governance relevant to the system through a systematic literature review process. Thereafter, the components of data governance are mapped against the technical architecture of the data lakehouse, and the governance mechanisms for all lakehouse system components are defined. A practitioner expert evaluation is presented to assess the data governance mechanisms. Overall, the data governance framework and resulting mechanisms were found to be sufficient, except with regard to ensuring data quality. The need for separate studies focused on ensuring data quality for the data lakehouse system was identified as future work.
Chapter Preview

Background To The Study

In South Africa, Sector Education and Training Authorities are organizations established by government to facilitate skills development and training within specific sectors of the economy. These Government-Sector Training Authorities (GTAs) are a critical component of the country’s National Skills Development Strategy and play a pivotal role in addressing the skills gap and training needs within various industries. This study is part of an ongoing project between the Durban University of Technology (DUT) and a South African GTA. The aim of this project is to modernize the knowledge management capabilities of the GTA, while simultaneously developing skill and capacity among DUT students.

To improve the knowledge management capabilities of the GTA, it was identified that an end-to-end data warehousing and automated reporting system was needed. Through scoping discussions and consultation between GTA stakeholders and the DUT design team, it was decided that the data warehousing solution would be developed on the Microsoft Azure technology stack.

The study presented in this chapter was conducted as part of the aforementioned project and focuses on the data governance and management considerations that are needed alongside the technical development of the data warehousing system.

While Microsoft Azure cloud services provide a robust platform for building data lakehouses (DLHs), the development of a data governance framework (DGF) tailored to the unique needs of South African GTAs remains a crucial yet understudied area (Al-Ruithe et al., 2019). Existing literature on data governance in DLHs predominantly focuses on generic frameworks and fails to address the challenges faced by GTAs, such as:

  • Privacy and confidentiality: GTAs deal with sensitive personal information of individuals participating in training programs. Ensuring data privacy and confidentiality is crucial to comply with data protection regulations and maintain trust (Amo et al., 2021).

  • Data quality and integrity: Accurate and reliable data is vital for decision-making and policy formulation. GTAs may face challenges in maintaining data quality and integrity, such as data inconsistencies, duplicates, and data integration issues from various sources (Abraham et al., 2019).

  • Data security and access control: Protecting data from unauthorized access, breaches, and cyber threats is a critical concern. GTAs need robust security measures and access controls to safeguard sensitive information and ensure compliance with security standards (Gupta et al., 2022).

  • Compliance with regulations: GTAs must comply with specific regulations and legislation related to data management and protection. These can include laws like the Protection of Personal Information Act, 2013 (POPIA) or sector-specific regulations that impose additional requirements on data handling.

The aim of this study is thus to develop a data governance framework for the data warehousing solution being developed for the South African GTA. In the following sections of the chapter, an introduction to modern data warehousing and data governance is first provided. Thereafter, an overview of the data warehousing solution designed by the DUT project team is provided. This establishes an understanding of the technical system that the governance framework is being designed to support. With that understanding in mind, the third section of the chapter presents a systematic literature review that identifies the key elements of data governance that are most relevant to government-sector organizations. Using these elements, the bespoke governance framework is designed for the system under study and evaluated through consultation with an industry expert practitioner. The final section of the chapter presents conclusions and findings arising from the study, and suggests directions for future research efforts.

Key Terms in this Chapter

Data Management: The operational and strategic management of data assets.

Data Governance Framework: A set of guiding principles that direct the exercise of authority, control, and shared decision making over the management of data assets.

Data Lakehouse: An enterprise information system that facilitates the analysis of structured, semi-structured or unstructured data by an organization.
