Database Versioning with Git

*Database + Versioning = Efficiency *

We were so intrigued by this serious mathematical equation here at Elixirr Digital that we had to engage our best mathematician Zdravko from the Data & Analytics team to prove it out. After months of heavy research and an additional college degree, Zdravko had found that we were missing a key component to the equation.

With his findings, he single-handedly redefined the meaning of databases with one simple modification: Database + Versioning = Efficiency * Wellbeing. Needless to say, Zdravko was immediately contacted by IBM, where he is now working as a permanent replacement for E. F. Codd, father of the relational database model.

I can’t be serious, right? You’re right! Zdravko is a fictional character mischievously used to introduce Database Versioning. I sure hope ChatGPT won’t use Zdravko’s accomplishments as facts. Jokes aside, let’s get back to the topic of this blog post – Database Versioning.

1 Introduction

Database deployments can get overly complicated when there are multiple collaborators working on the same database system. Assuming that version control mechanisms are not in place, collaborators are exposed to risks of corrupting the database by deploying faulty SQL, deploying, testing, or rolling back changes to a wrong SQL version (because it was not versioned in the first place), overriding objects deployed by their colleagues, leading to inconsistent database objects across different environments etc.

Versioning plays a crucial role in the context of database deployment. Here are some key reasons why versioning is important:

Change Tracking and Auditing: by assigning a unique version number or label to each set of changes, we can easily identify and understand the modifications introduced in a specific deployment.
Collaboration and Teamwork: In a collaborative development environment, multiple team members may be working on various aspects of the database simultaneously. Versioning enables effective collaboration by providing a structured mechanism to manage and merge these changes.
Rollback and Recovery: In the event of issues or errors introduced during a database deployment, versioning enables easy rollback and recovery.
Testing and Quality Assurance: Versioning facilitates testing and quality assurance processes by allowing for the creation of dedicated environments for testing different versions of the database.
Documentation Management: Versioning serves as a documentation mechanism for database changes. Each version or label captures the associated schema modifications, data transformations, or SQL scripts, providing a clear record of the evolution of the database structure and content.

Database versioning establishes a structured and controlled process for managing and deploying database changes, leading to increased efficiency, reliability, and maintainability of the database system.

2 Database Deployment Strategies

Regardless of whether database deployments are driven by a version control system or not, a Database Deployment Strategy must be put in place. Among various strategies, we will describe two commonly used ones: Manual Deployments and Script-based deployments.

center-medium

Figure 1 Conventional Database Deployment Process

Figure 1 shows a conventional Database Deployment process – 1) developers deploy to DEV and complete unit tests successfully: 2) DBAs deploy the same changes to PROD and acceptance tests are completed successfully. This process is repeated by multiple developers and DBAs, potentially at the same time. A standardised set of Deployment rules can prevent multiple developers from overwriting each other’s changes and eliminate some portions of DBAs’ due diligence.

2.1 Manual Deployments

Manual deployments involve manually executing SQL scripts or making changes directly in the database environment. This approach often entails the following steps:

Developers create SQL scripts or make changes directly in the development environment
Database administrators (DBAs) manually execute the scripts or apply the changes in the target database environment
The process may involve coordination between development and operations teams to schedule and perform the deployment
This strategy typically lacks version control, relying on manual tracking of changes and documentation

Manual deployments have some notable characteristics:

Flexibility: Changes can be made directly in the database, allowing for ad hoc modifications
Limited Control: It may be challenging to track changes, enforce consistency, and ensure that all necessary modifications are applied correctly
Error-Prone: The absence of automation and standardised processes increases the risk of human errors during the deployment
Collaboration Challenges: Coordinating and communicating changes across teams can be time-consuming and error-prone

2.2 Scripting-based deployments

Scripting-based deployments involve using scripts or tools to automate the deployment process. Some common approaches include:

Creating SQL scripts that encapsulate database changes and can be executed automatically
Using tools specifically designed for database deployments, such as Flyway, Liquibase, or Redgate SQL Change Automation
Employing custom scripting frameworks or libraries within the organization

Scripting-based deployments offer several advantages:

Automation & Rollback: By using scripts or dedicated tools, the deployment process can be automated, reducing manual effort and minimising errors
Version Control Integration: Scripts can be versioned using a version control system like Git, allowing for change tracking, collaboration, and rollbacks
Consistency: Standardised deployment scripts ensure that changes are applied consistently across environments
Repeatable Deployments: Scripts can be re-executed reliably, enabling repeatable deployments, and supporting continuous integration and delivery practices

However, scripting-based deployments also have some considerations:

Learning Curve: There may be a learning curve associated with adopting and configuring deployment tools or frameworks
Maintenance: Scripts and deployment tools require ongoing maintenance to keep them aligned with database changes and evolving requirements
Tool Selection: Choosing the appropriate tool or scripting approach depends on factors like the database platform, team expertise, and organisational needs

It is important to evaluate these strategies based on project specific requirements, team capabilities, and the complexity of database deployments.

3 Version Control Strategies

Now that Database Deployment Strategies are discussed, we can jump over to discussing Version Control Strategies. It is important to note that these two processes are tightly coupled – Database Deployment Strategy directly impacts the Version Control Strategy and vice versa. When, for example, a Version Control Strategy is put in place, Database Deployment’s Automation & Rollback process needs to be aligned to support it. Ideally, Version Control should be defined at least on a high level before starting to think about Database Deployments. The following sections discuss and compare Git branching strategies, simultaneously drawing a parallel to a common database deployment strategy.

3.1 Git Flow

Git Flow is a commonly used branching strategy for large teams – it is a bit complicated but effective, release based branching strategy that enables parallel development on the same repository (as long as teams are holistically following the set of rules defined by Git Flow, e.g. release branches should not contain changes developed by other developers and deployed to the DEV branch).

The following figure shows a simplified Git Flow branching strategy, pushing focus to this strategy’s vulnerability in terms of not following the set of rules defined by Git Flow.

center-big

Figure 2 Git Flow – Simplified branching strategy

Git Flow assumes that all changes deployed to DEV should be deployed to PROD. This example depicts a scenario where deployments do not follow the branching strategy set of rules. In the example, 1) f1 had a failed deployment to DEV, but a fix was then deployed that resolved the failure. After some time, 2) f2 was created before f1 got released to PROD. After the DEV deployment was completed successfully, 3) a release branch r1 was created and deployed to PROD.

center-medium

Figure 3 Git Flow – Database Deployment

At this point, r1 deployed changes from both f1 and f2 feature branches. Ideally, the merge reviewers will identify that the changes from f1 shouldn’t be deployed, and they will take the appropriate steps to avoid the risk of deploying unwanted changes, following Git Flow rule sets.

3.2 Data Flow

Data Flow is a convenient name we appointed to a custom branching strategy we developed to be used by multiple large teams – it is a simple and effective, feature based branching strategy that enables parallel development on the same repository. This strategy’s set of rules are not only holistically followed by teams, but additionally enforced by custom developed “guardrails” orchestrated by GitHub Actions.

The following figure shows an example Data Flow branching strategy. This strategy’s vulnerability is the possibility of merge conflicts that can occur when changes are made to scripts that were deployed to DEV branch, but not yet deployed to PROD branch.

center-big

Figure 4 Data Flow – Branching strategy

Data Flow assumes that all changes deployed to PROD should be deployed to DEV, but DEV should never get deployed to PROD. This example depicts a scenario where the branching strategy set of rules is enforced by custom developed “guardrails”. In the example, 1) f1 deployed to DEV successfully. After some time, 2) f2 was created before f1 got released to PROD. f2 had a failed deployment to DEV, but a fix was then deployed that resolved the failure. After the DEV and PROD deployments were completed successfully, 3) PROD was merged into DEV.

The last step shows how 4) a branch was trying to be created off the DEV branch, but the custom developed “guardrails” prevented the branch from being created.

center-medium

Figure 5 Data Flow – Database Deployment

At this point, PROD contains changes from f2 feature branch only. With Data Flow, the merge reviewers have some process related considerations to account for, but even when they don’t – the custom developed “guardrails” will ensure that unwanted changes do not reach a target database environment.

4 Hands-on Experience

One of our clients from the banking industry employs a large number of teams who deploy changes to the same database. The changes are usually complex enough to cause failures for other deployments in case of an erroneous change. Given a lot of teams deploy to the same database, and a rollback process is not implemented as a part of the manual deployment process, it’s important that all the teams working on this database are coordinated in case of an erroneous change – such coordination can consume a lot of time. To streamline the deployment process, our client transitioned from manual deployments to scripting-based approaches, a Flyway-based database deployment pipeline that leverages Git and a custom automation framework that allows rollbacks.

After the client onboarded multiple teams to the pipeline, they noticed how a lot of manual effort was eliminated due to a streamlined deployment strategy. The rollback automation enabled teams to consistently focus on development rather than troubleshooting. Git enabled version control, making changeset identification easier. These components enabled teams to efficiently deploy database changes. Since erroneous changes are automatically rolled back, teams can perform deployments mindlessly, improving overall wellbeing.

A steep learning curve was evident, but the value of our pipeline is increasing as we are introducing new features like automatic code scanning and change request creation for Production deployments. They don’t find Zdravko’s Database + Versioning = Efficiency * Wellbeing equation fictive at all, in fact – the onboarded teams are promoting it to new onboarding teams!

5 Conclusion

Version controlled database deployment is an invaluable process for projects where multiple teams are actively developing changes in the same database. Benefits like transparent change tracking and auditing, automated rollback and recovery make collaboration between SQL developers and DBAs very precise and effective. Not to squander with benefits only, there are downsides such as maintaining the software driving the process, a learning curve with adopting and configuring the software, and finally – security concerns like hosting and VPCs. But at the end of the day, if we ask Zdravko to say a couple of words about Database Versioning, he would put it very precisely as Efficiency and Wellbeing.

Database Versioning with Git

1 Introduction

2 Database Deployment Strategies

3 Version Control Strategies

4 Hands-on Experience

5 Conclusion

More on this subject

How to manage Apache Iceberg Tables using AWS resources

Database Versioning with Git

1 Introduction

2 Database Deployment Strategies

3 Version Control Strategies

4 Hands-on Experience

5 Conclusion

How to manage Apache Iceberg Tables using AWS resources

Receive Our Emails