Formalizing the database build process pays dividends, and I'm talking cash money dividends. The database world is much like many programming ecosystems, like Java or .NET for example. It is a little different too, but at the end of the day code is code and it should be treated as critical to the success of business. So why shouldn't the database code go through the same standard best practices that every other programming language I have used does? Things like unit testing and continuous integration, and logical release processes to name a few.
Utilizing an open source tool called Liquibase your database build and release process can be sane. This post will go over the basic concepts and usage of Liquibase http://www.liquibase.org/. Liquibase is written in Java and is database independent. The licence for Liquibase is Apache License, version 2 so no fear if you want to modify the source code and/or your budget is $0.00.
At a very simple high level Liquibase keeps track of what SQL scripts have been executed against a database and only acknowledges future scripts that haven't been executed against a database. This might not sound very exciting, but it opens the doors to a sweet database build and release process.
High Level Benefits Of Sweet Database Build And Release Process
- Database objects are sourced in version control and there's one central location for a given object. This helps save time and removes human error of managing copies of the same code.
- Builds are repeatable. This is where most if not all database build process fall hard, not so with Liquibase.
- Builds can be automated. This opens the door for continuous integration and unit tests. Reduce the risk of sending broken junk to QA or worse production. In summary fail fast and early and roll in the cash savings.
The remainder of this post will walk through the creation of a simple database using Liquibase as the foundation of the process.
If you want to follow along hands on these are the items you will need:
(I'm doing this on Windows 7, but the OS shouldn't matter as long as you have the below items in place. I'm also invoking Liquibase from the command line. Though there is support for Ant, Maven, Spring, Servlet Listener, and CDI Environment. Honestly I don' know what half of that even is.)
- Java 1.6+ http://www.java.com/en/
- Liquibase 3.0.7 or newer http://www.liquibase.org/download/index.html (liquibase-3.0.7-bin.zip or liquibase-3.0.7-bin.tar.gz). Extract to a new directory of your chose and add it to your path.To quickly validate that Liquibase is in working order go to a command prompt and type liquibase --version and you should see Liquibase Version: 3.0.7 as output.
- PostgreSQL 8.4+ (local or remote install), though any version that a JDBC driver can connect to should work http://www.postgresql.org/download/. You actually can use any database vendor for the most part that has a JDBC driver. My example will use PostgreSQL 9.1, but feel free to use what you want. If you want my SQL code to work use PostgreSQL as you go through the hands on exercise or adjust for your database.
- Make sure you can connect to your install of PostgreSQL and create a new database called mercury. The quickest way may be to use the GUI tool called pgAdmin. For this exercise just log in as the postgres user. This is the admin account and can do anything to the database so typically you wouldn't want to use this account so loosely, but for learning no worries.
- Download and place the JDBC jar file for PostgreSQL on your file system and remember the location we will need it before we can build our database. http://jdbc.postgresql.org/download.html
High Level Technical Overview
Liquibase is a Java application that reads in a configuration file that tell it where to read your database scripts from on disk and compares it to a change log table it stores in the target database. Using a combination of file check-sum and some other data in each sql script Liquibase knows if for example a table already exists and to include it in the current build. You can tell it to spit out a upgrade.sql script or to execute the needed changes on the target database. I prefer having Liquibase write out to an upgrade.sql file and if you have a DBA team they are more likely to buy into this approach. This gives you and the DBA team a tangible artifact that can be reviewed before it's just executed.
Liquibase Build Configurations
In order to make Liquibase do anything of value you need to first setup one or more configuration files. Your configuration file is XML and you call Liquibase and feed it this configuration so it knows where to look for your database source code. Out of the box Liquibase gives you the flexibility to setup your database source code in a few different ways. There's full on XML, YAML, JSON, and plain SQL. The approach I take uses SQL as I have a database background and see no reason to abstract out a very easily digestible language like SQL.
To touch all points we will make use of our brand new mercury database we created previously.
- Create a directory called C:\DatabaseSource\mercury
- Create a sub directory called Tables, ForeignKeys, and Functions under the mercury directory.
- Under the mercury directory create a file named 0_LiquiBaseBuildConfigMain.xml. With the following content.
0_LiquiBaseBuildConfigMain.xml
All of these items in the real world would be added to source control once we add files to them. That is one of the benefits of using Liquibase. It allows you to logically place all your objects into your source control without any custom hack jobs I have seen on some home grown attempts at database build process. Not everyone would store the config file in source control, but that's up to you.
On to the 0_LiquiBaseBuildConfigMain.xml file, the header section can be ignored, focus on the stuff starting at line 8 and ending on line 10. You will see that our main configuration file is making reference to three other configuration files. This is not required, but I have broken the configuration into a main file with a configuration specific to each database object. This helps modularize the configuration and helps future proof the setup. For this post the only objects we will be working with are tables, foreign keys, and functions to keep things simple. In the real world this would be extended out to schemas, users, views, and ... you get the drift. At this point the important thing is to notice the order of the directory references in the main configuration file. I create tables first followed by the fk's for the tables which is followed by the functions. I broke out the fk's from the table definition for a reason. By placing the fk reference after the tables Liquibase will first gather all the table scripts followed by all the fk's. This way we don't have to worry about ordering the table creation in any special sequence. The three object configuration scripts look like the following.
1_LiquiBaseBuildConfigTables.xml
2_LiquiBaseBuildConfigForeignKeys.xml
3_LiquiBaseBuildConfigFunctions.xml
In each configuration file take note that each one uses the tag includeAll. This tells Liquibase to go into the target directory and grab all the scripts. Last I could tell it was alpha numeric. If you don't do this you can also use the include tag and name off every single script precisely and in the exact order you want them added to your upgrade script. If needed this can be used entirely or mixed and matched with includeAll, as needed. I find if you can make use of includeAll your configuration stays smaller and easier to wrap your mind around. Cases may come up where you have to do an include, and that's ok.
As part of initial setup to make life easier for the long haul we will create a file named liquibase.properties under our mercury directory. This contains the values of the jar file we want to use along with the database connection settings. You can also pass these along at the command line, but this is easier and more robust.
Copy the below text and place in your liquibase.properties file. Update the location of the jar file to your location and change the other settings as needed.
liquibase.properties
Database Source
Now that we have the configuration setup we need some database objects to add to our build. Let's add two tables, a parent and child with a fk reference to the parent and one function that references the parent table. This will show how we deal with dependency order, something that makes database builds a challenge compared to languages that have compilers that figure out this type of thing.
Under our Tables directory create a file named public.department.sql. Inside of it paste the following.
public.department.sql
The special things to take note on are line 1 --liquibase formatted sql. This is a required SQL comment in any sql file you want Liquibase to pick up. If left out Liquibase will just skip the file and not include it in the build.
Line 3 and 4 are what Liquibase calls a change-set. You can think of it much like you think of a source control change set. Every commit is a change set and can be one file or many files. In Liquibase you break out separate changes into separate change-sets. So you may have 5 different statements in 1 change-set or maybe you will choose to create a separate change-set for the 5 statements, the chose is up to you. For now just understand that in order to add a change to the build you will need to add a change set. For more information on what the change-set settings are see http://www.liquibase.org/documentation/sql_format.html.
Let's now add a department table named public.employee.sql to the Tables directory.
public.employee.sql
Nothing special just another change-set, but in a different file. You may have noticed that both files have a change-set named C1. That's ok the name is scoped to the file so other files can have a C1 too without Liquibase thinking it's the same change-set.
Now we will add our foreign key that points from the employee to the department. In the ForeigKeys directory create a file named public.employee_FKS.sql and copy the below SQL into it.
public.employee_FKS.sql
It looks just like the table change-sets as far as the Liquibase comments are concerned.
Next let's add our function to the build. Create a file named public.get_department(ip_department_id INTEGER).sql under the Functions directory. I included the function signature in the file name because in PostgreSQL two functions can have the same names, but different parameters. So this is just future proofing the code.
public.get_department(ip_department_id INTEGER).sql
The only thing of importance to note on this file is the change-set runOnChange:true on line 3. For the tables and fk this was set to false. This setting allows us to override Liquibase standard behavior of failing the build if an existing object in the database has it's code modified due to a check-sum failure. Normally in the case of a table you have a initial create statement and you run it on your database creating the table. The next time you want to modify that table you need to write an alter statement, not modify the initial create statement. So you set the change-set setting runOnChange:false to keep the create from ever running again. With functions we don't want to have to copy code to a new change-set to make changes. We want to edit the existing code. Otherwise we would have to copy the entire function text and create a new change-set below the previous one, this would be messy at best. To combat this we tell Liquibase that we want this SQL script included again if it's ever changed, even if it has been released previously to the target database.
Prepare For Launch!
We are close to getting to run the build.
Now assuming all things have fallen into place open up a command prompt and cd to your mercury folder. Now execute the command liquibase updateSQL > updateVersion1.0.sql. This tells Liquibase to see what change-sets are in my files that haven't ever been logged to the changelog table and dump the sql to updateVersion1.0.sql. updateVersion1.0.sql will contain all the SQL that's needed to upgrade our mercury database to the current release.
If you get no error messages you will now have a file in your mercury folder called updateVersion1.0.sql. Any issues email me or comment and I will help you figure it out.
updateVersion1.0.sql
If you aren't feeling warm and fuzzy after the birth of this upgrade script I either completely missed my target in how I wrote this or you don't get excited very easily and need to cheer up. This is so simple and powerful, it's amazing! I love this! Can you feel it! I don't excite easily and this makes me want to wake up in the morning(scary or sad?).
Take a look at the upgrade script and study the order the objects were added to the file and how that correlates to how we setup our configuration files. Aside from the header and the two tables that Liquibase needs databasechangeloglock and databasechangelog and the inserts into those tables all the other SQL came out of our SQL scripts. The lock table keeps two or more instances of Liquibase from trying to apply an upgrade. This would cause crazy results and is a safely measure. However the only time Liquibase will write to this table is if you invoke the upgrade directly with Liquibase, not when you spit output to the sql script. Of course we can remedy this, but for now don't worry about that. The log table that is created captures the checksums and what change-sets have been applied to the database.
Now execute updateVersion1.0.sql against mercury. You should now have a function, fk, and two new tables, plus the two Liquibase tables(which was just a one time event).
Next go and query databasechangeloglock. Hopefully it makes some sense on first review. It also gives you a nice built in audit log of what and when things are executed on the database.
Now run Liquibase again this time like this liquibase updateSQL > updateVersion1.1.sql and open the sql file. You will see, minus some comments it's empty of SQL because we made no changes to our source code on the file system. Now let's add a new change-set to create the last_name column on the employee table in public.employee.sql and save it. In the real world we wold check this file out and add a new change-set with an alter table command like this.
public.employee.sql
The key thing to look at is our new change-set is named C2 not C1. If we put C1 as part of our new change-set identifier and execute Liquibase the build will fail telling us we have unique identifiers clashing, this is a good thing.
Now go back to the command prompt and execute liquibase updateSQL > updateVersion1.1.sql and take a look at updateVersion1.1.sql.
updateVersion1.1.sql
Nice! We have captured the changes for our next release and have a script that will upgrade our database.
I hope this has given you an appetite to take a look at evolving your database build process and taking Liquibase for a spin.
Really we just scratched the surface of using Liquibase for our database build and release process, but if this is not database nirvana I don't know what is. With just the most basic setup of Liquibase we have opened the doors to extending our process to new heights. For example I have hooked this type of setup into a TeamCity continuous integration build process and was able to know within a minute of check-in if a script was broken by building a smoke test database to validate scripts before they are included in the release for QA to test. On a large team this is a huge thing and saves a lot of time aka money.
If you want to see more content related to this process let me know and I will add a part 2 covering a related topic . Again if you have trouble or questions leave a comment or feel free to email me bob.henkel gmail.com