Thursday, February 20, 2014

PostgreSQL Does Autonomous Transactions With DBLINK

I created a small framework to do autonomous transactions in PostgreSQL using dblink. I use it to write to a log table regardless of whether the surrounding transaction rolls back or not. Very handy for instrumenting your stored functions and other logic with calls that log to the table.

https://github.com/henkbl/plog
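
The core trick looks something like this (a minimal sketch, not the actual plog code; the log_table name and the loopback connection string are assumptions, so adjust to your setup):

CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION log_autonomous(p_message TEXT) RETURNS void AS
$$
BEGIN
    -- dblink_exec opens its own connection, so this INSERT commits
    -- immediately and survives a rollback of the caller's transaction.
    PERFORM dblink_exec(
        'dbname=' || current_database(),  -- loopback; add host/user/password as needed
        format('INSERT INTO log_table (message) VALUES (%L)', p_message));
END;
$$ LANGUAGE plpgsql;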

Thursday, December 12, 2013

Liquibase Database Build Process Part 1 (Life can be better!)

Overview

Formalizing the database build process pays dividends, and I'm talking cash money dividends. The database world is much like other programming ecosystems, Java or .NET for example. It is a little different too, but at the end of the day code is code, and it should be treated as critical to the success of the business. So why shouldn't database code go through the same standard best practices that every other programming language I have used does? Things like unit testing, continuous integration, and a logical release process, to name a few.

Utilizing an open source tool called Liquibase, your database build and release process can be sane. This post will go over the basic concepts and usage of Liquibase http://www.liquibase.org/. Liquibase is written in Java and is database independent. The license for Liquibase is the Apache License, version 2, so no fear if you want to modify the source code and/or your budget is $0.00.

At a very simple high level, Liquibase keeps track of which SQL scripts have been executed against a database and only applies scripts that haven't yet been executed against that database. This might not sound very exciting, but it opens the doors to a sweet database build and release process.

High Level Benefits Of Sweet Database Build And Release Process

  • Database objects are sourced in version control and there's one central location for a given object. This saves time and removes the human error of managing copies of the same code.
  • Builds are repeatable. This is where most if not all database build processes fall hard, not so with Liquibase.
  • Builds can be automated. This opens the door to continuous integration and unit tests, reducing the risk of sending broken junk to QA or, worse, production. In summary, fail fast and early and roll in the cash savings.

The remainder of this post will walk through the creation of a simple database using Liquibase as the foundation of the process.

If you want to follow along hands on these are the items you will need:
(I'm doing this on Windows 7, but the OS shouldn't matter as long as you have the below items in place. I'm also invoking Liquibase from the command line, though there is support for Ant, Maven, Spring, Servlet Listener, and CDI Environment. Honestly I don't know what half of that even is.)

  • Java 1.6+ http://www.java.com/en/
  • Liquibase 3.0.7 or newer http://www.liquibase.org/download/index.html (liquibase-3.0.7-bin.zip or liquibase-3.0.7-bin.tar.gz). Extract it to a new directory of your choosing and add it to your path. To quickly validate that Liquibase is in working order, go to a command prompt and type liquibase --version and you should see Liquibase Version: 3.0.7 as output.
  • PostgreSQL 8.4+ (local or remote install) http://www.postgresql.org/download/, though for the most part any database vendor with a JDBC driver should work. My example will use PostgreSQL 9.1, but feel free to use what you want. If you want my SQL code to work as you go through the hands on exercise, use PostgreSQL, or adjust for your database.
  • Make sure you can connect to your install of PostgreSQL and create a new database called mercury. The quickest way may be to use the GUI tool called pgAdmin. For this exercise just log in as the postgres user. This is the admin account and can do anything to the database, so typically you wouldn't want to use it so loosely, but for learning no worries.
  • Download the JDBC jar file for PostgreSQL http://jdbc.postgresql.org/download.html and place it on your file system. Remember the location; we will need it before we can build our database.

High Level Technical Overview

Liquibase is a Java application that reads a configuration file that tells it where to read your database scripts from on disk and compares them to a change log table it stores in the target database. Using a combination of a file checksum and some other data about each SQL script, Liquibase knows, for example, whether a table already exists and whether to include the script in the current build. You can tell it to spit out an upgrade.sql script or to execute the needed changes directly against the target database. I prefer having Liquibase write out an upgrade.sql file, and if you have a DBA team they are more likely to buy into this approach. It gives you and the DBA team a tangible artifact that can be reviewed before it's just executed.

Liquibase Build Configurations

In order to make Liquibase do anything of value you first need to set up one or more configuration files. Your configuration file is XML, and you call Liquibase and feed it this configuration so it knows where to look for your database source code. Out of the box Liquibase gives you the flexibility to set up your database source code in a few different ways. There's full-on XML, YAML, JSON, and plain SQL. The approach I take uses SQL, as I have a database background and see no reason to abstract away a very easily digestible language like SQL.

To touch all points we will make use of our brand new mercury database we created previously.

  • Create a directory called C:\DatabaseSource\mercury
  • Create subdirectories called Tables, ForeignKeys, and Functions under the mercury directory.
  • Under the mercury directory create a file named 0_LiquiBaseBuildConfigMain.xml with the following content.

0_LiquiBaseBuildConfigMain.xml
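
It will look something like this (a sketch; the schemaLocation in the header should match your Liquibase version, and I'm assuming you run liquibase from the mercury directory so the relative paths resolve):

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.0.xsd">

    <include file="1_LiquiBaseBuildConfigTables.xml"/>
    <include file="2_LiquiBaseBuildConfigForeignKeys.xml"/>
    <include file="3_LiquiBaseBuildConfigFunctions.xml"/>

</databaseChangeLog>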


All of these items in the real world would be added to source control once we add files to them. That is one of the benefits of using Liquibase: it allows you to logically place all your objects into your source control without any of the custom hack jobs I have seen in some homegrown attempts at a database build process. Not everyone would store the config file in source control, but that's up to you.

On to the 0_LiquiBaseBuildConfigMain.xml file. The header section can be ignored; focus on the stuff starting at line 8 and ending on line 10. You will see that our main configuration file references three other configuration files. This is not required, but I have broken the configuration into a main file plus a configuration specific to each database object type. This modularizes the configuration and helps future proof the setup. For this post the only objects we will be working with are tables, foreign keys, and functions, to keep things simple. In the real world this would be extended out to schemas, users, views, and ... you get the drift.

At this point the important thing is to notice the order of the directory references in the main configuration file. I create tables first, followed by the fk's for the tables, followed by the functions. I broke the fk's out from the table definitions for a reason: by placing the fk reference after the tables, Liquibase will first gather all the table scripts followed by all the fk's. This way we don't have to worry about ordering the table creation in any special sequence. The three object configuration scripts look like the following.

1_LiquiBaseBuildConfigTables.xml

2_LiquiBaseBuildConfigForeignKeys.xml

3_LiquiBaseBuildConfigFunctions.xml
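
All three are nearly identical. Here's a sketch of 1_LiquiBaseBuildConfigTables.xml; the ForeignKeys and Functions versions just swap the path attribute:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
                        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.0.xsd">

    <includeAll path="Tables"/>

</databaseChangeLog>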


In each configuration file take note that each one uses the tag includeAll. This tells Liquibase to go into the target directory and grab all the scripts. Last I checked, the ordering is alphanumeric. Alternatively, you can use the include tag and name every single script precisely, in the exact order you want them added to your upgrade script. This can be used entirely or mixed and matched with includeAll as needed. I find that if you can make use of includeAll your configuration stays smaller and easier to wrap your mind around. Cases may come up where you have to do an include, and that's OK.

As part of the initial setup, to make life easier for the long haul, we will create a file named liquibase.properties under our mercury directory. This contains the location of the JDBC jar file we want to use along with the database connection settings. You can also pass these along at the command line, but this is easier and more robust.

Copy the below text and place it in your liquibase.properties file. Update the location of the jar file to your location and change the other settings as needed.

liquibase.properties
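
Something along these lines (the jar file name/path and credentials below are examples; point them at your own driver and database):

driver: org.postgresql.Driver
classpath: C:\JDBC\postgresql-9.1-902.jdbc4.jar
changeLogFile: 0_LiquiBaseBuildConfigMain.xml
url: jdbc:postgresql://localhost:5432/mercury
username: postgres
password: postgres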


Database Source

Now that we have the configuration set up we need some database objects to add to our build. Let's add two tables, a parent and a child with a fk reference to the parent, and one function that references the parent table. This will show how we deal with dependency order, something that makes database builds a challenge compared to languages with compilers that figure this type of thing out.

Under our Tables directory create a file named public.department.sql.  Inside of it paste the following.

public.department.sql
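
Something like this works (a sketch; I'm using bob as the change-set author and made up simple columns, so adjust to taste):

--liquibase formatted sql

--changeset bob:C1
CREATE TABLE public.department
(
    department_id   INTEGER NOT NULL,
    department_name TEXT    NOT NULL,
    CONSTRAINT department_pk PRIMARY KEY (department_id)
);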


The special thing to take note of is line 1: --liquibase formatted sql. This is a required SQL comment in any sql file you want Liquibase to pick up. If it's left out, Liquibase will just skip the file and not include it in the build.

Lines 3 and 4 are what Liquibase calls a change-set. You can think of it much like a source control change set: every commit is a change set and can be one file or many files. In Liquibase you break separate changes into separate change-sets. So you may have 5 different statements in 1 change-set, or maybe you will choose to create a separate change-set for each of the 5 statements; the choice is up to you. For now just understand that in order to add a change to the build you will need to add a change-set. For more information on the change-set settings see http://www.liquibase.org/documentation/sql_format.html.

Let's now add an employee table in a file named public.employee.sql in the Tables directory.

public.employee.sql
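
Again a sketch, with a department_id column pointing at the parent table (we'll wire up the fk constraint next):

--liquibase formatted sql

--changeset bob:C1
CREATE TABLE public.employee
(
    employee_id   INTEGER NOT NULL,
    department_id INTEGER NOT NULL,
    first_name    TEXT    NOT NULL,
    CONSTRAINT employee_pk PRIMARY KEY (employee_id)
);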


Nothing special, just another change-set, but in a different file. You may have noticed that both files have a change-set named C1. That's OK; the identifier is scoped to the file, so other files can have a C1 too without Liquibase thinking it's the same change-set.

Now we will add our foreign key that points from the employee to the department. In the ForeignKeys directory create a file named public.employee_FKS.sql and copy the below SQL into it.

public.employee_FKS.sql
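
Along these lines (the constraint name is my own choice):

--liquibase formatted sql

--changeset bob:C1
ALTER TABLE public.employee
    ADD CONSTRAINT employee_department_fk
    FOREIGN KEY (department_id)
    REFERENCES public.department (department_id);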


It looks just like the table change-sets as far as the Liquibase comments are concerned.

Next let's add our function to the build. Create a file named public.get_department(ip_department_id INTEGER).sql under the Functions directory. I included the function signature in the file name because in PostgreSQL two functions can have the same name but different parameters, so this is just future proofing the code.

public.get_department(ip_department_id INTEGER).sql
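
A sketch of the function file. Note that I also set splitStatements:false, an extra formatted-SQL setting beyond what this post covers, so Liquibase won't split the function body at its internal semicolons:

--liquibase formatted sql

--changeset bob:C1 runOnChange:true splitStatements:false
CREATE OR REPLACE FUNCTION public.get_department(ip_department_id INTEGER)
RETURNS TEXT AS
$$
    SELECT department_name
    FROM public.department
    WHERE department_id = ip_department_id;
$$
LANGUAGE sql;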


The only thing of importance to note in this file is the change-set setting runOnChange:true on line 3. For the tables and fk this was set to false. This setting allows us to override Liquibase's standard behavior of failing the build, due to a checksum failure, if an existing object in the database has its code modified. Normally in the case of a table you have an initial create statement and you run it on your database, creating the table. The next time you want to modify that table you write an alter statement; you do not modify the initial create statement. So you set runOnChange:false to keep the create from ever running again. With functions we don't want to have to copy code to a new change-set to make changes; we want to edit the existing code. Otherwise we would have to copy the entire function text and create a new change-set below the previous one, which would be messy at best. To combat this we tell Liquibase that we want this SQL script included again whenever it changes, even if it has been released previously to the target database.

Prepare For Launch!

We are almost ready to run the build.

Now, assuming all things have fallen into place, open up a command prompt and cd to your mercury folder. Then execute the command liquibase updateSQL > updateVersion1.0.sql. This tells Liquibase to find the change-sets in our files that have never been logged to the changelog table and dump the SQL to updateVersion1.0.sql, which will contain all the SQL needed to upgrade our mercury database to the current release.

If you get no error messages you will now have a file in your mercury folder called updateVersion1.0.sql. Any issues, email me or comment and I will help you figure it out.

updateVersion1.0.sql
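
The generated file has roughly this shape (a sketch only; the exact header, column lists, and checksums depend on your Liquibase version):

-- Liquibase comment header, then DDL creating databasechangeloglock
-- and databasechangelog (first run only), then our change-sets in order:

-- Changeset Tables/public.department.sql::C1::bob
CREATE TABLE public.department ( ... );
INSERT INTO databasechangelog ( ... ) VALUES ( ... );

-- Changeset Tables/public.employee.sql::C1::bob
CREATE TABLE public.employee ( ... );
INSERT INTO databasechangelog ( ... ) VALUES ( ... );

-- Changeset ForeignKeys/public.employee_FKS.sql::C1::bob
ALTER TABLE public.employee ADD CONSTRAINT employee_department_fk ... ;
INSERT INTO databasechangelog ( ... ) VALUES ( ... );

-- Changeset Functions/public.get_department(ip_department_id INTEGER).sql::C1::bob
CREATE OR REPLACE FUNCTION public.get_department ... ;
INSERT INTO databasechangelog ( ... ) VALUES ( ... );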


If you aren't feeling warm and fuzzy after the birth of this upgrade script, I either completely missed my target in how I wrote this or you don't get excited very easily and need to cheer up. This is so simple and powerful, it's amazing! I love this! Can you feel it! I don't excite easily and this makes me want to wake up in the morning (scary or sad?).

Take a look at the upgrade script and study the order the objects were added to the file and how that correlates to how we set up our configuration files. Aside from the header, the two tables that Liquibase needs (databasechangeloglock and databasechangelog), and the inserts into those tables, all the other SQL came out of our SQL scripts. The lock table keeps two or more instances of Liquibase from trying to apply an upgrade at the same time; this would cause crazy results, so it's a safety measure. However, the only time Liquibase will write to this table is if you invoke the upgrade directly with Liquibase, not when you spit output to the sql script. Of course we can remedy this, but for now don't worry about that. The log table that is created captures the checksums and what change-sets have been applied to the database.

Now execute updateVersion1.0.sql against mercury. You should now have a function, a fk, and two new tables, plus the two Liquibase tables (which was a one-time event).

Next go and query databasechangelog. Hopefully it makes some sense on first review. It also gives you a nice built in audit log of what was executed on the database and when.

Now run Liquibase again, this time like this: liquibase updateSQL > updateVersion1.1.sql, and open the sql file. You will see that, minus some comments, it's empty of SQL, because we made no changes to our source code on the file system. Now let's add a new change-set to create a last_name column on the employee table in public.employee.sql and save it. In the real world we would check this file out and add a new change-set with an alter table command, like this.

public.employee.sql
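
The file now carries both change-sets (same sketch as before, plus the new C2):

--liquibase formatted sql

--changeset bob:C1
CREATE TABLE public.employee
(
    employee_id   INTEGER NOT NULL,
    department_id INTEGER NOT NULL,
    first_name    TEXT    NOT NULL,
    CONSTRAINT employee_pk PRIMARY KEY (employee_id)
);

--changeset bob:C2
ALTER TABLE public.employee ADD COLUMN last_name TEXT;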


The key thing to look at is that our new change-set is named C2, not C1. If we put C1 as part of our new change-set identifier and execute Liquibase, the build will fail, telling us we have unique identifiers clashing; this is a good thing.
Now go back to the command prompt, execute liquibase updateSQL > updateVersion1.1.sql, and take a look at updateVersion1.1.sql.

updateVersion1.1.sql
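
Same shape as the first upgrade script, but this time only the new change-set shows up (again just a sketch):

-- Changeset Tables/public.employee.sql::C2::bob
ALTER TABLE public.employee ADD COLUMN last_name TEXT;
INSERT INTO databasechangelog ( ... ) VALUES ( ... );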


Nice!  We have captured the changes for our next release and have a script that will upgrade our database.

I hope this has given you an appetite to take a look at evolving your database build process and taking Liquibase for a spin.

Really we just scratched the surface of using Liquibase for our database build and release process, but if this is not database nirvana I don't know what is. With just the most basic setup of Liquibase we have opened the doors to extending our process to new heights. For example, I have hooked this type of setup into a TeamCity continuous integration build and was able to know within a minute of check-in if a script was broken, by building a smoke test database to validate scripts before they are included in the release for QA to test. On a large team this is a huge thing and saves a lot of time, aka money.

If you want to see more content related to this process let me know and I will add a part 2 covering a related topic. Again, if you have trouble or questions leave a comment or feel free to email me bob.henkel gmail.com

Thursday, February 19, 2009

PostgreSQL C Function Debugging

As of late I have been becoming more and more interested in PostgreSQL (pg). I have a few high level reasons for this interest. One, the economy is down and it seems of value to learn about a full featured 100% free database. You never know when a database like pg might fill a business need that would have been filled with an expensive proprietary database when the economy was strong. It's also got a nice set of features and a really great/smart/geeky community to help and support you.

This post will only focus on creating and debugging a C function in pg.

To read up on what has already been typed up on the nitty gritty of pg check out these links.
General all encompassing -> http://www.postgresql.org/docs/8.3/interactive/index.html
Stored functions in C -> http://www.postgresql.org/docs/8.3/interactive/xfunc-c.html

I run my pg server on old desktop PC hardware. The CPU is 800MHz and there is one gig of RAM. The box is running FreeBSD 7.0 32-bit. Pg is version 8.3.5.

Here is my super simple and almost pointless C function that I used to grasp the basics of creating and debugging a C function in pg.
/* foo.c */
#include "postgres.h"
#include <string.h>
#include "fmgr.h"

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(c_loop);

Datum
c_loop(PG_FUNCTION_ARGS)
{
    /* Count the first argument down past zero in a tight loop. */
    int32 arg = PG_GETARG_INT32(0);

    while (arg >= 0)
    {
        arg--;
    }

    /* Always ends at -1 for any non-negative input. */
    PG_RETURN_INT32(arg);
}

To create a C function and compile it, follow the steps I have outlined below.

Prerequisites for how I did this:
  1. Pg server compiled with --enable-debug (NOT 100% sure if this is needed yet; need to double check)
  2. The pg server up and running and you are able to connect to it.
  3. Make sure you have the pg source code on the same machine you are compiling on. To compile your C function you need some of the pg source code. I'm doing all my compiles on FreeBSD 7.0, so if you are using another OS look here http://www.postgresql.org/docs/8.3/interactive/xfunc-c.html in the section titled 34.9.6. Compiling and Linking Dynamically-Loaded Functions.
  4. I'm logged in as the postgres UNIX account when creating my file and compiling. I'm also logged in as the postgres account when connecting to the database via psql command line tool.
  5. You have a c compiler and gdb installed.
Steps to compile and debug:
  1. Copy the code above into file foo.c and save it.
  2. Compile foo.c with debug symbols by adding the -g option. For example, on FreeBSD you do this: gcc -g -fpic -c foo.c -I/usr/home/postgres/src/include and then gcc -g -shared -o foo.so foo.o. You shouldn't get any errors if all goes well.
  3. Make sure your foo.so is in a location that the pg server can see. If postgres is the account running the server, which most likely it will be, make sure you put the file in a location that postgres has access to. I put mine in a directory that postgres owns full rights to.
  4. Execute the following from psql.
    CREATE FUNCTION public.c_loop(integer) RETURNS integer
    AS '/usr/home/postgres/foo', 'c_loop' /*Change to point to your foo file location*/
    LANGUAGE C STRICT;
    This creates a stored function in the database that we can call via SQL to invoke our C function.
  5. To make sure all is well with our new database function execute select public.c_loop(10); from psql. It should execute to completion and return -1.
  6. Now onto the debugging. Log into psql and LOAD the C function like this -> load '/usr/home/postgres/foo'; This will cause the pg server to load the object so we can debug it.
  7. Find the back end process that you are connected to in psql by running select pg_backend_pid(); while in the same psql session.
  8. Open another terminal session and run gdb for the pid returned from step 7 like this -> gdb /usr/local/pgsql/bin/postgres 27069
  9. Now from the gdb session you started in step 8 set a break point in the c function by typing -> b c_loop
  10. Now type continue or cont
  11. Go back to your psql session and execute the function via SQL like this -> SELECT public.c_loop(10);
  12. Go back to your gdb session and check that you hit the break point. You can now step through the C function in gdb. Type s and hit enter and you should be stepping through the code.
You now have a high performance stored function language that you can depend on for computationally intensive needs, with the added bonus of being able to debug the code with a debugger. Not bad for all of this being free. Feel free to ask questions and I will try to answer them.