Friday 4 July 2014

Manningtree, The Gateway to Constable Country

Manningtree Station, Constable Country
Manning Tree Station
Constable Country is so called because of it's links to John Constable, the English painter painted his best known works in and around Dedham Vale. These romantic pieces include famouse The Hay Wain. To get to Dedham Vale take the train from Liverpool Street station to Manning Tree Station and follow the country path along the River Stour.

When you get to Dedham Vale The Hay Wain awaits which you can enjoy over tea in a garden courtesy of the National Trust.

You can reach Dedham from Liverpool Street by taking a train to Manningtree station or alternatively Flatford.
See National Rail Enquiries for more details.
 

Thursday 15 November 2012

Mapping Value through Your Most Important Asset

Today I chatted with someone getting ready to go live with a new system and were in the process of making their legacy data fit with the new platform. It was a familiar scenario with the existing data from a 10 year old system, that grew out of nothing, had little validation and a data structure creaking at the seams. 
Everyone benefits from a combined approach 
towards better data quality.

With most ageing systems there are variations of the same products, companies and contacts. The resulting duplicates are generally held together by long serving staff who know the data intimately and can readily reason why it sits in the way it does.

Migration to a new platform with added  validation and business rules can pose interesting challenges for any BI or BA consultant but therein lies an opportunity. 

A rationalised mapping approach that involves key business users in a sit down exercise will give value back to their most important asset. By creating two-column lists with distinct values of legacy data on the left and their equivalent on the new platform on the right is a slow and hard slog but worth the effort. It can be done where variations exist in company names, product names or any other referential information. 

Those long serving serving staff that hold dupes together hold the value in this exercise in delivering meaningful mappings. It's an exercise in exchanging ideas and will result in rethinks to the functionality of the new platform as the nature of the data is explored. It will also result in a cleaner data set.

From a techie point of view the mappings form part of the data transformation between legacy and destination platforms which the data is driven through en route to success.

This sit down exercise has huge benefits as Insight relies on aspects of search, results, related results, relevance and ranking. These are the ingredients of the moment as most businesses reach out across a complicated digital landscape.

Monday 5 November 2012

A Little Space Saving When Migrating Dates

Many years ago I worked on a Human Resources and Payroll system that sat on a Northgate system. 

Those familiar with Northgate systems will tell you the platform is based on the Pick operating system. All data is held in a character format and dates are held as numbers. 

Each date is a number relative to the 1st January 1970 so 365 would be 1st January 1971 and -365 would be 1st January 1969 and so forth. 

This notion stuck with me and have since used the same approach a couple of times when migrating large volumes data between platforms using flat files. 

The method of using views to migrate dates as a number relative to a chosen date using DATEDIFF function (SQL Server/Oracle) will save you space in any extract file. Times can also be tackled the same way output as seconds, thousandths of seconds, or even nano seconds relative to mid-night. 

Moving these dates onto the target platform is then achieved using the DATEADD function (SQL Server/Oracle) by adding the number migrated to the relative date chosen against  the previous extract.

This might seem like a little issue and little saving but with high data volumes prevalent these days it might just make a little bit of sense.

' hope this helps.....a little.

Enjoy :)

Tuesday 30 October 2012

Store and Report Unlimited Multiple Choices using Exponentiation

non-SQL developers look away now....

Much is in the air at the moment about digital tsunami, the Big Data movement and Business Intelligence technologies to name but a few. This post offers a concept that may reduce a tiny bit of the space along the way by showing how to store multiple choice answers in a single location.

One concept of storing option choices from forms, surveys and online exams for example is to store them combined as a single value. This removes the headache, sorry overhead, of dealing with dozens of boolean yes/no values that appear in a raw dataset and allows the Business Intelligence expert a more manageable view of what's on her plate. 


The method takes advantage of exponentiation where the values against option boxes are used as the exponent or base against an index value of 2.  
I said any number of choices but this is actually limited to a database server's own limitations or the  limits imposed by cloud based systems as part of their own governance rules.

Scenario:

Let's say we have a form on a web page that has 5 option boxes and a person completes the form and chooses the first, third and fifth choices. 

Each option box should have assigned values 1 to 5 respectively and based on the choices made (1,3 and 5) we derive the single value to store as follows : 

21 (option box 1) which is 2
+
23 (option box 3) which is 8
+
25 (option box 5) which is 32

So the single number to store that represents the choices 1,3 and 5 becomes 2 + 8 + 32 which is 42

We of course need to unravel the number 42 to display the choices back onto the page or app (p.s...no hint here to the Meaning of Life!). To do this we start by taking the highest valued choice box and test if it's associated 25 value can be subtracted from 42 and if it can then we mark it's equivalent check box. The remainder is then used against the next lower value option box and repeated until we reach the lowest value box. 

The steps below outline the full approach for this example.

1. Is 42 - 25 (25) greater or equal to 0?  
    42 - 25 = 10.
    Yes it is so we subtract it and use the remainder in the next step.

    We also mark the fifth choice option box as checked. 
  2. Is 10 - 24 (16) greater or equal to 0?
    10 - 16 = -6
    No it's not so we leave it and don't check option box 4. Move to the next step.

3. Is 10 - 23 (8) greater or equal to 0?
    10 - 8 = 2
    Yes it is so we subtract it and use the remainder in the next step.

    We also mark the third choice option box as checked. 

4. Is 2 - 22 (4) greater or equal to 0?
    2 - 4 = -2
    No it's not so we leave it and don't check option box 2.

5. Is 2 - 21 (2) greater or equal to 0?
    2 - 2 = 0
    Yes it is.

    We also mark the first choice option box as checked. 

This will now display option box 1,3 and 5 as checked.

Lastly, the single stored value will need reported on and for this you need to create a view which will be made up of a series of case statements one for each yes/no choice based on the logic above. 
For those using SQL Server then a Common Table Expression (CTE) would provide a more efficient alternative.

I have offered a concept here without any implementation code. This is because the problem can be solved using SQL, JavaScript, PHP, Ruby, C# or many other alternatives depending on the platform in use. 

I hope this helps :)

Enjoy !

Thursday 25 October 2012

Letting Views do the Work





non-SQL developers look away now....

Businesses need to know how they're doing on any given day. 
Any financial reporting system will have comparisons of the current month, quarter or year against it's previous respective time periods in order to track it's KPI values.

The key dates used in these comparisons move all the time and one easy way to configure any reporting output is by using views. 

The DBA Developer can create a view such as the one below which returns only one row. This row consists of four dates: first and last day of the current month and the first and last day of the previous month. 

Each of the four dates are constructed by building a string using combination the DATEPART and DATEADD functions which are then converted to a date time format.

create view vw_key_report_dates
as
select    convert(datetime,'01-' + right('0' + cast(datepart(MM,dbo.dt_notime()) as varchar(2)),2) + '-'
            + cast(datepart(YYYY,dbo.dt_notime()) as varchar(4)),105) as curr_mon_first_dy
             ,dateadd(m,1,convert(datetime,'01-' + right('0' + cast(datepart(MM,dbo.dt_notime()) as varchar(2)),2) + '-'
            + cast(datepart(YYYY,dbo.dt_notime()) as varchar(4)),105))-1 as curr_mon_last_dy
            ,dateadd(m,-1,convert(datetime,'01-' + right('0' + cast(datepart(MM,dbo.dt_notime()) as varchar(2)),2) + '-'
            + cast(datepart(YYYY,dbo.dt_notime()) as varchar(4)),105)) as prev_mon_first_dy
            ,dateadd(d,-1,convert(datetime,'01-' + right('0' + cast(datepart(MM,dbo.dt_notime()) as varchar(2)),2) + '-'
            + cast(datepart(YYYY,dbo.dt_notime()) as varchar(4)),105)) as prev_mon_last_dy


select *
from vw_key_report_dates

curr_mon_first_dy    curr_mon_last_dy  prev_mon_first_day   prev_mon_last_dy
2012-10-01 00:00     2012-10-31 00:00  2012-09-01 00:00     2012-09-30 00:00

(1 row(s) affected)

This view becomes useful when extended to define quarters, years and any accounting periods that the business may be. It also becomes useful to the developer as they can let the view take care of setting the key dates in their reporting system by taking advantage of the always moving the database server's system date. This ensures that the view moves in time with the business.

The simple report below returns the total revenue value for the previous month. It does this by using an exotic join between the revenue table and the view joining on the range of dates required set as prev_mon_first_day and  prev_mon_last_dy.


select sum(rev_value) as total_last_month
from revenue_table rt
join vw_key_report_dates vw 
on rt.rev_date between prev_mon_first_day and prev_mon_last_dy


I hope this helps :)

Enjoy!

Wednesday 24 October 2012

Generating Unique Values Against a New Dataset

non-SQL developers look away now....

Here's another Data Analysis tip...

When dealing with brand new datasets a DBA Developer sometimes needs to add a unique value which will form a way of identifying each row. The ROW_NUMBER() function is an obvious choice but if the data contains duplicates or has data quality issues it can be tricky. This method doesn't rely on the data itself, it just adds the unique values you need.

First we add a column that will contain the incremental number......


-- add a new column called rownum
alter table test_data add rownum int

Now we run the following three line script......

-- declare a variable that be used to assign incremental numbers
declare @i int
--assign a starting point for our id's

select @i = 10000000
-- The update statement will populate the column 
-- with consecutive numbers starting with 10000001
update test_data set @i = rownum = @i +1

I used 10000000 as a starting point so that each number will be the same length and will appear consistent when used in a reporting scenario and when exported back out to tools such as Microsoft Excel. 

One to have in your SQL tool kit for sure.....Enjoy!



Thursday 18 October 2012

ThePower of Exotic Joins and why having a Calendar table is useful


non-SQL developers look away now....

The Scenario:

On her way back from the polo club, the director of marketing was driving across town and noticed several different billboard campaigns been ran by his company. Concerned about the inconsistency of how her brand’s identity was been rolled out he posed a question to his PA to find out all campaigns currently in progress.

The Problem:
In our scenario each campaign has a start and end date along with a campaign id. Let’s say they reside in a table called campaign which is defined as follows:


CREATE TABLE campaign
      (
       campaign_id int
      ,date_start  datetime
      ,date_from   datetime
      )


For this example the following data can be inserted which shows that campaign id’s 1 and 3 will overlap:


insert into dbo.campaign values( 1,'01-jan-2011','10-jan-2011')
insert into dbo.campaign values( 2,'01-feb-2011','10-feb-2011')
insert into dbo.campaign values( 3,'05-jan-2011','09-jan-2011')
insert into dbo.campaign values( 4,'01-mar-2011','10-mar-2011')
insert into dbo.campaign values( 5,'01-jan-2011','10-jan-2011')
insert into dbo.campaign values( 6,'01-feb-2011','10-feb-2011')
insert into dbo.campaign values( 7,'05-jan-2011','09-jan-2011')
insert into dbo.campaign values( 8,'01-mar-2011','10-mar-2011')
insert into dbo.campaign values( 9,'01-jan-2011','10-jan-2011')
insert into dbo.campaign values(10,'01-feb-2011','10-feb-2011')
insert into dbo.campaign values(11,'05-jan-2011','09-jan-2011')
insert into dbo.campaign values(12,'01-mar-2011','10-mar-2011')
insert into dbo.campaign values(13,'01-jan-2011','10-jan-2011')
insert into dbo.campaign values(14,'01-feb-2011','10-feb-2011')
insert into dbo.campaign values(15,'05-jan-2011','09-jan-2011')
insert into dbo.campaign values(16,'01-mar-2011','10-mar-2011')
insert into dbo.campaign values(17,'01-jan-2011','10-jan-2011')
insert into dbo.campaign values(18,'01-feb-2011','10-feb-2011')
insert into dbo.campaign values(19,'05-jan-2011','09-jan-2011')
insert into dbo.campaign values(20,'01-mar-2011','10-mar-2011')


Each row of data holds a start and end date range which we need to compare against each of the other rows in the campaign table. From a Business Intelligence view point this data doesn't present much granularity.

Date ranges stored in this fashion may also have associated hidden business implications linked to types of days such as week days, weekend days or bank holidays which could distort the desired result. This would lead to assumptions and misconceptions leading to non-realistic reporting.


The Solution:
For a set based approach we need to transform each row of summarized data defined by the date range into individual rows. 

To achieve this we need another table called calendar which, for the purposes of this scenario, has only one column called date. The date column should be populated consecutive dates that will span all start and end dates in the campaign table .
CREATE
 TABLE calendar(date  datetime)




The resultant record set from an exotic join will form a collection of individual days of when each campaign is/was in force.

This derived record set provides a starting point to the solution.

Let’s take the full list of steps in order:
1. Transform the date range data set into a record set of individual days.
     Include the campaign id and create a derived table which we can then call dr1.

2. Make a copy of step 1 and create a derived table which we can then call dr2.
3. Create a select statement that joins both derived tables on the day column key and identify the common overlapping days.
4. Add a where clause to exclude equal campaign id’s across both derived tables so that joined rows (ie dates) are only on differing campaigns.


Step 1: Transform the date ranges and create the derived table dr1.
By using the between keyword we specify the date range to join onto the calendar table. This will spawn the rows we are interested in.


      --will be used as derived table dr1
select      c1.campaign_id
                  ,c2.date_day
      from dbo.campaign c1
join dbo.calendar c2 on c2.date_day
between c1.date_start and c1.date_from
      where c1.campaign_id is not null


Step 2: Same as Step 1 that creates the derived table dr2.


      --will be used as derived table dr2
select      c1.campaign_id
                  ,c2.date_day
      from dbo.campaign c1
join dbo.calendar c2 on c2.date_day
between c1.date_start and c1.date_from
      where c1.campaign_id is not null



Step 3 and 4: The SELECT statement joins both derived tables on the day column key and identifies the common overlapping days. The WHERE clause is then used to exclude campaign's that are common across both derived tables.


    select  distinct dr1.campaign_id
    from
      (
      select      c1.campaign_id
                  ,c2.date_day
      from dbo.campaign c1
join dbo.calendar c2 on c2.date_day
between c1.date_start and c1.date_from
      where c1.campaign_id is not null
      ) dr1
      join
      (
      select      c1.campaign_id
                  ,c2.date_day
      from dbo.campaign c1
join dbo.calendar c2 on c2.date_day 
between c1.date_start and c1.date_from
      where c1.campaign_id is not null
      ) dr2
      on dr1.date_day = dr2.date_day
      where dr1.campaign_id <> dr2.campaign_id

      Results
campaign_id
-----------
1
3

(2 row(s) affected)


This shows the power of joining on ranges and the power of using a calendar table. The calendar table can also be extended to include columns that depict days such as bank holidays, weekends which may affect any Business Intelligence reporting system.


Thus endith the lesson :)