Thursday, 15 November 2012

Mapping Value through Your Most Important Asset

Today I chatted with someone getting ready to go live with a new system and were in the process of making their legacy data fit with the new platform. It was a familiar scenario with the existing data from a 10 year old system, that grew out of nothing, had little validation and a data structure creaking at the seams. 
Everyone benefits from a combined approach 
towards better data quality.

With most ageing systems there are variations of the same products, companies and contacts. The resulting duplicates are generally held together by long serving staff who know the data intimately and can readily reason why it sits in the way it does.

Migration to a new platform with added  validation and business rules can pose interesting challenges for any BI or BA consultant but therein lies an opportunity. 

A rationalised mapping approach that involves key business users in a sit down exercise will give value back to their most important asset. By creating two-column lists with distinct values of legacy data on the left and their equivalent on the new platform on the right is a slow and hard slog but worth the effort. It can be done where variations exist in company names, product names or any other referential information. 

Those long serving serving staff that hold dupes together hold the value in this exercise in delivering meaningful mappings. It's an exercise in exchanging ideas and will result in rethinks to the functionality of the new platform as the nature of the data is explored. It will also result in a cleaner data set.

From a techie point of view the mappings form part of the data transformation between legacy and destination platforms which the data is driven through en route to success.

This sit down exercise has huge benefits as Insight relies on aspects of search, results, related results, relevance and ranking. These are the ingredients of the moment as most businesses reach out across a complicated digital landscape.

Monday, 5 November 2012

A Little Space Saving When Migrating Dates

Many years ago I worked on a Human Resources and Payroll system that sat on a Northgate system. 

Those familiar with Northgate systems will tell you the platform is based on the Pick operating system. All data is held in a character format and dates are held as numbers. 

Each date is a number relative to the 1st January 1970 so 365 would be 1st January 1971 and -365 would be 1st January 1969 and so forth. 

This notion stuck with me and have since used the same approach a couple of times when migrating large volumes data between platforms using flat files. 

The method of using views to migrate dates as a number relative to a chosen date using DATEDIFF function (SQL Server/Oracle) will save you space in any extract file. Times can also be tackled the same way output as seconds, thousandths of seconds, or even nano seconds relative to mid-night. 

Moving these dates onto the target platform is then achieved using the DATEADD function (SQL Server/Oracle) by adding the number migrated to the relative date chosen against  the previous extract.

This might seem like a little issue and little saving but with high data volumes prevalent these days it might just make a little bit of sense.

' hope this helps.....a little.

Enjoy :)

Tuesday, 30 October 2012

Store and Report Unlimited Multiple Choices using Exponentiation

non-SQL developers look away now....

Much is in the air at the moment about digital tsunami, the Big Data movement and Business Intelligence technologies to name but a few. This post offers a concept that may reduce a tiny bit of the space along the way by showing how to store multiple choice answers in a single location.

One concept of storing option choices from forms, surveys and online exams for example is to store them combined as a single value. This removes the headache, sorry overhead, of dealing with dozens of boolean yes/no values that appear in a raw dataset and allows the Business Intelligence expert a more manageable view of what's on her plate. 

The method takes advantage of exponentiation where the values against option boxes are used as the exponent or base against an index value of 2.  
I said any number of choices but this is actually limited to a database server's own limitations or the  limits imposed by cloud based systems as part of their own governance rules.


Let's say we have a form on a web page that has 5 option boxes and a person completes the form and chooses the first, third and fifth choices. 

Each option box should have assigned values 1 to 5 respectively and based on the choices made (1,3 and 5) we derive the single value to store as follows : 

21 (option box 1) which is 2
23 (option box 3) which is 8
25 (option box 5) which is 32

So the single number to store that represents the choices 1,3 and 5 becomes 2 + 8 + 32 which is 42

We of course need to unravel the number 42 to display the choices back onto the page or app ( hint here to the Meaning of Life!). To do this we start by taking the highest valued choice box and test if it's associated 25 value can be subtracted from 42 and if it can then we mark it's equivalent check box. The remainder is then used against the next lower value option box and repeated until we reach the lowest value box. 

The steps below outline the full approach for this example.

1. Is 42 - 25 (25) greater or equal to 0?  
    42 - 25 = 10.
    Yes it is so we subtract it and use the remainder in the next step.

    We also mark the fifth choice option box as checked. 
  2. Is 10 - 24 (16) greater or equal to 0?
    10 - 16 = -6
    No it's not so we leave it and don't check option box 4. Move to the next step.

3. Is 10 - 23 (8) greater or equal to 0?
    10 - 8 = 2
    Yes it is so we subtract it and use the remainder in the next step.

    We also mark the third choice option box as checked. 

4. Is 2 - 22 (4) greater or equal to 0?
    2 - 4 = -2
    No it's not so we leave it and don't check option box 2.

5. Is 2 - 21 (2) greater or equal to 0?
    2 - 2 = 0
    Yes it is.

    We also mark the first choice option box as checked. 

This will now display option box 1,3 and 5 as checked.

Lastly, the single stored value will need reported on and for this you need to create a view which will be made up of a series of case statements one for each yes/no choice based on the logic above. 
For those using SQL Server then a Common Table Expression (CTE) would provide a more efficient alternative.

I have offered a concept here without any implementation code. This is because the problem can be solved using SQL, JavaScript, PHP, Ruby, C# or many other alternatives depending on the platform in use. 

I hope this helps :)

Enjoy !

Thursday, 25 October 2012

Letting Views do the Work

non-SQL developers look away now....

Businesses need to know how they're doing on any given day. 
Any financial reporting system will have comparisons of the current month, quarter or year against it's previous respective time periods in order to track it's KPI values.

The key dates used in these comparisons move all the time and one easy way to configure any reporting output is by using views. 

The DBA Developer can create a view such as the one below which returns only one row. This row consists of four dates: first and last day of the current month and the first and last day of the previous month. 

Each of the four dates are constructed by building a string using combination the DATEPART and DATEADD functions which are then converted to a date time format.

create view vw_key_report_dates
select    convert(datetime,'01-' + right('0' + cast(datepart(MM,dbo.dt_notime()) as varchar(2)),2) + '-'
            + cast(datepart(YYYY,dbo.dt_notime()) as varchar(4)),105) as curr_mon_first_dy
             ,dateadd(m,1,convert(datetime,'01-' + right('0' + cast(datepart(MM,dbo.dt_notime()) as varchar(2)),2) + '-'
            + cast(datepart(YYYY,dbo.dt_notime()) as varchar(4)),105))-1 as curr_mon_last_dy
            ,dateadd(m,-1,convert(datetime,'01-' + right('0' + cast(datepart(MM,dbo.dt_notime()) as varchar(2)),2) + '-'
            + cast(datepart(YYYY,dbo.dt_notime()) as varchar(4)),105)) as prev_mon_first_dy
            ,dateadd(d,-1,convert(datetime,'01-' + right('0' + cast(datepart(MM,dbo.dt_notime()) as varchar(2)),2) + '-'
            + cast(datepart(YYYY,dbo.dt_notime()) as varchar(4)),105)) as prev_mon_last_dy

select *
from vw_key_report_dates

curr_mon_first_dy    curr_mon_last_dy  prev_mon_first_day   prev_mon_last_dy
2012-10-01 00:00     2012-10-31 00:00  2012-09-01 00:00     2012-09-30 00:00

(1 row(s) affected)

This view becomes useful when extended to define quarters, years and any accounting periods that the business may be. It also becomes useful to the developer as they can let the view take care of setting the key dates in their reporting system by taking advantage of the always moving the database server's system date. This ensures that the view moves in time with the business.

The simple report below returns the total revenue value for the previous month. It does this by using an exotic join between the revenue table and the view joining on the range of dates required set as prev_mon_first_day and  prev_mon_last_dy.

select sum(rev_value) as total_last_month
from revenue_table rt
join vw_key_report_dates vw 
on rt.rev_date between prev_mon_first_day and prev_mon_last_dy

I hope this helps :)


Wednesday, 24 October 2012

Generating Unique Values Against a New Dataset

non-SQL developers look away now....

Here's another Data Analysis tip...

When dealing with brand new datasets a DBA Developer sometimes needs to add a unique value which will form a way of identifying each row. The ROW_NUMBER() function is an obvious choice but if the data contains duplicates or has data quality issues it can be tricky. This method doesn't rely on the data itself, it just adds the unique values you need.

First we add a column that will contain the incremental number......

-- add a new column called rownum
alter table test_data add rownum int

Now we run the following three line script......

-- declare a variable that be used to assign incremental numbers
declare @i int
--assign a starting point for our id's

select @i = 10000000
-- The update statement will populate the column 
-- with consecutive numbers starting with 10000001
update test_data set @i = rownum = @i +1

I used 10000000 as a starting point so that each number will be the same length and will appear consistent when used in a reporting scenario and when exported back out to tools such as Microsoft Excel. 

One to have in your SQL tool kit for sure.....Enjoy!

Thursday, 18 October 2012

ThePower of Exotic Joins and why having a Calendar table is useful

non-SQL developers look away now....

The Scenario:

On her way back from the polo club, the director of marketing was driving across town and noticed several different billboard campaigns been ran by his company. Concerned about the inconsistency of how her brand’s identity was been rolled out he posed a question to his PA to find out all campaigns currently in progress.

The Problem:
In our scenario each campaign has a start and end date along with a campaign id. Let’s say they reside in a table called campaign which is defined as follows:

       campaign_id int
      ,date_start  datetime
      ,date_from   datetime

For this example the following data can be inserted which shows that campaign id’s 1 and 3 will overlap:

insert into dbo.campaign values( 1,'01-jan-2011','10-jan-2011')
insert into dbo.campaign values( 2,'01-feb-2011','10-feb-2011')
insert into dbo.campaign values( 3,'05-jan-2011','09-jan-2011')
insert into dbo.campaign values( 4,'01-mar-2011','10-mar-2011')
insert into dbo.campaign values( 5,'01-jan-2011','10-jan-2011')
insert into dbo.campaign values( 6,'01-feb-2011','10-feb-2011')
insert into dbo.campaign values( 7,'05-jan-2011','09-jan-2011')
insert into dbo.campaign values( 8,'01-mar-2011','10-mar-2011')
insert into dbo.campaign values( 9,'01-jan-2011','10-jan-2011')
insert into dbo.campaign values(10,'01-feb-2011','10-feb-2011')
insert into dbo.campaign values(11,'05-jan-2011','09-jan-2011')
insert into dbo.campaign values(12,'01-mar-2011','10-mar-2011')
insert into dbo.campaign values(13,'01-jan-2011','10-jan-2011')
insert into dbo.campaign values(14,'01-feb-2011','10-feb-2011')
insert into dbo.campaign values(15,'05-jan-2011','09-jan-2011')
insert into dbo.campaign values(16,'01-mar-2011','10-mar-2011')
insert into dbo.campaign values(17,'01-jan-2011','10-jan-2011')
insert into dbo.campaign values(18,'01-feb-2011','10-feb-2011')
insert into dbo.campaign values(19,'05-jan-2011','09-jan-2011')
insert into dbo.campaign values(20,'01-mar-2011','10-mar-2011')

Each row of data holds a start and end date range which we need to compare against each of the other rows in the campaign table. From a Business Intelligence view point this data doesn't present much granularity.

Date ranges stored in this fashion may also have associated hidden business implications linked to types of days such as week days, weekend days or bank holidays which could distort the desired result. This would lead to assumptions and misconceptions leading to non-realistic reporting.

The Solution:
For a set based approach we need to transform each row of summarized data defined by the date range into individual rows. 

To achieve this we need another table called calendar which, for the purposes of this scenario, has only one column called date. The date column should be populated consecutive dates that will span all start and end dates in the campaign table .
 TABLE calendar(date  datetime)

The resultant record set from an exotic join will form a collection of individual days of when each campaign is/was in force.

This derived record set provides a starting point to the solution.

Let’s take the full list of steps in order:
1. Transform the date range data set into a record set of individual days.
     Include the campaign id and create a derived table which we can then call dr1.

2. Make a copy of step 1 and create a derived table which we can then call dr2.
3. Create a select statement that joins both derived tables on the day column key and identify the common overlapping days.
4. Add a where clause to exclude equal campaign id’s across both derived tables so that joined rows (ie dates) are only on differing campaigns.

Step 1: Transform the date ranges and create the derived table dr1.
By using the between keyword we specify the date range to join onto the calendar table. This will spawn the rows we are interested in.

      --will be used as derived table dr1
select      c1.campaign_id
      from dbo.campaign c1
join dbo.calendar c2 on c2.date_day
between c1.date_start and c1.date_from
      where c1.campaign_id is not null

Step 2: Same as Step 1 that creates the derived table dr2.

      --will be used as derived table dr2
select      c1.campaign_id
      from dbo.campaign c1
join dbo.calendar c2 on c2.date_day
between c1.date_start and c1.date_from
      where c1.campaign_id is not null

Step 3 and 4: The SELECT statement joins both derived tables on the day column key and identifies the common overlapping days. The WHERE clause is then used to exclude campaign's that are common across both derived tables.

    select  distinct dr1.campaign_id
      select      c1.campaign_id
      from dbo.campaign c1
join dbo.calendar c2 on c2.date_day
between c1.date_start and c1.date_from
      where c1.campaign_id is not null
      ) dr1
      select      c1.campaign_id
      from dbo.campaign c1
join dbo.calendar c2 on c2.date_day 
between c1.date_start and c1.date_from
      where c1.campaign_id is not null
      ) dr2
      on dr1.date_day = dr2.date_day
      where dr1.campaign_id <> dr2.campaign_id


(2 row(s) affected)

This shows the power of joining on ranges and the power of using a calendar table. The calendar table can also be extended to include columns that depict days such as bank holidays, weekends which may affect any Business Intelligence reporting system.

Thus endith the lesson :)

Wednesday, 10 October 2012

BR Virtual Summit 2012

Brand Republic Virtual Summit 2012

Today I attended what is the future of summits and conferences organised by the well the respected Brand Republic group.

I've been a keen reader of Brand Republic's Campaign publication for the past 6 years and seen it's commentary change throughout that time from traditional type of marketing (billboard/newspaper ads, inserts and paper promotions etc) to digital marketing which is so very much of today's engagement. Opening a copy of Campaign these days and it's coverage is almost all centred around digital media.
On registering for the summit each delegate is asked for their basic details such as  company and position. After a short tutorial on how to navigate through the auditorium you are free to wander through using your avatar and visit the stands on display, attend webinars, download brochures and engage with other avatars attending.
Each stand has a representative from a company there to offer advice and engage with via their own avatar. Brand Republic's own stand was there offering live sessions and I signed up to some on both days.
The first of which was a keynote speech from BR's senior reporter which set the context of the event by outlining where the industry is today. The key note reminded listeners that difficult economics times are still with us and the importance of a more intuitive and better customer experience is essential for any business to survive and expand. 

The growing complexity that faces CMOs are bound together by huge data growth, multi-channel, multi-platform, social media and the shift in customer demographics. Social engagement is a sophisticated business these days and one that can swing positive or negative at any given time. Because of this a combination of good technology and people need to be in place to deliver effectively and with some affinity.

In terms of building a framework that will engage with a social media environment the best advice came from O2's Social Media Manager UK, Kristian Lorenzon. Chris advised "Start small, think big". O2's social media team that now sits at 15 and answers queries fielded through Twitter and Facebook from 1000's of customers. The team of course announce new products and services engaging with customers across all the main social network platforms.
The seminar, partly titled "how to turn a crisis into a triumph",  allowed  Kristian to recount the day in July when O2's network went down. The outage prompted the biggest social media event ever for O2 with more that 120,000 tweets. Several unedited examples of tweets where shown during the seminar which were very direct, very course and included threats to cancel contracts. 
Turning this situation around meant keeping cool and getting the team answering each message individually thus slowly turning the on-line sentiment around. Not panicking, staying in control,  knowing your audience and acknowledging each individual's frustrations were key factors to this slow recovery.
In the days that followed O2 announced a goodwill gesture to all affected customers against their September bill along with an in-store voucher for £10. This prompted over 10,000 goodwill messages in return over the two days that followed.
The case study showed just how powerful the medium is and highlighted the sensitivity of a group needing a vital service.
The seminar ran three interactive polls during the course of the hour with one poll posing the question "What is the main driver for your use of social media". This poll returned a close draw  between "Building brand awareness" and "Creating audience engagement". The theme common to both is that of a relationships.  
The other two choices of "Driving Sales" and "We don't use social media" scored almost no votes. 

You can follow a recording of this live event here :
Social Media - it's great but show me the money !

Later in the day I attended another live event which focused on online advertising guested by Louise Bolt, Digital Marketing Officer, from Macmillan Cancer Support. With a tight budget to work with, taken up mostly by TV  spend, the remainder is spread over running campaigns promoting the many different areas that make up the charity. The big message from this live session is a continual trial and error approach and see what works and what doesn't. By running different campaigns across many platforms responses can be measured by the traffic driven to the charity's website and how engaged each user is when they got there. Working towards an optimised approach that drives engagement is the objective here. By reaching new users and raising the awareness of the charity and targeting contextually on relevant websites the team at Macmillan are able to achieve positive results. Use of analytics to measure user engagement and KPIs are key.
Louise outlined a growing trend towards traffic coming from mobiles with last year accounting for 30% and this area would be tested and tried further as time goes on.

The message I took aware from this live event was testing, trial and error, more testing and learning from what went before. Building as you progress is the approach to gaining a creditable online audience.

You can follow a recording of this live event here :
Online Advertising - ignored window dressing or targeted sales-driver

This was a great conference and would not have been possible without the sponsors:

and of course....

Brand Republic