Fantasy Football – When is a fact not a Fact?

February 2nd, 2010 by Adrian Ward

As you will see from Venkat’s Football blog, we ran a demo of the OBIEE system at last years UKOUG.  The demo was based upon the English Football Premiership, and although I prefer rugby, using sports in samples is a great way to highlight the features of OBIEE.  It seems that some sports, such as Cricket, American Football and Tennis seem to compete on the number of statistics they can find in a match!

You don’t need to understand the English football league to understand the demo, however, for football managers there are dozens of facts that can be analysed, and can be related too.
For example, you can run a request to show teams that are level at half time, but go on to lose the game in the second half.  As a coach you may draw from that fitness needs to be improved, along with concentration (more drinks?), and a better half time talk.

So you can use the ’statistics’ to help identify where things are going wrong, and when you are doing the right things, for example losing at away teams far away (change the travel arrangements?).

The great thing with sports stats is that you can also analyse your opponent to find their weaknesses – do they concede early goals (we have to come out strong and attacking from the whistle).

The other ‘interesting’ thing about using sports results in OBIEE is that they clearly demonstrate how facts are sometimes measures and sometimes dimensional attributes…..

Reporting on Fact based Dimensional Attributes

Lets use an example to explore this:
The exam question is this:

How many games was my team losing at half time, but then went on to win.

We will then drill into these games to try to understand the cause of this.

Remember the question for later!

The measures we have employed include:

  • Goals scored
  • Home Team Score at Half Time
  • Home Team Score at Full time
  • Away Team Score at Half Time
  • Away Team Score at Full time
  • Home Team Red cards,
  • Home Team Yellow Cards,
  • etc

These measures are set with an aggregation of SUM, but we could also set many of the other aggegations, such as AVERAGE.
If you run the following request,

Home Stats

the SQL generated is:

SELECT T6428.YEAR AS C1,
T6428.HOMETEAM AS C2,
‘HOME’ AS C3,
SUM(T6428.FTHG) AS C4,
SUM(T6428.FTAG) AS C5
FROM
PREMIER_LEAGUE T6428 /* HOME TEAM – PREMIER LEAGUE */
WHERE ( T6428.YEAR = ‘2009′ )
GROUP BY T6428.HOMETEAM, T6428.YEAR
ORDER BY C1, C2, C3
** FTHG = Full Time Home Team Goals
** FTAG = Full Time Away Team Goals

Note the “SUM(T6428.FTHG) AS C4″ which is from the Sum Aggregate in the logical Column.

Quick Data Modelling Aside

Before I go further a quick note on how the data could be modelled.

There are essentially three ways (I can quickly think of) we could model the ‘result’ of a match.

  1. Capture the result in the ‘Match’ fact record.
    Two Examples
    a) We could add a field called HTR – Half Time Result – and enter words or codes such as ‘Home Win’ or ‘HW’.
    b) We could add field called HW_HT (short for Home Win Half Time) and put a 0 or 1 in if it true
  2. No Results code – just capture the goals (let the reporting layer do the work)
  3. Place the result codes in a fact table of their own.
    Create a table called MatchResults, with columns

MATCH_ID,   RESULT_CODE,   RESULT

a sample set of record for a match could be -

123, HW_HT, 0

123, AW_HT, 0

123, DR_HT, 1

Back to the scenario

Let’s look at a Scenario 1(a) method (which is the chosen method for our Demo).

In the ETL Venkat has created a field called FTR – Full Time Result, which contains the values H (Home Team Wins), D (Draw) and A (Away Team Wins)

If we use this column in a report

6

WITH
SAWITH0 AS (select sum(case when T6428.FTR = ‘H’ then 1 else 0 end ) as c1,
T6428.YEAR as c2,
T6428.HOMETEAM as c3,
T6428.AWAYTEAM as c4,
T6428.DATE1 as c5
from
PREMIER_LEAGUE T6428 /* Home Team – Premier League */
where ( T6428.YEAR = ‘2009′ )
group by T6428.AWAYTEAM, T6428.DATE1, T6428.HOMETEAM, T6428.YEAR)
select distinct SAWITH0.c2 as c1,
SAWITH0.c3 as c2,
SAWITH0.c4 as c3,
SAWITH0.c5 as c4,
‘Home’ as c5,
SAWITH0.c1 as c6
from
SAWITH0
order by c1, c2, c3, c4, c5

In Summary this is:

7

select T6428.YEAR as c1,
T6428.HOMETEAM as c2,
‘Home’ as c3,
sum(case when T6428.FTR = ‘H’ then 1 else 0 end ) as c4
from
PREMIER_LEAGUE T6428 /* Home Team – Premier League */
where ( T6428.YEAR = ‘2009′ )
group by T6428.HOMETEAM, T6428.YEAR
order by c1, c2, c3

Now let’s take a scenario 2 where the ETL for football scores does not populate a set of result fields. This is often the case where database developers are not warehouse or star friendly, and only store the raw data.

So, in order to determine who is winning at half time we have to run a calculation. The choices of where you do this are numerous:

  1. In the database (Create a materialised or normal View)
  2. In the rpd – Create a calculated field which looks at the scores
  3. In a request – In a particular report

The database option probably being the longest one to implement, but it will also be the most performant (if using an MV) when you come to running reports. Creating the calculation in the rpd means it is re-usable. Creating the calculation in the report is probably the easiest to get done in a short time.

Lets have a look at creating a request of home wins at full time.

Add, Year, Team Name, Wins at Full Time.

Get Wins

Wins at Full time =

Case Win

(CASE WHEN ((Home Team Score at Full Time – AWAY Team Score at Full Time)>0) THEN 1 ELSE 0 END)

Which created the following SQL.

WITH
SAWITH0 AS (SELECT SUM(T6428.FTAG) AS C1,
SUM(T6428.FTHG) AS C2,
T6428.YEAR AS C3,
T6428.HOMETEAM AS C4
FROM
PREMIER_LEAGUE T6428 /* HOME TEAM – PREMIER LEAGUE */
WHERE ( T6428.YEAR = ‘2009′ )
GROUP BY T6428.HOMETEAM, T6428.YEAR)
SELECT DISTINCT SAWITH0.C3 AS C1,
SAWITH0.C4 AS C2,
SAWITH0.C2 AS C3,
SAWITH0.C1 AS C4,
CASE WHEN SAWITH0.C2 – SAWITH0.C1 > 0 THEN 1 ELSE 0 END AS C5
FROM
SAWITH0
ORDER BY C1, C2

AH!, So the SQL generated did not give the right answer.

Is that what you expected? Did you definately know it was going to do that?

As you can see it SUMMED the Home Goals and Away goals for the season then looked at the case if sum(Home goals) is greater than sum(away goals). Therefore you do not get an answer for each match, so cannot get total matches won in the table.

Why is this?

The key difference between Scenario 1 and 2 is shown in the statements

The first method analysis of the Result runs on each Match record in the table, then sums the field

sum(case when T6428.FTR = ‘H’ then 1 else 0 end ) as c4

The second method runs an analysis the summed fields

CASE WHEN SAWITH0.C2 – SAWITH0.C1 > 0 THEN 1 ELSE 0 END AS C5

Now lets see if we can get a Scenario 2 to work.

Here is the query again, but with a secret weapon – The Pivot table

Get wins

.SQL produced is the same,

WITH
SAWITH0 AS (select sum(T6428.FTAG) as c1,
sum(T6428.FTHG) as c2,
T6428.YEAR as c3,
T6428.HOMETEAM as c4,
T6428.DATE1 as c5,
T6428.AWAYTEAM as c6
from
PREMIER_LEAGUE T6428 /* Home Team – Premier League */
where ( T6428.YEAR = ‘2009′ )
group by T6428.AWAYTEAM, T6428.DATE1, T6428.HOMETEAM, T6428.YEAR),
SAWITH1 AS (select distinct SAWITH0.c1 as c1,
SAWITH0.c2 as c2,
SAWITH0.c3 as c3,
SAWITH0.c4 as c4,
SAWITH0.c5 as c5,
‘Home’ as c6,
SAWITH0.c6 as c7
from
SAWITH0)
select distinct SAWITH1.c3 as c1,
SAWITH1.c4 as c2,
SAWITH1.c5 as c3,
SAWITH1.c6 as c4,
SAWITH1.c7 as c5,
SAWITH1.c2 as c6,
SAWITH1.c1 as c7,
case when SAWITH1.c2 – SAWITH1.c1 > 0 then 1 else 0 end as c8
from
SAWITH1
order by c1, c2, c3, c4, c5

This time the query is the same, but I have put all the details into the table so that each match record is analysed, then used a pivot table to display the results

The Pivot

So the secret is to make sure that the calculations are done at the detail row level to get the right answer, then have the BI server add the rows for you.

The exam question can therefore be answered by creating a query with all the necessary calculations, Half time losing (make =1) minus full time winning (make = 0). sum the result in a pivot.

So the data can be analysed by treating the fact like an attribute of the match.  My favourite method to model this is to create the resultant fields – HT_HW (0/1), FT_HW(0/1) on the table and add these to a logical dimension. Then expose the dimension table and, Hey Presto your facts are acting like dimensional attributes.

Tags:

Comments

  1. Rittman Mead Consulting » Blog Archive » Fantasy Football – When … | Sports News Says:

    [...] here to see the original: Rittman Mead Consulting » Blog Archive » Fantasy Football – When … Share and [...]

Write a comment





Website Design & Build: tymedia.co.uk