Technical

Pattern matching

Peter Scott

Apr 4, 2007 • 1 min read

Tom Kyte posts an interesting paper and invites comment. It is not an easy read, and really needs a reasonable understanding of analytic functions, but persevere if you can.

To my mind the best place to find patterns is as close to the data as possible. I know that some vendors have developed "real time analytics" that actually sit in the [XML] data feed and analyse data before it even hits the database, but the next best thing is to use the database for a purpose it was optimised for; the manipulation of data. Moving a large amount of data across a network to some procedural engine working on a possibly row-based algorithm strikes me as potentially slow.

But there are two sides to pattern matching - finding patterns that indicate an event of significance occurring; here we define a pattern and analyse data until it occurs. But perhaps more challenging is spotting when a normal pattern disappears to be replaced with abnormal activity; here we may not know the new pattern to look for but we know what is normal. Conducting that type of analysis is often more complex that the traditional stoplight on a BI web dashboard that alerts if sales, say, drops below a threshold value. Identifying new patterns is an important facet of data analysis - it's a moot point where row based pattern recognition stops and traditional data mining begins

Perhaps the biggest downside to putting pattern matching functionality into the database is the complexity of the SQL functionality, it will not be for the faint-hearted (or those scared by analytic functions). And here is a conundrum: business users may know the patterns to seek but can't handle the code, and experienced developers can cut the code but not have an insight into the patterns. Until tools become available for users to build their own queries there may be a reluctance to get involved. After all what business would allow important metrics to be calculated by IT professionals ;-)

Sign up for more like this.