Life and death (literally) data management challenges

In my column appearing Monday, I pointed out several application areas that:

1.  Cannot, in my opinion, be addressed by dogmatic relational approaches
2.  Literally are life and death

These are electronic health records and homeland/national security (intelligence analysis).

The key factor that makes them hard to handle relationally is the difficult, ever-changing nature of the analysis they need to support.  There simply is no stable way to define what the data is, exactly, or how it is to be evaluated. 

I considered adding disaster response to the list, but it wasn't quite as obvious because it didn't include that indeterminacy factor.   Indeed, a huge national database of people, buildings, transportation, etc. would have been a rather wonderful thing to have in the Katrina emergency, even if it had no more bells and whistles than a relational approach will allow.

At least two important related issues are addressed in other blog posts, namely:

What about privacy?
Would integrated electronic health records really save lives?

If you have comments specifically on those points, it would be great if you made them in those threads.  DBMS-related comments probably belong here. 

What People Are Saying

Yes, the DATA is changing.

Yes, the DATA is changing. Obviously, this can be handled in principle by some combination of schema modification and the ordinary update mechanism. But doing so is IMO an unnatural act.

Look, the RM is general enough that in principle anything can be shoehorned into it. However, that doesn't mean we should do so in every case.
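For readers who want to see what "schema modification plus the ordinary update mechanism" looks like in practice, here is a minimal sketch using Python's built-in sqlite3 module. The `patient` table, its columns, and the sample values are hypothetical and chosen only for illustration; nothing here comes from a real health-records system.

```python
import sqlite3

def evolve_schema():
    """Add a column to an existing table, then backfill it with UPDATE."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical starting schema: nobody planned for blood type.
    conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO patient (id, name) VALUES (1, 'A. Example')")

    # Schema modification: a new analysis needs a new attribute.
    conn.execute("ALTER TABLE patient ADD COLUMN blood_type TEXT")
    # Ordinary update mechanism: fill in the new attribute for existing rows.
    conn.execute("UPDATE patient SET blood_type = 'O+' WHERE id = 1")

    rows = conn.execute("SELECT id, name, blood_type FROM patient").fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(evolve_schema())
```

This works, which is the "in principle" part. Whether repeating it every time an analyst changes the question counts as natural is the point under dispute.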

Eric, I didn't say that

Eric,

I didn't say that tables were "flat" or "two-dimensional," for the same reason I wouldn't call an arbitrary matrix "two-dimensional." Linear Algebra was one of those semi-life-changing classes for me ...

Most of the time I do tend to think of each column as one-dimensional, although for datatypes without a natural sort order even that's not really valid. But when it matters, I quite happily break that habit too.

However, if you think other readers here need the point spelled out more clearly, please feel free to expound further.

And by the way, I'm now blogging over at www.dbms2.com rather than here. There may be other blogs later, but right now that's the active one.

Cheers,

CAM

Some comments on the posting

Some comments on the posting and other comments:

Curt wrote:

The key factor that makes them hard to handle relationally is the difficult, ever-changing nature of the analysis they need to support. There simply is no stable way to define what the data is, exactly, or how it is to be evaluated.

You're implying that the "data" is an unstructured "stream" which needs to be captured and analyzed later. Presumably the result of that analysis will be data? Some portion, at least, of the "streams" resolves into data. Until that point it's relatively useless, and after that point we know what it is and presumably want to take advantage of the fact?

... intermediate evaluations or reductions of the data are performed and newly stored in the database, such as weblog interpretation, text analyses and/or any form of data mining.

Yes, but what is stored? The results of the analysis, presumably, which is structured (in contrast to the "stream" from which it was extracted). That is data, and while a relationship to its origin is desirable, it is still something remarkably different from its source.
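As a rough illustration of that distinction, the sketch below keeps the raw "stream" and the structured result of analyzing it in separate tables, with a reference from each derived row back to its origin. It uses Python's sqlite3 module; the log format, the table names (`raw_line`, `hit`), and the sample lines are all assumptions made up for this example.

```python
import sqlite3

# Hypothetical raw input: "ip method path status", one hit per line.
RAW_LINES = [
    "10.0.0.1 GET /records/42 200",
    "10.0.0.2 GET /records/99 404",
]

def store_derived(lines):
    """Store each raw line, plus the structured data extracted from it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_line (id INTEGER PRIMARY KEY, text TEXT)")
    conn.execute(
        "CREATE TABLE hit ("
        "raw_id INTEGER REFERENCES raw_line(id), "
        "ip TEXT, path TEXT, status INTEGER)"
    )
    for text in lines:
        cur = conn.execute("INSERT INTO raw_line (text) VALUES (?)", (text,))
        # The "analysis": split the line into named, typed attributes.
        ip, _method, path, status = text.split()
        conn.execute(
            "INSERT INTO hit VALUES (?, ?, ?, ?)",
            (cur.lastrowid, ip, path, int(status)),
        )
    # Derived data joined back to the source it was extracted from.
    rows = conn.execute(
        "SELECT h.path, h.status, r.text FROM hit h "
        "JOIN raw_line r ON r.id = h.raw_id ORDER BY h.raw_id"
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in store_derived(RAW_LINES):
        print(row)
```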

So yes, especially from a tabular point of the view, the data ITSELF changes depending on the analysis that needs to be done.

Not really. Data is created (defined and extracted) based on analysis. The fact that this needs to happen more quickly is certainly an argument for better tools for database management, but has nothing to do with the value of relations.

And the words "from a tabular point of the view" are meaningless in this context. Relations are no more "tabular" than the earth is flat based on a Mercator projection. Geometrically speaking, does this representation of a set of 3-dimensional points prove that cubes are flat?

{(0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0), (1,1,1)}

- Eric
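Eric's cube example can be sketched in a few lines of Python. The set literal above is modeled as a set of 3-tuples; the helper names `dimensionality` and `face` are invented for this illustration. Writing the set out in rows does not change how many dimensions each point has.

```python
from itertools import product

# The eight corners of the unit cube, as an unordered set of 3-tuples.
cube = set(product((0, 1), repeat=3))

def dimensionality(relation):
    """Return the number of attributes shared by every tuple in the relation."""
    arities = {len(t) for t in relation}
    if len(arities) != 1:
        raise ValueError("tuples in a relation must all have the same arity")
    return arities.pop()

def face(relation, axis, value):
    """Select the tuples whose coordinate on the given axis equals value."""
    return {t for t in relation if t[axis] == value}

if __name__ == "__main__":
    print(len(cube), dimensionality(cube), sorted(face(cube, 0, 0)))
```

Printing the set one tuple per line gives something that looks like a table with three columns, yet each tuple is still a point in three dimensions, and `face` still picks out a square of four corners.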

Ummmm, let's see. Get rid of

Ummmm, let's see. Get rid of the RDBMS. Replace with cheap "data appliances." The RDBMS model is too complex. Well, I am going to agree with you to a point. After seeing what the universities are putting out for CS, the RDBMS model is too complex for them to understand. Everything needs to be in a neat black box. Understanding the black box is forbidden. I want to thank you and the universities.

Because of this I get a lot of business and stay very busy. Another system is put into production, and it is either unreliable (going down multiple times a week), slow (the end user cannot get to the required data in a timely manner), or, my personal favorite, the CFO is tired of dropping 100K on new hardware to get more performance out of their application.

Some questions. How are you going to ensure people are not walking on each other as they are updating XML files? How are you going to get the data appliances talking to each other and ensure security of the data?

It's funny you mention homeland security. I have several friends there who are experts in RDBMS technology. A few people who were spouting your theories are no longer on the project because that approach just does not deliver what the client needs.

Later,
-Robert

Curt, after being flamed by

Curt, after being flamed by you earlier, I am a bit reluctant to post, but then again:

How can you say that DATA changes? Data, at some point in time, is data. You say:

"...the data ITSELF changes depending on the analysis that needs to be done"

Do you believe this? Isn't it the case that your analysis will present you with a result based on the DATA underlying the analysis?

"You seem to be assuming that the data is all there; analysts decide after the fact how to analyze it."

If DATA changes depending on which analysis we perform, why do we need data?

Please explain. Until you do, please call me Ravi, as we seem to be one.

Ravi, please don't start a

Ravi,

The key point that you (and other relational advocates) seem to be missing is that different analytic needs can mean different DATA needs.  You seem to be assuming that the data is all there; analysts decide after the fact how to analyze it.    But that may not be true at all.  And commonly, it is true but only under an important architectural assumption -- intermediate evaluations or reductions of the data are performed and newly stored in the database, such as weblog interpretation, text analyses and/or any form of data mining.  

So yes, especially from a tabular point of the view, the data ITSELF changes depending on the analysis that needs to be done.

Several statements in the

Several statements in the article require some comment.

The first statement is "The key factor that makes them hard to handle relationally is the difficult, ever-changing nature of the analysis they need to support." Data analysis and data storage (in the logical sense of modelling data) are two different things. The same data can be "combined" in many different ways to permit many different analyses. Hence, your argument is invalid. Analysis of data has absolutely nothing to do with relational modelling per se.
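A small sketch of the claim that the same stored data can be combined in different ways for different analyses, using Python's sqlite3 module. The `visit` table and its rows are hypothetical. The stored relation never changes; only the queries do.

```python
import sqlite3

def two_analyses():
    """Run two unrelated aggregate queries over one unchanged table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visit (patient TEXT, clinic TEXT, diagnosis TEXT)")
    conn.executemany(
        "INSERT INTO visit VALUES (?, ?, ?)",
        [
            ("p1", "north", "flu"),
            ("p2", "north", "asthma"),
            ("p3", "south", "flu"),
        ],
    )
    # Analysis 1: workload per clinic.
    by_clinic = conn.execute(
        "SELECT clinic, COUNT(*) FROM visit GROUP BY clinic ORDER BY clinic"
    ).fetchall()
    # Analysis 2: case counts per diagnosis, from the very same rows.
    by_diagnosis = conn.execute(
        "SELECT diagnosis, COUNT(*) FROM visit "
        "GROUP BY diagnosis ORDER BY diagnosis"
    ).fetchall()
    conn.close()
    return by_clinic, by_diagnosis

if __name__ == "__main__":
    print(two_analyses())
```

Note that this only covers analyses whose inputs were already captured as columns. It does not settle the case raised elsewhere in the thread, where a new analysis needs data that was never stored.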

The second statement, "There simply is no stable way to define what the data is, exactly, or how it is to be evaluated," also does not make sense. If you have the data, you presumably have some agreed-upon way of storing the data (logically speaking). The issue of analyzing the data is completely different from the issue of modelling the data.