Sunday, March 25, 2012

custom(and dynamic) record fields query

Hello,
I need to create a query that depends on custom fields (virtual).
First I'll try to outline the background.
I got Nodes table, that has NodeID field, and other (not really important)
fields like name, some boolean fields and so on.
Lets define node as an item for sale (just for the example purposes).
User of the application can create custom attribute values for each each
item (node).
To make the attribute names consistant through application I got another
table Attributes (AttID, AttributeName)
And the last table is association table between Node and Attributes
containing the value of the attribute as well.
I'd like to get following resultset:
NodeID, NodeName, CustomField1, CustomField2, CustomField3, ... and so on
(names of the fields would depend on Attribute name)
with values:
1,FirstNode, 'one of the values', 'another value', NULL, ...
2,SecondNode, NULL, NULL, NULL , ...
3,ThirdNode, NULL, NULL, 'some value', ...
SQL for creation such tables would look like (I ommited constrains and
indexes for siplicity):
CREATE TABLE Nodes (
NodeID INT IDENTITY (1, 1) NOT NULL,
NodeName NVARCHAR(255) NULL
)
CREATE TABLE Attributes (
AttID INT IDENTITY (1, 1) NOT NULL,
AttributeName NVARCHAR(50) NOT NULL
)
CREATE TABLE NodeAttr (
NodeID INT NOT NULL,
AttID INT NOT NULL,
Value NVARCHAR(255)
)
some data insertion:
INSERT INTO Nodes(NodeName) VALUES ('FirstNode')
INSERT INTO Nodes(NodeName) VALUES ('SecondNode')
INSERT INTO Nodes(NodeName) VALUES ('ThirdNode')
INSERT INTO Nodes(NodeName) VALUES ('FourthNode')
INSERT INTO Nodes(NodeName) VALUES ('FifthNode')
INSERT INTO Attributes (AttributeName) VALUES ('CustomField1')
INSERT INTO Attributes (AttributeName) VALUES ('CustomField2')
INSERT INTO Attributes (AttributeName) VALUES ('CustomField3')
INSERT INTO NodeAttr (NodeID,AttID,Value) VALUES (1,1,'My Value for Field1')
INSERT INTO NodeAttr (NodeID,AttID,Value) VALUES (1,2,'My Value for Field2')
INSERT INTO NodeAttr (NodeID,AttID,Value) VALUES (1,3,'My Value for Field3')
INSERT INTO NodeAttr (NodeID,AttID,Value) VALUES (3,3,'My Value for Field3
in third node')
Second I describe my approach to the problem.
I created function that retrieves data based on nodeID and Attribute name:
CREATE FUNCTION myfun(@.NodeID INT, @.AttributeName NVARCHAR(50))
RETURNS NVARCHAR(255)
AS
BEGIN
DECLARE @.Value NVARCHAR(255)
SET @.Value=NULL
SELECT @.Value=Value FROM NodeAttr NA
INNER JOIN Attributes A ON (NA.AttID=A.AttID)
WHERE NodeID=@.NodeID AND AttributeName=@.AttributeName
RETURN @.Value
END
Next, I create dynamic SQL for data retrieval:
declare @.query NVARCHAR(4000)
SELECT
@.query=isnull(@.query+',','SELECT NodeID,
')+'dbo.myfun(NodeID,'''+AttributeName+''') AS ['+AttributeName+']'
FROM Attributes
SET @.query = @.query + ' FROM Nodes'
and execute it:
EXECUTE sp_executesql @.query
Dynamicly generated SQL looks like this:
SELECT NodeID,
dbo.myfun(NodeID,'CustomField1') AS [CustomField1],
dbo.myfun(NodeID,'CustomField2') AS [CustomField2],
dbo.myfun(NodeID,'CustomField3') AS [CustomField3]
FROM Nodes
It is great when you deal with small number of nodes. When node number is
about 2000 the perfomarnce is really bad (executes over a minute). And I'd
like to have even more.
Anyone can point me to better solution? :)
regards,
Pawel Rogozinski
Software Engineer at Eracent Inc.
pawel.rogozinski@.eracent.com>> I need to create a query that depends on custom fields (virtual). <<
This flawed approach is called an EAV design and newbies re-invent it
about once a month here. The basic fallacy that leads to meltdown in
about a year is that you are mixing data and metadata. Also, columns
are not fields and rows are not records -- totally different concepts.
Then give the users a copy of SQL and get out of the way. If any
random user is a better database programmers and designer than you are,
you should not be programming.
It is not just exponently slow; since it has no data integrity, it is
error-prone, too!!|||>> I'd like to get following resultset:
It is better to use a client programming language or a report writer to
generate such cross tabulations. In t-SQL, most general approaches are
clumsy. For some methods refer to: http://tinyurl.com/ayjwa
Anith|||This sounds like an EAV (Entity-Attribute-Value) design. One of it's biggest
failings is that reporting is significantly more difficult and slow. Another
divanatage is that it requires dynamic SQL and all of its issues. Thus, y
ou
should be aware that the difficulties you are having in finding a solution a
re
directly related to the design.
That said and assuming you wanted to continue with this design, one solution
that takes the dynamic SQL out of the equation would be to make the equivale
nt
of a materialized view. In essence, create a permanent temp table where each
"attribute" in your design is a column in this temp table. You would then
populate and maintain this table with every entity's attribute values. To do
that you can use a massive case statement like so:
Select NA.NodeId
, Min(Case When NA.AttId = 1 Then NA.Value End) As Attribute1Name
, Min(Case When NA.AttId = 2 Then NA.Value End) As Attribute2Name
...
From NodeAttr As NA
Group By NA.NodeId
In order to maitain the contents of the table, you'll need to use triggers t
o
update the entity's appropriate attribute values in the temp table and so on
.
The solutions to the problems created by this type of design get uglier and
uglier and the performance gets worse and worse.
Thomas|||Celko,
thanks for your "constructive" comments.

> This flawed approach is called an EAV design
> and newbies re-invent it about once a month here.
Nice. I did not reinvent this, I'm just working on such flawed design,
as you called it.
Do you know better design, that could substitute EAV then?
[...cut...]

> It is not just exponently slow; since it has no data
> integrity, it is error-prone, too!!
Didn't I mention that I did not put integrity constraitns for simplicity
of the example? I believie I did.
I would be very gratefull if you could lead me to a better solution and
not criticizing my approach, as I am aware of the flaws that come with
my approach (and the design as well).
*** Sent via Developersdex http://www.examnotes.net ***|||I like the temporary table solution.
As the attributes are not added/deleted frequently it could be one of the
best approaches.
regards,
Pawel Rogozinski
Software Engineer at Eracent
pawel.rogozinski@.eracent.com|||>> Do you know better design, that could substitute EAV then? <<
The point everyone is trying to get across to you is that you need a
totally different approach; it is not a matter of finding a substutute.
Why not use an actual data model, based on the specifications from the
user. There is no "magical, one size fits all" answer that you s
with kludges like EAV. Each database is a thing in itself, but you can
look for patterns for reuse.
simplicity of the example? I believe I did. <<
Did I meniton that doing that is virtually impossible without a few
hundred lines of procedural code in triggers and procs if you have a
database of any size?
Let's say I have 50 attributes of type INTEGER in my EAV model. They
are crammed into one table. Each one has 1 or 2 CHECK() constraints
that are totally different. Now try to write the CASE expression with
50 WHEN clauses to validlate each one.
Let's say that I have a primary key among those 50 attributes. Can you
show me the uniquenss constraint to enforce that business rule?
Let's say that I have a a foreign key among those 50 attributes. Can
you show me the uniquness constraint to enforce that business rule?
And do the DRI Actions required to maintain this relationship?
Thomas Coleman had a recent posting where he worked out the equivalent
of a simple GROUP BY statement for an EAV. Google it.sql

No comments:

Post a Comment