Lock ‘Em Up And Throw Away The Key/Value Pairs

I have discovered a dastardly plot to wreck my database.  This thing runs deep, even the Russians are involved.  Like a ticking time bomb, this plot has been stewing within the bowels of my biggest OLTP instance, just waiting to explode.  Luckily, I discovered it before it was too late.

It all started with a seemingly innocent request from a developer.  Oh, I’ve been seeing clues along the way that something wasn’t right, but the alarms hadn’t yet gone off.  That is, until this request came long.

“Can you expand this VARCHAR column named Value from 50 characters to 255 please?”, the developer asked.  Hmmm, well, let’s see, there are 5.5 million rows in this table, you’re asking me to allow another 200 characters per row, potentially a gig of data.  This particular table was already on my watch list as being overly busy, I just hadn’t had time to fully dissect it to understand why.  “Can you tell me a little more about why this change is needed?”, I asked.  The response was “I want to store URL’s in there, and it’s not long enough.”

Storing a URL in a column named Value?  Something smells funny here.  I took a closer look at the table, and exclaimed “Egads!  It’s a key/value table!”  Each row in the table is an attribute for an entity!  The reason the table is so busy is because the application is making multiple trips to fetch all of the attributes as each entity is referenced.

This developer was wanting to add an additional Key value to the table for storing a URL of some type.  Acting quickly, I began interrogating developers about why this table exists, who approved this, are you insane, etc.  Almost as quickly, I whipped up a proposal for a proper design, one row per entity, normalized data storage, proper data types, all of the things a good DBA would do.

<br>CREATE TABLE dbo.CorrespondenceEventType<br> (<br>  CorrespondenceEventTypeID   INT    NOT NULL  IDENTITY(1,1),<br>  CorrespondenceType     VARCHAR(50)  NOT NULL,<br>  CONSTRAINT PK_CorrespondenceEventType PRIMARY KEY CLUSTERED (CorrespondenceEventTypeID)<br> ) WITH (DATA_COMPRESSION=PAGE)<br>GO<br><br>INSERT INTO dbo.CorrespondenceEventType<br>SELECT DISTINCT Value<br>FROM dbo.CorrespondenceEventParameters<br>WHERE [Key] = 'CorrespondenceType'<br>GO<br><br>CREATE TABLE dbo.CorrespondenceEventParameter<br> (<br>  CorrespondenceEventParameterID INT    NOT NULL  IDENTITY(1,1),<br>  CorrespondenceEventID   INT    NOT NULL,<br>  Client_ID      INT    NOT NULL,<br>  CorrespondenceEventTypeID  INT    NOT NULL,<br>  OrderItemID      INT,<br>  Order_ID      INT,<br>  EventID       INT,<br>  Payment_ID      INT,<br>  Student_ID      INT,<br>  AuthorizationID     INT,<br>  EmailAddress     VARCHAR(255),<br>  AppName       VARCHAR(20),<br>  LastUpdate      DATETIME,<br>  LastUpdateAppUserID    INT,<br>  CONSTRAINT PK_CorrespondenceEventParameter PRIMARY KEY CLUSTERED (CorrespondenceEventParameterID),<br>  CONSTRAINT FK_CorrespondenceEventParameter_Clients FOREIGN KEY (Client_ID) REFERENCES dbo.Clients(Client_ID),<br>  CONSTRAINT FK_CorrespondenceEventParameter_CorrespondenceEvent FOREIGN KEY (CorrespondenceEventID) REFERENCES dbo.CorrespondenceEvent(CorrespondenceEventID),<br>  CONSTRAINT FK_CorrespondenceEventParameter_CorrespondenceEventType FOREIGN KEY (CorrespondenceEventTypeID) REFERENCES dbo.CorrespondenceEventType(CorrespondenceEventTypeID),<br>  CONSTRAINT FK_CorrespondenceEventParameter_ExamRegistration FOREIGN KEY (OrderItemID) REFERENCES dbo.ExamRegistration(OrderItemID),<br>  CONSTRAINT FK_CorrespondenceEventParameter_Orders FOREIGN KEY (Order_ID) REFERENCES dbo.Orders(Order_ID),<br>  CONSTRAINT FK_CorrespondenceEventParameter_Payments FOREIGN KEY (Payment_ID) REFERENCES dbo.Payments(Payment_ID),<br>  CONSTRAINT FK_CorrespondenceEventParameter_Students FOREIGN KEY (Student_ID) REFERENCES dbo.Students(Student_ID),<br>  CONSTRAINT FK_CorrespondenceEventParameter_Authorizations FOREIGN KEY (AuthorizationID) REFERENCES dbo.Authorizations(AuthorizationID)<br> ) WITH (DATA_COMPRESSION=PAGE)<br>GO<br><br>CREATE INDEX IX_CorrespondenceEventParameter_CorrespondenceEventID ON dbo.CorrespondenceEventParameter (CorrespondenceEventID) WITH (DATA_COMPRESSION=PAGE)<br>CREATE INDEX IX_CorrespondenceEventParameter_ClientID ON dbo.CorrespondenceEventParameter (Client_ID) WITH (DATA_COMPRESSION=PAGE)<br>CREATE INDEX IX_CorrespondenceEventParameter_CorrespondenceEventTypeID ON dbo.CorrespondenceEventParameter (CorrespondenceEventTypeID) WITH (DATA_COMPRESSION=PAGE)<br>CREATE INDEX IX_CorrespondenceEventParameter_OrderItemID ON dbo.CorrespondenceEventParameter (OrderItemID) WITH (DATA_COMPRESSION=PAGE)<br>CREATE INDEX IX_CorrespondenceEventParameter_Order_ID ON dbo.CorrespondenceEventParameter (Order_ID) WITH (DATA_COMPRESSION=PAGE)<br>CREATE INDEX IX_CorrespondenceEventParameter_EventID ON dbo.CorrespondenceEventParameter (EventID) WITH (DATA_COMPRESSION=PAGE)<br>CREATE INDEX IX_CorrespondenceEventParameter_PaymentID ON dbo.CorrespondenceEventParameter (Payment_ID) WITH (DATA_COMPRESSION=PAGE)<br>CREATE INDEX IX_CorrespondenceEventParameter_StudentID ON dbo.CorrespondenceEventParameter (Student_ID) WITH (DATA_COMPRESSION=PAGE)<br>CREATE INDEX IX_CorrespondenceEventParameter_AuthorizationID ON dbo.CorrespondenceEventParameter (AuthorizationID) WITH (DATA_COMPRESSION=PAGE)<br>GO<br>

I even wrote the conversion script to take the data from their existing key/value table and store it properly within my new design:

<br>INSERT INTO dbo.CorrespondenceEventParameter<br> (CorrespondenceEventID, Client_ID, CorrespondenceEventTypeID, OrderItemID, Order_ID, EventID, Payment_ID, Student_ID, AuthorizationID, EmailAddress, AppName, LastUpdate, LastUpdateAppUserID)<br>SELECT<br> CorrespondenceEventID    = Pivoted.CorrespondenceEventID,<br> Client_ID       = Pivoted.ClientId,<br> CorrespondenceEventTypeID   = CorrespondenceEventType.CorrespondenceEventTypeID,<br> OrderItemID       = ExamRegistration.OrderItemID,<br> Order_ID       = Orders.Order_ID,<br> EventID        = Pivoted.EventId,<br> Payment_ID       = Payments.Payment_ID,<br> Student_ID       = Students.Student_ID,<br> AuthorizationID      = Authorizations.AuthorizationID,<br> EmailAddress      = Pivoted.EmailAddress,<br> AppName        = (SELECT MAX(AppName) FROM dbo.CorrespondenceEventParameters WHERE CorrespondenceEventParameters.[Key] = 'CorrespondenceType' AND CorrespondenceEventParameters.Value = CorrespondenceEventType.CorrespondenceType AND CorrespondenceEventParameters.CorrespondenceEventID = Pivoted.CorrespondenceEventID),<br> LastUpdate       = (SELECT MAX(LastUpdate) FROM dbo.CorrespondenceEventParameters WHERE CorrespondenceEventParameters.[Key] = 'CorrespondenceType' AND CorrespondenceEventParameters.Value = CorrespondenceEventType.CorrespondenceType AND CorrespondenceEventParameters.CorrespondenceEventID = Pivoted.CorrespondenceEventID),<br> LastUpdateAppUserID     = (SELECT MAX(LastUpdateAppUserID) FROM dbo.CorrespondenceEventParameters WHERE CorrespondenceEventParameters.[Key] = 'CorrespondenceType' AND CorrespondenceEventParameters.Value = CorrespondenceEventType.CorrespondenceType AND CorrespondenceEventParameters.CorrespondenceEventID = Pivoted.CorrespondenceEventID)<br>FROM<br> (<br>  SELECT *<br>  FROM<br>   (<br>    SELECT <br>     CorrespondenceEventID,<br>     [Key],<br>     Value = MAX(Value)<br>    FROM dbo.CorrespondenceEventParameters<br>    GROUP BY CorrespondenceEventID, [Key]<br>   ) AS EventParams<br>  PIVOT<br>   (<br>    MAX(Value)<br>    FOR [Key] IN (AuthorizationId, ClientId, CorrespondenceType, EmailAddress, EventId, ExamRegistrationId, OrderId, PaymentId, StudentId)<br>   ) AS PivotTable<br> ) AS Pivoted<br>INNER JOIN dbo.CorrespondenceEventType<br> ON Pivoted.CorrespondenceType = CorrespondenceEventType.CorrespondenceType<br>LEFT JOIN dbo.ExamRegistration<br> ON ExamRegistration.OrderItemID = Pivoted.ExamRegistrationId<br>LEFT JOIN dbo.Orders<br> ON Orders.Order_ID = Pivoted.OrderId<br>LEFT JOIN dbo.Payments<br> ON Payments.Payment_ID = Pivoted.PaymentId<br>LEFT JOIN dbo.Students<br> ON Students.Student_ID = Pivoted.StudentId<br>LEFT JOIN dbo.Authorizations<br> ON Authorizations.AuthorizationID = Pivoted.AuthorizationId<br>WHERE Pivoted.ClientId IS NOT NULL<br>

I met resistance right away.  “This will make our queries more complicated”, they cried.  “How so?”, I asked.  They muttered something about having to lookup ID values, and then run a second query to get the data they wanted.  “Ever heard of an INNER JOIN?”, I asked?  How is this complicated?

<br>SELECT CorrespondenceEventParameter.LastUpdate<br>FROM dbo.CorrespondenceEventParameter<br>INNER JOIN dbo.CorrespondenceEventType<br> ON CorrespondenceEventParameter.CorrespondenceEventTypeID = CorrespondenceEventType.CorrespondenceEventTypeID<br>WHERE CorrespondenceEventParameter.OrderItemID = 244389219<br> AND CorrespondenceEventType.CorrespondenceType = 'ExamReminder'<br>

Still they resisted, so I proceeded with implementing my solution in a copy of the production database, and demonstrated for them that my way would result in a query plan that costs a fraction of the plan produced by their current design.  In both cases, the intent is to return the last date/time that a reminder was sent for a specific appointment.  The top query plan is from the current key/value table design.  The bottom plan is from my new redesigned structure.

The real kicker came when I translated that into a metric that they could understand – time.  Using collected performance stats, I summed up the processing time that their current query had accumulated in production over the previous hour.  Using the cost difference between the query plans as a guide, I estimated that my new design would have cut nearly 30 minutes of processing time out of the previous hour for their application.  This got their attention, and redesigning this process has magically found its way onto the priority list.  It’s hard to argue against solid evidence.