What have the database vendors done for me lately?
February 18th, 2008 by Sreeni JakkaSreeni Jakka, founder of Tech O2, is an IT industry veteran with over 17 years of experience. Sreeni has deep experience in custom software development of CRM/CEM, SFA, eBusiness applications with emphasis on J2EE, LAMP, RoR and Open Source paradigms. Sreeni's experience ranges from start-ups to mature organizations. Sreeni's current interests include Enterprise 2.0 and Tech O2.
There have been radical improvements in Information technology in the last 20 years… especially in the software frameworks, middleware technologies, and several business-enabling paradigms from B2C, B2B, and Supply Chain to the latest SaaS, and Web 2.0. Unfortunately, the commercial database vendors don’t seem to keep up the pace. Specifically in the last 15 years, the user-centric developments in database arena are non-existent, or insignificant.
Sure, there are quite a few developments in the commercial database markets, ranging from Application Clusters (ex: Oracle’s RAC, Real Application Clusters) for high-level of clustering, Enterprise Grids, synchronization with Application Servers (of course, of their own brand), to enterprise multi-site replication come built-in toaster functionalities. The focus has been on stability, high-availability, business continuity; these advances certainly seem to be geared towards additional revenue generation from existing customer installations, and push towards cross-selling of their enterprise products (with clever integration of their suite of products with the database systems).
The software programmers may not be paying their bills, but they sure are the elite customers of Commercial database vendors. After all, we develop software using their database products.
So, my question to the database vendors as web programmer is “What have you done for me lately?”
Software development has become synonymous with web development, or web-enabled development. Paradigms such as SaaS (Software as a Service), and global markets only emphasize the web-enabled software development. My rambling is that the database vendors have done very little to help, assist and nurture the web programmers.
I present my case by identifying some of the fundamental needs of a web user (also read as web application), and the hoops the web programmer has to jump through to build this functionality. It will become pretty obvious to you, why this functionality cannot be ‘out of the box’ from the database vendors. You will appreciate the savings of time and effort for programmers, projects, and enterprises if these are built into (in a direct manner) the database systems. I don’t make claims that this list I provided is comprehensive, and I invite the software folks to add to this list.
1. Password encryption, User Authentication
Most web applications are database driven. Provided these are built on robust and secure databases, why can’t we make use of the database functionality for password encryption, user authentication? Programmers usually end up writing this functionality for almost every web application, which is such a mundane function. The only commercial database that supports this functionality I find is Microsoft’s SQL Server 2005 and up. Of course, there is always a catch with Microsoft; and this works only with ASP.NET 2.0. This gives you out of the box solution for authentication, registration, role creation wizards that are pre-binded with SQL Server.
2. Support for Regex (Regular Expressions) in SELECT
Developers that come from Software Engineering background and/or UNIX/LINUX/Perl background, appreciate the use and benefits of regular expressions. Especially in pattern matching scenarios, regex gives you so much flexibility. The ideal scenarios are Customer recognition on the web, self-help where the users might end up misspell key words, and product suggestions based on certain patterns. To squeeze in pattern matching criteria into primitive (AND, OR, IN) constructs of SQL WHERE clause, is a developer’s nightmare. The best commercial database could offer is LIKE phrase with the use of limited wildcard searches such as %. MySQL, the most famous Open Source database implementation of Regex is really robust.
3. Pagination
No need for introductions here… everyone is familiar with “the <<prev, next>>, page 2 of 12″ etc. kind of representation of data sets in all web applications. Almost all web applications use this functionality to present a subset of data the user is looking for, and alerting them the index or position of that data with respect to the entire result set. What we want is either ‘give me rows between 11 and 20 in the set’, or ’starting at 11 give me next 10 in the set’.
It’s sad to see that even after a decade of surge and success of web development, none of the commercial database vendors support this feature ‘out of the box’.
With Oracle you have to get on a decent size expedition to achieve this. First you define your query and use row_number() construct in conjunction with over(sort column) construct, and then use that query as a correlated subquery, determine your start and end row numbers and use that in the where clause of the main sql. It goes something like this:
startRow := 11; endRow := 20;
SELECT col1, col2, col3, row_number
FROM (SELECT col1, col2, col3, row_number() row_number OVER (ORDER BY col3)
FROM table1, table2
WHERE table1.key = table2.key)
WHERE row_number BETWEEN startRow AND endRow;
This example may not seem that bad, as it contains only two tables in the subquery, and selecting only 3 columns. But in real world applications, you could imagine how ugly and complicated this could become. Also, the subquery ends up fetching all rows based on the sort criteria inside, and the wrapper SQL filters it to the given range. In other words, you only achieved minimizing the packet size to the client, but ended up fetching all the data on the server side. You would also see the redundant column definition between the main sql and the real query which is defined as the subquery. So, my question is “Why?”!
Because of this not-so-direct way of doing things, programmers either end up caching the entire data set in Stateful Beans or on the HTTPSession sometimes, and you could imagine the system performance in that scenario.
Let’s look at MySQL’s implementation of pagination. MySQL offers two ways of achieving pagination and both are equally graceful. To get rows 11 to 20, you could either use
SELECT col1, col2, col3
FROM table1, table2
WHERE table1.key = table2.key
ORDER BY col3
LIMIT 10, 10
Or
SELECT col1, col2, col3
FROM table1, table2
WHERE table1.key = table2.key
ORDER BY col3
LIMIT 10 OFFSET 10
4. Tagging - Meta Data
With the focus on Social Tagging and Tag Clouds, a favorite Web 2.0 phenomenon, the trend of web applications is the focus on incorporating this functionality for different reasons. The purpose ranges from suggestions to alternative products, pushing add-on services/products related to the current product, attaching product reviews, or simply to extend more relevant content to the user. The concept of Meta data exists in DBMS, but the use is so far limited to the internal orchestration of the database itself. It hasn’t been exposed to the customer use. It’s heart-wrenching to see so many web applications using the concept of Tag Clouds and Meta Data, and investing in building the database framework for it where as this can be built in as a feature of the database.
5. Asynchronous Event Notification
This would be an ideal feature to have in a DBMS that can notify of events of interest to different applications, or middle-tier environments. Imagine your product catalog is cached in your eCommerce site, and there have been few new products introduced, or a product has changed. If the DBMS can notify such events to the Application Server or middle-tier framework such as ESB (Enterprise Service Bus), it can benefit from it (by reloading the product catalog or something like that). Most web applications will have to wait for the next start-up or use their maintenance windows to get the latest changes of cached data. Or you will end up writing another polling scheme to see if anything of interest has changed in the database.
6. Best-Match Search
All commercial databases provide only a true search. This is done using constructs such as WHERE customer = “John” in the SQL. Programmers end up writing their own implementations of natural search algorithms to achieve best-match searches, and relevancy ratings for search results. Come on, this is so fundamental to web applications and today’s user-centric applications, and I don’t see a big deal of effort in incorporating such functionality in the database itself.
Ex: John 100% relevancy
Johny 90% relevancy
Johnathon 65% relevancy
Applications can benefit from improved performance by pushing such functionality to the DBMS, it standardizes mundane processes, and also helps the programmers focus on the real needful tasks.
It’s time that the database vendor in addition to the emphasis on scalability, high-performance functionalities, and big ticket revenue generation features; should also focus on productivity improvements for customers, and provision of programmer-friendly features.