Software of all kinds requires information to do its work. At the back end of most programs you’ll usually find some form of database, which has been created or selected and configured to run specifically with that application.
In creating that database or making a selection from the wide range of products available, there are several factors to consider. We’ll be looking at these criteria, in this article.
Establish How Much Data The Application Will Store At Full Function
Practically any database out there can handle a data store of only gigabytes, and if the data set is small enough, an in-memory database application might be all you need. Many middle market databases are capable of handling thousands of gigabytes (terabytes) without problems.
When the information you’re dealing with gets to millions of gigabytes (petabytes), the options become limited. You’ll also need to consider additional costs for data storage, either on-site or in the cloud.
Calculate The Peak Number Of Users
It’s necessary to estimate the maximum number of simultaneous users who may be sending queries to an application database, as this figure will have implications both on the capacity of the database, and its ability to scale.
With a public database for example, you may have to consider scaling out the application to multiple servers, to cope with unexpected traffic surges or seasonal demands.
Determine Application Demand For Availability, Scalability, Throughput, Latency, and Data Consistency
24/7/365 and 99.999% availability are the ideals for today’s digital business operations – and often for the applications that help make those processes happen. With a database on-site, you can usually allocate active servers and configure high availability outside of scheduled maintenance periods. Off premises, there are a few cloud databases that offer 99.999% availability if you run them in multiple zones.
NoSQL databases have traditionally had the edge over SQL databases, in terms of horizontal scaling. The cloud offers much greater options for dynamic scalability. This quality is important, as databases with good scalability can handle multiple simultaneous users by scaling up or out until their throughput becomes sufficient for the load.
Throughput is output relative to input, or the amount of data passing through a system from input to output. Databases with high throughput can support many simultaneous users.
With latency, you’ll need to consider both the response time of the database itself, and the end-to-end response time of the application it serves. Instantaneous response to user actions typically requires a database to respond in under 100 milliseconds for each simple transaction. Analytic queries may take seconds, or even minutes depending on the complexity of analysis.
In SQL databases, all reads usually return the latest data. This translates into strong data consistency. Systems with eventual consistency offer lower latency, but at the risk of reading stale data. Data consistency can range from eventual to strong, for NoSQL databases.
Establish The Change Frequency Of Your Database Schema
A schema is the skeletal structure that represents the logical view of the entire database. It defines how data is organized and how the relationships between records are associated.
SQL databases are the best option if your database schema is unlikely to change significantly over time, and you want most fields to have consistent data types from record to record.
Determine The Geographic Spread Of Your User Population
Most databases that support globally distributed nodes and strong consistency use consensus groups to speed up write operations without seriously degrading consistency. Unless you provide additional servers in their regions, database latency for remote users has a lower limit at the speed of light.
With users spread across the globe, there’s a trade-off between consistency and latency. Some databases allow for distributed read-write servers. Others offer distributed read-only servers, with all writes forced to go through a single master server
Identify The Typical Data Characteristics For The Application
The “shape” or form of the information you’re processing will also influence the choice of database for your application. If you’re performing analytics, some degree of transformation may be required, to bring the data into an appropriate form.
SQL databases usually store strongly-typed data in rectangular tables, using rows and columns. Relationships between tables must be clearly defined, and SQL databases use indexes to speed up selected queries, with the JOINS mechanism to query multiple tables at once.
Document databases will typically store weakly-typed records, which may include arrays and nested documents. Graph databases usually store vertexes and edges, triples, or quads. Key-value and columnar stores are other NoSQL database categories.
Online Transaction Processing (OLTP), Analytic Queries (OLAP), Or Both?
Any application that needs to perform rapid transaction processing requires a database with fast write speeds and minimal indexes. A database used in analytic queries needs a fast read speed and multiple indexes.
If your applications must process transactions and perform analytics, then a hybrid system is called for. Such systems may for example have a primary transactional store which feeds a secondary analysis store through replication.
Calculate The Read To Write Ratio
As some databases are better at handling reads and queries than they are at writing quickly, it’s a good idea to determine the mix of read and write activity that your application is likely to produce. This will aid in creating benchmarks for desired performance – and in your final selection of database.
Any Special Query Requirements?
Queries on geographical or geometric data tend to have different requirements from a standard relational database query. Most of the 25 or so databases capable of handling spatial data queries support some or all of the Open Geospatial Consortium standard.
Similarly, the full-text search of text fields requires different indexes to relational or geospatial data.
Identify Your Preferred Programming Languages – And Those The Database Supports
Take A Close Look At Your Budget And Resources
In terms of price, there are databases for every budget, ranging from free to enterprise-level. Many offer a range of payment options and packages with varying price tags, such as Developer’s Editions or Enterprise solutions.
Note that if you opt for a free open source database, you may need to invest in external support. In all cases, it might also be necessary to budget for user training.
Consider Any Compliance Or Legal Requirements That Apply
Regulatory frameworks such as GDPR and the new CCPA in California set out conditions for data privacy, which affect how information should be handled and stored. Industry standards may also dictate your approach to data governance, and even the technology that it’s permissible to use. These factors will have to be taken into account in making your database selection.
As you can see, there are a lot of things to think about, before choosing a database option for your application. But it’s worth taking the time to consider as many of these factors as are relevant. The alternative could mean spending time, effort, and money on a database solution that doesn’t meet your application requirements.