RDBMS performance implications of NAND Flash and DRAM pricing trends

This article on StorageSearch.com shows an interesting trend and prediction in the pricing of DRAM and NAND Flash: as the chart shows, both are dropping, but NAND is dropping faster.

What actually happened was that NAND prices rose, likely due to demand from companies like Apple. The first chart below shows the price of 8GB MLC NAND Flash from 2007 to today; these modules are commonly used in smaller-capacity 2.5″ solid state drives. DRAM prices also rose, as depicted in the second chart below.

MLC Flash 8Gb Module Pricing 2007 – Today

DRAM Pricing DDR2 1Gb Module 2007 – Today


There is currently $$$ being spent on building new manufacturing facilities to produce NAND flash, so unless new sources of demand appear, prices will start to fall again soon. So what does this mean if you are currently designing an enterprise solution that will be deployed to production in the next 12 months and used for up to the next 10 years? It depends on your application, but if it makes significant use of a relational database management system (RDBMS), then you can expect a huge decrease in the cost of delivering IOPS.

Calculating IOPS cost

Today you can purchase a FusionIO card for ~$7,500 that offers 320GB of storage and ~100k/140k read/write IOPS. A single 15k 450GB SAS drive costs around $300 and will deliver ~175 IOPS, so to match the mixed read/write performance of the card you would need to purchase 300 drives, plus enclosures and controllers. A rough calculation:
15k Disks – 300 x $300 = $90,000
Enclosures – 10 x $5000 = $50,000
Total = $140,000
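As a back-of-the-envelope sketch in T-SQL, the same figures can be turned into a cost-per-IOPS comparison (the IOPS and price numbers are the rough estimates above, not vendor specifications):

```sql
-- Rough cost per IOPS using the estimates above:
-- disk array: $140,000 for 300 drives x ~175 IOPS each
-- FusionIO:   ~$7,500 for ~100k mixed IOPS
SELECT CAST(140000.0 / (300 * 175) AS DECIMAL(10, 2)) AS DiskCostPerIOPS,   -- ~$2.67
       CAST(7500.0  / 100000      AS DECIMAL(10, 4)) AS CardCostPerIOPS;   -- ~$0.075
```

On these numbers the card delivers an IOPS for roughly one thirty-fifth of the cost of the spindle-based solution.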

RAID IOPS Calculator
The problem with the FusionIO solution is that it does not scale easily: each card must be installed in a PCIe slot in a server, and most 1U servers only have one slot. It is also directly attached to the server, whereas SAN storage can be shared amongst servers. If on top of this you are taking advantage of virtualisation, it is possible to move VMs from one physical host to another in real time; the server remains online and suffers only a small reduction in performance for a few minutes. This makes it possible to take physical machines offline for maintenance without affecting the availability of your application. FusionIO does not offer this kind of fault tolerance: if the physical machine the card is installed in fails, you lose access to that storage until the machine becomes available again, so you must have a mirror machine available with a recent copy of your data. But it is still a good-value solution at $15,000 vs $140,000.

The water is a little bit murky

It is still not clear exactly how vendors are going to deliver the level of IOPS performance offered by FusionIO in a SAN-based format. The problem seems to centre around the speed of interfaces like iSCSI and the bandwidth and latency of gigabit Ethernet. Although the storage devices already exist, a lot of work is needed to improve the pipeline between hosts and SAN. That said, this is a solvable problem; it's not "if" but "when".

The impact on database performance

RDBMSs like SQL Server, MySQL and Oracle are going to get a huge performance boost as NAND Flash based storage systems come into use. Typically I/O has been the bottleneck in any database-backed application; in future this will likely shift to CPU. Oracle and SQL Server are both licensed per CPU/core, so this should be taken into account when building applications: any operation that requires a significant amount of CPU time should not be performed by the database unless absolutely necessary. Databases should be used for what they were originally built for, which is to persist data and allow it to be retrieved. The need for inline caches should also be reduced by this performance boost.

My first UK SQL Server User Group meeting

Yesterday I attended my first UK SQL Server User Group meeting.
Event web page

Topics

– SQL Server I/O – Tony Rogerson
– Understanding how SQL Server performs Joins with Nested Loops, Merge & Hash – Ami Levin
– DbSophic QURE – Workload Tuning – Ami Levin
– SQL Server StreamInsight – Allan Mitchell – twitter: @allanSQLIS

I was most interested in the first topic but missed the talk, as the evening started at 17:30 and I did not arrive until 18:30; I don't finish work until 18:00 and had to get across town. The three talks I did catch were of a very high standard in both content and delivery.

Understanding how SQL Server performs Joins with Nested Loops, Merge & Hash – Ami Levin

If you write a lot of SQL queries or are often called upon to tune SQL Server, then you should learn how and why the query optimizer selects from three physical operators to perform joins, and how it can sometimes be caught out. This talk was given by Ami Levin, a SQL Server MVP who works for DBSophic. It was really interesting, and the first time I have had any real understanding of what these terms that appear in the execution plan mean. One really interesting thing Ami pointed out is that sometimes the estimated number of rows and the actual number of rows will differ by a large amount, which can result in a sub-optimal operator being selected to perform the join. If you spot this sort of discrepancy when tuning, further investigation is warranted. The following is a scenario Ami gave where this can occur:

SELECT *
FROM SalesLT.SalesOrderHeader SOH
INNER JOIN SalesLT.SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID
INNER JOIN SalesLT.Product P ON P.ProductID = SOD.ProductID
WHERE SOH.TotalDue > 1500
AND SOD.UnitPrice > 300
AND P.StandardCost > 300

In the SQL above, the optimizer uses statistics to estimate roughly how many rows will be returned by looking at each table in turn and estimating how many rows each WHERE clause will return. Say each WHERE clause returns ~1% of the rows in a table containing 100k rows: the optimizer estimates 1% of 100,000 = 1,000 rows, then 1% of 1,000 = 10, then 1% of 10 < 1. So the optimizer thinks the result set will contain one row or fewer. Unfortunately, what the optimizer cannot know is that the WHERE clauses are correlated: expensive orders tend to contain expensive line items and expensive products, so the actual number of rows returned will be nearer to 1,000. This is why the optimizer can sometimes choose a sub-optimal physical operator to perform a join.

If you think the wrong operator is being chosen, you can override the optimizer with the OPTION keyword; this article discusses it in more detail:
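For illustration, a query-level join hint on the query above might look like this (use hints sparingly; OPTION applies to every join in the query and overrides the optimizer's own choice):

```sql
-- Force hash joins for the whole query instead of letting
-- the optimizer pick the physical join operator.
SELECT *
FROM SalesLT.SalesOrderHeader SOH
INNER JOIN SalesLT.SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID
INNER JOIN SalesLT.Product P ON P.ProductID = SOD.ProductID
WHERE SOH.TotalDue > 1500
AND SOD.UnitPrice > 300
AND P.StandardCost > 300
OPTION (HASH JOIN);
```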

Make SQL Server 2005 Address more than 2Gb on Windows 2003/2000 32bit versions


If you are using SQL Server 2005 straight out of the box on 32-bit Windows 2003/2000 and you would like SQL Server to be able to address more than 2GB of RAM, you have to enable AWE.

http://msdn.microsoft.com/en-us/library/ms190731(SQL.90).aspx
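The documented way to switch AWE on is via `sp_configure` (note that the SQL Server service account also needs the "Lock pages in memory" Windows privilege, and addressing more than 4GB requires the /PAE boot switch):

```sql
-- Enable AWE so 32-bit SQL Server 2005 can use memory above 2GB.
sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
sp_configure 'awe enabled', 1;
RECONFIGURE;
GO
```

A restart of the SQL Server service is required before the setting takes effect.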

Schema Design Rules

These are the schema design commandments I live by, they are kept here purely for my reference.
Rules marked with * are absolute and if not implemented will result in karma so bad you will be reincarnated as a bottom feeder.

1.) Pluralise table names.
2.) *Every table in the db will have a primary key, without exception.
3.) Be consistent with case and separators when naming tables and columns.
4.) Enforce referential integrity with foreign keys.
5.) Avoid acronyms wherever possible when naming tables and columns.
6.) Avoid using the float datatype for columns that will store monetary values.
7.) Use varchar instead of char(n) unless specifically required.

If you have any rules you swear by then please comment with your reasoning.
This is a work in progress.
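As a sketch of several of the rules in one place, here is a pair of hypothetical tables (the names and columns are invented for illustration):

```sql
-- Illustrates rules 1, 2, 4, 6 and 7.
CREATE TABLE Customers (                               -- rule 1: pluralised name
    CustomerId   INT IDENTITY(1, 1) NOT NULL,
    CustomerName VARCHAR(100)       NOT NULL,          -- rule 7: varchar over char(n)
    CONSTRAINT PK_Customers PRIMARY KEY (CustomerId)   -- rule 2: always a primary key
);

CREATE TABLE Orders (
    OrderId    INT IDENTITY(1, 1) NOT NULL,
    CustomerId INT                NOT NULL,
    TotalDue   DECIMAL(19, 4)     NOT NULL,            -- rule 6: not float for money
    CONSTRAINT PK_Orders PRIMARY KEY (OrderId),
    CONSTRAINT FK_Orders_Customers FOREIGN KEY (CustomerId)
        REFERENCES Customers (CustomerId)              -- rule 4: referential integrity
);
```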

Choosing Hardware for a SQL Server 2005 Database for Web Applications

When you are tasked with choosing a new database server, let the business requirements dictate the hardware and licence choices you make. I am only going to cover SQL Server 2005 Standard and Enterprise editions; if you require less horsepower then any modern desktop will be ample, and if you require more then you need to talk to one of the Tier 1 vendors (Sun, IBM or HP) and bring a big cheque book.

SQL Server 2005 Licenses

So let's start with SQL Server licensing. There are a few different licensing models available: per seat, per device, or per CPU. The chances are you will only be interested in per CPU; if your web application is publicly available or is an extranet-type app, you will not have domain users accessing it (a possible exception is an intranet app). From now on I am going to focus on the per-CPU licence.

SQL Server 2005 now ships for x86, x64 and IA64 (Itanium) processors. The price is the same for each, at around £4,500 for a Standard licence and £16,000 for an Enterprise licence. Microsoft's multicore policy is to license by the number of physical CPUs, not the number of cores. This is important because Intel and AMD are delivering more MIPS by adding cores instead of chasing higher clock speeds, so when choosing a server it makes financial sense to remember this. For example:

               2005 Std Edition License for a Quad CPU server will cost 4 x 4500 = £18,000
               2005 Std Edition License for a Single CPU Quad Core will cost 1 x 4500 = £4,500

This example is extreme, and a single CPU with four cores may not yield the same performance as four CPUs, but it does illustrate how making an informed hardware purchase can keep server licence costs down. It is also worth remembering that opting for a 64-bit server has been reported to yield up to a 200% performance gain in real-world applications. If you are building a server that will be dedicated to running your database, then choosing x64 should be a no-brainer unless you have some sort of corporate policy that prevents it; if you intend to run other apps, you should check whether they are available in x64 versions. As a rule it is a good idea to run SQL Server on its own. Standard Edition supports a maximum of four CPUs; Enterprise is unlimited.

Server Hardware

RAM, or memory if you prefer, is a key component of any database server and makes a big difference to performance. SQL Server 2005 Standard and Enterprise both support the maximum provided by the OS. If you choose Standard Edition, then Windows 2003 R2 x64 is the best OS choice: it supports up to 4 CPUs, can address up to 32GB of RAM, and costs less than £700. If you have chosen SQL Server Enterprise Edition, then the recommended operating system is 2003 Datacenter x64; it is cheaper than Enterprise at ~£2,000 and will support up to 64 CPUs and 1 terabyte of RAM. For more info on OS licensing use this link: 2003 Server Comparison

AMD or Intel? I don't think there is a huge difference in performance if you need a dual-CPU server. If you are building a quad, then AMD's Opteron performs better, but you can only scale up to eight CPUs, whereas Intel offers 32-CPU solutions.

SCSI, Fibre Channel, SAN or SAS? Probably SAS: it will future-proof better (you can have over 126,000 drives), and while prices are a little high at the moment they should fall. Fibre Channel offers good performance and the price is good now, but I think it will give way to SAS. SANs are big money but offer the highest performance. SCSI is starting to look like it is on the way out, and limits the number of drives to around fifty on a four-channel card. SAS Architecture

SAS Articles:
http://www.serialstoragewire.com/Articles/2004_1026/5_vitesse_it_insights_1004.html
http://www.techworld.com/storage/features/index.cfm?featureid=1319&pagtype=samecatsamechan

Application Workload

OLAP or OLTP? What sort of application(s) are going to be accessing the database? If you are going to be doing large numbers of transactions, then random disk access is going to be your I/O bottleneck and you will want lots of low-capacity, high-speed disks. If you are building a data warehouse for reporting, then you will need to store more data and access will be more sequential, so fewer disks with a higher capacity and a lower speed will make sense. If you are migrating from an older SQL Server, then you can probably gauge what sort of capacity you require; if not, you should do some rough calculations, double the results, and add a bit more for luck.

Availability and Scale

If your application is enterprise level, has thousands of concurrent users and terabytes of data, and must be available 24/7, 365 days a year, then you are going to have to shell out the extra £££ for an Enterprise licence. Enterprise has a whole host of features that support availability and managing very large databases that Standard does not. If you have hundreds of users and the business does not require the system to be available 24/7 and will permit out-of-office-hours downtime, then Standard will be fine.

Server Lifetime

Lifetime of the server? Generally you should be building a server that will give you a guaranteed three years' service; it may turn out to be five years before a replacement is needed, and since a lot of the warranties offered will be five years, this is probably as long as is sensible for a production server. It is almost certain that after three years technology will have moved on significantly enough to mean that upgrading to a server twice as powerful will cost the same or less than the server you are about to purchase.


How to Stop a Batch in SQL Query Analyzer

I was compiling a release script the other day and needed to check that all data items in a column were unique before adding a unique constraint to a table. I wanted a way to stop Query Analyzer (QA) from executing the batch if the check revealed duplicate items in the column. I tried a few obvious things like RETURN and RAISERROR, but these didn't work. I then realised that the issue lay with the GO command. I found this in BOL: "GO is not a Transact-SQL statement; it is a command recognized by the osql and isql utilities and SQL Query Analyzer." Unfortunately there isn't a corresponding STOP command, so you cannot easily perform a test in T-SQL and then tell QA to stop processing the rest of the batch.


I did, however, find a dirty way around the problem. Using a WHILE loop and the PRINT command, I was able to repeatedly display a message explaining the problem and instructing the user to cancel the execution. Not very elegant, but it did work.
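The approach can be sketched like this (the table and column names here are hypothetical; a WAITFOR is added so the loop doesn't spin flat out while it nags):

```sql
-- If the column contains duplicates, loop forever printing a warning
-- so the user cancels the batch before the constraint is attempted.
IF EXISTS (SELECT OrderRef
           FROM dbo.Orders
           GROUP BY OrderRef
           HAVING COUNT(*) > 1)
BEGIN
    WHILE 1 = 1
    BEGIN
        PRINT 'Duplicate OrderRef values found - cancel this query now!';
        WAITFOR DELAY '00:00:05';
    END
END
GO
ALTER TABLE dbo.Orders
    ADD CONSTRAINT UQ_Orders_OrderRef UNIQUE (OrderRef);
GO
```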


BOL Search Terms: Batches, Batch Processing