Amazon SimpleDB Integration

As you may already be aware, Amazon does a little more than just sell books. They have been quietly and slowly changing the way we think of web services and cloud computing. With their on-demand servers (EC2), unlimited file storage (S3), messaging (SQS) and databases (SimpleDB), they are truly making us all rethink how we architect tomorrow's systems.

Amazon's SimpleDB service is a system for storing and querying data without having to think about scaling or storage. You just use it. It is priced like all their other services: the more you use, the more you pay.

Like all of Amazon's web services, this is another powerful tool in their arsenal, and as a CFML developer you can easily get at this service through OpenBD's CFQUERY extension.

Amazon SimpleDB Basics

Amazon SimpleDB is not a true relational database. Instead, you can think of it as a series of hashtables stored in a single domain. A domain, in Amazon-speak, is not dissimilar to a table in SQL, and an Item is close in thinking to a row. In a traditional database you have a fixed number of columns per table, but in SimpleDB that's not the case: you can have up to 255 attributes (or columns) in a row, and they do not all have to be defined.

There is also no concept of a table/domain definition. You just add and delete data, on the assumption that you provide a unique identifier for each row or Item. Another small gotcha is that all your data is stored as literal strings, so 10 would be stored as "10". The only time you need to worry about this is when doing less-than or greater-than queries, as these are performed lexically: "9" compares as greater than "10", for example.

  • Amazon stores data in Domains then Items, with each item having any number of attributes
  • Maximum of 100 domains per user account
  • Maximum of 10GB per domain
  • All data is stored as strings; be careful with comparisons such as ItemA > ItemB, as they are lexical, not numeric (i.e. pad out numbers)
  • No individual item can be over 1k in size
  • Maximum attributes per row/item is 255
  • Each row has a unique ID (ItemName)
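
The lexical-comparison gotcha above can be seen with plain CFML string functions, without touching SimpleDB at all. This is a minimal sketch, assuming a fixed width of five digits is enough for the values involved; the width and the variable names are illustrative only.

```cfml
<!--- Illustrative values: compared as strings, "9" sorts after "10" --->
<cfset a = "9">
<cfset b = "10">

<!--- Compare() is a case-sensitive string comparison returning -1, 0 or 1 --->
<cfset rawResult = Compare(a, b)>

<!--- Left-pad both values with zeros to an assumed fixed width of 5 --->
<cfset width = 5>
<cfset paddedA = RepeatString("0", width - Len(a)) & a>
<cfset paddedB = RepeatString("0", width - Len(b)) & b>
<cfset paddedResult = Compare(paddedA, paddedB)>

<cfoutput>
  unpadded: #rawResult# / padded: #paddedResult#
</cfoutput>
```

Unpadded, Compare() reports "9" as greater than "10"; once both values are padded to 00009 and 00010, the string order agrees with the numeric order.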

Pricing is published on the Amazon SimpleDB website. In short, they charge per GB for data going in and out, and for the amount of CPU time your queries take.

CFML Integration

Getting access to this functionality is very easy with OpenBD. We've added SimpleDB functionality to our official engine. When looking at providing access to this service, we debated whether it should be a set of functions, a new set of tags or something else. The answer was staring us all in the face: CFQUERY.

CFQUERY is of course the CFML window into data storage, and the original creators of this tag had already built in future extensibility through the dbtype="" attribute. Historically this has only really been used to differentiate between a SQL query and a Query-of-Queries. So we added a new dbtype: amazon.

This lets you build INSERT / DELETE / SELECT statements for accessing data sitting inside of Amazon SimpleDB.

If that wasn't cool and easy enough, the real side effect of using CFQUERY for your Amazon SimpleDB API is that it literally saves you money. Each request you make of Amazon SimpleDB costs money. But by utilising the inbuilt query caching of CFQUERY (including the OpenBD caching enhancements), you don't need to query Amazon half as much as you normally would.

Let's look at some sample code, and how else OpenBD helps you interact with Amazon SimpleDB.

Sample Code

First of all, we didn't get away with not implementing some functions. These were purely to assist in the creation of the Amazon datasource and the high level management of domains.

<cfset amazonDS = AmazonRegisterDataSource( "MyIdentifier", awsAccessId, awsSecretKey )>

<cfset AmazonSimpleDbCreateDomain( amazonDS, "mydomain" )>
<cfset AmazonSimpleDbDeleteDomain( amazonDS, "mydomain" )>
<cfset qry = AmazonSimpleDbListDomains( amazonDS )>

The first function, AmazonRegisterDataSource(), sets up the Amazon datasource, and once done you won't need to do it again. You don't even need to hold on to an object for it, because all that is returned is a string that acts as your reference to the datasource. This call takes in your two Amazon AWS access codes, which open up the world of Amazon to you.

To create a new domain you simply call AmazonSimpleDbCreateDomain(), passing in the Amazon datasource and the name you want your domain to have. Similarly, deleting the domain is performed using the AmazonSimpleDbDeleteDomain() function. You can get a CFML query back of all your current domains by calling AmazonSimpleDbListDomains().

Inserting data

Let us start by inserting data into our domain. We are all familiar with the INSERT syntax for SQL, so you'll be able to dump data straight into your Amazon SimpleDB very quickly.

<cfquery dbtype="amazon" datasource="#amazonDS#">
  insert into mydomainname (ItemName, "name", "age") values (
  <cfqueryparam value="">,
  <cfqueryparam value="#session.age#">)
</cfquery>

As you can see, it is a standard INSERT statement, complete with CFQUERYPARAM tags to help you format your data. Please note, though, that when inserting you need to include ItemName in the column list. This is the unique identifier, or index, for your row. If the row already exists, then the attribute columns are overwritten.

Deleting data

Deleting data is equally painless, except there are two types of delete. You can delete an attribute from a given row, or you can delete the complete row. Remember, Amazon charges you for the data whether you use it or not, so the ability to delete a given column in a given row is very powerful (and cost effective!).

<cfquery dbtype="amazon" datasource="#amazonDS#">
  delete from mydomainname
  where ItemName='myrowid'
  [AND ItemAttribute='myattribute']
</cfquery>

So here you can see that you either delete the whole row by specifying its unique id for the ItemName column, or you delete a specific attribute using the ItemAttribute keyword.
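
As a concrete sketch of the two forms, reading the square brackets in the template above as marking an optional clause: the domain name, row id and attribute name below are placeholders, and amazonDS is assumed to have been registered as shown earlier.

```cfml
<!--- Delete a single attribute from the row, leaving its other attributes in place --->
<cfquery dbtype="amazon" datasource="#amazonDS#">
  delete from mydomainname
  where ItemName='myrowid'
  AND ItemAttribute='age'
</cfquery>

<!--- Delete the whole row (item) identified by ItemName --->
<cfquery dbtype="amazon" datasource="#amazonDS#">
  delete from mydomainname
  where ItemName='myrowid'
</cfquery>
```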

Note, Amazon does warn that due to the way their system operates and synchronizes, if you do an add or delete of data, then it may not be immediately available if you query for it straight after. In practice though, we haven't noticed this.

Selecting data

So now that you have data sitting within Amazon SimpleDB, you will no doubt want to pull it back out. This is done with the SELECT statement, within a CFQUERY tag. Remember, you can utilise the caching techniques of CFQUERY to increase performance.

<cfquery dbtype="amazon" datasource="#amazonDS#" name="qry">
  select * from ItemAttribute
  where domain='mydomain' and ItemName='myrowid'
</cfquery>

Here we are pulling back all the attributes for a given item or row. This will be a single row query. This may seem a little strange, but this maps onto how Amazon manage their data.
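
The caching mentioned earlier can be sketched with the standard cachedwithin attribute of CFQUERY. This assumes that cachedwithin is honoured for dbtype="amazon" in the same way as for ordinary SQL queries, which the text suggests but does not spell out; the ten-minute window is an arbitrary illustrative value.

```cfml
<!--- Cache the result for 10 minutes; repeat requests inside that window
      should be served from the CFQUERY cache rather than from Amazon --->
<cfquery dbtype="amazon" datasource="#amazonDS#" name="qry"
         cachedwithin="#CreateTimeSpan(0, 0, 10, 0)#">
  select * from ItemAttribute
  where domain='mydomain' and ItemName='myrowid'
</cfquery>
```

A longer window means fewer billable requests, but also a longer period in which changes made in SimpleDB are not visible to the page.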

So the question becomes: which ItemNames do I need to pull back for a given criteria or query? You can easily determine this using the following SELECT statement.

<cfquery dbtype="amazon" datasource="#amazonDS#" name="qry">
  select ItemName,NextToken from mydomain
  where [Amazon Query]
  limit [nexttoken,],5
</cfquery>

This one may require a little explanation, as it's more of an Amazon issue. Amazon has no real notion of paging results. You can get a maximum of 250 items back in one go, and to get the next set you must pass back a special token that allows Amazon to fetch the next set for you. This token is taken from the previous query resultset.
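
The token hand-off can be sketched as a loop. This is only an illustration under several assumptions that the text does not confirm: that the NextToken column is empty once there are no further pages, that the token can be placed unquoted before the page size in the limit clause, and that the where clause (a query on the age attribute inserted earlier) stands in for your own Amazon query. The maxPages guard is a hypothetical safety net, not part of OpenBD.

```cfml
<!--- Hypothetical paging loop; see the assumptions in the lead-in --->
<cfset token = "">
<cfset allNames = ArrayNew(1)>
<cfset pageCount = 0>
<cfset maxPages = 50>
<cfset morePages = true>

<cfloop condition="morePages AND pageCount LT maxPages">
  <cfquery dbtype="amazon" datasource="#amazonDS#" name="page">
    select ItemName,NextToken from mydomain
    where ['age'>'19' AND 'age'<'30']
    limit <cfif Len(token)>#token#,</cfif>100
  </cfquery>
  <cfset pageCount = pageCount + 1>

  <!--- Collect the ItemNames returned in this page --->
  <cfloop query="page">
    <cfset ArrayAppend(allNames, page.ItemName)>
  </cfloop>

  <!--- Take the token from the last row of this page, if there is one --->
  <cfif page.recordCount GT 0>
    <cfset token = page.NextToken[page.recordCount]>
  <cfelse>
    <cfset token = "">
  </cfif>
  <cfset morePages = Len(token) GT 0>
</cfloop>
```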

To query the data, you use Amazon's special query language, which isn't dissimilar to how some SQL databases format their commands.

For instance, to query our example domain for all people in their 20s, we would write the following.

<cfquery dbtype="amazon" datasource="#amazonDS#" name="qry">
  select ItemName,NextToken from mydomain
  where ['age'>'19' AND 'age'<'30']
  limit 100
</cfquery>

Recall we said that all data is stored as pure strings. This means your queries may look a little strange at first, but you soon get used to it.

However, you may be wondering how you can manage numbers of unequal length. We've added a new attribute to CFQUERYPARAM, called PADDING="", that lets you specify the number of leading zeros to pad a value with if the value passed in is a number.
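
A minimal sketch of how this might look on an insert. The padding value of 5 and the row id are illustrative, and exactly how OpenBD interprets the number (count of zeros versus total width) is not spelled out here, so check the stored value before relying on it.

```cfml
<!--- Store age as a zero-padded string so that lexical comparisons
      behave like numeric ones; padding="5" is an assumed value --->
<cfquery dbtype="amazon" datasource="#amazonDS#">
  insert into mydomainname (ItemName, "age") values (
  <cfqueryparam value="myrowid">,
  <cfqueryparam value="#session.age#" padding="5">)
</cfquery>
```

Whatever width you choose, the values you compare against in your queries need to be padded the same way.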

If you use CFQUERYPARAM for inserting your data, OpenBD will figure out the best way to represent your data within Amazon SimpleDB so that querying for it doesn't cause any bizarre side effects. For example, date objects can cause problems if you are not careful. It is best to stick to using CFQUERYPARAM.

Other Functions for SimpleDB

There are functions that let you operate with all of the services provided by SimpleDB.

Function Name                  Description
AmazonSimpledbCreateDomain     Creates a new SimpleDB domain for storing data
AmazonSimpledbDeleteAttribute  Deletes the attribute (and optional value) from the ItemName inside the domain
AmazonSimpledbDeleteDomain     Deletes a SimpleDB domain, removing all data immediately
AmazonSimpledbGetAttributes    Gets all the attributes for the given domain and ItemName. Supports the consistentread flag of SimpleDB.
AmazonSimpledbListDomains      Lists all the domains within this datasource
AmazonSimpledbSetAttribute     Sets the attribute (and optional value) to the ItemName inside the domain
AmazonSimpledbSetStruct        Sets all the attributes in data to the ItemName in domain