If you’re new to DynamoDB, chances are that all you’ve heard so far would be “Query is the way to go, do NOT use Scan Operations”. Which is true in most cases, but in what cases are they discouraged and “Scan” is encouraged ?
Let’s say you have a DB for Payment Transactions at a product selling website. You would most likely have the unique ‘Primary partition key’ as the Transaction ID. And a couple of Global Secondary Indices (GSI) for SellerID/ProductID and BuyerID. These would be used to “Query” for User Transaction History and Product Sales Information by the Users of the platform.
However there are so many other parameters upon which you would like your transaction data to be fetched. Lets say at the end of the month, you’d want the Monthly sales data for tax purposes. Or maybe fetch sales on the basis of a category of products. In DynamoDB you can’t just “SELECT * FROM” and do other such structured complex queries. So do you create a GSI for all these 5–10 cases ? Absolutely not.
Here’s why :
Whenever you write/update an Item in a table, you don’t just consume 1 WCU. It is simultaneously updated in all the GSIs as well which consume more WCU. So, your WCU become 1+(No. of GSI). And with 10–12 GSI for queries you would end up paying upto 1000% percent more than you should have, or worse, you’ll throttle your DB at scale. This will happen even if you don’t use those GSI. Their mere existence will skyrocket your WCUs and it’s capacity (or billing in case for On-demand capacity).
Solution ?
Scan Operations. Any query not requested frequently, especially the ones which the users won’t request should NOT be a GSI based “query”. All Internal DB Operations (Getting the no. of Transactions, Volume, Weekly/monthly data etc etc) should be done via Scan Operations.