How to do DynamoDB transactions with more than max allowed items

[[DynamoDB]] allows you to write (Put, Update or Delete) multiple items across multiple tables within a single atomic transaction using the [TransactWriteItems API operation](https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_TransactWriteItems.html). If you have a highly concurrent application, you can include a `ConditionExpression` in your individual write items to ensure that the write will only be committed if a certain field on the item being updated matches an expected field (e.g. `status='AVAILABLE'`). You can also use the top-level `ConditionCheck` option to apply a condition to an item in a table that is not being modified by the transaction. However, the problem is that the TransactWriteItems operation has an upper limit of 25 items that can be updated in a single transaction. If you have a use case where more than this number need to be committed, you can't do this in a single atomic operation and need to consider alternative approaches to ensure data integrity. The easiest solution for this quandary is to avoid the problem by passing this limit back to your users/API consumers by simply rejecting any requests to perform a transaction with more than 25 items. If this isn't an option, another solution is to try to re-architect your data model such that the entities being acted upon aren't all stored in separate DynamoDB items (e.g. have them as an array inside a single item). However, these solutions may not be suitable for your use case, so you may still need to consider another process which updates all the items in your DynamoDB table in a suitable manner. CAUTION: The following are think-out-loud ideas that I haven't tested or used yet. ## Idea 1 - Pessimistic locking on a top-level parent item using a saga pattern (co-ordinated by a Step Functions state machine) If there is a parent entity above the items being updated in your DynamoDB table, an optional field could be added to it to signify that it has been locked and its children should not be updated. If there's not a natural parent entity, then you could introduce a global item in your DynamoDB table to store this field. *Important: this solution relies on the fact that all clients with access to the DynamoDB table(s) in question respect this lock field when performing any writes. Integrity is not enforced by DynamoDB.* For this example, I have two entities stored in a single DynamoDB table: Product and InventoryUnit. A Product has multiple InventoryUnits beneath it. A User can order up to 200 InventoryUnits of a Product in a single transaction. Remember that MAX_TRANSACT_WRITE_ITEMS=25 To commit this transaction is a multi-step process: 1. Prepare DynamoDB WriteItems for each InventoryUnit item to be written (Updates) and group them into batches of size MAX_TRANSACT_WRITE_ITEMS. 2. If batch count == 1, skip to step 6 (and don't include ConditionCheck with reference to lockToken) 3. Generate a unique lockToken (e.g. a ULID) `generatedLockToken` 4. Acquire lock on the Product in question(PRODUCT_1) with a conditional Update operation: - Update: PRODUCT_1, { $set: { lockToken: generatedLockToken }} - ConditionExpression { attribute_not_exists(lockToken) } 5. If update fails, Wait and then retry previous step. 6. If update succeeds, loop in series over each batch of InventoryUnits to be written: - Perform a TransactWriteItems as follows: - For each InventoryUnit in the batch: - Update: ITEM_n, { $set: { status: 'SOLD', soldToUserId: userId }} - ConditionExpression: { status: 'AVAILABLE', attribute_not_exists(soldToUserId) } - ConditionCheck: Key=PRODUCT_1, { lockToken=generatedLockToken } - If txn for batch succeeds, keep a copy of the written items in a `writtenBatches array - If txn for a batch fails, retry n times with exponential backoff (if error is transient). 7. If a txn in the batch loop failed (after retries), rollback the changed items by looping over `writtenBatches` and performing TransactWriteItems for each batch as follows: - For each InventoryUnit in the batch: - Update: ITEM_n, { $set: { status: 'AVAILABLE'}, $unset: { soldToUserId }} - ConditionExpression: { status: 'SOLD' } - ConditionCheck: Key=PRODUCT_1, { lockToken=generatedLockToken } 8. Release lock on Product: - Update: PRODUCT_1, { $unset: { lockToken }} To co-ordinate the steps in this process, it might be best to do it inside a Step Functions state machine which has built in primitives for waiting, retrying and error handling. ### Assumptions & limitations of this solution - As earlier stated, all other clients of the DynamoDB table need to respect the `lockToken` field on the Product before marking child InventoryUnits as SOLD. - You know what values to revert fields to if a transaction fails (e.g. I knew to set status back to AVAILABLE). If this is not the case, you would need to read back the full DynamoDB InventoryUnit item before creating the update request. - Potential for "dirty" reads: Any reads of the InventoryUnits while in the middle of the transaction processing could result in some showing as AVAILABLE and some as SOLD. This may be ok in your use case, but is important to note. ### Potential enhancements #### To address "dirty" reads *(based on [Pete Naylor (DynamoDB Product Manager)'s suggestion](https://twitter.com/pete_aws/status/1498887221188857858))* If a client reads an InventoryUnit item with status=SOLD but whose parent Product is still "locked" by another client, this InventoryUnit could potentially be rolled back, and so the client might make a false assumption based on this data. To resolve this, the InventoryUnit item could store more contextual data to allow the client to determine if it's in a "commited" state or a transient one subject to rollback. This could be achieved by setting the following new attributes: - On the Product item: - `productVersion`. This would be set to the `generatedLockToken` ULID value whenever the lock is released on a Product in the case of a successful batch of TransactWrites. - On the InventoryUnit item: - `productVersion` (string) and `itemVersions`(array). `productVersion` would be set to the `generatedLockToken` ULID value in the same TransactWriteItems request which sets the `status` and other updated fields. A copy of the old version of the InventoryUnit item (or at least a subset of all the changeable fields along with the old `productVersion` value) should be appended to the `itemVersions` array. In the case of a rollback, the `productVersion` field would need to be reverted to its previous value. This would allow clients reading back InventoryUnits and its associated Product item to determine if the `productVersion` on both entities match, and if not, it can assume that an InventoryUnit item is in the process of being updated and that its old values are accessible through the `itemVersions` field. (It may be possible that instead of an `itemVersions` array, you only need to store a single `lastItemVersion` map/object field) --- ## References - https://github.com/awslabs/dynamodb-transactions/blob/master/DESIGN.md - [My Twitter question on this](https://twitter.com/paulswail/status/1498700915783737359)