Background Details About BULK API

Note: The calculations below are based on a sample record count of 60,326 records.

1. I created a test file with 60,326 records and tried to insert them into the Contact object.

2. I set the batch size to 10,000 (10K) and enabled the “Use Bulk API” checkbox in Data Loader.

3. Data Loader splits the records into batches of 10K each, and every batch contains a contiguous range of records. However, processing may start with any batch. For example, the batches might be picked up in an order like this:
batch-1 = 20,001 to 30,000,
batch-2 = 1 to 10,000,
batch-3 = 50,001 to 60,000,
batch-4 = 40,001 to 50,000,
batch-5 = 10,001 to 20,000,
batch-6 = 30,001 to 40,000,
batch-7 = the remaining 326 records.
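The split described above can be sketched in Python (illustrative only; the actual splitting and scheduling are done by Data Loader and the Bulk API, not by your code):

```python
import random

TOTAL_RECORDS = 60_326
BATCH_SIZE = 10_000

# Split record positions 1..60,326 into contiguous batches of up to 10K each.
batches = [
    (start, min(start + BATCH_SIZE - 1, TOTAL_RECORDS))
    for start in range(1, TOTAL_RECORDS + 1, BATCH_SIZE)
]

print(len(batches))   # 7 batches in total
print(batches[-1])    # (60001, 60326) -> the remaining 326 records

# Processing may begin with any batch, i.e. the order is not guaranteed:
processing_order = batches[:]
random.shuffle(processing_order)
```

Each tuple is a contiguous range, matching the sample batches listed above; only the processing order is random.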
In Data Loader, enable the checkbox shown below.
With this setting enabled, observe the processing diagram that follows.
4. If you look at the diagram carefully, the meaning of asynchronous processing becomes clear. All batches start processing in parallel. We might be under the impression that a batch starts only after it collects all 10K records (because the batch size is 10K), but that is completely wrong.

5. A batch starts processing after collecting a chunk of records (200 records), and it keeps adding 200 records at a time until it reaches the batch size of 10K.

Note: There is a small difference between a batch and a chunk. In this example, the set of 10K records is called a “batch”, and each set of 200 records processed at a time is called a “chunk”.

6. The total number of debug logs generated for these 60,326 records = 7 debug logs (one per batch).
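The count of 7 follows directly from the batch size; a quick arithmetic check (assuming one debug log per batch, as observed above):

```python
import math

total_records = 60_326
batch_size = 10_000

# Number of batches = number of debug logs.
num_batches = math.ceil(total_records / batch_size)
print(num_batches)  # 7

# Size of the final, partially filled batch.
last_batch_size = total_records - (num_batches - 1) * batch_size
print(last_batch_size)  # 326
```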

7. Suppose there is a trigger like the one below:

trigger TheTrigger on Contact (before insert, after insert)
{
    // Trigger.size holds the number of records in the current chunk (at most 200)
    System.debug('----Entering into Trigger---' + Trigger.size);
}

8. In each debug log, the trigger is invoked 50 times. That is 50 * 200 = 10,000 records = 1 debug log.
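The 50-invocations-per-log figure follows from the 200-record chunk size. A quick check in Python (the count for the final 326-record batch is my extrapolation from the same rule, not something stated above):

```python
import math

CHUNK_SIZE = 200  # the trigger fires once per chunk of up to 200 records

def trigger_invocations(batch_records: int) -> int:
    # One invocation per chunk, so round up the chunk count.
    return math.ceil(batch_records / CHUNK_SIZE)

print(trigger_invocations(10_000))  # 50 invocations -> 50 * 200 = 10,000 records
print(trigger_invocations(326))     # 2 invocations (chunks of 200 and 126 records)
```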

9. As per this example, each batch of 10K records is treated as one transaction. You might wonder how we can justify that each batch is one transaction. To prove it, I captured the debug logs. We have 7 debug logs in total; I opened each one, and each contains exactly one EXECUTION_STARTED and one EXECUTION_FINISHED entry. This matches the documentation:

“An execution unit is equivalent to a transaction. It contains everything that occurred within the transaction. EXECUTION_STARTED and EXECUTION_FINISHED delimit an execution unit.”

10. We enabled Bulk mode to process records, so the batches are processed asynchronously. Assuming there is a trigger on the same object, the question is: will the trigger execute in sync mode or async mode?
Answer: The trigger still executes in sync mode.

11. You might then wonder what the use of enabling Bulk mode is, if triggers still execute in sync mode even after enabling it.
Answer: When “Bulk API” is enabled, Salesforce recommends not using triggers; instead, we should implement that logic in a batch class and process the records once the load is done. What is the proof? Please refer to this document.

12. We cannot say Bulk API is the ultimate solution for processing large sets of records, because there are a few considerations to think through before using it; otherwise it can lead to many problems. For example:
  • Row locks are a common issue.
  • Updating ownership of records with private sharing, etc.
To avoid these types of issues, we can use serial mode with Bulk API.

13. Enable serial mode along with bulk processing in Data Loader (checkbox shown below).
With this setting enabled, observe the processing diagram that follows.
From the diagram, it is clear that at any point only one batch is being processed while the remaining batches wait in a queue. The debug logs and processing time also change for this mode.
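The serial behaviour can be sketched as a simple queue (illustrative only; the actual queueing happens on the Salesforce side):

```python
from collections import deque

# In serial mode, batches wait in a queue and exactly one is processed at a time.
queue = deque(["batch-1", "batch-2", "batch-3", "batch-4",
               "batch-5", "batch-6", "batch-7"])

processed = []
while queue:
    current = queue.popleft()   # only one batch in flight at any moment
    processed.append(current)   # the rest stay queued until this one finishes

print(processed)
```

Contrast this with parallel mode above, where all batches are in flight at once; serial mode trades throughput for avoiding contention such as row locks.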

The interesting fact is that the above limits apply only to DML operations (Insert, Update, Upsert, and Delete).

For extracting/retrieving/querying records, Salesforce provides additional limits. Please refer to this document for more details about Bulk Query.

tHiNk gooD and dO thE bEsT.........MANJU NATH 🌝
