Spring Data JPA Batch Insertion

Posted By : Rahul Chauhan | 23-Apr-2019

Java

1. Overview
Going out to the database is expensive. We may be able to improve performance and consistency by batching multiple inserts into one.
In this tutorial, we’ll look at how to do this with Spring Data JPA.

2. Spring JPA Repository
First, we’ll need a simple entity. Let’s call it Customer:

 @Entity  
 public class Customer {  
   @Id  
   @GeneratedValue(strategy = GenerationType.AUTO)  
   private Long id;  
   private String firstName;  
   private String lastName;  
   // constructor, getters, setters   
 }

And then, we need our repository:

 public interface CustomerRepository extends CrudRepository<Customer, Long> {  
 }

This exposes a saveAll method for us, which can batch several inserts into one once batching is enabled (see section 4).
So, let’s leverage that in a controller:

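 import java.net.URI;
 import java.util.Arrays;
 import java.util.List;

 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.http.ResponseEntity;
 import org.springframework.web.bind.annotation.PostMapping;
 import org.springframework.web.bind.annotation.RestController;

 @RestController
 public class CustomerController {
   @Autowired
   CustomerRepository customerRepository;

   @PostMapping("/customers")
   public ResponseEntity<String> insertCustomers() {
     Customer c1 = new Customer("James", "Gosling");
     Customer c2 = new Customer("Doug", "Lea");
     Customer c3 = new Customer("Martin", "Fowler");
     Customer c4 = new Customer("Brian", "Goetz");
     List<Customer> customers = Arrays.asList(c1, c2, c3, c4);
     // Persist all four customers with a single repository call
     customerRepository.saveAll(customers);
     // created() expects a URI and returns a builder, so we call build()
     return ResponseEntity.created(URI.create("/customers")).build();
   }
   // ... @GetMapping to read customers
 }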

3. Testing Our Endpoint
Testing our code is simple with MockMvc:

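 // Static imports needed for post() and status():
 import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
 import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

 @Autowired
 private MockMvc mockMvc;

 @Test
 public void whenInsertingCustomers_thenCustomersAreCreated() throws Exception {
   this.mockMvc.perform(post("/customers"))
     .andExpect(status().isCreated());
 }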

4. Are We Sure We’re Batching?
Actually, there is just a bit more configuration to do – let’s do a quick demo to illustrate the difference.
First, let’s add the following property to application.properties to see some statistics:

 spring.jpa.properties.hibernate.generate_statistics=true

At this point, if we run the test, we’ll see stats like the following:

 11232586 nanoseconds spent preparing 4 JDBC statements;  
 4076610 nanoseconds spent executing 4 JDBC statements;  
 0 nanoseconds spent executing 0 JDBC batches;

So, we created four customers, which is great, but note that none of them were inside a batch.
The reason is that batching is not switched on by default.
In our case, no JDBC batch size has been configured, so saveAll executes each insert separately. The id generation strategy matters as well: Hibernate cannot batch inserts for entities whose ids use GenerationType.IDENTITY.
So, let’s switch it on:

 spring.jpa.properties.hibernate.jdbc.batch_size=4  
 spring.jpa.properties.hibernate.order_inserts=true

The first property tells Hibernate to collect inserts in batches of four. The order_inserts property tells Hibernate to take the time to group inserts by entity, creating larger batches.
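To make the effect of these two settings more concrete, here is a minimal sketch in plain Java. It is a simplified model and not Hibernate's actual implementation: it assumes a batch can only hold inserts for one entity type and is closed once it reaches the batch size. The class name, the countBatches helper, and the Order entity are hypothetical and used only for illustration; real insert ordering also has to respect foreign-key dependencies, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified model only; this is NOT how Hibernate is implemented internally.
final class InsertBatchingSketch {

    // Counts the batches needed for a queue of pending inserts, where each
    // element is the (hypothetical) entity name of one insert. A new batch
    // starts when the entity type changes or the current batch is full.
    static int countBatches(List<String> pendingInserts, int batchSize, boolean orderInserts) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<String> queue = new ArrayList<>(pendingInserts);
        if (orderInserts) {
            // Group inserts of the same entity together (the sort is stable).
            queue.sort(Comparator.naturalOrder());
        }
        int batches = 0;
        int currentSize = 0;
        String currentEntity = null;
        for (String entity : queue) {
            boolean startNewBatch = currentEntity == null
                    || !entity.equals(currentEntity)
                    || currentSize == batchSize;
            if (startNewBatch) {
                batches++;
                currentSize = 0;
                currentEntity = entity;
            }
            currentSize++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // "Order" is a made-up second entity, interleaved with Customer inserts.
        List<String> interleaved = Arrays.asList("Customer", "Order", "Customer", "Order");
        System.out.println(countBatches(interleaved, 4, false)); // 4
        System.out.println(countBatches(interleaved, 4, true));  // 2

        // Our four customers fit into a single batch of size four.
        List<String> customers = Arrays.asList("Customer", "Customer", "Customer", "Customer");
        System.out.println(countBatches(customers, 4, true));    // 1
    }
}
```

With only one entity type, as in our Customer example, ordering makes no difference; it starts to pay off once inserts for different entities are interleaved in the same flush.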
So, the second time we run our test, we’ll see the inserts were batched:

 16577314 nanoseconds spent preparing 4 JDBC statements;  
 2207548 nanoseconds spent executing 4 JDBC statements;  
 2003005 nanoseconds spent executing 1 JDBC batches;

We can apply the same approach to deletes and updates (remembering that Hibernate also has an order_updates property).
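As a sketch, and assuming the same spring.jpa.properties prefix we used for the insert settings, the update-side configuration might look like this in application.properties:

```
spring.jpa.properties.hibernate.jdbc.batch_size=4
spring.jpa.properties.hibernate.order_updates=true
```

The batch size of four is simply carried over from the insert example; a suitable value for a real application depends on the workload and the database.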

5. Conclusion
By using batch inserts, we can see some performance gains.
We need to be aware that batching is automatically disabled in some cases, and we should check and plan for this before we ship.
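One lightweight way to check is to look at the batch count in the statistics output from section 4. The following is a minimal sketch in plain Java that pulls that number out of the log text. The class and method names are hypothetical, and the pattern assumes the exact wording of the statistics lines shown earlier, which may differ between Hibernate versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the batch count from Hibernate statistics text.
final class BatchStatsCheck {

    // Assumes the log wording shown in section 4: "... executing N JDBC batches".
    private static final Pattern BATCHES = Pattern.compile("executing (\\d+) JDBC batches");

    // Returns the number of executed JDBC batches, or -1 if the line is absent.
    static long batchCount(String statsLog) {
        Matcher matcher = BATCHES.matcher(statsLog);
        return matcher.find() ? Long.parseLong(matcher.group(1)) : -1;
    }

    public static void main(String[] args) {
        String before = "0 nanoseconds spent executing 0 JDBC batches;";
        String after = "2003005 nanoseconds spent executing 1 JDBC batches;";
        System.out.println(batchCount(before)); // 0
        System.out.println(batchCount(after));  // 1
    }
}
```

A batch count of zero after a run that inserted several rows is a hint that batching is not active for that code path.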