Parse a Large CSV File in Groovy and Grails or Java Using Multithreading
Posted by: Shakil Pathan | 30-Sep-2015
Hi,
In this blog post, I am going to explain how to split a large CSV file and parse the split files using multithreading.
In one of my projects, I had to parse a CSV file and save its data. The file had more than one lakh (100,000) rows, and parsing it sequentially was taking a long time.
So I split the file into several smaller files and parsed them simultaneously. Here is the code for splitting the file:
String splitPath = directory + '/productFeedSplit/'
String filePrefix = 'splits'
String fileExtension = '.txt'
int i = 0
int fileNumber = 0

if (!dataFile.exists()) {
    log.debug "File does not exist"
} else {
    new File(splitPath).mkdirs()
    File fileToWrite = new File(splitPath + filePrefix + fileNumber + fileExtension)
    dataFile.eachLine { line ->
        // Start a new split file after every 10,000 lines
        if (i >= 10000) {
            i = 0
            fileNumber += 1
            fileToWrite = new File(splitPath + filePrefix + fileNumber + fileExtension)
        }
        i = i + 1
        fileToWrite << ("$line\r\n")
    }
}
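For readers working in plain Java, here is a rough equivalent of the splitting step. It is a sketch under a few assumptions of mine: the first line of the file is a header, which it copies to the top of every split so each piece can be parsed on its own; no quoted field contains a line break (the same assumption the line-by-line Groovy split makes); and the class name CsvSplitter and the .csv extension are hypothetical choices.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class CsvSplitter {

    /**
     * Splits source into files holding at most linesPerFile data rows each.
     * The header row is repeated at the top of every split.
     * Returns the split files in order.
     */
    static List<Path> split(Path source, Path splitDir, String prefix, int linesPerFile)
            throws IOException {
        if (linesPerFile <= 0) {
            throw new IllegalArgumentException("linesPerFile must be positive");
        }
        Files.createDirectories(splitDir);
        List<Path> splits = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String header = reader.readLine();
            if (header == null) {
                return splits; // empty input: nothing to split
            }
            BufferedWriter writer = null;
            int rowsInCurrent = 0;
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Open the first split, or roll over when the current one is full
                    if (writer == null || rowsInCurrent == linesPerFile) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path next = splitDir.resolve(prefix + splits.size() + ".csv");
                        splits.add(next);
                        writer = Files.newBufferedWriter(next, StandardCharsets.UTF_8);
                        writer.write(header);
                        writer.newLine();
                        rowsInCurrent = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    rowsInCurrent++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return splits;
    }
}
```

Repeating the header in every split means the parser can use the same skip setting for every file instead of special-casing the first one. Using one buffered writer per split also avoids reopening the file for every line, which is what appending with << does.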
Then I parse the split CSV files in parallel, one thread per file, using the CSV parser Grails plugin. Here is the code:
def allThreads = []
// Split files are numbered 0..fileNumber, so there are fileNumber + 1 of them
(fileNumber + 1).times { number ->
    File newFile = new File(splitPath + filePrefix + number + fileExtension)
    if (!newFile.exists()) {
        return
    }
    // Only the first split contains the original CSV header row
    int linesToSkip = (number == 0) ? 1 : 0
    allThreads << Thread.start {
        newFile.toCsvReader(['skipLines': linesToSkip, 'quoteChar': '\u0000']).eachLine { tokens ->
            Product.withTransaction {
                // Find-then-save is not atomic across threads: a productId that
                // appears in two split files can be saved twice
                Product productExist = Product.findByProductId(tokens[0])
                if (productExist == null) {
                    Product newProduct = new Product()
                    newProduct.productId = tokens[0]
                    newProduct.brand = tokens[1]
                    newProduct.category = tokens[2]
                    newProduct.name = tokens[3]
                    newProduct.description = tokens[4]
                    newProduct.imageUrl = tokens[5]
                    newProduct.link = tokens[6]
                    newProduct.price = tokens[7].toDouble()
                    newProduct.sku = tokens[8].toInteger()
                    newProduct.importDate = new Date()
                    if (!newProduct.save(flush: true)) {
                        log.debug "newProduct errors " + newProduct.errors
                    }
                }
            }
        }
    }
}

// Wait for all parsing threads to finish before cleaning up
allThreads.each { it.join() }

// TODO delete files and folders and update and delete products
new File(splitPath).deleteDir()
if (fileToProgress.exists()) {
    fileToProgress.delete()
}
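A comparable Java sketch of the parsing step uses a fixed-size ExecutorService rather than one raw thread per file, so a very large input does not start an unbounded number of threads. ParallelCsvParser and parseAll are hypothetical names. The sketch assumes every file starts with a header row and splits fields naively on commas, so it does not handle quoted values; the rowHandler is called from several threads at once and must be thread-safe.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelCsvParser {

    /**
     * Parses each file as a separate task on a pool of the given size and blocks
     * until all tasks finish. Returns the total number of data rows handled.
     */
    static long parseAll(List<Path> files, int threads, Consumer<String[]> rowHandler)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (Path file : files) {
                tasks.add(() -> parseOne(file, rowHandler));
            }
            long total = 0;
            // invokeAll waits for every task, like calling join() on each thread
            for (Future<Long> result : pool.invokeAll(tasks)) {
                total += result.get(); // rethrows a failure from any task
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    private static long parseOne(Path file, Consumer<String[]> rowHandler) throws IOException {
        long rows = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            reader.readLine(); // skip the header row
            String line;
            while ((line = reader.readLine()) != null) {
                rowHandler.accept(line.split(",", -1)); // -1 keeps trailing empty fields
                rows++;
            }
        }
        return rows;
    }
}
```

Calling get() on each Future surfaces an exception from any file to the caller, whereas an exception inside a raw Thread only ends that thread and join() still returns normally.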
In the above code, I save the tokens from each row to the Product domain class, skipping any row whose productId already exists.
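Because each thread checks for an existing product and then saves, two threads can both miss the same productId when it appears in two split files. One way to close that gap within a single import, sketched in Java, is to claim each ID atomically in a concurrent set before saving. ProductRowMapper and ProductRow are hypothetical stand-ins for the Product domain class, and the column order is taken from the Groovy parsing code.

```java
import java.util.Date;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class ProductRowMapper {

    /** Hypothetical plain-Java stand-in for the Grails Product domain class. */
    static final class ProductRow {
        final String productId, brand, category, name, description, imageUrl, link;
        final double price;
        final int sku;
        final Date importDate;

        ProductRow(String[] t) {
            productId = t[0];
            brand = t[1];
            category = t[2];
            name = t[3];
            description = t[4];
            imageUrl = t[5];
            link = t[6];
            price = Double.parseDouble(t[7].trim());
            sku = Integer.parseInt(t[8].trim());
            importDate = new Date();
        }
    }

    // Product IDs already claimed by some thread during this import
    private final Set<String> seenIds = ConcurrentHashMap.newKeySet();

    /**
     * Converts one row, then claims its product ID. Returns empty if another row
     * already claimed the ID. add() on a concurrent key set is atomic, so only
     * one thread can win a given ID.
     */
    Optional<ProductRow> map(String[] tokens) {
        if (tokens.length < 9) {
            throw new IllegalArgumentException("expected 9 columns, got " + tokens.length);
        }
        ProductRow row = new ProductRow(tokens); // conversion errors surface before claiming
        return seenIds.add(row.productId) ? Optional.of(row) : Optional.empty();
    }
}
```

The set only remembers IDs seen during the current run; products already stored in the database still need the findByProductId check, or a unique constraint on productId.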
Hope it helps!
Thanks!
About Author
Shakil Pathan
Shakil is an experienced Groovy and Grails developer. He has also worked extensively on developing STB applications using NetGem.