Parse a large CSV file in Groovy and Grails (or Java) using multithreading

Posted By : Shakil Pathan | 30-Sep-2015

Hi,

In this blog, I will explain how to split a large CSV file into smaller files and parse those files in parallel using multithreading.

In one of my projects, I had to parse a CSV file and save its data. The file had more than one lakh (100,000) rows, and parsing it sequentially was slow.

So I split the file into several smaller files and parsed them simultaneously. Here is the code for splitting the file:

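		String splitPath = directory + '/productFeedSplit/'
		String filePrefix = 'splits'
		String fileExtension = '.txt'
		int linesPerFile = 10000
		int fileNumber = 0          // index of the LAST split file written
		if (!dataFile.exists()) {
			log.warn "File ${dataFile.path} does not exist"
		}
		else {
			File splitDir = new File(splitPath)
			splitDir.deleteDir()    // clear split files left over from an earlier run
			splitDir.mkdirs()
			int i = 0
			// one buffered writer per split file instead of reopening the file for every line
			Writer writer = new File(splitPath + filePrefix + fileNumber + fileExtension).newWriter()
			try {
				dataFile.eachLine { line ->
					if (i == linesPerFile) {
						// current split is full: close it and start the next one
						writer.close()
						i = 0
						fileNumber += 1
						writer = new File(splitPath + filePrefix + fileNumber + fileExtension).newWriter()
					}
					// note: the CSV header row ends up only in splits0.txt
					writer.write(line + '\r\n')
					i += 1
				}
			} finally {
				writer.close()
			}
		}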
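For readers on plain Java, here is a minimal sketch of the same splitting step. It is an illustration, not the code from this project: the class name CsvSplitter, the split method and its parameters are hypothetical, and UTF-8 input is assumed. Like the Groovy version, it leaves the header row only in the first split file and writes through one buffered writer per split. It also returns the files it created, so the parsing step does not have to rebuild the file names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CsvSplitter {

    /**
     * Hypothetical helper: splits source into files of at most linesPerFile lines,
     * named prefix0.txt, prefix1.txt, ... inside splitDir (assumes UTF-8 text).
     * The header row stays only in the first split. Returns the split files in order.
     */
    public static List<Path> split(Path source, Path splitDir, String prefix, int linesPerFile)
            throws IOException {
        if (linesPerFile <= 0) {
            throw new IllegalArgumentException("linesPerFile must be positive");
        }
        Files.createDirectories(splitDir);
        List<Path> parts = new ArrayList<>();
        BufferedWriter writer = null;
        int linesInCurrent = 0;
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (writer == null || linesInCurrent == linesPerFile) {
                    if (writer != null) {
                        writer.close(); // current split is full
                    }
                    Path part = splitDir.resolve(prefix + parts.size() + ".txt");
                    // newBufferedWriter truncates a leftover file with the same name
                    writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                    parts.add(part);
                    linesInCurrent = 0;
                }
                writer.write(line);
                writer.write("\r\n");
                linesInCurrent++;
            }
        } finally {
            if (writer != null) {
                writer.close(); // closing an already closed writer is harmless
            }
        }
        return parts;
    }
}
```

Calling CsvSplitter.split(feed, dir, "splits", 10000) writes splits0.txt, splits1.txt, ... into dir, and an empty input file produces no splits at all.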

Then I parse the split files in parallel, one thread per file, using toCsvReader from the Grails CSV plugin. Here is the code:

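		def allThreads = []
		// fileNumber is the index of the LAST split file, so iterate 0..fileNumber inclusive
		(0..fileNumber).each { number ->
			File newFile = new File(splitPath + filePrefix + number + fileExtension)
			if (!newFile.exists()) {
				return   // nothing was split, e.g. the feed file was missing
			}
			// the CSV header row is only in the first split file
			int linesToSkip = (number == 0) ? 1 : 0
			allThreads << Thread.start {
				try {
					newFile.toCsvReader(['skipLines': linesToSkip, 'quoteChar': '\u0000']).eachLine { tokens ->
						// one transaction per row; the find-then-save below is not atomic across threads,
						// so if the feed can repeat a productId, two threads could both insert it
						Product.withTransaction {
							Product productExist = Product.findByProductId(tokens[0])
							if (productExist == null) {
								Product newProduct = new Product()
								newProduct.productId = tokens[0]
								newProduct.brand = tokens[1]
								newProduct.category = tokens[2]
								newProduct.name = tokens[3]
								newProduct.description = tokens[4]
								newProduct.imageUrl = tokens[5]
								newProduct.link = tokens[6]
								newProduct.price = tokens[7].toDouble()
								newProduct.sku = tokens[8].toInteger()
								newProduct.importDate = new Date()
								if (!newProduct.save(flush: true)) {
									log.debug "newProduct errors " + newProduct.errors
								}
							}
						}
					}
				} catch (Exception e) {
					// otherwise an exception would end this thread without the caller noticing
					log.error "Import of ${newFile.name} failed", e
				}
			}
		}
		// wait for all worker threads before cleaning up
		allThreads.each { it.join() }

		// TODO delete files and folders and update and delete products
		new File(splitPath).deleteDir()
		// fileToProgress is not defined in this snippet; presumably it is the original feed file
		if (fileToProgress.exists()) {
			fileToProgress.delete()
		}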
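In plain Java, a fixed-size thread pool from java.util.concurrent is a common alternative to starting one raw thread per file. It caps how many files are parsed at once, and Future.get() hands any worker exception back to the caller. The sketch below is a hedged illustration, not a port of the Grails code. ParallelImport, importAll and rowHandler are hypothetical names. Lines are split on plain commas with no quote handling, which is presumably what quoteChar '\u0000' aims for in the Groovy code. The database save is replaced by a callback, and that callback must be thread-safe.

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class ParallelImport {

    /**
     * Hypothetical helper: parses every split file on a fixed-size pool and passes
     * each data row, split on commas with no quote handling, to rowHandler.
     * rowHandler is called from several threads at once, so it must be thread-safe.
     * Returns the total number of data rows handled.
     */
    public static long importAll(List<Path> parts, int threads, Consumer<String[]> rowHandler)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int n = 0; n < parts.size(); n++) {
                Path part = parts.get(n);
                // only the first split still starts with the header row
                int linesToSkip = (n == 0) ? 1 : 0;
                results.add(pool.submit(() -> {
                    long rows = 0;
                    int skipped = 0;
                    try (BufferedReader reader = Files.newBufferedReader(part, StandardCharsets.UTF_8)) {
                        String line;
                        while ((line = reader.readLine()) != null) {
                            if (skipped < linesToSkip) {
                                skipped++;
                                continue;
                            }
                            rowHandler.accept(line.split(",", -1));
                            rows++;
                        }
                    }
                    return rows;
                }));
            }
            long total = 0;
            for (Future<Long> result : results) {
                // get() waits for the worker and rethrows its failure as ExecutionException
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because Future.get() is called for every file, a parse error in any split surfaces as an ExecutionException in the calling thread instead of being lost inside a worker.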

In the code above, the tokens of each row are saved as an object of the Product domain class.
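The column-to-field mapping can also be pulled out into a small function, which makes it testable without a database. Here is a Java sketch assuming the same nine-column order as above (productId, brand, category, name, description, imageUrl, link, price, sku). The Product record is a hypothetical plain-Java stand-in for the Grails domain class, and the column-count check is an extra safeguard that the Grails code does not have.

```java
import java.util.Date;

public class ProductRow {

    /** Hypothetical plain-Java stand-in for the Grails Product domain class. */
    public record Product(String productId, String brand, String category, String name,
                          String description, String imageUrl, String link,
                          double price, int sku, Date importDate) { }

    /**
     * Maps one CSV row to a Product using the column order of the Groovy code.
     * Throws IllegalArgumentException for rows with fewer than 9 columns and
     * NumberFormatException when price or sku is not a number.
     */
    public static Product fromTokens(String[] tokens) {
        if (tokens.length < 9) {
            throw new IllegalArgumentException("expected 9 columns, got " + tokens.length);
        }
        return new Product(
                tokens[0], tokens[1], tokens[2], tokens[3], tokens[4],
                tokens[5], tokens[6],
                Double.parseDouble(tokens[7].trim()), // price
                Integer.parseInt(tokens[8].trim()),   // sku
                new Date());                          // importDate
    }
}
```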

Hope it helps!

 

THANKS

About Author

Shakil Pathan

Shakil is an experienced Groovy and Grails developer. He has also worked extensively on developing STB applications using NetGem.
