Learn Image Text Recognition Using Google Cloud Vision API
Posted By : Mohd Adnan | 30-Apr-2018
According to Google Cloud Vision API Documentation -
Cloud Vision API enables developers to integrate Google CloudVision detection features within applications including face and landmark detection, image labeling, optical character recognition (OCR), and tagging of explicit content.
Prerequisites:
1. Before we begin, we need to set up a project on Google Cloud Developers console (Link: https://console.cloud.google.com/)
2. Enable the Google Cloud Vision API under 'API and Services'
3. Copy the API key under Credentials which looks like this 'AIzaSyCwpab-fbRd6*******ne60NyTkA'
4. Android Studio(3+) with
Let's try our hands on implementation:
Define Permissions in AndroidManifest.xml
Create an Activity where
Define Constants:
private static final String CLOUD_VISION_API_KEY =API_KEY;
public static final String FILE_NAME = "temp.jpg";
private static final String ANDROID_CERT_HEADER = "X-Android-Cert";
private static final String ANDROID_PACKAGE_HEADER = "X-Android-Package";
private static final int MAX_LABEL_RESULTS = 10;
private static final int MAX_DIMENSION = 1200;
private static final String TAG = MainActivity.class.getSimpleName();
private static final int GALLERY_PERMISSIONS_REQUEST = 0;
private static final int GALLERY_IMAGE_REQUEST = 1;
public static final int CAMERA_PERMISSIONS_REQUEST = 2;
public static final int CAMERA_IMAGE_REQUEST = 3;
Note: CLOUD_VISION_API_KEY is a variable where you'll have to define your API Key copied from Google Cloud developers console.
Create a function to start Device Camera
public void startCamera() {
if (PermissionUtils.requestPermission(
this,
CAMERA_PERMISSIONS_REQUEST,
Manifest.permission.READ_EXTERNAL_STORAGE,
Manifest.permission.CAMERA)) {
Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
Uri photoUri = FileProvider.getUriForFile(this, getApplicationContext().getPackageName() + ".provider", getCameraFile());
intent.putExtra(MediaStore.EXTRA_OUTPUT, photoUri);
intent.addFlags(Intent.FLAG_GRANT_READ_URI_PERMISSION);
startActivityForResult(intent, CAMERA_IMAGE_REQUEST);
}
}
Read the image captured onActivityResult
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
super.onActivityResult(requestCode, resultCode, data);
if (requestCode == CAMERA_IMAGE_REQUEST && resultCode == RESULT_OK) {
Uri photoUri = FileProvider.getUriForFile(this, getApplicationContext().getPackageName() + ".provider", getCameraFile());
uploadImage(photoUri);
}
}
Create functions to uploadImage to Google Cloud Storage for processing
public void uploadImage(Uri uri) {
if (uri != null) {
try {
// scale the image to save on bandwidth
Bitmap bitmap =
scaleBitmapDown(
MediaStore.Images.Media.getBitmap(getContentResolver(), uri),
MAX_DIMENSION);
callCloudVision(bitmap);
mMainImage.setImageBitmap(bitmap);
} catch (IOException e) {
Log.d(TAG, "Image picking failed because " + e.getMessage());
Toast.makeText(this, R.string.image_picker_error, Toast.LENGTH_LONG).show();
}
} else {
Log.d(TAG, "Image picker gave us a null image.");
Toast.makeText(this, R.string.image_picker_error, Toast.LENGTH_LONG).show();
}
}
private void callCloudVision(final Bitmap bitmap) {
// Switch text to loading
mImageDetails.setText(R.string.loading_message);
// Do the real work in an async task, because we need to use the network anyway
try {
AsyncTask textDetectionTask = new TextDetectionTask(this, prepareAnnotationRequest(bitmap));
labelDetectionTask.execute();
} catch (IOException e) {
Log.d(TAG, "failed to make API request because of other IOException " +
e.getMessage());
}
}
Create an AsyncTask that processes the image for text detection in
private Vision.Images.Annotate prepareAnnotationRequest(Bitmap bitmap) throws IOException {
HttpTransport httpTransport = AndroidHttp.newCompatibleTransport();
JsonFactory jsonFactory = GsonFactory.getDefaultInstance();
VisionRequestInitializer requestInitializer =
new VisionRequestInitializer(CLOUD_VISION_API_KEY) {
/**
* We override this so we can inject important identifying fields into the HTTP
* headers. This enables use of a restricted cloud platform API key.
*/
@Override
protected void initializeVisionRequest(VisionRequest visionRequest)
throws IOException {
super.initializeVisionRequest(visionRequest);
String packageName = getPackageName();
visionRequest.getRequestHeaders().set(ANDROID_PACKAGE_HEADER, packageName);
String sig = PackageManagerUtils.getSignature(getPackageManager(), packageName);
visionRequest.getRequestHeaders().set(ANDROID_CERT_HEADER, sig);
}
};
Vision.Builder builder = new Vision.Builder(httpTransport, jsonFactory, null);
builder.setVisionRequestInitializer(requestInitializer);
Vision vision = builder.build();
BatchAnnotateImagesRequest batchAnnotateImagesRequest =
new BatchAnnotateImagesRequest();
batchAnnotateImagesRequest.setRequests(new ArrayList() {{
AnnotateImageRequest annotateImageRequest = new AnnotateImageRequest();
// Add the image
Image base64EncodedImage = new Image();
// Convert the bitmap to a JPEG
// Just in case it's a format that Android understands but Cloud Vision
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
bitmap.compress(Bitmap.CompressFormat.JPEG, 90, byteArrayOutputStream);
byte[] imageBytes = byteArrayOutputStream.toByteArray();
// Base64 encode the JPEG
base64EncodedImage.encodeContent(imageBytes);
annotateImageRequest.setImage(base64EncodedImage);
// add the features we want
annotateImageRequest.setFeatures(new ArrayList() {{
Feature labelDetection = new Feature();
textDetection.setType("DOCUMENT_TEXT_DETECTION");
add(labelDetection);
}});
// Add the list of one thing to the request
add(annotateImageRequest);
}});
Vision.Images.Annotate annotateRequest =
vision.images().annotate(batchAnnotateImagesRequest);
// Due to a bug: requests to Vision API containing large images fail when GZipped.
annotateRequest.setDisableGZipContent(true);
Log.d(TAG, "created Cloud Vision request object, sending request");
return annotateRequest;
}
private static class TextDetectionTask extends AsyncTask {
private final WeakReference mActivityWeakReference;
private Vision.Images.Annotate mRequest;
TextDetectionTask(MainActivity activity, Vision.Images.Annotate annotate) {
mActivityWeakReference = new WeakReference<>(activity);
mRequest = annotate;
}
@Override
protected String doInBackground(Object... params) {
try {
Log.d(TAG, "created Cloud Vision request object, sending request");
BatchAnnotateImagesResponse response = mRequest.execute();
return convertResponseToString(response);
} catch (GoogleJsonResponseException e) {
Log.d(TAG, "failed to make API request because " + e.getContent());
} catch (IOException e) {
Log.d(TAG, "failed to make API request because of other IOException " +
e.getMessage());
}
return "Cloud Vision API request failed. Check logs for details.";
}
protected void onPostExecute(String result) {
MainActivity activity = mActivityWeakReference.get();
if (activity != null && !activity.isFinishing()) {
TextView imageDetail = activity.findViewById(R.id.image_details);
imageDetail.setText(result);
}
}
}
Create a function that will scale down the bitmap captured to make processing faster
private Bitmap scaleBitmapDown(Bitmap bitmap, int maxDimension) {
int originalWidth = bitmap.getWidth();
int originalHeight = bitmap.getHeight();
int resizedWidth = maxDimension;
int resizedHeight = maxDimension;
if (originalHeight > originalWidth) {
resizedHeight = maxDimension;
resizedWidth = (int) (resizedHeight * (float) originalWidth / (float) originalHeight);
} else if (originalWidth > originalHeight) {
resizedWidth = maxDimension;
resizedHeight = (int) (resizedWidth * (float) originalHeight / (float) originalWidth);
} else if (originalHeight == originalWidth) {
resizedHeight = maxDimension;
resizedWidth = maxDimension;
}
return Bitmap.createScaledBitmap(bitmap, resizedWidth, resizedHeight, false);
}
Finally, create a function to display the text extracted from
private static String convertResponseToString(BatchAnnotateImagesResponse response) {
String result = "";
TextAnnotation fullTextAnnotation = response.getResponses().get(0).getFullTextAnnotation();
if (fullTextAnnotation != null) {
result = fullTextAnnotation.get("text").toString();
} else {
result = "";
}
return result;
}
The variable result will contain the text extracted from
Hope that helps :)
Cookies are important to the proper functioning of a site. To improve your experience, we use cookies to remember log-in details and provide secure log-in, collect statistics to optimize site functionality, and deliver content tailored to your interests. Click Agree and Proceed to accept cookies and go directly to the site or click on View Cookie Settings to see detailed descriptions of the types of cookies and choose whether to accept certain cookies while on the site.
About Author
Mohd Adnan
Adnan, an experienced Backend Developer, boasts a robust expertise spanning multiple technologies, prominently Java. He possesses an extensive grasp of cutting-edge technologies and boasts hands-on proficiency in Core Java, Spring Boot, Hibernate, Apache Kafka messaging queue, Redis, as well as relational databases such as MySQL and PostgreSQL. Adnan consistently delivers invaluable contributions to a variety of client projects, including Vision360 (UK) - Konfer, Bitsclan, Yogamu, Bill Barry DevOps support, enhedu.com, Noorisys, One Infinity- DevOps Setup, and more. He exhibits exceptional analytical skills alongside a creative mindset. Moreover, he possesses a fervent passion for reading books and exploring novel technologies and innovations.