Moving files from S3 to Google Storage with Camel

Enterprise integration is hard and complicated without tools.  I used an ESB, CapeClear, on my last project and was really impressed by its capabilities.  Integrating different systems in the cloud is the norm these days, but ESBs and middleware are quite expensive and can be overkill for small projects.  A friend mentioned Apache Camel to me last week, and I found it does a great job of realizing all the well-known Enterprise Integration Patterns (EIPs).  It looks like a very useful tool for integrating various services in the cloud.

I did an experiment with Camel over the weekend to integrate Amazon S3 with Google Storage.  You will find the source code at https://github.com/barryku/SpringCloud/tree/master/CamelApp.  You can run the program easily with Maven: just put your Amazon and Google Storage credentials and bucket names into src/main/resources/META-INF/spring/spring.properties, then run mvn compile exec:java -Dexec.mainClass=com.barryku.camel.FileCopy.
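
For reference, the properties file might look something like the sketch below.  The key names here are my illustration only, not necessarily the ones the project uses; check spring.properties in the repository for the real ones.

# Hypothetical spring.properties layout -- actual key names may differ
s3.accessKey=YOUR_AWS_ACCESS_KEY
s3.secretKey=YOUR_AWS_SECRET_KEY
s3.bucket=your-s3-bucket
gs.accessKey=YOUR_GS_ACCESS_KEY
gs.secretKey=YOUR_GS_SECRET_KEY
gs.bucket=your-gs-bucket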

The Java code below shows how you can move files from S3 to Google Storage with just a few lines using Camel:

CamelContext context = (CamelContext) springContext.getBean("camelContext");
context.addRoutes(new RouteBuilder() {

	@Override
	public void configure() throws Exception {
		// poll the custom s3file endpoint and hand each downloaded file to the gsFileManager bean's process() method
		from("s3file:///").beanRef("gsFileManager", "process");
	}
});

context.start();
// let the route run briefly, then shut everything down
Thread.sleep(3000);
context.stop();
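
The snippet assumes springContext has already been created from the project's Spring configuration.  A minimal sketch of that bootstrap is below; the config file name is my guess, so check FileCopy in the repository for the actual one.

// Hypothetical bootstrap -- the Spring config file name used by FileCopy may differ
// import org.springframework.context.support.ClassPathXmlApplicationContext;
ClassPathXmlApplicationContext springContext =
		new ClassPathXmlApplicationContext("META-INF/spring/app-context.xml");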

There are a lot of ready-to-use components (endpoints) in Camel that you can leverage in your projects, but nothing for S3 or Google Storage yet.  That's why I spent last weekend writing my own implementation.  S3 has been around for a while, so its Java API is mature and easy to use compared to Google Storage's.  Even though I was lucky enough to find an alpha version of google-api-java-client with Google Storage support early on, I still went through a lot of trouble getting it to work with my program.  Anyway, let me show you the primary logic of working with both S3 and Google Storage.  The following is my s3file component implementation:

AmazonS3 s3 = new AmazonS3Client(
		new BasicAWSCredentials(apiKey, apiKeySecret));
boolean isRootFolder = path.equals("/");
ObjectListing objList = isRootFolder ? s3.listObjects(bucket) : s3.listObjects(bucket, path);

for (S3ObjectSummary summary : objList.getObjectSummaries()) {
	// ignore folder placeholders
	if (!summary.getKey().endsWith(FOLDER_SUFFIX)) {
		S3Object obj = s3.getObject(
				new GetObjectRequest(bucket, summary.getKey()));
		logger.info("retrieving " + summary.getKey());
		// write the object to the local temp folder, stripping the folder prefix when not at the root
		FileOutputStream fout = new FileOutputStream(TEMP_FOLDER + (isRootFolder ? "/" + summary.getKey() :
				summary.getKey().substring(path.length())));
		InputStream in = obj.getObjectContent();
		byte[] buf = new byte[1024];
		int len;
		while ((len = in.read(buf)) > 0) {
			fout.write(buf, 0, len);
		}
		in.close();
		fout.close();
	}
}

All S3 files are downloaded to a local temp folder when the Camel route component, s3file, is set up; the code should be self-explanatory.  Working with Google Storage, on the other hand, is similar to working with a typical RESTful web service.  However, it can be quite tricky to construct a RESTful GS request without the help of its Java client API.  The following is the code that finally worked for me:

HttpTransport transport = GoogleTransport.create();
GoogleStorageAuthentication.authorize(transport, apiKey, apiKeySecret);

HttpRequest request = transport.buildPutRequest();
InputStreamContent isc = new InputStreamContent();
isc.inputStream = new ByteArrayInputStream(content);
isc.type = type;
request.content = isc;
request.url = new GenericUrl(url + bucket + "/" + URLEncoder.encode(fileName, "utf8"));
GoogleHeaders headers = (GoogleHeaders) request.headers;
headers.date = httpDateFormat.format(new Date());
try {
	HttpResponse response = request.execute();
	logger.info(fileName + " uploaded");
	// workaround for a timeout issue after 3 consecutive connections:
	// consuming the response ensures Apache's HTTP client closes the connection
	getStreamContent(response.getContent());
} catch (HttpResponseException e) {
	logger.warn(getStreamContent(e.response.getContent()), e);
}
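
The getStreamContent helper isn't shown above; it simply drains the response stream into a String so the connection can be released.  The real implementation is in the repository; the version below is just an illustrative sketch of what such a helper could look like.

// Illustrative sketch of a response-draining helper; see the repository for the actual version
private String getStreamContent(InputStream in) throws IOException {
	StringBuilder sb = new StringBuilder();
	byte[] buf = new byte[1024];
	int len;
	while ((len = in.read(buf)) > 0) {
		sb.append(new String(buf, 0, len, "UTF-8"));
	}
	in.close();
	return sb.toString();
}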

Although my implementations of the s3file component and the gsFileManager bean work, they are still kludgy.  I plan to refactor them later to make them more like real Camel components.  Here are a few tips that may save you time when you run into 403s or other hard-to-interpret errors while working with google-api-java-client.

  1. For GET requests against a given bucket, you must add a trailing slash to the URL if you are using the format http://bucketname.commondatastorage.googleapis.com/ (see the sketch after this list).
  2. For PUT requests, you must set the type of your InputStreamContent.
  3. You will get a MalformedHeaderValue error if you use PST; however, pst and GMT work just fine.
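
To illustrate the first tip, here is a rough sketch of a bucket-listing GET request.  It reuses the transport, bucket, httpDateFormat, and getStreamContent names from the code above and assumes the same alpha google-api-java-client API also exposes buildGetRequest(); treat it as an approximation rather than verified code.

// Sketch only: assumes transport has already been authorized as shown earlier
HttpRequest listRequest = transport.buildGetRequest();
// note the trailing slash -- without it the bucket-style URL fails
listRequest.url = new GenericUrl("http://" + bucket + ".commondatastorage.googleapis.com/");
GoogleHeaders listHeaders = (GoogleHeaders) listRequest.headers;
listHeaders.date = httpDateFormat.format(new Date());
HttpResponse listResponse = listRequest.execute();
logger.info(getStreamContent(listResponse.getContent()));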

2 Responses to Moving files from S3 to Google Storage with Camel

  1. Dave says:

    Hi,
    I have been looking for an S3 component and have come across your post. I can’t see any license, but given it’s on github I assume it’s unlicensed/public domain?

    I am wanting to extend your S3File component to do pushing of files to S3 as well in an application for a paying client so the licensing needs to be clear.

    Thanks,
    Dave

    • barry says:

      I don’t know much about the licensing stuff, so I will say it’s unlicensed. Feel free to use it in your project.
