
Blog

Posts tagged with "aws".

Elastic Beanstalk

Sid

25 Jan 2011 14:04

I was at a webinar last night about AWS Elastic Beanstalk, Amazon’s newly released web app cloud deployment platform.

The timing of the webinar (10 a.m. PST == dinner time in the UK) meant that I had to leave before the end, but it was a pretty interesting introduction and left me wanting to check it out.

AWS Elastic Beanstalk is a Java-Tomcat stack that allows easy deployment of JEE applications (via the AWS console, an Eclipse toolkit, the command line, or the SDK/API). Perhaps the best thing about deployment is that you can change a bunch of configuration settings without having to redeploy.

The usual but incredibly useful benefit of scalability and growth being taken care of for you holds just as it does for Google App Engine for Java (GAEJ). So does the fact that there’s a console with which you can monitor your apps, again via the AWS dashboard.

It comes with database support (MySQL, SQL Server, Oracle, DB2). JRuby support is also in there, and the stack seems more standard than the GAEJ stack. That, coupled with the fact that you have root access and so can go low-level, means a lot more flexibility than GAEJ gives.

AWS have a sample app that you can deploy to see how it works, although I’m not sure I really get the point of that (won’t anybody inclined to, or capable of, deploying probably have their own sample Java web apps they’d like to try out anyway?).

Other platforms are planned, but on the webinar Amazon were coy about even mentioning a roadmap, let alone committing to one. I figured they just hadn’t decided yet, but that’s no matter.

We’re planning to try it out in the next month or so with a JRuby port of Jair, and will post our findings as we go.

Tagged in: cloud, aws, beanstalk, java, paas, jair, yield management

Jair, JRuby, and Amazon Web Services

Sid

10 Mar 2011 13:04

My last post, a good month and a half ago, talked about porting our revenue management product Jair to JRuby and deploying it on Amazon’s Elastic Beanstalk.

Well, we’ve ported the app, and apart from a few hiccups with gem versions not being supported in JRuby (we had to move from CSV to the FasterCSV gem, which I guess is no bad thing!) it all went smoothly.

I really like JRuby, as there’s something satisfying about writing in Ruby but deploying onto Tomcat. Even more satisfying is when the application actually runs up on Tomcat!

I’m not yet fully convinced, as on shutdown Tomcat issues dire warnings about memory leaks being “likely to happen”, but we’re going to do some application profiling soon, so we’ll see.

Rather than use Elastic Beanstalk (only available in Amazon’s US regions at the moment), we deployed onto a customised Bitnami Tomcat stack on AWS EC2. Again, this seemed a lot easier than the last time I tried, although perhaps that’s down to having the right tools installed beforehand(!).

I’ve got a bunch of tips and things we learned from the porting exercise and the AWS deployment, so I’ll post those in the next few days.

For us, AWS rather than GAEJ is the way forward. GAEJ does offer some different services (I can only think of caching and IM offhand), so it could be useful for some apps, but how long before AWS offer the same? On top of that, AWS gives you the freedom to install things like memcached.

I do feel a bit guilty, though, every day at 4 p.m. when the True North chatbot (one of our apps on Google App Engine) pings me on IM… maybe it’ll be its turn for porting next.

Tagged in: aws, gaej, revenue management, jruby, cloud, saas, amazon, google

Using Filezilla for file transfer to AWS

Sid

29 Mar 2011 08:11

Although (at least with the bitnami stack) you can deploy your WAR files to Tomcat using the manager, sometimes you need to upload other files (e.g. database scripts) to your AWS instance.

I like using Filezilla as it makes FTP easy, so I was happy to find I could use it to move local files to my EC2 instance.

It needs a bit of setup in order to use your AWS key, but it’s pretty simple. Here’s a quick how-to. You’ll need the following:

  • PuTTY – you’ll probably already be using this to connect to AWS – download page
  • Pageant – an SSH authentication agent for PuTTY – download page
  • Your AWS key pair file converted to a PuTTY PPK file. There are a few resources that describe how to convert a PEM file to a PPK file; basically you do this using PuTTYgen, from the same download location as above.

Once you’ve installed all of the above, launch Pageant (in Windows it lives in your system tray) and then add your converted PPK key-pair file.

Then launch Filezilla and make a new connection to your EC2 instance using its public DNS name. You can find this by right-clicking on the instance in the AWS console and choosing Connect; a dialog box comes up with the connection details.

Make sure that you select “SFTP – SSH File Transfer Protocol” as per the image and have the right user name, and then you’re done. Happy transferring!

Tagged in: aws, saas, filezilla, putty, file transfer, guides

Integrating LinkedIn data with our contacts app

Sid

06 Apr 2011 16:30

Last week we ran up a JRuby contacts tracking app which we deployed on AWS. It’s small but beautifully-formed – insert your own joke now – and is a lot better than our having to exchange emails with contacts and leads.

Despite that, yesterday I got fed up of copying contacts from LinkedIn to the app, so I thought I’d add a way for users to browse and add their LinkedIn contacts. This was straightforward enough to do in a day, but there were a few gotchas along the way, so after a quick bit of background I thought I’d share those.

LinkedIn provide both a JS and a REST API to their services and split the API into three primary domains:

  • Profile API – to get information about people (e.g. name, role, company, etc.)
  • Connections – provides a list of connections for a user
  • People Search API – which mimics the search capabilities when you log in to LinkedIn

There are a couple of other APIs, around Invitations and Sharing updates etc., but those were of less interest for what we were trying to achieve.

Our goal was pretty simple – to save data entry by accessing our connections from LinkedIn and copying their name, role, company, and LinkedIn profile into our contacts application.

This broke down into five technical tasks:

  1. Register our contacts application with LinkedIn (needed for the authentication step)
  2. Authenticate using OAuth and the tokens provided from registration
  3. Retrieve connections
  4. Pull profile information from each connection
  5. Save contact information into the contacts database

LinkedIn uses OAuth to authenticate and authorize, and having had fun in the past with OAuth and Google, I was glad this time to see that Wynn’s API not only wrapped OAuth but also had some examples that worked out of the box.

I’m not going to replicate that code here, so I recommend taking a look at the examples.

With authentication, the only issue I had was with the OAuth gem under JRuby. It was throwing an exception, “Wrong # of arguments (3 for 2)”, in digest.rb. All it needed was for a different digest implementation (OpenSSL’s) to be used. I put the following code into a file called digest.rb and all was well:

require 'oauth/signature/base'
require 'openssl'
#
# Hack/fix to allow oauth to be used with JRuby / Tomcat.
# The Digest-based HMAC class doesn't work under JRuby, so compute
# the HMAC-SHA1 with OpenSSL instead.
#
module OAuth::Signature::HMAC
  class Base < OAuth::Signature::Base
    private

    # Overrides the gem's digest method: same inputs (the secret and the
    # signature base string), but signed with OpenSSL::HMAC using SHA-1.
    def digest
      OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), secret, signature_base_string)
    end
  end
end
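To see what the patched digest method is actually computing, here is a minimal standalone sketch of an OAuth 1.0 HMAC-SHA1 signature using only Ruby’s OpenSSL bindings. It is not the oauth gem’s code: the method names are hypothetical, and the signing-key rule (percent-encoded consumer secret and token secret joined by “&”) and the Base64 step follow the OAuth 1.0 spec (RFC 5849). Building the signature base string itself is left out.

```ruby
require 'openssl'

# Percent-encode a value the way OAuth 1.0 requires: every byte except the
# unreserved characters A-Z, a-z, 0-9, '-', '.', '_' and '~' becomes %XX.
def oauth_escape(value)
  value.to_s.b.gsub(/[^A-Za-z0-9\-._~]/) { |byte| format('%%%02X', byte.ord) }
end

# Raw HMAC-SHA1 via OpenSSL, the same primitive the patched digest method uses.
def hmac_sha1(key, data)
  OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), key, data)
end

# OAuth 1.0 HMAC-SHA1 signature: the key is the escaped consumer secret and
# the escaped token secret joined by '&' (the token secret may be empty);
# the raw digest is then Base64-encoded ('m0' = strict, no line breaks).
def oauth_hmac_sha1_signature(base_string, consumer_secret, token_secret = '')
  key = "#{oauth_escape(consumer_secret)}&#{oauth_escape(token_secret)}"
  [hmac_sha1(key, base_string)].pack('m0')
end

# Usage with made-up values (a real base string is built from the HTTP
# method, the URL and the sorted request parameters).
puts oauth_hmac_sha1_signature('GET&http%3A%2F%2Fexample.com%2F&a%3D1', 'consumer-secret', 'token-secret')
```

Using pack('m0') rather than the base64 library keeps the sketch dependent on nothing beyond openssl.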

Once authenticated, it’s just a case of using the methods in the gem’s Client class to fetch back the objects. Not all of the API is supported, but there was enough for what we needed to do. Again, the detail is in the gem source and examples, and the only more “complex” thing that we did was to take the “headline” field (e.g. my headline is Co-Founder at True North) and split that into a role and a company.

# most are --role-- at --company-- but will not fit all
# split on the first " at " only; to_s guards against a missing headline
posn, company = lic.headline.to_s.split(" at ", 2)
posn ||= ""
company ||= ""

Not everybody’s headline is in this format, but enough are to make it not worth going any more complex.

Finally, it was a case of deploying onto our running version of contacts on AWS. We used a modified Ubuntu/Tomcat/MySQL Bitnami stack again, as it’s been good to us before and is pretty straightforward to set up.

This is where the final gotcha got me. The version of Ubuntu didn’t have some of the XML libraries needed, and I ended up with “Could not open library xml2 libxml2.so”. This was just a case of installing those libraries on the machine.

Again, the joy of AWS and the Bitnami AMI is that we could just run sudo apt-get install libxml2 libxml2-dev libxslt1-dev. I also made us a new AWS AMI with the patch in (so I don’t have to remember next time).

For us that was it. The code itself was quick to do – some of the issues took time but by the end of the day we were all happily importing from LinkedIn.

Ping me on http://twitter.com/truenorth_sid if you want to know more or are interested in a cloud-based contacts tracker with LinkedIn integration!

Finally, in order to protect the innocent, the screenshot above is test data generated by Benjamin Curtis’ Faker gem rather than our real contacts!

Tagged in: linkedin, jruby, aws, integration, social networks, api, contacts, bitnami

Generating SQL for rails migrations

Mark

07 Jun 2011 10:45

We have been migrating a lot of our apps onto JRuby with a goal of running them on Tomcat on AWS.

One of the difficulties in doing that is that we can’t run migrations in the usual way, but we still need to migrate our database just as you would for a standard Rails app.

I searched around, assuming there must be a solution, and found Jay Fields’ article “Rails Generate SQL from Migrations” and also a mention of a tool, active_record_io_mode, whose source code I could no longer access.

I had problems making Jay’s code run for certain migrations and, not being as smart as him, couldn’t fix them, so I created a more basic version which combines Jay’s approach with Mark/Zach’s.

I have patched Active Record to do a few things:

  • Patch ActiveRecord::Base to store an object which any sql that’s run gets written to
  • Patch the AbstractAdapter log_info method to check for that object and write any sql to it (with appropriate separators)
  • Patch the Migrator migrate method to create an up or down file for a migration and set that in the object (step 1).
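As a rough illustration of the pattern in the list above, here is a minimal, self-contained Ruby sketch. It is not the actual patch: SqlCapture, FakeAdapter and WriteSqlToSink are hypothetical stand-ins for the object stored on ActiveRecord::Base, AbstractAdapter and the log_info patch respectively, and it uses Module#prepend (Ruby 2.0+), whereas a Rails 2.3.5 app of the time would more likely have used alias_method_chain.

```ruby
require 'stringio'

# Hypothetical stand-in for the object stored on ActiveRecord::Base: it holds
# the current SQL sink (any IO), or nil when nothing is being captured.
module SqlCapture
  class << self
    attr_accessor :sink
  end

  # Point the sink at +io+ for the duration of the block (as the patched
  # Migrator does with an up or down file), then restore the previous sink.
  def self.capture_to(io)
    previous = sink
    self.sink = io
    yield
  ensure
    self.sink = previous
  end
end

# Hypothetical adapter standing in for AbstractAdapter: every statement
# goes through +log+, which is the hook being patched.
class FakeAdapter
  attr_reader :executed

  def initialize
    @executed = []
  end

  def execute(sql)
    log(sql) { @executed << sql }
  end

  def log(sql)
    yield
  end
end

# The patch: also write each statement, plus a separator, to the sink if set.
# `super` with no arguments passes on both +sql+ and the block.
module WriteSqlToSink
  def log(sql)
    SqlCapture.sink&.write("#{sql};\n\n")
    super
  end
end

FakeAdapter.prepend(WriteSqlToSink)

# Usage: capture the SQL for an "up" migration into a buffer (a File in practice).
up_sql = StringIO.new
adapter = FakeAdapter.new
SqlCapture.capture_to(up_sql) do
  adapter.execute('CREATE TABLE contacts (id INT)')
end
adapter.execute('SELECT 1') # still runs, but is not captured: no sink is set
print up_sql.string
```

The statements still run against the database as normal; capturing is purely a side effect of the logging hook, which is why this approach needs a real migrate (and rollback) to produce the SQL.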

This file just needs to be placed in lib at the start of your project and required in your environment.rb. I’m sure there is a cleaner way to do this.

There are a few limitations of the approach:

  • you have to run a migration to get the sql (which really means a migrate/rollback/migrate)
  • it’s not been tested far and wide, only on Rails 2.3.5 under JRuby

Still, it helps us, and I’m sure we can finesse it as we move forward.

Tagged in: aws, rails, migrations, sql

Disconnected: No supported authentication methods available

Mark

28 Jun 2011 09:25

I had a strange error trying to connect to AWS using Filezilla. We use bitnami images, connecting using PuTTY with Pageant to manage keys.

Each time I tried to connect, with Pageant running, I got the error message “Disconnected: No supported authentication methods available” but colleagues with the same settings didn’t get that. I didn’t have any problem connecting via PuTTY.

I found the solution via this ticket report. I ran the registry edit, deleted my PuTTY Session entries, and the connection issues went away.

Tagged in: filezilla, aws, putty, pageant

Managing large datasets using AWS Elastic MapReduce

Sid

20 Jul 2011 14:02

Recently we’ve seen a growing number of requests from clients and potential clients to help tame their data mountain, both to get new insights and to report on day-to-day business activity.

We’ve used Infobright and latterly Hadoop to achieve this and have been able to serve up some good results quickly using both tools.

One slight headache has been the learning curve in setting up and configuring Hadoop. We’ve made our own Hadoop AMIs which we can spin up on Amazon Web Services, but there’s still a little bit of work to do in manually setting up job configurations, spinning up servers, etc.

Amazon provide their own managed Hadoop service, Elastic MapReduce, which takes away a fair bit of the pain for you, although at a slightly increased cost. We got out our calculators and figured that the time saved was worth the money, as it meant we would get to the business issues earlier.

This post is a brief intro to setting up an example Elastic MapReduce job based on the UFO sightings dataset distributed by Infochimps (we’re using the TSV file if you want to reproduce).

There’s a perfectly good and well-written intro on the AWS site, so if you’re just getting started it’s best to read that first, as this post dives straight into an example.

The UFO dataset is our data “Hello World!” and we already have several Hadoop Streaming tasks that manipulate this.

In true example style, the one here is the simplest – it takes the sightings since 1900 and counts them by year.

There are four steps to running the job on AWS:

  1. Upload your data to S3: the input file, any mappers or reducers
  2. Create a new job flow (using the Elastic MapReduce CLI – see instructions in the AWS guide above)
  3. Check the job has run ok and is terminated (otherwise you’ll be charged as it runs)
  4. Download your data for further processing

Upload data

This speaks for itself. There are only a couple of things to note really. First, make sure that your output folder doesn’t already exist in S3, otherwise Hadoop will fail. Second, the items can live in different S3 buckets; the job flow script specifies where each one lives.

Below are snippets from the map and reduce jobs.

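Mapper – map_yyyy.rb

#!/usr/bin/env ruby
# Emit the year of each sighting after CUTOFF_YEAR, one per line.
POS_SIGHTING_YEAR = 0  # the sighting date (which starts with the year) is the first column
CUTOFF_YEAR = 1900

STDIN.each_line do |line|
  tokens = line.split("\t")
  next if tokens[POS_SIGHTING_YEAR].nil?  # skip blank lines
  sighting_year = tokens[POS_SIGHTING_YEAR].slice(0, 4)
  puts sighting_year if sighting_year.to_i > CUTOFF_YEAR
end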

Reducer – reduce_yyyy.rb

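#!/usr/bin/env ruby
# Count sightings per year. Hadoop sorts the mapper output by key,
# so all the lines for a given year arrive together.
last_year, count = nil, 0
puts "year,count"
STDIN.each_line do |line|
  year = line.split("\t").first.to_s.strip  # keep the key, drop any tab/newline
  next if year.empty?                       # skip blank lines
  if last_year && last_year != year
    puts "#{last_year},#{count}"            # year changed: emit the previous total
    count = 1
  else
    count += 1
  end
  last_year = year
end
puts "#{last_year},#{count}" if last_year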
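Before paying for a job flow, it can be worth checking the map and reduce logic locally. The sketch below mirrors the two scripts as plain Ruby methods and uses sort in place of Hadoop’s shuffle-and-sort phase. The method names and sample rows are made up for illustration (the rows only imitate the shape of the TSV), and this is a local test harness, not something that runs on Elastic MapReduce.

```ruby
CUTOFF_YEAR = 1900

# Same logic as map_yyyy.rb: emit the year (first four characters of the
# first tab-separated field) of each sighting after CUTOFF_YEAR.
def map_years(lines)
  lines.map do |line|
    date = line.split("\t").first
    next if date.nil?
    year = date.slice(0, 4)
    year if year.to_i > CUTOFF_YEAR
  end.compact
end

# Same logic as reduce_yyyy.rb: count runs of identical years. This only
# works because the input is sorted, so equal years are adjacent.
def reduce_counts(sorted_years)
  rows = ['year,count']
  last_year, count = nil, 0
  sorted_years.each do |year|
    if last_year && last_year != year
      rows << "#{last_year},#{count}"
      count = 1
    else
      count += 1
    end
    last_year = year
  end
  rows << "#{last_year},#{count}" if last_year
  rows
end

# Hadoop sorts mapper output by key before the reduce phase; sort stands in for that.
def run_locally(lines)
  reduce_counts(map_years(lines).sort)
end

# Made-up rows that only imitate the shape of the TSV (date fields first).
sample = [
  "19951009\t19951011\tSome Town, IA\n",
  "19950105\t19950106\tOther Town, OH\n",
  "18900101\t18900101\tSomewhere, NY\n",
  "20010203\t20010204\tElsewhere, CA\n"
]
puts run_locally(sample)
# prints:
# year,count
# 1995,2
# 2001,1
```

Since map_years only needs something that yields lines, a local copy of the real file could be fed in via File.foreach; if that file contains bytes that aren’t valid UTF-8 you may need to open it as binary.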

Create a new job flow

This is done via the AWS Elastic MapReduce Command Line Interface (CLI).

Instructions on how to install and use it are in the AWS guide linked above, but the syntax is pretty self-explanatory:

ruby elastic-mapreduce --create --name "UFO sightings by year" --stream --mapper s3://<path to your bucket>/map_yyyy.rb --reducer s3://<path to your bucket>/reduce_yyyy.rb --input s3://<path to your bucket>/ufo_awesome.tsv --output s3://<path to your bucket>/output

The key points to note are the fact it’s a streaming Hadoop job, and the fact that the output location can be called whatever you want – just make sure it doesn’t already exist.

Running this creates a new Elastic MapReduce job on an EC2 instance (by default a small one), runs the job using the specified mapper and reducer, and stores the output in your S3 bucket.

You can track the progress of the job (and terminate if necessary) in the AWS Management Console on the Elastic MapReduce tab.

Download the data!

All that’s left to do is download the data from the location you specified and do any additional processing on it. You’ll notice there’s also a folder in your S3 bucket with job statistics which can be useful when tuning a job.
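The counts come back as one or more part files in the output folder. Here is a minimal sketch for combining them in Ruby before further processing. It assumes you have already downloaded the files and that they follow the reducer’s “year,count” format, with each part file starting with its own header; merge_year_counts is a hypothetical name.

```ruby
# Hypothetical helper: combine the contents of one or more reducer output
# files (part-00000, part-00001, ...) into a single year => count hash,
# sorted by year. Header rows ("year,count") and blank lines are skipped,
# and surrounding whitespace such as a trailing tab is ignored.
def merge_year_counts(part_texts)
  totals = Hash.new(0)
  part_texts.each do |text|
    text.each_line do |line|
      year, count = line.strip.split(',', 2)
      next if year.nil? || year == 'year'
      totals[year] += count.to_i
    end
  end
  totals.sort.to_h
end

# Usage with inline strings standing in for the downloaded part files.
parts = ["year,count\n1995,2\n2001,1\n", "year,count\n1999,5\n"]
merge_year_counts(parts).each { |year, count| puts "#{year}: #{count}" }
```

With real files, part_texts could be built with something like Dir.glob('output/part-*').sort.map { |f| File.read(f) }, where output is wherever you saved the S3 output folder. Summing rather than just concatenating keeps the result correct even if a year were ever split across files.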

Here’s a slightly more complex visualisation of the data, showing sightings by US state, year, and month.

Happy mapping (and reducing)!

Tagged in: data visualisation, hadoop, mapreduce, aws, infobright, ufo