Rails + Sitemap + Heroku + AWS

tl;dr Generate the sitemap files, push them to AWS and set up a route that redirects to those files from Rails.

While exploring google web master tools and inspecting some aspects of Insider AI SEO, I recognized a missing piece of the puzzle: sitemap! There are a few options out there for generating sitemaps for Rails, most of which generate a set of XML files and drop them in your public directory. This wont work for Insider AI as it has dynamic blog content that I want mapped so that it’s indexed by search engines. If you’ve worked much with Heroku, you know that it’s not a static file server. In fact, if you generate or attempt to store uploaded files on Heroku, they’ll get stomped out :(.

Goal: Generate dynamic sitemaps.
Problem: Heroku doesn’t play nice with generated static files.
Solution: Upload generated sitemaps to AWS.

The gem I landed on is called sitemap_generator. In the wiki on their github page there are some examples for getting up and running with Fog and CarrierWave.

These solutions were a bit heavy weight for me, so I ended up modifying this code. To eventually have a nice solution for generating sitemaps and uploading them to AWS.

Here’s everything you need to know:

1. Sign up for AWS
2. Create an IAM User (note the KEY_ID and ACCESS_KEY)
3. Create a bucket on S3 (note the bucket name as BUCKET)
4. Add a policy to the bucket to allow uploading (they have a policy generator, or you can use this overly promiscuous one)

{
	"Version": "2012-10-17",
	"Id": "Policy1",
	"Statement": [
		{
			"Sid": "Stmt1",
			"Effect": "Allow",
			"Principal": {
				"AWS": "*"
			},
			"Action": "s3:*",
			"Resource": "arn:aws:s3:::YOUR_AWS_BUCKET_NAME/*"
		}
	]
}

5. Add these gems to the Gemfile (I use figaro for key management)

# Gemfile
gem 'aws-sdk', '< 2.0'
gem 'figaro'
gem 'sitemap_generator'

7. Install figaro (creates config/application.yml and git ignores it, safety first!)

figaro install

8. Make the keys and bucket name available to the env. config/application.yml

AWS_ACCESS_KEY_ID: KEY_ID
AWS_SECRET_ACCESS_KEY: ACCESS_KEY
AWS_BUCKET: BUCKET

9. Create config/sitemap.rb to define what gets mapped

# config/sitemap.rb
SitemapGenerator::Sitemap.default_host = "https://insiderai.com"
SitemapGenerator::Sitemap.create_index = true
SitemapGenerator::Sitemap.public_path = 'public/sitemaps/'
SitemapGenerator::Sitemap.create do
  add '/welcome'
  add '/blog'
  add '/about'
  Post.find_each do |post|
    add post_path(post), lastmod: post.updated_at
  end
end

10. Create lib/tasks/sitemap.rake to define the rake task for refreshing the sitemap

require 'aws'
namespace :sitemap do
  desc 'Upload the sitemap files to S3'
  task upload_to_s3: :environment do
    s3 = AWS::S3.new(
      access_key_id: ENV['AWS_ACCESS_KEY_ID'],
      secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
    )
    bucket = s3.buckets[ENV['AWS_BUCKET']]
    Dir.entries(File.join(Rails.root, "public", "sitemaps")).each do |file_name|
      next if ['.', '..'].include? file_name
      path = "sitemaps/#{file_name}"
      file = File.join(Rails.root, "public", "sitemaps", file_name)

      begin
        object = bucket.objects[path]
        object.write(file: file)
      rescue Exception => e
        raise e
      end
      puts "Saved #{file_name} to S3"
    end
  end
end

11. Redirect requests for your sitemap to the files stored on AWS. (Needs improvement, but works)

# config/routes.rb
get "sitemap.xml.gz" => "sitemaps#sitemap", format: :xml, as: :sitemap

# app/controllers/sitemaps_controller.rb
class SitemapsController < ApplicationController
  def sitemap
    redirect_to "https://s3.amazonaws.com/#{ ENV['AWS_BUCKET'] }/sitemaps/sitemap.xml.gz"
  end
end

Hope this helps! Let me know if you get stuck somewhere and I’ll do my best to help you out 🙂

Advertisements
Rails + Sitemap + Heroku + AWS