Rails + Sitemap + Heroku + AWS

tl;dr Generate the sitemap files, push them to AWS and set up a route that redirects to those files from Rails.

While exploring Google Webmaster Tools and inspecting some aspects of Insider AI’s SEO, I noticed a missing piece of the puzzle: a sitemap! There are a few options out there for generating sitemaps for Rails, most of which generate a set of XML files and drop them in your public directory. This won’t work for Insider AI, as it has dynamic blog content that I want mapped so that it’s indexed by search engines, which means the sitemap has to be regenerated as posts change. If you’ve worked much with Heroku, you know that it’s not a static file server. In fact, if you generate or attempt to store uploaded files on Heroku, they’ll get stomped out the next time the dyno restarts, because its filesystem is ephemeral :(.

Goal: Generate dynamic sitemaps.
Problem: Heroku doesn’t play nice with generated static files.
Solution: Upload generated sitemaps to AWS.

The gem I landed on is called sitemap_generator. The wiki on their GitHub page has some examples for getting up and running with Fog and CarrierWave.

These solutions were a bit heavyweight for me, so I ended up modifying this code to arrive at a nice solution for generating sitemaps and uploading them to AWS.

Here’s everything you need to know:

1. Sign up for AWS
2. Create an IAM User (note the KEY_ID and ACCESS_KEY)
3. Create a bucket on S3 (note the bucket name as BUCKET)
4. Add a policy to the bucket to allow uploading (they have a policy generator, or you can use this overly promiscuous one)

{
	"Version": "2012-10-17",
	"Id": "Policy1",
	"Statement": [
		{
			"Sid": "Stmt1",
			"Effect": "Allow",
			"Principal": {
				"AWS": "*"
			},
			"Action": "s3:*",
			"Resource": "arn:aws:s3:::YOUR_AWS_BUCKET_NAME/*"
		}
	]
}
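
If the wide-open policy above makes you nervous, here is a minimal sketch of a narrower one, built in Ruby so it can be printed and pasted into the bucket policy editor. It assumes the only public access you need is reading the files under sitemaps/, and that uploads are authorized separately through the IAM user's own permissions (not shown). The function name and the Sid are made up for illustration.

```ruby
require 'json'

# Build a bucket policy that only lets the public read objects under sitemaps/.
# Assumption: uploads are authorized through the IAM user's own permissions,
# so the bucket policy itself does not need to allow writes.
def sitemap_read_policy(bucket_name)
  {
    'Version' => '2012-10-17',
    'Statement' => [
      {
        'Sid' => 'PublicReadSitemaps', # hypothetical statement id
        'Effect' => 'Allow',
        'Principal' => '*',
        'Action' => 's3:GetObject',
        'Resource' => "arn:aws:s3:::#{bucket_name}/sitemaps/*"
      }
    ]
  }
end

# Print the policy as JSON, ready to paste into the S3 console.
puts JSON.pretty_generate(sitemap_read_policy('YOUR_AWS_BUCKET_NAME'))
```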

5. Add these gems to the Gemfile (I use figaro for key management)

# Gemfile
gem 'aws-sdk', '< 2.0'
gem 'figaro'
gem 'sitemap_generator'

7. Install figaro (creates config/application.yml and git ignores it, safety first!)

figaro install

8. Make the keys and bucket name available to the env in config/application.yml

AWS_ACCESS_KEY_ID: KEY_ID
AWS_SECRET_ACCESS_KEY: ACCESS_KEY
AWS_BUCKET: BUCKET
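
A missing key tends to show up later as a confusing S3 error, so it can help to check for the three settings up front. This is only a sketch in plain Ruby: the constant and method names are my own, and where you call it from (an initializer, the top of the rake task) is up to you.

```ruby
# The three settings from config/application.yml that the upload relies on.
REQUIRED_AWS_KEYS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_BUCKET].freeze

# Return the names of any required settings that are unset or blank.
# env defaults to the process environment, but any Hash-like object works.
def missing_aws_keys(env = ENV)
  REQUIRED_AWS_KEYS.select { |key| env[key].to_s.strip.empty? }
end

missing = missing_aws_keys
warn "Missing AWS settings: #{missing.join(', ')}" unless missing.empty?
```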

9. Create config/sitemap.rb to define what gets mapped

# config/sitemap.rb
SitemapGenerator::Sitemap.default_host = "https://insiderai.com"
SitemapGenerator::Sitemap.create_index = true
SitemapGenerator::Sitemap.public_path = 'public/sitemaps/'
SitemapGenerator::Sitemap.create do
  add '/welcome'
  add '/blog'
  add '/about'
  Post.find_each do |post|
    add post_path(post), lastmod: post.updated_at
  end
end
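
To make it concrete what each add call contributes, here is a rough illustration of a single entry in the sitemap protocol's format. This is not the gem's code, just a plain-Ruby sketch with a made-up helper name; the gem writes more per entry than this (and gzips the result), so treat it as a picture of the format only.

```ruby
require 'time'

# Render one <url> element with a location and an optional last-modified time.
def sitemap_url_entry(host, path, lastmod = nil)
  # encode(xml: :text) escapes &, < and > so the URL is safe inside XML.
  lines = ['<url>', "  <loc>#{(host + path).encode(xml: :text)}</loc>"]
  lines << "  <lastmod>#{lastmod.utc.iso8601}</lastmod>" if lastmod
  lines << '</url>'
  lines.join("\n")
end

puts sitemap_url_entry('https://insiderai.com', '/blog', Time.utc(2015, 1, 1))
```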

10. Create lib/tasks/sitemap.rake to define the rake task for uploading the generated sitemap files to S3 (run it after rake sitemap:refresh, which the gem provides)

# lib/tasks/sitemap.rake
require 'aws'
namespace :sitemap do
  desc 'Upload the sitemap files to S3'
  task upload_to_s3: :environment do
    s3 = AWS::S3.new(
      access_key_id: ENV['AWS_ACCESS_KEY_ID'],
      secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
    )
    bucket = s3.buckets[ENV['AWS_BUCKET']]
    sitemaps_dir = File.join(Rails.root, "public", "sitemaps")
    Dir.entries(sitemaps_dir).each do |file_name|
      # Skip the current and parent directory entries
      next if ['.', '..'].include? file_name
      path = "sitemaps/#{file_name}"
      file = File.join(sitemaps_dir, file_name)

      # Any upload error propagates and aborts the task
      bucket.objects[path].write(file: file)
      puts "Saved #{file_name} to S3"
    end
  end
end
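
Before pointing the task at a real bucket, it can be handy to see which local files would end up under which S3 keys. The helper below is a hypothetical dry-run sketch in plain Ruby (no AWS calls); its name and the prefix argument are my own. Unlike the task above, it skips subdirectories and dotfiles.

```ruby
# Map each regular file in dir to the S3 key it would be stored under.
# Returns a Hash of { "sitemaps/<file name>" => "<local path>" }.
def sitemap_upload_plan(dir, prefix = 'sitemaps')
  Dir.glob(File.join(dir, '*'))
     .select { |path| File.file?(path) }
     .sort
     .map { |path| ["#{prefix}/#{File.basename(path)}", path] }
     .to_h
end

# Dry run: prints nothing if the directory is missing or empty.
sitemap_upload_plan('public/sitemaps').each do |key, path|
  puts "#{path} -> #{key}"
end
```

An empty plan is a hint that the sitemaps were never generated into public/sitemaps/.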

11. Redirect requests for your sitemap to the files stored on AWS. (Needs improvement, but works)

# config/routes.rb
get "sitemap.xml.gz" => "sitemaps#sitemap", format: :xml, as: :sitemap

# app/controllers/sitemaps_controller.rb
class SitemapsController < ApplicationController
  def sitemap
    redirect_to "https://s3.amazonaws.com/#{ ENV['AWS_BUCKET'] }/sitemaps/sitemap.xml.gz"
  end
end
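
One small improvement, sketched under the assumption that you keep the same URL layout as the controller above: build the S3 URL in a helper that fails loudly when the bucket name is missing, instead of redirecting to a broken link. The method name is hypothetical.

```ruby
# Build the public URL of the uploaded sitemap index.
# Raises if the bucket name is missing so a bad config is caught early.
def sitemap_s3_url(bucket = ENV['AWS_BUCKET'])
  raise ArgumentError, 'AWS_BUCKET is not set' if bucket.to_s.empty?

  "https://s3.amazonaws.com/#{bucket}/sitemaps/sitemap.xml.gz"
end

puts sitemap_s3_url('my-bucket')
```

In the controller, the action would then just be redirect_to sitemap_s3_url.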

Hope this helps! Let me know if you get stuck somewhere and I’ll do my best to help you out 🙂

9 thoughts on “Rails + Sitemap + Heroku + AWS”

  1. I followed all the steps, but after I create the sitemaps with rake sitemap:refresh and run rake sitemap:upload_to_s3, instead of uploading the sitemap, it uploads a blank file. It seems like the script cannot read the public directory. I’m using Rails 4. Do you have an idea how to make the script upload the sitemap to S3?

    1. Sorry that it’s taken so long to reply :). My guess is that either you need to create the `sitemaps` directory inside `public/` or no routes were added in the `sitemap.create` block. Feel free to tweet @w1zeman1p and I’d be happy to help in real time.

  2. cg says:

    Thank you very much for this post. It was really helpful. I do have one question. When I go to this link: https://s3.amazonaws.com/#{ ENV['MY_BUCKET_ON_S3'] }/sitemaps/sitemap.xml.gz, it immediately downloads the file. Is that the desired behavior? I am concerned: what if Google or Bing goes to that link and doesn’t see a file displayed on the page?

    1. Yep, that’s exactly what you want. Google and Bing servers will download that file and interpret it :). It depends on the browser, but generally they’ll just download .xml.gz files.

      1. cg says:

        Thank you so much for the reply. But now I am encountering an issue. I am at Google’s Search Console and tested my sitemap link: http://.com/sitemap.xml.gz and the test result is an error. The message is: “Your Sitemap appears to be an HTML page. Please use a supported sitemap format instead.” I can’t submit my sitemap as a result. 😦

        Your help is greatly appreciated!
