Rails + Sitemap + Heroku + AWS

tl;dr Generate the sitemap files, push them to AWS and set up a route that redirects to those files from Rails.

While exploring Google Webmaster Tools and inspecting some aspects of Insider AI’s SEO, I noticed a missing piece of the puzzle: a sitemap! There are a few options out there for generating sitemaps for Rails, most of which generate a set of XML files and drop them in your public directory. That won’t work for Insider AI, as it has dynamic blog content that I want mapped so that it’s indexed by search engines. If you’ve worked much with Heroku, you know that it’s not a static file server. In fact, if you generate or attempt to store uploaded files on Heroku, they’ll get stomped out :( because the dyno filesystem is ephemeral.

Goal: Generate dynamic sitemaps.

Problem: Heroku doesn’t play nice with generated static files.

Solution: Upload generated sitemaps to AWS.

The gem I landed on is called sitemap_generator. In the wiki on their GitHub page there are some examples for getting up and running with Fog and CarrierWave.

These solutions were a bit heavyweight for me, so I ended up modifying this code to arrive at a nice solution for generating sitemaps and uploading them to AWS.

Here’s everything you need to know:

  1. Sign up for AWS
  2. Create an IAM User (note the KEY_ID and ACCESS_KEY)
  3. Create a bucket on S3 (note the bucket name as BUCKET)
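  4. Add a policy to the bucket to allow uploading (they have a policy generator, or you can use this overly promiscuous one; it lets anyone perform any S3 action on the bucket’s objects, so tighten it for anything beyond a quick test)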
{
  "Version": "2012-10-17",
  "Id": "Policy1",
  "Statement": [
    {
      "Sid": "Stmt1",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::YOUR_AWS_BUCKET_NAME/*"
    }
  ]
}
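If that wide-open policy makes you nervous, here’s a rough sketch of a narrower one, built in Ruby so you can print it and paste it into the bucket policy editor. It assumes crawlers only need s3:GetObject and the uploader only needs s3:PutObject. The helper name sitemap_bucket_policy and the IAM user ARN are placeholders I made up, so swap in your own and check the result against your setup.

```ruby
require 'json'

# Builds a narrower bucket policy: anyone may read objects (crawlers need to
# fetch the sitemap), but only one IAM user may upload.
# NOTE: `sitemap_bucket_policy` and both arguments are hypothetical names.
def sitemap_bucket_policy(bucket:, uploader_arn:)
  resource = "arn:aws:s3:::#{bucket}/*"
  {
    'Version' => '2012-10-17',
    'Statement' => [
      {
        'Sid' => 'PublicRead',
        'Effect' => 'Allow',
        'Principal' => '*',
        'Action' => 's3:GetObject',
        'Resource' => resource
      },
      {
        'Sid' => 'UploaderWrite',
        'Effect' => 'Allow',
        'Principal' => { 'AWS' => uploader_arn },
        'Action' => 's3:PutObject',
        'Resource' => resource
      }
    ]
  }
end

# Placeholder bucket name and account/user; replace with your own values.
puts JSON.pretty_generate(
  sitemap_bucket_policy(
    bucket: 'YOUR_AWS_BUCKET_NAME',
    uploader_arn: 'arn:aws:iam::123456789012:user/sitemap-uploader'
  )
)
```

If your IAM user already has its own policy granting write access to the bucket, the second statement may be unnecessary.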
  5. Add these gems to the Gemfile (I use figaro for key management)
# Gemfile
gem 'aws-sdk', '< 2.0'
gem 'figaro'
gem 'sitemap_generator'
  6. Install figaro (creates config/application.yml and git ignores it, safety first!)
figaro install
  7. Make the keys and bucket name available to the env in config/application.yml
AWS_ACCESS_KEY_ID: KEY_ID
AWS_SECRET_ACCESS_KEY: ACCESS_KEY
AWS_BUCKET: BUCKET
  8. Create config/sitemap.rb to define what gets mapped
# config/sitemap.rb
SitemapGenerator::Sitemap.default_host = "https://cjavdev.netlify.app/"
SitemapGenerator::Sitemap.create_index = true
SitemapGenerator::Sitemap.public_path = 'public/sitemaps/'

SitemapGenerator::Sitemap.create do
  add '/welcome'
  add '/blog'
  add '/about'

  Post.find_each do |post|
    add post_path(post), lastmod: post.updated_at
  end
end
  9. Create lib/tasks/sitemap.rake to define the rake task for uploading the generated sitemap files (run the gem’s rake sitemap:refresh first to generate them)
# lib/tasks/sitemap.rake
require 'aws'

namespace :sitemap do
  desc 'Upload the sitemap files to S3'
  task upload_to_s3: :environment do
    s3 = AWS::S3.new(
      access_key_id: ENV['AWS_ACCESS_KEY_ID'],
      secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
    )
    bucket = s3.buckets[ENV['AWS_BUCKET']]
    sitemaps_dir = Rails.root.join('public', 'sitemaps').to_s

    Dir.entries(sitemaps_dir).each do |file_name|
      # Skip the current and parent directory entries
      next if ['.', '..'].include?(file_name)

      # Upload public/sitemaps/<file_name> to sitemaps/<file_name> in the bucket.
      # Any S3 error propagates and fails the task.
      file = File.join(sitemaps_dir, file_name)
      bucket.objects["sitemaps/#{file_name}"].write(file: file)
      puts "Saved #{file_name} to S3"
    end
  end
end
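One thing to watch in the task above: Dir.entries also returns subdirectories, and trying to upload one of those will blow up. Here’s a small sketch of a helper that lists only regular files and pairs each with the S3 key it would get. The name sitemap_upload_plan and the prefix argument are my own inventions for illustration, not part of sitemap_generator or the AWS SDK.

```ruby
require 'tmpdir'
require 'fileutils'

# Returns [local_path, s3_key] pairs for every regular file directly inside
# `dir`, skipping subdirectories.
# NOTE: `sitemap_upload_plan` and `prefix` are hypothetical names.
def sitemap_upload_plan(dir, prefix: 'sitemaps')
  Dir.glob(File.join(dir, '*')).sort.select { |path| File.file?(path) }.map do |path|
    [path, "#{prefix}/#{File.basename(path)}"]
  end
end

# Usage: try the helper against a throwaway directory.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'sitemap.xml.gz'))
  FileUtils.touch(File.join(dir, 'sitemap1.xml.gz'))
  Dir.mkdir(File.join(dir, 'old')) # ignored, because it is a directory

  sitemap_upload_plan(dir).each do |local, key|
    puts "#{File.basename(local)} -> #{key}"
  end
end
```

Inside the rake task you could loop over the returned pairs and call bucket.objects[key].write(file: local) for each one.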
  10. Redirect requests for your sitemap to the files stored on AWS. (Needs improvement, but works)
# config/routes.rb
get "sitemap.xml.gz" => "sitemaps#sitemap", format: :xml, as: :sitemap

# app/controllers/sitemaps_controller.rb
class SitemapsController < ApplicationController
  def sitemap
    redirect_to "https://s3.amazonaws.com/#{ENV['AWS_BUCKET']}/sitemaps/sitemap.xml.gz"
  end
end
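One possible improvement to the redirect: if AWS_BUCKET is ever unset, the interpolation quietly produces a broken URL. Below is a minimal sketch of a helper that fails loudly instead. sitemap_s3_url is a hypothetical name, and the env: argument exists only so the helper can be exercised without touching the real environment.

```ruby
# Builds the public S3 URL that the controller redirects to.
# NOTE: `sitemap_s3_url` is a hypothetical helper, not part of Rails or any gem.
# `env` defaults to ENV but accepts any Hash-like object.
def sitemap_s3_url(file_name = 'sitemap.xml.gz', env: ENV)
  bucket = env.fetch('AWS_BUCKET') # raises KeyError if the variable is unset
  "https://s3.amazonaws.com/#{bucket}/sitemaps/#{file_name}"
end

# Usage with a stand-in environment ('my-bucket' is a placeholder):
puts sitemap_s3_url(env: { 'AWS_BUCKET' => 'my-bucket' })
```

In the controller this would become redirect_to sitemap_s3_url, with the helper living wherever you keep shared code.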

Hope this helps! Let me know if you get stuck somewhere and I’ll do my best to help you out 🙂