Rails Server Side Analytics From Scratch
In this tutorial, we’ll learn how to create server-side analytics from scratch using Ruby on Rails. If at any point you wish to explore on your own, simply clone or fork the example repository on which this post is based.
Benefits and Risks
It’s important to understand the benefits and risks before implementing your own server-side analytics. Below are a few examples of each.
This tutorial will address each of the privacy-related risks listed below.
Benefits
- Better performance when compared to client-side tracking via JavaScript.
- Can’t be blocked by an ad blocker.
- You have complete control and ownership of your data, and are not giving it to a third party.
- The code can be abstracted into a gem, and reused across multiple applications.
Risks
- You are responsible for keeping your user’s data secure.
- How long should you keep your user’s data?
- What happens to a user’s data if they cancel their account?
- Should you be tracking a user without their consent?
Create visitor and track their events
Create visitor and event models
# db/migrate/[timestamp]_create_visitors.rb
class CreateVisitors < ActiveRecord::Migration[7.0]
  def change
    create_table :visitors do |t|
      t.string :user_agent
      t.timestamps
    end
  end
end
The user_agent column will store data about the user’s device and browser. This value will be returned from the request object.
# db/migrate/[timestamp]_create_events.rb
class CreateEvents < ActiveRecord::Migration[7.0]
  def change
    create_table :events do |t|
      t.string :path, null: false
      t.string :method, null: false
      t.string :params
      t.references :visitor, null: false, foreign_key: true
      t.timestamps
    end
  end
end
The path column will store the relative path that was requested, while the method column will store the type of request made. This will help us determine whether someone visited a page or submitted a form. Finally, the params column will store the value of any query_parameters or request_parameters. Again, all of these values will be returned from the request object.
# app/models/visitor.rb
class Visitor < ApplicationRecord
  has_many :events
end
# app/models/event.rb
class Event < ApplicationRecord
  serialize :params
  belongs_to :visitor
end
The value of params will be a Hash since we’re getting that value from request.query_parameters or request.request_parameters. Calling serialize will ensure the value of params is saved to the database as a serialized object, and also retrieved by deserializing into the same object.
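To make the round trip concrete, here is a rough plain-Ruby sketch of what serialization does. It assumes the YAML coder, which was the default for serialize at the time of Rails 7.0, and the params hash is a made-up example rather than real request data.

```ruby
require "yaml"

# Hypothetical parameters, shaped like the Hash a request might provide.
params = { "q" => "rails", "page" => "2" }

# Roughly what happens on save: the Hash is dumped to a YAML string,
# which fits in the string column created by the migration.
stored = YAML.dump(params)

# Roughly what happens on read: the string is loaded back into a Hash.
restored = YAML.safe_load(stored)

puts stored
puts restored == params # prints "true"
```

Because the column is a plain string, large parameter hashes could exceed the column limit on some databases; a text column may be a safer choice in that case.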
Create a new visitor record each time an anonymous user visits your site
# app/models/current.rb
class Current < ActiveSupport::CurrentAttributes
  attribute :visitor
end
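The Current model is non-database backed, and inherits from ActiveSupport::CurrentAttributes, which keeps all the per-request attributes easily available to the whole system. This will hold the current visitor for the duration of a request. Because these attributes are reset after each request, the visitor’s id will be kept in the session.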
# app/controllers/concerns/set_current_visitor.rb
module SetCurrentVisitor
  extend ActiveSupport::Concern

  included do
    before_action :set_current_visitor
  end

  private

  def set_current_visitor
    Current.visitor ||= Visitor.find_by(id: session[:visitor_id]) || create_current_visitor
  end

  def create_current_visitor
    visitor = Visitor.create!(
      user_agent: request.user_agent
    )
    session[:visitor_id] = visitor.id
    visitor
  end
end
The SetCurrentVisitor module is an ActiveSupport::Concern that stores the logic for setting the current visitor. If the session contains a value for visitor_id that matches an existing Visitor#id then that record will be returned. Otherwise, a new visitor record will be created and its id saved in the session.
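The find-or-create flow can be sketched without Rails. In this illustration, VisitorStore, VisitorRow, and current_visitor are hypothetical stand-ins for the visitors table, a visitor record, and the concern's logic, and a plain Hash plays the role of the session.

```ruby
# Hypothetical stand-in for a row in the visitors table.
VisitorRow = Struct.new(:id, :user_agent)

# Hypothetical in-memory stand-in for the visitors table.
class VisitorStore
  def initialize
    @rows = {}
  end

  def find(id)
    @rows[id]
  end

  def create(user_agent)
    row = VisitorRow.new(@rows.size + 1, user_agent)
    @rows[row.id] = row
  end

  def count
    @rows.size
  end
end

# Reuse the visitor referenced by the session, or create one and remember it.
def current_visitor(session, store, user_agent)
  existing = store.find(session[:visitor_id])
  return existing if existing

  visitor = store.create(user_agent)
  session[:visitor_id] = visitor.id
  visitor
end

store = VisitorStore.new
session = {}

first = current_visitor(session, store, "Mozilla/5.0")
second = current_visitor(session, store, "Mozilla/5.0")

puts first.equal?(second) # true: the second request reuses the record
puts store.count          # 1

# A fresh session (for example, after cookies are cleared) gets a new visitor.
third = current_visitor({}, store, "Mozilla/5.0")
puts third.id             # 2
```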
The user_agent is set to the value returned from request.user_agent, which is read from the request headers. This value is used to tell what type of device and browser the visitor is using. This information can also be used to detect bots, which could affect data.
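As a minimal sketch of the bot detection idea, the helper below checks a user agent string against a short pattern. Both bot_user_agent? and BOT_PATTERN are hypothetical names, and the pattern is an assumption for illustration, not an exhaustive list of crawlers.

```ruby
# Hypothetical pattern of substrings that commonly appear in crawler user agents.
BOT_PATTERN = /bot|crawler|spider|crawling/i

# Returns true when the user agent looks like a bot. Treating a missing
# user agent as a bot is an assumption made for this sketch.
def bot_user_agent?(user_agent)
  return true if user_agent.nil? || user_agent.strip.empty?

  user_agent.match?(BOT_PATTERN)
end

puts bot_user_agent?("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)") # true
puts bot_user_agent?("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15")          # false
```

A check like this could be used to skip creating visitor records for crawlers, or to exclude them later when querying.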
# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  include SetCurrentVisitor
end
Including the SetCurrentVisitor module in the ApplicationController ensures the visitor is tracked on all requests.
Track the visitor’s events
# app/controllers/concerns/track_event.rb
module TrackEvent
  extend ActiveSupport::Concern

  def track_event
    Current.visitor.events.create(
      path: request.path,
      method: request.method,
      params: event_params
    )
  end

  private

  def event_params
    request.query_parameters.presence || request.request_parameters.presence
  end
end
The TrackEvent module is an ActiveSupport::Concern that stores the logic for tracking a visitor’s events. The track_event method simply takes the path and method returned from the request and stores them on an event record that is associated with the current visitor. The private event_params method returns the value of query_parameters or request_parameters. Calling presence ensures the logical OR operator will work correctly: an empty Hash is truthy in Ruby, so presence converts it to nil, which lets the expression fall through to request_parameters.
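The difference is easy to see in plain Ruby. In this sketch, presence_of is a hypothetical stand-in for ActiveSupport's presence so that the example runs without Rails, and both hashes are made-up values.

```ruby
# Plain-Ruby stand-in for ActiveSupport's Object#presence, for illustration only:
# returns nil for empty values, and the value itself otherwise.
def presence_of(value)
  (value.respond_to?(:empty?) && value.empty?) ? nil : value
end

query_parameters = {} # e.g. a form submission with no query string
request_parameters = { "email" => "user@example.com" }

# An empty Hash is truthy, so the right-hand side is never reached.
without_presence = query_parameters || request_parameters

# Converting the empty Hash to nil lets the OR fall through.
with_presence = presence_of(query_parameters) || presence_of(request_parameters)

p without_presence # the empty query_parameters Hash
p with_presence    # the request_parameters Hash
```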
The way in which the TrackEvent module is included will affect when and how event records are tracked.
Including the TrackEvent module in a Controller allows you to call the track_event method on specific actions. This is the least comprehensive approach, but also the most performant.
# app/controllers/some_controller.rb
class SomeController < ApplicationController
  include TrackEvent

  def some_action
    track_event
  end
end
However, you could also use a filter to call the track_event method on multiple actions.
# app/controllers/some_controller.rb
class SomeController < ApplicationController
  include TrackEvent

  before_action :track_event
end
Finally, you could include the TrackEvent module in the ApplicationController and use a filter to call the track_event method on all controller actions. This would allow you to track every event that every visitor makes.
# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  include SetCurrentVisitor
  include TrackEvent

  before_action :track_event
end
This is the most comprehensive approach, but also the least performant. The risk with this approach is that an additional database call is made at the beginning of every request for every visitor. This could result in degraded performance across the entire application.
In all cases, the track_event method will return the newly created event object, but that return value is ignored. The goal is for the application to always execute each controller action even if the track_event method fails to create a new record.
Filter sensitive data from event records
Extra care needs to be taken since we’re saving the value of request_parameters into our events. Although Rails automatically filters sensitive data from the logs, it does not automatically filter sensitive data from being saved into the database.
This could be a problem if event records are created on sign up.
class UserController < ApplicationController
  include TrackEvent

  def create
    track_event
    @user = User.create(user_params)
  end

  private

  def user_params
    params.require(:user).permit(:email, :password)
  end
end
Event.last.params
# => { user: { email: "user@example.com", password: "MyS3ktretPassword!" } }
Since track_event is being called on the create action, the values from the sign-up form will be saved to the event because they will be present in the request_parameters. Even though Rails filters the password parameter from the logs by default, that filtering does not apply when the value is saved to the database.
Update TrackEvent module
--- a/app/controllers/concerns/track_event.rb
+++ b/app/controllers/concerns/track_event.rb
@@ -5,12 +5,20 @@ module TrackEvent
Current.visitor.events.create(
path: request.path,
method: request.method,
- params: event_params
+ params: filter_sensitive_data(event_params)
)
end
private
+ def filter_sensitive_data(params)
+ return if params.nil?
+
+ ActiveSupport::ParameterFilter.new(
+ Rails.application.config.filter_parameters
+ ).filter(params)
+ end
+
def event_params
request.query_parameters.presence || request.request_parameters.presence
end
The method responsible for setting the value of params before saving it to an event can be refactored to filter out sensitive values. All that needs to be done is to instantiate ActiveSupport::ParameterFilter with a list of filters. Fortunately, that list already exists in the form of Rails.application.config.filter_parameters. From there, calling filter against the incoming parameters will ensure the output is filtered.
Now the params will be filtered before being saved on an event record.
Event.last.params
# => { user: { email: "user@example.com", password: "[FILTERED]" } }
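To show the idea behind the filtering, here is a simplified plain-Ruby stand-in. It is not the implementation of ActiveSupport::ParameterFilter; filter_sensitive, FILTERS, and MASK are hypothetical names, and the sketch only assumes that filter names match keys partially and that nested hashes are filtered too.

```ruby
# Simplified stand-in for ActiveSupport::ParameterFilter, for illustration only.
FILTERS = [:passw, :secret, :token].freeze # hypothetical subset of filters
MASK = "[FILTERED]"

# Replaces the value of any key that contains one of the filter names.
# Keys are matched partially, so :passw also covers "password".
def filter_sensitive(params, filters = FILTERS)
  params.each_with_object({}) do |(key, value), filtered|
    filtered[key] =
      if filters.any? { |name| key.to_s.include?(name.to_s) }
        MASK
      elsif value.is_a?(Hash)
        filter_sensitive(value, filters) # recurse into nested parameters
      else
        value
      end
  end
end

params = { user: { email: "user@example.com", password: "MyS3ktretPassword!" } }
filtered = filter_sensitive(params)

puts filtered[:user][:email]    # user@example.com
puts filtered[:user][:password] # [FILTERED]
```

The original hash is left untouched, which matters if the controller still needs the real values to create the user.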
Rails ships with a set of defaults, but you can always modify this list by updating the filter_parameter_logging initializer.
--- a/config/initializers/filter_parameter_logging.rb
+++ b/config/initializers/filter_parameter_logging.rb
@@ -4,5 +4,5 @@
# sensitive information. See the ActiveSupport::ParameterFilter documentation for supported
# notations and behaviors.
Rails.application.config.filter_parameters += [
- :passw, :secret, :token, :_key, :crypt, :salt, :certificate, :otp, :ssn
+ :passw, :secret, :token, :_key, :crypt, :salt, :certificate, :otp, :ssn, :credit_
]
Create event records in the background
It’s possible to call track_event before every single request, which allows for every event a visitor makes to be tracked. This is the most comprehensive approach, but is also the least performant.
# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  include SetCurrentVisitor
  include TrackEvent

  before_action :track_event
end
Creating a new event record for each request is a big hit to performance, since an extra write will be made to the database every time a visitor visits a page or fills out a form. Fortunately, this can be solved by leveraging Active Job.
Create a job to create events in the background
This job simply wraps the logic needed to create an event record.
# app/jobs/create_event_job.rb
class CreateEventJob < ApplicationJob
  queue_as :default

  def perform(visitor:, path:, method:, params:)
    visitor.events.create!(
      path: path,
      method: method,
      params: params
    )
  end
end
Update the track_event method
Modifying the existing track_event method ensures all event records are created in the background.
Note that the Current.visitor is passed into the job, since the job is processed outside the request cycle.
--- a/app/controllers/concerns/track_event.rb
+++ b/app/controllers/concerns/track_event.rb
@@ -2,7 +2,8 @@ module TrackEvent
extend ActiveSupport::Concern
def track_event
- Current.visitor.events.create(
+ CreateEventJob.perform_later(
+ visitor: Current.visitor,
path: request.path,
method: request.method,
params: filter_sensitive_data(event_params)
Allow a visitor to enable their session to be tracked
A visitor should be given the opportunity to “opt-in” to being tracked in an effort to respect their right to privacy. This is even more important when tracking server-side events, since a visitor’s ad blocker will not work.
Create an endpoint to store the visitor’s privacy preference
One approach is to store the visitor’s preference in the session. This demo assumes an “opt-in” approach, meaning that the visitor will not be tracked until they explicitly enable tracking.
# config/routes.rb
Rails.application.routes.draw do
  post "enable_analytics", to: "analytics#enable", as: :enable_analytics
end
# app/controllers/analytics_controller.rb
class AnalyticsController < ApplicationController
  def enable
    session[:enable_analytics] = true
    redirect_to root_path, notice: "You have enabled your session to be tracked."
  end
end
--- a/app/views/layouts/application.html.erb
+++ b/app/views/layouts/application.html.erb
@@ -12,5 +12,8 @@
<body>
<%= yield %>
+ <% unless session[:enable_analytics] == true %>
+ <%= button_to "Enable Analytics", enable_analytics_path %>
+ <% end %>
</body>
</html>
Refactor the SetCurrentVisitor and TrackEvent modules to respect the visitor’s preference
As soon as a visitor enables tracking, the application will create a visitor record and start attaching event records to the visitor.
--- a/app/controllers/concerns/set_current_visitor.rb
+++ b/app/controllers/concerns/set_current_visitor.rb
@@ -2,7 +2,7 @@ module SetCurrentVisitor
extend ActiveSupport::Concern
included do
- before_action :set_current_visitor
+ before_action :set_current_visitor, if: :should_set_current_visitor?
end
private
@@ -20,4 +20,8 @@ module SetCurrentVisitor
def set_current_visitor
Current.visitor ||= Visitor.find_by(id: session[:visitor_id]) || create_current_visitor
end
+
+ def should_set_current_visitor?
+ session[:enable_analytics] == true
+ end
end
--- a/app/controllers/concerns/track_event.rb
+++ b/app/controllers/concerns/track_event.rb
@@ -2,12 +2,14 @@ module TrackEvent
extend ActiveSupport::Concern
def track_event
- CreateEventJob.perform_later(
- visitor: Current.visitor,
- path: request.path,
- method: request.method,
- params: filter_sensitive_data(event_params)
- )
+ if session[:enable_analytics] == true
+ CreateEventJob.perform_later(
+ visitor: Current.visitor,
+ path: request.path,
+ method: request.method,
+ params: filter_sensitive_data(event_params)
+ )
+ end
end
private
Create methods to return analytics
Tracking events is only valuable if the data can be queried. It’s common to want to know how long a visitor is spending on the site, as well as how many times a page has been viewed.
Query for time on site
Combining ActiveRecord::QueryMethods with aggregate functions and PostgreSQL date/time functions and operators results in a query that returns a two-dimensional array where each item returned is a visitors.id and the amount of time in seconds that visitor spent on the site.
This is calculated by finding the difference between the created_at values of the visitor's first and last events. Because there are no lower_bounds or upper_bounds columns on the visitors table, pluck is called, since it returns attribute values. This allows Arel.sql to be called within pluck in order to run the calculation.
--- a/app/models/visitor.rb
+++ b/app/models/visitor.rb
@@ -1,3 +1,21 @@
class Visitor < ApplicationRecord
has_many :events
+
+ def self.time_on_site
+ select("visitor_id, lower_bounds, upper_bounds")
+ .from(
+ Event
+ .select(
+ "visitor_id,
+ MIN(created_at) AS lower_bounds,
+ MAX(created_at) AS upper_bounds"
+ )
+ .group(:visitor_id)
+ )
+ .pluck(
+ "visitor_id",
+ Arel.sql("EXTRACT(EPOCH FROM (upper_bounds - lower_bounds))")
+ )
+ .sort
+ end
end
Visitor.time_on_site
# => [[1, 60.0], [2, 3660.0], [3, 0.0]]
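The same calculation can be sketched in plain Ruby, which may help when reasoning about what the SQL is doing. EventRow is a hypothetical stand-in for an event record and the timestamps are made-up sample data. Note that the final sort orders the pairs by visitor id, because Array#sort compares the pairs element by element.

```ruby
# Hypothetical stand-in for an event record.
EventRow = Struct.new(:visitor_id, :created_at)

start = Time.utc(2023, 1, 1, 12, 0, 0)
events = [
  EventRow.new(1, start),
  EventRow.new(1, start + 60),
  EventRow.new(2, start),
  EventRow.new(2, start + 3660),
  EventRow.new(3, start) # a single event, so no measurable time on site
]

# Group events per visitor, then subtract the earliest timestamp from the latest.
time_on_site = events
  .group_by(&:visitor_id)
  .map do |visitor_id, rows|
    timestamps = rows.map(&:created_at)
    [visitor_id, timestamps.max - timestamps.min]
  end
  .sort

p time_on_site # => [[1, 60.0], [2, 3660.0], [3, 0.0]]
```

A visitor with a single event always reports zero seconds, which is a known limitation of measuring time on site from request timestamps alone.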
Query for page views
Combining ActiveRecord::QueryMethods with ActiveRecord::Calculations results in a query that returns a hash where each item returned is the path visited along with how many times that page has been visited. Filtering the results where the method is GET ensures results are limited to page views and does not include form submissions.
Using distinct along with from and group creates a query that returns unique page views by ignoring multiple page visits from a visitor.
--- a/app/models/event.rb
+++ b/app/models/event.rb
@@ -1,4 +1,24 @@
class Event < ApplicationRecord
serialize :params
belongs_to :visitor
+
+ def self.page_views
+ select(:path)
+ .where(method: "GET")
+ .group(:path)
+ .count
+ end
+
+ def self.unique_page_views
+ select(:path)
+ .from(
+ Event
+ .select(:path, :visitor_id)
+ .distinct
+ .where(method: "GET")
+ .group(:path, :visitor_id)
+ )
+ .group(:path)
+ .count(:path)
+ end
end
Event.page_views
# => {"/" => 5, "/search" => 4}
Event.unique_page_views
# => {"/" => 3, "/search" => 1}
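For comparison, both counts can be reproduced in plain Ruby. PageEvent is a hypothetical stand-in for an event record, http_method stands in for the method column, and the sample data is made up to mirror the shape of the results.

```ruby
# Hypothetical stand-in for an event record.
PageEvent = Struct.new(:visitor_id, :path, :http_method)

events = [
  PageEvent.new(1, "/", "GET"),
  PageEvent.new(1, "/", "GET"),
  PageEvent.new(2, "/", "GET"),
  PageEvent.new(3, "/", "GET"),
  PageEvent.new(3, "/", "GET"),
  PageEvent.new(1, "/search", "GET"),
  PageEvent.new(1, "/search", "GET"),
  PageEvent.new(1, "/search", "GET"),
  PageEvent.new(1, "/search", "GET"),
  PageEvent.new(2, "/search", "POST") # a form submission, not a page view
]

page_requests = events.select { |event| event.http_method == "GET" }

# Every GET counts as a page view.
page_views = page_requests.map(&:path).tally

# Each (path, visitor) pair counts once, no matter how often it was requested.
unique_page_views = page_requests
  .map { |event| [event.path, event.visitor_id] }
  .uniq
  .map(&:first)
  .tally

puts page_views["/"]              # 5
puts page_views["/search"]        # 4
puts unique_page_views["/"]       # 3
puts unique_page_views["/search"] # 1
```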
Associate data with the current user
Right now, this data is completely anonymous, but it can be helpful to have it associated with an actual user to build a more accurate profile of each user. This comes with additional risks which will be addressed in subsequent sections.
Associate a visitor with a user
Because a user may never sign up or sign in while visiting the site, it’s necessary to keep this association optional. This also means removing null: false from the database migration.
# db/migrate/[timestamp]_add_user_id_to_visitors.rb
class AddUserIdToVisitors < ActiveRecord::Migration[7.0]
  def change
    add_reference :visitors, :user, foreign_key: true
  end
end
--- a/app/models/visitor.rb
+++ b/app/models/visitor.rb
@@ -1,4 +1,5 @@
class Visitor < ApplicationRecord
+ belongs_to :user, optional: true
has_many :events
def self.time_on_site
Associate the Current.visitor with the current_user
Because every application’s authentication system is different, the implementation will vary. All that matters is that the value of the user on the Current.visitor is set to that of the current_user. Because a new Current.visitor is created each time the session resets, this ensures that a user will be associated with a new visit record each time they sign in to the application.
This is important because it will keep each visit and its associated events segmented, which is necessary in order to correctly calculate how much time a user has spent on the site.
# app/controllers/sessions_controller.rb
class SessionsController < ApplicationController
  def create
    ...
    Current.visitor.presence && Current.visitor.update!(user: current_user)
  end
end
# app/controllers/users_controller.rb
class UsersController < ApplicationController
  def create
    ...
    Current.visitor.presence && Current.visitor.update!(user: current_user)
  end
end
Create methods to return user analytics
Now that a visit is being associated with a user, it will be helpful to create additional queries to return analytics on user accounts.
Query for time on site per visitor
Creating a separate time_on_site_for_visitor scope that returns the most recent and oldest created_at values per visitor allows that query to be chained with ActiveRecord::Calculations methods. This also allows the query to be cleanly reused in the total_time_on_site_for_visitor and average_time_on_site_for_visitor class methods.
Each class method has access to the virtual upper_bounds and lower_bounds columns passed in from the time_on_site_for_visitor scope. This makes it possible to use those values in Arel.sql.
--- a/app/models/visitor.rb
+++ b/app/models/visitor.rb
@@ -2,6 +2,20 @@ class Visitor < ApplicationRecord
belongs_to :user, optional: true
has_many :events
+ scope :time_on_site_for_visitor, ->(visitor) {
+ select("lower_bounds, upper_bounds")
+ .from(
+ Event
+ .select(
+ "visitor_id,
+ MIN(created_at) AS lower_bounds,
+ MAX(created_at) AS upper_bounds"
+ )
+ .where(visitor: visitor)
+ .group(:visitor_id)
+ )
+ }
+
def self.time_on_site
select("visitor_id, lower_bounds, upper_bounds")
.from(
@@ -19,4 +33,16 @@ class Visitor < ApplicationRecord
)
.sort
end
+
+ def self.total_time_on_site_for_visitor(visitor)
+ time_on_site_for_visitor(visitor).sum(
+ Arel.sql("EXTRACT(EPOCH FROM (upper_bounds - lower_bounds))")
+ )
+ end
+
+ def self.average_time_on_site_for_visitor(visitor)
+ time_on_site_for_visitor(visitor).average(
+ Arel.sql("EXTRACT(EPOCH FROM (upper_bounds - lower_bounds))")
+ )
+ end
end
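As a quick sanity check on what sum and average compute here, the sketch below does the same arithmetic in plain Ruby. The visits array is hypothetical sample data, where each pair holds the lower_bounds and upper_bounds of one visit.

```ruby
# Hypothetical visits for one user, each as [lower_bounds, upper_bounds].
start = Time.utc(2023, 1, 1, 9, 0, 0)
visits = [
  [start, start + 1800],                 # a 30 minute visit
  [start + 86_400, start + 86_400 + 600] # a 10 minute visit the next day
]

# Duration of each visit in seconds (upper_bounds - lower_bounds).
durations = visits.map { |lower_bounds, upper_bounds| upper_bounds - lower_bounds }

total = durations.sum
average = total / durations.size

puts total   # 2400.0
puts average # 1200.0
```

Keeping visits segmented is what makes these numbers meaningful: the day between the two visits is not counted as time on site.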
Query for time on site per user and associate a user with an event and a visit.
The queries created in the previous step can be easily reused in instance methods on the user.
# app/models/user.rb
class User < ApplicationRecord
  has_many :visits, class_name: "Visitor"
  has_many :events, through: :visits

  def time_on_site
    Visitor.total_time_on_site_for_visitor(visits)
  end

  def average_time_on_site
    Visitor.average_time_on_site_for_visitor(visits)
  end
end
User.first.time_on_site
# Time in seconds
# => 3600
User.first.average_time_on_site
# Time in seconds
# => 2730
Additionally, the use of has_many adds the ability to query for event and visit records on the user.
User.first.events
# The user's entire event history.
# => [#<Event>, #<Event>]
User.first.visits
# Each of the user's visits.
# => [#<Visitor>, #<Visitor>]
Provide multiple mechanisms for clearing user history
It’s important to provide multiple mechanisms for clearing user history in an effort to reduce risk and keep your user’s privacy in mind.
Update migrations to ensure associated data is deleted automatically
Updating the existing migrations to use a cascading foreign key ensures that when a user is deleted from the database, their associated visitor and event records will be automatically deleted too.
An alternative approach is to use dependent: :destroy in the associated models, but this is less performant because it will iterate through each record in order to trigger any callbacks or validations.
--- a/db/migrate/[timestamp]_create_events.rb
+++ b/db/migrate/[timestamp]_create_events.rb
@@ -4,7 +4,7 @@ class CreateEvents < ActiveRecord::Migration[7.0]
t.string :path, null: false
t.string :method, null: false
t.string :params
- t.references :visitor, null: false, foreign_key: true
+ t.references :visitor, null: false, foreign_key: {on_delete: :cascade}
t.timestamps
end
--- a/db/migrate/[timestamp]_add_user_id_to_visitors.rb
+++ b/db/migrate/[timestamp]_add_user_id_to_visitors.rb
@@ -1,5 +1,5 @@
class AddUserIdToVisitors < ActiveRecord::Migration[7.0]
def change
- add_reference :visitors, :user, foreign_key: true
+ add_reference :visitors, :user, foreign_key: {on_delete: :cascade}
end
end
Create an endpoint allowing a user to clear their history on-demand
It’s common to allow a user to be able to clear their history. Adding this endpoint ensures they have a way to do this on-demand.
--- a/config/routes.rb
+++ b/config/routes.rb
@@ -8,4 +8,5 @@ Rails.application.routes.draw do
post "sign_up", to: "pages#sign_up", as: :sign_up
post "sign_in", to: "pages#sign_in", as: :sign_in
post "enable_analytics", to: "analytics#enable", as: :enable_analytics
+ delete "clear_history", to: "analytics#clear_history", as: :clear_history
end
--- a/app/controllers/analytics_controller.rb
+++ b/app/controllers/analytics_controller.rb
@@ -4,4 +4,11 @@ class AnalyticsController < ApplicationController
redirect_to root_path, notice: "You have enabled your session to be tracked."
end
+
+ def clear_history
+ current_user.visits.destroy_all
+
+ redirect_to root_path, notice: "History deleted."
+ end
end
The updates made to the foreign_key option in the existing migrations make it so that calling destroy_all will also delete the user’s associated event records.
Create a mechanism to delete history older than a certain date
Adding a mechanism to delete user history that is older than a certain date should be considered in order to reduce risk (and database space).
--- a/app/models/visitor.rb
+++ b/app/models/visitor.rb
@@ -45,4 +45,8 @@ class Visitor < ApplicationRecord
Arel.sql("EXTRACT(EPOCH FROM (upper_bounds - lower_bounds))")
)
end
+
+ def self.delete_all_older_than(timestamp)
+ destroy_by("created_at < ?", timestamp)
+ end
end
Visitor.delete_all_older_than(6.months.ago)
Using destroy_by makes this easy. Consider scheduling a recurring job to call this method.
Wrapping up
Creating a transparent “opt-in” tracking approach in combination with allowing users to delete their history helps foster trust between you and your user base. This trust is just as valuable as any metrics you’ll capture.