Talks
Assuming you have ruby and Bundler going, you’ll need a Gemfile to include db_subsetter. We’re working against MySQL for this example, so we’ll load that gem too.
#Gemfile
source "https://rubygems.org"
gem "db_subsetter"
gem "mysql2"
This is a minimal export script to export a subset and see what we get.
#!/usr/bin/env ruby
require 'db_subsetter'
ActiveRecord::Base.establish_connection(
adapter: "mysql2",
host: "127.0.0.1",
username: "joe",
database: "big_db"
)
exporter = DbSubsetter::Exporter.new
exporter.export("demo-export.sqlite3")
Upon running, db_subsetter will verify exportability on all tables and let us know what problems it finds.
$ bundle exec ruby ./export.rb
Verifying table exportability ...
Verifying: blog_articles
Verifying: campaign_tracker_campaigns
Verifying: campaign_tracker_page_views
Verifying: campaign_tracker_visitors
Verifying: charges
Verifying: components
Verifying: customer_types
Verifying: customers
Verifying: departments
Verifying: features
Verifying: invoice_statuses
Verifying: invoices
Verifying: number_maps
Verifying: pages
Verifying: permissions
Verifying: picture_mappings
Verifying: pictures
Verifying: products
Verifying: schema_migrations
Verifying: sessions
Verifying: states
Verifying: videos
ERROR: Too many rows in: campaign_tracker_page_views (3825837)
ERROR: Too many rows in: campaign_tracker_visitors (1233616)
ERROR: Too many rows in: charges (141628)
ERROR: Too many rows in: customers (59351)
ERROR: Too many rows in: invoices (103001)
<snip>/exporter.rb:35:in `verify_exportability': Some tables are not exportable (ArgumentError)
from <snip>/lib/db_subsetter/exporter.rb:42:in `export'
from ./export.rb:18:in `<main>'
So let’s start a quick subset filter to get this under control. First things first, let’s just ignore the campaign tracking tables completely, we don’t need that for development.
# demo_subset_filter.rb
class DemoSubsetFilter < DbSubsetter::Filter
def ignore_tables
%w( campaign_tracker_page_views campaign_tracker_visitors )
end
end
# export.rb
#!/usr/bin/env ruby
require 'db_subsetter'
require './demo_subset_filter'
ActiveRecord::Base.establish_connection(
adapter: "mysql2",
host: "127.0.0.1",
username: "joe",
database: "big_db"
)
exporter = DbSubsetter::Exporter.new
exporter.filter = DemoSubsetFilter.new # <- Configure our new filter
exporter.export("demo-export.sqlite3")
Let’s run that and see what we get now:
$ bundle exec ruby ./export.rb
Verifying table exportability ...
Verifying: blog_articles
Verifying: campaign_tracker_campaigns
Verifying: charges
Verifying: components
Verifying: customer_types
Verifying: customers
Verifying: departments
Verifying: features
Verifying: invoice_statuses
Verifying: invoices
Verifying: number_maps
Verifying: pages
Verifying: permissions
Verifying: picture_mappings
Verifying: pictures
Verifying: products
Verifying: schema_migrations
Verifying: sessions
Verifying: states
Verifying: videos
ERROR: Too many rows in: charges (141628)
ERROR: Too many rows in: customers (59351)
ERROR: Too many rows in: invoices (103001)
<snip>/exporter.rb:35:in `verify_exportability': Some tables are not exportable (ArgumentError)
from <snip>/lib/db_subsetter/exporter.rb:42:in `export'
from ./export.rb:14:in `<main>'
Great! The two giant tables don’t show up at all anymore. Now let’s get a handle on the big tables.
# demo_subset_filter.rb
class Customer < ActiveRecord::Base; end
class DemoSubsetFilter < DbSubsetter::Filter
def customer_ids
Customer.order(:id => :desc).limit(1000).pluck(:id)
end
def filter_customers(query)
query.where(query[:id].in(customer_ids))
end
end
Which runs and removes the error on customers.
$ bundle exec ruby ./export.rb
Verifying table exportability ...
Verifying: blog_articles
Verifying: campaign_tracker_campaigns
Verifying: charges
Verifying: components
Verifying: customer_types
Verifying: customers
Verifying: departments
Verifying: features
Verifying: invoice_statuses
Verifying: invoices
Verifying: number_maps
Verifying: pages
Verifying: permissions
Verifying: picture_mappings
Verifying: pictures
Verifying: products
Verifying: schema_migrations
Verifying: sessions
Verifying: states
Verifying: videos
ERROR: Too many rows in: charges (141628)
ERROR: Too many rows in: invoices (103001)
<snip>/exporter.rb:35:in `verify_exportability': Some tables are not exportable (ArgumentError)
from <snip>/lib/db_subsetter/exporter.rb:42:in `export'
from ./export.rb:14:in `<main>'
Excactly what we expect. So now let’s do the other two tables:
# demo_subset_filter.rb
class Customer < ActiveRecord::Base; end
class Invoice < ActiveRecord::Bas; end
class Charge < ActiveRecord::Base; end
class DemoSubsetFilter < DbSubsetter::Filter
def customer_ids
Customer.order(:id => :desc).limit(1000).pluck(:id)
end
def invoice_ids
Invoice.where(:customer_id => customer_ids).pluck(:id)
end
def charge_ids
Charge.where(:invoice_id => invoice_ids).pluck(:id)
end
def filter_customers(query)
query.where(query[:id].in(customer_ids))
end
def filter_invoices(query)
query.where(query[:id].in(invoice_ids))
end
def filter_charges(query)
query.where(query[:id].in(charge_ids))
end
def ignore_tables
%w( campaign_tracker_page_views campaign_tracker_visitors )
end
end
And run that:
$ bundle exec ruby ./export.rb
Verifying table exportability ...
Verifying: blog_articles
Verifying: campaign_tracker_campaigns
Verifying: charges
Verifying: components
Verifying: customer_types
Verifying: customers
Verifying: departments
Verifying: features
Verifying: invoice_statuses
Verifying: invoices
Verifying: number_maps
Verifying: pages
Verifying: permissions
Verifying: picture_mappings
Verifying: pictures
Verifying: products
Verifying: schema_migrations
Verifying: sessions
Verifying: states
Verifying: videos
Exporting data...
Exporting: blog_articles (1 pages).
Exporting: campaign_tracker_campaigns (1 pages).
Exporting: charges (1 pages).
Exporting: components (1 pages).
Exporting: customer_types (1 pages).
Exporting: customers (1 pages).
Exporting: departments (1 pages).
Exporting: features (1 pages).
Exporting: invoice_statuses (1 pages).
Exporting: invoices (1 pages).
Exporting: number_maps (0 pages)
Exporting: pages (1 pages).
Exporting: permissions (1 pages).
Exporting: picture_mappings (1 pages).
Exporting: pictures (1 pages).
Exporting: products (1 pages).
Exporting: schema_migrations (1 pages).
Exporting: sessions (0 pages)
Exporting: states (1 pages).
Exporting: videos (1 pages).
Success! Now to show off some more functionality, let’s say we want to only export active products and their associated charges.
class Customer < ActiveRecord::Base; end
class Invoice < ActiveRecord::Base; end
class Charge < ActiveRecord::Base; end
class Product < ActiveRecord::Base; end
class DemoSubsetFilter < DbSubsetter::Filter
def customer_ids
Customer.order(:id => :desc).limit(1000).pluck(:id)
end
def invoice_ids
Invoice.where(:customer_id => customer_ids).pluck(:id)
end
def charge_ids
Charge.where(:invoice_id => invoice_ids).pluck(:id)
end
def product_ids
Product.where(:discontinued => false).limit(500).pluck(:id)
end
def filter_customers(query)
query.where(query[:id].in(customer_ids))
end
def filter_invoices(query)
query.where(query[:id].in(invoice_ids))
end
def filter_charges(query)
query.where(query[:id].in(charge_ids)).where(query[:product_id].in(product_ids))
end
def ignore_tables
%w( campaign_tracker_page_views campaign_tracker_visitors )
end
end
Sometimes you’re going to want more than the default limit of rows. We can override that simply. I would like to make this more granular so you can set it really high on a few tables but leave it low elsewhere to keep data sizes from creeping up.
#!/usr/bin/env ruby
require 'db_subsetter'
require './demo_subset_filter5'
ActiveRecord::Base.establish_connection(
adapter: "mysql2",
host: "127.0.0.1",
username: "joe",
database: "big_db"
)
exporter = DbSubsetter::Exporter.new
exporter.filter = DemoSubsetFilter.new
exporter.max_filtered_rows = 5000
exporter.export("demo-export.sqlite3")
Now it’d be awful handy to reset everybody’s password at export time and assign them a random name to protect their privacy.
This API is weak - I’m still not sure what it should look like. Expect this to change.
# demo_subset_scrambler.rb
class DemoSubsetScrambler < DbSubsetter::Scrambler
def scramble_customers(row)
scramble_column(:customers, :password, row, 'hashthisfirst')
row
end
end
# export.rb
#!/usr/bin/env ruby
require 'db_subsetter'
require './demo_subset_filter6'
require './demo_subset_scrambler6'
ActiveRecord::Base.establish_connection(
adapter: "mysql2",
host: "127.0.0.1",
username: "joe",
database: "big_db"
)
exporter = DbSubsetter::Exporter.new
exporter.filter = DemoSubsetFilter.new
exporter.add_scrambler DemoSubsetScrambler.new # <- tell our exporter
exporter.max_filtered_rows = 5000
exporter.export("demo-export.sqlite3")
We also realize it’s horrible practice to have customer PII on developer laptops, so let’s give everybody a random name:
# demo_subset_scrambler.rb
class DemoSubsetScrambler < DbSubsetter::Scrambler
def scramble_customers(row)
new_name = RandomWord.adjs.next.capitalize + " " + RandomWord.nouns.next.capitalize
scramble_column(:customers, :name, row, new_name )
scramble_column(:customers, :password, row, 'hashthisfirst')
row
end
end
To import we need to setup an importer just like our exporter.
#!/usr/bin/env ruby
require 'db_subsetter'
ActiveRecord::Base.establish_connection(
adapter: "mysql2",
host: "127.0.0.1",
username: "joe",
database: "little_db"
)
importer = DbSubsetter::Importer.new("demo-export.sqlite3", DbSubsetter::Dialect::MySQL)
importer.import
We run that and check to check on the data.
mysql> select name, password from customers limit 10;
+----------------------------+---------------+
| name | password |
+----------------------------+---------------+
| Anoperineal Charadrii | hashthisfirst |
| Branchless Lace_making | hashthisfirst |
| Viscid Pitressin | hashthisfirst |
| Daughterly Ice_rink | hashthisfirst |
| Elect Whey | hashthisfirst |
| Saccadic Showpiece | hashthisfirst |
| Faux Blinks | hashthisfirst |
| Investigative Instilment | hashthisfirst |
| Haywire Adansonia_gregorii | hashthisfirst |
| Grumous Pushup | hashthisfirst |
+----------------------------+---------------+
10 rows in set (0.00 sec)
Bonus feature. We get a report of a production user with problems we can’t easily replicate. Let’s grab that user as part of the subset.
# demo_subset_filter.rb
def customer_ids
Customer.order(:id => :desc).limit(1000).pluck(:id) +
Customer.where(:email => 'problem@user.com').pluck(:id)
end