Pivotal Labs

New York Standup 9/30/2008

Posted by Jim Kingdon on Tuesday September 30, 2008 at 07:02PM

  • What's the best way to import a million records into a Postgres database via ActiveRecord (which is needed to implement some application-specific logic)? We anticipate waiting a second (or so) between inserts to avoid slowing down the production database (which is under load, almost entirely reads). If there is any ActiveRecord feature that helps batch inserts together, no one knew about it. As for how long this will take (estimates range from 9 to 27 hours) and what the load on the production database will be, we planned to answer that with a trial run on a small number of these records.

  • We're thinking of having Capistrano deploy to two demo servers, one aimed at showing the application to prospective users, and the other mostly for story acceptance. The former would be hosted at a hosting company; the latter would be on an internally run machine. Several people reported having done this on their projects, and the problems were minor, mostly having to do with whether the deployed location (/u/apps/whatever or some such) is different on the two machines (the solution would be to use the Capistrano variables, but tracking down all the places that need to do that could be an issue).

  • Erector tip of the day: in a Rails project, you can put a file (named edit.rb or edit.html.rb) in your view directory, and Rails/Erector will find the template implicitly (as it would for ERB, HAML, etc). It is not necessary to explicitly call render from your controller method.
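
For the bulk-import question at the top of this list, the throttled loop might look something like the sketch below. It is only an illustration, not what the team ran: import_in_batches, batch_size, pause, and sleeper are all made-up names, the default values are guesses, and the block is where the ActiveRecord call (with its application-specific logic) would go. Setting batch_size to 1 gives a pause between every insert.

```ruby
# Hypothetical helper: walk the records in small batches and pause between
# batches, so a busy production database gets room to keep serving reads.
# batch_size and pause are illustrative defaults, not measured values.
def import_in_batches(records, batch_size: 100, pause: 1.0,
                      sleeper: Kernel.method(:sleep))
  imported = 0
  records.each_slice(batch_size).with_index do |batch, index|
    sleeper.call(pause) if index > 0  # wait before every batch but the first
    batch.each do |record|
      yield record                    # e.g. create the ActiveRecord model here
      imported += 1
    end
  end
  imported
end

# Trial run on a handful of records, with no pause, to check the plumbing.
trial = []
import_in_batches(%w[a b c d e], batch_size: 2, pause: 0) { |r| trial << r }
```

Timing a trial run like this on a few hundred real records, with the real pause, is one way to get at the duration and load numbers before committing to the full import.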

Standup 08/29/2008

Posted by Jim Kingdon on Friday August 29, 2008 at 04:52PM

  • Using multiple buckets for Amazon S3. One of our sites has a lot of images (perhaps 30+ photos per page, different for each page and user) and got significant benefits from using four buckets instead of one. Each bucket is served from its own hostname, so multiple buckets allow browsers to fetch several images in parallel. Going beyond four probably wouldn't help, as browsers have a limit on how many parallel requests they will send.

  • Amazon S3 now has a copy command. This could be useful, for example, if you have a lot of data in a single bucket and want to move it to multiple buckets. Copy is faster than downloading and re-uploading all that data. The Ruby S3 gem, however, only lets you copy within a single bucket, so to copy between buckets you'll need to bypass the gem.

  • We wrote a script to dump a local SQL database and copy it up to a remote server (for example, a demo or production server). This is in contrast with a script we wrote some months ago which copies from demo to a local workstation (for test data, reproducing data-driven bugs, etc). The push-to-remote script was for a situation in which there was a bunch of data to be generated (based on some XML input files) and we could afford to bog down a workstation for half an hour, but not an overloaded (and perhaps underpowered) server.

  • Deprec is a set of Capistrano recipes for setting up a remote server (in conjunction with deploying an application), for example creating accounts, ssh keys, init scripts, logrotate, etc.

  • Capistrano 2.3 has odd sudo issues (something to do with deleting old releases). We recommend Capistrano 2.5.
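
For the multiple-bucket item at the top of this list, one way to spread images across buckets is to hash the image key, so a given image always lands in the same bucket and keeps the same URL. This is a sketch under assumptions: the notes don't say how the site assigned images to buckets, and the bucket names, bucket_for, and image_url are made up for illustration.

```ruby
require "zlib"

# Hypothetical bucket names; the notes say only that four buckets were used.
BUCKETS = %w[photos-0 photos-1 photos-2 photos-3].freeze

# Pick a bucket deterministically from the image key. The same key always
# maps to the same bucket, so its URL stays stable across page loads.
def bucket_for(key, buckets = BUCKETS)
  buckets[Zlib.crc32(key) % buckets.size]
end

# Build a URL in which each bucket appears as its own hostname.
def image_url(key, buckets = BUCKETS)
  "http://#{bucket_for(key, buckets)}.s3.amazonaws.com/#{key}"
end
```

Zlib.crc32 is used rather than String#hash because String#hash can differ from one Ruby process to the next, which would make the URLs unstable.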

Collapsing Migrations

Posted by Alex Chaffee on Wednesday December 12, 2007 at 09:19PM

(6:30 pm: updated to use mysqldump) (12/14/07: updated to remove db:reset since the Rails 2.0 version now does something different.) (12/15/07: updated to not set ENV['RAILS_ENV'] since that gets passed down to child processes)

There was an old hacker who lived in a shoe; she had so many migrations she didn't know what to do. Every time her build ran clean, she spent a whole minute staring at the screen.

Fortunately, she read this blog post and now her db:setup task is so fast she's started building multiple test environments so she can run tests in parallel!