How to organize large R programs?

Dan Goldstein · Aug 12, 2009 · Viewed 27.6k times

When I undertake an R project of any complexity, my scripts quickly get long and confusing.

What are some practices I can adopt so that my code will always be a pleasure to work with? I'm thinking about things like

  • Placement of functions in source files
  • When to break something out to another source file
  • What should be in the master file
  • Using functions as organizational units (whether this is worthwhile given that R makes it hard to access global state)
  • Indentation / line break practices.
    • Treat ( like {?
    • Put things like )} on 1 or 2 lines?

Basically, what are your rules of thumb for organizing large R scripts?

Answer

Dirk Eddelbuettel · Aug 12, 2009

The standard answer is to use packages -- see the Writing R Extensions manual as well as the various tutorials on the web.

Packages give you

  • a quasi-automatic way to organize your code by topic
  • a strong incentive to write help files, which makes you think about the interface
  • a lot of sanity checks via R CMD check
  • a chance to add regression tests
  • a means of using namespaces.
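
As a rough illustration of how little it takes to get started, here is a minimal sketch that turns two functions into a package skeleton using utils::package.skeleton(). The function names, the package name scoretools, and the temporary directory are all hypothetical and chosen only for this example.

```r
# Hypothetical helper functions standing in for real project code.
clean_scores <- function(x) x[!is.na(x)]
summarise_scores <- function(x) c(mean = mean(x), sd = sd(x))

# Throwaway directory so the sketch does not touch the working directory.
pkg_root <- tempfile("pkgdemo")
dir.create(pkg_root)

# package.skeleton() writes DESCRIPTION and NAMESPACE files plus
# R/ and man/ directories with stubs for the listed objects.
utils::package.skeleton(
  name = "scoretools",                       # hypothetical package name
  list = c("clean_scores", "summarise_scores"),
  environment = environment(),               # where the functions are defined
  path = pkg_root
)

pkg_dir <- file.path(pkg_root, "scoretools")
list.files(pkg_dir)
```

The generated files are only stubs: the DESCRIPTION and the help files in man/ still need to be filled in by hand before R CMD check will be satisfied.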

Just running source() over code works only for really short snippets. Everything else should be in a package, even if you do not plan to publish it: you can write internal packages for internal repositories.
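
To make the regression-test point concrete, here is a small sketch of the kind of script that could live in a package's tests/ directory, where R CMD check runs it. The function clean_scores() and the file name in the comment are hypothetical; only base R is assumed.

```r
# Hypothetical function under test; in a real package it would live in R/.
clean_scores <- function(x) x[!is.na(x)]

# Checks of the kind that could sit in tests/test-clean_scores.R.
# stopifnot() raises an error on the first failing condition,
# which is enough for R CMD check to flag the test script.
stopifnot(
  identical(clean_scores(c(1, NA, 3)), c(1, 3)),
  length(clean_scores(c(NA, NA))) == 0,
  identical(clean_scores(numeric(0)), numeric(0))
)
```

Plain stopifnot() keeps the sketch dependency-free; a dedicated testing package would give more readable failure messages, at the cost of an extra dependency.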

As for the 'how to edit' part, the R Internals manual has excellent R coding standards in Section 6. Otherwise, I tend to use defaults in Emacs' ESS mode.

Update 2009-Aug-13: David Smith just blogged about the Google R Style Guide.