Haskell type vs. newtype with respect to type safety

StevenC · Jun 13, 2009

I know newtype is more often compared to data in Haskell, but I'm posing this comparison from more of a design point-of-view than as a technical problem.

In imperative/OO languages, there is the anti-pattern "primitive obsession", where the prolific use of primitive types reduces the type safety of a program and makes same-typed values intended for different purposes accidentally interchangeable. For example, many things can be a String, but it would be nice if a compiler could know, statically, which we mean to be a name and which we mean to be the city in an address.

So how often do Haskell programmers employ newtype to give type distinctions to otherwise primitive values? The use of type introduces an alias and makes a program's semantics clearer to the reader, but it doesn't prevent accidental interchanges of values. As I learn Haskell, I notice that the type system is as powerful as any I have come across. Therefore, I would think this is a natural and common practice, but I haven't seen much, if any, discussion of the use of newtype in this light.
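To make the distinction concrete, here is a minimal sketch of what I mean. The names Name, City, PersonName and CityName are made up purely for illustration.

```haskell
-- A type alias is only a synonym: Name and City are both just String.
type Name = String
type City = String

greetAlias :: Name -> String
greetAlias n = "Hello, " ++ n

-- This compiles even though a City is passed where a Name is expected.
mixedUp :: String
mixedUp = greetAlias ("Paris" :: City)

-- A newtype introduces a distinct type wrapping the same underlying value.
newtype PersonName = PersonName String deriving (Show, Eq)
newtype CityName   = CityName String   deriving (Show, Eq)

greet :: PersonName -> String
greet (PersonName n) = "Hello, " ++ n

wellTyped :: String
wellTyped = greet (PersonName "Alice")

-- Rejected by the type checker, because CityName is not PersonName:
-- bad = greet (CityName "Paris")
```

With the alias version the mix-up goes unnoticed; with the newtype version it becomes a compile error.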

Of course a lot of programmers do things differently, but is this at all common in Haskell?

Answer

Christopher Done · Oct 10, 2010

The main uses for newtypes are:

  1. For defining alternative instances for types.
  2. Documentation.
  3. Data/format correctness assurance.
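As a rough illustration of the first point, a newtype can carry an instance that differs from the one the underlying type already has. The Descending wrapper and sortDescending helper below are hypothetical names, not part of my application; base's Data.Ord.Down plays a similar role.

```haskell
import Data.List (sort)

-- Hypothetical wrapper: an Int that sorts in descending order.
newtype Descending = Descending { unDescending :: Int }
  deriving (Show, Eq)

-- The wrapper gets its own Ord instance, reversing Int's usual ordering.
instance Ord Descending where
  compare (Descending a) (Descending b) = compare b a

-- Wrap, sort with the alternative instance, then unwrap again.
sortDescending :: [Int] -> [Int]
sortDescending = map unDescending . sort . map Descending
```

Int keeps its normal Ord instance everywhere else; only values wrapped in Descending use the reversed one.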

I'm working on an application right now in which I use newtypes extensively. newtypes in Haskell are a purely compile-time concept. E.g. with the unwrappers below, unFilename (Filename "x") compiles to the same code as "x". There is absolutely zero run-time hit, whereas a data type does carry a run-time cost. This makes newtypes a very nice way to achieve the goals listed above.

-- | A file name (not a file path).
newtype Filename = Filename { unFilename :: String }
    deriving (Show,Eq)

I don't want to accidentally treat this as a file path. It's not a file path. It's the name of a conceptual file somewhere in the database.
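On the zero run-time cost mentioned earlier: one way to see it is Data.Coerce from base, which converts between a newtype and its underlying type, even inside a list, without any per-element work. This is only a sketch; namesToStrings and stringsToNames are names invented for the example.

```haskell
import Data.Coerce (coerce)

-- | A file name (not a file path).
newtype Filename = Filename { unFilename :: String }
  deriving (Show, Eq)

-- Unwrap a whole list at once; coerce does not traverse the list.
namesToStrings :: [Filename] -> [String]
namesToStrings = coerce

-- Wrap a whole list at once, again without touching the elements.
stringsToNames :: [String] -> [Filename]
stringsToNames = coerce
```

The explicit type signatures matter here: they are what tells coerce which two representationally equal types to convert between.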

It's very important for algorithms to refer to the right thing, and newtypes help with this. It's also very important for security. For example, consider the upload of files to a web application. I have these types:

-- | A sanitized (safe) filename.
newtype SanitizedFilename = 
  SanitizedFilename { unSafe :: String } deriving Show

-- | Unique, sanitized filename.
newtype UniqueFilename =
  UniqueFilename { unUnique :: SanitizedFilename } deriving Show

-- | An uploaded file.
data File = File {
   file_name     :: String         -- ^ Uploaded file name.
  ,file_location :: UniqueFilename -- ^ Saved location.
  ,file_type     :: String         -- ^ File type.
  } deriving (Show)

Suppose I have this function which cleans a filename from a file that's been uploaded:

import Data.Char (isDigit, isLetter)

-- | Sanitize a filename for saving to upload directory.
sanitizeFilename :: String            -- ^ Arbitrary filename.
                 -> SanitizedFilename -- ^ Sanitized filename.
sanitizeFilename = SanitizedFilename . filter ok where
  -- Keep only digits, letters, and the characters '-', '_' and '.'.
  ok c = isDigit c || isLetter c || elem c "-_."

Now from that I generate a unique filename:

-- | Generate a unique filename.
uniqueFilename :: SanitizedFilename -- ^ Sanitized filename.
               -> IO UniqueFilename -- ^ Unique filename.

It's dangerous to generate a unique filename from an arbitrary filename; it should be sanitized first. Because a UniqueFilename can only be built from a SanitizedFilename, a unique filename is always safe by extension. I can save the file to disk now and put that filename in my database if I want to.
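Only the signature of uniqueFilename is given above, so here is one possible body, purely as a sketch. It assumes uniqueness can come from a process-local counter (Data.Unique from base); a real application would more likely use a database sequence, a timestamp or a random token, and the actual implementation may look quite different.

```haskell
import Data.Char (isDigit, isLetter)
import Data.Unique (hashUnique, newUnique)

newtype SanitizedFilename =
  SanitizedFilename { unSafe :: String } deriving Show

newtype UniqueFilename =
  UniqueFilename { unUnique :: SanitizedFilename } deriving Show

sanitizeFilename :: String -> SanitizedFilename
sanitizeFilename = SanitizedFilename . filter ok where
  ok c = isDigit c || isLetter c || elem c "-_."

-- ASSUMPTION: uniqueness comes from a process-local counter (Data.Unique).
-- This is an illustrative body, not the original implementation.
uniqueFilename :: SanitizedFilename -> IO UniqueFilename
uniqueFilename (SanitizedFilename name) = do
  u <- newUnique
  let prefix = show (hashUnique u)
  -- The prefix contains only characters the sanitizer allows,
  -- so the combined name is still sanitized.
  pure (UniqueFilename (SanitizedFilename (prefix ++ "-" ++ name)))

-- Rejected by the type checker, because a raw String is not sanitized:
-- bad = uniqueFilename "../../etc/passwd"
```

In a real module it would also make sense to export SanitizedFilename without its constructor, so that sanitizeFilename is the only way for outside code to obtain one.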

But it can also be annoying to have to wrap/unwrap a lot. In the long run, I see it as worth it, especially for avoiding value mismatches. ViewPatterns help somewhat:

{-# LANGUAGE ViewPatterns #-}

-- | Get the form fields for a form.
formFields :: ConferenceId -> Controller [Field]
formFields (unConferenceId -> cid) = getFields where
   ... code using cid ..

Maybe you'll say that unwrapping it in a function is a problem: what if you pass cid to a function wrongly? That's not an issue, because all functions using a conference id will take the ConferenceId type. What emerges is a sort of function-to-function contract system that is enforced at compile time. Pretty nice. So yeah, I use it as often as I can, especially in big systems.
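For anyone who wants to try the ViewPatterns trick in isolation, here is a self-contained sketch. The ConferenceId and Field definitions and the body of formFields are stand-ins invented for this example (the real ones live in my application and run in a Controller monad), so treat them as assumptions.

```haskell
{-# LANGUAGE ViewPatterns #-}

-- Hypothetical stand-ins for the application's own types.
newtype ConferenceId = ConferenceId { unConferenceId :: Int }
  deriving (Show, Eq)

newtype Field = Field String
  deriving (Show, Eq)

-- The view pattern applies unConferenceId to the argument
-- and binds the unwrapped Int to cid.
formFields :: ConferenceId -> [Field]
formFields (unConferenceId -> cid) =
  [Field ("conference-" ++ show cid ++ "-title")]

-- Rejected by the type checker, because a bare Int is not a ConferenceId:
-- bad = formFields 7
```

The unwrapped cid only exists inside formFields; every caller still has to hand over a proper ConferenceId.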