Watch Chris Learn!

Writings about the things I learn (Ruby / Haskell / Web / Other Stuff)

Pretty Printing a Tree With Text.PrettyPrint

I looked into Haskell’s pretty package (and the prettyclass package) to print out a simple tree structure I defined. I wanted a nicely nested output.

Why not Show?

The Show typeclass isn’t what we want for human-readable output: read . show should be the same as id. This means we can’t ever throw away extraneous data, and we have to be 100% sure to preserve the structure.

These rules get in the way of human-readable output. For example, a pretty-printed User record might omit the full address if it gets too long, leave out the bio, and generally lay the data out in a simplified manner.
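
For a concrete (and purely hypothetical – this User type isn’t from the post) example, a derived Show instance has to reproduce every field verbatim so that the value can be read back:

data User = User { name :: String, bio :: String, address :: String }
  deriving (Show, Read)

-- Derived Show can't abbreviate anything, because read . show must round-trip:
-- show (User "Chris" "A very long bio..." "123 Some St, Boulder, CO 80301")
--   == "User {name = \"Chris\", bio = \"A very long bio...\", address = \"123 Some St, Boulder, CO 80301\"}"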

The Pretty typeclass

The prettyclass package defines a general Pretty typeclass for types that can be printed out for human consumption. It comes with instances for the standard types (like Int and friends).
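
For instance, the built-in instances render basic values directly (a quick GHCi sketch; prettyShow renders any Pretty value to a String):

ghci> import Text.PrettyPrint.HughesPJClass
ghci> prettyShow (42 :: Int)
"42"
ghci> prettyShow (2.5 :: Double)
"2.5"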

Before we talk about pretty printing, we need to look at our type first.

data Tree a = Leaf a | Node (Tree a) (Tree a) deriving (Show)

A simple binary tree, where interior nodes don’t store anything, and leaves hold all the values.

We need to define an instance of Pretty for our type. As the docs say, the minimal complete definition is either pPrintPrec or pPrint.

pPrintPrec takes a PrettyLevel, which specifies the level of detail requested. Since we want to show all the data in the tree, that’s unnecessary for us, so we can implement the simpler pPrint function.

The typeclass instance definition

Since our type is parameterized on a, we need to constrain it somehow. In this case, we’re going to say that if a is a member of Pretty, then we can pretty print a whole Tree a.

import Text.PrettyPrint.HughesPJClass

instance (Pretty a) => Pretty (Tree a) where
  pPrint tree = ...

The Text.PrettyPrint module

The pretty package implements a TON of helpers to actually lay out the data.

In this case, our goal is to have the tree look like:

Node:
  Leaf: 1
  Node:
    Leaf: 2
    Leaf: 3

Each level gets labeled by its constructor, and each nesting level gets indented by 2 spaces.

The Leaf Case

So first, let’s do the Leaf case, where we print the literal Leaf: and then ask the a value to pretty print itself.

  pPrint (Leaf a)   = text "Leaf: " <> pPrint a

Doc implements <> as its monoid operation, combining two Docs into one. There is also <+>, which is like <> except it inserts a space between two non-empty documents.
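
Here’s a tiny standalone sketch of the difference (the Prelude hiding line is only needed on newer GHCs, where Prelude exports its own <>):

import Prelude hiding ((<>))
import Text.PrettyPrint

main :: IO ()
main = do
  putStrLn $ render (text "Leaf:" <>  text "1")   -- prints: Leaf:1
  putStrLn $ render (text "Leaf:" <+> text "1")   -- prints: Leaf: 1
  putStrLn $ render (text "Leaf:" <+> empty)      -- prints: Leaf:   (empty is the unit of <+>, no space added)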

The Node Case

The Node case is much more interesting.

  pPrint (Node l r) = vcat [ text "Node:"
                           , nest 2 (pPrint l)
                           , nest 2 (pPrint r)]

First we destructure the argument, then we build a three-element list of Docs. The first is the literal Node: text; the next two recursively pPrint the left and right sub-trees, each indented by 2 spaces.

nest takes an indent level, and a document and returns a new document with the same content, except indented.

The vcat function takes a list of documents, and lays them out vertically.

Fairly straightforward.

And…

I was impressed by how easy this library was to use, although I was rather confused by how hard it was to get at the typeclass: the pretty package supposedly has a module that defines the Pretty class, but GHC couldn’t find it, which is why I pulled in the separate prettyclass package.

I could see the use of this library in a large project, full of custom types.

A logging function could easily prettyShow each individual item it logs.
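
A hedged sketch of what that might look like (logItems is a made-up name; prettyShow is the real helper that renders any Pretty value to a String):

import Text.PrettyPrint.HughesPJClass (Pretty, prettyShow)

-- Render every item with its Pretty instance rather than Show.
logItems :: Pretty a => [a] -> IO ()
logItems = mapM_ (putStrLn . prettyShow)

-- e.g. logItems [Node (Leaf 1) (Leaf 2), Leaf 3]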

Full code is available at: https://github.com/cschneid/cschneid-pretty/

Spock Basics

Spock Intro – Minimal Web Framework

Spock is a slick little web framework in Haskell that builds off of Scotty’s legacy – although it apparently doesn’t share code any more.

It supports middleware and has a nice routing API.

For instance, to set up logging and static file serving, you just wire up two WAI middlewares.

appMiddleware :: SpockT IO ()
appMiddleware = do
  middleware logStdoutDev
  middleware $ staticPolicy (noDots >-> addBase "static")

Then the routes get built up (referencing the actual handler functions defined elsewhere).

appRoutes :: SpockT IO ()
appRoutes = do
  get "/"          $ Static.root
  get "/users"     $ Users.index
  get "/users/:id" $ Users.show

Then connect the pieces up and run on port 3000.

main = runSpock 3000 (appMiddleware >> appRoutes)

Handlers

I found myself repeating the specific ActionT type (the Spock route handler), so I type aliased it to be specific to my app (wrapping IO). This has the benefit of letting me change it in only one spot if/when I decide that I need a different monad transformer stack.

type HandlerM = ActionT IO

Then the actual handlers are just typed with HandlerM and their return value (mostly unit).

root :: HandlerM ()
root = text "Hello!"

There are a TON of helper functions to use in the context of a handler – redirect, json, html, setHeader, etc, etc.
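
As a hedged sketch of what a handler using a couple of those helpers might look like (not from the post; it assumes OverloadedStrings, the HandlerM alias from above, and that the module name and helper signatures match the Spock version in use):

{-# LANGUAGE OverloadedStrings #-}
import Web.Spock   -- module name is an assumption; it varies between Spock versions

type HandlerM = ActionT IO

-- Hypothetical Users.index handler: set a header, then respond with JSON
-- (relies on aeson's ToJSON instance for [String]).
usersIndex :: HandlerM ()
usersIndex = do
  setHeader "Cache-Control" "no-cache"
  json (["alice", "bob"] :: [String])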

More and More

Spock claims to support sessions, database connection pooling and more, but I haven’t had a chance to dive into that integration.

Typed Scalding Pipes

Quick Recap

A while back I described a Hadoop job that I implemented with Scalding (Distilling the Newest Record with Scalding). To recap: the input is a huge list of “facts”, each a single timestamped statement about a piece of system data, and the goal is to recombine them into a view of a domain object at a given time.

A Fact

{ "asserted_at": "2014-05-01T04:02:56Z",
  "subject": "device:123",
  "property": "serial_number",
  "value": "V29B044" }

Argonaut

The first version of this job I wrote used a built-in JSON parser. Turns out that’s an iffy approach, so I turned to the Argonaut library to parse my JSON into well-structured Scala case classes.

This code is almost literally off the Argonaut examples. I was really impressed at how easy this was.

import argonaut._
import Argonaut._

case class Fact(asserted_at: String, subject : String, property: String, value: Json)

object Fact {
  implicit def FactCodecJson : CodecJson[Fact] =
    casecodec4(Fact.apply, Fact.unapply)("asserted_at", "subject", "property", "value")
}

This allows me to take a string and call decodeOption on it to get an Option[Fact].

Functions!

One of the things I really wanted to explore was splitting up the large job into an aggregate of lots of small jobs. The best way to do that, of course, is with functions.

Here’s an easy one to do the JSON parsing:

def parseJsonAsFact(pipe : TypedPipe[String]) : TypedPipe[Fact] = {
  pipe
    .map    { _.decodeOption[Fact] }
    .filter { _.nonEmpty }
    .map    { _.orNull }
}

This takes a TypedPipe[String], parses each string into a Fact, and throws away anything that didn’t parse, leaving a TypedPipe[Fact].

Getting the input and parsing

Actually fetching the input and working with it to make output is easy. We assemble our small functions with Scala’s andThen combinator into one large function named job, which we then run over the input.

val input_file  = args.getOrElse("input",  "/master-dataset")
val output_file = args.getOrElse("output", "/output")

// Everything is stored as a SequenceFile, where the key is the timestamp it was recorded.
val source   = WritableSequenceFile[DoubleWritable, Text](input_file, ('sequenceFileKey, 'factJSON))
val rawInput = TypedPipe.from(source)
val input    = rawInput.map { _._2.toString } // TypedPipe[String]

// This is a single column, so Tsv is misleading, no tabs will be output
val output = TypedTsv[String](output_file)

// Build up a large function that is our entire pipeline.
val job = parseJsonAsFact _ andThen
          ///// More steps here.

job(input).write(output)

Finishing up the job

Here is an example of the whole pipeline I have written. It did take me a bit to figure out how filterByType could be parameterized by the thing I was filtering.

// TypedPipe[String] => TypedPipe[String], which is handily our input and output types.
val job = facts.parseJsonAsFact _                        andThen   //    TypedPipe[Fact]
          (facts.filterByType _).curried("observations") andThen   // => TypedPipe[Fact] (only observation related ones)
          facts.filterNewest _                           andThen   // => TypedPipe[Fact] (only the newest of any given subject/property)
          createMeasurementDate _                        andThen   // => TypedPipe[Fact] (new records with measurement_date in the stream)
          mergeObservations _                            andThen   // => TypedPipe[Observation] combine facts into observations
          renderAsJson _                                           // => TypedPipe[String] observations spun out as json

def filterByType(filter : String, pipe : TypedPipe[Fact]) : TypedPipe[Fact] = {
  pipe.filter { _.subject.startsWith(filter) }
}

The filtering is easy. But there’s a little trick in that the .groupBy call changes a TypedPipe into a Grouped, which has two type arguments – the “group key” and the type of the values that match that key.

Second note: the custom sorting of facts was a hurdle I had to get over, but it turned out to be easy – just define an Ordering and pass it in a rather unintuitive way.

def filterNewest(pipe : TypedPipe[Fact]) : TypedPipe[Fact] = {
  pipe
    .groupBy { fact : Fact => (fact.subject, fact.property) } // => Grouped[(String, String), Fact]
    .sortedReverseTake(1)(AssertedAtOrdering)
    .values
    .flatten
}

object AssertedAtOrdering extends Ordering[Fact] {
  def compare(a:Fact, b:Fact) = a.asserted_at compare b.asserted_at
}

I won’t go into all the pieces of the whole pipeline, since most of it isn’t all that interesting, but I do want to note that you can return more or fewer records from a pipe than came in. It doesn’t have to be a 1:1 transformation.

For example, I needed both the date and the full datetime in my observation domain object. This function does that for me by splitting the pipe in two with filters, sidelining the uninteresting half (not_measurement_at_pipe), and returning multiple records from the measurement_at_pipe. Finally, the sidelined pipe is merged back into the stream.

def createMeasurementDate(pipe : TypedPipe[Fact]) : TypedPipe[Fact] = {
  val measurement_at_pipe     = pipe.filter    { fact : Fact => fact.property == "measurement_at" }
  val not_measurement_at_pipe = pipe.filterNot { fact : Fact => fact.property == "measurement_at" }
  val converted_pipe = measurement_at_pipe
    .flatMap { fact : Fact =>
      List(
            fact,
            fact.copy(property = "measurement_date",
                      value    = jString(fact.value.stringOr("0000-00-00").substring(0, "yyyy-mm-dd".length)))
          )
    }

  converted_pipe ++ not_measurement_at_pipe
}

Final Thoughts

I really like the Typed API for writing jobs. The Scala compiler informs you of errors, which is a much faster testing cycle than waiting for Hadoop to run a compiled jar and fail at some point – a 10 second response time versus a 5 minute one.

In addition, it’s so much easier to keep track of real classes and work with them than to track sets of untyped, named fields.

So use the Typed API, and let Scala do more of the work for you.

Machines

I gave a short talk at the local Haskell meetup yesterday about the library “Machines” by the ever-so-famous Edward Kmett.

This is a quick roundup of what I learned, and the resources I ran across.

Counting Words

The initial task I gave myself was to read an input line and report how many words were in it.

That consisted of 3 machines wired together in a pipeline. I only had to write a custom function for the worker in the middle. And even that was a one-liner.

The auto function (and its autoM friend) seems like the easiest way to create a simple mapper-style machine that takes some input, does a bit of work, and spits out output.

-- Imports: machines itself, splitOn from the split package, and liftIO.
import Control.Monad.IO.Class (liftIO)
import Data.List.Split (splitOn)
import Data.Machine

eachLineCount :: IO ()
eachLineCount = runT_ $ repeatedly (yield =<< liftIO getLine)
                     ~> countWords
                     ~> autoM print

countWords :: Process String Int
countWords = auto (length . splitOn " ")

Teeing two inputs together

The other big thing I tackled was the Tee type. It lets you read from one of two incoming streams of data, explicitly. For example, you can logically say: “Give me the next value off the left stream.”

There’s another type of multi-input machine I didn’t dive into, called Wye, that allows a blind await on the consuming end: the left pipe is read until it’s empty, and then the right pipe is read (as opposed to explicitly asking for the left or right input, as with a Tee).

Actually building the Tee was relatively simple once I figured out the tee function. There is a commented-out version at the bottom of the next snippet that manually assembles the Tee using addL and capR; it is equivalent to the much shorter tee version.

compareLineCounts :: String -> IO ()
compareLineCounts fixedString =
  runT_ $ tee (repeated fixedString ~> countWords) (ioInput ~> countWords) mergeInput
       ~> compareWords
       ~> autoM putStrLn

ioInput :: (MonadIO m) => SourceT m String
ioInput = repeatedly $ do
                        liftIO $ putStrLn "Enter your new line to compare: "
                        x <- liftIO getLine
                        yield x

mergeInput :: Tee a a (a,a)
mergeInput = repeatedly $ do
              x <- awaits L
              y <- awaits R
              yield (x, y)

compareWords :: (Ord a) => Process (a, a) String
compareWords = repeatedly $ do (x,y) <- await
                               yield $ case compare x y of
                                        GT -> "Greater Than"
                                        LT -> "Less Than"
                                        EQ -> "Equal To"

-- compareLineCounts :: String -> IO ()
-- compareLineCounts fixedString =
--   runT_ $ (capR (repeated fixedString ~> countWords) $
--            addL (ioInput              ~> countWords)
--            mergeInput)
--        ~> compareWords
--        ~> autoM putStrLn

Thanks

Many thanks are in order to @kmett, @yoeight, @glguy, @cartazio and everybody else I asked questions of, all of whom helped me immensely on IRC.

Distilling the Newest Record With Scalding

The Problem Statement

Background

At Comverge, we are building a new project based on the lambda architecture. One of the core aspects of the lambda architecture revolves around an immutable, always growing store of data.

We store this data as a series of facts. Each fact is a single statement about the state of the world at a given time. For example, here is a set of facts generated when a new user signs up.

{ "subject" => "user:1", "property" => "username", "value" => "cschneid", "asserted_at" => "2014-03-01T06:00:00Z" }
{ "subject" => "user:1", "property" => "realname", "value" => "Chris", "asserted_at" => "2014-03-01T06:00:00Z" }
{ "subject" => "user:1", "property" => "password", "value" => "b4e7a69126ef83206b8db39fb78f2bdf", "asserted_at" => "2014-03-01T06:00:00Z" }

It often takes a bunch of facts working in concert to build a consistent view of the world.

The real beauty of this approach is that when new records come in, we can rewind time and still see what we knew at any given point.

For example, if this user changes their username, we don’t change the old username record, but just record the new one, and let the timestamps tell us the current state of the user.

{ "subject" => "user:1", "property" => "username", "value" => "ChrisTheWizard", "asserted_at" => "2014-03-05T06:00:00Z" }

Actual Problem Statement

I want a map-reduce job to generate the newest state of everything in our system. So in the background example, I would want a single record that contained:

{ "subject" => "user:1", "username" => "ChrisTheWizard", "realname" => "Chris", "password" => "b4e7a69126ef83206b8db39fb78f2bdf"}

We’ve thrown away the older username fact, and rearranged the data.

Scalding

So I played around with Hadoop in various forms, and ended up with Scalding as the environment to write map-reduce jobs in.

I’ll walk you through the code I ended up with, and where I’m still working to finish up.

A few imports

We of course need Scalding’s libraries, and our specific use case needs JSON parsing and a mutable map for collecting up the final view of the data.

import com.twitter.scalding._
import scala.util.parsing.json._
import scala.collection.mutable

Setup Input & Output

Next up is the top matter of the code, where we set up the input and output files and types.

The data is stored in Hadoop sequence files, where the key is the timestamp of when the data was written and the value is the JSON-encoded fact. We don’t care about the sequence file’s timestamp, so we just throw that away.

Similarly, the final output should be stored as JSON. The JsonLine class makes that really easy, but is fairly inflexible. It may be that I’ll need to write my own output class at some point.

class ExtractJSON(args : Args) extends Job(args) {
  val input = WritableSequenceFile("/advanced-apps/master-dataset/Facts.1394232008699", ('sequenceFileKey, 'factJSON))
  val output = JsonLine("/fact_data_output")

JSON Parse

Everything from here on out is a single pipeline.

We take the input file and start reading from it. The first thing we need to do is deserialize the JSON (a single text field as far as Scalding is concerned) into the set of fields that we actually care about. We use the JSON parser built into Scala to do this work for us.

Notice that a new field, parse_status, shows up too, indicating whether the JSON parsed or not. We will use it in the very next code snippet.

  input
    .read

    // Parse the json of each fact. Extract out the 4 expected values.
    .map(('factJSON) -> ('parse_status, 'asserted_at, 'subject, 'property, 'value)) {
      line: String => {
        JSON.parseFull(line) match {
          case Some(data: Map[String, Any]) => ("success",
                                                data("asserted_at"),
                                                data("subject"),
                                                data("property"),
                                                data("value")
                                              )
          case None => ("failed", "", "", "", "")
        }
      }
    }

Error check the JSON parse

The JSON parsing code always lets a value through, but some of those values have “failed” in the parse_status field. If they do, we drop that whole tuple (i.e., throw away that whole line of input).

After that’s done, we have no more use for that field, so throw it away to keep the dataset small as we continue to move through the input.

.filter('parse_status) { status: String => status != "failed" }
.discard('parse_status)

Find the newest

Now our goal is to find only the newest version of a fact for each pair of subject / property. Continuing the background example, this would be the newest username that we know.

We do a groupBy, then sort the results by timestamp, and take the first result of that sort (the newest item). Scalding provides an all-in-one way to do that with sortWithTake, so we just use that.

The _ variable was a bit surprising to me at first: it’s just Scala’s shorthand for the anonymous function’s single argument (the group builder that Scalding passes in). In any case, it represents the whole grouping for a given {subject / property}.

The comparison function is tricky since my value field is an Any, which can’t be automatically sorted by the language. So instead I give it an explicit rule to sort by (just use the timestamp, and ignore the value field). But I do need the value to be included in that sortWithTake so it comes out the other side of the funnel with the value I was looking for.

Once that’s done, we flatten the temporary items field (which held that asserted_at / value pair) back out into separate fields, and then get rid of it.

// Find the newest asserted at for each combo of subject & property
.groupBy('subject, 'property) {
  _.sortWithTake[(String, Any)](('asserted_at, 'value) -> 'items , 1) {
    case ((asL, _), (asR, _)) => asL > asR
  }
}
.flatten[(String, Any)](('items) -> ('asserted_at, 'value))
.discard('items)

Combine the many facts

At this point, we now have all the newest facts, having removed any outdated ones during the sort.

So the job now is to combine many rows of facts about a subject into a single row that represents all of what we know about that subject.

Once again, we groupBy, but this time just on subject.

Then we use foldLeft to loop over each property/value pair that we get and save it into a mutable Map. I had a bit of fun here trying to figure out how the syntax for adding to a Map works. See the result below for how I did it (apparently there are 2 or 3 different ways).

The tuple that comes out of this step is {subject, properties(property/value, property/value…)}

.groupBy('subject) {
  _.foldLeft(('property, 'value) -> 'properties)(mutable.Map.empty[String,Any]) {
    (properties: mutable.Map[String,Any], propAndVal: (String, Any)) =>
    val (prop, value) = propAndVal
    properties += prop -> value
  }
}

Finish Up

So now that we have the tuple of data we want, let’s serialize it back out to disk and close the class we were working inside of.

.write(output)
}

Hopefully that helped!

It took me about 2 days to get the whole stack working right, and posco on IRC was super helpful in getting me unstuck.

Next Steps

I need to figure out the actual output format I want. I think it involves streaming the output into Cassandra, rather than simply a JSON file on disk. That will involve figuring out how to connect to Cassandra and do the inserts. I’ll try to write a follow-up post about that.

$! Threadsafety

I was investigating the $! variable in Ruby, specifically whether it is truly a global variable the way the leading $ implies.

I made a quick test case where two threads repeatedly raise errors and then print out the message. If $! were truly shared, this should surface a race condition after a few attempts.

t1 = Thread.new do
  100000.times do
    begin
      raise "T1 Error"
    rescue
      puts "T1 - #{$!}"
    end
  end
end

t2 = Thread.new do
  100000.times do |i|
    begin
      raise "T2 Error"
    rescue
      puts "T2 - #{$!}"
    end
  end
end

t1.join
t2.join

But the worst that happens is the newline getting printed out of order:

$ ruby globals.rb | grep "T1.*T2"
T1 - T1 ErrorT2 - T2 Error
T1 - T1 ErrorT2 - T2 Error
T1 - T1 ErrorT2 - T2 Error

Result

So the result of all this is that no, $! is not a real global, but is instead thread-local (at least).

Hopefully I can go dig into the code to figure out what scope it really has.

Working Entirely in EitherT

This is the last post in my series of stuff on the Either monad.

  1. Playing with the Either Monad
  2. Using the Either Monad Inside Another Monad
  3. EitherT Inside of IO

It’s a smallish change to the code: I get rid of a lot of the annoying conversion code that went from Either to EitherT, and instead just write everything in EitherT.

The biggest change was the type signature of my failure code. See how I add the Monad constraint, and update the return value to be EitherT wrapped around whatever monad you have.

What’s cool about this is that it’ll work for IO and for any other monad we want to embed this eitherFailure code into. That means that as a hypothetical application’s monad transformer stack builds up, it would be easy to just plug this code in and go.

eitherFailure :: Monad m => Flag -> String -> EitherT String m String
eitherFailure Pass  val = right $ "-> Passed " ++ val
eitherFailure Error val = left  $ "-> Failed " ++ val

One other gotcha is that I had to change Right to right, a function that returns a hoisted version of the Either value. No biggie; it just wouldn’t typecheck until I did.

If you read this code, you’ll see that the transformation from Maybe to MaybeT is very similar, right down to using just and nothing as functions, rather than the Just and Nothing data constructors.

import Control.Error
import Control.Monad.Trans

-- A type for my example functions to pass or fail on.
data Flag = Pass | Error

main :: IO ()
main = do
  putStrLn "Starting to do work:"

  result <- runEitherT $ do
      lift $ putStrLn "Give me the first input please:"
      initialText <- lift getLine
      x <- eitherFailure Error initialText

      lift $ putStrLn "Give me the second input please:"
      secondText <- lift getLine
      y <- eitherFailure Pass (secondText ++ x)

      noteT ("Failed the Maybe: " ++ y) $ maybeFailure Pass y

  case result of
    Left  val -> putStrLn $ "Work Result: Failed\n " ++ val
    Right val -> putStrLn $ "Work Result: Passed\n " ++ val

  putStrLn "Ok, finished. Have a nice day"

eitherFailure :: Monad m => Flag -> String -> EitherT String m String
eitherFailure Pass  val = right $ "-> Passed " ++ val
eitherFailure Error val = left  $ "-> Failed " ++ val

maybeFailure :: Monad m => Flag -> String -> MaybeT m String
maybeFailure Pass  val = just $ "-> Passed maybe " ++ val
maybeFailure Error _   = nothing

EitherT Inside of IO

Keeping with our series of posts about using the Either Monad in various ways:

  1. Playing with the Either Monad
  2. Using the Either Monad Inside Another Monad

This time, I expand from Either to EitherT, which allows us to interleave an outer monad with an inner one.

When we call runEitherT with a do block, we create a new context: an EitherT type wrapped around an inner IO type. I’m not sure what the exact type there is; I’ll have to look into that later.

I import Control.Monad.Trans to get access to the lift function. That lets us take IO actions from the underlying IO layer and lift them into the EitherT block, so we can run IO commands there.

You can see how in the workflow of the EitherT section, it asks for some text, does some “work” that may fail, and then asks for the next bit of text to work on.

The coolest part is that if the first bit fails, it bails out of the whole workflow with the correct Left value, not even asking for the second bit of input.

The only other gotcha is that EitherT isn’t quite the same as a normal Either type, so you have to use functions to convert between them. hoistEither and hoistMaybe take a normal version of Either/Maybe and turn it into EitherT/MaybeT.

Similarly, we had to use noteT instead of note. Same behavior, but it just works on the transformed versions of the types.

import Control.Error
import Control.Monad.Trans

-- A type for my example functions to pass or fail on.
data Flag = Pass | Error

main :: IO ()
main = do
  putStrLn "Starting to do work:"

  result <- runEitherT $ do
      lift $ putStrLn "Give me the first input please:"
      initialText <- lift getLine
      x <- hoistEither $ eitherFailure Error initialText

      lift $ putStrLn "Give me the second input please:"
      secondText <- lift getLine
      y <- hoistEither $ eitherFailure Pass (secondText ++ x)

      noteT ("Failed the Maybe: " ++ y) $ hoistMaybe $ maybeFailure Pass y

  case result of
    Left  val -> putStrLn $ "Work Result: Failed\n " ++ val
    Right val -> putStrLn $ "Work Result: Passed\n " ++ val

  putStrLn "Ok, finished. Have a nice day"

-- Simple function that we can use to force it to error out with a Left, or
-- pass with a Right value. It just includes some helper text as its content,
-- showing what happened.
eitherFailure :: Flag -> String -> Either String String
eitherFailure Pass  val = Right $ "-> Passed " ++ val
eitherFailure Error val = Left  $ "-> Failed " ++ val

-- Similar to eitherFailure, but return a (Just String) or a Nothing based on
-- if we told it to fail.
maybeFailure :: Flag -> String -> Maybe String
maybeFailure Pass  val = Just $ "-> Passed maybe " ++ val
maybeFailure Error _   = Nothing
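
For concreteness, here is roughly what a run looks like if you type hello at the first prompt: the Error in the first eitherFailure call short-circuits the block, so the second prompt never appears.

Starting to do work:
Give me the first input please:
hello
Work Result: Failed
 -> Failed hello
Ok, finished. Have a nice day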

Using the Either Monad Inside Another Monad

After yesterday’s post about the Either Monad I wanted to see if it was easy to embed that bit of doWork stuff right into the main function.

This was mostly about learning the syntax; I would suggest keeping things separate as much as possible in real code.

The biggest gotcha I found was that the indentation of the x <- eitherFailure... bit needed to be deeper than the r in the result token. This ended up being more than my normal 2 space indent.

import Control.Error

-- A type for my example functions to pass or fail on.
data Flag = Pass | Error

main :: IO ()
main = do
  putStrLn "Starting to do work:"

  -- The inner monad here is Either. But note that we have
  -- no IO ability inside of it.
  let result = do
      x <- eitherFailure Pass "Initial Thing"
      y <- eitherFailure Error ("Second Thing " ++ x)
      note ("Failed the Maybe: " ++ y) $ maybeFailure Pass y

  case result of
    Left  val -> putStrLn $ "Work Result: Failed\n " ++ val
    Right val -> putStrLn $ "Work Result: Passed\n " ++ val
  putStrLn "Ok, finished. Have a nice day"

-- Simple function that we can use to force it to error out with a Left, or
-- pass with a Right value. It just includes some helper text as its content,
-- showing what happened.
eitherFailure :: Flag -> String -> Either String String
eitherFailure Pass  val = Right $ "-> Passed " ++ val
eitherFailure Error val = Left  $ "-> Failed " ++ val

-- Similar to eitherFailure, but return a (Just String) or a Nothing based on
-- if we told it to fail.
maybeFailure :: Flag -> String -> Maybe String
maybeFailure Pass  val = Just $ "-> Passed maybe " ++ val
maybeFailure Error _   = Nothing

You can see it’s the same code, except the result in main is calculated directly there, rather than calling another function.

Note that this isn’t the transformers library, so you can’t be clever and use lift and friends to do IO inside that Either workflow.

Playing With the Either Monad in Haskell

After playing with the bitcoin price fetcher, I was disappointed at how… hard it was to deal with the multiple layers of potential errors. I started looking into the errors package on Hackage for a way out. It is a one-stop shop for all the standard error-handling mechanisms in Haskell. It re-exports the standard Either and Maybe types, and adds many helper functions for moving between Either and Maybe, in addition to helping out with the transformer versions of both (MaybeT and EitherT).
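
For a taste of those helpers, here’s a small sketch: note attaches an error value to a Nothing (turning Maybe into Either), and hush throws the error away going the other direction.

import Control.Error (hush, note)

maybeToEither :: Either String Int
maybeToEither = note "nothing there" (Just 5)            -- Right 5

eitherToMaybe :: Maybe Int
eitherToMaybe = hush (Left "boom" :: Either String Int)  -- Nothing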

I will play with MaybeT and EitherT later; for now I’m happy to have figured out the Either monad, and I want to share the annotated example I’ve cobbled together.

Grab the code into a file, cabal install errors, and start toying with the various places I use the Pass and Error types in the doWork function. You’ll see how nicely Haskell handles a long string of things, where any one of them could fail out.

I’ll have to go rewrite the bitcoin scraper with my newfound knowledge…

import Control.Error

-- A type for my example functions to pass or fail on.
data Flag = Pass | Error

main :: IO ()
main = do
  putStrLn "Starting to do work:"
  let result = doWork
  case result of
    Left  val -> putStrLn $ "Work Result: Failed\n " ++ val
    Right val -> putStrLn $ "Work Result: Passed\n " ++ val
  putStrLn "Ok, finished. Have a nice day"

-- This is a driver function, simulating an error prone path
-- through the app.  Each step could possibly error out, and
-- when any of them do, we want to just bail out.
--
-- Remember the definition of the Either monad is:
-- instance Monad (Either e) where
--   return = Right
--   Right m >>= k = k m
--   Left e  >>= _ = Left e
--
-- So a Left value short circuits the rest of the Monad, and a Right value
-- passes the value off to the next step.
doWork :: Either String String
doWork = do -- use do notation syntax sugar for the Either monad

    -- First, do something that may or may not work. We get back a type of
    -- Either String String (since that's the type of the example
    -- eitherFailure function here)
    x <- eitherFailure Pass "Initial Thing"

    -- Based on what we get in x, just go ahead and attempt it.
    -- Note that the function eitherFailure takes a simple
    -- String as its argument.  So we didn't have to unwrap the
    -- first Either value.
    y <- eitherFailure Error ("Second Thing " ++ x)

    -- We can't just wire a Maybe value in the middle here,
    -- since it doesn't typecheck. (Maybe isn't an Either),
    -- even though they play similarly. If we just tried, we'd get:

    -- z <- maybeFailure Error
    -- Couldn't match type `Maybe' with `Either String'

    -- But instead, we can use Control.Error.Util.note to convert
    -- an "empty" Nothing value into a Left value with a descriptive
    -- error.  So now we'd get a proper Either value we can chain
    -- into this overall monad.
    note ("Failed the Maybe: " ++ y) $ maybeFailure Pass y

    -- Since the last line of this `do` block is the type we plan on
    -- returning, there's no `return` call needed.


-- Simple function that we can use to force it to error out with a Left, or
-- pass with a Right value. It just includes some helper text as its content,
-- showing what happened.
eitherFailure :: Flag -> String -> Either String String
eitherFailure Pass  val = Right $ "-> Passed " ++ val
eitherFailure Error val = Left  $ "-> Failed " ++ val

-- Similar to eitherFailure, but return a (Just String) or a Nothing based on
-- if we told it to fail.
maybeFailure :: Flag -> String -> Maybe String
maybeFailure Pass  val = Just $ "-> Passed maybe " ++ val
maybeFailure Error _   = Nothing

My favorite part of diving into the errors library is that my worry from yesterday – “except then I’d have to switch the Maybe result out of Lens-Aeson into a ‘fuller’ Either type” – is answered by exactly the note function I demoed above.