Data as the abstraction
Why Haskell is so important and what we can learn from it
part 1: introduction to concept
Fair warning: this text won’t really be about Haskell. Many other people have already written praise of Haskell as a language, or of the experience of using it, far better than I could. What I want to talk about is Haskell as a concept, which is to say, to use Haskell as an example in order to talk about a concept that would exist regardless of it. It is therefore not necessary to have in-depth Haskell knowledge to understand the argument presented here; familiarity with its basic ideas should be enough.
Usually when people praise Haskell’s type system they talk about how it keeps your code correct while you write or refactor it. This is of course true and important, but it’s also somewhat of an understatement. Really, the type system is not merely something Haskell has; it’s what Haskell is. Types are data (it is perhaps even more accurate to say that data has a type and a value and then talk about values, as Rich Hickey did here so brilliantly, but I also don’t think it’s wrong to use data as our core concept, which is what I will do here), and there is nothing you can do in Haskell that is not data. No logic can be written in Haskell outside of data.
Think about Haskell’s programming constructs: the often dreaded monads are really just side effects represented as data. Typeclasses are data that define certain behaviour, yes, but that behaviour, like all behaviour in Haskell, is merely a rule, a declaration, a law stating how one piece of data is transformed into another. This is also what functions are. You can’t really do things in Haskell except inside an appropriately named do block that communicates with the outside world; everything else is telling Haskell how to transform one piece of data into another. Higher-order functions, currying and function composition are all natural consequences of this: since a function is just a piece of data, it makes sense that we can pass it around and combine it with other pieces of data. In other words, we can treat it the same way we would treat a variable in mathematics.
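To make that last point concrete, here is a tiny sketch (not taken from any real project; replaceWith, shout and fillWithHello are names invented purely for illustration) of what treating a function as just another piece of data looks like:
import Data.Char (toUpper)

-- replaceWith is an ordinary value that happens to be a function
replaceWith :: String -> String -> String
replaceWith replacement word = if word == "{}" then replacement else word

shout :: String -> String
shout = map toUpper

-- partial application ("currying") turns it into a new piece of data...
fillWithHello :: String -> String
fillWithHello = replaceWith "hello"

-- ...which we can compose with another function and pass to map like any other value
example :: [String]
example = map (shout . fillWithHello) ["say", "{}", "world"]
-- ["SAY","HELLO","WORLD"]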
part 2: a very simple practical example
If you already understand what I mean by ‘any Haskell program is just a series of data transformations’, you can skip this next part, but I do think a very practical example can be useful to supplement the somewhat philosophical nature of the argument. The example comes from a project of mine, a semi-random text generator. We will analyse its most important part, which is essentially a text formatter that fills a template with random words. First the process is explained in plain English to illustrate the ‘series of data transformations’ point, then the actual code is included (which more experienced Haskellers should feel free to critique).
data:
All the necessary inputs to the program are represented as one data type called Generator, which has three fields: the first is a Map, the second a FilePath and the third a list of integers.
data Generator = Generator (Map.Map Int [String]) FilePath [Int]
At program execution time, these pieces of data could for example be the following:
- the first field is a data structure containing lists of words:
wordsMap = Map.fromList [(1, ["short", "simple", "stupid"]), (2, ["our", "placeholder", "useless"]), (3, ["reading", "looking at", "ignoring"])]
- the second field is the filename ‘input.txt’ of a file containing the text:
inputText = "This is a {1} example of {2} text. Thanks for {3} it."
- the third field is a list of three integers:
randIntList = [1,2,2]
This list is generated randomly every time the program runs, which is why the final output is also different every time. While the generation of this list and its randomness happen in another part of the program and are out of scope for this example, it is important to note its two constraints. Firstly, the integers it contains must be between 0 and 2, as the possible indexes of the words in the lists in wordsMap are only 0, 1 and 2 (since every list of words has length 3). Secondly, its length must be 3, because it must be equal to the size of wordsMap itself. The reasons for these constraints will become clear in the transformations below.
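The generation itself happens elsewhere in the project, but for completeness, one plausible way such a list could be produced (this is not the project’s code; makeRandIntList is a hypothetical helper) would be:
import System.Random (randomRIO)
import Control.Monad (replicateM)

-- produce 'len' random indices, each between 0 and 'maxIndex' inclusive
makeRandIntList :: Int -> Int -> IO [Int]
makeRandIntList len maxIndex = replicateM len (randomRIO (0, maxIndex))

-- e.g. makeRandIntList 3 2 might yield [1,2,2]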
transformations:
These are the transformations used to generate the final output from the above pieces of data:
- Transform inputText into a list of words:
listOfWords = ["This", "is", "a", "{1}", "example", "of", "{2}", "text.", "Thanks", "for", "{3}", "it."]
- Pair each element of randIntList with its position (1 to 3), giving a list of tuples:
intMap = [(1,1), (2,2), (3,2)]
- Create a new Map from intMap and wordsMap: for each pair (key, index) in intMap, look up the list stored under that key in wordsMap and take the word at that index:
transformedMap = Map.fromList [(1, "simple"), (2, "useless"), (3, "ignoring")]
- Create a new list of words by checking every word in listOfWords to see whether it needs to be replaced (i.e. whether it has the form { + integer + }). If it does, use transformedMap to replace it with the word stored under the integer inside the curly brackets. The result looks like this:
finalListOfWords = ["This", "is", "a", "simple", "example", "of", "useless", "text.", "Thanks", "for", "ignoring", "it."]
- Transform finalListOfWords back into text:
finalText = "This is a simple example of useless text. Thanks for ignoring it."
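If you want to see the same steps interactively, here is roughly what the first few look like in GHCi, using the example values above (this session is a sketch rather than project code, and the lambda stands in for the transformMap helper defined in the full code below):
ghci> import qualified Data.Map as Map
ghci> let wordsMap = Map.fromList [(1, ["short","simple","stupid"]), (2, ["our","placeholder","useless"]), (3, ["reading","looking at","ignoring"])]
ghci> words "This is a {1} example of {2} text. Thanks for {3} it."
["This","is","a","{1}","example","of","{2}","text.","Thanks","for","{3}","it."]
ghci> zip [1..3] [1,2,2]
[(1,1),(2,2),(3,2)]
ghci> zipWith (\(k, i) (_, ws) -> (k, ws !! i)) (zip [1..3] [1,2,2]) (Map.toList wordsMap)
[(1,"simple"),(2,"useless"),(3,"ignoring")]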
full code:
import System.IO
import qualified Data.Map as Map
import Data.Char (digitToInt)

data Generator = Generator (Map.Map Int [String]) FilePath [Int]

-- Read the template file and produce the final, filled-in text.
generate :: Generator -> IO String
generate (Generator inputmap filepath randomnumbers) = do
  handle <- openFile filepath ReadMode
  contents <- hGetContents handle
  let wordslist = words contents
  -- transformedMap is partially applied: it still expects the list of (key, word list) pairs
  let transformedMap = zipWith transformMap (createRandomMap randomnumbers)
  let finalString = unwords $ map (getreplacementWord $ Map.fromList $ transformedMap $ Map.toList inputmap) wordslist
  return finalString

-- Pair every random number with its position in the list: [1,2,2] -> [(1,1),(2,2),(3,2)]
createRandomMap :: [Int] -> [(Int, Int)]
createRandomMap randList = zip [1..length randList] randList

-- Pick the word at the random index out of the corresponding word list.
transformMap :: (Int, Int) -> (Int, [String]) -> (Int, String)
transformMap firstTuple secondTuple = (fst firstTuple, snd secondTuple !! snd firstTuple)

-- Replace a "{n}" word with its chosen replacement; leave every other word untouched.
getreplacementWord :: Map.Map Int String -> String -> String
getreplacementWord wordMap word =
  case getReplacement word of
    Nothing -> word
    Just x  -> getItemFromMap wordMap (digitToInt x)

getItemFromMap :: Map.Map Int String -> Int -> String
getItemFromMap wordMap intItem =
  case Map.lookup intItem wordMap of
    Nothing -> " "
    Just x  -> x

-- A word is a placeholder only if it is exactly three characters long...
getReplacement :: String -> Maybe Char
getReplacement word
  | chars == 3 = getNumOfReplacement word
  | otherwise  = Nothing
  where chars = length word

-- ...and of the form '{', digit, '}'; return the digit character.
getNumOfReplacement :: String -> Maybe Char
getNumOfReplacement word
  | first_char == '{' && third_char == '}' = Just second_char
  | otherwise = Nothing
  where first_char  = head word
        second_char = word !! 1
        third_char  = word !! 2
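For completeness, here is one hypothetical way the generator above could be invoked; this main is not part of the project and assumes a file named input.txt exists containing the example template text:
main :: IO ()
main = do
  -- the same example words as above; [1,2,2] stands in for a randomly generated list
  let wordsMap = Map.fromList [(1, ["short", "simple", "stupid"]), (2, ["our", "placeholder", "useless"]), (3, ["reading", "looking at", "ignoring"])]
  result <- generate (Generator wordsMap "input.txt" [1, 2, 2])
  putStrLn result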
part 3: the actual argument
The above algorithm could be written in a fairly similar way in many other languages, even some non-functional ones. What makes Haskell different, however, is that this is the only way you can do it. It forces you, yes, to learn some mathematical concepts, but on a more fundamental level to reason about your code only in terms of data and the transformations between pieces of it.
In imperative languages, the difference between the simplest and the most complicated program is that the simplest one is just a procedure: do this, then do this, then this. But the most complicated one is an abstraction: almost always a class and often also a design pattern. In Haskell, however, both the simplest and the most complicated possible program are transformations of data; the more complicated one will simply have more complex data (such as many levels of nested custom types) and more, and more complicated, transformations. Where another language would turn to a design pattern to implement a solution to a complex problem, in Haskell we would approach it by asking ourselves ‘How can we break this problem down into smaller pieces, then represent them as data?’ and then design our types in a way that is conducive to solving the problem (the Generator type, designed to hold the pieces of data needed for text generation, is a very simple example of exactly that).
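To make ‘design our types in a way that is conducive to solving the problem’ concrete, here is a hypothetical refinement of the example above (not the project’s actual code): the template itself becomes data, so a malformed placeholder simply cannot be represented, and the replacement step shrinks to a pattern match.
-- assumes the same qualified import as above: import qualified Data.Map as Map
data Token
  = Plain String        -- an ordinary word, copied through unchanged
  | Placeholder Int     -- a {n} slot, to be filled with a chosen word

render :: Map.Map Int String -> [Token] -> String
render replacements = unwords . map toWord
  where
    toWord (Plain w)       = w
    toWord (Placeholder n) = Map.findWithDefault "" n replacements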
Before arguing which of those approaches is ‘better’ we must ask ourselves: ‘better’ from which perspective? Because if it’s from the perspective of the person writing code, then it’s no surprise that this breeds objections along the lines of ‘well, everyone has their favourite programming language or paradigm, stop trying to enforce yours’, and to be fair, that is valid - no one should be forced to work with a technology they don’t like working with. But I believe people trying to argue some objective superiority of the functional paradigm are really arguing that it is better from the perspective of program execution. Data is predictable and safe: it cannot break, and it has no ability to influence anything around it. While behaviour-based abstractions such as classes and design patterns are certainly better than no abstractions at all, they need to be implemented exactly correctly to produce the expected outcome. There is no way to check, before the program runs, whether an implementation of a class or design pattern actually makes sense semantically beyond being syntactically correct. But when data is the only abstraction, we can do exactly that - and that is precisely what the Haskell compiler does and what its type system is for. It can check the correctness of an abstraction’s internal logic, and that is extraordinarily powerful.
An even simpler way to put this (and I’m probably not the first one to make this analogy) would be to think of a behaviour-based abstraction as a recipe while data is just a list of ingredients. While the success of the former depends heavily on the human factor, the latter lays this burden mostly on that which executes the program. And needless to say, a computer is much, much less likely to make a mistake than a human is.