Daily Coding Problem 3

Daily Coding Problem 3

Daily Coding Problem 3

Problem Statement

This problem was asked by Google.

Given the root to a binary tree, implement serialize(root), which serializes the tree into a string, and deserialize(s), which deserializes the string back into the tree.

For example, given the following Node class

class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

The following test should pass:

node = Node('root', Node('left', Node('left.left')), Node('right'))
assert deserialize(serialize(node)).left.left.val == 'left.left'

Solution

Each solution requires two algorithms, serialize and deserialize.

First, let's pull in the Nuget packages for the solution.

In [1]:
// Easy way to work with any Nuget packages. Standard packages to load, regardless of solution.
#load "Paket.fsx"
Paket.Version [
    "FSharp.Core", "4.5.4"
    "FParsec", "1.0.4-RC3"
]
Paket.Package [ 
    "Expecto"
    "Xplot.Plotly"
    "XPlot.GoogleCharts"
    "Newtonsoft.Json"
    "Chiron"
]
#load "Paket.Generated.Refs.fsx"
#load "XPlot.GoogleCharts.fsx"

// Load the DLLs
#r "packages/Expecto/lib/net461/Expecto.dll"
#r "packages/FParsec/lib/portable-net45+win8+wp8+wpa81/FParsec.dll"
#r "packages/FParsec/lib/portable-net45+win8+wp8+wpa81/FParsecCS.dll"
#r "packages/Newtonsoft.Json/lib/net45/Newtonsoft.Json.dll"
#r "packages/Chiron/lib/net452/Chiron.dll"

/// Quick debugging
let tee f x = f x |> ignore; x

Strategy Part 1 - Pick the correct data model

The strategy is to provide behavior in F# that is similar to Python (at least, as much as possible given the only example). Since val is a keyword in F#, using it as a field name required wrapping it in 2 back-ticks, aka ``val``. Or renaming it to something like value.

The key here, in my mind, is to pick the appropriate data type to hold the tree. Some choices include:

  1. Discriminated union
  2. Record type
  3. Class
In [2]:
open System

module Model = 
    // Need to define static members to allow Chiron to work
    open Chiron
    open Chiron.Operators

    /// Tree as discriminated union or choice type
    type DUNode = 
    | Node of string * DUNode option * DUNode option
    with
        static member create(v, l, r) = Node(v, l, r)
        static member create(v, l) = DUNode.create(v, l, None)
        static member createOpt(v, l, r) = DUNode.create(v, l, r) |> Some
        static member createOpt(v, l) = DUNode.create(v, l, None) |> Some
        static member createOpt(v) = DUNode.create(v, None, None) |> Some
        member x.``val`` = match x with Node(v, l, r) -> v
        member x.left = match x with Node(_, l, _) -> l
        member x.right = match x with Node(_, _, r) -> r
        // Chiron functions
        static member ToJson (Node(v, l, r)) = Json.write "dunode" (v, l, r)
        static member FromJson (_:DUNode) = Json.read "dunode" |> Json.map Node 

    /// Tree as record type
    type RNode = {
        ``val``: string
        left: RNode option
        right: RNode option
    } with
        // Convenience methods to construct the record
        static member create(v, l, r) = {``val``=v; left=l; right=r}
        static member createOpt(v, l, r) = RNode.create(v, l, r) |> Some
        static member createOpt(v, l) = RNode.create(v, l, None) |> Some
        static member createOpt(v) = RNode.create(v, None, None) |> Some
        // Chiron functions
        static member ToJson (node:RNode) =
            json {
                do! Json.write "rval" node.``val``

                match node.left with
                | Some(l) -> do! Json.write "rleft" l
                | None -> do! Json.writeNone "rleft"

                match node.right with
                | Some(r) -> do! Json.write "rright" r
                | None -> do! Json.writeNone "rright"
            }
        static member FromJson (_:RNode) =
            json {
                let! v = Json.read "rval"
                let! l = Json.read "rleft"
                let! r = Json.read "rright"
                return RNode.create(v, l, r)
            }

    /// Tree as class type
    type CNode(``val``:string, left:CNode option, right:CNode option) =
        // Save the values from the constructor
        member this.``val`` = ``val``
        member this.left = left
        member this.right = right
        // Short-hand constructors, but as static members
        static member create(v, l, r) = CNode(v, l, r)
        static member createOpt(v, l, r) = CNode.create(v, l, r) |> Some
        static member createOpt(v, l) = CNode.create(v, l, None) |> Some
        static member createOpt(v) = CNode.create(v, None, None) |> Some
        // Chiron functions
        static member ToJson (node:CNode) = 
            json {
                do! Json.write "cval" node.``val``

                match node.left with
                | Some(l) -> do! Json.write "cleft" l
                | None -> do! Json.writeNone "cleft"

                match node.right with
                | Some(r) -> do! Json.write "cright" r
                | None -> do! Json.writeNone "cright"
            }
        static member FromJson (_:CNode) = 
            json {
                let! v = Json.read "cval"
                let! l = Json.read "cleft"
                let! r = Json.read "cright"
                return CNode(v, l, r)
            }
        // Expecto `equals` comparison requires overrides for classes
        // Using trick from https://stackoverflow.com/a/38932144 
        // More on tuple equality is at https://fsharpforfunandprofit.com/posts/tuples/ 
        override x.GetHashCode() = hash(x.``val``, x.left, x.right)
        override x.Equals(b) =
            match b with
            | :? CNode as c -> x.GetHashCode() = c.GetHashCode()
            | _ -> false 

Strategy Part 2 - Figure out how to Serialize and Deserialize

The next question is this - how to serialize and deserialize the object into a string. The problem doesn't specify the output string's format - so it could be XML, JSON, etc. or something custom.

Here are three ideas:

  1. Use an auto-JSON serializer and deserializer, like Json.NET.
  2. Use a manual JSON serializer and deserializer, like Chiron.
  3. Use a custom serializer and deserializer, using something like FParsec.

Automatic Serialization using Json.NET

First, we write the functions for Automatic Serialization using Json.NET.

In [3]:
/// Uses Json.NET
module AutoSerializer = 
    open Newtonsoft.Json
    // Convenience methods
    let deserialize<'a> str = JsonConvert.DeserializeObject<'a> str
    
    // DUNode
    let serializeDUNode (n:Model.DUNode) = JsonConvert.SerializeObject n
    let deserializeDUNode = deserialize<Model.DUNode> 
    
    // RNode
    let serializeRNode (n:Model.RNode) = JsonConvert.SerializeObject n
    let deserializeRNode = deserialize<Model.RNode>
    
    // CNode
    let serializeCNode (n:Model.CNode) = JsonConvert.SerializeObject n
    let deserializeCNode = deserialize<Model.CNode>

Manual Serialization using Chiron

Second, we have the functions for Manual Serialization using Chiron. Please note that to use Chiron, the static functions ToJson and FromJson were added to each of the types (see above).

In [4]:
/// Uses Chiron 
module ManualSerializer = 
    open Chiron
    
    // DUNode
    let serializeDUNode (n:Model.DUNode) =
        Json.serialize n
        |> Json.formatWith Chiron.Formatting.JsonFormattingOptions.Compact
    let deserializeDUNode str : Model.DUNode = str |> Json.parse |> Json.deserialize
    
    // RNode
    let serializeRNode (n:Model.RNode) =
        Json.serialize n
        |> Json.formatWith Chiron.Formatting.JsonFormattingOptions.Compact
    let deserializeRNode str : Model.RNode = str |> Json.parse |> Json.deserialize
    
    // CNode
    let serializeCNode (n:Model.CNode) =
        Json.serialize n
        |> Json.formatWith Chiron.Formatting.JsonFormattingOptions.Compact
    let deserializeCNode str : Model.CNode = str |> Json.parse |> Json.deserialize

Custom Serialization using FParsec

Finally, we have the functions for Custom Serialization using FParsec. The strategy here is to serialize nodes with values as Some("val") Child1 Child2 or as None. We will also be using a pre-order traversal of the tree, i.e. process the parent node, then the left child, and finally the right child.

In [5]:
module CustomSerializer = 
    /// Pre-order traversal for serialization, not tail-recursive.
    /// I am using SRTPs here to avoid having to write identical code for the 3 data types.
    /// Advantage of SRTP over higher order function here is that the record's field accessor
    /// matches the `member` constraint - not sure how to do that with functions without 
    /// having to declare a custom member function for each of the record's fields.
    let inline serializeNode (node:^a) =
        let rec helper (node:^a) = 
            let lans = 
                match (^a:(member left : ^a option) (node)) with
                | Some l -> helper l
                | None -> "None"
            let rans = 
                match (^a:(member right : ^a option) (node)) with
                | Some r -> helper r 
                | None -> "None"
            sprintf "Some(\"%s\") %s %s" (^a:(member ``val`` : string) (node)) lans rans
        helper node

    // Why not go all out?! This is my first time trying FParsec.  FParsec will be used
    // to deserialize the string back into the appropriate type.
    open FParsec
    
    /// Pre-order traversal for deserialization
    /// Here, using a higher-order function to avoid SRTP.
    let inline deserializeNode (ctor:string * 'a option * 'a option -> 'a) (str:string) : 'a = 
        // Basic parser based on our serialization strategy
        // We don't allow double-quotes inside strings - KISS
        let properString = 
            pipe3 (pstring "\"") (manySatisfy (fun x -> x <> '\"')) (pstring "\"")
                  (fun x y z -> x + y + z)
        let someValue = spaces >>. pstring "Some(" >>. properString .>> (pstring ")") .>> spaces
        let noneValue = spaces >>. pstring "None" .>> spaces
        let fullParser = many (someValue <|> noneValue)
        
        /// Parser will tokenize the string and extract the values we care about
        /// If it is a real value, it will retain the quotes as part of the string: "value"
        /// If it is not a value, it will look like: None
        /// This is important because the value itself could be the string "None", and we 
        /// must distinguish this from an actual non-value.
        let tokens = 
            run fullParser str
            |> function
                | Success(res,_,_) -> res
                | Failure(err,_,_) -> failwith err
        
        /// Recursive deserializer, not tail recursive. Returns the deserialized value 
        /// and the remaining tokens.
        let rec deser (remTokens:string list) : 'a option * string list = 
            match remTokens with
            | hd::tl when hd.StartsWith("N") -> 
                None, tl
            | hd::tl when hd.StartsWith("\"") -> 
                // Remove the double-quotes from the value
                let v = hd.[1 .. String.length hd - 2]
                let ln, remAfterLeft = deser tl
                let rn, remAfterRight = deser remAfterLeft
                Some (ctor (v, ln, rn)), remAfterRight
            | _ -> failwith "Tree structure is corrupted"
        
        // Call the deserializer, then return the data structure
        // NOTE: Option.get should never fail because, if you start from a tree object
        // instead of some contrived string representation, it is impossible to 
        // create a tree without a root.
        deser tokens
        |> fst 
        |> Option.get
        
    // DUNode
    let serializeDUNode (n:Model.DUNode) = serializeNode n
    let deserializeDUNode str = deserializeNode (Model.DUNode.create) str

    // RNode
    let serializeRNode (n:Model.RNode) = serializeNode n
    let deserializeRNode str = deserializeNode (Model.RNode.create) str
    
    // CNode
    let serializeCNode (n:Model.CNode) = serializeNode n
    let deserializeCNode str = deserializeNode (Model.CNode.create) str
    
Model.DUNode.create("root", Model.DUNode.createOpt("1 left"))
|> tee (printfn "I start as a DUNode:%s%O" Environment.NewLine)
|> CustomSerializer.serializeDUNode
|> tee (printfn "serialized: %O")
|> CustomSerializer.deserializeDUNode
|> tee (printfn "Back to a DUNode:%s%O" Environment.NewLine)
|> CustomSerializer.serializeDUNode
|> tee (printfn "serialized: %O")
|> CustomSerializer.deserializeRNode
|> tee (printfn "Now I'm a RNode:%s%O" Environment.NewLine)
|> CustomSerializer.serializeRNode
|> tee (printfn "serialized: %O")
|> CustomSerializer.deserializeCNode
|> tee (printfn "And now I'm a CNode:%s%O" Environment.NewLine)
Out[5]:
FSI_0009+Model+CNode
I start as a DUNode:
Node ("root",Some (Node ("1 left",None,None)),None)
serialized: Some("root") Some("1 left") None None None
Back to a DUNode:
Node ("root",Some (Node ("1 left",None,None)),None)
serialized: Some("root") Some("1 left") None None None
Now I'm a RNode:
{val = "root";
 left = Some {val = "1 left";
              left = None;
              right = None;};
 right = None;}
serialized: Some("root") Some("1 left") None None None
And now I'm a CNode:
FSI_0009+Model+CNode

Strategy Part 3 - Make sure the De-/Serialization strategies work as expected

Now that we have the 3 sets of de-/serializers written for each of the 3 data types, let's write some basic unit tests to ensure that the functions are behaving as expected.

NOTE: I cannot seem to get FsCheck to run in IfSharp / Jupyter. If anyone has ideas on how to make it work, please let me know!

First, let's setup some common parameters.

In [6]:
// For random tree generation
let random = Random((int) DateTime.Now.Ticks &&& 0x0000FFFF)
let mutable randLock = Object()

/// Convenience method to generate random numbers safely.
let getNextRandLim (min, max) = lock randLock (fun () -> random.Next (min, max))
let getNextRand () = lock randLock (fun () -> random.Next ())

module Test = 
    // Common parameters for the tests
    let numTests = 150 // How many test to run to check results
    let minDepth = 5 // Some complexity
    let maxDepth = 10 // To keep runtimes reasonable
    let percentNones = 10 // 10% of the time (1 out of 10), we can get a None instead of a Some
                          // Allows for unbalanced trees
    let minValueLength = 10 // Minimum length of ``val``
    let maxValueLength = 20 // Maximum length of ``val``
    let rootValue = "root" // ``val`` for root node.
    let nameGen() : string = // Generates a ``val`` for the node
        let len = getNextRandLim(minValueLength, maxValueLength)
        seq {
            for i in 1 .. len do
                let isCapital = getNextRand() % 2
                if isCapital = 0 then yield (getNextRandLim(65, 90) |> char |> string)
                else yield (getNextRandLim(97, 122) |> char |> string)
        }
        |> Seq.fold (fun acc e -> acc + e) ""

    // Generate test cases, limiting depth and complexity
    let CustomTreeGenerator (ctor:string * 'a option * 'a option -> 'a option) : 'a = 
        let rec helper depth = 
            match depth with
            | 0 -> None
            | n when n>0 -> 
                let choice = getNextRand() % percentNones // None or Some
                match choice with 
                | 0 -> None
                | _ -> ctor(nameGen(), helper <| depth - 1, helper <| depth - 1)
            | _ -> invalidArg "depth" (sprintf "Depth is %d but must be >= 0" depth)
        ctor(rootValue, 
            helper <| getNextRandLim(minDepth, maxDepth), 
            helper <| getNextRandLim(minDepth, maxDepth))
        |> Option.get

Equality Tests

Let's ensure that our equality tests are working correctly, especially for CNodes because we had to override Equals and GetHashCode.

In [7]:
module EqualityTest =
    open Expecto
    
    type DU = Model.DUNode
    type R = Model.RNode
    type C = Model.CNode
    
    let tests =
        testList "Equality tests" <|
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm a DUNode tree is equal to itself %i" i) {
                    let tree = Test.CustomTreeGenerator DU.createOpt
                    Expect.equal tree tree (sprintf "%O" tree)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm a RNode tree is equal to itself %i" i) {
                    let tree = Test.CustomTreeGenerator R.createOpt
                    Expect.equal tree tree (sprintf "%O" tree)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm a CNode tree is equal to itself %i" i) {
                    let tree = Test.CustomTreeGenerator C.createOpt
                    Expect.equal tree tree (sprintf "%O" tree)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm 2 different DUNode trees are not equal %i" i) {
                    let tree = 
                        Test.CustomTreeGenerator DU.createOpt
                        |> fun (DU.Node(v, l, r)) -> DU.create("1", l, r)
                    let tree2 = 
                        Test.CustomTreeGenerator DU.createOpt
                        |> fun (DU.Node(v, l, r)) -> DU.create("2", l, r)
                    Expect.notEqual tree tree2 (sprintf "%O %O" tree tree2)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm 2 different RNode trees are not equal %i" i) {
                    let tree = 
                        Test.CustomTreeGenerator R.createOpt
                        |> fun t -> { t with ``val`` = "1" }
                    let tree2 = 
                        Test.CustomTreeGenerator R.createOpt
                        |> fun t -> { t with ``val`` = "2" }
                    Expect.notEqual tree tree2 (sprintf "%O %O" tree tree2)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm 2 different CNode trees are not equal %i" i) {
                    let tree = 
                        Test.CustomTreeGenerator C.createOpt
                        |> fun t -> C.create("1", t.left, t.right)
                    let tree2 = 
                        Test.CustomTreeGenerator C.createOpt
                        |> fun t -> C.create("2", t.left, t.right)
                    Expect.notEqual tree tree2 (sprintf "%O %O" tree tree2)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm 2 similar DUNode trees are not equal %i" i) {
                    let tr = Test.CustomTreeGenerator DU.createOpt
                    let tree = tr |> fun (DU.Node(v, l, r)) -> DU.create("1", l, r)
                    let tree2 = tr |> fun (DU.Node(v, l, r)) -> DU.create("2", l, r)
                    Expect.notEqual tree tree2 (sprintf "%O %O" tree tree2)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm 2 similar RNode trees are not equal %i" i) {
                    let tr = Test.CustomTreeGenerator R.createOpt
                    let tree = tr |> fun t -> { t with ``val`` = "1" }
                    let tree2 = tr |> fun t -> { t with ``val`` = "2" }
                    Expect.notEqual tree tree2 (sprintf "%O %O" tree tree2)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm 2 similar CNode trees are not equal %i" i) {
                    let tr = Test.CustomTreeGenerator C.createOpt
                    let tree = tr |> fun t -> C.create("1", t.left, t.right)
                    let tree2 = tr |> fun t -> C.create("2", t.left, t.right)
                    Expect.notEqual tree tree2 (sprintf "%O %O" tree tree2)
                }
            ]

    runTests { defaultConfig with ``parallel`` = true } tests
[03:47:23 INF] EXPECTO? Running tests... <Expecto>
[03:48:46 INF] EXPECTO! 1,350 tests run in 00:01:23.7584329 for Equality tests1,350 passed, 0 ignored, 0 failed, 0 errored. Success! <Expecto>

Automatic Serialization using Json.NET

For our tests, we need to ensure that the 1 original test case works, and that random test cases also work correctly.

In [8]:
module AutoSerializerTest =
    open Expecto
    open AutoSerializer

    type DU = Model.DUNode
    type R = Model.RNode
    type C = Model.CNode
    
    /// Basic brute-force, randomized tests to ensure the logic works correctly
    let tests =
        testList "Auto Serializer tests" <|
            test "DU provided tests" {
                let tree = 
                    DU.create("root", DU.createOpt("left", DU.createOpt("left.left")), DU.createOpt("right"))
                Expect.equal (tree |> serializeDUNode |> deserializeDUNode) tree (sprintf "%O" tree)
                Expect.equal "left.left" tree.left.Value.left.Value.``val`` "tree's left.left = left.left"
            } ::
            test "R provided tests" {
                let tree = 
                    R.create("root", R.createOpt("left", R.createOpt("left.left")), R.createOpt("right"))
                Expect.equal (tree |> serializeRNode |> deserializeRNode) tree (sprintf "%O" tree)
                Expect.equal "left.left" tree.left.Value.left.Value.``val`` "tree's left.left = left.left"
            } ::
            test "C provided tests" {
                let tree = 
                    C.create("root", C.createOpt("left", C.createOpt("left.left")), C.createOpt("right"))
                Expect.equal (tree |> serializeCNode |> deserializeCNode) tree (sprintf "%O" tree)
                Expect.equal "left.left" tree.left.Value.left.Value.``val`` "tree's left.left = left.left"
            } ::
            [ for i in 1 .. Test.numTests do 
                yield test (sprintf "DUNode generated tests %i" i) {
                    let tree = Test.CustomTreeGenerator DU.createOpt
                    Expect.equal (tree |> serializeDUNode |> deserializeDUNode) tree (sprintf "%O" tree)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "RNode generated tests %i" i) {
                    let tree = Test.CustomTreeGenerator R.createOpt
                    Expect.equal (tree |> serializeRNode |> deserializeRNode) tree (sprintf "%O" tree)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "CNode generated tests %i" i) {
                    let tree = Test.CustomTreeGenerator C.createOpt
                    Expect.equal (tree |> serializeCNode |> deserializeCNode) tree (sprintf "%O" tree)
                }
            ]
            
    runTests { defaultConfig with ``parallel`` = true } tests
[03:48:47 INF] EXPECTO? Running tests... <Expecto>
[03:49:10 INF] EXPECTO! 453 tests run in 00:00:23.9313535 for Auto Serializer tests453 passed, 0 ignored, 0 failed, 0 errored. Success! <Expecto>

Manual Serialization using Chiron

In [9]:
module ManualSerializerTest =
    open Expecto
    open ManualSerializer

    type DU = Model.DUNode
    type R = Model.RNode
    type C = Model.CNode
    
    /// Basic brute-force, randomized tests to ensure the logic works correctly
    let tests =
        testList "Auto Serializer tests" <|
            test "DU provided tests" {
                let tree = 
                    DU.create("root", DU.createOpt("left", DU.createOpt("left.left")), DU.createOpt("right"))
                Expect.equal (tree |> serializeDUNode |> deserializeDUNode) tree (sprintf "%O" tree)
                Expect.equal "left.left" tree.left.Value.left.Value.``val`` "tree's left.left = left.left"
            } ::
            test "R provided tests" {
                let tree = 
                    R.create("root", R.createOpt("left", R.createOpt("left.left")), R.createOpt("right"))
                Expect.equal (tree |> serializeRNode |> deserializeRNode) tree (sprintf "%O" tree)
                Expect.equal "left.left" tree.left.Value.left.Value.``val`` "tree's left.left = left.left"
            } ::
            test "C provided tests" {
                let tree = 
                    C.create("root", C.createOpt("left", C.createOpt("left.left")), C.createOpt("right"))
                Expect.equal (tree |> serializeCNode |> deserializeCNode) tree (sprintf "%O" tree)
                Expect.equal "left.left" tree.left.Value.left.Value.``val`` "tree's left.left = left.left"
            } ::
            [ for i in 1 .. Test.numTests do 
                yield test (sprintf "DUNode generated tests %i" i) {
                    let tree = Test.CustomTreeGenerator DU.createOpt
                    Expect.equal (tree |> serializeDUNode |> deserializeDUNode) tree (sprintf "%O" tree)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "RNode generated tests %i" i) {
                    let tree = Test.CustomTreeGenerator R.createOpt
                    Expect.equal (tree |> serializeRNode |> deserializeRNode) tree (sprintf "%O" tree)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "CNode generated tests %i" i) {
                    let tree = Test.CustomTreeGenerator C.createOpt
                    Expect.equal (tree |> serializeCNode |> deserializeCNode) tree (sprintf "%O" tree)
                }
            ]
            
    runTests { defaultConfig with ``parallel`` = true } tests
[03:49:11 INF] EXPECTO? Running tests... <Expecto>
[03:49:29 INF] EXPECTO! 453 tests run in 00:00:18.1177119 for Auto Serializer tests453 passed, 0 ignored, 0 failed, 0 errored. Success! <Expecto>

Custom Serialization using FParsec

In [10]:
module CustomSerializerTest =
    open Expecto
    open CustomSerializer

    type DU = Model.DUNode
    type R = Model.RNode
    type C = Model.CNode
    
    /// Basic brute-force, randomized tests to ensure the logic works correctly
    let tests =
        testList "Custom Serializer tests" <|
            test "DU provided tests" {
                let tree = 
                    DU.create("root", DU.createOpt("left", DU.createOpt("left.left")), DU.createOpt("right"))
                Expect.equal (tree |> serializeDUNode |> deserializeDUNode) tree (sprintf "%O" tree)
                Expect.equal "left.left" tree.left.Value.left.Value.``val`` "tree's left.left = left.left"
            } ::
            test "R provided tests" {
                let tree = 
                    R.create("root", R.createOpt("left", R.createOpt("left.left")), R.createOpt("right"))
                Expect.equal (tree |> serializeRNode |> deserializeRNode) tree (sprintf "%O" tree)
                Expect.equal "left.left" tree.left.Value.left.Value.``val`` "tree's left.left = left.left"
            } ::
            test "C provided tests" {
                let tree = 
                    C.create("root", C.createOpt("left", C.createOpt("left.left")), C.createOpt("right"))
                Expect.equal (tree |> serializeCNode |> deserializeCNode) tree (sprintf "%O" tree)
                Expect.equal "left.left" tree.left.Value.left.Value.``val`` "tree's left.left = left.left"
            } ::
            [ for i in 1 .. Test.numTests do 
                yield test (sprintf "DUNode generated tests %i" i) {
                    let tree = Test.CustomTreeGenerator DU.createOpt
                    Expect.equal (tree |> serializeDUNode |> deserializeDUNode) tree (sprintf "%O" tree)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "RNode generated tests %i" i) {
                    let tree = Test.CustomTreeGenerator R.createOpt
                    Expect.equal (tree |> serializeRNode |> deserializeRNode) tree (sprintf "%O" tree)
                }
            ] @
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "CNode generated tests %i" i) {
                    let tree = Test.CustomTreeGenerator C.createOpt
                    Expect.equal (tree |> serializeCNode |> deserializeCNode) tree (sprintf "%O" tree)
                }
            ]
            
    runTests { defaultConfig with ``parallel`` = true } tests
[03:49:29 INF] EXPECTO? Running tests... <Expecto>
[03:49:47 INF] EXPECTO! 453 tests run in 00:00:17.7770251 for Custom Serializer tests453 passed, 0 ignored, 0 failed, 0 errored. Success! <Expecto>

Excellent, our logic works!

Now that we have it working, let's try some basic performance tests.

Performance Testing

Here is the strategy for performance testing.

Using random, pre-generated trees, test the following:

  1. Serialize trees (e.g. 1,000 of each model) using each of the 3 serialization methods.
  2. De-serialize those trees using each of the 3 deserialization methods.

Hopefully, this will give us an indication of which combination of de-/serialization method and data structure is the "fastest".

Setup the test data

The first step is to create the trees that will be used for serialization and de-serialization.

In [11]:
module PerformanceTestData =
    type DU = Model.DUNode
    type R = Model.RNode
    type C = Model.CNode
    
    // How many trees to generate
    let dataStructureCount = 2000
    
    // Generate the trees
    let DUTrees = List.init dataStructureCount (fun _ -> Test.CustomTreeGenerator DU.createOpt)
    let RTrees = List.init dataStructureCount (fun _ -> Test.CustomTreeGenerator R.createOpt)
    let CTrees = List.init dataStructureCount (fun _ -> Test.CustomTreeGenerator C.createOpt)

    // To measure each execution
    let stopWatch = System.Diagnostics.Stopwatch()

    // The rows and columns of the result table.
    let rows = [
        "Automatic"
        "Manual"
        "Custom"
    ]
    let columns = ["Discriminated Union"; "Record"; "Class"]
    let cornerText = [@"Algorithm vs. Data Structure"]; 

    // The algorithms, for "somewhat easy" testing
    let duSerializationAlgorithms = 
        [ AutoSerializer.serializeDUNode
          ManualSerializer.serializeDUNode
          CustomSerializer.serializeDUNode ]
    let rSerializationAlgorithms = 
        [ AutoSerializer.serializeRNode
          ManualSerializer.serializeRNode
          CustomSerializer.serializeRNode ]
    let cSerializationAlgorithms = 
        [ AutoSerializer.serializeCNode
          ManualSerializer.serializeCNode
          CustomSerializer.serializeCNode ]
    
    let duDeserializationAlgorithms =
        [ AutoSerializer.deserializeDUNode
          ManualSerializer.deserializeDUNode
          CustomSerializer.deserializeDUNode ]
    let rDeserializationAlgorithms =
        [ AutoSerializer.deserializeRNode
          ManualSerializer.deserializeRNode
          CustomSerializer.deserializeRNode ]
    let cDeserializationAlgorithms =
        [ AutoSerializer.deserializeCNode
          ManualSerializer.deserializeCNode
          CustomSerializer.deserializeCNode ]

Run the Performance Tests

In [12]:
open PerformanceTestData
    
/// Returns an average measurement of 3 consecutive runs.
let measurement numRuns alg lst =
    stopWatch.Reset()
    stopWatch.Start()
    List.init numRuns (fun i -> List.map alg lst)
    |> ignore
    stopWatch.Stop()
    
    TimeSpan.FromTicks(stopWatch.ElapsedTicks / (int64 numRuns)).Ticks
In [13]:
/// Collect the serialization results.
let serializationResults =
    [ yield 
        duSerializationAlgorithms
        |> List.map (fun a -> measurement 3 a DUTrees)
        |> List.zip rows ] @
    [ yield
        rSerializationAlgorithms
        |> List.map (fun a -> measurement 3 a RTrees)
        |> List.zip rows ] @
    [ yield 
        cSerializationAlgorithms
        |> List.map (fun a -> measurement 3 a CTrees)
        |> List.zip rows ]
In [14]:
/// Collect the deserialization results.
let deserializationResults =
    [ let ser = List.map (fun a -> List.map a DUTrees) duSerializationAlgorithms
      yield 
        duDeserializationAlgorithms
        |> List.mapi (fun i a -> measurement 3 a (List.item i ser))
        |> List.zip rows ] @
    [ let ser = List.map (fun a -> List.map a RTrees) rSerializationAlgorithms
      yield
        rDeserializationAlgorithms
        |> List.mapi (fun i a -> measurement 3 a (List.item i ser))
        |> List.zip rows ] @
    [ let ser = List.map (fun a -> List.map a CTrees) cSerializationAlgorithms
      yield 
        cDeserializationAlgorithms
        |> List.mapi (fun i a -> measurement 3 a (List.item i ser))
        |> List.zip rows ]

Plot the Performance Tests

And finally, we can plot the results in a simple table.

Serialization Results

As a data table, showing as hh:mm:ss.ms.

In [15]:
open XPlot.GoogleCharts

serializationResults
|> List.map (fun l -> List.map (fun (x, y) -> (x, TimeSpan.FromTicks(y).ToString("c"))) l)
|> Chart.Table
|> Chart.WithOptions (Options(minColor="green", midColor="yellow", maxColor="red", showRowNumber=true))
|> Chart.WithLabels (cornerText @ columns)
Out[15]:

And as a chart, showing ticks.

In [16]:
serializationResults
|> Chart.Bar
|> Chart.WithOptions (Options())
|> Chart.WithLabels (cornerText @ columns)
Out[16]:

Deserialization Results

Again as a table, with the hh:mm:ss.ms format.

In [17]:
deserializationResults
|> List.map (fun l -> List.map (fun (x, y) -> (x, TimeSpan.FromTicks(y).ToString("c"))) l)
|> Chart.Table
|> Chart.WithOptions (Options(minColor="green", midColor="yellow", maxColor="red", showRowNumber=true))
|> Chart.WithLabels (cornerText @ columns)
Out[17]:

And as a chart, showing ticks.

In [18]:
deserializationResults
|> Chart.Bar
|> Chart.WithOptions (Options())
|> Chart.WithLabels (cornerText @ columns)
Out[18]:

Conclusion

The results are very interesting. From a CPU perspective, FParsec wins hands down regardless of the chosen data structure. There is an order of magnitude difference between the three serialization techniques.

From an SLOC perspective, Json.NET is the clear winner. Serialization and deserialization are both 1 line, each. FParsec is the most complex, requiring the development of a custom de-/serialization format, then ensuring that all 3 data structures can be transformed to that format.

Chiron stays right in the middle, both in terms of performance and SLOC.

The one advantage that both Chiron and FParsec can provide is that, when created the right way, the serialized string can be deserialized into any of the 3 data structures (discriminated union, record, or class). The FParsec implementation above works that way.

Unfortunately, I do not know how to measure memory usage from within IfSharp / Jupyter. So, I can't really speak to that.

From a data structure perspective, records seems to provide very good performance. I was surprised that records performed even better than classes, in general. However, it appears that Json.NET had issues with both discriminated unions and classes, during deserialization and serialization, respectively.

Personally, I am not sure that creating a custom de-/serialization format (aka, going down the FParsec route) is worth the effort, especially given all the edge cases that must be handled. For example, my FParsec implementation cannot handle double-quotes within strings and, I believe (though I didn't test this), cannot handle strings with newline characters in them.

Json.NET and Chiron both use JSON, which is an extremely popular format and enjoys widespread support in development tooling. I believe this is a better route, especially because using such a common, human-readable format would make debugging easier. Maintainability is extremely important and something that is often overlooked.

After testing the system with JSON, it can be shifted to a binary format such as MessagePack, BSON, or Protocol Buffers, among others, if size becomes a concern. Most of these formats have well-tested libraries available in multiple languages.

Comments

  1. Any chance you could try this?
    Manual serialization. A few LOC. It think it will be faster than the options here.
    https://gist.github.com/charlesroddie/2f5cd33a682933b283af2b48ef97e053

    ReplyDelete

Post a Comment

Popular posts from this blog