Introduction to Broadcast Dictionaries#

This notebook takes a closer look at the broadcasting mechanism of BCDict.

Installation#

BCDict is on PyPI, so you can use pip to install it:

pip install bcdict

Getting started#

Let’s start by creating a simple BCDict:

import bcdict
from bcdict import BCDict
d = BCDict({
    "A": 1,
    "B": 2,
})
d
BCDict({'A': 1, 'B': 2})

A broadcast dictionary allows us to perform an operation on every value of the dictionary and get the results back as a new dictionary. For example, arithmetic:

d * 2
BCDict({'A': 2, 'B': 4})

Comparisons:

d > 1.5
BCDict({'A': False, 'B': True})

Note: the equality operator == works the same way as on Python's built-in dict class. It compares the whole dictionary with the other operand and returns a single bool rather than a dictionary, which may be surprising at first:

d == 2
False

To check for element-wise equality or inequality with broadcast support, use the eq() and ne() methods:

d.eq(2)
BCDict({'A': False, 'B': True})
d.ne(2)
BCDict({'A': True, 'B': False})
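
For comparison, the difference between whole-dictionary equality and element-wise equality can be reproduced with plain Python dicts. The sketch below does not use bcdict at all; it only mirrors the results shown above with hand-written dict comprehensions, and the variable names are made up for this illustration.

```python
# Plain-dict illustration; no bcdict import needed.
plain = {"A": 1, "B": 2}

# `==` compares the whole dictionary with the right-hand side,
# so comparing a dict with an int yields a single bool.
whole_dict_equal = plain == 2

# Element-wise equality and inequality, written out by hand.
elementwise_equal = {k: v == 2 for k, v in plain.items()}
elementwise_not_equal = {k: v != 2 for k, v in plain.items()}

print(whole_dict_equal)       # False
print(elementwise_equal)      # {'A': False, 'B': True}
print(elementwise_not_equal)  # {'A': True, 'B': False}
```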

Function calls and attribute access#

It is possible to call functions on a BCDict; they are called on each value separately.

Let’s first create a string dict:

s = BCDict({
    "A": "Hello {}",
    "B": "{} World!",
})
s
BCDict({'A': 'Hello {}', 'B': '{} World!'})

Now we can convert all values to upper case like this:

s.upper()
BCDict({'A': 'HELLO {}', 'B': '{} WORLD!'})

This is equivalent to:

{k: v.upper() for k, v in s.items()}
{'A': 'HELLO {}', 'B': '{} WORLD!'}

We can also supply arguments:

s.format("X")
BCDict({'A': 'Hello X', 'B': 'X World!'})

Again, this is equivalent to:

{k: v.format("X") for k, v in s.items()}
{'A': 'Hello X', 'B': 'X World!'}

Broadcasting arguments#

If we pass a dictionary with the same keys as s, its values are broadcast to the function calls:

arg = {"A": "Louis", "B": "What a wonderful"}
s.format(arg)
BCDict({'A': 'Hello Louis', 'B': 'What a wonderful World!'})

Broadcasting works if the key sets of the dictionaries are identical.

You can mix broadcastable and non-broadcastable arguments.
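
As a rough illustration of mixing both kinds of arguments, the following plain-Python sketch spells out what such a call amounts to: a dictionary whose keys match is looked up per key, while any other value is passed unchanged to every call. The names `template`, `names` and `punctuation` are made up for this example, and the comprehension is a hand-written equivalent, not bcdict's implementation.

```python
template = {"A": "Hello {}{}", "B": "{} World{}"}
names = {"A": "Louis", "B": "What a wonderful"}  # same keys: looked up per key
punctuation = "!"                                 # plain value: reused for every key

# One call per key; `names` is broadcast, `punctuation` is not.
result = {k: v.format(names[k], punctuation) for k, v in template.items()}
print(result)
# {'A': 'Hello Louis!', 'B': 'What a wonderful World!'}
```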

Applying functions#

Above, we called functions directly on the BCDict and, by extension, on its values.

We can also call other functions on the dictionary.

Let’s first create a simple function:

def do_math(a, b, c):
    return a * b + c
# quick test
do_math(3, 2, 1)  # 3 * 2 + 1
7

Now let’s take our BCDict d again:

d = BCDict({
    "A": 1,
    "B": 2,
})
d
BCDict({'A': 1, 'B': 2})

With the pipe function we can pipe the values of d through the function. We can also supply additional arguments:

d.pipe(do_math, 2, 1)
BCDict({'A': 3, 'B': 5})

This passes the values of d to the do_math() function as its first argument, like so:

{k: do_math(v, 2, 1) for k, v in d.items()}
{'A': 3, 'B': 5}

What if we don’t want to use d as the first argument?

Use apply() from the bcdict module:

bcdict.apply(do_math, 3, d, 1)
BCDict({'A': 4, 'B': 7})

The first argument to apply() is a callable. The remaining arguments are passed to the function.

The first BCDict in the argument list determines the structure of the output dict. In this case, the output has the same keys as d.

The above is equivalent to:

{k: do_math(3, v, 1) for k, v in d.items()}
{'A': 4, 'B': 7}
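
To make the broadcasting rule more concrete, here is a minimal plain-Python sketch of the idea. `broadcast_call` is a hypothetical helper written for this illustration and is not part of bcdict. It also simplifies matters: it treats every dict argument as broadcastable and requires all of them to share the same keys.

```python
def do_math(a, b, c):
    return a * b + c


def broadcast_call(func, *args):
    """Call func once per key, looking up dict arguments by key.

    Hypothetical helper for illustration only; not bcdict's implementation.
    """
    dict_args = [a for a in args if isinstance(a, dict)]
    if not dict_args:
        # Nothing to broadcast: behave like a normal call.
        return func(*args)
    keys = dict_args[0].keys()  # the first dict determines the output keys
    for other in dict_args[1:]:
        if other.keys() != keys:
            raise ValueError("all dict arguments must have identical keys")
    return {
        k: func(*(a[k] if isinstance(a, dict) else a for a in args))
        for k in keys
    }


d = {"A": 1, "B": 2}
print(broadcast_call(do_math, 3, d, 1))                   # {'A': 4, 'B': 7}
print(broadcast_call(do_math, d, 2, {"A": 10, "B": 20}))  # {'A': 12, 'B': 24}
```

The second call shows two dictionaries being broadcast together while the plain value 2 is reused for every key.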

Initializing dictionary from list of keys#

Sometimes you have a list of keys and want to create a dictionary by calling a function once for each key.

Below we initialize a dictionary with random values for each key:

import random
keys = [1, 2]
bcdict.bootstrap(keys, random.random)
BCDict({1: 0.06810319085373251, 2: 0.8278418945612377})

Of course, you can also pass arguments to bootstrap. If applicable, they are broadcast:

upper_limits = {1: 23, 2: 42}
bcdict.bootstrap(keys, random.randint, 7, upper_limits)
BCDict({1: 18, 2: 7})

That’s equivalent to (different output because of randomness):

{k: random.randint(7, upper_limits[k]) for k in keys}
{1: 9, 2: 30}

You can also pass the key itself to the function with bootstrap_arg or bootstrap_kwarg:

bcdict.bootstrap_arg(keys, do_math, 2, 3)
BCDict({1: 5, 2: 7})

Equivalent to:

{k: do_math(k, 2, 3) for k in keys}
{1: 5, 2: 7}
bcdict.bootstrap_kwarg(keys, do_math, 2, 3, argname="c")
BCDict({1: 7, 2: 8})
{k: do_math(2, 3, c=k) for k in keys}
{1: 7, 2: 8}

bootstrap is handy for initializing the dictionary with random or default values.

bootstrap_arg and bootstrap_kwarg are handy, for example, for initializing data from a function or even an API call.
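
As a sketch of the API-call use case, the plain-Python comprehension below has the same shape as the bootstrap_arg equivalent shown earlier, with the key passed as the first argument. `fetch_profile` is a made-up stand-in for a real API call and returns fake data; nothing here is part of bcdict.

```python
def fetch_profile(user_id, fields):
    """Hypothetical stand-in for an API call; returns fake data built from its inputs."""
    return {"id": user_id, "fields": list(fields)}


user_ids = [1, 2]

# One call per key, with the key itself as the first positional argument.
profiles = {uid: fetch_profile(uid, ("name", "email")) for uid in user_ids}
print(profiles[2])  # {'id': 2, 'fields': ['name', 'email']}
```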

Broadcast slicing#

Let’s take our dict of strings again and demonstrate some slicing:

s = BCDict({
    "A": "Hello",
    "B": "World!",
})
s
BCDict({'A': 'Hello', 'B': 'World!'})
s[1]
BCDict({'A': 'e', 'B': 'o'})
char = {"A": 3, "B": 4}
s[char]
BCDict({'A': 'l', 'B': 'd'})

This works the same way with pandas DataFrames and similar objects, so you can select columns and slice dictionaries of DataFrames intuitively.

Broadcast arithmetics#

Arithmetic operations and comparisons also support broadcasting:

d = BCDict({'A': 1, 'B': 4})
fac = BCDict({'A': 2, 'B': 3})
d * fac
BCDict({'A': 2, 'B': 12})
d > fac
BCDict({'A': False, 'B': True})
d.eq(fac)
BCDict({'A': False, 'B': False})

Naming conflicts#

The rules for slicing vs. attribute access are as follows:

If the value in brackets [] is a key of the dictionary, the value stored under that key is returned. Otherwise, it is broadcast, and each value of the dictionary is sliced with it.

s = BCDict({
    0: "Hello",
    1: "World!",
})
s
BCDict({0: 'Hello', 1: 'World!'})
s[3]  # return the 4th letter of each value in dict
BCDict({0: 'l', 1: 'l'})
s[0]  # return value of key `0`
'Hello'

What if you want to slice with a value that is also a key? Use the attribute accessor .a:

s.a[0]  # return first letter of each value
BCDict({0: 'H', 1: 'W'})

This also works for attributes and functions:

s = BCDict({
    "upper": "Hello",
    "lower": "World!",
})
s
BCDict({'upper': 'Hello', 'lower': 'World!'})
s.__len__()  # length of the dictionary
2
s.a.__len__()  # length of each value
BCDict({'upper': 5, 'lower': 6})
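
The lookup rule from the start of this section can be sketched in plain Python. `getitem_with_broadcast` is a hypothetical function that mimics the rule on an ordinary dict for a simple index; it is not how bcdict is implemented, and it ignores the case of a dictionary-valued index shown in the slicing section.

```python
def getitem_with_broadcast(data, item):
    """Return data[item] if item is a key, else index every value with item."""
    if item in data:
        return data[item]
    return {k: v[item] for k, v in data.items()}


s = {0: "Hello", 1: "World!"}
print(getitem_with_broadcast(s, 0))  # Hello
print(getitem_with_broadcast(s, 3))  # {0: 'l', 1: 'l'}

# Forcing element-wise indexing, which is what `.a` is for, skips the key check:
print({k: v[0] for k, v in s.items()})  # {0: 'H', 1: 'W'}
```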

Next steps#

You are now ready to use the Broadcast Dictionary package.

If you have any questions, you can always get in touch via GitHub: mariushelf/bcdict.

For a full, actually useful example, check how to train and validate three models on three datasets without a single for loop.