{ "cells": [ { "cell_type": "markdown", "id": "db6adaae-12e0-4e0d-a456-b922769d1ead", "metadata": {}, "source": [ "# Introduction to Broadcast Dictionaries\n", "\n", "This notebook takes a closer look at the broadcasting mechanism of BCDict." ] }, { "cell_type": "markdown", "id": "a7e791f7-2f49-4318-80ad-fa97c3063cf7", "metadata": {}, "source": [ "## Installation\n", "\n", "BCDict is on PyPI, so you can you `pip` to install it:\n", "\n", "```bash\n", "pip install bcdict\n", "```" ] }, { "cell_type": "markdown", "id": "1fb9818f-54d9-4b5a-8e23-766cd3c8d41f", "metadata": {}, "source": [ "## Getting started\n", "\n", "Let's start by creating a simple BCDict:" ] }, { "cell_type": "code", "execution_count": 112, "id": "4e611df6-7cc1-4e55-8a91-7117a122ff99", "metadata": {}, "outputs": [], "source": [ "import bcdict\n", "from bcdict import BCDict" ] }, { "cell_type": "code", "execution_count": 113, "id": "a6cc766b-9ba8-44f0-a0b9-5bbb914ac08c", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': 1, 'B': 2})" }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = BCDict({\n", " \"A\": 1,\n", " \"B\": 2,\n", "})\n", "d" ] }, { "cell_type": "markdown", "id": "fed4a121-cec6-4d47-ab4b-5746299abdc2", "metadata": {}, "source": [ "A broadcast dictionary allows us to perform operations on all values of the dictionary, and return a new dictionary. For example, arithmetics:" ] }, { "cell_type": "code", "execution_count": 114, "id": "45699f02-c222-4b56-95c6-a4a933be941c", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': 2, 'B': 4})" }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d * 2" ] }, { "cell_type": "markdown", "id": "16e99a73-ee64-42b9-af7e-e07461c1ee7f", "metadata": {}, "source": [ "Comparisons:" ] }, { "cell_type": "code", "execution_count": 115, "id": "27853a42-5978-466a-a709-810a618ab9ae", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': False, 'B': True})" }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d > 1.5" ] }, { "cell_type": "markdown", "id": "befa16a8-6141-4c7a-b92c-e884c15ba429", "metadata": {}, "source": [ "***Note***: the equality operator `==` works the same way as the equality operator on the python built-in `dict` class. Not knowing this, it may lead to seemingly unexpected behavior, because it does *not* return a dictionary:" ] }, { "cell_type": "code", "execution_count": 116, "id": "260f1b8a-89a2-49b8-9e31-42c8324dc875", "metadata": {}, "outputs": [ { "data": { "text/plain": "False" }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d == 2\n" ] }, { "cell_type": "markdown", "source": [ "To check for element-wise equality/inequality with broadcast support you can use `eq()` and `ne()` functions:" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 150, "outputs": [ { "data": { "text/plain": "BCDict({'A': False, 'B': True})" }, "execution_count": 150, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.eq(2)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 149, "outputs": [ { "data": { "text/plain": "BCDict({'A': True, 'B': False})" }, "execution_count": 149, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.ne(2)\n" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "id": "7b23c9e2-b0f2-41a5-a40d-536507dcaa57", "metadata": {}, "source": [ "## Function calls and attribute access" ] }, { "cell_type": "markdown", "id": "34536a31-007e-4561-b2c8-63cc65b9a3b2", "metadata": {}, "source": [ "It possible to call functions on a BCDict, which are called on each value separately.\n", "\n", "Let's first create a string dict:" ] }, { "cell_type": "code", "execution_count": 119, "id": "5fe18771-4c3a-47cc-9db6-8e910b0ba5c7", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': 'Hello {}', 'B': '{} World!'})" }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = BCDict({\n", " \"A\": \"Hello {}\",\n", " \"B\": \"{} World!\",\n", "})\n", "s" ] }, { "cell_type": "markdown", "id": "3985342a-718e-4e03-8af4-bb0da3f320c3", "metadata": {}, "source": [ "No we can capitalize all values like this:" ] }, { "cell_type": "code", "execution_count": 120, "id": "df7315ed-285e-4071-bcef-ffb3324f1ef5", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': 'HELLO {}', 'B': '{} WORLD!'})" }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.upper()" ] }, { "cell_type": "markdown", "id": "29fe5baa-9a31-4a67-9d16-cc214c8df417", "metadata": {}, "source": [ "This is equivalent to:" ] }, { "cell_type": "code", "execution_count": 121, "id": "df67c884-c1f7-41f2-8487-da91b4da47d2", "metadata": {}, "outputs": [ { "data": { "text/plain": "{'A': 'HELLO {}', 'B': '{} WORLD!'}" }, "execution_count": 121, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{k: v.upper() for k, v in s.items()}" ] }, { "cell_type": "markdown", "id": "dfe850ec-db66-48df-a149-f301fb7e8e25", "metadata": {}, "source": [ "We can also supply arguments:" ] }, { "cell_type": "code", "execution_count": 122, "id": "3f80e203-b906-48df-a6d7-f241d731174b", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': 'Hello X', 'B': 'X World!'})" }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.format(\"X\")" ] }, { "cell_type": "markdown", "id": "fb47e0cd-dc71-4030-bb57-bd9da1adf394", "metadata": {}, "source": [ "Again, this is equivalent to:" ] }, { "cell_type": "code", "execution_count": 123, "id": "ce183e8c-fcf6-4a3e-a813-fe741bd63775", "metadata": {}, "outputs": [ { "data": { "text/plain": "{'A': 'Hello X', 'B': 'X World!'}" }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{k: v.format(\"X\") for k, v in s.items()}" ] }, { "cell_type": "markdown", "id": "bbee7090-46b1-47f3-8a71-80b7363dc7c9", "metadata": {}, "source": [ "## Broadcasting arguments\n", "\n", "If we pass a dictionary with the same keys as `s`, its values are *broadcast* to the function calls:" ] }, { "cell_type": "code", "execution_count": 124, "id": "658bd484-0dc8-4e1d-87a8-f3264c6cf63b", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': 'Hello Louis', 'B': 'What a wonderful World!'})" }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "arg = {\"A\": \"Louis\", \"B\": \"What a wonderful\"}\n", "s.format(arg)" ] }, { "cell_type": "markdown", "id": "90b56516-e87d-4ed9-a4bd-252626f30423", "metadata": {}, "source": [ "Broadcasting works if the key set of the BCDicts are identical.\n", "\n", "You can mix broadcastable and non-broadcastable arguments." ] }, { "cell_type": "markdown", "id": "3c0e6de8-bffb-4ec4-87e1-d5f99636d084", "metadata": {}, "source": [ "## Applying functions" ] }, { "cell_type": "markdown", "id": "aba1e071-04c5-4fae-a256-ad386b629413", "metadata": {}, "source": [ "Above, we have called functions directly on the BCDict, and in extension, on its values.\n", "\n", "We can also call other functions on the dictionary.\n", "\n", "Let's first create a simple function:" ] }, { "cell_type": "code", "execution_count": 125, "id": "7684d456-f37d-420d-87b1-75d64627fba6", "metadata": {}, "outputs": [], "source": [ "def do_math(a, b, c):\n", " return a * b + c" ] }, { "cell_type": "code", "execution_count": 126, "id": "437a8032-8614-4225-97eb-8d925a13b4ec", "metadata": {}, "outputs": [ { "data": { "text/plain": "7" }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# quick test\n", "do_math(3, 2, 1) # 3 * 2 + 1" ] }, { "cell_type": "markdown", "id": "926da9ea-999b-437c-bd2f-35007df665aa", "metadata": {}, "source": [ "Now let's take our BCDict `d` again:" ] }, { "cell_type": "code", "execution_count": 127, "id": "90f7f047-dc36-45d7-80ca-36e61cdb3459", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': 1, 'B': 2})" }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = BCDict({\n", " \"A\": 1,\n", " \"B\": 2,\n", "})\n", "d" ] }, { "cell_type": "markdown", "id": "f5f84ad1-fa63-45ef-90ef-29c7dd90005a", "metadata": {}, "source": [ "With the `pipe` function we can *pipe* the values of `d` through the function. We can also supply addditional arguments:" ] }, { "cell_type": "code", "execution_count": 128, "id": "dbf25d21-0422-4438-a665-4cd7c41b2867", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': 3, 'B': 5})" }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.pipe(do_math, 2, 1)" ] }, { "cell_type": "markdown", "id": "d8e4aa1c-6428-455d-aa08-f6d9617d5ffa", "metadata": {}, "source": [ "This passes the values of `d` to the `do_math()` function as its first argument, like so:" ] }, { "cell_type": "code", "execution_count": 129, "id": "9e47c395-7c81-4f6e-aedc-911a41b73f85", "metadata": {}, "outputs": [ { "data": { "text/plain": "{'A': 3, 'B': 5}" }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{k: do_math(v, 2, 1) for k, v in d.items()}" ] }, { "cell_type": "markdown", "id": "d88c58bf-078d-4531-8969-1cf5c8f8d284", "metadata": {}, "source": [ "What if we don't want to use `d` as the *first* argument?\n", "\n", "Use `bcdict.apply()` from the the `bcdict` module:" ] }, { "cell_type": "code", "execution_count": 130, "id": "c9f0370a-b0c7-42aa-84f5-114d7aeb325f", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': 4, 'B': 7})" }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bcdict.apply(do_math, 3, d, 1)" ] }, { "cell_type": "markdown", "id": "f2ca442f-9af7-4e3b-8fc6-4223500f061a", "metadata": {}, "source": [ "The first argument to `apply()` is a callable. The remaining arguments are passed to the function.\n", "\n", "The first BCDict in the argument list determines the structure of the output dict. In this case, the output has the same keys as `d`.\n", "\n", "The above is equivalent to:" ] }, { "cell_type": "code", "execution_count": 131, "id": "d1054615-6e05-4051-87e2-96231cf245f5", "metadata": {}, "outputs": [ { "data": { "text/plain": "{'A': 4, 'B': 7}" }, "execution_count": 131, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{k: do_math(3, v, 1) for k, v in d.items()}" ] }, { "cell_type": "markdown", "id": "6c260073-558c-4a4d-a5e2-ac98d0910465", "metadata": {}, "source": [ "## Initializing dictionary from list of keys\n", "\n", "Sometimes you have a list of keys, and want to create a dictionary from calling a function.\n", "\n", "Below we initialize a dictionary with random values for each key:" ] }, { "cell_type": "code", "execution_count": 132, "id": "7a9a9d59-6ffe-4030-8b4b-520e1e8d4101", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({1: 0.38299690337770986, 2: 0.8085966542678023})" }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import random\n", "keys = [1, 2]\n", "bcdict.bootstrap(keys, random.random)" ] }, { "cell_type": "markdown", "id": "6cc97865-a459-41df-bffd-3f3bebf19602", "metadata": {}, "source": [ "Of course, you can also pass arguments to `bootstrap`. If applicable, they are broadcast:" ] }, { "cell_type": "code", "execution_count": 133, "id": "7ed81f3a-451a-419a-97b7-9a6bd0ad1c13", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({1: 23, 2: 37})" }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" } ], "source": [ "upper_limits = {1: 23, 2: 42}\n", "bcdict.bootstrap(keys, random.randint, 7, upper_limits)" ] }, { "cell_type": "markdown", "id": "cae19f46-8ac5-48a9-b81b-079bc9951fb4", "metadata": {}, "source": [ "That's equivalent to (different output because of randomness):" ] }, { "cell_type": "code", "execution_count": 134, "id": "cff1b0e6-3154-4c37-ae9a-5c04a19cd21c", "metadata": {}, "outputs": [ { "data": { "text/plain": "{1: 18, 2: 11}" }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{k: random.randint(7, upper_limits[k]) for k in keys}" ] }, { "cell_type": "markdown", "id": "b5db5e00-ca63-474c-8108-0c771898ee3e", "metadata": {}, "source": [ "You can also pass the key itself to the function with `broadcast_arg` or `broadcast_kwarg`:" ] }, { "cell_type": "code", "execution_count": 135, "id": "f959ae04-dd3b-4e93-86e2-bade0dbd5472", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({1: 5, 2: 7})" }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bcdict.bootstrap_arg(keys, do_math, 2, 3)" ] }, { "cell_type": "markdown", "id": "bf285259-ba32-4899-9bc6-22fbb10ac84f", "metadata": {}, "source": [ "Equivalent to:" ] }, { "cell_type": "code", "execution_count": 136, "id": "c868e73b-a5a9-4821-bf35-d8efecb28f49", "metadata": {}, "outputs": [ { "data": { "text/plain": "{1: 5, 2: 7}" }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{k: do_math(k, 2, 3) for k in keys}" ] }, { "cell_type": "code", "execution_count": 137, "id": "b69a7967-f19a-4f8b-bf61-a2796677ceec", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({1: 7, 2: 8})" }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bcdict.bootstrap_kwarg(keys, do_math, 2, 3, argname=\"c\")" ] }, { "cell_type": "code", "execution_count": 138, "id": "a4c4aa19-ecf0-423b-9842-f5d7a569c5db", "metadata": {}, "outputs": [ { "data": { "text/plain": "{1: 7, 2: 8}" }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "{k: do_math(2, 3, c=k) for k in keys}" ] }, { "cell_type": "markdown", "id": "fa25d683-13d3-4f2f-8cba-e3322dcc1584", "metadata": {}, "source": [ "`bootstrap` is handy for initializing the dictionary with random or default values.\n", "\n", "`bootstrap_arg` and `bootsrap_kwarg` are handy for example for initializing data from a function or even an API call." ] }, { "cell_type": "markdown", "id": "befdac26-127c-4be8-b304-03b73aba984b", "metadata": {}, "source": [ "## Broadcast slicing\n", "\n", "Let's take our dict of strings again and demonstrate some slicing:" ] }, { "cell_type": "code", "execution_count": 139, "id": "d6f5d8c4-6353-44c6-acf5-117413113093", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': 'Hello', 'B': 'World!'})" }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = BCDict({\n", " \"A\": \"Hello\",\n", " \"B\": \"World!\",\n", "})\n", "s" ] }, { "cell_type": "code", "execution_count": 140, "id": "5acbe8cc-359d-43b3-83ee-20bef5d81611", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': 'e', 'B': 'o'})" }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[1]" ] }, { "cell_type": "code", "execution_count": 141, "id": "712bdd34-b9b1-457c-af63-71d87b947549", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'A': 'l', 'B': 'd'})" }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "char = {\"A\": 3, \"B\": 4}\n", "s[char]" ] }, { "cell_type": "markdown", "id": "6f38a9c0-f68f-4805-a6ec-a95a15134273", "metadata": {}, "source": [ "This works the same way with pandas DataFrames etc. So you can select columns and slice dictionaries of DataFrames intuitively." ] }, { "cell_type": "markdown", "source": [ "## Broadcast arithmetics\n", "\n", "Arithmetic operations and comparisons also support broadcasting:" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 163, "outputs": [ { "data": { "text/plain": "BCDict({'A': 2, 'B': 12})" }, "execution_count": 163, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = BCDict({'A': 1, 'B': 4})\n", "fac = BCDict({'A': 2, 'B': 3})\n", "d * fac" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 164, "outputs": [ { "data": { "text/plain": "BCDict({'A': False, 'B': True})" }, "execution_count": 164, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d > fac\n" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 165, "outputs": [ { "data": { "text/plain": "BCDict({'A': False, 'B': False})" }, "execution_count": 165, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d.eq(fac)\n" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "id": "abc45584-3209-4bc9-a256-1e16d2fa9af7", "metadata": {}, "source": [ "## Naming conflicts" ] }, { "cell_type": "markdown", "id": "683eb750-a1ec-444f-ae90-22d218e842d0", "metadata": {}, "source": [ "The rules for slicing vs. attribute access are as such:\n", "\n", "If there is a key with the value in brackets `[]` it is returned. Else, it is broadcast, and the values are sliced with it." ] }, { "cell_type": "code", "execution_count": 142, "id": "a4a99d16-c7ed-40cf-9891-3c54ed3f3205", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({0: 'Hello', 1: 'World!'})" }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = BCDict({\n", " 0: \"Hello\",\n", " 1: \"World!\",\n", "})\n", "s" ] }, { "cell_type": "code", "execution_count": 143, "id": "839e5738-b9d2-4a4c-923f-b561be8f456e", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({0: 'l', 1: 'l'})" }, "execution_count": 143, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[3] # return the 4th letter of each value in dict" ] }, { "cell_type": "code", "execution_count": 144, "id": "d4685081-0926-4530-9ec3-39d479c1e452", "metadata": {}, "outputs": [ { "data": { "text/plain": "'Hello'" }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s[0] # return value of key `0`" ] }, { "cell_type": "markdown", "id": "da2e6412-2783-4c4c-bb6b-af63969dd085", "metadata": {}, "source": [ "What if you want to slice with a value that is also a key? Use the attribute accessor `.a`:" ] }, { "cell_type": "code", "execution_count": 145, "id": "a5b71bad-2404-41ff-9cdc-59f4d47db7a3", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({0: 'H', 1: 'W'})" }, "execution_count": 145, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.a[0] # return first letter of each value" ] }, { "cell_type": "markdown", "id": "675d3e9e-8678-4dd5-a96a-5c6d8576279b", "metadata": {}, "source": [ "This also works for attributes and functions:" ] }, { "cell_type": "code", "execution_count": 146, "id": "3653dfe0-811e-49e8-ade0-45fce01913b7", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'upper': 'Hello', 'lower': 'World!'})" }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = BCDict({\n", " \"upper\": \"Hello\",\n", " \"lower\": \"World!\",\n", "})\n", "s" ] }, { "cell_type": "code", "execution_count": 147, "id": "3e53f793-e373-4473-b064-b4a2af1016de", "metadata": {}, "outputs": [ { "data": { "text/plain": "2" }, "execution_count": 147, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.__len__() # length of the dictionary" ] }, { "cell_type": "code", "execution_count": 148, "id": "bd4c182a-2059-4caa-bb1c-ba87dcc51b38", "metadata": {}, "outputs": [ { "data": { "text/plain": "BCDict({'upper': 5, 'lower': 6})" }, "execution_count": 148, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.a.__len__() # length of each value" ] }, { "cell_type": "markdown", "id": "a5736d29-c18c-427e-a7fa-36ea0b016ac2", "metadata": {}, "source": [ "## Next steps\n", "\n", "You are now ready to use the Broadcast Dictionary package.\n", "\n", "If you have any question, you can always get in touch via Github: [mariushelf/bcdict](https://github.com/mariushelf/bcdict).\n", "\n", "For a full, actually useful example, check how to [train and validate 3 models on three datasets](train_test_evaluate.ipynb) without a single for loop." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }