
I often use pandas groupby to generate stacked tables, and then I often want to output the resulting nested relations as json. Is there any way to extract a nested json field from the stacked table it produces?

Let's say I have a df like:

year office candidate  amount
2010 mayor  joe smith  100.00
2010 mayor  jay gould   12.00
2010 govnr  pati mara  500.00
2010 govnr  jess rapp   50.00
2010 govnr  jess rapp   30.00

I can do:

grouped = df.groupby(['year', 'office', 'candidate']).sum()

print grouped
                       amount
year office candidate 
2010 mayor  joe smith   100
            jay gould    12
     govnr  pati mara   500
            jess rapp    80

Beautiful! Of course, what I'd really like to do is get nested json via a command along the lines of grouped.to_json. But that feature isn't available. Any workarounds?

So, what I really want is something like:

{"2010": {"mayor": [
                    {"joe smith": 100},
                    {"jay gould": 12}
                   ]
         }, 
          {"govnr": [
                     {"pati mara":500}, 
                     {"jess rapp": 80}
                    ]
          }
}
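
(For reference, the frame above can be rebuilt with something along these lines; the dtypes are an assumption on my part:)

import pandas as pd

df = pd.DataFrame({
    'year':      [2010, 2010, 2010, 2010, 2010],
    'office':    ['mayor', 'mayor', 'govnr', 'govnr', 'govnr'],
    'candidate': ['joe smith', 'jay gould', 'pati mara', 'jess rapp', 'jess rapp'],
    'amount':    [100.00, 12.00, 500.00, 50.00, 30.00],
})

grouped = df.groupby(['year', 'office', 'candidate']).sum()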

Don

  • The code above doesn't actually work, as the amount column values (e.g. '$30') are strings, so they get concatenated rather than summed as numbers. Also, it's unclear what you want in terms of json output; why isn't to_json working for you? Commented Jun 23, 2014 at 19:50
  • @AndyHayden Good points. I've edited to fix/clarify.
    – Don
    Commented Jun 23, 2014 at 20:32
  • @Don is there any solution?
    – skycrew
    Commented Sep 29, 2015 at 7:02
  • @skycrew See answer from chrisb below.
    – Don
    Commented Sep 29, 2015 at 15:07

4 Answers


I don't think there is anything built-in to pandas to create a nested dictionary of the data. Below is some code that should work in general for a series with a MultiIndex, using a defaultdict.

The nesting code iterates through each level of the MultiIndex, adding layers to the dictionary until the deepest layer is assigned to the Series value.

In [99]: from collections import defaultdict

In [100]: results = defaultdict(lambda: defaultdict(dict))

In [101]: for index, value in grouped.itertuples():
     ...:     for i, key in enumerate(index):
     ...:         if i == 0:
     ...:             nested = results[key]
     ...:         elif i == len(index) - 1:
     ...:             nested[key] = value
     ...:         else:
     ...:             nested = nested[key]

In [102]: results
Out[102]: defaultdict(<function <lambda> at 0x7ff17c76d1b8>, {2010: defaultdict(<type 'dict'>, {'govnr': {'pati mara': 500.0, 'jess rapp': 80.0}, 'mayor': {'joe smith': 100.0, 'jay gould': 12.0}})})

In [105]: import json

In [106]: print json.dumps(results, indent=4)
{
    "2010": {
        "govnr": {
            "pati mara": 500.0, 
            "jess rapp": 80.0
        }, 
        "mayor": {
            "joe smith": 100.0, 
            "jay gould": 12.0
        }
    }
}
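
For anyone on Python 3 and current pandas, here is a minimal sketch of the same idea (assuming grouped is the one-column summed frame from the question; the index keys are cast to str because json.dumps won't accept numpy integers like the year as keys):

import json
from collections import defaultdict

results = defaultdict(lambda: defaultdict(dict))

# each row is (index_tuple, amount): walk the index tuple, descending one
# dict layer per level and assigning the amount at the deepest level
for row in grouped.itertuples():
    index, value = row[0], row[1]
    nested = results
    for i, key in enumerate(index):
        key = str(key)
        if i == len(index) - 1:
            nested[key] = value
        else:
            nested = nested[key]

print(json.dumps(results, indent=4))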
  • @chrisb I am trying to adapt your answer to a similar problem here, but am tripped up by the grouped.itertuples(): stackoverflow.com/questions/37819622/…
    – spaine
    Commented Jun 14, 2016 at 19:18
  • This will work only for three levels; what if there are more?
    – skt7
    Commented Jun 8, 2018 at 19:09

I had a look at the solution above and figured out that it only works for 3 levels of nesting. This solution will work for any number of levels.

import json

# one dict per index level: dicts[0] is the innermost (deepest) level,
# dicts[-1] the outermost, which ends up holding the whole nested result
levels = len(grouped.index.levels)
dicts = [{} for i in range(levels)]
last_index = None

for index, value in grouped.itertuples():

    if not last_index:
        last_index = index

    # find the first level at which this row's index differs from the
    # previous row's, and start fresh dicts for all deeper levels
    for (ii, (i, j)) in enumerate(zip(index, last_index)):
        if not i == j:
            ii = levels - ii - 1
            dicts[:ii] = [{} for _ in dicts[:ii]]
            break

    # write the value bottom-up, hooking each level's dict into its parent
    for i, key in enumerate(reversed(index)):
        dicts[i][key] = value
        value = dicts[i]

    last_index = index

result = json.dumps(dicts[-1])
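
If you want the pretty-printed output from the question, json.dumps(dicts[-1], indent=4) works in place of the last line; on Python 3 you may also need to cast the index labels to str first, since json.dumps rejects numpy integer keys such as the year.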
  • Love this answer. FYI: in the latest versions of pandas, replace the levels = len(grouped.index.levels) line with levels = grouped.ndim Commented Oct 23, 2018 at 18:17

Here is a generic recursive solution for this problem:

def df_to_dict(df):
    # base case: a Series -- just return its plain dict
    if df.ndim == 1:
        return df.to_dict()

    # otherwise peel off the outermost index level with .xs() and recurse
    ret = {}
    for key in df.index.get_level_values(0):
        sub_df = df.xs(key)
        ret[key] = df_to_dict(sub_df)
    return ret
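
A possible call on the question's data (a sketch, assuming grouped is the summed DataFrame from above; note that the innermost key ends up being the column name, so leaves look like {"joe smith": {"amount": 100.0}}):

import json

# json.dumps rejects numpy-integer keys, so stringify the outermost (year) level
nested = {str(year): offices for year, offices in df_to_dict(grouped).items()}
print(json.dumps(nested, indent=4))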
  • This solution does not group based on the first column in the data frame
    – viru
    Commented May 21, 2021 at 22:50

I'm aware this is an old question, but I came across the same issue recently. Here's my solution. I borrowed a lot of stuff from chrisb's example (Thank you!).

This has the advantage that you can pass a lambda to extract the final value from whatever enumerable you want, and likewise a lambda (or key name) for each group.

from collections import defaultdict

def dict_from_enumerable(enumerable, final_value, *groups):
    # final_value and each group may be a callable or a key to look up on the item
    d = defaultdict(lambda: defaultdict(dict))
    group_count = len(groups)
    for item in enumerable:
        nested = d
        item_result = final_value(item) if callable(final_value) else item.get(final_value)
        for i, group in enumerate(groups, start=1):
            group_val = str(group(item) if callable(group) else item.get(group))
            if i == group_count:
                # deepest group: store the extracted value
                nested[group_val] = item_result
            else:
                # otherwise descend one dictionary level
                nested = nested[group_val]
    return d

In the question, you'd call this function like:

dict_from_enumerable(grouped.itertuples(), 'amount', 'year', 'office', 'candidate')

The first argument can be an array of data as well, not even requiring pandas.
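
For instance, a plain list of dicts works (a small sketch mirroring two rows of the question's data); with the DataFrame itself, grouped.reset_index().to_dict('records') is one way to produce records that the .get(...) lookups expect:

records = [
    {'year': 2010, 'office': 'mayor', 'candidate': 'joe smith', 'amount': 100.0},
    {'year': 2010, 'office': 'mayor', 'candidate': 'jay gould', 'amount': 12.0},
]

nested = dict_from_enumerable(records, 'amount', 'year', 'office', 'candidate')
# nested['2010']['mayor'] == {'joe smith': 100.0, 'jay gould': 12.0}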
