Handling Hierarchical Tidy Data in Financial Reporting

Views: 9

In this blog post, we will explore how to handle hierarchical tidy data in financial reporting. We will discuss how to merge necessary data into target rows to summarize amounts based on debit/credit codes, account numbers, and months, ensuring efficient data management and avoiding redundancy.

1. Characteristics of a hierarchical tidy data

Hierarchical tidy data reports different levels of observation in different rows, resulting in no redundant report files. However, we need to merge necessary data into a target row to summarize amounts based on debit/credit codes, account numbers, and months.

The entered date is reported in a header entry-level row, and data for debit/credit code, journal account number, and amount is reported in entry line-level rows. We need to merge the date into entry line-level data, which will be explained in the later sections.

The final results, showing the sum of CC05hW_01_target for each combination of JP04a_GL02_03_month, JP05a_GL03_04, JP13a_BS16_01, and JP13a_BS16_02.

7. Final Results DataFrame with Sum by Combination:
  JP04a_GL02_03_month JP05a_GL03_04  JP13a_BS16_01        JP13a_BS16_02  CC05hW_01_target
0             2021-04             D            150      Receivable Note          20392404
1             2021-04             C            152  Accounts Receivable          29217041
2             2021-04             D            131      Regular Deposit           8823062
3             2021-04             D            741         Payment Fees              1575
4             2021-05             D            152  Accounts Receivable           6682852
5             2021-05             C            511                Sales           6540100
6             2021-05             C            521        Sales Returns            142752

2. Source Code and Explanation

import pandas as pd

# Load the CSV file
file_path = '2405/240530/tidyGL05-31_translated.csv'

# Read the CSV file into a DataFrame
df = pd.read_csv(file_path, encoding='utf-8-sig')

# Strip any leading/trailing spaces from column names
df.columns = df.columns.str.strip()

# Convert relevant columns to appropriate data types
df['JP05a'] = pd.to_numeric(df['JP05a'], errors='coerce').astype('Int64')
df['JP12a'] = pd.to_numeric(df['JP12a'], errors='coerce').astype('Int64')
df['BS04cc'] = pd.to_numeric(df['BS04cc'], errors='coerce').astype('Int64')
df['CC05hW_01'] = pd.to_numeric(df['CC05hW_01'], errors='coerce').astype('Int64')
df['JP13a_BS16_01'] = pd.to_numeric(df['JP13a_BS16_01'], errors='coerce').astype('Int64')

# Ensure JP05a_GL03_04 is correctly formatted as a string for comparison
df['JP05a_GL03_04'] = df['JP05a_GL03_04'].astype(str)

# Convert JP04a_GL02_03 to datetime, handle non-standard date formats
df['JP04a_GL02_03'] = pd.to_datetime(df['JP04a_GL02_03'], errors='coerce')

# Extract month from JP04a_GL02_03 and create a new column for it
df['JP04a_GL02_03_month'] = df['JP04a_GL02_03'].dt.to_period('M').astype(str)

# Print initial DataFrame
print("1. Initial DataFrame:")
print(df.head())

# Extract rows where JP12a=0 and BS04cc=0 to collect the target CC05hW_01 values
target_rows = df[(df['JP12a'] == 0) & (df['BS04cc'] == 0) & (df['JP05a'] != 0)][['JP04a', 'JP05a', 'CC05hW_01']]
print("\n2. Target Rows:")
print(target_rows.head())

# Rename the target column for clarity before merging
target_rows = target_rows.rename(columns={'CC05hW_01': 'CC05hW_01_target'})

# Extract rows where only JP04a has value and JP05a, JP12a, BS04cc are all zero
jp04a_rows = df[(df['JP12a'] == 0) & (df['BS04cc'] == 0) & (df['JP05a'] == 0)][['JP04a', 'JP04a_GL02_03', 'JP04a_GL02_03_month']]

# Rename the columns for clarity before merging
jp04a_rows = jp04a_rows.rename(columns={'JP04a_GL02_03': 'JP04a_GL02_03_value', 'JP04a_GL02_03_month': 'JP04a_GL02_03_month_value'})

# Merge the target CC05hW_01 values back into the main DataFrame
df = pd.merge(df, target_rows, on=['JP04a', 'JP05a'], how='left')

# Merge the JP04a_GL02_03 and JP04a_GL02_03_month values back into the main DataFrame
df = pd.merge(df, jp04a_rows, on='JP04a', how='left')

# Update the main DataFrame with the correct JP04a_GL02_03 and JP04a_GL02_03_month values
df['JP04a_GL02_03'] = df['JP04a_GL02_03_value'].combine_first(df['JP04a_GL02_03'])
df['JP04a_GL02_03_month'] = df['JP04a_GL02_03_month_value'].combine_first(df['JP04a_GL02_03_month'])

# Drop the temporary columns used for merging
df.drop(columns=['JP04a_GL02_03_value', 'JP04a_GL02_03_month_value'], inplace=True)

# Print DataFrame after merging and updating
print("\n3. DataFrame after merging and updating:")
print(df.head())

# Collect all unique combinations of JP05a_GL03_04, JP13a_BS16_01, and JP13a_BS16_02
unique_combinations = df[['JP05a_GL03_04', 'JP13a_BS16_01', 'JP13a_BS16_02']].drop_duplicates()
print("\n4. Unique Combinations:")
print(unique_combinations)

# Initialize an empty DataFrame to store the results
results = pd.DataFrame()

# Iterate over each unique combination to filter and aggregate data
for _, row in unique_combinations.iterrows():
    jp05a_gl03_04 = row['JP05a_GL03_04']
    jp13a_bs16_01 = row['JP13a_BS16_01']
    jp13a_bs16_02 = row['JP13a_BS16_02']
    
    # Filter the DataFrame based on the current combination
    filtered_df = df[(df['JP05a_GL03_04'] == jp05a_gl03_04) & 
                     (df['JP13a_BS16_01'] == jp13a_bs16_01) & 
                     (df['JP13a_BS16_02'] == jp13a_bs16_02) & 
                     df['CC05hW_01_target'].notna()]

    print(f"\n5. Filtered DataFrame for combination ({jp05a_gl03_04}, {jp13a_bs16_01}, {jp13a_bs16_02}):")
    print(filtered_df)

    if not filtered_df.empty:
        # Print the results without dropping NaN values
        for _, filtered_row in filtered_df.iterrows():
            date = filtered_row['JP04a_GL02_03'] if pd.notna(filtered_row['JP04a_GL02_03']) else 'N/A'
            debit_credit_code = filtered_row['JP05a_GL03_04'] if pd.notna(filtered_row['JP05a_GL03_04']) else 'N/A'
            ledger_account_number = filtered_row['JP13a_BS16_01'] if pd.notna(filtered_row['JP13a_BS16_01']) else 'N/A'
            ledger_account_name = filtered_row['JP13a_BS16_02'] if pd.notna(filtered_row['JP13a_BS16_02']) else 'N/A'
            amount = filtered_row['CC05hW_01_target'] if pd.notna(filtered_row['CC05hW_01_target']) else 'N/A'

            print(f"Date: {date}, "
                  f"Debit/Credit Code: {debit_credit_code}, "
                  f"Ledger Account Number: {ledger_account_number}, "
                  f"Ledger Account Name: {ledger_account_name}, "
                  f"Amount: {amount}")

        # Ensure columns used for grouping are not null
        filtered_df = filtered_df.fillna('N/A')

        # Group by the required columns and aggregate the sum of 'CC05hW_01_target'
        result = filtered_df.groupby(
            ['JP04a_GL02_03_month', 'JP05a_GL03_04', 'JP13a_BS16_01', 'JP13a_BS16_02']
        )['CC05hW_01_target'].sum().reset_index()

        print("\n6. Intermediate Result:")
        print(result)

        # Append the result to the results DataFrame
        results = pd.concat([results, result], ignore_index=True)

# Display the final results
print("\n7. Final Results DataFrame with Sum by Combination:")
print(results)

# If you need to save the result to a new CSV file, uncomment the following line:
# results.to_csv('data/_PCA/try/sum_by_combination.csv', index=False, encoding='utf-8-sig')

3. Explanation of Each Step

3.1. Load and Read the CSV File:

First, let’s start with our initial data and the setup required to handle it. The source CSV file contains the hierarchical tidy data.

The following Python script reads the CSV file, processes the data, and merges necessary information into the target rows.

file_path = 'tidyGL05-31_translated.csv'
df = pd.read_csv(file_path, encoding='utf-8-sig')

3.2. Strip Leading/Trailing Spaces from Column Names:

df.columns = df.columns.str.strip()

This ensures that any extra spaces in the column names are removed, which helps in avoiding errors during data manipulation.

3.3. Convert Relevant Columns to Appropriate Data Types:

df['JP05a'] = pd.to_numeric(df['JP05a'], errors='coerce').astype('Int64')
df['JP12a'] = pd.to_numeric(df['JP12a'], errors='coerce').astype('Int64')
df['BS04cc'] = pd.to_numeric(df['BS04cc'], errors='coerce').astype('Int64')
df['CC05hW_01'] = pd.to_numeric(df['CC05hW_01'], errors='coerce').astype('Int64')
df['JP13a_BS16_01'] = pd.to_numeric(df['JP13a_BS16_01'], errors='coerce').astype('Int64')

Columns ‘JP05a’, ‘JP12a’, ‘BS04cc’ are dimension sequence number.
Column ‘CC05hW_01’ is monetary amount in JPY which has no minor unit.
Column ‘JP13a_BS16_01’ is a chart of accounts number.
These lines convert specific columns to numeric data types, handling any errors by coercing them to NaN and then converting them to the nullable integer type (Int64).

3.4. Ensure JP05a_GL03_04 is Correctly Formatted as a String:

df['JP05a_GL03_04'] = df['JP05a_GL03_04'].astype(str)

This ensures that JP05a_GL03_04 is a debit/credit code and treated as a string for consistent comparison during filtering and merging operations.

3.5. Convert JP04a_GL02_03 to Datetime:

df['JP04a_GL02_03'] = pd.to_datetime(df['JP04a_GL02_03'], errors='coerce')

This line converts the JP04a_GL02_03 column to datetime format, handling non-standard date formats by coercing errors to NaT.

3.6. Extract Month from JP04a_GL02_03:

df['JP04a_GL02_03_month'] = df['JP04a_GL02_03'].dt.to_period('M').astype(str)

This extracts the month from the JP04a_GL02_03 datetime column and creates a new column JP04a_GL02_03_month.

3.7. Print Initial DataFrame:

print("1. Initial DataFrame:")
print(df.head())

Printing the initial DataFrame allows us to verify the initial data and ensure that the data types and values are correctly formatted.

1. Initial DataFrame:
   JP04a  JP05a  JP12a  BS04cc            CC03dh_01  JP04a_GL02_01  ... JP01c_BS28_01   JP01c_BS28_02  BS04cc_01          BS04cc_02   BS04cc_03  JP04a_GL02_03_month
0      1      0      0       0  2021-02-01T13:55:00          273.0  ...           NaN             NaN        NaN                NaN         NaN              2021-04
1      1      2      0       0                  NaN            NaN  ...            00  Not Applicable        NaN                NaN         NaN                  NaT
2      1      2      0       1                  NaN            NaN  ...           NaN             NaN        0.0  Common Department  Department                  NaT
3      1      3      0       0                  NaN            NaN  ...            00  Not Applicable        NaN                NaN         NaN                  NaT
4      1      3      1       0                  NaN            NaN  ...           NaN             NaN        NaN                NaN         NaN                  NaT

[5 rows x 24 columns]

3.8. Extract Target Rows:

target_rows = df[(df['JP12a'] == 0) & (df['BS04cc'] == 0) & (df['JP05a'] != 0)][['JP04a', 'JP05a', 'CC05hW_01']]
print("\n2. Target Rows:")
print(target_rows.head())

This extracts rows where JP12a and BS04cc are zero, but JP05a is not zero, to collect the CC05hW_01 values.

2. Target Rows:
    JP04a  JP05a  CC05hW_01
1       1      2   20392404
3       1      3   20392404
7       2      2    2186153
10      2      3    2186153
13      2      5        735

3.9. Rename Target Column for Clarity:

target_rows = target_rows.rename(columns={'CC05hW_01': 'CC05hW_01_target'})

Renaming the target column helps in distinguishing it during the merging process.

3.10. Extract Rows with Only JP04a Value:

jp04a_rows = df[(df['JP12a'] == 0) & (df['BS04cc'] == 0) & (df['JP05a'] == 0)][['JP04a', 'JP04a_GL02_03', 'JP04a_GL02_03_month']]
jp04a_rows = jp04a_rows.rename(columns={'JP04a_GL02_03': 'JP04a_GL02_03_value', 'JP04a_GL02_03_month': 'JP04a_GL02_03_month_value'})

This extracts rows where JP04a has a value and other specified columns are zero. The relevant columns are renamed for clarity during merging.

3.11. Merge Target Rows Back into Main DataFrame:

df = pd.merge(df, target_rows, on=['JP04a', 'JP05a'], how='left')

This merges the target CC05hW_01 values back into the main DataFrame based on JP04a and JP05a.

3.12. Merge JP04a Values into Main DataFrame:

df = pd.merge(df, jp04a_rows, on='JP04a', how='left')

This merges the JP04a_GL02_03 and JP04a_GL02_03_month values back into the main DataFrame based on JP04a.

3.13. Update Main DataFrame with Correct Values:

df['JP04a_GL02_03'] = df['JP04a_GL02_03_value'].combine_first(df['JP04a_GL02_03'])
df['JP04a_GL02_03_month'] = df['JP04a_GL02_03_month_value'].combine_first(df['JP04a_GL02_03_month'])
df.drop(columns=['JP04a_GL02_03_value', 'JP04a_GL02_03_month_value'], inplace=True)

This updates the main DataFrame with the correct JP04a_GL02_03 and JP04a_GL02_03_month values using the combine_first method, which prioritizes non-null values from the first Series. The temporary columns used for merging are then dropped.

3.14. Print DataFrame After Merging and Updating:

print("\n3. DataFrame after merging and updating:")
print(df.head())

Printing the DataFrame after merging and updating allows us to verify that the data has been correctly merged and updated.

3. DataFrame after merging and updating:
   JP04a  JP05a  JP12a  BS04cc            CC03dh_01  JP04a_GL02_01  ...   JP01c_BS28_02 BS04cc_01          BS04cc_02   BS04cc_03 JP04a_GL02_03_month  CC05hW_01_target
0      1      0      0       0  2021-02-01T13:55:00          273.0  ...             NaN       NaN                NaN         NaN             2021-04              <NA>
1      1      2      0       0                  NaN            NaN  ...  Not Applicable       NaN                NaN         NaN             2021-04          20392404
2      1      2      0       1                  NaN            NaN  ...             NaN       0.0  Common Department  Department             2021-04          20392404
3      1      3      0       0                  NaN            NaN  ...  Not Applicable       NaN                NaN         NaN             2021-04          20392404
4      1      3      1       0                  NaN            NaN  ...             NaN       NaN                NaN         NaN             2021-04          20392404

[5 rows x 25 columns]

3.15. Collect Unique Combinations:

unique_combinations = df[['JP05a_GL03_04', 'JP13a_BS16_01', 'JP13a_BS16_02']].drop_duplicates()
print("\n4. Unique Combinations:")
print(unique_combinations)

This collects all unique combinations of JP05a_GL03_04, JP13a_BS16_01, and JP13a_BS16_02 to facilitate filtering and aggregating data.

4. Unique Combinations:
   JP05a_GL03_04  JP13a_BS16_01        JP13a_BS16_02
0            nan           <NA>                  NaN
1              D            150      Receivable Note
3              C            152  Accounts Receivable
7              D            131      Regular Deposit
13             D            741         Payment Fees
31             D            152  Accounts Receivable
34             C            511                Sales
49             C            521        Sales Returns

3.16. Initialize Results DataFrame:

results = pd.DataFrame()

An empty DataFrame is initialized to store the results of the aggregation.

3.17. Iterate Over Each Unique Combination to Filter and Aggregate Data:

for _, row in unique_combinations.iterrows():
    jp05a_gl03_04 = row['JP05a_GL03_04']
    jp13a_bs16_01 = row['JP13a_BS16_01']
    jp13a_bs16_02 = row['JP13a_BS16_02']

    filtered_df = df[(df['JP05a_GL03_04'] == jp05a_gl03_04) &
                     (df['JP13a_BS16_01'] == jp13a_bs16_01) &
                     (df['JP13a_BS16_02'] == jp13a_bs16_02) &
                     df['CC05hW_01_target'].notna()]

    print(f"\n5. Filtered DataFrame for combination ({jp05a_gl03_04}, {jp13a_bs16_01}, {jp13a_bs16_02}):")
    print(filtered_df)

    if not filtered_df.empty:
        for _, filtered_row in filtered_df.iterrows():
            date = filtered_row['JP04a_GL02_03'] if pd.notna(filtered_row['JP04a_GL02_03']) else 'N/A'
            debit_credit_code = filtered_row['JP05a_GL03_04'] if pd.notna(filtered_row['JP05a_GL03_04']) else 'N/A'
            ledger_account_number = filtered_row['JP13a_BS16_01'] if pd.notna(filtered_row['JP13a_BS16_01']) else 'N/A'
            ledger_account_name = filtered_row['JP13a_BS16_02'] if pd.notna(filtered_row['JP13a_BS16_02']) else 'N/A'
            amount = filtered_row['CC05hW_01_target'] if pd.notna(filtered_row['CC05hW_01_target']) else 'N/A'

            print(f"Date: {date}, "
                  f"Debit/Credit Code: {debit_credit_code}, "
                  f"Ledger Account Number: {ledger_account_number}, "
                  f"Ledger Account Name: {ledger_account_name}, "
                  f"Amount: {amount}")

        filtered_df = filtered_df.fillna('N/A')

        result = filtered_df.groupby(
            ['JP04a_GL02_03_month', 'JP05a_GL03_04', 'JP13a_BS16_01', 'JP13a_BS16_02']
        )['CC05hW_01_target'].sum().reset_index()

        print("\n6. Intermediate Result:")
        print(result)

        results = pd.concat([results, result], ignore_index=True)

This section iterates over each unique combination, filters the DataFrame, prints intermediate results, and aggregates the sum of CC05hW_01_target.

5. Filtered DataFrame for combination (nan, <NA>, nan):
Empty DataFrame
Columns: [JP04a, JP05a, JP12a, BS04cc, CC03dh_01, JP04a_GL02_01, JP04a_GL02_03, JP04a_GL02_04, GL05c_01, BS15c_01, JP05a_GL03_04, CC05hW_01, JP13a_BS16_01, JP13a_BS16_02, JP12a_01, JP12a_02, JP12a_03, CC05hZ_01, JP01c_BS28_01, JP01c_BS28_02, BS04cc_01, BS04cc_02, BS04cc_03, JP04a_GL02_03_month, CC05hW_01_target]
Index: []

5. Filtered DataFrame for combination (D, 150, Receivable Note):
   JP04a  JP05a  JP12a  BS04cc CC03dh_01  JP04a_GL02_01 JP04a_GL02_03  ... JP01c_BS28_01   JP01c_BS28_02  BS04cc_01 BS04cc_02  BS04cc_03  JP04a_GL02_03_month CC05hW_01_target
1      1      2      0       0       NaN            NaN    2021-04-25  ...            00  Not Applicable        NaN       NaN        NaN              2021-04         20392404

[1 rows x 25 columns]
Date: 2021-04-25 00:00:00, Debit/Credit Code: D, Ledger Account Number: 150, Ledger Account Name: Receivable Note, Amount: 20392404

6. Intermediate Result:
  JP04a_GL02_03_month JP05a_GL03_04  JP13a_BS16_01    JP13a_BS16_02  CC05hW_01_target
0             2021-04             D            150  Receivable Note          20392404

5. Filtered DataFrame for combination (C, 152, Accounts Receivable):
    JP04a  JP05a  JP12a  BS04cc CC03dh_01  JP04a_GL02_01  ...   JP01c_BS28_02 BS04cc_01  BS04cc_02  BS04cc_03 JP04a_GL02_03_month  CC05hW_01_target
3       1      3      0       0       NaN            NaN  ...  Not Applicable       NaN        NaN        NaN             2021-04          20392404
10      2      3      0       0       NaN            NaN  ...  Not Applicable       NaN        NaN        NaN             2021-04           2186153
15      2      6      0       0       NaN            NaN  ...  Not Applicable       NaN        NaN        NaN             2021-04               735
22      3      3      0       0       NaN            NaN  ...  Not Applicable       NaN        NaN        NaN             2021-04           6636909
27      3      6      0       0       NaN            NaN  ...  Not Applicable       NaN        NaN        NaN             2021-04               840

[5 rows x 25 columns]
Date: 2021-04-25 00:00:00, Debit/Credit Code: C, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 20392404
Date: 2021-04-25 00:00:00, Debit/Credit Code: C, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 2186153
Date: 2021-04-25 00:00:00, Debit/Credit Code: C, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 735
Date: 2021-04-25 00:00:00, Debit/Credit Code: C, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 6636909
Date: 2021-04-25 00:00:00, Debit/Credit Code: C, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 840

6. Intermediate Result:
  JP04a_GL02_03_month JP05a_GL03_04  JP13a_BS16_01        JP13a_BS16_02  CC05hW_01_target
0             2021-04             C            152  Accounts Receivable          29217041

5. Filtered DataFrame for combination (D, 131, Regular Deposit):
    JP04a  JP05a  JP12a  BS04cc CC03dh_01  JP04a_GL02_01  ...   JP01c_BS28_02 BS04cc_01  BS04cc_02  BS04cc_03 JP04a_GL02_03_month  CC05hW_01_target
7       2      2      0       0       NaN            NaN  ...  Not Applicable       NaN        NaN        NaN             2021-04           2186153
19      3      2      0       0       NaN            NaN  ...  Not Applicable       NaN        NaN        NaN             2021-04           6636909

[2 rows x 25 columns]
Date: 2021-04-25 00:00:00, Debit/Credit Code: D, Ledger Account Number: 131, Ledger Account Name: Regular Deposit, Amount: 2186153
Date: 2021-04-25 00:00:00, Debit/Credit Code: D, Ledger Account Number: 131, Ledger Account Name: Regular Deposit, Amount: 6636909

6. Intermediate Result:
  JP04a_GL02_03_month JP05a_GL03_04  JP13a_BS16_01    JP13a_BS16_02  CC05hW_01_target
0             2021-04             D            131  Regular Deposit           8823062

5. Filtered DataFrame for combination (D, 741, Payment Fees):
    JP04a  JP05a  JP12a  BS04cc CC03dh_01  JP04a_GL02_01 JP04a_GL02_03  ... JP01c_BS28_01  JP01c_BS28_02  BS04cc_01 BS04cc_02  BS04cc_03  JP04a_GL02_03_month CC05hW_01_target
13      2      5      0       0       NaN            NaN    2021-04-25  ...            Q5   Purchase 10%        NaN       NaN        NaN              2021-04
  735
25      3      5      0       0       NaN            NaN    2021-04-25  ...            Q5   Purchase 10%        NaN       NaN        NaN              2021-04
  840

[2 rows x 25 columns]
Date: 2021-04-25 00:00:00, Debit/Credit Code: D, Ledger Account Number: 741, Ledger Account Name: Payment Fees, Amount: 735
Date: 2021-04-25 00:00:00, Debit/Credit Code: D, Ledger Account Number: 741, Ledger Account Name: Payment Fees, Amount: 840

6. Intermediate Result:
  JP04a_GL02_03_month JP05a_GL03_04  JP13a_BS16_01 JP13a_BS16_02  CC05hW_01_target
0             2021-04             D            741  Payment Fees              1575

5. Filtered DataFrame for combination (D, 152, Accounts Receivable):
    JP04a  JP05a  JP12a  BS04cc CC03dh_01  JP04a_GL02_01  ...   JP01c_BS28_02 BS04cc_01  BS04cc_02  BS04cc_03 JP04a_GL02_03_month  CC05hW_01_target
31      4      2      0       0       NaN            NaN  ...  Not Applicable       NaN        NaN        NaN             2021-05           1245400
36      4      5      0       0       NaN            NaN  ...  Not Applicable       NaN        NaN        NaN             2021-05           1329984
41      4      8      0       0       NaN            NaN  ...  Not Applicable       NaN        NaN        NaN             2021-05           3964716
46      4     11      0       0       NaN            NaN  ...  Not Applicable       NaN        NaN        NaN             2021-05            101952
51      4     14      0       0       NaN            NaN  ...  Not Applicable       NaN        NaN        NaN             2021-05             40800

[5 rows x 25 columns]
Date: 2021-05-31 00:00:00, Debit/Credit Code: D, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 1245400
Date: 2021-05-31 00:00:00, Debit/Credit Code: D, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 1329984
Date: 2021-05-31 00:00:00, Debit/Credit Code: D, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 3964716
Date: 2021-05-31 00:00:00, Debit/Credit Code: D, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 101952
Date: 2021-05-31 00:00:00, Debit/Credit Code: D, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 40800

6. Intermediate Result:
  JP04a_GL02_03_month JP05a_GL03_04  JP13a_BS16_01        JP13a_BS16_02  CC05hW_01_target
0             2021-05             D            152  Accounts Receivable           6682852

5. Filtered DataFrame for combination (C, 511, Sales):
    JP04a  JP05a  JP12a  BS04cc CC03dh_01  JP04a_GL02_01 JP04a_GL02_03  ... JP01c_BS28_01  JP01c_BS28_02  BS04cc_01 BS04cc_02  BS04cc_03  JP04a_GL02_03_month CC05hW_01_target
34      4      3      0       0       NaN            NaN    2021-05-31  ...            B5      Sales 10%        NaN       NaN        NaN              2021-05          1245400
39      4      6      0       0       NaN            NaN    2021-05-31  ...            B5      Sales 10%        NaN       NaN        NaN              2021-05          1329984
44      4      9      0       0       NaN            NaN    2021-05-31  ...            B5      Sales 10%        NaN       NaN        NaN              2021-05          3964716

[3 rows x 25 columns]
Date: 2021-05-31 00:00:00, Debit/Credit Code: C, Ledger Account Number: 511, Ledger Account Name: Sales, Amount: 1245400
Date: 2021-05-31 00:00:00, Debit/Credit Code: C, Ledger Account Number: 511, Ledger Account Name: Sales, Amount: 1329984
Date: 2021-05-31 00:00:00, Debit/Credit Code: C, Ledger Account Number: 511, Ledger Account Name: Sales, Amount: 3964716

6. Intermediate Result:
  JP04a_GL02_03_month JP05a_GL03_04  JP13a_BS16_01 JP13a_BS16_02  CC05hW_01_target
0             2021-05             C            511         Sales           6540100

5. Filtered DataFrame for combination (C, 521, Sales Returns):
    JP04a  JP05a  JP12a  BS04cc CC03dh_01  JP04a_GL02_01 JP04a_GL02_03  ... JP01c_BS28_01  JP01c_BS28_02  BS04cc_01 BS04cc_02  BS04cc_03  JP04a_GL02_03_month CC05hW_01_target
49      4     12      0       0       NaN            NaN    2021-05-31  ...            C5    Revenge 10%        NaN       NaN        NaN              2021-05           101952
54      4     15      0       0       NaN            NaN    2021-05-31  ...            C5    Revenge 10%        NaN       NaN        NaN              2021-05
40800

[2 rows x 25 columns]
Date: 2021-05-31 00:00:00, Debit/Credit Code: C, Ledger Account Number: 521, Ledger Account Name: Sales Returns, Amount: 101952
Date: 2021-05-31 00:00:00, Debit/Credit Code: C, Ledger Account Number: 521, Ledger Account Name: Sales Returns, Amount: 40800

6. Intermediate Result:
  JP04a_GL02_03_month JP05a_GL03_04  JP13a_BS16_01  JP13a_BS16_02  CC05hW_01_target
0             2021-05             C            521  Sales Returns            142752

3.18. Display Final Results:

print("\n7. Final Results DataFrame with Sum by Combination:")
print(results)

Finally, we display the final results, showing the sum of CC05hW_01_target for each combination of JP04a_GL02_03_month, JP05a_GL03_04, JP13a_BS16_01, and JP13a_BS16_02.

7. Final Results DataFrame with Sum by Combination:
  JP04a_GL02_03_month JP05a_GL03_04  JP13a_BS16_01        JP13a_BS16_02  CC05hW_01_target
0             2021-04             D            150      Receivable Note          20392404
1             2021-04             C            152  Accounts Receivable          29217041
2             2021-04             D            131      Regular Deposit           8823062
3             2021-04             D            741         Payment Fees              1575
4             2021-05             D            152  Accounts Receivable           6682852
5             2021-05             C            511                Sales           6540100
6             2021-05             C            521        Sales Returns            142752

By following these steps, we ensure efficient data management in hierarchical tidy data, avoiding redundancy and accurately summarizing financial data.

4. Conclusion

Handling hierarchical tidy data requires careful consideration of how different levels of observation are reported. By merging necessary data into target rows, we can summarize amounts based on debit/credit codes, account numbers, and months, thereby avoiding redundancy and ensuring efficient data management.

The approach outlined in this post demonstrates how to handle such data using Python and pandas, providing a robust solution for financial reporting.

5. Hierarchical tidy data file

JP04a

JP05a

JP12a

BS04cc

CC03dh_01

JP04a_GL02_01

JP04a_GL02_03

JP04a_GL02_04

GL05c_01

BS15c_01

JP05a_GL03_04

CC05hW_01

JP13a_BS16_01

JP13a_BS16_02

JP12a_01

JP12a_02

JP12a_03

CC05hZ_01

JP01c_BS28_01

JP01c_BS28_02

BS04cc_01

BS04cc_02

BS04cc_03

1

0

0

0

2021-02-01T13:55:00

273.0

2021-04-25

April 25, Slip No. 201 Imai Department Store

1.0

21.0

1

2

0

0

D

20392404.0

150.0

Receivable Note

0.0

00

Not Applicable

1

2

0

1

0.0

Common Department

Department

1

3

0

0

C

20392404.0

152.0

Accounts Receivable

0.0

00

Not Applicable

1

3

1

0

2.0

Imai Department Store

Sub-Account

1

3

0

1

0.0

Common Department

Department

2

0

0

0

2021-02-01T13:55:00

274.0

2021-04-25

April 25 Paper NO202 Comprehensive Management Co., Ltd.

1.0

21.0

2

2

0

0

D

2186153.0

131.0

Regular Deposit

0.0

00

Not Applicable

2

2

1

0

30.0

Mii Nagoya

Sub-Account

2

2

0

1

0.0

Common Department

Department

2

3

0

0

C

2186153.0

152.0

Accounts Receivable

0.0

00

Not Applicable

2

3

1

0

7.0

Comprehensive management Co., Ltd.

Sub-Account

2

3

0

1

0.0

Common Department

Department

2

5

0

0

D

735.0

741.0

Payment Fees

66.0

Q5

Purchase 10%

2

5

0

1

30.0

Nagoya Branch

Department

2

6

0

0

C

735.0

152.0

Accounts Receivable

0.0

00

Not Applicable

2

6

1

0

7.0

Comprehensive management Co., Ltd.

Sub-Account

2

6

0

1

0.0

Common Department

Department

3

0

0

0

2021-02-01T13:55:00

275.0

2021-04-25

April 25, Slip No. 203, Kobayashi Market Co., Ltd.

1.0

21.0

3

2

0

0

D

6636909.0

131.0

Regular Deposit

0.0

00

Not Applicable

3

2

1

0

10.0

In front of Hokkaido Station

Sub-Account

3

2

0

1

0.0

Common Department

Department

3

3

0

0

C

6636909.0

152.0

Accounts Receivable

0.0

00

Not Applicable

3

3

1

0

25.0

Kobayashi Market Co., Ltd.

Sub-Account

3

3

0

1

0.0

Common Department

Department

3

5

0

0

D

840.0

741.0

Payment Fees

76.0

Q5

Purchase 10%

3

5

0

1

10.0

Sapporo Sales Office

Department

3

6

0

0

C

840.0

152.0

Accounts Receivable

0.0

00

Not Applicable

3

6

1

0

25.0

Kobayashi Market Co., Ltd.

Sub-Account

3

6

0

1

0.0

Common Department

Department

4

0

0

0

2021-02-01T13:55:00

114.0

2021-05-31

Sale for May 1st to May 31st

1.0

21.0

4

2

0

0

D

1245400.0

152.0

Accounts Receivable

0.0

00

Not Applicable

4

2

1

0

5.0

AABC Co., Ltd.

Sub-Account

4

2

0

1

0.0

Common Department

Department

4

3

0

0

C

1245400.0

511.0

Sales

113218.0

B5

Sales 10%

4

3

0

1

10.0

Sapporo Sales Office

Department

4

5

0

0

D

1329984.0

152.0

Accounts Receivable

0.0

00

Not Applicable

4

5

1

0

14.0

Fujita Co., Ltd.

Sub-Account

4

5

0

1

0.0

Common Department

Department

4

6

0

0

C

1329984.0

511.0

Sales

120907.0

B5

Sales 10%

4

6

0

1

10.0

Sapporo Sales Office

Department

4

8

0

0

D

3964716.0

152.0

Accounts Receivable

0.0

00

Not Applicable

4

8

1

0

25.0

Kobayashi Market Co., Ltd.

Sub-Account

4

8

0

1

0.0

Common Department

Department

4

9

0

0

C

3964716.0

511.0

Sales

360428.0

B5

Sales 10%

4

9

0

1

10.0

Sapporo Sales Office

Department

4

11

0

0

D

101952.0

152.0

Accounts Receivable

0.0

00

Not Applicable

4

11

1

0

25.0

Kobayashi Market Co., Ltd.

Sub-Account

4

11

0

1

0.0

Common Department

Department

4

12

0

0

C

101952.0

521.0

Sales Returns

9268.0

C5

Revenge 10%

4

12

0

1

10.0

Sapporo Sales Office

Department

4

14

0

0

D

40800.0

152.0

Accounts Receivable

0.0

00

Not Applicable

4

14

1

0

5.0

AABC Co., Ltd.

Sub-Account

4

14

0

1

0.0

Common Department

Department

4

15

0

0

C

40800.0

521.0

Sales Returns

3709.0

C5

Revenge 10%

4

15

0

1

10.0

Sapporo Sales Office

Department


Posted

in

, , ,

by

Tags: