Views: 9
Handling Hierarchical Tidy Data in Financial Reporting
In this blog post, we will explore how to handle hierarchical tidy data in financial reporting. We will discuss how to merge necessary data into target rows to summarize amounts based on debit/credit codes, account numbers, and months, ensuring efficient data management and avoiding redundancy.
1. Characteristics of a hierarchical tidy data
Hierarchical tidy data reports different levels of observation in different rows, resulting in no redundant report files. However, we need to merge necessary data into a target row to summarize amounts based on debit/credit codes, account numbers, and months.
The entered date is reported in a header entry-level row, and data for debit/credit code, journal account number, and amount is reported in entry line-level rows. We need to merge the date into entry line-level data, which will be explained in the later sections.
The final results, showing the sum of CC05hW_01_target
for each combination of JP04a_GL02_03_month
, JP05a_GL03_04
, JP13a_BS16_01
, and JP13a_BS16_02
.
7. Final Results DataFrame with Sum by Combination:
JP04a_GL02_03_month JP05a_GL03_04 JP13a_BS16_01 JP13a_BS16_02 CC05hW_01_target
0 2021-04 D 150 Receivable Note 20392404
1 2021-04 C 152 Accounts Receivable 29217041
2 2021-04 D 131 Regular Deposit 8823062
3 2021-04 D 741 Payment Fees 1575
4 2021-05 D 152 Accounts Receivable 6682852
5 2021-05 C 511 Sales 6540100
6 2021-05 C 521 Sales Returns 142752
2. Source Code and Explanation
import pandas as pd
# Load the CSV file
file_path = '2405/240530/tidyGL05-31_translated.csv'
# Read the CSV file into a DataFrame
df = pd.read_csv(file_path, encoding='utf-8-sig')
# Strip any leading/trailing spaces from column names
df.columns = df.columns.str.strip()
# Convert relevant columns to appropriate data types
df['JP05a'] = pd.to_numeric(df['JP05a'], errors='coerce').astype('Int64')
df['JP12a'] = pd.to_numeric(df['JP12a'], errors='coerce').astype('Int64')
df['BS04cc'] = pd.to_numeric(df['BS04cc'], errors='coerce').astype('Int64')
df['CC05hW_01'] = pd.to_numeric(df['CC05hW_01'], errors='coerce').astype('Int64')
df['JP13a_BS16_01'] = pd.to_numeric(df['JP13a_BS16_01'], errors='coerce').astype('Int64')
# Ensure JP05a_GL03_04 is correctly formatted as a string for comparison
df['JP05a_GL03_04'] = df['JP05a_GL03_04'].astype(str)
# Convert JP04a_GL02_03 to datetime, handle non-standard date formats
df['JP04a_GL02_03'] = pd.to_datetime(df['JP04a_GL02_03'], errors='coerce')
# Extract month from JP04a_GL02_03 and create a new column for it
df['JP04a_GL02_03_month'] = df['JP04a_GL02_03'].dt.to_period('M').astype(str)
# Print initial DataFrame
print("1. Initial DataFrame:")
print(df.head())
# Extract rows where JP12a=0 and BS04cc=0 to collect the target CC05hW_01 values
target_rows = df[(df['JP12a'] == 0) & (df['BS04cc'] == 0) & (df['JP05a'] != 0)][['JP04a', 'JP05a', 'CC05hW_01']]
print("\n2. Target Rows:")
print(target_rows.head())
# Rename the target column for clarity before merging
target_rows = target_rows.rename(columns={'CC05hW_01': 'CC05hW_01_target'})
# Extract rows where only JP04a has value and JP05a, JP12a, BS04cc are all zero
jp04a_rows = df[(df['JP12a'] == 0) & (df['BS04cc'] == 0) & (df['JP05a'] == 0)][['JP04a', 'JP04a_GL02_03', 'JP04a_GL02_03_month']]
# Rename the columns for clarity before merging
jp04a_rows = jp04a_rows.rename(columns={'JP04a_GL02_03': 'JP04a_GL02_03_value', 'JP04a_GL02_03_month': 'JP04a_GL02_03_month_value'})
# Merge the target CC05hW_01 values back into the main DataFrame
df = pd.merge(df, target_rows, on=['JP04a', 'JP05a'], how='left')
# Merge the JP04a_GL02_03 and JP04a_GL02_03_month values back into the main DataFrame
df = pd.merge(df, jp04a_rows, on='JP04a', how='left')
# Update the main DataFrame with the correct JP04a_GL02_03 and JP04a_GL02_03_month values
df['JP04a_GL02_03'] = df['JP04a_GL02_03_value'].combine_first(df['JP04a_GL02_03'])
df['JP04a_GL02_03_month'] = df['JP04a_GL02_03_month_value'].combine_first(df['JP04a_GL02_03_month'])
# Drop the temporary columns used for merging
df.drop(columns=['JP04a_GL02_03_value', 'JP04a_GL02_03_month_value'], inplace=True)
# Print DataFrame after merging and updating
print("\n3. DataFrame after merging and updating:")
print(df.head())
# Collect all unique combinations of JP05a_GL03_04, JP13a_BS16_01, and JP13a_BS16_02
unique_combinations = df[['JP05a_GL03_04', 'JP13a_BS16_01', 'JP13a_BS16_02']].drop_duplicates()
print("\n4. Unique Combinations:")
print(unique_combinations)
# Initialize an empty DataFrame to store the results
results = pd.DataFrame()
# Iterate over each unique combination to filter and aggregate data
for _, row in unique_combinations.iterrows():
jp05a_gl03_04 = row['JP05a_GL03_04']
jp13a_bs16_01 = row['JP13a_BS16_01']
jp13a_bs16_02 = row['JP13a_BS16_02']
# Filter the DataFrame based on the current combination
filtered_df = df[(df['JP05a_GL03_04'] == jp05a_gl03_04) &
(df['JP13a_BS16_01'] == jp13a_bs16_01) &
(df['JP13a_BS16_02'] == jp13a_bs16_02) &
df['CC05hW_01_target'].notna()]
print(f"\n5. Filtered DataFrame for combination ({jp05a_gl03_04}, {jp13a_bs16_01}, {jp13a_bs16_02}):")
print(filtered_df)
if not filtered_df.empty:
# Print the results without dropping NaN values
for _, filtered_row in filtered_df.iterrows():
date = filtered_row['JP04a_GL02_03'] if pd.notna(filtered_row['JP04a_GL02_03']) else 'N/A'
debit_credit_code = filtered_row['JP05a_GL03_04'] if pd.notna(filtered_row['JP05a_GL03_04']) else 'N/A'
ledger_account_number = filtered_row['JP13a_BS16_01'] if pd.notna(filtered_row['JP13a_BS16_01']) else 'N/A'
ledger_account_name = filtered_row['JP13a_BS16_02'] if pd.notna(filtered_row['JP13a_BS16_02']) else 'N/A'
amount = filtered_row['CC05hW_01_target'] if pd.notna(filtered_row['CC05hW_01_target']) else 'N/A'
print(f"Date: {date}, "
f"Debit/Credit Code: {debit_credit_code}, "
f"Ledger Account Number: {ledger_account_number}, "
f"Ledger Account Name: {ledger_account_name}, "
f"Amount: {amount}")
# Ensure columns used for grouping are not null
filtered_df = filtered_df.fillna('N/A')
# Group by the required columns and aggregate the sum of 'CC05hW_01_target'
result = filtered_df.groupby(
['JP04a_GL02_03_month', 'JP05a_GL03_04', 'JP13a_BS16_01', 'JP13a_BS16_02']
)['CC05hW_01_target'].sum().reset_index()
print("\n6. Intermediate Result:")
print(result)
# Append the result to the results DataFrame
results = pd.concat([results, result], ignore_index=True)
# Display the final results
print("\n7. Final Results DataFrame with Sum by Combination:")
print(results)
# If you need to save the result to a new CSV file, uncomment the following line:
# results.to_csv('data/_PCA/try/sum_by_combination.csv', index=False, encoding='utf-8-sig')
3. Explanation of Each Step
3.1. Load and Read the CSV File:
First, let’s start with our initial data and the setup required to handle it. The source CSV file contains the hierarchical tidy data.
The following Python script reads the CSV file, processes the data, and merges necessary information into the target rows.
file_path = 'tidyGL05-31_translated.csv'
df = pd.read_csv(file_path, encoding='utf-8-sig')
3.2. Strip Leading/Trailing Spaces from Column Names:
df.columns = df.columns.str.strip()
This ensures that any extra spaces in the column names are removed, which helps in avoiding errors during data manipulation.
3.3. Convert Relevant Columns to Appropriate Data Types:
df['JP05a'] = pd.to_numeric(df['JP05a'], errors='coerce').astype('Int64')
df['JP12a'] = pd.to_numeric(df['JP12a'], errors='coerce').astype('Int64')
df['BS04cc'] = pd.to_numeric(df['BS04cc'], errors='coerce').astype('Int64')
df['CC05hW_01'] = pd.to_numeric(df['CC05hW_01'], errors='coerce').astype('Int64')
df['JP13a_BS16_01'] = pd.to_numeric(df['JP13a_BS16_01'], errors='coerce').astype('Int64')
Columns ‘JP05a’, ‘JP12a’, ‘BS04cc’ are dimension sequence number.
Column ‘CC05hW_01’ is monetary amount in JPY which has no minor unit.
Column ‘JP13a_BS16_01’ is a chart of accounts number.
These lines convert specific columns to numeric data types, handling any errors by coercing them to NaN and then converting them to the nullable integer type (Int64
).
3.4. Ensure JP05a_GL03_04 is Correctly Formatted as a String:
df['JP05a_GL03_04'] = df['JP05a_GL03_04'].astype(str)
This ensures that JP05a_GL03_04
is a debit/credit code and treated as a string for consistent comparison during filtering and merging operations.
3.5. Convert JP04a_GL02_03 to Datetime:
df['JP04a_GL02_03'] = pd.to_datetime(df['JP04a_GL02_03'], errors='coerce')
This line converts the JP04a_GL02_03
column to datetime format, handling non-standard date formats by coercing errors to NaT.
3.6. Extract Month from JP04a_GL02_03:
df['JP04a_GL02_03_month'] = df['JP04a_GL02_03'].dt.to_period('M').astype(str)
This extracts the month from the JP04a_GL02_03
datetime column and creates a new column JP04a_GL02_03_month
.
3.7. Print Initial DataFrame:
print("1. Initial DataFrame:")
print(df.head())
Printing the initial DataFrame allows us to verify the initial data and ensure that the data types and values are correctly formatted.
1. Initial DataFrame:
JP04a JP05a JP12a BS04cc CC03dh_01 JP04a_GL02_01 ... JP01c_BS28_01 JP01c_BS28_02 BS04cc_01 BS04cc_02 BS04cc_03 JP04a_GL02_03_month
0 1 0 0 0 2021-02-01T13:55:00 273.0 ... NaN NaN NaN NaN NaN 2021-04
1 1 2 0 0 NaN NaN ... 00 Not Applicable NaN NaN NaN NaT
2 1 2 0 1 NaN NaN ... NaN NaN 0.0 Common Department Department NaT
3 1 3 0 0 NaN NaN ... 00 Not Applicable NaN NaN NaN NaT
4 1 3 1 0 NaN NaN ... NaN NaN NaN NaN NaN NaT
[5 rows x 24 columns]
3.8. Extract Target Rows:
target_rows = df[(df['JP12a'] == 0) & (df['BS04cc'] == 0) & (df['JP05a'] != 0)][['JP04a', 'JP05a', 'CC05hW_01']]
print("\n2. Target Rows:")
print(target_rows.head())
This extracts rows where JP12a
and BS04cc
are zero, but JP05a
is not zero, to collect the CC05hW_01
values.
2. Target Rows:
JP04a JP05a CC05hW_01
1 1 2 20392404
3 1 3 20392404
7 2 2 2186153
10 2 3 2186153
13 2 5 735
3.9. Rename Target Column for Clarity:
target_rows = target_rows.rename(columns={'CC05hW_01': 'CC05hW_01_target'})
Renaming the target column helps in distinguishing it during the merging process.
3.10. Extract Rows with Only JP04a Value:
jp04a_rows = df[(df['JP12a'] == 0) & (df['BS04cc'] == 0) & (df['JP05a'] == 0)][['JP04a', 'JP04a_GL02_03', 'JP04a_GL02_03_month']]
jp04a_rows = jp04a_rows.rename(columns={'JP04a_GL02_03': 'JP04a_GL02_03_value', 'JP04a_GL02_03_month': 'JP04a_GL02_03_month_value'})
This extracts rows where JP04a
has a value and other specified columns are zero. The relevant columns are renamed for clarity during merging.
3.11. Merge Target Rows Back into Main DataFrame:
df = pd.merge(df, target_rows, on=['JP04a', 'JP05a'], how='left')
This merges the target CC05hW_01
values back into the main DataFrame based on JP04a
and JP05a
.
3.12. Merge JP04a Values into Main DataFrame:
df = pd.merge(df, jp04a_rows, on='JP04a', how='left')
This merges the JP04a_GL02_03
and JP04a_GL02_03_month
values back into the main DataFrame based on JP04a
.
3.13. Update Main DataFrame with Correct Values:
df['JP04a_GL02_03'] = df['JP04a_GL02_03_value'].combine_first(df['JP04a_GL02_03'])
df['JP04a_GL02_03_month'] = df['JP04a_GL02_03_month_value'].combine_first(df['JP04a_GL02_03_month'])
df.drop(columns=['JP04a_GL02_03_value', 'JP04a_GL02_03_month_value'], inplace=True)
This updates the main DataFrame with the correct JP04a_GL02_03
and JP04a_GL02_03_month
values using the combine_first
method, which prioritizes non-null values from the first Series. The temporary columns used for merging are then dropped.
3.14. Print DataFrame After Merging and Updating:
print("\n3. DataFrame after merging and updating:")
print(df.head())
Printing the DataFrame after merging and updating allows us to verify that the data has been correctly merged and updated.
3. DataFrame after merging and updating:
JP04a JP05a JP12a BS04cc CC03dh_01 JP04a_GL02_01 ... JP01c_BS28_02 BS04cc_01 BS04cc_02 BS04cc_03 JP04a_GL02_03_month CC05hW_01_target
0 1 0 0 0 2021-02-01T13:55:00 273.0 ... NaN NaN NaN NaN 2021-04 <NA>
1 1 2 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-04 20392404
2 1 2 0 1 NaN NaN ... NaN 0.0 Common Department Department 2021-04 20392404
3 1 3 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-04 20392404
4 1 3 1 0 NaN NaN ... NaN NaN NaN NaN 2021-04 20392404
[5 rows x 25 columns]
3.15. Collect Unique Combinations:
unique_combinations = df[['JP05a_GL03_04', 'JP13a_BS16_01', 'JP13a_BS16_02']].drop_duplicates()
print("\n4. Unique Combinations:")
print(unique_combinations)
This collects all unique combinations of JP05a_GL03_04
, JP13a_BS16_01
, and JP13a_BS16_02
to facilitate filtering and aggregating data.
4. Unique Combinations:
JP05a_GL03_04 JP13a_BS16_01 JP13a_BS16_02
0 nan <NA> NaN
1 D 150 Receivable Note
3 C 152 Accounts Receivable
7 D 131 Regular Deposit
13 D 741 Payment Fees
31 D 152 Accounts Receivable
34 C 511 Sales
49 C 521 Sales Returns
3.16. Initialize Results DataFrame:
results = pd.DataFrame()
An empty DataFrame is initialized to store the results of the aggregation.
3.17. Iterate Over Each Unique Combination to Filter and Aggregate Data:
for _, row in unique_combinations.iterrows():
jp05a_gl03_04 = row['JP05a_GL03_04']
jp13a_bs16_01 = row['JP13a_BS16_01']
jp13a_bs16_02 = row['JP13a_BS16_02']
filtered_df = df[(df['JP05a_GL03_04'] == jp05a_gl03_04) &
(df['JP13a_BS16_01'] == jp13a_bs16_01) &
(df['JP13a_BS16_02'] == jp13a_bs16_02) &
df['CC05hW_01_target'].notna()]
print(f"\n5. Filtered DataFrame for combination ({jp05a_gl03_04}, {jp13a_bs16_01}, {jp13a_bs16_02}):")
print(filtered_df)
if not filtered_df.empty:
for _, filtered_row in filtered_df.iterrows():
date = filtered_row['JP04a_GL02_03'] if pd.notna(filtered_row['JP04a_GL02_03']) else 'N/A'
debit_credit_code = filtered_row['JP05a_GL03_04'] if pd.notna(filtered_row['JP05a_GL03_04']) else 'N/A'
ledger_account_number = filtered_row['JP13a_BS16_01'] if pd.notna(filtered_row['JP13a_BS16_01']) else 'N/A'
ledger_account_name = filtered_row['JP13a_BS16_02'] if pd.notna(filtered_row['JP13a_BS16_02']) else 'N/A'
amount = filtered_row['CC05hW_01_target'] if pd.notna(filtered_row['CC05hW_01_target']) else 'N/A'
print(f"Date: {date}, "
f"Debit/Credit Code: {debit_credit_code}, "
f"Ledger Account Number: {ledger_account_number}, "
f"Ledger Account Name: {ledger_account_name}, "
f"Amount: {amount}")
filtered_df = filtered_df.fillna('N/A')
result = filtered_df.groupby(
['JP04a_GL02_03_month', 'JP05a_GL03_04', 'JP13a_BS16_01', 'JP13a_BS16_02']
)['CC05hW_01_target'].sum().reset_index()
print("\n6. Intermediate Result:")
print(result)
results = pd.concat([results, result], ignore_index=True)
This section iterates over each unique combination, filters the DataFrame, prints intermediate results, and aggregates the sum of CC05hW_01_target
.
5. Filtered DataFrame for combination (nan, <NA>, nan):
Empty DataFrame
Columns: [JP04a, JP05a, JP12a, BS04cc, CC03dh_01, JP04a_GL02_01, JP04a_GL02_03, JP04a_GL02_04, GL05c_01, BS15c_01, JP05a_GL03_04, CC05hW_01, JP13a_BS16_01, JP13a_BS16_02, JP12a_01, JP12a_02, JP12a_03, CC05hZ_01, JP01c_BS28_01, JP01c_BS28_02, BS04cc_01, BS04cc_02, BS04cc_03, JP04a_GL02_03_month, CC05hW_01_target]
Index: []
5. Filtered DataFrame for combination (D, 150, Receivable Note):
JP04a JP05a JP12a BS04cc CC03dh_01 JP04a_GL02_01 JP04a_GL02_03 ... JP01c_BS28_01 JP01c_BS28_02 BS04cc_01 BS04cc_02 BS04cc_03 JP04a_GL02_03_month CC05hW_01_target
1 1 2 0 0 NaN NaN 2021-04-25 ... 00 Not Applicable NaN NaN NaN 2021-04 20392404
[1 rows x 25 columns]
Date: 2021-04-25 00:00:00, Debit/Credit Code: D, Ledger Account Number: 150, Ledger Account Name: Receivable Note, Amount: 20392404
6. Intermediate Result:
JP04a_GL02_03_month JP05a_GL03_04 JP13a_BS16_01 JP13a_BS16_02 CC05hW_01_target
0 2021-04 D 150 Receivable Note 20392404
5. Filtered DataFrame for combination (C, 152, Accounts Receivable):
JP04a JP05a JP12a BS04cc CC03dh_01 JP04a_GL02_01 ... JP01c_BS28_02 BS04cc_01 BS04cc_02 BS04cc_03 JP04a_GL02_03_month CC05hW_01_target
3 1 3 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-04 20392404
10 2 3 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-04 2186153
15 2 6 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-04 735
22 3 3 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-04 6636909
27 3 6 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-04 840
[5 rows x 25 columns]
Date: 2021-04-25 00:00:00, Debit/Credit Code: C, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 20392404
Date: 2021-04-25 00:00:00, Debit/Credit Code: C, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 2186153
Date: 2021-04-25 00:00:00, Debit/Credit Code: C, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 735
Date: 2021-04-25 00:00:00, Debit/Credit Code: C, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 6636909
Date: 2021-04-25 00:00:00, Debit/Credit Code: C, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 840
6. Intermediate Result:
JP04a_GL02_03_month JP05a_GL03_04 JP13a_BS16_01 JP13a_BS16_02 CC05hW_01_target
0 2021-04 C 152 Accounts Receivable 29217041
5. Filtered DataFrame for combination (D, 131, Regular Deposit):
JP04a JP05a JP12a BS04cc CC03dh_01 JP04a_GL02_01 ... JP01c_BS28_02 BS04cc_01 BS04cc_02 BS04cc_03 JP04a_GL02_03_month CC05hW_01_target
7 2 2 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-04 2186153
19 3 2 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-04 6636909
[2 rows x 25 columns]
Date: 2021-04-25 00:00:00, Debit/Credit Code: D, Ledger Account Number: 131, Ledger Account Name: Regular Deposit, Amount: 2186153
Date: 2021-04-25 00:00:00, Debit/Credit Code: D, Ledger Account Number: 131, Ledger Account Name: Regular Deposit, Amount: 6636909
6. Intermediate Result:
JP04a_GL02_03_month JP05a_GL03_04 JP13a_BS16_01 JP13a_BS16_02 CC05hW_01_target
0 2021-04 D 131 Regular Deposit 8823062
5. Filtered DataFrame for combination (D, 741, Payment Fees):
JP04a JP05a JP12a BS04cc CC03dh_01 JP04a_GL02_01 JP04a_GL02_03 ... JP01c_BS28_01 JP01c_BS28_02 BS04cc_01 BS04cc_02 BS04cc_03 JP04a_GL02_03_month CC05hW_01_target
13 2 5 0 0 NaN NaN 2021-04-25 ... Q5 Purchase 10% NaN NaN NaN 2021-04
735
25 3 5 0 0 NaN NaN 2021-04-25 ... Q5 Purchase 10% NaN NaN NaN 2021-04
840
[2 rows x 25 columns]
Date: 2021-04-25 00:00:00, Debit/Credit Code: D, Ledger Account Number: 741, Ledger Account Name: Payment Fees, Amount: 735
Date: 2021-04-25 00:00:00, Debit/Credit Code: D, Ledger Account Number: 741, Ledger Account Name: Payment Fees, Amount: 840
6. Intermediate Result:
JP04a_GL02_03_month JP05a_GL03_04 JP13a_BS16_01 JP13a_BS16_02 CC05hW_01_target
0 2021-04 D 741 Payment Fees 1575
5. Filtered DataFrame for combination (D, 152, Accounts Receivable):
JP04a JP05a JP12a BS04cc CC03dh_01 JP04a_GL02_01 ... JP01c_BS28_02 BS04cc_01 BS04cc_02 BS04cc_03 JP04a_GL02_03_month CC05hW_01_target
31 4 2 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-05 1245400
36 4 5 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-05 1329984
41 4 8 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-05 3964716
46 4 11 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-05 101952
51 4 14 0 0 NaN NaN ... Not Applicable NaN NaN NaN 2021-05 40800
[5 rows x 25 columns]
Date: 2021-05-31 00:00:00, Debit/Credit Code: D, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 1245400
Date: 2021-05-31 00:00:00, Debit/Credit Code: D, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 1329984
Date: 2021-05-31 00:00:00, Debit/Credit Code: D, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 3964716
Date: 2021-05-31 00:00:00, Debit/Credit Code: D, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 101952
Date: 2021-05-31 00:00:00, Debit/Credit Code: D, Ledger Account Number: 152, Ledger Account Name: Accounts Receivable, Amount: 40800
6. Intermediate Result:
JP04a_GL02_03_month JP05a_GL03_04 JP13a_BS16_01 JP13a_BS16_02 CC05hW_01_target
0 2021-05 D 152 Accounts Receivable 6682852
5. Filtered DataFrame for combination (C, 511, Sales):
JP04a JP05a JP12a BS04cc CC03dh_01 JP04a_GL02_01 JP04a_GL02_03 ... JP01c_BS28_01 JP01c_BS28_02 BS04cc_01 BS04cc_02 BS04cc_03 JP04a_GL02_03_month CC05hW_01_target
34 4 3 0 0 NaN NaN 2021-05-31 ... B5 Sales 10% NaN NaN NaN 2021-05 1245400
39 4 6 0 0 NaN NaN 2021-05-31 ... B5 Sales 10% NaN NaN NaN 2021-05 1329984
44 4 9 0 0 NaN NaN 2021-05-31 ... B5 Sales 10% NaN NaN NaN 2021-05 3964716
[3 rows x 25 columns]
Date: 2021-05-31 00:00:00, Debit/Credit Code: C, Ledger Account Number: 511, Ledger Account Name: Sales, Amount: 1245400
Date: 2021-05-31 00:00:00, Debit/Credit Code: C, Ledger Account Number: 511, Ledger Account Name: Sales, Amount: 1329984
Date: 2021-05-31 00:00:00, Debit/Credit Code: C, Ledger Account Number: 511, Ledger Account Name: Sales, Amount: 3964716
6. Intermediate Result:
JP04a_GL02_03_month JP05a_GL03_04 JP13a_BS16_01 JP13a_BS16_02 CC05hW_01_target
0 2021-05 C 511 Sales 6540100
5. Filtered DataFrame for combination (C, 521, Sales Returns):
JP04a JP05a JP12a BS04cc CC03dh_01 JP04a_GL02_01 JP04a_GL02_03 ... JP01c_BS28_01 JP01c_BS28_02 BS04cc_01 BS04cc_02 BS04cc_03 JP04a_GL02_03_month CC05hW_01_target
49 4 12 0 0 NaN NaN 2021-05-31 ... C5 Revenge 10% NaN NaN NaN 2021-05 101952
54 4 15 0 0 NaN NaN 2021-05-31 ... C5 Revenge 10% NaN NaN NaN 2021-05
40800
[2 rows x 25 columns]
Date: 2021-05-31 00:00:00, Debit/Credit Code: C, Ledger Account Number: 521, Ledger Account Name: Sales Returns, Amount: 101952
Date: 2021-05-31 00:00:00, Debit/Credit Code: C, Ledger Account Number: 521, Ledger Account Name: Sales Returns, Amount: 40800
6. Intermediate Result:
JP04a_GL02_03_month JP05a_GL03_04 JP13a_BS16_01 JP13a_BS16_02 CC05hW_01_target
0 2021-05 C 521 Sales Returns 142752
3.18. Display Final Results:
print("\n7. Final Results DataFrame with Sum by Combination:")
print(results)
Finally, we display the final results, showing the sum of CC05hW_01_target
for each combination of JP04a_GL02_03_month
, JP05a_GL03_04
, JP13a_BS16_01
, and JP13a_BS16_02
.
7. Final Results DataFrame with Sum by Combination:
JP04a_GL02_03_month JP05a_GL03_04 JP13a_BS16_01 JP13a_BS16_02 CC05hW_01_target
0 2021-04 D 150 Receivable Note 20392404
1 2021-04 C 152 Accounts Receivable 29217041
2 2021-04 D 131 Regular Deposit 8823062
3 2021-04 D 741 Payment Fees 1575
4 2021-05 D 152 Accounts Receivable 6682852
5 2021-05 C 511 Sales 6540100
6 2021-05 C 521 Sales Returns 142752
By following these steps, we ensure efficient data management in hierarchical tidy data, avoiding redundancy and accurately summarizing financial data.
4. Conclusion
Handling hierarchical tidy data requires careful consideration of how different levels of observation are reported. By merging necessary data into target rows, we can summarize amounts based on debit/credit codes, account numbers, and months, thereby avoiding redundancy and ensuring efficient data management.
The approach outlined in this post demonstrates how to handle such data using Python and pandas, providing a robust solution for financial reporting.
5. Hierarchical tidy data file
JP04a |
JP05a |
JP12a |
BS04cc |
CC03dh_01 |
JP04a_GL02_01 |
JP04a_GL02_03 |
JP04a_GL02_04 |
GL05c_01 |
BS15c_01 |
JP05a_GL03_04 |
CC05hW_01 |
JP13a_BS16_01 |
JP13a_BS16_02 |
JP12a_01 |
JP12a_02 |
JP12a_03 |
CC05hZ_01 |
JP01c_BS28_01 |
JP01c_BS28_02 |
BS04cc_01 |
BS04cc_02 |
BS04cc_03 |
1 |
0 |
0 |
0 |
2021-02-01T13:55:00 |
273.0 |
2021-04-25 |
April 25, Slip No. 201 Imai Department Store |
1.0 |
21.0 |
|||||||||||||
1 |
2 |
0 |
0 |
D |
20392404.0 |
150.0 |
Receivable Note |
0.0 |
00 |
Not Applicable |
||||||||||||
1 |
2 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
1 |
3 |
0 |
0 |
C |
20392404.0 |
152.0 |
Accounts Receivable |
0.0 |
00 |
Not Applicable |
||||||||||||
1 |
3 |
1 |
0 |
2.0 |
Imai Department Store |
Sub-Account |
||||||||||||||||
1 |
3 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
2 |
0 |
0 |
0 |
2021-02-01T13:55:00 |
274.0 |
2021-04-25 |
April 25 Paper NO202 Comprehensive Management Co., Ltd. |
1.0 |
21.0 |
|||||||||||||
2 |
2 |
0 |
0 |
D |
2186153.0 |
131.0 |
Regular Deposit |
0.0 |
00 |
Not Applicable |
||||||||||||
2 |
2 |
1 |
0 |
30.0 |
Mii Nagoya |
Sub-Account |
||||||||||||||||
2 |
2 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
2 |
3 |
0 |
0 |
C |
2186153.0 |
152.0 |
Accounts Receivable |
0.0 |
00 |
Not Applicable |
||||||||||||
2 |
3 |
1 |
0 |
7.0 |
Comprehensive management Co., Ltd. |
Sub-Account |
||||||||||||||||
2 |
3 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
2 |
5 |
0 |
0 |
D |
735.0 |
741.0 |
Payment Fees |
66.0 |
Q5 |
Purchase 10% |
||||||||||||
2 |
5 |
0 |
1 |
30.0 |
Nagoya Branch |
Department |
||||||||||||||||
2 |
6 |
0 |
0 |
C |
735.0 |
152.0 |
Accounts Receivable |
0.0 |
00 |
Not Applicable |
||||||||||||
2 |
6 |
1 |
0 |
7.0 |
Comprehensive management Co., Ltd. |
Sub-Account |
||||||||||||||||
2 |
6 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
3 |
0 |
0 |
0 |
2021-02-01T13:55:00 |
275.0 |
2021-04-25 |
April 25, Slip No. 203, Kobayashi Market Co., Ltd. |
1.0 |
21.0 |
|||||||||||||
3 |
2 |
0 |
0 |
D |
6636909.0 |
131.0 |
Regular Deposit |
0.0 |
00 |
Not Applicable |
||||||||||||
3 |
2 |
1 |
0 |
10.0 |
In front of Hokkaido Station |
Sub-Account |
||||||||||||||||
3 |
2 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
3 |
3 |
0 |
0 |
C |
6636909.0 |
152.0 |
Accounts Receivable |
0.0 |
00 |
Not Applicable |
||||||||||||
3 |
3 |
1 |
0 |
25.0 |
Kobayashi Market Co., Ltd. |
Sub-Account |
||||||||||||||||
3 |
3 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
3 |
5 |
0 |
0 |
D |
840.0 |
741.0 |
Payment Fees |
76.0 |
Q5 |
Purchase 10% |
||||||||||||
3 |
5 |
0 |
1 |
10.0 |
Sapporo Sales Office |
Department |
||||||||||||||||
3 |
6 |
0 |
0 |
C |
840.0 |
152.0 |
Accounts Receivable |
0.0 |
00 |
Not Applicable |
||||||||||||
3 |
6 |
1 |
0 |
25.0 |
Kobayashi Market Co., Ltd. |
Sub-Account |
||||||||||||||||
3 |
6 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
4 |
0 |
0 |
0 |
2021-02-01T13:55:00 |
114.0 |
2021-05-31 |
Sale for May 1st to May 31st |
1.0 |
21.0 |
|||||||||||||
4 |
2 |
0 |
0 |
D |
1245400.0 |
152.0 |
Accounts Receivable |
0.0 |
00 |
Not Applicable |
||||||||||||
4 |
2 |
1 |
0 |
5.0 |
AABC Co., Ltd. |
Sub-Account |
||||||||||||||||
4 |
2 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
4 |
3 |
0 |
0 |
C |
1245400.0 |
511.0 |
Sales |
113218.0 |
B5 |
Sales 10% |
||||||||||||
4 |
3 |
0 |
1 |
10.0 |
Sapporo Sales Office |
Department |
||||||||||||||||
4 |
5 |
0 |
0 |
D |
1329984.0 |
152.0 |
Accounts Receivable |
0.0 |
00 |
Not Applicable |
||||||||||||
4 |
5 |
1 |
0 |
14.0 |
Fujita Co., Ltd. |
Sub-Account |
||||||||||||||||
4 |
5 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
4 |
6 |
0 |
0 |
C |
1329984.0 |
511.0 |
Sales |
120907.0 |
B5 |
Sales 10% |
||||||||||||
4 |
6 |
0 |
1 |
10.0 |
Sapporo Sales Office |
Department |
||||||||||||||||
4 |
8 |
0 |
0 |
D |
3964716.0 |
152.0 |
Accounts Receivable |
0.0 |
00 |
Not Applicable |
||||||||||||
4 |
8 |
1 |
0 |
25.0 |
Kobayashi Market Co., Ltd. |
Sub-Account |
||||||||||||||||
4 |
8 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
4 |
9 |
0 |
0 |
C |
3964716.0 |
511.0 |
Sales |
360428.0 |
B5 |
Sales 10% |
||||||||||||
4 |
9 |
0 |
1 |
10.0 |
Sapporo Sales Office |
Department |
||||||||||||||||
4 |
11 |
0 |
0 |
D |
101952.0 |
152.0 |
Accounts Receivable |
0.0 |
00 |
Not Applicable |
||||||||||||
4 |
11 |
1 |
0 |
25.0 |
Kobayashi Market Co., Ltd. |
Sub-Account |
||||||||||||||||
4 |
11 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
4 |
12 |
0 |
0 |
C |
101952.0 |
521.0 |
Sales Returns |
9268.0 |
C5 |
Revenge 10% |
||||||||||||
4 |
12 |
0 |
1 |
10.0 |
Sapporo Sales Office |
Department |
||||||||||||||||
4 |
14 |
0 |
0 |
D |
40800.0 |
152.0 |
Accounts Receivable |
0.0 |
00 |
Not Applicable |
||||||||||||
4 |
14 |
1 |
0 |
5.0 |
AABC Co., Ltd. |
Sub-Account |
||||||||||||||||
4 |
14 |
0 |
1 |
0.0 |
Common Department |
Department |
||||||||||||||||
4 |
15 |
0 |
0 |
C |
40800.0 |
521.0 |
Sales Returns |
3709.0 |
C5 |
Revenge 10% |
||||||||||||
4 |
15 |
0 |
1 |
10.0 |
Sapporo Sales Office |
Department |
- 1. Characteristics of a hierarchical tidy data
- 2. Source Code and Explanation
- 3. Explanation of Each Step
- 3.1. Load and Read the CSV File:
- 3.2. Strip Leading/Trailing Spaces from Column Names:
- 3.3. Convert Relevant Columns to Appropriate Data Types:
- 3.4. Ensure JP05a_GL03_04 is Correctly Formatted as a String:
- 3.5. Convert JP04a_GL02_03 to Datetime:
- 3.6. Extract Month from JP04a_GL02_03:
- 3.7. Print Initial DataFrame:
- 3.8. Extract Target Rows:
- 3.9. Rename Target Column for Clarity:
- 3.10. Extract Rows with Only JP04a Value:
- 3.11. Merge Target Rows Back into Main DataFrame:
- 3.12. Merge JP04a Values into Main DataFrame:
- 3.13. Update Main DataFrame with Correct Values:
- 3.14. Print DataFrame After Merging and Updating:
- 3.15. Collect Unique Combinations:
- 3.16. Initialize Results DataFrame:
- 3.17. Iterate Over Each Unique Combination to Filter and Aggregate Data:
- 3.18. Display Final Results:
- 4. Conclusion
- 5. Hierarchical tidy data file