How to Create a New DataFrame from an Existing DataFrame in Python
Creating a new DataFrame from an existing one is a fundamental operation in data manipulation using Python’s pandas library.
In this article, we cover two methods:
- Using the = Assignment Operator
- Using the copy() Function
Both examples below start from the same sample DataFrame, old_df, and derive new_df from it.
Using = Assignment Operator
The = operator binds a new variable name to an existing DataFrame. No data is copied: both names refer to the same object in memory.
Here’s how to assign an existing DataFrame to a new variable with the = operator:
# Import pandas library
import pandas as pd
# Create dataframe
old_df = pd.DataFrame({
    'Date': ['01-03-2023', '01-03-2023', '01-03-2023', '01-03-2023', '02-03-2023', '02-03-2023'],
    'Product_Code': ['A-101', 'A-102', 'A-103', 'B-101', 'B-102', 'B-104'],
    'Product_Name': ['Laptop', 'Mobile', 'Printer', 'Keyboard', 'Scanner', 'Mouse'],
    'Price': [4500, 550, 250, 50, 350, 50],
    'Status': [1, 1, 1, 0, 1, 1]
})
# Create new dataframe
new_df = old_df
# Show new dataframe
print(new_df)
Output: 👇️
         Date  Product_Code  Product_Name  Price  Status
0  01-03-2023         A-101        Laptop   4500       1
1  01-03-2023         A-102        Mobile    550       1
2  01-03-2023         A-103       Printer    250       1
3  01-03-2023         B-101      Keyboard     50       0
4  02-03-2023         B-102       Scanner    350       1
5  02-03-2023         B-104         Mouse     50       1
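As the output shows, new_df prints the same content as old_df. That is because new_df is not a separate DataFrame: it is a second name for the same object.
Note: This method does not create an independent copy of the DataFrame. In-place changes made through new_df also show up in old_df, because both names reference the same object in memory. Rebinding new_df to a different object later does not affect old_df.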
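One way to confirm the shared reference is to compare object identity and then change a value through one of the names. The following is a minimal sketch that uses a reduced version of the sample data; the replacement price of 4000 is an arbitrary value chosen only for the demonstration.

```python
import pandas as pd

# Reduced version of the sample data, for illustration only
old_df = pd.DataFrame({
    'Product_Name': ['Laptop', 'Mobile', 'Printer'],
    'Price': [4500, 550, 250]
})

# Plain assignment: both names point to the same object
new_df = old_df
print(new_df is old_df)  # True

# An in-place change made through new_df...
new_df.loc[0, 'Price'] = 4000  # arbitrary illustrative value

# ...is visible through old_df as well
print(old_df.loc[0, 'Price'])  # 4000
```

The identity check with is returns True here because no second DataFrame was ever created.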
Using copy() Function
The copy() method returns a new DataFrame with its own copy of the data and index (deep=True is the default), so changes to the new DataFrame do not affect the original.
Here’s how to use the copy() function:
# Import pandas library
import pandas as pd
# Create dataframe
old_df = pd.DataFrame({
    'Date': ['01-03-2023', '01-03-2023', '01-03-2023', '01-03-2023', '02-03-2023', '02-03-2023'],
    'Product_Code': ['A-101', 'A-102', 'A-103', 'B-101', 'B-102', 'B-104'],
    'Product_Name': ['Laptop', 'Mobile', 'Printer', 'Keyboard', 'Scanner', 'Mouse'],
    'Price': [4500, 550, 250, 50, 350, 50],
    'Status': [1, 1, 1, 0, 1, 1]
})
# Create new dataframe
new_df = old_df.copy()
# Show new dataframe
print(new_df)
Output: 👇️
         Date  Product_Code  Product_Name  Price  Status
0  01-03-2023         A-101        Laptop   4500       1
1  01-03-2023         A-102        Mobile    550       1
2  01-03-2023         A-103       Printer    250       1
3  01-03-2023         B-101      Keyboard     50       0
4  02-03-2023         B-102       Scanner    350       1
5  02-03-2023         B-104         Mouse     50       1
The code above calls copy() on old_df to create new_df, then prints new_df to confirm that it contains the same data as the original.
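Note: With the default deep=True, new_df gets its own copy of the data and index, which protects the original DataFrame from unintended modifications. One caveat: if a column stores mutable Python objects such as lists, those objects are not copied recursively and remain shared between the two DataFrames.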
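The independence can be checked the same way as the shared reference: compare identity, modify the copy, and inspect the original. This is a minimal sketch on a reduced version of the sample data; the replacement price of 4000 is again an arbitrary illustrative value.

```python
import pandas as pd

# Reduced version of the sample data, for illustration only
old_df = pd.DataFrame({
    'Product_Name': ['Laptop', 'Mobile', 'Printer'],
    'Price': [4500, 550, 250]
})

# copy() defaults to deep=True, so the data is duplicated
new_df = old_df.copy()
print(new_df is old_df)  # False

# Modify only the copy
new_df.loc[0, 'Price'] = 4000  # arbitrary illustrative value

print(old_df.loc[0, 'Price'])  # 4500 (original unchanged)
print(new_df.loc[0, 'Price'])  # 4000
```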
Conclusion
To create a new DataFrame from an existing one:
- Use the = operator when you only need a second name for the same DataFrame and do not need an independent copy.
- Use the copy() function when the new DataFrame must be independent of the original, especially in pipelines or when the DataFrame will undergo modifications.
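As an illustration of the pipeline case, a transformation step can copy its input before modifying it, so the caller's DataFrame stays untouched. The sketch below is hypothetical: the function name add_discounted_price, the Discounted_Price column, and the 10% rate are assumptions made for this example and do not come from pandas or from the examples above.

```python
import pandas as pd

def add_discounted_price(df, rate=0.10):
    """Hypothetical pipeline step: return a new DataFrame with a discounted price column."""
    out = df.copy()  # work on an independent copy, not on the caller's object
    out['Discounted_Price'] = out['Price'] * (1 - rate)
    return out

# Reduced version of the sample data, for illustration only
old_df = pd.DataFrame({
    'Product_Name': ['Laptop', 'Mobile', 'Printer'],
    'Price': [4500, 550, 250]
})

new_df = add_discounted_price(old_df)

print(list(old_df.columns))  # ['Product_Name', 'Price']
print(list(new_df.columns))  # ['Product_Name', 'Price', 'Discounted_Price']
```

Without the copy() call inside the function, the new column would be added to old_df itself, because the parameter df would refer to the same object the caller passed in.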