How to Create a New DataFrame from an Existing DataFrame in Python

Creating a new DataFrame from an existing one is a fundamental operation in data manipulation using Python’s pandas library.

In this article, we cover two methods:

  1. Using the = Assignment Operator
  2. Using the copy() Function

The examples below apply each method to the same sample DataFrame.

Using the = Assignment Operator

The = operator binds a new variable name to an existing DataFrame. It does not copy any data: both names refer to the same object.

Here’s how you can create a new DataFrame using the = operator:

# Import pandas library
import pandas as pd

# Create dataframe
old_df = pd.DataFrame({
    'Date': ['01-03-2023', '01-03-2023', '01-03-2023', '01-03-2023', '02-03-2023', '02-03-2023'],
    'Product_Code': ['A-101', 'A-102', 'A-103', 'B-101', 'B-102', 'B-104'],
    'Product_Name': ['Laptop', 'Mobile', 'Printer', 'Keyboard', 'Scanner', 'Mouse'],
    'Price': [4500, 550, 250, 50, 350, 50],
    'Status': [1, 1, 1, 0, 1, 1]
})

# Create new dataframe
new_df = old_df

# Show new dataframe
print(new_df)

Output: 👇️

         Date Product_Code Product_Name  Price  Status
0  01-03-2023        A-101       Laptop   4500       1
1  01-03-2023        A-102       Mobile    550       1
2  01-03-2023        A-103      Printer    250       1
3  01-03-2023        B-101     Keyboard     50       0
4  02-03-2023        B-102      Scanner    350       1
5  02-03-2023        B-104        Mouse     50       1

As the output shows, new_df displays the same data as old_df, because the = operator gives the existing DataFrame a second name rather than creating a new one.

Note: This method does not create an independent copy of the DataFrame. Changes to new_df will also affect old_df, as they reference the same object in memory.
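The shared reference described in the note can be checked directly. The following is a minimal sketch that uses a smaller, hypothetical two-row DataFrame (not the full sample data above) and changes one price through new_df:

```python
import pandas as pd

# Small hypothetical DataFrame used only for this illustration
old_df = pd.DataFrame({
    'Product_Name': ['Laptop', 'Mobile'],
    'Price': [4500, 550]
})

# Plain assignment: both names refer to the same object
new_df = old_df

# Change a value through new_df
new_df.loc[0, 'Price'] = 4000

print(new_df is old_df)        # True
print(old_df.loc[0, 'Price'])  # 4000
```

Because new_df is old_df evaluates to True, the change made through new_df is also visible through old_df.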

Using the copy() Function

The copy() method creates an independent duplicate of the original DataFrame. By default it makes a deep copy (deep=True), so changes to the data or index of the new DataFrame do not affect the original.

Here’s how to use the copy() function:

# Import pandas library
import pandas as pd

# Create dataframe
old_df = pd.DataFrame({
    'Date': ['01-03-2023', '01-03-2023', '01-03-2023', '01-03-2023', '02-03-2023', '02-03-2023'],
    'Product_Code': ['A-101', 'A-102', 'A-103', 'B-101', 'B-102', 'B-104'],
    'Product_Name': ['Laptop', 'Mobile', 'Printer', 'Keyboard', 'Scanner', 'Mouse'],
    'Price': [4500, 550, 250, 50, 350, 50],
    'Status': [1, 1, 1, 0, 1, 1]
})

# Create new dataframe
new_df = old_df.copy()

# Show new dataframe
print(new_df)

Output: 👇️

         Date Product_Code Product_Name  Price  Status
0  01-03-2023        A-101       Laptop   4500       1
1  01-03-2023        A-102       Mobile    550       1
2  01-03-2023        A-103      Printer    250       1
3  01-03-2023        B-101     Keyboard     50       0
4  02-03-2023        B-102      Scanner    350       1
5  02-03-2023        B-104        Mouse     50       1

In the above code, the copy() function creates new_df from old_df, and printing new_df confirms that it holds the same data as the original.

Note: This method keeps old_df and new_df independent, protecting the original DataFrame from unintended modifications.
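To see this independence in practice, here is a minimal sketch that again uses a smaller, hypothetical two-row DataFrame. The Discounted column is an illustrative addition, not part of the sample data above:

```python
import pandas as pd

# Small hypothetical DataFrame used only for this illustration
old_df = pd.DataFrame({
    'Product_Name': ['Laptop', 'Mobile'],
    'Price': [4500, 550]
})

# copy() returns a separate object (deep=True is the default)
new_df = old_df.copy()

# Modify only the copy: change a value and add a column
new_df.loc[0, 'Price'] = 4000
new_df['Discounted'] = [True, False]

print(new_df is old_df)                # False
print(old_df.loc[0, 'Price'])          # 4500
print('Discounted' in old_df.columns)  # False
```

The original DataFrame keeps its price of 4500 and does not gain the new column.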

Conclusion

To create a new DataFrame from an existing one:

  • Use the = operator when you only need a second name for the same DataFrame and do not need an independent copy.
  • Use the copy() function when independence is critical, especially in pipelines or when the DataFrame will undergo modifications.
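As one possible illustration of the pipeline case, the sketch below copies its input before modifying it. The function name apply_discount, its percent parameter, and the two-row DataFrame are all hypothetical and chosen only for this example:

```python
import pandas as pd

def apply_discount(df, percent):
    """Hypothetical pipeline step: return a discounted copy, leaving df untouched."""
    out = df.copy()
    # Integer arithmetic keeps the Price column as whole numbers
    out['Price'] = out['Price'] * (100 - percent) // 100
    return out

# Small hypothetical DataFrame used only for this illustration
old_df = pd.DataFrame({
    'Product_Name': ['Laptop', 'Mobile'],
    'Price': [4500, 550]
})

discounted_df = apply_discount(old_df, 10)

print(discounted_df['Price'].tolist())  # [4050, 495]
print(old_df['Price'].tolist())         # [4500, 550]
```

Copying at the start of the function means callers can keep using old_df without worrying that a later step has changed it.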