How to format a column to numeric with Pandas
Sooner or later, you will probably end up with a column of integers that Pandas reads as strings.
To fix this problem, here are two solutions.
Using .to_numeric()
By applying the pd.to_numeric() function to a column, we can transform all of its strings into numbers.
import pandas as pd
# We create the example dataframe
df = pd.DataFrame({"col1" : ['1','2','3','4','5','6','7']})
df["col1"] = pd.to_numeric(df["col1"])
print(df["col1"])
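To confirm that the conversion actually happened, one option is to compare the column before and after the call. The snippet below is a minimal sketch that reuses the example DataFrame above; the variable names (is_numeric_before, is_numeric_after) are only illustrative.

```python
import pandas as pd

# Same example dataframe as above: numbers stored as strings
df = pd.DataFrame({"col1": ['1', '2', '3', '4', '5', '6', '7']})

# Before the conversion the column is not considered numeric
dtype_before = df["col1"].dtype
is_numeric_before = pd.api.types.is_numeric_dtype(df["col1"])

df["col1"] = pd.to_numeric(df["col1"])

# After the conversion the column has a numeric dtype
dtype_after = df["col1"].dtype
is_numeric_after = pd.api.types.is_numeric_dtype(df["col1"])

print(dtype_before, "->", dtype_after)
# Arithmetic now works on numbers instead of concatenating strings
print(df["col1"].sum())
```

Once the column is numeric, operations such as sum() behave as arithmetic rather than string concatenation.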
But sometimes this doesn't work.
Using .apply()
Let's take another example:
import pandas as pd
# We create the example dataframe
df = pd.DataFrame({"col1" : ['1','2','3','4','Unkown','6','7']})
We can see that the column contains an 'Unkown' string in this case.
Calling pd.to_numeric() on this column now raises a ValueError, because not every value is a string that represents a number. Pandas cannot parse the 'Unkown' string.
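To see the failure without stopping the script, the call can be wrapped in a try/except block. This is a minimal sketch; the error_message variable is only there for illustration, and the exact wording of the message may differ between Pandas versions.

```python
import pandas as pd

# Example dataframe with one value that is not a number
df = pd.DataFrame({"col1": ['1', '2', '3', '4', 'Unkown', '6', '7']})

error_message = None
try:
    # By default, pd.to_numeric() raises on values it cannot parse
    pd.to_numeric(df["col1"])
except ValueError as e:
    error_message = str(e)
    print("Conversion failed:", error_message)
```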
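To fix this problem, we can apply a custom function to each value with .apply(). The function below returns None for any value that cannot be converted to an integer.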
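import pandas as pd
def return_if_number(x):
    """Return x as an integer, or None if it cannot be converted."""
    try:
        return int(x)
    except (ValueError, TypeError) as e:
        # Print the reason the conversion failed, then mark the value as missing
        print(e)
        return None
# We create the example dataframe
df = pd.DataFrame({"col1" : ['1','2','3','4','Unkown','6','7']})
# Assign the result back, otherwise the column is left unchanged
df["col1"] = df["col1"].apply(return_if_number)
print(df["col1"])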
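As an alternative to a custom function (not one of the two solutions described above), pd.to_numeric() also accepts an errors parameter. With errors="coerce", values that cannot be parsed are replaced by NaN instead of raising an error. The sketch below assumes that turning the 'Unkown' entry into a missing value is acceptable for your data; n_missing is just an illustrative name.

```python
import pandas as pd

# Example dataframe with one value that is not a number
df = pd.DataFrame({"col1": ['1', '2', '3', '4', 'Unkown', '6', '7']})

# errors="coerce" replaces values that cannot be parsed with NaN
df["col1"] = pd.to_numeric(df["col1"], errors="coerce")
print(df["col1"])

# Count how many values could not be converted
n_missing = int(df["col1"].isna().sum())
print(n_missing)
```

Because NaN is a floating-point value, the resulting column will generally hold floats rather than integers, so it is worth checking the missing values before relying on the result.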
More on DataFrames
If you want to know more about DataFrames and Pandas, check out the other articles I wrote on the topic.