Dealing with Missing Values in Python

The pandas library for Python offers several ways to deal with null values. Depending on your dataset, there will likely be a preferred method that (1) accurately represents your data and (2) preserves a large enough sample size for rigorous analysis.

Option 1: Remove the null columns

We can use the dropna function to remove columns that have either all null values or any null values. Be careful when dropping columns that have any null values: there may be cases where your remaining dataset has very little data left to analyze!

# Drop columns where every value is null
df = df.dropna(axis=1, how='all')
# Drop columns that contain at least one null value
df = df.dropna(axis=1, how='any')
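
To make the difference between the two settings concrete, here is a minimal sketch on a toy DataFrame. The column names and values are invented for illustration and are not from any real dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): "b" is entirely null, "c" is only partly null.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [np.nan, np.nan, np.nan],
    "c": [4.0, np.nan, 6.0],
})

# how="all" drops only the columns where every value is null.
dropped_all = df.dropna(axis=1, how="all")
print(list(dropped_all.columns))  # ['a', 'c']

# how="any" drops every column that contains at least one null.
dropped_any = df.dropna(axis=1, how="any")
print(list(dropped_any.columns))  # ['a']
```

Note that how="any" also discards column "c", even though only one of its values is missing.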

Option 2: Remove the null rows

Alternatively, we can use dropna to remove rows with all or any null values. This looks just like dropping columns, except the axis parameter is set to 0. Again, use discretion when dropping null rows to ensure your remaining results are representative of the larger set of data.
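
As a rough sketch of the row-wise version, the example below uses a small made-up DataFrame and compares row counts before and after, which is one simple way to check how much data each setting leaves behind.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): row 0 is complete, row 1 is entirely null,
# rows 2 and 3 are each missing one value.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [4.0, np.nan, np.nan, 8.0],
})

# axis=0 targets rows instead of columns.
rows_all = df.dropna(axis=0, how="all")  # drops only the fully null row
rows_any = df.dropna(axis=0, how="any")  # keeps only the complete row

# Compare row counts to see how much data each option preserves.
print(len(df), len(rows_all), len(rows_any))  # 4 3 1
```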

Option 3: Replace the null values

We can also pick a value that replaces the missing values. For this, we use the fillna function:

df[col] = df[col].fillna(value)

Here, value can be a static number (such as 0), or it can just as easily be a summary statistic that best represents your data, such as the column's median or mean.
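
A minimal sketch of the three fill choices mentioned above, using an invented "score" column (the name and numbers are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical) with two missing scores.
df = pd.DataFrame({"score": [10.0, np.nan, 30.0, 100.0, np.nan]})

# Fill with a static value.
static_fill = df["score"].fillna(0)

# Fill with the median; median() ignores nulls by default.
median_fill = df["score"].fillna(df["score"].median())

# Fill with the mean; mean() also ignores nulls by default.
mean_fill = df["score"].fillna(df["score"].mean())

print(median_fill.tolist())  # [10.0, 30.0, 30.0, 100.0, 30.0]
```

With a skewed column like this one, the median (30) and the mean (about 46.7) give noticeably different fills, so the choice depends on which better represents the data.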

How does your data team handle null values? Share your use cases below!

1 reply
  • There may be times when backfilling or using a static value isn't sufficient for handling null values. When the missing values are numeric, the interpolate function can be used!

    For example, let's say this is our data:
     

    We can use Python to fill in those three blank values with the following code:

     # method="quadratic" relies on SciPy being installed
     df["y"] = df["y"].interpolate(method="quadratic")
    

    This will give the following result:

     

    Pretty good! We can round the interpolated values by appending .round() to the end of the line:

     df["y"] = df["y"].interpolate(method="quadratic").round()
    

    Quadratic interpolation is just one of many ways the values can be interpolated. See the pandas documentation for more methods, including cubic and polynomial!
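
As a self-contained sketch of the interpolation idea in this reply, the example below uses invented values for a "y" column with three blanks. It uses the default linear method, which works without SciPy; this is a swapped-in method for illustration, not the quadratic one shown above.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): three missing values between two known points.
df = pd.DataFrame({"y": [1.0, np.nan, np.nan, np.nan, 9.0]})

# Linear interpolation spaces the missing values evenly between neighbors.
filled = df["y"].interpolate(method="linear")

print(filled.tolist())  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

If SciPy is installed, method="quadratic" can be substituted to fit a curve through the known points instead of straight lines.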
