How to smooth a line using Python - 4 Methods

What is Line Smoothing?

Line smoothing is a technique used to reduce noise and highlight trends in a line plot. A line plot is a graph that shows the relationship between two variables as a series of data points connected by a line. However, real-world data is often noisy, making it difficult to identify trends in the plot.

Smoothing the line can help to remove the noise and make the trends more visible.

There are many methods for smoothing a line, but in this article, we'll focus on four popular approaches: using a rolling window, using fewer observations, using a spline, and using savgol_filter() from the SciPy library.

1. Using a Rolling Window

One of the simplest methods for smoothing a line is to use a rolling window. This involves taking a window of a fixed size and sliding it along the data points in the plot. For each window position, the data points within the window are averaged to create a smoothed data point.
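To make the averaging step concrete, here is a minimal sketch of the same idea in plain NumPy. The helper name moving_average() and the tiny input array are made up for illustration; they are not part of any library.

```python
import numpy as np

def moving_average(values, window_size):
    """Average every run of `window_size` consecutive values."""
    # A kernel of equal weights that sum to 1 turns the convolution into a mean
    kernel = np.ones(window_size) / window_size
    # mode="valid" keeps only the positions where the window fits entirely
    return np.convolve(values, kernel, mode="valid")

y = np.array([1.0, 2.0, 6.0, 4.0, 2.0])

# Three windows fit in five points: [1, 2, 6], [2, 6, 4] and [6, 4, 2]
print(moving_average(y, 3))
```

The output is shorter than the input because no value is produced where the window would hang over the edge of the data.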

Here's an example script that demonstrates how to use a rolling window to smooth a line using the rolling() method from the Pandas library:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Generate some noisy data
data = pd.DataFrame({'x': range(100), 'y': np.random.normal(0, 0.1, 100)})

# Apply a rolling window to smooth the data
window_size = 10
data['y_smoothed'] = data['y'].rolling(window_size).mean()

# Plot the original data and the smoothed data
plt.plot(data['x'], data['y'], label='Original')
plt.plot(data['x'], data['y_smoothed'], label=f'Smoothed (window size = {window_size})')
plt.legend()
plt.show()

Figure: Using a rolling window

In this example, we generate some noisy data using np.random.normal(), and then we use the rolling() method from the Pandas library to apply a rolling window to the data. The window_size variable sets how many consecutive points are averaged in each window. By default the window is trailing, so the first window_size - 1 values of the smoothed series are NaN and the smoothed line starts later than the original.

Finally, we plot the original data and the smoothed data using plt.plot(). A larger window_size smooths more strongly but also flattens genuine peaks, so adjust it to achieve the desired level of smoothing for your data.
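If the gap at the start of the smoothed line is a problem, rolling() also accepts center=True and min_periods. The sketch below compares the default trailing window with a centered one; the sine-plus-noise series and the fixed random seed are assumptions chosen only to make the example repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only for repeatability
y = pd.Series(np.sin(np.linspace(0, 10, 100)) + rng.normal(0, 0.1, 100))

window_size = 10

# Default: trailing window, so the first window_size - 1 results are NaN
trailing = y.rolling(window_size).mean()

# Centered window; min_periods=1 lets partial windows at the edges produce a value
centered = y.rolling(window_size, center=True, min_periods=1).mean()

print(trailing.isna().sum())  # 9 missing values at the start
print(centered.isna().sum())  # 0 missing values
```

A centered window keeps the smoothed line aligned with the original instead of lagging behind it. The values near the edges are averaged over fewer points, so they are noisier than the rest.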

2. Using Fewer Observations

Another method for smoothing a line is to use fewer observations. This involves keeping only every n-th data point, so the plotted line has fewer rapid up-and-down movements. Note that this discards data rather than averaging it: each point that remains is exactly as noisy as before.

Here's an example script that demonstrates how to use fewer observations to smooth a line using iloc slicing from the Pandas library:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# DataFrame size
size = 100

# Generate some noisy data
data = pd.DataFrame({'x': range(size), 'y': np.random.normal(0, 0.1, size)})

# Resample the data to smooth it
sample_size = 10
data_resampled = data.iloc[::(size//sample_size)]

# Plot the original data and the smoothed data
plt.plot(data['x'], data['y'], label='Original')
plt.plot(data_resampled['x'], data_resampled['y'], label=f'Smoothed (sample size = {sample_size})')
plt.legend()
plt.show()

Figure: Using fewer observations

This script uses Pandas and Matplotlib to generate some noisy data, downsample it, and plot the original and downsampled data.

First, the script sets the size of the DataFrame to 100 and generates some noisy data using the np.random.normal() function. The pd.DataFrame() function is used to create a DataFrame with two columns, x and y.

Next, the sample size is set to 10, and the data is downsampled by keeping one row out of every size // sample_size rows (here, every 10th row) using iloc[] with :: slicing. The result is a new DataFrame with 10 rows instead of 100.

Finally, the script plots both DataFrames with plt.plot(), adds a legend with plt.legend(), and displays the plot with plt.show().
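A variation worth knowing: instead of keeping one row per block and throwing the rest away, you can average each block. This is a sketch under the same setup as the script above; the names step, block_id and data_binned are made up for illustration, and the fixed seed is only there for repeatability.

```python
import numpy as np
import pandas as pd

size = 100
sample_size = 10
step = size // sample_size  # number of rows in each block

rng = np.random.default_rng(0)  # fixed seed, only for repeatability
data = pd.DataFrame({'x': range(size), 'y': rng.normal(0, 0.1, size)})

# Label each row with its block number: 0 for rows 0-9, 1 for rows 10-19, ...
block_id = np.arange(size) // step

# Average x and y within each block
data_binned = data.groupby(block_id).mean()

print(len(data_binned))  # 10
print(data_binned.head())
```

Each point in data_binned is the mean of 10 original rows, so the noise is reduced as well as the number of points. Plotting data_binned['x'] against data_binned['y'] gives a line comparable to the downsampled one.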

3. Using a Spline

A spline is a piecewise polynomial curve. An interpolating spline passes exactly through every data point and draws a smooth curve between them. In Python, you can use the make_interp_spline() function from the SciPy library to create a spline interpolation of the data.

Here's an example script that demonstrates how to use a spline to smooth a line:

import numpy as np
from scipy.interpolate import make_interp_spline
import matplotlib.pyplot as plt

# Generate some noisy data
x = np.linspace(0, 10, 100)
y = np.sin(x) + np.random.normal(0, 0.1, 100)

# Create a spline interpolation of the data
spl = make_interp_spline(x, y)

# Create a new set of x values to evaluate the spline at
x_new = np.linspace(x.min(), x.max(), 300)

# Evaluate the spline at the new x values
y_smoothed = spl(x_new)

# Plot the original data and the smoothed data
plt.plot(x, y, label='Original')
plt.plot(x_new, y_smoothed, label='Smoothed')
plt.legend()
plt.show()

Figure: Using splines

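In this example, we generate some noisy data using np.sin() and np.random.normal(), and then we use make_interp_spline() to create a spline interpolation of the data. The make_interp_spline() function takes the original x and y data points; its optional k argument sets the degree of the spline and defaults to 3 (cubic).

Next, we create a new set of x values using np.linspace() to evaluate the spline at. We then use the spline object spl to evaluate the y values at the new x values using spl(x_new).

Finally, we plot the original data and the smoothed data using plt.plot(). The number of points in the x_new array only controls how finely the curve is drawn. Because the spline is an interpolating one, it still passes through every noisy data point: it rounds the corners between points but does not remove the noise. To actually reduce noise, you need a smoothing spline that is allowed to miss the data points, such as scipy.interpolate.UnivariateSpline with a positive smoothing factor s.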
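As a sketch of a spline that smooths rather than interpolates, the following uses scipy.interpolate.UnivariateSpline on the same kind of data. The choice s = n * sigma^2 is a rule-of-thumb starting point and relies on knowing the noise level, which we only do here because the data is synthetic; the fixed seed is also an assumption made for repeatability.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)  # fixed seed, only for repeatability
x = np.linspace(0, 10, 100)
noise_sd = 0.1
y = np.sin(x) + rng.normal(0, noise_sd, x.size)

# s is the smoothing factor: knots are added until the sum of squared
# residuals of the fit drops to about s. Larger s means a smoother curve.
spl = UnivariateSpline(x, y, k=3, s=x.size * noise_sd**2)

# Evaluate on a finer grid for plotting
x_new = np.linspace(x.min(), x.max(), 300)
y_smoothed = spl(x_new)

# For comparison, s=0 forces the spline through every point (interpolation)
interp = UnivariateSpline(x, y, k=3, s=0)

print(np.abs(interp(x) - y).max())  # essentially zero: the noise is reproduced
print(np.abs(spl(x) - y).max())     # clearly non-zero: the noise is not followed
```

You can plot x_new against y_smoothed exactly as in the script above. With real data the noise level is usually unknown, so s has to be tuned by eye or by some validation scheme.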

4. Using savgol_filter()

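savgol_filter() is a function from the SciPy library that applies a Savitzky-Golay filter: it fits a low-degree polynomial to the points in a sliding window by least squares and replaces the point at the center of the window with the value of that polynomial. Because the polynomial can follow curvature, this tends to preserve peaks better than a plain moving average. Here's an example script that demonstrates how to use savgol_filter() to smooth a line: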

import numpy as np
from scipy.signal import savgol_filter
import matplotlib.pyplot as plt

# Generate some noisy data
x = np.linspace(0, 10, 100)
y = np.sin(x) + np.random.normal(0, 0.1, 100)

# Apply a Savitzky-Golay filter to smooth the data
y_smoothed = savgol_filter(y, window_length=11, polyorder=2)

# Plot the original data and the smoothed data
plt.plot(x, y, label='Original')
plt.plot(x, y_smoothed, label='Smoothed')
plt.legend()
plt.show()

Figure: Using a Savitzky-Golay filter

In this example, we generate some noisy data using np.sin() and np.random.normal(), and then we use savgol_filter() to apply a Savitzky-Golay filter to the data. The window_length parameter specifies the number of points in the window used for smoothing, and the polyorder parameter specifies the order of the polynomial fitted in each window; polyorder must be less than window_length. In this case, we're using a window length of 11 and a polynomial order of 2.

Finally, we plot the original data and the smoothed data using plt.plot(). A longer window or a lower polynomial order smooths more aggressively, so you can adjust window_length and polyorder to achieve the desired level of smoothing for your data.
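One way to get a feel for window_length is to try several values on synthetic data, where the noise-free signal is known, and measure how far the result is from it. The sketch below does that with a root-mean-square error; the rmse() helper, the list of window lengths, and the fixed seed are all choices made for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)  # fixed seed, only for repeatability
x = np.linspace(0, 10, 100)
clean = np.sin(x)               # the signal we hope to recover
y = clean + rng.normal(0, 0.1, x.size)

def rmse(a, b):
    """Root-mean-square difference between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

print(f"noisy input:      RMSE = {rmse(y, clean):.4f}")

# Same polynomial order, increasingly long windows
for window_length in (5, 11, 21, 51):
    y_smoothed = savgol_filter(y, window_length=window_length, polyorder=2)
    print(f"window_length={window_length:>2}: RMSE = {rmse(y_smoothed, clean):.4f}")
```

A window that is too short leaves most of the noise in, while one that is too long starts to flatten the sine wave itself. With real data there is no clean reference, so this kind of sweep is mainly useful for building intuition.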

What is the best smoothing method?

There is no single "best" method for smoothing a line as it depends on the characteristics of the data and the specific goals of the analysis.

The rolling window method is simple and easy to implement, but it flattens sharp peaks, and it leaves gaps or less reliable values at the edges of the data, where a full window is not available.

Taking fewer observations is also simple, but it throws data away without averaging out any noise, so it may not be suitable for data that requires a high level of precision and accuracy.

Spline interpolation produces a visually smooth curve through the data points, but because it passes through every point it follows the noise too, and it can introduce artifacts such as overshoot between points. For noisy data, a smoothing spline is usually the better fit.

Savitzky-Golay filtering can be a good choice for data with a lot of noise where the shape of peaks matters, but the choice of window length and polynomial order can be a challenge and may require some trial and error.

In practice, it may be necessary to try out different methods and compare their results to determine the most appropriate approach for a given situation.
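Comparing methods can be done numerically when the true signal is known. The following sketch runs three of the approaches on the same synthetic sine wave and prints how far each result is from the noise-free curve. The parameter values, the fixed seed, and the use of UnivariateSpline as the spline variant are assumptions made for illustration; the ranking it prints applies to this toy signal only.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import UnivariateSpline
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)  # fixed seed, only for repeatability
x = np.linspace(0, 10, 100)
clean = np.sin(x)
noise_sd = 0.1
y = clean + rng.normal(0, noise_sd, x.size)

# Each entry maps a label to the smoothed values at the original x positions
candidates = {
    "rolling mean (centered, 10)": (
        pd.Series(y).rolling(10, center=True, min_periods=1).mean().to_numpy()
    ),
    "smoothing spline": UnivariateSpline(x, y, s=x.size * noise_sd**2)(x),
    "savgol (11, 2)": savgol_filter(y, window_length=11, polyorder=2),
}

for name, y_smoothed in candidates.items():
    # Root-mean-square distance from the noise-free signal
    rmse = np.sqrt(np.mean((y_smoothed - clean) ** 2))
    print(f"{name}: RMSE = {rmse:.4f}")
```

With real measurements there is no noise-free reference, so the comparison usually comes down to plotting the candidates over the raw data and checking which one keeps the features you care about.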
