Data Fitting with Bezier curve

Click For Summary

Discussion Overview

The discussion centers on the methodology for fitting a cubic Bezier curve to a set of data points, exploring various approaches to establish a least squares fitting technique. Participants delve into the mathematical formulation, challenges in determining parameter values, and alternative strategies for refining control points.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant outlines the parametric equations for cubic Bezier curves and expresses difficulty in setting up a least squares problem due to challenges in determining the value of t for given x values.
  • Another participant suggests two methods for initial guesses of t values: uniform distribution and chord-length parameterization, referencing a paper by Koichi Itoh.
  • A method involving the solution of a 5th degree equation for refining t values is proposed, along with an alternative approach using numerical gradient descent to minimize total square error.
  • One participant describes a constrained least squares problem using Lagrange multipliers, noting the tedious nature of the calculations but asserting that it leads to decoupled equations for the unknown coefficients.
  • There is mention of a stochastic gradient method yielding good results, with a focus on the need for a suitable starting point for control points.
  • Another participant notes that a previously shared link for a fitting program is no longer accessible and seeks alternatives.

Areas of Agreement / Disagreement

Participants express various methods and approaches to the problem, indicating that multiple competing views remain without a consensus on a single best method for fitting Bezier curves to data points.

Contextual Notes

Challenges include determining the appropriate values of t for data points, the complexity of the mathematical formulations, and the need for iterative refinement of control points. Some methods proposed may depend on specific assumptions or initial conditions.

Who May Find This Useful

This discussion may be of interest to those involved in computational geometry, data fitting, and curve modeling, particularly in contexts where Bezier curves are applicable.

hotvette
Homework Helper
Messages
1,001
Reaction score
11
Does anyone know how to fit a cubic Bezier curve to a given set of data points? If so, I'd appreciate some coaching on the methodology.

Bezier curves have separate equations for x and y in a parametric variable t that varies from 0 to 1:

x = at3 + bt2 + ct + d

y = et3 + ft2 + gt + h

the 8 unknowns are a function of 4 control points:

xc = (x0, x1, x2, x3)
yc = (y0, y1, y2, y3)

thus, for a given set of values for the 4 control points, all 8 coefficients are determined and the x,y values of the Bezier curve can be calculated.

But, the equations are de-coupled in the sense that a,b,c,d depend ONLY on the x-values of the 4 control points (xc) and e,f,g,h depend ONLY on the y-values of the 4 control points (yc).

I've been trying (in vain so far) to set up the curve fitting as a least squares problem. Part of the problem is the difficulty of finding the right value of t for a given x of a data point being fitted, since x is cubic in t. One approach is to solve the cubic equation, but I've found that Newton-Raphson works quite well.

It turns out that solving for the y-values of the 4 control points (yc) is a straightforward linear least squares problem because the equations are linear in yc. Thus, for an initial guess of the x-values of the control points (xc) the values of t for each data point can be determined (e.g. by Newton-Raphson), and in a single least squares calculation, the y-values of the control points (yc) can be determined to minimize the square error in vertical distance between the data points and the calculated curve.

The problem is how to get updated x-values of the control points (xc). :frown:

I've searched the internet extensively and though have found references to least squares solutions to Bezier curves, I've never found enough detail to figure out how to do it. :confused:

Any/all hints/tips would be appreciated. Perhaps an approach other than least squares is better. Thanks!
 
Last edited:
Physics news on Phys.org
Last edited by a moderator:
That link you posted doesn't exist anymore. Does anyone know where to find that code now?
 
Hello:

I'm not sure if you still have interest in this. I was looking into this problem and found this thread. Now that I've made up my mind on the subject, I thought I would share it here.

As stated, the problem is to find the t_i values for the input data points, so that the LS formula can be used to compute the control points.

Two alternatives for initial guesses:

1. Distribute them uniformly in [0,1] according to the number of data
points N: 0, 1/N, 2/N, ...
2. Use chord-length parameterization, which seems to involve a polynomial fit
up to point ith, then measuring the arc length
See Koichi Itoh: "A curve fitting algorithm for character fonts"
ELECTRONIC PUBLISHING, VOL. 6(3), 195–205 (SEPTEMBER 1993)

Now on how to refine the t_i values.

In Mr Itoh's paper, the suggested strategy for updating t_i is:
solve the 5th degree equation in t_i (e.g., with Newton-Raphson):
[B(t)-P_i]B'(t) = 0
with B'(t) the tangent to the Bezier B() at point t
Then the new t_i values are used to generate the LS control points. Iterate

Or, an alternative approach to obtain refined control points can be used.
This deviates somewhat from the original problem, but it results in a good fit nevertheless.

One can treat the total square error as a function of the control points, and use the control points from the initial guess (obtained with either 1. or 2. above) to start a numerical gradient descent. There exists more than one choice of gradient descent.
For example, a signed stochastic gradient approximation iteration reads:

- Let the current control points be C = (x0, x1, x2, x3, y0, y1, y2, y3)
- Perturb them, e.g., with a single random perturbation vector we generate diametrical
candidate vectors:

Cp = C + mu*delta
Cm = C - mu*delta
where mu > 0 is a step size parameter
- Measure the square error for Cp and for Cm
For each data point:
* Compute t_i, e.g. with the cubic root or with Newton-Raphson.
This can be done with any coordinate of the data point
* Find the values B(t_i) and compute the total square error
- Update the control points:
If Cp gives smaller square error, then do C = Cp and iterate
Otherwise do C = Cm and iterate
The iterations go until maximum of iterations is reached or the
square error changes less than a prescribed threshold

You can also use more than two perturbations, e.g., generate 8 random perturbation
vectors and do the plus-minus perturbations for each
Then you need to evaluate the cost function 16 times, but the convergence
may take less iterations

Alternative stochastic methods in the same spirit include the
"random walk with direction exploitation" method, see e.g.,
S. Rao, Optimization: Theory and Applications

I've been using the sotchastic signed gradient method with good results.
The only problem was that I had to supply the starting point manually.
From now on, I will use the control points from the
LS estimate with uniform allocation as starting point

One last thing. I believe that the aforementioned dead link refers to a file in the
Matlab Central: "cubic Bezier least square fitting"

Thanks for the original post and good luck!



Eduardo
 
  • Like
Likes   Reactions: hunt_mat
It took a long time to figure out how to approach it, but I was able to solve the problem (with brief guidance from a statistics professor friend). Let's say that y = f(a,b,t) and x = g(c,d,t) and the data points are (xi,yi). The least squares problem can be stated as the following constrained problem:

min \sum(f-y_i)^2 s.t. g-x_i=0

Using the langrange multiplier method (one constraint for each data point) the math is straightforward though tedious and results in a set of decoupled equations for the unknown coefficients (a,b,c,d using the above notation) and the ti. Also, since the ti are unknown, a nonlinear approach is needed (i.e. linearizing the equations before taking partial derivatives and setting them equal to zero). The lagrange multipliers conveniently fall out and never have to be computed.

I've since convinced myself that orthogonal least squares makes more sense when using parametric equations since the curves can wrap (e.g. general conic like a rotated ellipse).
 
Last edited:

Similar threads

  • · Replies 9 ·
Replies
9
Views
4K
  • · Replies 6 ·
Replies
6
Views
6K
  • · Replies 14 ·
Replies
14
Views
4K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 8 ·
Replies
8
Views
3K
  • · Replies 12 ·
Replies
12
Views
4K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 11 ·
Replies
11
Views
3K
  • · Replies 10 ·
Replies
10
Views
3K
Replies
2
Views
2K