Choosing the optimal curve from a discrete dataset

• MATLAB

Summary:

Problem with choosing an optimal curve based on integral, length and curvature when it is created from discrete data points rather than from an analytical function

Main Question or Discussion Point

Hello,

I'm currently working on an assignment that requires me to choose an optimal power-generation curve from data points generated by a script I wrote (attached for reference: TideHeight1s is the source data for the script, and the .txt file contains the code of the .m script).
The purpose is to present the potential for power generation from a tidal lagoon.
The power is therefore derived from the water discharge, which I calculate by iterating over a range of turbine-shaft counts and height differences between the lagoon water level and sea level. The results are discrete data points at 1-second intervals.

The problem I have is that this generates several thousand possible power-generation curves.
What I'm trying to do is choose the one that best balances maximum total energy (area under the graph) against delivering it for the longest possible time and most consistently (the graph with the lowest curvature).
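To make the first two quantities concrete, here is a minimal NumPy sketch (Python rather than MATLAB, purely for illustration) that computes the area under one sampled power curve with the trapezoidal rule, together with its non-zero operating time. It assumes the power samples are in watts at a fixed step `dt_s` (1 s here); the function name is hypothetical.

```python
import numpy as np

def energy_and_duration(power_w, dt_s=1.0):
    """Total energy (J) and operating time (s) of one sampled power curve.

    power_w: 1-D array of power samples in watts, one every dt_s seconds.
    """
    p = np.asarray(power_w, dtype=float)
    # Trapezoidal rule: area under the sampled curve (0 for fewer than 2 samples).
    energy_j = dt_s * (p.sum() - 0.5 * (p[0] + p[-1])) if p.size > 1 else 0.0
    # Operating time: number of samples with non-zero output times the step.
    duration_s = dt_s * np.count_nonzero(p > 0)
    return energy_j, duration_s
```

Dividing the energy by 1e9 gives GJ, the unit used for the results quoted in the thread.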

I have made attempts at that, as shown in the script itself: taking the sum of all points and multiplying it by the time they are non-zero, taking the mode and multiplying that by the time, and trying to include the deviation from the average to get a flatter graph (that doesn't really work, because it prefers the smallest graphs).

So basically, I don't know how to optimise for curvature based on discrete data.
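One way to get a curvature measure from discrete data is to replace the derivatives with finite differences. This sketch (Python/NumPy, hypothetical function names) computes the pointwise geometric curvature κ = |P''| / (1 + P'^2)^(3/2) with `np.gradient`, plus a simpler roughness score: the RMS second difference over the non-zero samples divided by their mean power. It assumes each curve has a single contiguous generation window.

```python
import numpy as np

def curvature(power_w, dt_s=1.0):
    """Pointwise curvature kappa = |P''| / (1 + P'^2)^(3/2) via finite differences."""
    p = np.asarray(power_w, dtype=float)
    dp = np.gradient(p, dt_s)     # central differences inside, one-sided at the ends
    d2p = np.gradient(dp, dt_s)   # second derivative estimate
    return np.abs(d2p) / (1.0 + dp**2) ** 1.5

def roughness(power_w, dt_s=1.0):
    """RMS second difference over the non-zero part, divided by its mean power.

    0 for a flat or straight-line curve; unchanged if all power values are rescaled.
    """
    p = np.asarray(power_w, dtype=float)
    on = p[p > 0]                 # assumes one contiguous generation window
    if on.size < 3:
        return np.inf             # too short to estimate a second difference
    d2 = np.diff(on, n=2) / dt_s**2
    return np.sqrt(np.mean(d2**2)) / on.mean()
```

Geometric curvature depends on the units of both axes (W against s), so the roughness score, which does not change when the power is rescaled, is usually easier to compare across curves.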

Are you saying that the data in TideHeight1s needs the curve fit? It looks very smooth for data. Rather than being data, which I would expect to have some spread, it looks like it was generated as a trig function.

Sorry, I now realise it wasn't clear. TideHeight1s is the source data for the .m script, whose code is in the attached .txt file. The data I want to optimise is generated by the script itself as a 3D array.
I'll edit it into the OP.

I can export the end data if you think it'll make more sense.

O.k., it might be better to simply post the data you want to fit, because your script uses some functions that I don't have, so it won't run. I.e.,

Matlab:
Undefined function or variable 'rmmissing'.

Since your question is really about the data, post the data in CSV or XLSX format.

Ok, can't do it right now, but I will in a few hours.

As to the function: I wrote the script in MATLAB R2018a, is that the issue? rmmissing just deletes NaN values from the matrix.

Dale
Mentor
I am not a MATLAB guy so I can't help with your code, but it sounds like you have a multivariate optimization problem, which I do know a bit about. You want to both maximize total energy and minimize variability. Is that correct?

Since the problem is about power generation (hydroelectricity), time is also a factor: delivering a lot of power over a short time gives a lot of total energy with low variability, but is not fit for purpose.

That is actually a problem I run into, mostly because I'm working with an ideal situation (in real life there would be engineering and/or economic constraints putting a ceiling on the power ~ discharge relation).
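A simple way to model such a ceiling is to clip each power curve at a rated output and track how much energy is thrown away. This is a sketch assuming the cap applies to the whole plant (for a per-turbine rating, pass the number of turbines times that rating); the function name is hypothetical.

```python
import numpy as np

def apply_rated_cap(power_w, rated_w, dt_s=1.0):
    """Clip a power curve at a rated (maximum) output and report the energy spilled."""
    p = np.asarray(power_w, dtype=float)
    capped = np.minimum(p, rated_w)                # never exceed the rating
    spilled_j = dt_s * float(np.sum(p - capped))   # rectangle rule, like a 1 s sum
    return capped, spilled_j
```

Clipping treats the excess as lost. A closer model would throttle the discharge instead, leaving that water in the lagoon for later, which would tend to lengthen and flatten the curve.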

The input data is close to, if not exactly, a sinusoid modeling a tide. The "water" is kept in the "lagoon" until an arbitrary height difference is reached. The power generated is a sum of cubes of the water discharged per "turbine shaft", which in turn is a function of the water level height difference.

That means if I wait until the very lowest point of the sinusoid and "let all the water out" through a large number of "turbine shafts" (I can go into the thousands if I wish, but my computer runs out of memory processing it), I generate a significant power spike with a lot of total energy, and it's relatively flat. But it's also very short.

Those were the "optimal" options my script produced when I took the area under each curve and divided it by its deviation from the average (to select for the flattest), but did not multiply it by its total time.

That's why in the OP I explained that I intended instead to optimise by maximising total energy and time while also minimising curvature.

I hope that sufficiently explains the problem as I see it. However, I'm open to suggestions if you have an idea how to formulate it more efficiently.

EDIT: I don't know much about fluid dynamics, so it only occurred to me now to check whether physics dictates some maximum discharge or water velocity that would put SOME constraint on the maximum power generated.

I'm sorry, I got back home and tried to export the data to either an xlsx or txt file, but the data consists of two 21601x40x50 arrays. Each file ends up being about 500 megabytes.
It's either that or 100 files of ~10 MB each. Either way it's about a gigabyte of data.
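For scale: 21601 × 40 × 50 = 43,202,000 values, which as 8-byte doubles is about 346 MB per array in memory, before any text formatting. Since only a few numbers per curve are needed to choose between curves, one option is to export a summary table with one row per (start height, turbine count) pair, i.e. 2000 rows. A NumPy sketch, assuming the power array is ordered (time, heights, turbines), contains no NaNs and uses 1 s steps; the function and column names are hypothetical:

```python
import numpy as np

def summarize_curves(power, heights_m, turbines, dt_s=1.0):
    """Collapse a (time, n_heights, n_turbines) power array (W) into one row per curve.

    Columns: start height (m), turbines, energy (GJ), operating time (s), peak (MW).
    Assumes NaNs have already been removed or replaced by 0.
    """
    power = np.asarray(power, dtype=float)
    energy_gj = dt_s * power.sum(axis=0) / 1e9              # rectangle rule
    duration_s = dt_s * np.count_nonzero(power > 0, axis=0)
    peak_mw = power.max(axis=0) / 1e6
    # Grids matching the (heights, turbines) layout of the metric arrays.
    h_grid, n_grid = np.meshgrid(heights_m, turbines, indexing="ij")
    cols = [h_grid, n_grid, energy_gj, duration_s, peak_mw]
    return np.column_stack([c.ravel() for c in cols])
```

`np.savetxt("summary.csv", table, delimiter=",", header="height_m,turbines,energy_GJ,duration_s,peak_MW", comments="")` would then write a small CSV that is easy to post.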

Dale
Mentor
So what are your optimization criteria and constraints? I.e. the set of quantities that you want to minimize or maximize and the set of quantities that you don’t want to either minimize or maximize but which must be within some range?
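One standard way to handle several criteria without committing to a single score up front is to keep only the Pareto-optimal (non-dominated) candidates: those for which no other curve is at least as good in every criterion and strictly better in at least one. Hard constraints, such as a minimum operating time, can be applied first as a mask. A minimal sketch with a hypothetical function name, taking one row of criteria per candidate curve:

```python
import numpy as np

def pareto_front(objectives):
    """Indices of non-dominated rows, treating every column as 'larger is better'.

    Negate any column you want to minimise (e.g. roughness) before calling.
    """
    obj = np.asarray(objectives, dtype=float)
    keep = []
    for i in range(obj.shape[0]):
        # Some row dominates row i if it is >= in every column and > in at least one.
        dominated = np.any(np.all(obj >= obj[i], axis=1) & np.any(obj > obj[i], axis=1))
        if not dominated:
            keep.append(i)
    return np.array(keep, dtype=int)
```

Passing energy, duration and negated roughness as three columns leaves a short list of curves, each a different trade-off; choosing among those is where the judgement about "fit for purpose" comes in.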

I'm not sure if I have the right words to put it in proper terms out of context, but perhaps I can illustrate it.

Below are two charts my script generates (the x-axis timesteps are 1 s).

This graph shows the modeled sea level vs. the water level in the lagoon basin. The lagoon level drops according to the water discharge (m^3/s) at each timestep. Discharge depends on the level difference and the number of "turbine shafts": a higher difference or more turbines gives more discharge, which also determines the slope.
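The original script is not reproduced in the thread, so this is only an illustration of this kind of model. It assumes textbook orifice flow through each shaft, Q = N · Cd · A · sqrt(2 g Δh), and lowers the lagoon level by the discharged volume divided by the lagoon surface area at each step. All names and parameters (`cd`, `shaft_area_m2`, `lagoon_area_m2`) are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def simulate_ebb_release(sea_m, lagoon0_m, start_head_m, n_turbines,
                         shaft_area_m2, lagoon_area_m2, cd=1.0, dt_s=1.0):
    """Lagoon level (m) and total discharge (m^3/s) for one release.

    Assumption: orifice flow through each shaft, Q = n * cd * A * sqrt(2 g dh).
    The lagoon is held until the head reaches start_head_m, then drains until
    the two levels meet; after that the gates stay shut.
    """
    sea = np.asarray(sea_m, dtype=float)
    lagoon = np.empty_like(sea)
    q = np.zeros_like(sea)
    level, state = float(lagoon0_m), "waiting"
    for k, s in enumerate(sea):
        dh = level - s                                  # head across the turbines
        if state == "waiting" and dh >= start_head_m:
            state = "open"                              # head reached: open sluices
        if state == "open":
            if dh > 0:
                q[k] = n_turbines * cd * shaft_area_m2 * np.sqrt(2 * G * dh)
                level -= q[k] * dt_s / lagoon_area_m2   # outflow lowers the lagoon
            else:
                state = "closed"                        # levels equal: stop
        lagoon[k] = level
    return lagoon, q
```

In this model the velocity through each shaft is at most sqrt(2 g Δh) (Torricelli's result), which is one physical limit of the kind asked about in the EDIT earlier in the thread.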

The starting points are determined by an arbitrary starting height difference, which is iterated from 0.1 m to (currently) 4 m in steps of 0.1 m. The number of turbines is also iterated.

From the value of each momentary discharge, I can calculate the velocity per turbine (the discharge divided by the number of turbines and by the cross-section of a single shaft, which is constant) at each timestep.

I have to calculate the velocity per turbine because Power ~ V^3. From the velocity per turbine I calculate the power per turbine, then multiply by the number of turbines to get the total power at each timestep.
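Taking the power of each turbine as the kinetic power carried by the flow through its shaft, P = ½ ρ A v^3 (an assumption; the thread only states P ~ V^3), these steps look like this in NumPy. The seawater density, the efficiency factor and the function name are assumptions.

```python
import numpy as np

RHO_SEAWATER = 1025.0  # kg/m^3, a typical seawater density (assumption)

def power_from_discharge(q_m3s, n_turbines, shaft_area_m2, efficiency=1.0):
    """Plant power (W) from total discharge, via the velocity in each shaft.

    Assumes each turbine extracts efficiency * 0.5 * rho * A * v^3,
    the kinetic power carried by the flow through one shaft.
    """
    q = np.asarray(q_m3s, dtype=float)
    v = q / (n_turbines * shaft_area_m2)          # velocity per shaft, m/s
    p_one = efficiency * 0.5 * RHO_SEAWATER * shaft_area_m2 * v**3
    return n_turbines * p_one                     # total over all turbines
```

Substituting v = Q / (N A) gives P_total = ½ ρ Q^3 / (N^2 A^2): for a fixed total discharge, spreading it over more turbines lowers the power. Since the discharge itself grows with the number of turbines, as described above, both parameters have to be swept together.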

Because I iterate over 40 values of start heights and 50 turbine shafts, that gives me 2000 power curves (and the water graph corresponding to them).
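Holding all 2000 curves at once is what makes the arrays so large, so a sweep can instead score each curve as soon as it is generated and keep only the 40 × 50 table of scores. A sketch in which `power_curve(h, n)` and `score(p)` are hypothetical caller-supplied functions:

```python
import numpy as np

def sweep(power_curve, score, heights_m, turbines):
    """Evaluate score(power_curve(h, n)) on the full grid and return the best (h, n).

    Only one curve is in memory at a time, so the full
    (time x heights x turbines) array is never built.
    """
    scores = np.empty((len(heights_m), len(turbines)))
    for i, h in enumerate(heights_m):
        for j, n in enumerate(turbines):
            scores[i, j] = score(power_curve(h, n))
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return heights_m[i], turbines[j], scores

# 40 start heads 0.1 m ... 4.0 m and 50 turbine counts, as in the post.
HEIGHTS_M = np.arange(1, 41) / 10
TURBINES = np.arange(1, 51)
```

Building the start heads by dividing integers makes each value the nearest double to 0.1, 0.2, ..., 4.0, so comparisons such as `h == 2.9` behave as expected (repeated multiples like `0.1 * 29` do not).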

The different curves currently shown were chosen from the overall dataset based on the following criteria:

Sum P - sum of all values of Power multiplied by the square of the non-zero time (sum P * time^2)

Mode P - the mode of all non-zero values of Power multiplied by the square of the non-zero time (mode P * time^2)

Sum/Dev P - the sum divided by the deviation from the mean of all non-zero values, multiplied by time squared (sum P / dev P * time^2)

Mode/Dev P - the mode divided by the deviation from the mean, multiplied by time squared (mode P / dev P * time^2)

Mode More turbines is me checking an arbitrary, larger number of turbines at the head height found for Mode, to see what happens.
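The scores listed above can be written compactly for a single curve. This sketch follows the list's descriptions, with three assumptions noted in the code: "time" is the non-zero operating time, "dev" is the population standard deviation of the non-zero samples, and "mode" is the most frequent sample value. The function name is hypothetical.

```python
import numpy as np

def criteria(power_w, dt_s=1.0):
    """Sum, Mode, Sum/Dev and Mode/Dev scores of one power curve."""
    p = np.asarray(power_w, dtype=float)
    on = p[p > 0]                          # non-zero part of the curve
    if on.size == 0:
        return dict.fromkeys(("sum", "mode", "sum/dev", "mode/dev"), 0.0)
    t = dt_s * on.size                     # assumption: non-zero operating time, s
    values, counts = np.unique(on, return_counts=True)
    mode = values[np.argmax(counts)]       # most frequent value; ties -> smallest
    dev = on.std()                         # assumption: population std of non-zero part

    def per_dev(x):
        return x / dev * t**2 if dev > 0 else np.inf   # perfectly flat: no spread

    return {
        "sum": on.sum() * t**2,
        "mode": mode * t**2,
        "sum/dev": per_dev(on.sum()),
        "mode/dev": per_dev(mode),
    }
```

For continuous power values almost every sample is unique, so with the tie-breaking used here the mode tends to fall back to the smallest non-zero value; the median is a common, more stable alternative if a "typical power" figure is wanted.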

The details for them are as follows:
Optimal height to open sluice for Sum is 0.1 meters
Optimal number of turbines for Sum is 13
Total energy (GJ) 450.715059
Optimal height to open sluice for Mode is 2.9 meters
Optimal number of turbines for Mode is 13
Total energy (GJ) 574.033078
Optimal height to open sluice for Sum/Dev is 4.0 meters
Optimal number of turbines for Sum/Dev is 10
Total energy (GJ) 524.702127
Optimal height to open sluice for Mode/Dev is 2.2 meters
Optimal number of turbines for Mode/Dev is 17
Total energy (GJ) 564.558588
Optimal height to open sluice for Mode More turbines is 2.9 meters
Optimal number of turbines for Mode More turbines is 20
Total energy (GJ) 610.48443

Currently, the best setup seems to be Mode/Dev: while it does not collect the most total energy or have the highest peak power, it delivers reasonable power for the longest time.

What I'm currently trying to achieve is to flatten those curves somehow so that the power is delivered more consistently, but without losing too much total energy (although I do assume it will be lower), i.e. raise the initial and end power, even if peak power drops. I don't know how to do that; what I've done so far is the limit of my skill.
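If the goal is to reward curves that are raised at the start and end even at the cost of a lower peak, one simple flatness measure is the ratio of mean to peak power over the operating window, which equals 1 for a perfectly flat curve. This sketch combines it with energy and duration in a weighted sum of normalised metrics; the weights are a judgement call and the function names are hypothetical. It is a way of selecting among existing curves, not of reshaping them.

```python
import numpy as np

def flatness(power_w):
    """Mean / peak power over the operating window.

    1.0 for a perfectly flat curve, smaller for a spiky one; raising the start
    and end of a curve while lowering its peak increases it.
    """
    p = np.asarray(power_w, dtype=float)
    on = p[p > 0]
    return on.mean() / on.max() if on.size else 0.0

def weighted_choice(energy, duration, flat, weights=(1.0, 1.0, 1.0)):
    """Index of the candidate maximising a weighted sum of normalised metrics.

    Each metric is divided by its maximum over all candidates (assumed > 0)
    so that the weights are comparable.
    """
    metrics = [np.asarray(m, dtype=float) for m in (energy, duration, flat)]
    total = sum(w * m / m.max() for w, m in zip(weights, metrics))
    return int(np.argmax(total))
```

Increasing the flatness weight moves the choice towards flatter, longer curves at the expense of total energy.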

I can elaborate on further specifics if you require.

As a note to the previous post, I've noticed that Sum/Dev and Mode/Dev should be swapped, as I mixed them up in the code.

Also, on second thought, I'd rather lengthen and flatten the very last one, "Mode More turbines", than just flatten Sum/Dev, as it starts at higher power and already has less curvature, while ultimately dropping to the same level as Sum/Dev.