The discussion centers on optimizing a MySQL database by summing data into strings instead of rows, with the goal of reducing query times. It is noted that strings cannot be added together, and using strings for storage may actually increase space requirements due to NULL characters. Participants suggest using arrays and dynamic memory allocation for more efficient data handling, while also considering the processing of data outside the database before storage. The complexity of the dataset, which includes billions of records, raises concerns about performance, leading to suggestions for alternative data processing methods, including MATLAB. Ultimately, the consensus leans towards avoiding raw data storage in a database due to the sheer volume and complexity of the queries required.
I have a MySQL database, and the problem is that queries take far too much time. I want to optimize the database, and one way would be to store data in a string instead of in separate rows (the string would hold the replicates of the same condition).
Each entry in the string is a 4-second interval; to reduce the number of rows further I want to sum the data into 30-minute intervals. How can I sum strings? I'd like to use PHP for that. Should I push the replicates into an array?
NULL values may be present
To illustrate:
time 1: 0,1,0,1,1,1,0
time 2: 1,1,0,,0,1,1
You cannot add strings together; there is no data structure that supports arithmetic on strings, and fixed- and floating-point arithmetic cannot operate on string data directly.
Also, storing data in strings will not save time or space. In fact, it takes additional space for the terminating NULL character. A better way would be to use dynamic memory allocation and pointers.
Also, if a NULL value has the same effect as 0, why not replace it with a 0?
I'd rather write a C program to operate on the arrays.
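For illustration, here is the element-wise idea using the example strings from the first post, sketched in Python for brevity (the same logic applies in C or PHP): split each string into an array, sum position by position, and let an empty field (NULL) make the summed position NULL so the interval can later be excluded.
Code:
# Sum two replicate strings element-wise; empty fields stand for NULL.
time1 = "0,1,0,1,1,1,0"
time2 = "1,1,0,,0,1,1"

def split_values(s):
    return [None if v == "" else int(v) for v in s.split(",")]

summed = []
for a, b in zip(split_values(time1), split_values(time2)):
    summed.append(None if a is None or b is None else a + b)

print(summed)  # [1, 2, 0, None, 1, 2, 1]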
Thanks for your reply! It would save space in that I only have to declare the conditions once, as in:
A B C D E #1
A B C D E #2
A B C D E #3
A B C D E #4
would become
A B C D E #1#2#3#4 (SELECT will be a lot faster)
I've dumped everything in one table, since otherwise I'd end up with at least 350 tables that would slow down queries as well.
NULL won't have the same effect as 0, if there are NULL values in a 30-minute segment I need to exclude that segment due to missing data (I'm monitoring activity over time and sometimes test subjects (fruitflies) die).
Dynamic memory allocation... I did change the MySQL config settings between normal, large and huge database. I think that changes the way memory is handled? It didn't make a difference. Also memory is not limiting in the query, the CPU does go up considerably.
Pointers... I must admit I'm not sure what you mean by that. I've worked with arrays, but I've never really mastered them and I've never used them as a database field.
#4 Kholdstare:
You can never add strings anyway, so put your data into arrays and operate on those. Pointers help you do that efficiently, but I think you can do without them.
However, if the data is meant for permanent storage, why not do all the processing on it before putting it into SQL? You could write a small program with an input and an output file: dump the data into the input file with some predefined formatting (standard ones are CSV or tab-separated), run the program once, get the processed output file, and copy it into the SQL database. If you can write wrappers and have SQL libraries, you can even automate the process of copying the output data into the database.
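For example, a minimal sketch of such a preprocessing step, assuming a CSV input with one 4-second reading per line and an empty field for a NULL (the real input format isn't shown in this thread; Python here, but any language works):
Code:
# Sum 4-second readings into 30-minute bins; mark bins containing a NULL as incomplete.
import csv

READINGS_PER_BIN = 30 * 60 // 4  # 450 four-second readings per 30-minute interval

with open("input.csv", newline="") as fin, open("output.csv", "w", newline="") as fout:
    writer = csv.writer(fout)
    values = [row[0] for row in csv.reader(fin)]    # assumed: one reading per row
    for i in range(0, len(values), READINGS_PER_BIN):
        chunk = values[i:i + READINGS_PER_BIN]
        if "" in chunk or len(chunk) < READINGS_PER_BIN:
            writer.writerow(["NULL"])               # incomplete 30-minute segment
        else:
            writer.writerow([sum(int(v) for v in chunk)])
The output file can then be loaded into the database with LOAD DATA INFILE or a similar bulk import.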
Dynamic memory allocation has nothing to do with MySQL configuration settings. It is a programming style where you create variables when you need them and delete them from memory when you no longer need them. Every sophisticated program uses it. That's why Microsoft Word does not reserve RAM for 300 pages of data when you start it, but once you write 300 pages, the RAM used by Microsoft Word grows to that size.
And so when you encounter a NULL value you discard the whole set. Right? If you describe the operation you are supposed to do and the format of data thoroughly I can come up with a more robust solution.
And so when you encounter a NULL value you discard the whole set. Right?
Yes, a NULL value means the set is not complete.
If you describe the operation you are supposed to do and the format of data thoroughly I can come up with a more robust solution.
Well, it's complicated...
I have 8 columns that specify the experiment:
1. date of start experiment
2. monitor# (data is collected in separate devices)
3. date/time (data collection time, 4-s interval) <- 302400 time points (two weeks)
4. promotor
5. lesion
6. type
7. replicate#
8. light (on/off)
Column 9 is then "activity": the actual measurement that is taken. There are 302400 measurements for each experiment.
I have 275 distinct experiments (column 1, 4, 5, 6), over 4 monitors = 1100 conditions.
I have 2905 replicates, with 302400 timepoints = 966 billion records.
I'm no expert, but I think that's a lot...
I can't even begin to describe the queries that need to be run on it. After converting to a 30-minute time interval, it involves checking whether lights are on or off, finding experiments where the light has been off for two days, checking from those how many preceding days alternate light on/off, averaging those days over the time period to recreate an "average" day, appending the days of complete darkness, extracting sub-timepoints, and then doing the whole thing again for control experiments, with the condition that the control data must be collected within the same experiment as the mutant.
I've tried that on a database with 9.5 million records (1 monitor file) and it takes 20 minutes to complete. Ack.
I could store the activity in an array and use a separate table that defines what each array position means. For instance, the first array value would correspond to a certain promotor, lesion, type, and replicate#. That would introduce more NULLs, but hopefully doesn't slow things down?
I'm meeting with an information manager next week, hopefully that will help...
For the 4-second time intervals I convert the values to 0s and 1s (I've observed that anything higher is a measurement error). The larger time intervals would then hold integers, and the averages of those intervals would be decimals.
Considering that I'm approaching a trillion rows, I think I'll have to step away from the idea of putting all the raw data in a database.
#7 DavidSnider:
Monique said:
For the 4-second time intervals I convert the values to 0s and 1s (I've observed that anything higher is a measurement error). The larger time intervals would then hold integers, and the averages of those intervals would be decimals.
Considering that I'm approaching a trillion rows, I think I'll have to step away from the idea of putting all the raw data in a database.
Well, once you normalize the condition data, the size won't be that big of a deal. 966 billion bits is about 112 GB, which is totally manageable.
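A quick back-of-the-envelope check of that figure, assuming each activity value is stored as a single bit:
Code:
# 966 billion activity values at 1 bit each, expressed in binary gigabytes (GiB).
bits = 966e9
print(bits / 8 / 2**30)  # ~112.5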
(3) Issue the following command at the Linux bash shell prompt: sh myprog01.sh
See if file out01 now contains what you want.
Note that the above command skips any lines that do not contain the label "time" beginning in column 1. If your data does not really contain "time 1: " labels, as shown in post 1, then let us know what is literally in the input file.
I think the above currently will omit incomplete interval segments containing null. I will not be able to test it until 2013-05-27.
Hi nvn, thank you for your input. I'm not so familiar with your code, but I'll try to digest it.
I had a meeting with an information manager today; his suggestion was to step away from the idea of storing the information in a database for later retrieval, and instead to do all the processing on the raw files (similar to Kholdstare's suggestion), using MATLAB for it. I've never worked with MATLAB before, so I'd first have to teach myself how to handle data with it.
Does any of you have experience with it? My data is in 120 separate files, and the data structure does not allow for the creation of one big file. Can I program MATLAB in such a way that it will search for the relevant data structures in each file (both in the columns and rows) and put all the information together in one table to be analyzed?
#11 Kholdstare:
The first thing you need to do is de-uniqify the way the data is stored, to reduce the size requirement significantly. The trick is to use tagged filenames/foldernames to achieve this.
e.g. if I have two files in /home/Kholdstare/ (BTW, do you use Linux or Windows?)
Will produce 28May2013_monitor1_timecolumn_xyz1_xyz2.txt from exp1.txt
Compare the file sizes and see that the new one is significantly smaller than the original, because you don't have to store redundant information (like xyz1, xyz3, etc.) in every row. This is called de-uniqification: in exp1.txt every row entry was unique and could only be identified by the whole line, whereas in the new file row entries are not unique because they share the common filename/foldername tags. To uniquely identify a row from the new file, one has to use the tags in the filename/foldername.
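A rough sketch of that step (Python for brevity; the layout of exp1.txt and the number of shared leading columns are assumptions, since the example files aren't shown here):
Code:
# Strip the condition columns that repeat on every row and encode them once in the filename.
import csv

SHARED_COLS = 4  # assumed number of leading columns identical in every row (date, monitor, ...)

with open("exp1.txt", newline="") as f:
    rows = list(csv.reader(f))

tags = rows[0][:SHARED_COLS]            # shared condition values
out_name = "_".join(tags) + ".txt"      # e.g. 28May2013_monitor1_xyz1_xyz2.txt

with open(out_name, "w", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row[SHARED_COLS:])  # keep only the per-row (non-redundant) columns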
Monique: I am not sure I would recommend Matlab for this application. Do all rows and columns of your 120 files have the same format? Would you be able to post example lines from four or six of your files (preferably inside code tags)? And describe a given example of the columns and rows you want to search for in each file, and how you want the data put together into one table to be analyzed? Then we might be able to show you commands you could try, to see if they give you what you want or help you.
Ok, directly working from files would be nice. I have a program that can convert the 4-second file into any other interval. For simplicity let's look at the 30-minute interval now.
Columns A-J are always in the same format, with column B/C containing the date/time and column J containing the light information on(1)/off(0). The 32 columns after that contain different genotypes tested.
The 2nd and 3rd code blocks contain two monitors that collected data on the same days. Incidentally the genotypes look to be the same in both monitors, but this is not usual.
So this is how I'd like to pull out the information:
Does the file contain 2096_8012_0?
(possibly remove information when the activity remains 0 for >2 hours = convert to NULL)
Does the file contain two days where the lights are off? (Best done by looking at the rounded average light in column J from 0:00:00-23:30:30, which should be 0; see the sketch after this list.)
Count how many days come before that; for each time point take the average over 4 days (if there are 4 days, otherwise fewer).
Return the average of the time points of the day, also return the information of the two days of darkness (so that would yield one column representing 3 days)
Do that for each column that matches 2096_8012_0 (the averaging and returning of information)
Do that for all the files, make note of the first date of the experiment.
Now, for each date that contained 2096_8012_0, also search for 2096_2056_0.
Repeat the data extraction; this is the control.
For all the replicates of 2096_8012_0 and 2096_2056_0, perform an analysis
For each column sum the activity from 2:00:00 to 8:00:00 (day 1) and 2:00:00 to 8:00:00 (day 2)
Calculate the T-test statistic between the case and control
Calculate the average sum of activity for all time points of 2096_8012_0 and 2096_2056_0
Calculate the 95% confidence interval -> make graph that represents the three days
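To make the darkness check and the day-averaging concrete, here is a rough Python sketch; the column layout, the 48 thirty-minute rows per day, and the file name are assumptions based on the descriptions in this thread, and MATLAB or another tool would follow the same logic:
Code:
# Find the earliest two consecutive dark days and average the preceding (up to) four days.
import csv

ROWS_PER_DAY = 48
LIGHT_COL = 9        # column J (0-based index)
FIRST_DATA_COL = 10  # column K

with open("monitor1.csv", newline="") as f:
    rows = list(csv.reader(f))

# Split the file into whole days of 48 thirty-minute time points.
days = [rows[i:i + ROWS_PER_DAY] for i in range(0, len(rows), ROWS_PER_DAY)]
days = [d for d in days if len(d) == ROWS_PER_DAY]

def dark(day):
    # A dark day: the rounded average of the light column over the day is 0.
    return round(sum(float(r[LIGHT_COL]) for r in day) / ROWS_PER_DAY) == 0

# Earliest pair of consecutive days of complete darkness.
start = next(i for i in range(len(days) - 1) if dark(days[i]) and dark(days[i + 1]))

# Per time point, average each data column over the (up to) four days before the darkness.
baseline = days[max(0, start - 4):start]
n_cols = len(rows[0]) - FIRST_DATA_COL
avg_day = [[0.0] * n_cols for _ in range(ROWS_PER_DAY)]
for t in range(ROWS_PER_DAY):
    for c in range(n_cols):
        vals = [day[t][FIRST_DATA_COL + c] for day in baseline]
        # In the real analysis a NULL (empty field) should exclude the segment entirely;
        # here the averaged time point is simply marked as missing.
        avg_day[t][c] = sum(map(float, vals)) / len(vals) if "" not in vals else float("nan")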
Monique: What are the file names of the four files in post 15? Do the files have a naming convention? Secondly, I notice all four files in post 15 contain only one date. Is this always true? Or do some files contain more than one date?
Monique: I worked on it awhile, and I have it automatically extracting data for the given genotype. Over the next four days, as time permits, I next plan to implement processing of the lighting data. I plan to get back with you then, in about four days, to show you the code, so far, so you can start testing it, to see if it gives you what you want.
Does the file contain two days where the lights are off?
Monique: (1) Let's take a hypothetical example, and say you have four days of complete darkness, occurring in days 5, 6, 8, and 9. Do you want to find the first two days of darkness (days 5 and 6), or do you want to find the last two days of darkness (days 8 and 9)?
(2) Same question, except the four days of complete darkness are now days 5, 6, 7, and 8?
(3) Same question, except the four days of complete darkness are now days 5, 6, 12, and 13?
Monique said:
Return the average of the time points of the day, also return the information of the two days of darkness (so that would yield one column representing 3 days).
(4) What are "the time points of the day"? I.e., what is "time points" in "the time points of the day"?
(5) In the second and third row of the "one column representing 3 days," why do we return the average lighting for the two days of complete darkness, since we already know the two days of complete darkness have an average of 0 % lights on?
Monique: (1) Let's take a hypothetical example, and say you have four days of complete darkness, occurring in days 5, 6, 8, and 9. Do you want to find the first two days of darkness (days 5 and 6), or do you want to find the last two days of darkness (days 8 and 9)?
The first two days, so 5 and 6.
(2) Same question, except the four days of complete darkness are now days 5, 6, 7, and 8?
The same, 5 and 6.
(3) Same question, except the four days of complete darkness are now days 5, 6, 12, and 13?
The same, 5 and 6. Always the earliest two days.
(4) What are "the time points of the day"? I.e., what is "time points" in "the time points of the day"?
Time point 0:00:00 in the experiment is the moment the lights turn on; at 12:00:00 the lights turn off. This is not done with atomic precision, so if the lights are not off at 12:00:00, they might go off at 12:00:08.
(5) In the second and third row of the "one column representing 3 days," why do we return the average lighting for the two days of complete darkness, since we already know the two days of complete darkness have an average of 0 % lights on?
It's the activity (the actual data) that needs to be returned, column K and higher. So it's the average activity of the first normal days, then the activity of the next two days of darkness (that data can't be averaged, since there is a circadian shift in darkness).
Monique: I now have it computing the genotype 2096_8012_0 and 2096_2056_0 average activity for "day" 1, day 2, and day 3. For day 1, "day" is in quotes, because day 1 consists of four days, or less. I created a test case; and some intermediate results it returned are listed below. The first record gives a label of what the following record contains. The following record gives the average activity for the stated genotype, in each column that contained that genotype. Any column that did not contain that genotype contains -1. Therefore, you have, e.g., column 12, which is one column with three rows, representing three days (day 1, 2, and 3), for genotype 2096_8012_0. Day 2 and 3 are complete darkness, defined as the earliest-occurring 96 consecutive time points having lights off.
The reason it says "1 files generated" is because, in my test case, I used the four files you posted in post 15 (monitor1.csv, monitor2.csv, monitor3.csv, and monitor4.csv). I then fabricated a lot of fictitious records in file monitor1.csv, so it would be long enough for testing. Notice, it found two days of complete darkness in file monitor1.csv, but did not find two days of darkness in the other three files, because the other three files did not contain two days of darkness. It worked correctly, as it should, generating output for 1 file (monitor1.csv).
The control genotype (2096_2056_0) average activity is also listed above, which you can see in the labels.
The averages are computed as follows. For day 1, avg_day1 = summation(activity)/(192 or less). For day 2, avg_day2 = summation(activity)/48. For day 3, avg_day3 = summation(activity)/48. A day of complete darkness consists of 48 thirty-minute time points. The reason avg_day1 says "192 or less" in the denominator is because day 1 consists of a maximum of four days (192 thirty-minute time points), or less (if less than 192 time points occurred before the two days of complete darkness).
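As a minimal illustration of that rule (not nvn's actual code), the three averages could be computed like this, where each list holds the thirty-minute activity values for the corresponding day:
Code:
def day_averages(day1_values, day2_values, day3_values):
    # day1_values: up to 192 thirty-minute activity values (at most four baseline days)
    # day2_values, day3_values: 48 values each (one 24-hour day of complete darkness)
    avg_day1 = sum(day1_values) / len(day1_values)  # the "192 or less" denominator
    avg_day2 = sum(day2_values) / 48
    avg_day3 = sum(day3_values) / 48
    return avg_day1, avg_day2, avg_day3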
Monique said:
Now, for each date that contained 2096_8012_0, also search for 2096_2056_0.
Monique: For each date that contained 2096_8012_0, will 2096_2056_0 always be in the same files where 2096_8012_0 was found? Or can 2096_2056_0 occur (on that same date) in other files that do not contain 2096_8012_0?
Monique said:
For each column, sum the activity from 2:00:00 to 8:00:00 (day 1) and 2:00:00 to 8:00:00 (day 2).
I am working on this now. But could you elaborate on how your "day 1" and "day 2," here, compare to my day 1, 2, and 3, listed above?
Monique said:
For all the replicates of 2096_8012_0 and 2096_2056_0, perform an analysis.
For all occurrences of 2096_8012_0 and 2096_2056_0? Or only those found in my "days 1, 2, and 3," described above in paragraphs 1 and 4?
For each column, sum the activity from 2:00:00 to 8:00:00 (day 1) and 2:00:00 to 8:00:00 (day 2).
For now, I will assume "day 1" and "day 2" in your above sentence have the same meaning as "day 1" and "day 2" in the beginning of posts 15 and 21. I.e., "day 1" is the four days before the two days of complete darkness. "Day 2" is the first day of complete darkness (48 consecutive thirty-minute time points of darkness, 24 hours). Footnote: Day 3, which is the second 24-hour day of complete darkness, is omitted for the summation described in your above sentence.
Monique: Here are some intermediate results my test case automatically returns now, as described in posts 21 and 22. "Mode 1" denotes average activity for all values, for days 1, 2, and 3. "Mode 2" denotes average activity from time 2:00 to 8:00, for days 1 and 2. The summation of activity (even though I did not output it below) is just the average value multiplied by rcount. rcount is the number of thirty-minute time points used to compute the average.
This is just intermediate data I am obtaining. If this looks correct, and looks like what you want, then I guess the next step would be to compute the T-test statistic? If so, how would that be computed?
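For reference, if a standard two-sample T-test on the per-replicate activity sums is what is intended (the thread does not specify the exact variant), it could be computed like this in Python with SciPy; the sums below are hypothetical placeholders:
Code:
# Welch's two-sample t-test on per-replicate activity sums (case vs. control).
from scipy import stats

case_sums = [12.0, 15.5, 9.0, 14.0]      # hypothetical 2:00-8:00 sums for 2096_8012_0 replicates
control_sums = [20.0, 18.5, 22.0, 19.0]  # hypothetical sums for 2096_2056_0 replicates

t_stat, p_value = stats.ttest_ind(case_sums, control_sums, equal_var=False)
print(t_stat, p_value)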