Wednesday, January 10, 2018

Coughs and Sneezes


In the data set above, two of the 5-day measurements share the maximum number of students with a cold. Because the number of students with a cold was not measured between 20 and 25 days, I will assume that the maximum is at 23 days with 47 students with a cold; otherwise the modelling is difficult, because two days have the same number of cases. The maximum value of the model will be taken as 48, because a sine curve does not have a flat top and so must have a single peak; we therefore assume that the peak of the curve is 48 students.

To find out what the value is, you need to do the following sum. Pi here is the same number (about 3.1416) that is used for calculating the circumference and area of a circle, but in this case it is used for expressing angles in radians. Pi/2 radians corresponds to 90° on a sine curve. A sine curve looks like this, with Pi/2 radians at the bottom of the left-hand curve and Pi radians at the end on the right-hand side. 2Pi radians are 360°. In our sine curve, the graph does not go past Pi because, with the type of data that we are using, it is not possible to have a negative number of students with coughs and sneezes. The diagram above is a full sine curve, but for our data we will only be using part of it.
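As a rough check of how the peak position ties in with radians, the sketch below assumes that the model peaks at day 23 (the assumption made earlier) and uses the fact that a sine curve peaks at Pi/2 radians. The variable names are hypothetical, and this is only one plausible reading of the sum, not necessarily the calculation that was originally done.

```python
import math

# Assumption: the peak of the data falls at day 23 (as assumed earlier in the post).
peak_day = 23

# A sine curve reaches its maximum at Pi/2 radians, which is 90 degrees.
peak_angle = math.pi / 2
peak_angle_degrees = math.degrees(peak_angle)

# Coefficient of T that places the peak of sin(coefficient * T) at day 23.
coefficient = peak_angle / peak_day

print(round(peak_angle_degrees, 1))
print(round(coefficient, 4))
```

Rounded to two decimal places, the coefficient comes out as 0.07, which agrees with the value used in front of T in Model 1.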

Model 5, the trial and improvement model, was set up using cell referencing. When you reference a cell with a dollar sign, the reference does not change when you copy and paste the formula. The purpose of the dollar sign is to lock the cell reference, so that a copied formula still refers to the same cell; this is called an "Absolute Cell Reference". To lock both the column and the row you need two dollar signs: one before the column reference (the letter) and one before the row reference (the number), for example $B$2. Without the dollar sign it is a "Relative Cell Reference": if you copy such a formula down one row, the row referred to in the formula changes too.

This is useful when doing trial and improvement, because every copy of the formula reads its parameters from the same cells. Because I do not want a project a hundred pages long, I will only show one or two of the equations that I tried, together with their sum of the differences squared. This way, I can see which parameters need to change to get the best set of results. The values started at those of the first model (0.07, 23 and 25) and were changed until a combination was found where the sum of the differences squared was as close to 0 as I could get.
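The trial and improvement loop can be sketched in Python instead of a spreadsheet. This is a minimal illustration, not the spreadsheet that was actually used: the function names are hypothetical, the difference is assumed to be model value minus actual value, and the data is placeholder data generated from a known sine curve rather than the real measurements.

```python
import math

def model(t, amplitude, coefficient, offset):
    """Sine model of the form S = amplitude * sin(coefficient * T) + offset."""
    return amplitude * math.sin(coefficient * t) + offset

def fit_sums(days, actual, amplitude, coefficient, offset):
    """Return (sum of differences, sum of differences squared) for one parameter set.

    Assumption: each difference is the model value minus the actual value.
    """
    differences = [model(t, amplitude, coefficient, offset) - s
                   for t, s in zip(days, actual)]
    return sum(differences), sum(d * d for d in differences)

# Placeholder data only: generated from a known sine curve, NOT the real measurements.
days = list(range(0, 50, 5))
actual = [model(t, 20, 0.065, 26) for t in days]

# Start from the first model's parameters, then try an improved guess.
start = fit_sums(days, actual, 23, 0.07, 25)
improved = fit_sums(days, actual, 20, 0.065, 26)

print(start)
print(improved)
```

Each new guess is just another call to fit_sums with different parameters; the guess with the smaller sum of the differences squared is kept.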

Results of the Models

1. Model 1, with the equation S=23Sin(0.07T)+25, had a sum of the differences of -42.387279 and a sum of the differences squared of 370.2480622. The negative sum of the differences means that, on balance, the model's values are below the actual values. The sum of the differences squared is very high, which means that this equation is not very accurate.

2. Model 2, with the equation S=23Sin(0.07T+0.2)+25, had a sum of the differences of 53.201062 and a sum of the differences squared of 779.3462931. The positive sum of the differences means that, on balance, the model's values are above the actual values. The sum of the differences squared is more than double that of Model 1, which means that this equation is even less accurate than Model 1.

3. Model 3, with the equation S=0.0005T^3-0.0716T^2+2.4154T+22.551, had a sum of the differences of 19.5625 and a sum of the differences squared of 82.19396375. The positive sum of the differences means that, on balance, the model's values are above the actual values. The sum of the differences squared is much smaller than Model 1's, meaning that so far, Model 3 is the best model.

4. Model 4, with the two equations S=-0.0398T^2+1.8988T+23.792 and S=-0.9738T+74, had a sum of the differences of 0.675 and a sum of the differences squared of 17.838055. The very small positive sum of the differences means that the differences above and below the actual values almost cancel out, leaving the model very slightly above the actual values on balance. The sum of the differences squared is much smaller than Model 3's, meaning that so far, Model 4 is the best model.

5. Model 5, with the equation S=19Sin(0.0644T)+26.5, had a sum of the differences of 0.43399 and a sum of the differences squared of 16.81097. The very small positive sum of the differences means that the differences above and below the actual values almost cancel out, leaving the model very slightly above the actual values on balance. The sum of the differences squared is slightly smaller than Model 4's, meaning that Model 5 is the best model overall.
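To compare the five results at a glance, the short sketch below ranks the models by their sum of the differences squared. The numbers are the ones reported in the list above; the dictionary and variable names are only illustrative.

```python
# Sum of the differences squared for each model, as reported in the list above.
sum_squared = {
    "Model 1": 370.2480622,
    "Model 2": 779.3462931,
    "Model 3": 82.19396375,
    "Model 4": 17.838055,
    "Model 5": 16.81097,
}

# Sort from the smallest sum (best fit) to the largest (worst fit).
ranking = sorted(sum_squared, key=sum_squared.get)

for name in ranking:
    print(name, sum_squared[name])
```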

Looking at the results above, I can tell that the best fit comes where the parameters are not fixed and can be changed freely in an attempt to get the sum of the differences squared to equal 0. Realistically, reaching 0 is not possible, because there are not enough parameters in the model, and adding too many would result in confusion. Model 5 is the best model because its sum of the differences squared is the closest to 0. The sum of the differences on its own does not really tell me much: when it equals 0, either the model is a perfect fit or, more likely, the differences above and below the data cancel each other out.
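A small made-up example shows why the sum of the differences can be misleading. The differences below are hypothetical values chosen only to show the cancelling effect; they are not taken from any of the models.

```python
# Hypothetical differences (model minus actual), chosen only to show cancellation.
differences = [3, -3, 2, -2]

sum_of_differences = sum(differences)
sum_of_squares = sum(d * d for d in differences)

print(sum_of_differences)  # 0
print(sum_of_squares)      # 26
```

The sum of the differences is 0 even though no point fits exactly, whereas squaring makes every difference positive, so nothing can cancel.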

A trigonometrical model with its parameters taken straight from the data, as in Models 1 and 2, is not the best model type for this data, because this type is only really suitable when the data forms a perfect sine wave, and in this investigation it doesn't. This can be seen from the high sum of the differences and sum of the differences squared for those models. A single polynomial model, as in Model 3, is not the best model type either, for a similar reason: the data does not follow one polynomial curve closely, as its sum of the differences and sum of the differences squared show.

Using two equations, as in Model 4, is quite a good representation of the data and is probably the most accurate systematic way of finding an equation, because the 'trial and improvement' method is hard to justify mathematically: you can't really say how you got the values, apart from making a guess and then changing it. The 'trial and improvement' method gives the best fit but is not mathematically suitable for all types of data. It produces better results because the parameters are not tied directly to the data.
