# Star Field Experiment

Updated: Apr 30

Purpose: To understand how the Chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies of a star field, using Python and Astropy.

1. Image data:

2. How does the Python-generated image compare with the original jpeg above? We are beginning to get an idea of how many stars are too dim for us to see in the first image, and another visual, such as a scatter plot, will show us many more star positions.

3. What differences are apparent between the scatter plot and the grey-scale image? The scatter plot reveals points of light too dim for us to see in the image data. There is more here than the eye can see, so using a larger sampling of data will improve our Chi-squared test.
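The idea that a scatter plot can pick up stars too dim to see can be sketched as a brightness threshold on the pixel values. This is only a minimal illustration on a made-up image: the array `image`, the helper `star_positions`, and both threshold values are assumptions, not the data or method used in the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 200x200 image: dark background, 10 bright stars, 90 faint stars.
image = np.zeros((200, 200))
flat = rng.choice(image.size, size=100, replace=False)  # 100 distinct pixels
rows, cols = np.unravel_index(flat, image.shape)
image[rows[:10], cols[:10]] = 200.0  # bright enough to see by eye
image[rows[10:], cols[10:]] = 5.0    # too dim to notice in the grey-scale image


def star_positions(img, threshold):
    """Return the x and y coordinates of every pixel brighter than threshold."""
    ys, xs = np.nonzero(img > threshold)
    return xs, ys


# A high threshold mimics what the eye picks out; a low one keeps the faint stars.
bright_x, bright_y = star_positions(image, threshold=100.0)
all_x, all_y = star_positions(image, threshold=1.0)

print(len(bright_x), "positions above the high threshold")
print(len(all_x), "positions above the low threshold")
```

The `all_x` and `all_y` arrays are what would be passed to a scatter plot, which draws every point at the same size regardless of brightness.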

Test the randomness of the star field:

1. Are there any important visual differences between the histogram of the image data and the histogram of the Python-generated star field?

The important visual difference is the large spike at the origin of the plot. This signifies that there are many stars close together; perhaps they are in binary systems.

2. What are your conclusions from the result of the Chi-square test?

The Chi-square value that I would expect from this difference in the histograms would be large, because the observations poorly match the expectation. A low Chi-square value would mean that the model matches the observations with high probability. (Green is the artificial star field. I accidentally left the data from the other star field on this histogram.)
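One common way to put a number on this, shown here as a hedged sketch and not necessarily the procedure used in this experiment, is to count stars in equal grid cells and compare the counts with the equal counts a random field would give. Both star fields below are made up, and `quadrat_chisquare` is a hypothetical helper name; `scipy.stats.chisquare` assumes equal expected counts when `f_exp` is not given.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(1)
n_stars = 400

# Artificial field: uniformly random positions in the unit square.
random_xy = rng.random((n_stars, 2))

# Hypothetical clumpy field: half of the stars squeezed into one corner.
clumpy_xy = random_xy.copy()
clumpy_xy[: n_stars // 2] *= 0.25


def quadrat_chisquare(xy, n_cells=4):
    """Count stars in an n_cells x n_cells grid and test against equal counts."""
    counts, _, _ = np.histogram2d(
        xy[:, 0], xy[:, 1], bins=n_cells, range=[[0, 1], [0, 1]]
    )
    # With no f_exp argument, chisquare compares against the mean count per cell.
    return chisquare(counts.ravel())


random_stat, random_p = quadrat_chisquare(random_xy)
clumpy_stat, clumpy_p = quadrat_chisquare(clumpy_xy)

print("random field: chi-square =", random_stat, " p =", random_p)
print("clumpy field: chi-square =", clumpy_stat, " p =", clumpy_p)
```

The clumpy field gives a much larger Chi-square value and a much smaller p-value, which is the signature of observations that poorly match a random expectation.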

Thought Questions:

1. What kinds of image processing may be appropriate for aesthetics vs. scientific analysis? If an image is intended for scientific use, what kind of information must accompany it? For example if you used the images produced above in a scientific paper, what would you have to tell the reader?

You’d need to tell the reader that the dimensions of the image are log-scaled, if that is not already stated in the plot legend. For aesthetics, color imaging should be used. For scientific analysis, grey-scale is probably better because it accentuates the contrast between the darkness of the background and the brightness of the stars. If the image is used in a scientific publication, the caption should say something about the extraction of color for the purpose of getting as many data points as possible. The more data you collect, the closer you can come to your predicted number. (This is like the two-dice example: the more rolls you make, the closer the observed frequencies come to the hypothesis.)

2. Explain what you are doing (what it means) when you compute a Chi-square value and look it up in a table or use scipy.stats.chisquare to determine a confidence.

When you compute the Chi-square value, you want it to be small, because a small value means your observations closely match what you expected. For each bin, you take the difference between the observed value and the expected value, square it, and divide by the expected value. Then you sum these fractional differences over all bins. The number of bins sets the degrees of freedom (usually the number of bins minus one), which you need in order to look up the confidence in a table. If your observations are shown statistically to be close to your expected values, it suggests you did a good job reducing random and systematic errors.
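The calculation can be written out by hand and checked against `scipy.stats.chisquare`. The counts below are invented for illustration (60 rolls of a single die, in the spirit of the dice example), and using `scipy.stats.chi2.sf` in place of a printed table is an assumption about how the lookup would be done.

```python
import numpy as np
from scipy.stats import chi2, chisquare

# Hypothetical data: how often each face came up in 60 rolls of one die.
observed = np.array([8, 9, 12, 11, 6, 14], dtype=float)
# A fair die predicts the same count for every face.
expected = np.full(observed.size, observed.sum() / observed.size)

# Chi-square by hand: sum over bins of (observed - expected)^2 / expected.
chi2_value = float(np.sum((observed - expected) ** 2 / expected))  # 4.2 here

# Degrees of freedom: number of bins minus one.
dof = observed.size - 1

# The "table lookup": probability of a value at least this large by chance.
p_value = float(chi2.sf(chi2_value, dof))

# The library call does both steps at once.
result = chisquare(observed, f_exp=expected)

print("by hand:", chi2_value, p_value)
print("scipy:  ", result.statistic, result.pvalue)
```

A large p-value means a Chi-square value this big happens often by chance, so the observations are consistent with the expectation; a very small p-value means they are not.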

3. Are you confident of your conclusion about the randomness or non-randomness of the star field in Part B? What would you do to increase your confidence in the result?

I’m confident that the Chi-square value would be large. To increase my confidence in the result, I would create as many randomly generated star fields as possible and see how closely they come to the observed values.
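Generating many random star fields can be sketched as a small simulation: compute the Chi-square value for each random field, then see where the observed value falls among them. Everything here is an assumption made for illustration, including the grid-count statistic, the helper `quadrat_chi2`, the star counts, and the made-up "observed" field with a tight clump.

```python
import numpy as np

rng = np.random.default_rng(2)


def quadrat_chi2(xy, n_cells=4):
    """Chi-square of grid-cell star counts against equal expected counts."""
    counts, _, _ = np.histogram2d(
        xy[:, 0], xy[:, 1], bins=n_cells, range=[[0, 1], [0, 1]]
    )
    expected = counts.sum() / counts.size
    return float(np.sum((counts - expected) ** 2 / expected))


n_stars, n_trials = 400, 500

# Chi-square values for many artificial, truly random star fields.
simulated = np.array(
    [quadrat_chi2(rng.random((n_stars, 2))) for _ in range(n_trials)]
)

# Hypothetical observed field: random, but with 100 stars in one tight clump.
observed_xy = rng.random((n_stars, 2))
observed_xy[:100] = 0.625 + 0.02 * rng.standard_normal((100, 2))
observed_chi2 = quadrat_chi2(observed_xy)

# Fraction of random fields that look at least as non-random as the observed one.
fraction_as_extreme = float(np.mean(simulated >= observed_chi2))

print("observed chi-square:", observed_chi2)
print("largest simulated chi-square:", simulated.max())
print("fraction as extreme:", fraction_as_extreme)
```

More random fields give a finer estimate of how rarely chance alone produces a Chi-square value as large as the observed one.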