Lines Analysis
1. Mick Popp from the University of Florida sent twenty-six quantified image files on 12/13-14/04. The files are:
US22502637_251279810021_S01_A01.txt
US22502637_251279810022_S01_A01.txt
US22502637_251279810028_S01_A01.txt
US22502637_251279810029_S01_A01.txt
US22502637_251279810030_S01_A01.txt
US22502637_251279810031_S01_A01.txt
US22502637_251279810032_S01_A01.txt
US22502637_251279810033_S01_A01.txt
US22502637_251279810034_S01_A01.txt
US22502637_251279810035_S01_A01.txt
US22502637_251279810036_S01_A01.txt
US22502637_251279810001_S01_A01.txt
US22502637_251279810004_S01_A01.txt
US22502637_251279810005_S01_A01.txt
US22502637_251279810006_S01_A01.txt
US22502637_251279810007_S01_A01.txt
US22502637_251279810009_S01_A01.txt
US22502637_251279810010_S01_A01.txt
US22502637_251279810011_S01_A01.txt
US22502637_251279810012_S01_A01.txt
US22502637_251279810013_S01_A01.txt
US22502637_251279810014_S01_A01.txt
US22502637_251279810015_S01_A01.txt
US22502637_251279810016_S01_A01.txt
US22502637_251279810017_S01_A01.txt
US22502637_251279810018_S01_A01.txt
2. Damion Junk slightly adapted the above files so that they were compatible with Karthik's stacking program, without changing the actual quantified results. The altered files are saved under the same names with “_jdfix” appended to the end and are stored in the UFL_data_column_modified_to_run_in_karthiks_program directory. Using the information provided in Nuzhybcodes.xls, Karthik's data packaging programs created side-by-side and stacked files, saved as tabsep_26files_sbs.txt and tabsep_26files_stacked.txt, respectively, using file_list_for_karthiks_program.csv to add the slide, dye, treatment, and rep.
3. The SAS program makedata.sas imports the stacked data supplied by Damion into SAS. The slide numbers were translated from the automatic numbering system used by the stacking program to the numbers supplied by Sergey Nuzhdin in Nuzhybcodes.xls. The information in the design file Purdue_McIntyre-001a.csv was merged into the quantified results by row and col. Our negative controls were flagged with a 1 in the our_neg_con_flag column; all other probes have a 0 in that column. The variables sex, line, rep, and sex_line_rep (the concatenation of the previous three variables) were added according to Nuzhybcodes.xls. The annotation information was merged into the stacked data by sequence to add the probeuid column. The resulting file was called data_stacked_anno.sas7bdat.
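The record-level operations in step 3 can be sketched in Python as below. This is a hypothetical illustration, not a translation of makedata.sas: the function name build_stacked_records, the feature_id field used to recognize our negative controls, and the dictionary layouts of the lookup tables are all assumptions.

```python
# Hypothetical sketch of the record-level steps described for makedata.sas.
# Names not in the text (build_stacked_records, feature_id, the lookup
# dictionaries) are illustrative assumptions.


def build_stacked_records(stacked, slide_map, design, neg_control_ids,
                          sample_info, anno):
    """Return new records with slide numbers, design, sample and annotation info.

    stacked         -- list of dicts with 'slide', 'dye', 'row', 'col', ...
    slide_map       -- stacking-program slide number -> Nuzhybcodes.xls number
    design          -- (row, col) -> dict of design-file fields (incl. 'sequence')
    neg_control_ids -- set of hypothetical 'feature_id' values for our negative controls
    sample_info     -- (slide, dye) -> (sex, line, rep)
    anno            -- sequence -> probeuid
    """
    out = []
    for rec in stacked:
        r = dict(rec)
        # Translate the automatic slide number to the supplied slide number.
        r["slide"] = slide_map[rec["slide"]]
        # Merge the design-file fields by row and col.
        r.update(design[(rec["row"], rec["col"])])
        # 1 for our negative controls, 0 otherwise.
        r["our_neg_con_flag"] = 1 if r.get("feature_id") in neg_control_ids else 0
        # Add sex, line, rep and their plain concatenation.
        sex, line, rep = sample_info[(r["slide"], r["dye"])]
        r.update(sex=sex, line=line, rep=rep, sex_line_rep=f"{sex}{line}{rep}")
        # Merge the annotation by sequence to add probeuid (None if unmatched).
        r["probeuid"] = anno.get(r.get("sequence"))
        out.append(r)
    return out
```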
4. The program macro_find_off_anne.sas finds which probes are effectively off and removes them from the data set. This is accomplished by finding the 90th percentile of our negative controls per slide per dye. If a probe's signal was below the 90th percentile for at least 50% of the replicates of a line, then the probe was considered to be off for that line. The percentage of replicates for which a probe is off in a particular line is saved in the column percent_off_line, where line stands for the specific line. If the probe is off for all of the lines, then it is considered to be always off and is assigned a value of 1 in the gene_off column; otherwise, it is assigned a value of 0. Probes that are found to be off, including our negative controls, are removed from the data set, saved in off_list_anne.sas7bdat, and exported to off_list_anne.csv. The data for the probes that are found to be on are saved in anova_nooff_anne.sas7bdat.
5. The program normalize_all.sas normalized the data in anova_nooff_anne.sas7bdat. The data were normalized in a variety of ways:
bgsubsignal_quartile - the quartile of bgsubsignal within its slide/dye combination
log_bgsubsignal - the natural log transform of bgsubsignal
sqrt_bgsubsignal - the square root of bgsubsignal
log10_bgsubsignal - the log base 10 transform of bgsubsignal
bgsubsignal_med - bgsubsignal divided by the median of its slide/dye combination
log_bgsubsignal_med - the natural log transform of bgsubsignal_med
bgsubsignal_rank - the rank-normalized bgsubsignal, computed per slide per dye
The results are saved as anova_normalizations.sas7bdat.
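A minimal Python sketch of the off-probe rule in step 4 and of three of the step 5 normalizations (bgsubsignal_med, log10_bgsubsignal, bgsubsignal_rank) follows. It assumes one dict per observation with probeuid, line, slide, dye, our_neg_con_flag, and bgsubsignal fields. The percentile uses linear interpolation, which may not match the percentile definition used by macro_find_off_anne.sas; ties are not given averaged ranks; and non-positive bgsubsignal values are treated as missing for the log. All three choices are assumptions.

```python
import math
import statistics
from collections import defaultdict


def percentile(values, p):
    """Linear-interpolation percentile (assumption: SAS's definition may differ)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def find_off_probes(records, pct=0.90, off_percent=50.0):
    """Step 4 rule: percent_off per line and gene_off per probeuid."""
    # Cutoff = 90th percentile of our negative controls per slide/dye.
    neg = defaultdict(list)
    for r in records:
        if r["our_neg_con_flag"] == 1:
            neg[(r["slide"], r["dye"])].append(r["bgsubsignal"])
    cutoff = {key: percentile(v, pct) for key, v in neg.items()}

    # Count, per probe and line, the replicates that fall below the cutoff.
    below, total = defaultdict(int), defaultdict(int)
    lines = sorted({r["line"] for r in records})
    for r in records:
        key = (r["probeuid"], r["line"])
        total[key] += 1
        if r["bgsubsignal"] < cutoff[(r["slide"], r["dye"])]:
            below[key] += 1

    result = {}
    for probe in {r["probeuid"] for r in records}:
        percent_off = {}
        for line in lines:
            n = total.get((probe, line), 0)
            percent_off[line] = 100.0 * below[(probe, line)] / n if n else None
        # Off for a line if at least 50% of its replicates are below the cutoff;
        # gene_off = 1 only if the probe is off for every line.
        off_all = all(v is not None and v >= off_percent for v in percent_off.values())
        result[probe] = {"percent_off": percent_off, "gene_off": 1 if off_all else 0}
    return result


def normalize(records):
    """Add three of the step 5 normalizations, computed per slide/dye."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["slide"], r["dye"])].append(r)
    for grp in groups.values():
        med = statistics.median([r["bgsubsignal"] for r in grp])
        # Ranks 1..n within slide/dye; ties are broken by order, not averaged.
        ordered = sorted(grp, key=lambda rec: rec["bgsubsignal"])
        for rank, r in enumerate(ordered, start=1):
            r["bgsubsignal_rank"] = rank
        for r in grp:
            x = r["bgsubsignal"]
            r["bgsubsignal_med"] = x / med
            # Non-positive values have no log; treating them as missing is an assumption.
            r["log10_bgsubsignal"] = math.log10(x) if x > 0 else None
    return records
```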
6. The program looking_for_transforms.sas was used to determine which normalization and/or transformation technique would be best for the data in anova_nooff_anne.sas7bdat. It was determined that the log base 10 transformation was optimal and that line 78 may be more variable than the others.
7. The mean of each line for every probe was calculated from log10_bgsubsignal in expression_means_lines.sas and saved as exp_means_line.sas7bdat. The mean of each line by sex for every probe was calculated from log10_bgsubsignal in expression_means.sas. Within each sex, the line means were ranked for each probe and listed in order from least to greatest in the male_means_order and female_means_order columns, and the results were saved as line_means_order.sas7bdat.
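The per-sex ranking in step 7 can be sketched as below, assuming one dict per observation with probeuid, sex ("male" or "female"), line, and log10_bgsubsignal. Joining the ordered line names with spaces is a hypothetical choice; the actual format of the male_means_order and female_means_order columns is not documented here.

```python
from collections import defaultdict
from statistics import mean


def line_means_order(records):
    """Return (means, order): line means by probe/sex and the least-to-greatest order."""
    vals = defaultdict(list)
    for r in records:
        if r["log10_bgsubsignal"] is not None:
            vals[(r["probeuid"], r["sex"], r["line"])].append(r["log10_bgsubsignal"])
    means = {key: mean(v) for key, v in vals.items()}

    # Collect (mean, line) pairs per probe and sex, then sort by mean.
    by_probe_sex = defaultdict(list)
    for (probe, sex, line), m in means.items():
        by_probe_sex[(probe, sex)].append((m, line))

    order = defaultdict(dict)
    for (probe, sex), pairs in by_probe_sex.items():
        ordered_lines = [line for _, line in sorted(pairs)]
        # Hypothetical format: space-separated line names, least to greatest.
        order[probe][f"{sex}_means_order"] = " ".join(ordered_lines)
    return means, dict(order)
```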
8. The ANOVA was performed on anova_normalizations.sas7bdat in log10_anova.sas using the following model for each probeuid:
$$Y_{ijk} = \mu + d_i + l_j + s_k + (sl)_{jk} + \varepsilon_{ijk}$$
where Y = log10_bgsubsignal
μ = overall mean of the normalized values for that probeuid
d = dye and i = Cy3, Cy5
l = line and j = ore, 2b3, 09, 12, 15, 38, 70, 78
s = sex and k = male, female
(sl) = interaction effects of sex and line
ε = error
Each effect and interaction from the above model was saved and flagged with a 1 if its p-value was less than 0.05; otherwise, it was flagged with a 0. The names are as follows:
| Effect or interaction | Names with all lines (flag, p-value) | Names without line 78 (flag, p-value) |
|---|---|---|
| Dye | dye_log10_flag, pdye_log10 | dye_log10_no78_flag, pdye_log10_no78 |
| Line | line_log10_flag, pline_log10 | line_log10_no78_flag, pline_log10_no78 |
| Line by sex interaction | linebysex_log10_flag, plinebysex_log10 | linebysex_log10_no78_flag, plinebysex_log10_no78 |
| Sex | sex_log10_flag, psex_log10 | sex_log10_no78_flag, psex_log10_no78 |
Several tests were run on the residuals. The results were output as follows:
| Test or statistic | Name with all lines | Name without line 78 |
|---|---|---|
| Mean | mean_log10_bgsubsignal | mean_log10_bgsub_no78 |
| Median | median_log10_bgsubsignal | median_log10_bgsub_no78 |
| Sign statistic | msign_log10_bgsubsignal | msign_log10_bgsub_no78 |
| Test statistic for normality | normal_log10_bgsubsignal | normal_log10_bgsub_no78 |
| Flag for normality test (0 if p > 0.05, 1 if p ≤ 0.05) | norm_flag_log10_bgsub | norm_flag_log10_bgsub_no78 |
| Probability of a greater absolute value for the sign statistic | probm_log10_bgsubsignal | probm_log10_bgsub_no78 |
| Probability value for the test of normality | probn_log10_bgsubsignal | probn_log10_bgsub_no78 |
| Probability value for the signed rank test | probs_log10_bgsubsignal | probs_log10_bgsub_no78 |
| Probability value for the Student's t test | probt_log10_bgsubsignal | probt_log10_bgsub_no78 |
| Statistic for the Student's t test | t_log10_bgsubsignal | t_log10_bgsub_no78 |
| Signed rank statistic | signrank_log10_bgsubsignal | signrank_log10_bgsub_no78 |
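The residual names above appear to correspond to SAS PROC UNIVARIATE output keywords (MEAN, MEDIAN, MSIGN, NORMAL, PROBM, PROBN, PROBS, PROBT, T, SIGNRANK). As a sketch of what three of the location statistics measure, the Python below computes the sign statistic M = (n+ - n-)/2, the signed rank statistic S (the sum of the ranks of |residual| over the positive residuals, minus nt(nt + 1)/4, where nt is the number of nonzero residuals, with tied values given averaged ranks), and the one-sample t statistic. These follow the SAS definitions as we understand them; the normality test and the p-values are omitted, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def average_ranks(values):
    """1-based ranks, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def residual_location_stats(resid):
    """Sign statistic, signed rank statistic, and t statistic for residuals."""
    n_pos = sum(1 for r in resid if r > 0)
    n_neg = sum(1 for r in resid if r < 0)
    msign = (n_pos - n_neg) / 2

    nonzero = [r for r in resid if r != 0]
    ranks = average_ranks([abs(r) for r in nonzero])
    nt = len(nonzero)
    signrank = sum(rk for rk, r in zip(ranks, nonzero) if r > 0) - nt * (nt + 1) / 4

    t = mean(resid) / (stdev(resid) / math.sqrt(len(resid)))
    return {"msign": msign, "signrank": signrank, "t": t}
```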
Contrasts were run to test specific lines against each other. The p-values of the contrasts were saved as follows:
| Contrast | Name with all lines | Name without line 78 |
|---|---|---|
| Line ore vs. line 2b3 | contrast_orevs2b3 | contrast_orevs2b3_no78 |
| Parents vs. offspring | contrast_parvsoff | N/A |
| Line 09 vs. line 2b3 | contrast_09vs2b3 | contrast_09vs2b3_no78 |
| Line 12 vs. line 2b3 | contrast_12vs2b3 | contrast_12vs2b3_no78 |
| Line 15 vs. line 2b3 | contrast_15vs2b3 | contrast_15vs2b3_no78 |
| Line 38 vs. line 2b3 | contrast_38vs2b3 | contrast_38vs2b3_no78 |
| Line 70 vs. line 2b3 | contrast_70vs2b3 | contrast_70vs2b3_no78 |
| Line 78 vs. line 2b3 | contrast_78vs2b3 | N/A |
| Line 09 vs. line ore | contrast_09vsore | contrast_09vsore_no78 |
| Line 12 vs. line ore | contrast_12vsore | contrast_12vsore_no78 |
| Line 15 vs. line ore | contrast_15vsore | contrast_15vsore_no78 |
| Line 38 vs. line ore | contrast_38vsore | contrast_38vsore_no78 |
| Line 70 vs. line ore | contrast_70vsore | contrast_70vsore_no78 |
| Line 78 vs. line ore | contrast_78vsore | N/A |
Flags for the contrasts above were added. If the p-value for a contrast met a flat threshold of 0.05, then it was flagged, and the flag was saved under the name of its contrast with “_flag” added. For contrasts without line 78, “_flag” is placed before the “_no78” suffix: for instance, the flag for contrast_orevs2b3 is contrast_orevs2b3_flag, and the flag for contrast_orevs2b3_no78 is contrast_orevs2b3_flag_no78. The effects and interactions, the results of the residual tests, and the contrasts, as well as all of their respective flags, were merged into a single file and saved as sig_norm_flags.sas7bdat.
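The contrast coefficients are not documented in this section. The Python below is a hypothetical sketch of how the comparisons could be coded over the eight lines, together with the flag-naming convention described above. Giving each parent a weight of 1/2 and each offspring line a weight of -1/6 in the parents vs. offspring contrast, and reading "met" as p < 0.05, are assumptions; the helper names are illustrative.

```python
from fractions import Fraction

# Line order used for the coefficient vectors (an assumption).
LINES = ["ore", "2b3", "09", "12", "15", "38", "70", "78"]
PARENTS = {"ore", "2b3"}


def pairwise(a, b):
    """Coefficients for 'line a vs. line b'."""
    return [1 if line == a else -1 if line == b else 0 for line in LINES]


def parents_vs_offspring():
    """Average of the parents minus average of the offspring (assumed weighting)."""
    n_par = len(PARENTS)
    n_off = len(LINES) - n_par
    return [Fraction(1, n_par) if line in PARENTS else Fraction(-1, n_off)
            for line in LINES]


def flag_name(contrast_name):
    """contrast_x -> contrast_x_flag; contrast_x_no78 -> contrast_x_flag_no78."""
    suffix = "_no78"
    if contrast_name.endswith(suffix):
        return contrast_name[: -len(suffix)] + "_flag" + suffix
    return contrast_name + "_flag"


def flag(p, alpha=0.05):
    """1 if the contrast p-value meets the flat threshold (assumed strict <)."""
    return 1 if p < alpha else 0
```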
9. The program extremes_influential.sas checks whether extreme bgsubsignal values are correlated with normality problems or with Cook's D.
10. The program check_missings.sas checks results_all_geno2_all0123.sas7bdat for any correlation between missing bgsubsignal values and normality problems; no such correlation was found.
11. The FDRs for the p-values in sig_norm_flags.sas7bdat were calculated at three threshold levels by macro_fdr_orig.sas. If a probe meets the 0.05 threshold, it is designated red in the flag column; if it meets the 0.2 threshold but not 0.05, orange; and if it meets the 0.5 threshold but not 0.2, yellow. If a probe fails to meet any of those thresholds, it is designated tan. The flag columns are labeled as follows:
| p-value | Name of FDR column |
|---|---|
| pline_log10 | fdr_pline_log10 |
| pline_log10_no78 | fdr_pline_log10_no78 |
| psex_log10 | fdr_psex_log10 |
| psex_log10_no78 | fdr_psex_log10_no78 |
| pdye_log10 | fdr_pdye_log10 |
| pdye_log10_no78 | fdr_pdye_log10_no78 |
| plinebysex_log10 | fdr_plinebysex_log10 |
| plinebysex_log10_no78 | fdr_plinebysex_log10_no78 |
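The FDR method used by macro_fdr_orig.sas is not stated here. Assuming a Benjamini-Hochberg adjustment, the red/orange/yellow/tan coding could be sketched as follows; treating "meets the threshold" as q ≤ threshold is also an assumption, and the function names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fdr_color(q):
    """Map an adjusted p-value to the color used in the FDR flag columns."""
    if q <= 0.05:
        return "red"
    if q <= 0.2:
        return "orange"
    if q <= 0.5:
        return "yellow"
    return "tan"
```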
12. The expression means, ANOVA results and flags, the FDR flags, and the annotation information were merged into a single file in results_all_log10.sas. The files sig_norm_flags.sas7bdat, expression_means.sas7bdat, results_fdr.sas7bdat, and anno.sas7bdat were merged together by probeuid, saved as results_all_log10.sas7bdat, and exported to results_all_log10.csv.
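Merging the result tables by probeuid behaves like a full outer join. A minimal Python sketch, assuming each table is a dict keyed by probeuid: roughly as in a SAS match-merge, a column that appears in more than one table takes its value from the table listed last. The CSV export uses the standard library csv module; the function names are hypothetical.

```python
import csv
import io


def merge_by_probeuid(*tables):
    """Full outer join of {probeuid: {column: value}} tables."""
    merged = {}
    for table in tables:
        for probe, cols in table.items():
            # Later tables overwrite same-named columns from earlier ones.
            merged.setdefault(probe, {"probeuid": probe}).update(cols)
    return merged


def to_csv_text(merged):
    """Export the merged rows as CSV text, probeuid first, blanks for missing."""
    columns = {k for row in merged.values() for k in row} - {"probeuid"}
    fields = ["probeuid"] + sorted(columns)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for probe in sorted(merged):
        writer.writerow(merged[probe])
    return buf.getvalue()
```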
13. The results in results_all_log10.sas7bdat for the probes that Larry Harshman provided are subset in ladder_subset.sas and saved as ladder.sas7bdat.
Parental Lines Analysis
The parental lines analysis uses steps 1-6 from the lines analysis.
14. The program log10_anova_parents.sas subsets the parental lines, ore and 2b3, from W:\anne\data\SAS data\anova_normalizations.sas7bdat. The ANOVA was performed according to the following model for each probeuid:
$$Y_{ijk} = \mu + d_i + l_j + s_k + (sl)_{jk} + (dl)_{ij} + \varepsilon_{ijk}$$
where Y = log10_bgsubsignal
μ = overall mean of the normalized values for that probeuid
d = dye and i = Cy3, Cy5
l = line and j = ore, 2b3
s = sex and k = male, female
(sl) = interaction effects of sex and line
(dl) = interaction effects of dye and line
ε = error
Each effect and interaction from the above model was saved and flagged with a 1 if its p-value was less than 0.05; otherwise, it was flagged with a 0. The names are as follows:
| Effect or interaction | Names with all lines (flag, p-value) | Names without line 78 (flag, p-value) |
|---|---|---|
| Dye | dye_log10_flag, pdye_log10 | dye_log10_no78_flag, pdye_log10_no78 |
| Line | line_log10_flag, pline_log10 | line_log10_no78_flag, pline_log10_no78 |
| Line by sex interaction | linebysex_log10_flag, plinebysex_log10 | linebysex_log10_no78_flag, plinebysex_log10_no78 |
| Line by dye interaction | dyebyline_log10_flag, pdyebyline_log10 | dyebyline_log10_no78_flag, pdyebyline_log10_no78 |
| Sex | sex_log10_flag, psex_log10 | sex_log10_no78_flag, psex_log10_no78 |
Several tests were run on the residuals. The results were output as follows:
| Test or statistic | Name with all lines | Name without line 78 |
|---|---|---|
| Mean | mean_log10_bgsubsignal | mean_log10_bgsub_no78 |
| Median | median_log10_bgsubsignal | median_log10_bgsub_no78 |
| Sign statistic | msign_log10_bgsubsignal | msign_log10_bgsub_no78 |
| Test statistic for normality | normal_log10_bgsubsignal | normal_log10_bgsub_no78 |
| Flag for normality test (0 if p > 0.05, 1 if p ≤ 0.05) | norm_flag_log10_bgsub | norm_flag_log10_bgsub_no78 |
| Probability of a greater absolute value for the sign statistic | probm_log10_bgsubsignal | probm_log10_bgsub_no78 |
| Probability value for the test of normality | probn_log10_bgsubsignal | probn_log10_bgsub_no78 |
| Probability value for the signed rank test | probs_log10_bgsubsignal | probs_log10_bgsub_no78 |
| Probability value for the Student's t test | probt_log10_bgsubsignal | probt_log10_bgsub_no78 |
| Statistic for the Student's t test | t_log10_bgsubsignal | t_log10_bgsub_no78 |
| Signed rank statistic | signrank_log10_bgsubsignal | signrank_log10_bgsub_no78 |
The effects, interactions, tests, and statistics above are saved as sig_norm_flags_parents.sas7bdat.
15. The FDRs were calculated by macro_fdr_orig_parents.sas on sig_norm_flags_parents.sas7bdat. If a probe meets the 0.05 threshold, it is designated red in the flag column; if it meets the 0.2 threshold but not 0.05, orange; and if it meets the 0.5 threshold but not 0.2, yellow. If a probe fails to meet any of those thresholds, it is designated tan. The flag columns are labeled as follows:
| p-value | Name of FDR column |
|---|---|
| pline_log10 | fdr_pline_log10 |
| psex_log10 | fdr_psex_log10 |
| pdye_log10 | fdr_pdye_log10 |
| plinebysex_log10 | fdr_plinebysex_log10 |
| pdyebyline_log10 | fdr_pdyebyline_log10 |
The results were saved as results_fdr_parents.sas7bdat.
16. The expression means, ANOVA results and flags, the FDR flags, and the annotation information were merged into a single file in results_all_parents.sas. The files sig_norm_flags_parents.sas7bdat, expression_means.sas7bdat, results_fdr_parents.sas7bdat, and anno.sas7bdat were merged together by probeuid, saved as results_all_parents.sas7bdat, and exported to results_all_parents.csv.
Genotype2 Analysis
The genotype2 analysis uses steps 1-6 from the lines analysis.
17. The program troubleshoot_means.v4.sas merges the data set anova_normalizations.sas7bdat with data from genotypes.corrected4.csv, the file supplied to us by Anne Genissel. The program transforms the 8 line columns from genotypes.corrected4.csv into a single stacked column called genotype. If genotype is equal to “7” or “99”, then genotype1 is given the missing value “.”; otherwise, genotype1 is equal to genotype. Genotype2 distinguishes the offspring lines (09, 12, 15, 38, 70, and 78) from the parental lines (ore and 2b3) by adding 2 to the genotype1 value for the offspring lines. Genotype2 values for the parental lines are equal to their respective genotype1 values, and missing values of genotype1 remain missing in genotype2. The means for the four non-missing values of genotype2 (0, 1, 2, and 3) are calculated for each sex in separate columns and saved as follows:
mean_0f - mean of the reps when genotype2 = 0 for the females within a probeuid
mean_0m - mean of the reps when genotype2 = 0 for the males within a probeuid
mean_1f - mean of the reps when genotype2 = 1 for the females within a probeuid
mean_1m - mean of the reps when genotype2 = 1 for the males within a probeuid
mean_2f - mean of the reps when genotype2 = 2 for the females within a probeuid
mean_2m - mean of the reps when genotype2 = 2 for the males within a probeuid
mean_3f - mean of the reps when genotype2 = 3 for the females within a probeuid
mean_3m - mean of the reps when genotype2 = 3 for the males within a probeuid
The means were merged back into the full data set along with the genotype, genotype1, and genotype2 indicator variables and saved as norm_hope_means_v4.sas7bdat.
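The genotype recoding and the sex-specific means in step 17 can be sketched in Python as below. The key column name probeuid in genotypes.corrected4.csv, the use of None for SAS's missing value ".", and the helper names are assumptions.

```python
from collections import defaultdict
from statistics import mean

LINES = ["ore", "2b3", "09", "12", "15", "38", "70", "78"]
OFFSPRING = {"09", "12", "15", "38", "70", "78"}
MISSING_CODES = {"7", "99"}


def stack_genotypes(wide_rows, key="probeuid"):
    """Turn the 8 line columns into one 'genotype' column (key name is hypothetical)."""
    for row in wide_rows:
        for line in LINES:
            yield {key: row[key], "line": line, "genotype": row[line]}


def recode(genotype, line):
    """Return (genotype1, genotype2); None stands in for SAS's missing '.'."""
    if str(genotype) in MISSING_CODES:
        return None, None
    genotype1 = int(genotype)
    # Offspring lines are shifted by 2; parental lines keep genotype1.
    genotype2 = genotype1 + 2 if line in OFFSPRING else genotype1
    return genotype1, genotype2


def genotype2_means(records):
    """mean_<class><f|m> per probeuid, computed from log10_bgsubsignal."""
    vals = defaultdict(list)
    for r in records:
        if r["genotype2"] is not None and r["log10_bgsubsignal"] is not None:
            # sex 'female'/'male' -> suffix 'f'/'m'
            col = f"mean_{r['genotype2']}{r['sex'][0]}"
            vals[(r["probeuid"], col)].append(r["log10_bgsubsignal"])
    out = defaultdict(dict)
    for (probe, col), v in vals.items():
        out[probe][col] = mean(v)
    return dict(out)
```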
18. The program subset_means_v4.sas splits the data in norm_hope_means_v4.sas7bdat into subsets of probes according to which classes of genotype2 are present (not missing) for each probe:
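| Classes of genotype2 present | Number of probes | File name |
|---|---|---|
| 0123 | 6562 | |
| 012 | 4032 | |
| 013 | 705 | |
| 023 | 0 | |
| 123 | 0 | |
| 01 | 0 | |
| 02 | 0 | |
| 03 | 0 | |
| 12 | 0 | |
| 13 | 0 | |
| 23 | 0 | |
| 0 | 0 | |
| 1 | 0 | |
| 2 | 0 | |
| 3 | 0 | |
| none | 816 | |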
The means of the different classes within a sex were ranked. The classes were then listed in rank order in the columns female_means_order and male_means_order. These two columns were combined with the means calculated in the previous step and saved as means_order_v4.sas7bdat and exported to means_order_v4.csv.
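The grouping of probes by the genotype2 classes they contain (step 18) can be sketched as follows. Counting a class as present whenever at least one record for the probe has that non-missing genotype2 value is an assumption about how subset_means_v4.sas decides; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def classes_present(records):
    """Map each probeuid to a label such as '0123', '012', or 'none'."""
    present = defaultdict(set)
    for r in records:
        classes = present[r["probeuid"]]  # ensures every probe gets an entry
        if r["genotype2"] is not None:
            classes.add(r["genotype2"])
    return {p: "".join(str(c) for c in sorted(s)) or "none"
            for p, s in present.items()}


def count_by_class_label(records):
    """Number of probes per class label, as in the table above."""
    return Counter(classes_present(records).values())
```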
19. The ANOVA was performed on norm_hope_means_v4.sas7bdat in genotype2_anova_class0123_v4.sas using the following model for each probeuid:
$$Y_{ijk} = \mu + d_i + l_j + s_k + (sl)_{jk} + \varepsilon_{ijk}$$
where Y = log10_bgsubsignal
μ = overall mean of the normalized values for that probeuid
d = dye and i = Cy3, Cy5
l = genotype2 and j = 0, 1, 2, 3
s = sex and k = male, female
(sl) = interaction effects of sex and genotype2
ε = error
Each effect and interaction from the above model was saved and flagged with a 1 if its p-value was less than 0.05; otherwise, it was flagged with a 0. The names are as follows:
| Effect or interaction | Names with all lines (flag, p-value) | Names without line 78 (flag, p-value) |
|---|---|---|
| Dye | dye_log10_flag, pdye_log10 | dye_log10_no78_flag, pdye_log10_no78 |
| Genotype2 | genotype2_log10_flag, pgenotype2_log10 | genotype2_log10_no78_flag, pgenotype2_log10_no78 |
| Genotype2 by sex interaction | genotype2bysex_log10_flag, pgenotype2bysex_log10 | genotype2bysex_log10_no78_flag, pgenotype2bysex_log10_no78 |
| Sex | sex_log10_flag, psex_log10 | sex_log10_no78_flag, psex_log10_no78 |
Several tests were run on the residuals. The results were output as follows:
| Test or statistic | Name with all lines | Name without line 78 |
|---|---|---|
| Mean | mean_log10_bgsubsignal | mean_log10_bgsub_no78 |
| Median | median_log10_bgsubsignal | median_log10_bgsub_no78 |
| Sign statistic | msign_log10_bgsubsignal | msign_log10_bgsub_no78 |
| Test statistic for normality | normal_log10_bgsubsignal | normal_log10_bgsub_no78 |
| Flag for normality test (0 if p > 0.05, 1 if p ≤ 0.05) | norm_flag_log10_bgsub | norm_flag_log10_bgsub_no78 |
| Probability of a greater absolute value for the sign statistic | probm_log10_bgsubsignal | probm_log10_bgsub_no78 |
| Probability value for the test of normality | probn_log10_bgsubsignal | probn_log10_bgsub_no78 |
| Probability value for the signed rank test | probs_log10_bgsubsignal | probs_log10_bgsub_no78 |
| Probability value for the Student's t test | probt_log10_bgsubsignal | probt_log10_bgsub_no78 |
| Statistic for the Student's t test | t_log10_bgsubsignal | t_log10_bgsub_no78 |
| Signed rank statistic | signrank_log10_bgsubsignal | signrank_log10_bgsub_no78 |
Contrasts were run to test specific genotype2 classes against each other. The p-values of the contrasts were saved as follows:
| Contrast | Name with all lines | Name without line 78 |
|---|---|---|
| Line ore vs. line 2b3 | contrast_parents | contrast_parents_no78 |
| Offspring 0 vs. 1 | contrast_offspring | contrast_offspring_no78 |
| 0 vs. 1 | contrast_0vs1 | contrast_0vs1_no78 |
| Parents 0 vs. offspring 0 | contrast_p0vso0 | contrast_p0vso0_no78 |
| Parents 1 vs. offspring 1 | contrast_p1vso1 | contrast_p1vso1_no78 |
Flags for the contrasts above were added. If the p-value for a contrast met a flat threshold of 0.05, then it was flagged, and the flag was saved under the name of its contrast with “_flag” added, following the same convention as in step 8 (for instance, the flag for contrast_orevs2b3 is contrast_orevs2b3_flag, and the flag for contrast_orevs2b3_no78 is contrast_orevs2b3_flag_no78). The effects and interactions, the results of the residual tests, the appropriate means, and the contrasts, as well as all of their respective flags, were merged into a single file and saved as sig_norm_flags_geno2_0123_v4.sas7bdat.
20. The FDRs were calculated by macro_fdr_geno2.sas on sig_norm_flags_geno2_0123_v4.sas7bdat. If a probe meets the 0.05 threshold, it is designated red in the flag column; if it meets the 0.2 threshold but not 0.05, orange; and if it meets the 0.5 threshold but not 0.2, yellow. If a probe fails to meet any of those thresholds, it is designated tan. The flag columns are labeled as follows:
| p-value | Name of FDR column |
|---|---|
| pgenotype2_log10 | fdr_pgenotype2_log10 |
| pgenotype2_log10_no78 | fdr_pgenotype2_log10_no78 |
The results were saved as geno2_fdr_0123.sas7bdat.
21. The expression means, ANOVA results and flags, the FDR flags, and the annotation information were merged into a single file in results_all_parents.sas. The files sig_norm_flags_geno2_0123_v4.sas7bdat, means_order_v4.sas7bdat, results_fdr_geno2_0123.sas7bdat, and anno.sas7bdat were merged together by probeuid, saved as results_all_geno2_all0123.sas7bdat, and exported to results_all_geno2_all0123.csv.