Frachon et al. 2017 - NEE - Datasets
##DATASETS
Dataset1. Geographical positions of the 195 accessions
Dataset2. Climate data
			column names : year : from 1966 to 2013
					   6 climate variables : MAT (mean annual temperature), MWMT (mean warmest month temperature), MCMT (mean coldest month temperature),
											 MAP (mean annual precipitation),	DD0 (degree-days below 0°C),	DD5 (degree-days above 5°C)
Dataset3. Edaphic data
			column names : accession : see dataset 1 for ecotype_id match
					   soil : the three soils used in our experiment (A & B & C)
					   14 soil variables : see Supplementary information / Ecological characterization / soil characterization
Dataset4. Phenotypic data
			column names : treatment : A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua
						soil : three types of soils (A, B and C)
						comp: absence (woP) or presence (wP) of Poa annua
						block: five blocks within each treatment
						array: three arrays per block
						line & column: position of the wells in each array
						ecotype_id : see dataset 1 for accession match
						generation: 2002 (TOU-A1) & 2010 (TOU-A6)
						29 phenotypic traits: raw values
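
The six treatment codes above cross the three soils with the absence or presence of Poa annua. A minimal Python sketch of that mapping, assuming the codes appear in the data exactly as listed here (the helper name decode_treatment is hypothetical and not part of the deposited scripts):

```python
# Treatment codes as documented for Dataset4: soil (A, B, C) crossed with
# absence (woP) or presence (wP) of Poa annua.
TREATMENTS = {
    "A": ("A", "woP"),
    "B": ("A", "wP"),
    "C": ("B", "woP"),
    "D": ("B", "wP"),
    "E": ("C", "woP"),
    "F": ("C", "wP"),
}


def decode_treatment(code):
    """Return (soil, comp) for a treatment code. Hypothetical helper."""
    return TREATMENTS[code]


if __name__ == "__main__":
    for code in sorted(TREATMENTS):
        soil, comp = decode_treatment(code)
        print(code, "soil", soil, comp)
```
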
Dataset5. BLUPs for 29 phenotypic traits
			column names : treatment : A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua, mean = all treatments
					   generation : A1 = year 2002 & A6 = year 2010
					   accession : see dataset 1 for ecotype_id match
					   29 phenotypic traits
Dataset6. BLUPs for GWA mapping : BLUPs for 144 eco-phenotypic traits
			column names : ecotype_id
					   144 eco-phenotypic traits = name of phenotypic traits + treatment A, B, C, D, E & F (A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua)
Dataset7. haldanes : results for the 144 eco-phenotypic traits
			column names : trait : 144 eco-phenotypic traits = name of phenotypic traits + treatment A, B, C, D, E & F (A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua)
					   haldanes : haldanes estimates
					   signif : significance for each eco-phenotypic trait (TRUE or FALSE)
					   fdr_accession : p-value after an FDR correction at a nominal level of 5%
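
For orientation, a haldane is commonly defined as the change in trait mean, expressed in pooled standard deviations, per generation. The sketch below follows that textbook definition only; it is an assumption that script_4 uses the same estimator, and the function name and the toy values are hypothetical. The generation count of 8 is taken from the description of script_5.

```python
import math
import statistics


def haldanes(early, late, generations):
    """Rate of change in pooled-SD units per generation (textbook definition)."""
    n1, n2 = len(early), len(late)
    v1 = statistics.variance(early)  # sample variance of the earlier sample
    v2 = statistics.variance(late)   # sample variance of the later sample
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    shift = statistics.mean(late) - statistics.mean(early)
    return shift / (pooled_sd * generations)


if __name__ == "__main__":
    # Toy values, not taken from any dataset in this archive.
    print(haldanes([1.0, 2.0, 3.0], [3.0, 4.0, 5.0], generations=8))
```
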
Dataset8. File of 214,051 SNPs genotyped for the 195 TOU-A accessions as well as 24 accessions located within 1km of the TOU-A population. 
Dataset8_bis. Information on each genotype of Dataset8
			column names : id : id number in the Dataset_8.gds
						pop : population id
						Region : group of populations
						color : color to be plotted in Fig.3a
Dataset9. Broad-sense heritability estimate for each of the 174 eco-phenotypes
			column names : trait : 29 phenotypic traits
					   treatment : A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua
					   combination : 'trait' and 'treatment' merged
					   heritability : estimate of the broad-sense heritability
					   fdr_accession : significance of the heritability estimate (p-value after an FDR correction at a nominal level of 5%)
Dataset10. Data set for making Figure S4. 
			column names : trait : 144 eco-phenotypic traits = name of phenotypic traits + treatment A, B, C, D, E & F (A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua)
						generation : A1 = year 2002 & A6 = year 2010
						accession : see dataset 1 for ecotype_id match
						29 phenotypic traits: BLUPs
Dataset11. Multiple datasets of the median LD across the genome and on each chromosome, for a physical distance ranging from 1bp to 30kb between two SNPs.
			column names: bin : distance between two SNPs in bp
						median : median values of the LD estimates (r2) for each "bin"
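
The "bin" and "median" columns can be thought of as a grouped summary of pairwise LD. A minimal sketch, assuming each SNP pair is assigned to a fixed-width distance bin labelled by its lower bound (the binning rule, the bin width and the function name are illustrative assumptions, not the procedure of script_6):

```python
from collections import defaultdict
from statistics import median


def median_ld_by_bin(pairs, bin_width):
    """pairs: iterable of (distance_bp, r2). Returns {bin_start: median r2}."""
    groups = defaultdict(list)
    for distance, r2 in pairs:
        bin_start = (distance // bin_width) * bin_width
        groups[bin_start].append(r2)
    return {b: median(values) for b, values in sorted(groups.items())}


if __name__ == "__main__":
    # Toy SNP pairs, not taken from Dataset11.
    toy = [(10, 0.9), (50, 0.7), (120, 0.4), (180, 0.2), (190, 0.3)]
    print(median_ld_by_bin(toy, bin_width=100))
```
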
Dataset12. List of the 328 flowering time candidate genes
Dataset13. Phenotypic standard deviation of the 144 eco-phenotypes to calculate standardized allelic effect of all the SNPs.
			column names: treatment : A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua
					   SD (standard deviation)
Dataset14. Dataset to prepare data files containing the top SNPs
			column names: treatment : A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua
					  trait_reduc : 29 phenotypic traits
					  trait : 'trait_reduc' and 'treatment' merged
					  evolution: yes = evolved eco-phenotype, no = unevolved eco-phenotype
Dataset15. Summary information for the top 5000 SNPs of each eco-phenotype 
			column names: chromo : chromosome number
						position : position of the SNP on the chromosome
						SNP : SNP id ("chromo" and "position" merged )
						coeff :  allelic effect obtained with EMMAX
						pval : p-values of the association obtained with EMMAX
						maf : Minor Allele Frequency
						chrobs : number of chromosomes with allelic information
						A :  standardized allelic effect
						trait : 'trait_reduc' and 'treatment' merged
						trait_reduc : 29 phenotypic traits
						treatment : A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua
						evolution : yes = evolved eco-phenotype, no = unevolved eco-phenotype
						number : rank of each SNP according to the level of significance
Dataset16. Summary information for the SNPs with -log10 pval > 4
			column names: chromo : chromosome number
						position : position of the SNP on the chromosome
						SNP : SNP id ("chromo" and "position" merged )
						coeff :  allelic effect obtained with EMMAX
						pval : p-values of the association obtained with EMMAX
						maf : Minor Allele Frequency
						chrobs : number of chromosomes with allelic information
						A :  standardized allelic effect
						trait : 'trait_reduc' and 'treatment' merged
						trait_reduc : 29 phenotypic traits
						treatment : A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua
						evolution : yes = evolved eco-phenotype, no = unevolved eco-phenotype
						log : -log10 pval
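
The "log" column of Dataset16 and the "number" column of Dataset15 both derive from "pval". A minimal sketch that combines the two ideas, assuming records are held as dictionaries keyed by the column names above (the function name, the SNP ids and the p-values are made up for illustration and do not reproduce script_15):

```python
import math


def select_and_rank(snps, threshold=4.0):
    """Keep SNPs with -log10(pval) > threshold and rank them by significance."""
    kept = []
    for snp in snps:
        log = -math.log10(snp["pval"])
        if log > threshold:
            kept.append(dict(snp, log=log))
    kept.sort(key=lambda s: s["pval"])  # smallest p-value first
    for rank, snp in enumerate(kept, start=1):
        snp["number"] = rank
    return kept


if __name__ == "__main__":
    toy = [
        {"SNP": "1_100", "pval": 1e-6},
        {"SNP": "2_200", "pval": 1e-3},
        {"SNP": "3_300", "pval": 1e-5},
    ]
    for row in select_and_rank(toy):
        print(row["number"], row["SNP"], round(row["log"], 2))
```
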
Dataset17. Candidate genes located within a 2kb window on each side of the 50 top SNPs for each of the 144 eco-phenotypes.
			column names: ATG: ATG number of the candidate genes.
						trait: 144 eco-phenotypic traits = name of phenotypic traits + treatment A, B, C, D, E & F (A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua)
Dataset17_bis. ATG number and position of all known genes of Arabidopsis thaliana
			column names: ATG : ATG number of the gene
			Chromo : chromosome
			start : start of the gene
			stop : end of the gene
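
Retrieving the genes within 2kb of a SNP amounts to an interval test against the gene coordinates of Dataset17_bis. A minimal sketch, assuming start <= stop for every gene and that the window extends 2kb beyond each gene boundary (the function name and the gene identifiers are placeholders, and script_21 may implement the lookup differently):

```python
def genes_near_snp(snp_chromo, snp_position, genes, window=2000):
    """genes: iterable of (ATG, Chromo, start, stop). Returns matching ATG ids."""
    hits = []
    for atg, chromo, start, stop in genes:
        same_chromosome = chromo == snp_chromo
        in_window = start - window <= snp_position <= stop + window
        if same_chromosome and in_window:
            hits.append(atg)
    return hits


if __name__ == "__main__":
    # Placeholder gene records, not real annotations.
    toy_genes = [
        ("gene_1", 1, 1000, 2000),
        ("gene_2", 1, 5000, 6000),
        ("gene_3", 2, 1000, 2000),
    ]
    print(genes_near_snp(1, 3500, toy_genes))
```
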
Dataset18. Dataset with LD block information obtained with PLINK
			column names: CHR : chromosome number
			BP1 : start position of the block
			BP2 : end position of the block
			KB : length of the block in kb
			NSNPS : number of SNPs in the block
			SNPS : SNP id of the SNPs in the block
Dataset19. Genetic architecture of the 144 eco-phenotypes based on 200 top SNPs.
			column names: trait: 144 eco-phenotypic traits = name of phenotypic traits + treatment A, B, C, D, E & F (A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua)
					  block number: number of LD blocks containing top SNPs.
					  mean_log: mean -log10 p-value of the 200 top SNPs
					  mean_FST: mean Fst estimate of the 200 top SNPs
					  ATG: number of genes located within 2kb of the 200 top SNPs
					  evolution: yes = evolved eco-phenotype, no = unevolved eco-phenotype
Dataset20. Genome-wide scan for selection based on temporal differentiation
			column names: chromo: chromosome
						position: physical position on the chromosome
						SNP: SNP id
						FST: temporal Fst estimate
						pval: p-value computed as the proportion of 10,000 simulations giving a locus-specific estimate of Fst larger than or equal to the observed value at the focal SNP.
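
The "pval" definition above is an empirical proportion. A minimal sketch of that calculation for one SNP, assuming the simulated locus-specific Fst values are available as a list (the function name and the toy numbers are illustrative; the simulations themselves are produced by script_18 and are not reproduced here):

```python
def empirical_pvalue(observed_fst, simulated_fst):
    """Proportion of simulated Fst values >= the observed value at the focal SNP."""
    exceed = sum(1 for value in simulated_fst if value >= observed_fst)
    return exceed / len(simulated_fst)


if __name__ == "__main__":
    # Four toy simulations instead of the 10,000 described above.
    print(empirical_pvalue(0.3, [0.1, 0.2, 0.3, 0.4]))
```
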
Datasets21. Lists of the ecotype id for the 80 TOU-A1 accessions and the 195 TOU-A6 accessions.
Datasets22. Summary information of the SNPs for different '-log10 p-value' thresholds and number of top SNPs
			column names : SNP : SNP id,
						trait : 'trait_reduc' and 'treatment' merged,
						chromo : chromosome number
						position : position of the SNP on the chromosome
						coeff :  allelic effect obtained with EMMAX
						pval : p-values of the association obtained with EMMAX
						maf : Minor Allele Frequency
						chrobs : number of chromosomes with allelic information
						A :  standardized allelic effect
						trait : 'trait_reduc' and 'treatment' merged
						trait_reduc : 29 phenotypic traits
						treatment : A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua
						evolution : yes = evolved eco-phenotype, no = unevolved eco-phenotype
						number : rank of each SNP according to the level of significance
						n_trait : number of eco-phenotypes associated with a given SNP
						n_trait_reduc : number of phenotypic traits associated with a given SNP
						n_treatment : number of treatments where the SNP was found significantly associated with an eco-phenotype.
						TE : total effect of the SNP expressed in Euclidean distance
						Tm : total effect of the SNP expressed in Manhattan distance
						haldanes : haldanes estimates
						signif :  significance for each eco-phenotypic trait (TRUE or FALSE)
						polarity : polarity of SNP effect => -1 : SNP effect and haldanes have opposite directions, 1 : SNP effect and haldanes are congruent
						Shared : whether a SNP is found in both evolved and unevolved eco-phenotypes => shared : the SNP is found in evolved eco-phenotypes and unevolved eco-phenotypes, no_shared : the SNP is found only in evolved eco-phenotypes or only in unevolved eco-phenotypes
						Var : the variance of the eigenvalues of the error-corrected correlation matrix
						Neff : effective number of traits
						FST : temporal Fst estimate
						block : block id
						A1 : nucleotide identity of the major allele
						A2 : nucleotide identity of the minor allele
						delta_freq : frequency in A6 - frequency in A1
						evo_ok : direction of the SNP effect relative to the direction of evolution => -1 : SNP effect opposite to the direction of evolution, 1 : SNP effect congruent with the direction of evolution
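
Three of the derived columns above are simple arithmetic on other columns. A minimal sketch, assuming TE and Tm are the Euclidean and Manhattan norms of a SNP's standardized allelic effects ("A") across the eco-phenotypes it is associated with; that input choice is an assumption, the function names are hypothetical, and script_20 remains the reference for how the values were actually computed:

```python
import math


def total_effects(effects):
    """Return (TE, Tm): Euclidean and Manhattan norms of a list of effects."""
    te = math.sqrt(sum(a * a for a in effects))
    tm = sum(abs(a) for a in effects)
    return te, tm


def delta_freq(freq_a1, freq_a6):
    """Allele frequency change: frequency in A6 minus frequency in A1."""
    return freq_a6 - freq_a1


if __name__ == "__main__":
    # Toy effects and frequencies, not taken from Dataset22.
    print(total_effects([3.0, -4.0]))
    print(delta_freq(0.2, 0.5))
```
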
Dataset23. zip file containing 736 GO files.
Datasets24. output enrichment of Fst values for different Neff values
			column names: Neff values
			row names: Enrich: enrichment relative to the genomic background = median Fst values of top SNPs / median Fst value of the genome
			99.9%: quantile at 99.9% of the null distribution
			99%: quantile at 99% of the null distribution
			95%: quantile at 95% of the null distribution
			5%: quantile at 5% of the null distribution
			signif: significance compared to the genomic background
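
The "Enrich" row is a ratio of medians, as stated above. A minimal sketch of that ratio alone, with toy values; how the null distribution and its quantiles were generated is not described in this file, so that part is left to script_24 (the function name is hypothetical):

```python
from statistics import median


def fst_enrichment(top_snp_fst, genome_fst):
    """Median Fst of the top SNPs divided by the genome-wide median Fst."""
    return median(top_snp_fst) / median(genome_fst)


if __name__ == "__main__":
    # Toy Fst values, not taken from Dataset20.
    print(fst_enrichment([0.2, 0.4, 0.6], [0.1, 0.2, 0.3]))
```
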
Datasets25. Distribution of fold-increase in median significance of FST values of top SNPs relative to the genomic background, for different Neff values
			column names: Neff values
Datasets26. output enrichment of fold-increase of -log10 p-values of Fst values for different Neff values
			column names: Neff values
			row names: median: median of the distribution of fold-increase in median significance of FST values of top SNPs relative to the genomic background
			5%: quantile at 5% of the distribution of fold-increase in median significance of FST values of top SNPs relative to the genomic background
			1%: quantile at 1% of the distribution of fold-increase in median significance of FST values of top SNPs relative to the genomic background
			0.1%: quantile at 0.1% of the distribution of fold-increase in median significance of FST values of top SNPs relative to the genomic background
			signif: significance compared to the genomic background
Dataset27. Data file used to calculate enrichments for a priori candidate genes for natural genetic variation of bolting time.
			column names: NtopSNPs: number of top SNPs considered to calculate an enrichment.
						Enrichment: enrichment estimate
						Enrichment95: 95% quantile of the null distribution
						Enrichment05: 5% quantile of the null distribution
Dataset28. Data file used to calculate enrichments for the 736 GO terms for SNPs with the highest Fst values.
			column names: GO_term: gene ontology term
			Nhits: number of SNPs located in the vicinity of the genes belonging to a specific GO term.
			Enrichment: enrichment estimate
			Enrichment999, Enrichment99,Enrichment95, Enrichment05: 99.9%, 99%, 95% and 5% quantiles of the null distribution
Dataset29. Candidate genes located within a 2kb window on the SNPs in the 0.1% tail of the Fst values.
Dataset30. Data file used to retrieve for all 144 eco-phenotypes the mean, median, 95% quantile, 99% quantile and 99.9% quantile of the -log10 p-value distribution for 500 MARF values (with an increment of 0.01 from 0.01 to 0.5).
			column names: trait: 144 eco-phenotypic traits = name of phenotypic traits + treatment A, B, C, D, E & F (A = soil A w/o P. annua, B = soil A w/ P. annua, C = soil B w/o P. annua, D = soil B w/ P. annua, E = soil C w/o P. annua, F = soil C w/ P. annua)
							MARF: Minor Allele Relative Frequency
							combination:'trait' and 'MARF' merged 
							mean, median, quantile95, quantile99, quantile999: mean, median, 95% quantile, 99% quantile and 99.9% quantile of the -log10 p-value distribution.
Dataset31. Data set used for making Figure S15: dependence of the p-value distribution on minor allele relative frequency (MARF) for EMMAX across the 144 eco-phenotypes.
			column names: see Dataset 30
Datasets for running EMMAX: genotypes.bed, genotypes.bim, genotypes.fam, genotypes.ped, genotypes.frq, genotypes.map
genotypes_TOUA1.frq & genotypes_TOUA6.frq: allele frequencies across the genome within each year (A1 = 2002, A6 = 2010).
genotypes.vcf.gz: SNP data of the TOU-A population in a vcf format.




##SCRIPTS
script_1. Script to perform PCA analysis on soil data and plot the 2 first PCA axes (output: Supplementary Fig. S3)
				Input : Dataset_3
script_2. Script to explore natural variation of the 29 phenotypic traits (mixed-model, output: Supplementary Table S2)
				Input : Dataset_4
script_3. Script to estimate BLUPs calculated across the 6 micro-habitats and within each micro-habitat (output: Dataset_5)
				Input : Dataset_4
script_4. Script to estimate haldanes and the associated significance across the 6 micro-habitats and within each micro-habitat (output: Dataset_7)
				Input : Dataset_5
script_5. Script to represent the phenotypic changes in the TOU-A population over 8 generations (Fig. 2)
				Input : Dataset_5
script_6. Script to estimate Linkage Disequilibrium (LD) (output: Dataset_11)
				Input : genotypes.vcf.gz
script_7. Script to represent the genomic patterns of the TOU-A population (output: Fig. 3)
				Input : Dataset_11
script_8. Script to estimate the broad sense heritability (output: Dataset_9)
				Input : Dataset_4&5
script_9. Script to illustrate the genotype-by-environment interactions (output: Supplementary Fig. S4)
				Input : Dataset_10
script_10. Script to estimate LD blocks across the genome (output: Dataset_18)
				Input : genotypes.bed
script_11. Script to represent the blocks length (output: Supplementary Fig. S5)
				Input : Dataset_18
script_12. Script to perform the GWA mapping analysis (output: files.ps)
				Input : Dataset_6&genotypes
script_13. Script to add the standardized QTL effect (output: files.txt)
				Input : files.ps&Dataset_13
script_14. Script to choose the MARF threshold (output: Supplementary Fig. S15)
				Input : files.txt&Dataset_31
script_15. Script to extract the 5000 top SNPs / SNPs with -log10 p-value > 4 (output: Dataset_15&16)
				Input : files.txt
script_16. Script to draw Manhattan plots (outputs: Fig. 4a & Supplementary Fig. S8)
				Input : Dataset_9&files.txt
script_17. Script to calculate the enrichment ratios in flowering time candidate genes (output: Supplementary Fig. S7)
				Input : Dataset_12 & files.txt for bolting time
script_18. Script to estimate the temporal Fst (output: Dataset_20). http://dx.doi.org/10.5281/zenodo.375600
				Input : genotypes
script_19. Script to compute the allele frequencies for TOU-A1 and TOU-A6 (output: genotypes_TOUA1.frq&genotypes_TOUA6.frq)
				Input : Dataset_21&genotypes.bed
script_20. Script to generate genomic information (Fst, TE, TM...) for the top SNPs (output: Dataset_22)
				Input : Dataset_6&7&15&16&18&20&genotypes&genotypes_TOUA1.frq&_TOUA6.frq
script_21. Script to retrieve the genes within 2kb of the 200 top SNPs and within 2kb of the SNPs with the highest Fst values (output: Dataset_17&29)
				Input : Dataset_17_bis&20&22(200SNPs)
script_22. Script to describe the genetic architecture of the 144 eco-phenotypes based on 200 top SNPs (output: Dataset_19)
				Input : Dataset_17&18&22(200SNPs)
script_23. Script to represent the frequency distribution of the effective number of eco-phenotypes affected by a SNP (output: Fig. 5a & Fig. S11)
				Input : Dataset_22
script_24. Script to compute enrichment for temporal Fst estimates & p-values (output: Dataset_24&25&26)
				Input : Dataset_20&22
script_25. Script to represent the genetic architecture underlying phenotypic evolution in the TOU-A population in situ when considering a threshold of 200 top SNPs (output: Fig 5)
				Input : Dataset_20&22&24
script_26. Script to represent the genetic architecture underlying phenotypic evolution in the TOU-A population in situ when considering different thresholds (output: Supplementary Fig. S11)
				Input : Dataset_22&25
script_27. Script to plot the polarity effects of SNPs associated with evolved eco-phenotypes (output: Supplementary Fig. S14)
				Input : Dataset_22&24
script_28. Script to perform the enrichment in biological process in the 0.1% tail of the FST values (output: Supplementary Table S6)
				Input : Dataset_20&23&28