The AVESTAGENOME Project®

The AVESTAGENOME Project® was conceptualized with the intent of building a complete genealogical, medical, and genetic database of the Parsi population in India. In addition, it seeks to use the power of Population Genetics and Systems Biology for discovering novel biomarkers and novel drug targets for complex diseases.

The Parsis are a well-defined population of fewer than 50,000 individuals in India. Well-documented genealogical charts demonstrate an unbroken lineage for 300 years. Generations of marriages within the community have resulted in an increased incidence of positive traits, such as greater longevity, and of certain diseases, such as stroke, heart disease, specific cancers, Parkinson’s and Alzheimer’s.

The Project aims to study human diseases by bringing together genetic, disease and genealogical data of the Parsi community and studying the links between these aspects, with a view to advancing the knowledge of the inherited components of human disease.

OBJECTIVES
» To create a database of the genealogy and epidemiology of the Parsi community
» To determine the genetic basis of longevity
» To determine the genetic component of human diseases
» To archive the Parsi population for a systems biology analysis
» To develop drug targets and molecular biomarkers for predictive, preventive and personalized healthcare

A Systems Biology Approach

The AVESTAGENOME Project® uses a Systems Biology approach to investigating the samples. This involves Genomic, Transcriptomic, Proteomic and Metabolomic analyses of the samples.

Epidemiology

The AVESTAGENOME Project® is the largest epidemiological and genome study ever conducted in India. Apart from collecting blood samples from volunteer participants belonging to the Parsi community, each participant is asked to complete a detailed questionnaire that seeks information about the complete medical and treatment history of self as well as of family, i.e., parents, grandparents, spouse, etc. In addition, a large number of questions seeking information on socio-demographic factors such as nutrition, psychology, general health, reproductive factors, lifestyle factors and quality of life are being asked of each participant. A review of the literature from earlier epidemiological studies has shown significant associations between some of these exposure variables as aetiological factors for specific diseases/conditions.

The data so collected are checked for quality and completeness. Once the data are verified, a detailed analysis will be carried out using modern epidemiologic methods in order to determine the aetiological factors or risk/protective factors associated with a number of complex and common diseases/conditions prevailing in this population. These results will help us predict the risk in individuals exposed to specific environments or conditions. This, in turn, will help us develop preventive measures so that the disease may be controlled or eradicated in future. It may also provide a foundation for developing public policy and regulatory decisions relating to environmental problems.

In order to identify the aetiological factors (both environmental and genetic) that may be involved in a segment of the selected population, case-control studies of the Parsi community with its unusual site patterns of risk should be undertaken. Such a study may help in establishing the differences, if any, in a number of factors, such as age at marriage, breast feeding habits, number of pregnancies, smoking and chewing addictions, level of nutrition, personal hygiene, economic status, general health, quality of life, etc. The AVESTAGENOME Project® aims to achieve these goals.
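
As an illustration of how a case-control comparison of this kind is commonly summarized, the sketch below computes an odds ratio with an approximate 95% confidence interval (Woolf's logit method) from a 2x2 table. The function name and the counts are hypothetical and are not taken from the Project's data; this is not a description of the Project's actual analysis.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Return (odds ratio, lower 95% bound, upper 95% bound) for a 2x2 table."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this simple estimate")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR), Woolf's method
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts, for illustration only:
# 40 of 100 cases and 20 of 100 controls report the exposure (e.g. smoking).
estimate, lower, upper = odds_ratio(40, 60, 20, 80)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 would suggest an association between the exposure and the disease; confounders such as age would still need to be handled, for example by matching or regression.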

Collection Process

Healthy Parsi adults with no history of severe anaemia are invited to volunteer for The AVESTAGENOME Project®. Once the weight, height and blood pressure have been recorded and the attending physician is satisfied that the volunteer’s physical condition permits it, about 50 ml of blood is drawn into 10-11 tubes using a closed-system BD Vacutainer (one tube each for RNA, DNA and Archival; two tubes each for Cells and Plasma collection; three tubes for Serum collection; and, if possible, an additional tube for Plasma). Specially trained phlebotomists are employed for this exercise. For senior citizens who appear to be frail, the quantity of blood drawn may be significantly decreased (only 7-8 tubes). The tubes are barcoded to maintain total anonymity. The samples are then frozen and dispatched within 24 hours to the Avesthagen Cryostorage facility at Bangalore. Volunteers are expected to provide medical and treatment information about themselves and their family members. The genealogical, epidemiological and socio-demographic data of the volunteer and the family are collected in great detail through a questionnaire specially designed to make the scientific study and its findings as accurate as possible.

Samples collected so far:

City                        Number of Samples
Hyderabad & Secunderabad    437
Navsari                     687
Surat                       708
Ahmedabad                   560
Pune                        513
Bengaluru                   178
Mumbai                      1,108
Chennai                     94
Coonoor                     25
Delhi                       127
Total                       4,437

Bioinformatics & Database Design

An architecture has been set up to accept inputs from the questionnaires designed by the epidemiology team. It is intended to accommodate new parameters and to cross-reference databases in the public domain (TIGR, HUGO, FUGO, MAFF, SANGER, EMBL, etc.).

A comprehensive pipeline enables storage, retrieval and analysis of clinical data, which is fed into the genomics platform that is expected to contribute towards the identification of specific disease markers or traits. The genomics platform uses Affymetrix chips in the discovery and analysis of DNA polymorphisms and in the analysis of the Transcriptome. Proteomics and Metabolomics enable the identification of biomarkers by utilizing high-throughput techniques including Mass Spectrometry. The data generated from these platforms are hosted on high-end servers. Furthermore, these platforms make use of standard software such as PLINK and DeCyder, complemented by in-house tools such as The AVESTAGENOME Project® Annotation database, to integrate these systems and make them fully functional.

Genomic Analysis

This platform involves the study of genes and their association with longevity as well as different disorders. In this study, specific markers would be analysed using the Affymetrix microarray platform on the individual samples chosen from the Parsi population. These markers include Single Nucleotide Polymorphisms (SNPs), which are substitutions of a single nucleotide base (Adenine, Guanine, Cytosine or Thymine) with another. In some cases, the alleles of these SNPs occur at different frequencies in patient and normal control samples. The association of these SNPs with a particular condition would give insight into the genetic or molecular basis of the disease.
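
A common way to test whether a SNP allele is more frequent in cases than in controls is a chi-square test on a 2x2 table of allele counts. The sketch below is a minimal, standard-library illustration of that test; the function name and the counts are hypothetical, and the Project's own analysis may differ.

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square (1 degree of freedom) on allele counts, with its p-value."""
    table = [[case_minor, case_major], [control_minor, control_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column of the table must have a non-zero total")
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 100 cases and 100 controls, two alleles each.
chi2, p_value = allelic_chi_square(60, 140, 40, 160)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

In a genome-wide scan this test is repeated for every SNP on the array, so the significance threshold has to be corrected for multiple testing.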

The instrumentation used for such a study includes the Affymetrix microarray platform, which identifies SNPs; statistical analysis then gives the association of a particular SNP with a disease. In order to analyse SNPs on hundreds of samples in a fast and efficient manner, microarray chips are used. These chips are spotted with thousands of DNA fragments containing SNPs at an interval of about 10 cM covering the entire human genome. The microarray chips also include copy number variable regions (CNVs) of the genome that differ in the number of copies present in different individuals. The Affymetrix Genome-Wide Human SNP Array 6.0 used in this study contains more than 906,600 single nucleotide polymorphisms (SNPs) and more than 946,000 probes for the detection of copy number variation.

For further validation and fine-mapping of the SNPs associated with disease, a different cohort of samples is used. The platform used for the most cost-effective and accurate SNP fine-mapping studies is the Sequenom MassARRAY system. This is a mass spectrometric instrument working on the principles of MALDI-TOF (matrix-assisted laser desorption/ionization time of flight).
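
The time-of-flight principle can be illustrated with the idealized relation for a linear instrument: an ion of mass m and charge z accelerated through a potential U gains kinetic energy zeU, so its flight time over a drift length L is L * sqrt(m / (2zeU)). The sketch below evaluates this relation; the accelerating voltage and drift length are hypothetical values, and effects such as initial ion velocity are ignored.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def flight_time(mass_da, charge, accel_voltage, drift_length_m):
    """Idealized flight time in seconds for an ion in a linear TOF analyser."""
    mass_kg = mass_da * DALTON
    # z * e * U = 1/2 * m * v^2, solved for the velocity v
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * accel_voltage / mass_kg)
    return drift_length_m / velocity


# Hypothetical instrument settings: 20 kV acceleration, 1 m drift tube.
for mass in (5000.0, 5016.0):
    t = flight_time(mass, charge=1, accel_voltage=20_000.0, drift_length_m=1.0)
    print(f"{mass:.0f} Da -> {t * 1e6:.3f} microseconds")
```

Because flight time grows with the square root of mass, two DNA extension products that differ by a single base arrive at slightly different times, which is what allows the alleles to be distinguished.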

The implications of such a genomics study lead to the development of diagnostic tests and an insight into the molecular mechanism of disease, which can be used to identify drug targets and develop therapeutics.

Transcriptome Profiling

The transcriptome, the collection of all the mRNA produced by a cell or cell population, has in recent years become amenable to high-throughput analysis with the use of microarray technology. The global view afforded by microarrays, in particular the ability to analyze genome-wide expression profiles for every sample, makes it easier to describe the biological mechanisms of complex pathways in diseases and other conditions. Our transcriptome platform is equipped for these quantitative global gene expression-profiling experiments with the Affymetrix GeneChip microarray system, using either the Human Exon 1.0 ST Array or the Gene 1.0 ST Array. We can further validate selected transcripts from the microarray studies by quantitative real-time PCR on the ABI PRISM 7900HT sequence detection system.

The transcriptome profiling platform is used to detect differential expression of genes in case and control samples. The transcriptome is the complete profile of gene expression products (RNA) derived from the genome of an individual. Genes have a definite expression pattern and each gene codes for a specific protein. Thus, any changes in the expression of the gene would lead to either an increase or decrease of products, which might play an instrumental role in susceptibility to or progression of disease.
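
Differential expression between case and control samples is often summarized as a log2 fold change. The sketch below shows the arithmetic on made-up intensity values; the gene names, the numbers and the two-fold threshold are all hypothetical, and a real analysis would add normalization and a statistical test.

```python
import math
from statistics import mean


def log2_fold_change(case_values, control_values):
    """log2 of the ratio of mean expression in cases to mean expression in controls."""
    return math.log2(mean(case_values) / mean(control_values))


def classify(log2_fc, threshold=1.0):
    """Label a gene as up, down or unchanged using a symmetric log2 threshold."""
    if log2_fc >= threshold:
        return "up"
    if log2_fc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical expression intensities: (case samples, control samples)
expression = {
    "GENE_A": ([800.0, 820.0, 780.0], [200.0, 190.0, 210.0]),
    "GENE_B": ([100.0, 110.0, 90.0], [105.0, 95.0, 100.0]),
    "GENE_C": ([50.0, 55.0, 45.0], [210.0, 190.0, 200.0]),
}

for gene, (cases, controls) in expression.items():
    fc = log2_fold_change(cases, controls)
    print(gene, f"{fc:+.2f}", classify(fc))
```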

The quantitative study of the transcriptome in case and control samples would give a good idea of the candidate gene involved in disease development and could serve as a biomarker for the particular disease. Also, correlating the expression pattern to the data obtained from genomic studies would give an idea of the functional significance of the SNPs identified. This information can be applied to diagnostics, used as biomarkers for disease states and/or drug development. The genome and transcriptome profile go hand-in-hand due to their interdependence and thus would be carried out simultaneously on all samples.

The platforms set up for transcriptome profiling include the Affymetrix Microarray Systems. RNA is isolated from the blood samples of cases and controls, and labelled cDNA is prepared from it. The labelled cDNA from the samples would be hybridized onto the microarray chips. Further confirmation of these differentially regulated genes would have to be carried out by independent experiments such as Northern blot analysis.

Proteomic Analysis

Comprehensive, systematic characterization of the plasma proteome in healthy and diseased states greatly facilitates the development of biosignatures for early disease detection, clinical diagnosis, and therapy. Blood plasma is the most complex human-derived proteome containing other tissue proteome subsets as well as a wide dynamic range of protein concentrations. Therefore, the characterization of biosignatures in the human plasma proteome is a very complicated task. Advances in methods and technology for profiling plasma proteins now enable construction of a comprehensive pipeline from candidate discovery, qualification, verification, research assay optimization, validation to eventual commercialization.

Analysis of the proteome from samples obtained from this population is of prime importance. The plasma and serum are isolated from the collected blood sample and flash frozen on site prior to being shipped to our central cryostorage facility. The sample is then analyzed using a 2D-gel electrophoresis (DIGE) - MALDI-TOF system to determine the differential plasma and serum proteome profile for case and control samples.

Proteomics is the study of content and function of the different proteins expressed by the genome in an individual. Proteins in blood samples can be isolated and analysed using two-dimensional or multidimensional chromatography. The platforms set up at Avesthagen to carry out proteomic profiling are Differential In Gel Electrophoresis (DIGE) analysis and Multidimensional Protein Identification Technology (MudPIT).

Ettan™ DIGE: Differential In Gel Electrophoresis (DIGE) allows an investigator to reliably compare the expression level of proteins between two samples on a 2D gel platform.

MALDI-mass spectrometer: Differentially expressed proteins are excised from the gel and subjected to enzymatic digestion, giving rise to small peptides which are then analysed in the MALDI-TOF mass spectrometer. The output from the mass spectrometric unit would be analysed by software to give a peptide mass fingerprint. Comparison of this fingerprint with databases of previously identified proteins would lead to the identification of the specific protein of interest.
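
The idea of peptide mass fingerprinting can be sketched as follows: digest each candidate protein in silico, compute the expected peptide masses, and count how many observed peaks fall within a tolerance of them. The sketch assumes trypsin as the enzyme (the source says only "enzymatic digestion"), uses standard monoisotopic residue masses, and works on short made-up sequences; the protein names, sequences and tolerance are hypothetical.

```python
import re

# Standard monoisotopic residue masses in daltons
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mh(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed_peaks, sequence, tolerance=0.05):
    """Number of observed peaks within the tolerance of a theoretical peptide mass."""
    theoretical = [peptide_mh(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(peak - t) <= tolerance for t in theoretical) for peak in observed_peaks)


# Hypothetical mini-database and a simulated spectrum derived from protein_1
database = {"protein_1": "MKWVTFISLLR", "protein_2": "GASPVTCKAAR"}
observed = [peptide_mh(p) + 0.01 for p in tryptic_digest(database["protein_1"])]

scores = {name: count_matches(observed, seq) for name, seq in database.items()}
best = max(scores, key=scores.get)
print(scores, "best match:", best)
```

Real search engines score matches statistically and allow for missed cleavages and modifications; counting matched peaks is only the simplest form of the comparison.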

Multidimensional Protein Identification Technology (MudPIT): This technique is used for the separation and identification of complex protein and peptide mixtures. The peptides are eluted directly into the mass spectrometer, which fragments them and analyses the fragments to generate tandem mass spectra. These spectra are matched against a database, and the peptides and proteins are subsequently identified.

Proteomic technology plays an important role in drug discovery, diagnostics and molecular medicine because it is the link between genes, proteins and disease.

Advances in proteomics may help scientists eventually to create medications that are ‘personalized’ for different individuals to be more effective and have fewer side effects. Current research is looking at protein families linked to diseases including cancer, diabetes and heart disease.

Metabolomic Profiling

The Metabolomics Group of The AVESTAGENOME Project® is focused on the following objectives:
» To characterize the entire spectrum of low molecular weight chemical metabolites (LMC) found in the blood plasma of Parsi population.
» To construct an in-house blood plasma metabolite library for the above said population and to identify the potential (LMC) biomarkers for a wide spectrum of diseases.

The instrument used for the identification and characterization of LMCs is LC-MS/MS (MDS SCIEX, Applied Biosystems Inc.). Characterizing the spectrum of LMCs in body fluids such as plasma helps in understanding the physiology/pathophysiology and the homoeostasis of an individual.

The construction of a specific Metabolite Library is very useful for the precise screening of the entire metabolome of an individual. The Metabolite Library enables clinicians and psychologists to target the pathophysiology, pharmacogenomics and psychophysiological condition of an individual for the early diagnosis of a wide spectrum of diseases, and also to understand the prognosis/clinical response. The Metabolite Library will enable scientists to understand the pattern of LMCs in different conditions. Differential expression, diversion of metabolic pathways, regulation of metabolites and their influence on divergent pathways are viewed as signals that help identify potential biomarkers for the diagnosis of specific target diseases.

The metabolome is a quantitative description of all low molecular weight endogenous metabolites in specified cellular, tissue or biofluid compartments. Most common diseases are caused by complex interactions between genetic factors, diet, other lifestyle factors, and the environment. All these factors may influence the spectrum and concentrations of metabolites and other low molecular weight compounds in tissues and body fluids. Low molecular weight compounds that are involved in or affected by disease processes may serve as disease biomarkers. Compounds such as carbohydrates, nucleic acids, amino acids, lipids, various hormones, and phenolics have usually been measured individually in studies of diseases. Recent advances in NMR, GC/MS and LC/MS technology have improved the analysis of low molecular weight compounds and thereby enabled more global metabolomic approaches for identifying novel markers for specific diseases and for understanding more about the biology, as well as the lifestyle and dietary factors, behind the disease.

Two approaches shall be followed to achieve a comprehensive metabolite profiling from plasma of the Parsi population:
» Gas chromatography/ mass spectrometry (GC-MS): GC-MS is a coupled system with the gas chromatography unit and the mass spectrometer. Each component separated by chromatography is analysed by the mass spectrometer in an attempt to identify it. Identification of compounds will be based on comparison with mass spectra libraries as well as retention index.
» Liquid chromatography/ mass spectrometry (LC-MS): LC-MS is performed on a 4000 Q TRAP mass spectrometer. The components of the sample are separated by liquid chromatography, followed by mass analysis in the hybrid triple quadrupole/linear ion trap mass spectrometer.
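
Identification against a mass spectral library, as mentioned for GC-MS above, is commonly done by scoring the similarity between the measured spectrum and each reference spectrum. The sketch below uses a plain cosine similarity on spectra stored as {m/z: intensity} dictionaries; the compound names and peak lists are hypothetical, and retention index filtering is left out.

```python
import math


def cosine_similarity(spectrum_a, spectrum_b):
    """Cosine of the angle between two spectra given as {m/z: intensity} dicts."""
    shared = set(spectrum_a) & set(spectrum_b)
    dot = sum(spectrum_a[mz] * spectrum_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spectrum_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spectrum_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query, library):
    """Return (name, score) of the library spectrum most similar to the query."""
    scored = ((name, cosine_similarity(query, ref)) for name, ref in library.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical reference library with nominal (integer) m/z values
library = {
    "compound_A": {73: 100.0, 147: 60.0, 217: 35.0},
    "compound_B": {57: 100.0, 71: 80.0, 85: 50.0},
}
query = {73: 95.0, 147: 65.0, 217: 30.0, 300: 5.0}

name, score = best_library_match(query, library)
print(name, round(score, 3))
```

In practice a candidate would be accepted only if both the spectral score and the retention index agree with the library entry within chosen limits.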

Donor Consent

Donation of samples (usually a few ml of blood), along with completion of the relevant questionnaire, is strictly and entirely voluntary. The purpose of the project is explained to the donor, following which the donor signs a consent form certifying that the donation is voluntary, that he/she understands the purpose of the Project, that he/she consents to his/her samples being used for research purposes, that this research might lead to the development of novel biomarkers and drug targets for diseases, and that he/she has the right to withdraw his/her consent to his/her biological material being used in the study.

Ensuring Donor Confidentiality

The AVESTAGENOME Project® has put in place systems to ensure that the confidentiality of all participants in the Project is maintained. The Project is structured in such a manner that all personal identifiers are de-linked from the biological samples taken from the Project participants. The identity of the participant is made anonymous with the help of an efficient bar-coding system. The biological samples are recognized by their bar-code, and not by the name of the participant. The company ensures that the Project protocols prevent the personal identity of the participant from being revealed and from being associated with any particular biological sample. Participant confidentiality is treated with the utmost importance in the structuring and execution of the Project.

No personal information relating to any participant is accessed by or given to any person outside the Project. Even within the Project, such information will only be collected or accessed by specified individuals designated for the purpose under the Project and following specified procedures relating to the handling of such information. No third party will be permitted to gain access to such information. The procedures set out under the Project for the same will be approved by the Institutional Ethics Committee, which may impose such terms and conditions as it may feel necessary for the maintenance of strict confidentiality in this regard.
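
The separation of identity from sample data can be sketched as two stores that share only a random bar-code. The sketch below is an illustration under stated assumptions, not the Project's actual system: the class name, the bar-code format and the existence of a restricted linkage table (which would be needed, for example, to act on a withdrawal of consent) are all hypothetical.

```python
import secrets


class SampleRegistry:
    """Keeps participant identity and laboratory records in separate stores."""

    def __init__(self):
        self._identity_by_barcode = {}  # restricted access: designated staff only
        self.lab_records = {}           # laboratory view: bar-code keyed, no names

    def register(self, participant_name):
        """Issue a random bar-code that carries no information about the person."""
        barcode = secrets.token_hex(8).upper()
        while barcode in self._identity_by_barcode:  # regenerate on the rare collision
            barcode = secrets.token_hex(8).upper()
        self._identity_by_barcode[barcode] = participant_name
        self.lab_records[barcode] = {"tubes": []}
        return barcode

    def add_tube(self, barcode, tube_type):
        """Record a tube against the bar-code only."""
        self.lab_records[barcode]["tubes"].append(tube_type)

    def withdraw(self, barcode):
        """Remove both the linkage entry and the laboratory record."""
        self._identity_by_barcode.pop(barcode, None)
        self.lab_records.pop(barcode, None)


registry = SampleRegistry()
code = registry.register("Example Participant")
registry.add_tube(code, "DNA")
print(code, registry.lab_records[code])
```

Because the bar-code is random rather than derived from the name, a laboratory worker who sees only the sample records cannot work back to the participant.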

Information for Sample Donors

Coming soon.

Benefits of the AVESTAGENOME Project®

The AVESTAGENOME Project® aims to uncover the basis of longevity in the Parsi population and the preponderance of some age-related diseases.

Through detailed analysis the key genes linked to disease could be isolated and their function and interaction with other genes within the biological pathway of disease studied. This approach homes in directly on the inherited components of human disease that will result in population-validated drug targets and diagnostic markers.

The results of a study on the Parsi population could have wide-ranging implications on human health for the general population around the world.

By bringing the genetic, disease and genealogical data of a population together, researchers have already been able to locate major genes involved in some of the most common diseases, and have isolated the genes in a few of these. The database generated from this project will have multiple future uses:
» Identification of genes linked to diseases
» Risk prediction and planning of strategies to address prevention of disease
» Development of diagnostic kits for early prediction of disease; development of new drug targets
» Identification of new gene therapy targets.

This will be of great benefit to future generations of the Parsi community and humanity worldwide.

Contact Us

Email: info@theavestagenomeproject.org