Supplementary Files
- This file (ReadMe.txt) gives basic descriptions of the supplementary files.
- All files are plain tab-separated text files compressed with WinZip.
1. (Di)nucleotide frequencies related to germline methylation, together with
other general information, for all human transposable elements (~4,200,000)
- The (sub)classification of transposable elements and their divergence rates
from consensus follow the UCSC Genome Browser (i.e. the output of the
RepeatMasker program with the RepBase database)
- Due to the large amount of data, the database is divided into four subsets
according to the chromosome to which each element is assigned.
- Repeat_TotalInfo_1.zip (Chromosome 1-4)
- Repeat_TotalInfo_2.zip (Chromosome 5-9)
- Repeat_TotalInfo_3.zip (Chromosome 10-16)
- Repeat_TotalInfo_4.zip (Chromosome 17-22, X and Y)
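
As an illustration of the kind of (di)nucleotide statistic involved, the sketch below counts overlapping dinucleotides in a sequence and computes the commonly used CpG observed/expected ratio, which tends to be low where CpG sites have been lost through germline methylation. This is only a sketch: the exact columns stored in the Repeat_TotalInfo files are not described here, and the function names are hypothetical.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def dinucleotide_counts(seq):
    """Count overlapping dinucleotides, skipping pairs with non-ACGT bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if set(pair) <= VALID_BASES:
            counts[pair] += 1
    return counts

def cpg_obs_exp(seq):
    """CpG observed/expected ratio: (#CG * length) / (#C * #G).

    Returns None when the ratio is undefined (no C or no G).
    This formula is an assumption for illustration, not necessarily
    the statistic stored in the supplementary files.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return None
    n_cg = dinucleotide_counts(seq)["CG"]
    return n_cg * len(seq) / (n_c * n_g)
```

For example, cpg_obs_exp(sequence_of_one_element) could be computed per element and compared across repeat families or divergence rates.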
2. Three major classes of human repeats (Alu, L1 and LTR) located within
3000 bp of neighboring CpG islands.
- Note that Alus and L1s are the dominant and recently amplified families of
SINE and LINE (short and long interspersed nuclear element, respectively).
- The distance from CpG islands was calculated for each individual element as
the distance to its closest CpG island, and elements located within 3000 bp of
that island were selected and provided.
- AluLess3000fromCpGisland.zip
- LTRLess3000fromCpGisland.zip
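
The selection described above can be sketched as follows. This is a minimal illustration, not the original procedure: it assumes half-open (start, end) coordinates on a single chromosome, non-overlapping CpG islands, a distance of 0 for elements that overlap an island, and an inclusive 3000 bp cutoff. All function names are hypothetical.

```python
import bisect

def gap(a_start, a_end, b_start, b_end):
    """Gap in bp between two half-open intervals; 0 if they overlap or touch."""
    return max(0, b_start - a_end, a_start - b_end)

def nearest_island_distance(start, end, islands, island_starts):
    """Distance from one element to its closest CpG island.

    islands: non-overlapping (start, end) tuples sorted by start.
    island_starts: the start coordinates of those islands, same order.
    """
    if not islands:
        return None
    i = bisect.bisect_left(island_starts, start)
    # Only the islands just left and just right of the element can be closest.
    candidates = islands[max(i - 1, 0):i + 1]
    return min(gap(start, end, s, e) for s, e in candidates)

def select_near_islands(elements, islands, max_dist=3000):
    """Return (name, distance) for elements within max_dist of an island.

    elements: iterable of (name, start, end) on the same chromosome.
    Whether the cutoff itself is included is an assumption (inclusive here).
    """
    islands = sorted(islands)
    starts = [s for s, _ in islands]
    selected = []
    for name, start, end in elements:
        d = nearest_island_distance(start, end, islands, starts)
        if d is not None and d <= max_dist:
            selected.append((name, d))
    return selected
```

In practice the elements and islands would be grouped by chromosome first, and the function applied to each chromosome separately.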
3. Basic information on ~220,000 Alu pairs.
- The ~220,000 Alu pairs (two Alu elements separated by less than 650 bp) were
classified into two types: inverted pairs (the two Alu elements have divergent
or convergent orientation, tail-to-tail or head-to-head) and direct pairs (the
two Alu elements have parallel orientation).
- Using the BLAST algorithm (bl2seq.exe), the 'homologous segment' within the
two Alu elements of each pair is determined (the Perl scripts used are
available on request). The separating distance (bp) and sequence homology (%)
between the homologous segments are then calculated and used to further
classify the Alu pairs.
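
The orientation-based classification of Alu pairs can be sketched as below. This is an illustrative sketch rather than the original Perl code: it assumes each element is given as (start, end, strand) on the same chromosome with half-open coordinates, takes the "head" of an element to be its 5' end, and treats overlapping elements as separated by 0 bp. The BLAST step that finds the homologous segment is not reproduced, and the function name is hypothetical.

```python
def classify_alu_pair(a, b, max_gap=650):
    """Classify two Alu elements as a direct or inverted pair.

    a, b: (start, end, strand) tuples, strand being "+" or "-".
    Returns ("direct", None), ("inverted", "tail-to-tail"),
    ("inverted", "head-to-head"), or None if the elements are
    not separated by less than max_gap bp.
    """
    # Order the two elements by genomic start position.
    first, second = sorted([a, b])
    separation = max(0, second[0] - first[1])
    if separation >= max_gap:
        return None
    if first[2] == second[2]:
        return ("direct", None)
    # Upstream element on "+" and downstream on "-": the 3' ends (tails)
    # face each other; the opposite arrangement brings the 5' ends together.
    if first[2] == "+":
        return ("inverted", "tail-to-tail")
    return ("inverted", "head-to-head")
```

For example, classify_alu_pair((100, 400, "+"), (600, 900, "-")) describes two elements 200 bp apart on opposite strands and yields an inverted, tail-to-tail pair under the conventions assumed above.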
Note: For any question or request for Perl scripts, please contact the developer.