Question regarding distance calculation for Fst #1765

Open
jencs011 opened this issue Nov 25, 2024 · 2 comments

Comments

@jencs011

Hi,

I'm trying to replicate an analysis completed back in 2013 using an older version of HyPhy. The analysis I'd like to run includes estimating distances for Fst using an ML approach under a GTR nucleotide substitution model, with all parameters estimated independently for each branch. I'm currently using HyPhy from the command line (interactive mode), and I've chosen the options shown below:

                    +--------------------+
                    |Distance Computation|
                    +--------------------+

    (2):[Full likelihood] Estimate distances using pairwise MLE. More choices but slow.

                    +---------+
                    |Data type|
                    +---------+

    (1):[Nucleotide/Protein] Nucleotide or amino-acid (protein).

                    +--------------------------+
                    | Select a standard model. |
                    +--------------------------+

    (GRM):General Reversible Model.Local or global parameters. Possible Rate heterogeneity (and HM spatial correlation).

                    +-------------+
                    |Model Options|
                    +-------------+

    (1):[Local] All model parameters are estimated independently for each branch.

When I run this, I get the error "The dimension of the equilibrium frequencies vector 'codonFrequencies' (4) doesn't match the number of states in the dataset filter (64) 'twoSpecFilter'".
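The two numbers in that error are the sizes of two different state spaces: a nucleotide model such as GTR has one equilibrium frequency per base (4), while a filter that reads the alignment as nucleotide triplets has 4^3 = 64 states (stop codons included). The Python sketch below only illustrates that size check under those assumptions; `filter_states` and `check_frequencies` are hypothetical helpers, not HyPhy functions, and the uniform frequencies are placeholders.

```python
from itertools import product

# Illustrative sketch only: these helpers are hypothetical and NOT part of HyPhy.
NUCLEOTIDES = "ACGT"


def filter_states(unit):
    """Enumerate the states of a filter that reads `unit` nucleotides at a time."""
    return ["".join(chars) for chars in product(NUCLEOTIDES, repeat=unit)]


def check_frequencies(freqs, unit):
    """Raise if the equilibrium frequency vector does not match the filter's state count."""
    n_states = len(filter_states(unit))
    if len(freqs) != n_states:
        raise ValueError(
            f"frequency vector has {len(freqs)} entries, "
            f"but the filter has {n_states} states"
        )
    return n_states


gtr_freqs = [0.25] * 4  # placeholder: one equilibrium frequency per base (A, C, G, T)

print(check_frequencies(gtr_freqs, unit=1))  # nucleotide filter: 4 states, sizes match

try:
    check_frequencies(gtr_freqs, unit=3)  # triplet (codon) filter: 4**3 = 64 states
except ValueError as err:
    print(err)  # the same kind of 4-vs-64 mismatch reported above
```

A 4-entry nucleotide frequency vector can only pair with a 4-state filter, so seeing 64 in the message suggests the data were being read as codons rather than as single nucleotides.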

The input data is nucleotide sequences of extracted ORFs of HIV and the codons may not match up with the regular start codons. Is this why I'm getting an error? Can you please tell me which options I should choose to replicate the 2013 analysis mentioned above?

Please let me know if I need to provide any further details.
Thanks for your help!

spond added a commit that referenced this issue Nov 25, 2024
@spond
Member

spond commented Nov 25, 2024

Dear @jencs011,

I can confirm that it occurs for me as well, on the first dataset that I tried. Digging deeper, I noticed that at some point in the recent past a bug was introduced into the F_st analysis that incorrectly routed the Full likelihood options down the codon analysis path.

I fixed the issue and included the fix in the 2.5.64 release today.

Best,
Sergei

@jencs011
Author

Great, thank you so much Sergei!
