
Converting a gene sequence into numpy array

Hi all,

I am new to Biopython. Is there any function to convert a gene sequence into digits? I tried the following code:

import numpy as np
import Bio

seq = 'ACTTCAG'
seq_array = Bio.numerize(seq)
print(seq_array)

But I got this error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'module' object has no attribute 'numerize'

Kindly help me with this issue.

I don't know of a function that does that. You would probably have to define your own alphabet. The simplest way is to use a dictionary that maps each base to a number, something like this:

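import numpy as np
seq = 'ACTTCAG'
# Map each base to an integer; N (unknown base) gets 0
conv = {'A': 1, 'C': 2, 'G': 3, 'T': 4, 'N': 0}
seq_array = np.array([conv[item] for item in seq])
print(seq_array)  # [1 2 4 4 2 1 3]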
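If you are working with long sequences, a vectorized variant may be faster than looking up each base in a Python dictionary. Here is a minimal sketch, assuming plain ASCII input: it builds a 256-entry NumPy lookup table indexed by byte value and uses the same codes as the dictionary above (A=1, C=2, G=3, T=4, everything else 0). The helper name `encode_seq` is hypothetical (it is not a Biopython function), and mapping lowercase bases to the same codes is my own assumption.

```python
import numpy as np

# Hypothetical lookup table: index = ASCII byte value, value = base code.
# Any byte not set below (including 'N') stays 0.
LOOKUP = np.zeros(256, dtype=np.uint8)
for base, code in {'A': 1, 'C': 2, 'G': 3, 'T': 4}.items():
    LOOKUP[ord(base)] = code          # uppercase base
    LOOKUP[ord(base.lower())] = code  # lowercase base (assumption)

def encode_seq(seq: str) -> np.ndarray:
    """Convert a DNA string to a uint8 array of base codes (hypothetical helper)."""
    # Interpret the string's ASCII bytes as uint8 values...
    raw = np.frombuffer(seq.encode('ascii'), dtype=np.uint8)
    # ...then use them as indices into the lookup table in one vectorized step.
    return LOOKUP[raw]

print(encode_seq('ACTTCAG'))  # [1 2 4 4 2 1 3]
print(encode_seq('acgtn'))    # [1 2 3 4 0]
```

One design difference to be aware of: the dictionary version raises a KeyError on an unexpected character, whereas this table silently maps anything it doesn't recognise to 0, so validate your input first if that matters.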

For anyone else coming across this thread later, there's some discussion about how to convert a text sequence into numbers (I think involving the same original poster as this thread) over on the Biopython GitHub page.