How to enable the optimizer to determine the order of some elements
For a little background, I am working on an iGEM project (genetic engineering) and I have a set of DNA sequences for proteins. I need to find the optimal order for them so that the joined sequence can be broken into chunks: the first chunk is 500 bp long, and each following chunk consists of the last 20 bp of the previous chunk plus the next 480 bp. The order should use our 500 bp chunks as efficiently as possible, so that each protein spans as few of these chunks as possible.
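The chunk layout can be written down as simple index arithmetic. This is a minimal sketch, assuming 1-based bp positions in the concatenated sequence and a final chunk that is just truncated: chunk k starts at bp 480*(k-1)+1 and runs for up to 500 bp, so neighboring chunks share 20 bp. The total length L is a made-up example value.

```matlab
% Hypothetical example: total length of the concatenated DNA in bp
L = 1500;
chunk_len = 500;              % length of every full chunk
step = 480;                   % new bp contributed by each chunk after the first
overlap = chunk_len - step;   % 20 bp shared with the previous chunk

% The first chunk holds 500 bp and each later chunk adds 480 new bp
n_chunks = max(1, ceil((L - overlap) / step));

starts = step * (0:n_chunks-1) + 1;          % first bp of each chunk
stops  = min(starts + chunk_len - 1, L);     % last bp, truncated at the end of the sequence
disp([starts; stops])
```

For L = 1500 this gives four chunks starting at bp 1, 481, 961 and 1441, and the last one stops at bp 1500.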
I know how to write a cost function that, given an order for these proteins, determines how good that order is, but I am at a loss as to how to let the global optimizer change this order. All that matters is that I end up with something I can sort: I could take 20 arguments, each one a number giving a protein's position or a key I could sort by, or a single vector that just holds the order of the proteins. I just don't know how to generate that structure.
If I use a number for each protein, the optimizer could give more than one protein the same value. If I assign those candidates a very large cost so that they are thrown out, I suspect the optimizer won't find a good solution, since most of the candidates it tries will be infeasible. As for having it shuffle the order of a vector, I have no idea how to do that.
Any help would be appreciated. Thank you.
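One common way around the duplicate-position problem, offered here as a suggestion rather than something from the thread, is a random-key encoding: let the optimizer adjust 17 unconstrained real numbers and define the order as the indices that would sort them. Every candidate then decodes to a valid permutation, so nothing has to be thrown out. In the sketch, x is a hypothetical stand-in for whatever vector a continuous optimizer proposes.

```matlab
n = 17;                      % number of genes in the current data
x = rand(1, n);              % hypothetical candidate proposed by a continuous optimizer

% Random-key decoding: the gene with the smallest key goes first, and so on.
% The second output of sort contains every index 1..n exactly once.
[~, order] = sort(x);
disp(order)

% Equal keys are not a problem either: sort keeps tied elements in their
% original order, so this decodes to [2 1 3].
[~, tie_order] = sort([0.5 0.2 0.5]);
disp(tie_order)
```

Because the decoded cost only changes when two keys swap rank, the objective is piecewise constant, so a derivative-free solver such as ga or patternsearch fits better than a gradient-based one. A small wrapper that sorts x and passes the resulting order to the cost function would connect the two.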
Here is my cost function. It returns a cost (lower is better) based on the chosen order. The only input that needs to change is order, which is just a vector of indices into the genes cell array.
For the data I have right now I would need a vector of 17 elements containing the integers 1 to 17 in any order without duplicates, and the optimizer would need to try various permutations to come up with an optimal solution.
function [ cost ] = gene_cost( genes, order )
%GENE_COST Score a gene order by how many 500 bp chunks each gene spans.
%   genes is a cell array with one row per gene: column 1 is the gene name,
%   column 2 is the DNA sequence, and column 3 is the length of the gene.
%   order is a permutation of row indices into genes, e.g. [10 7 5 3 ...].
%   Lower cost is better.
temp = genes(order,:);
gene_len = transpose(cell2mat(temp(:,3)));     % gene lengths in the chosen order (row vector)
upper_interval = cumsum(gene_len);             % last bp of each gene in the joined sequence
lower_interval = [0 upper_interval(1:end-1)];  % bp just before each gene starts
min_cost = ceil(gene_len/500);                 % lower bound on the chunks each gene touches
dna = strjoin(transpose(temp(:,2)),'');
dna_length = length(dna);
% block collects the chunk sequences for inspection; it does not affect the cost
% get the first 500 bp (or the whole sequence if shorter) and put it in the first chunk
block = {};
piece = dna(1:min(500,dna_length));
block = vertcat(block, piece);
% gene k occupies bp lower_interval(k)+1 .. upper_interval(k)
genes_in_block = 1 <= upper_interval & lower_interval < numel(piece);
cost = double(genes_in_block);                 % chunks touched by each gene so far
% each later chunk starts 480 bp after the previous one and overlaps it by 20 bp;
% stopping at dna_length-20 avoids a chunk that lies entirely inside the previous one
for i = 481:480:dna_length-20
    chunk_size = min(499, dna_length - i);     % the last chunk is truncated at the end of dna
    piece = dna(i:i+chunk_size);
    block = vertcat(block, piece);
    genes_in_block = i <= upper_interval & lower_interval < i+chunk_size;
    cost = cost + genes_in_block;
end
cost = sum((cost-min_cost).^4); % the 4th power heavily penalizes genes that span more chunks than necessary
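Another option, again a swapped-in technique and not the poster's method, is to search over permutations directly: start from randperm(n), repeatedly swap two positions, and keep the swap whenever it does not increase the cost. Every candidate is a permutation by construction. So that the sketch runs on its own, cost_fn below is a hypothetical stand-in; with the real data it would be @(order) gene_cost(genes, order).

```matlab
n = 17;                                        % number of genes in the current data
target = n:-1:1;                               % hypothetical "ideal" order for the toy cost
cost_fn = @(order) sum(abs(order - target));   % stand-in cost: lower is better

best = randperm(n);                            % random starting permutation of 1:n
best_cost = cost_fn(best);
start_cost = best_cost;

for iter = 1:20000
    ij = randperm(n, 2);                       % two distinct positions
    cand = best;
    cand(ij) = cand(fliplr(ij));               % swap them; cand is still a permutation
    c = cost_fn(cand);
    if c <= best_cost                          % accept ties so the search can cross plateaus
        best = cand;
        best_cost = c;
    end
end

fprintf('start cost %g, final cost %g\n', start_cost, best_cost);
disp(best)
```

This is plain hill climbing, so with a real cost it can stall in a local minimum; restarting from several random permutations, or occasionally accepting uphill moves as in simulated annealing, are the usual refinements.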
2 comments
Jonathan Epperl
2013-5-25
"Shuffling" a vector can be done with perms or randperm.
I don't think I completely understand what you are trying to do though, could you have another go at explaining your problem, maybe without referring to genetics at all?
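To put the two functions from this comment in scale for the 17-gene case: perms lists every ordering at once, which only works for small n, while randperm draws a single random ordering. A quick check, with n taken from the question:

```matlab
n = 17;
n_orders = factorial(n);        % number of distinct orders of 17 genes
fprintf('%d genes have %g possible orders\n', n, n_orders);

small = perms(1:3);             % perms is fine for tiny n: all 3! = 6 rows
disp(small)

order = randperm(n);            % one random order of 1:17, no duplicates
disp(order)
```

With roughly 3.6 x 10^14 possible orders, enumerating them all with perms is out of reach, so randperm is more useful as a starting point or a building block inside a search.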
Matt J
2013-5-25
I know how to write a costing function so that given an order for these proteins I can determine how good that order
Please write that for us so that we can see it, too.
Answers (0)