You might do this, but the call to seconds looks spooky. From audioread you get the sampling frequency fs, and that is the sampling frequency you should pass to stft. If it matches the sampling frequency from the study, you're good. If it doesn't match, you will have some work to do, but making up a sampling frequency or sampling-time array like you do cannot be anything but an unnecessary risk: if you're right, you might just as well have used fs; if you're wrong, you only set yourself up for more future pain and toil...
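Something along these lines is what I have in mind. It is only a rough sketch: the test tone, the file name, the window settings and the "study" rate fsStudy are all made up for illustration, so swap in your own file and the rate the study actually reports. It needs the Signal Processing Toolbox for stft, hann and resample.

```matlab
% Hypothetical setup: write a short test tone so the sketch runs on its own.
fsDemo  = 8000;                         % assumed rate of the demo file
tDemo   = (0:fsDemo-1).'/fsDemo;        % one second of sample times
wavFile = [tempname '.wav'];            % hypothetical file name
audiowrite(wavFile, 0.5*sin(2*pi*440*tDemo), fsDemo);

% Read the audio and keep the sampling frequency that comes with it.
[x, fs] = audioread(wavFile);

% If the study used another rate, resample instead of inventing a rate.
fsStudy = 16000;                        % assumed value, take it from the study
if fs ~= fsStudy
    [p, q] = rat(fsStudy/fs);           % rational resampling factor p/q
    x  = resample(x, p, q);
    fs = fsStudy;
end

% Pass fs straight to stft, no hand-made duration or time array.
% Window and overlap are illustrative choices, not recommendations.
[s, f, t] = stft(x, fs, 'Window', hann(256, 'periodic'), 'OverlapLength', 128);
```

The point is just that fs goes from audioread (or from the resampling step) directly into stft, so the frequency and time axes you get back always agree with the data.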
HTH
