If the purpose is to detect traffic cones, then you must train your detector on a set of images, some of which include cones and some of which do not include cones.
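As a rough illustration of that point, here is a minimal sketch of assembling a labeled training set from both kinds of images. The function name `build_labeled_split`, the file names, and the 20% validation fraction are all hypothetical choices made for this example and are not tied to any particular detector or library.

```python
import random

def build_labeled_split(cone_paths, no_cone_paths, val_fraction=0.2, seed=0):
    """Return (train, val) lists of (path, label) pairs.

    Label 1 means the image contains a cone, label 0 means it does not.
    Both classes are split separately so each ends up in train and val.
    """
    if len(cone_paths) < 2 or len(no_cone_paths) < 2:
        raise ValueError("need at least two images with cones and two without")

    rng = random.Random(seed)
    train, val = [], []
    for label, paths in ((1, cone_paths), (0, no_cone_paths)):
        shuffled = list(paths)
        rng.shuffle(shuffled)
        # Hold out at least one image per class for validation.
        n_val = max(1, int(len(shuffled) * val_fraction))
        val += [(p, label) for p in shuffled[:n_val]]
        train += [(p, label) for p in shuffled[n_val:]]

    rng.shuffle(train)  # mix positives and negatives before training
    return train, val


# Hypothetical file names, for illustration only.
cones = [f"cone_{i}.jpg" for i in range(10)]
others = [f"road_{i}.jpg" for i in range(10)]
train, val = build_labeled_split(cones, others)
```

The check at the top is the part that matters here: a detector trained only on images that contain cones never sees what "no cone" looks like.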
If the detector will only ever see a restricted set of "views", then you only have to train on those views. For example, if you are using fixed-view cameras looking at one of 10 different highway locations, then you only need to train on images from those locations.
But if anyone can upload any picture to the detector, then your training input has to be more varied. It would be... embarrassing... if your program reported that this was a traffic cone: