Frankly, I don't understand the consternation; it makes sense to me on many levels. Please allow me to stick more closely to music than to poetry, painting, etc., with the understanding that they follow similar paths and confer similar evolutionary advantages.
Sound is a very important part of our awareness of our environment, even though vision rather outweighs it. We have visceral reactions to certain types of sound based on frequency, harmonic content, rhythm, and sound pressure level. These reactions were likely learned over time and reinforced over many generations because they helped us discriminate between threats, neutral events, and benefits. Very low frequencies, for example, tend to be produced by potentially cataclysmic events like thunderstorms and earthquakes. We associate lower-frequency sounds not only with bigger events but also with bigger animals, and these at the very least demand attention, since those who didn't pay attention were more likely to get eaten, crushed by landslides, lost in sinkholes, struck by lightning, or to succumb to hypothermia.
Repeating or cyclic auditory cues add another dimension, and many of them seem to have originated in biorhythms. For example, we have been bipeds for a very long time, and for the vast majority of our history walking has been our sole means of transportation. It is nearly intuitive that a very slow rhythm like a dirge corresponds to labored walking, while a slightly brisk pace connotes confidence and optimism, and a rapid rhythm is more like running, either toward something exciting or away from something threatening.
We have had vocal cords since long before we were even remotely human, so we could mimic these sounds and cycles and communicate some information about an event even before formal language existed. So far we aren't talking about music per se, just our ability to make and hear sound and possibly to communicate the very simple concepts a group (a family or tribe) needs in order to survive.
Tracing this through evolution to actual, intentional compositions of any complexity is necessarily contentious, because it happened so very long ago and the evidence is by its very nature exceedingly rare: there were no tape recorders, and the earliest instruments were simply hollow logs and branches, which biodegrade in a very short time. So while it is speculation, or "guesstimation", to place what we would call music much further back than 60,000 years ago, the fundamental tools were in place long before that. Aside from immediate survival concerns, communication and "team spirit" obviously carry an evolutionary survival benefit. Abstract thought is also a key element of human evolutionary success, and storytelling of every kind has a long history. Art seems to be part of that history.
There is a decent Wikipedia article on the subject here:
http://en.wikipedia.org/wiki/Prehistoric_music