Some of the stuff posted is a bit ethnocentric IMO. I mean, 'bass' is not a generally recognised essential feature of music as a whole at all. Counterpoint has a lowest line, but it's not a 'bass' as such - I reckon you could argue that true bass doesn't arrive until figured bass, Rameau's treatise on harmony etc.
I reckon music is an emergent phenomenon of other kinds of mental activity, including the warning/affective cries and gestural communication that precede speech in our primate ancestors, the necessity of auditory streaming of an environment (an amazing feat, given that it's all constructed purely out of a single point of pressure variation within the ear), the illusory construction of a single present moment out of the integration of various concurrent perceptual streams, and other assorted abilities like what I would call the timbral sense - i.e. the knowledge of what kind of thing would make a noise like that (there's a survival advantage in knowing if the thing crashing through the trees is a gazelle or a rhino, and knocking on metal never sounds like knocking on wood).
Mixed in with that you have the added complications of acoustics per se (reflection, absorption etc) and psychoacoustics (like the 'beating' that you hear when two notes are close together in pitch - play one note to each ear over headphones and you still hear it, even though that beat doesn't exist in the outside world and is purely a product of your brain integrating information from two ears facing in opposite directions). Psychoacoustics is what gives you 'notes' in the first place - apart from pure sine tones, all notes are made up of a variety of frequencies, and if you bang on a relatively untuned saucepan you can hear the apparent pitch change as your brain tries to get a single (or multiple) note 'lock' on an essentially unrelated set of overlaid sinusoids. The subsidiary tones in a complex tone also give rise to formants etc, which are essential to voice timbral recognition. This is a huge topic and still not well understood, although there are a few folks on here who are working on the problem (I'll let them 'out' themselves if they want to).
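For anyone who wants to poke at the 'note lock' idea, here is a rough toy sketch in Python. It is only an illustration under loud assumptions: the function names, the sample rate and the choice of partials are all made up for the example, and plain autocorrelation is a crude stand-in for pitch finding, not a model of what the auditory system really does. It builds a tone out of three sinusoids (400, 600 and 800 Hz) and looks for the repetition rate of the combined waveform, which comes out at 200 Hz even though no 200 Hz sinusoid is in the mix.

```python
import math

def complex_tone(partials_hz, sample_rate=8000, duration_s=0.1):
    """Sum equal-amplitude sine partials into one list of samples (hypothetical helper)."""
    n = int(sample_rate * duration_s)
    return [
        sum(math.sin(2 * math.pi * f * i / sample_rate) for f in partials_hz)
        for i in range(n)
    ]

def autocorrelation_pitch(signal, sample_rate=8000, min_hz=50, max_hz=1000):
    """Return the frequency whose period best matches the signal with itself.

    Assumption: the lag with the largest raw autocorrelation is treated as
    the 'pitch' period. This is a toy heuristic, nothing more.
    """
    min_lag = sample_rate // max_hz
    max_lag = sample_rate // min_hz
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Compare the signal with a copy of itself shifted by `lag` samples.
        score = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

partials = [400, 600, 800]           # no 200 Hz component present
tone = complex_tone(partials)
pitch = autocorrelation_pitch(tone)
print(pitch)                         # 200.0
```

The partials here are deliberately tidy multiples of 200 Hz, so the 'lock' is unambiguous; the saucepan case is the messy one, where the partials are not simply related and several candidate periods compete.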
Then you have the higher-order enculturated stuff - like the categorical perception of scale steps (octaves are fairly ubiquitous, but the rest of the scale is learned to a large extent) and topical association (the real basis of musical affect), which is also essentially arbitrary in a semiotic way. For example, death music is slow and sombre in many cultures, but fast and increasingly frenzied in others. Carl Dahlhaus complained that the Lydian fourth (I think - don't quote me!) was used in the 19th century as an exoticist topic, a historicist one, a religious indication etc etc - which one was appropriate being indicated by the surrounding circumstances - like the title of the piece!