Elementary Survey Analysis

Over two decades as an academician and practitioner of survey research and survey analysis, I have seen my students and executives fear one thing for sure: survey analysis. For them, anything with even a remote connection to statistics reads like a death sentence. I, on the other hand, found life there.
An experienced survey analysis pro will essentially work at three levels: analyzing one variable at a time, two at a time, and more than two at a time. Analyzing one variable at a time is called univariate analysis, two at a time is called bivariate analysis, and more than two at a time is termed multivariate analysis.
It is generally agreed that most survey analysis deals with a single variable or with two variables. Multivariate analysis finds application only on rare occasions. I do not mean to say that multivariate techniques are unimportant or not useful, only that they are seldom found in practice.

Single variable analysis means dealing with one variable at a time. One employs techniques like tabulation, which includes frequencies and percentages. In some cases, measures of central tendency such as the mean, mode and median are employed too.

Frequency means the number of occurrences of a particular attribute. For example, one may report the gender split of a sample in a survey analysis. If 45 males have taken part in a sample of 100, then 45 is the frequency of male participants. Expressed as a percentage, it is 45% of the sample. When sample sizes are large enough (how to define a 'large' sample is a huge science in itself; we would consider 100 reasonably large), survey analysis reports use percentages as the way of reporting. Such frequency reporting is called tabled data or tabulated data. The moment one speaks of cross-tabulation, however, two or more variables are involved.
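The gender example can be worked through in a few lines of Python. This is only a minimal sketch: the 45 males in a sample of 100 come from the example, while the assumption that the remaining 55 respondents are female, and the helper name frequency_table, are mine and purely illustrative.

```python
from collections import Counter

# 45 males out of 100 is taken from the example in the text.
# Treating the other 55 respondents as female is an assumption for illustration.
responses = ["Male"] * 45 + ["Female"] * 55


def frequency_table(values):
    """Return {category: (frequency, percentage)} for a single variable."""
    counts = Counter(values)
    total = len(values)
    return {
        category: (count, round(100 * count / total, 1))
        for category, count in counts.items()
    }


table = frequency_table(responses)

# Print a simple single variable table: category, frequency, percentage.
for category, (count, percent) in table.items():
    print(f"{category:<8}{count:>5}{percent:>7.1f}%")
```

With a sample of exactly 100 the frequency and the percentage happen to be the same number; with any other sample size the two columns would differ, which is why reports usually show the percentage.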

Almost all the variables (questions in the questionnaire) in a survey analysis are reported through a single variable table. You would generally find tables of age, income, gender, occupation, etc. as a part of demographic reporting. In a shopping mall study, attributes like the number of times shops were visited, the amount of grocery purchases, etc. are reported through tables like these.

At times, it becomes important to specify the most representative figure of the findings, for example the average mobile phone bill or the average number of shops visited before buying furniture. Whenever such a 'central' figure has to be reported, one deploys the mean, mode or median. All three are 'averages' in a sense, but with different meanings. The mean is most useful when one wants a mathematical figure, for example the average number of footfalls each day over a month. The mode is used when one does not really want a mathematical average like the mean, for example when reporting the colors shoppers prefer most. Thus, red and green may come out as the ones with maximum frequencies, and we would call red and green the main 'modes'. The median is used when the data is continuous, for example the age or income of the respondents. With such data, a few extreme values can pull the mean away from the typical respondent, so the mean may lead to misleading conclusions. So, the median is used in such survey analysis tables.
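To make the three averages concrete, here is a small sketch using Python's standard statistics module. All the figures below (footfalls, color choices, incomes) are made-up numbers for illustration, not survey results; the income list deliberately includes one very high value to show how the mean and the median can part ways.

```python
import statistics

# Hypothetical daily footfall counts for one week (illustrative only).
footfalls = [120, 135, 128, 150, 142, 160, 145]
mean_footfall = statistics.mean(footfalls)

# Hypothetical preferred colors reported by seven shoppers.
colors = ["red", "green", "red", "blue", "green", "red", "green"]
# multimode returns every value tied for the highest frequency.
modes = statistics.multimode(colors)

# Hypothetical monthly incomes; the last respondent earns far more than the rest.
incomes = [20000, 22000, 25000, 27000, 30000, 32000, 250000]
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print("Mean footfall:", mean_footfall)
print("Modes of color:", modes)
print("Mean income:", mean_income)
print("Median income:", median_income)
```

In this made-up income list, six of the seven respondents earn 32,000 or less, yet the mean comes out well above that because of the single high earner. The median stays in the middle of the group, which is the kind of case where a survey table would report the median.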