Exploratory data analysis
In this assessment task, you will undertake an exploratory data analysis in the role of a consultant. The task is designed to test your R skills. You must present your findings, supported by data visualisations, in the form of a written report.
This task should take you 10–14 hours to complete.
A pharmaceutical company called Globex Pharma has contracted your consulting company to investigate the firm’s workforce. Recently there has been some staff turnover at Globex, and the leadership team is aware of rumours that dissatisfied employees are seeking job opportunities elsewhere.
As it happens, the firm’s employees completed a survey within the last year that measured various attributes of the employees and their jobs. Here are the results of that survey, and an explanation of the variables in the data set:
Globex wants you to use this data to identify factors that might be causing staff discontent.
They would like you to submit a short report in which you consider three possible causes and draw a conclusion about each, supporting your conclusions with appropriate calculations and visualisations.
You should do the following:
Identify three research questions about the data. These should be questions about possible causes of staff discontent. For example: “Is there an imbalance between how much female staff are paid and how much male staff are paid?” Or: “Are single staff doing a disproportionate amount of overtime and not maintaining a healthy work-life balance?”
Try to answer each question by analysing the data. Explore ways of visualising the data that might help to answer each question, and perform any necessary calculations.
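As a rough illustration of what "calculations and visualisations" might look like for the two example questions above, here is a minimal base R sketch. The data frame and its column names (gender, monthly_income, overtime) are invented for this example and are assumptions; the real survey variables may be named and coded differently, and you are free to use other methods.

```r
# Toy data, invented purely for illustration; the real survey columns may differ.
staff <- data.frame(
  gender         = c("Female", "Male", "Female", "Male", "Female", "Male"),
  monthly_income = c(4200, 5100, 3900, 4800, 6100, 6000),
  overtime       = c("Yes", "No", "No", "Yes", "Yes", "No")
)

# Calculation: median income for each gender.
income_by_gender <- aggregate(monthly_income ~ gender, data = staff, FUN = median)
print(income_by_gender)

# Calculation: share of each gender working overtime (each row sums to 1).
overtime_share <- prop.table(table(staff$gender, staff$overtime), margin = 1)
print(round(overtime_share, 2))

# Visualisation: side-by-side boxplots of income by gender.
boxplot(monthly_income ~ gender, data = staff,
        main = "Monthly income by gender (toy data)",
        xlab = "Gender", ylab = "Monthly income")
```

A difference between groups in a summary like this is a starting point, not an answer: other variables (such as job level or department) may explain it, which is worth checking and noting as a limitation.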
We encourage you to install and use RStudio to do your analysis, but if you’d prefer to work here in Ed you can use the “Assessment 2 workspace” slide that comes after this one.
If you’re using RStudio you might find the following helpful:
Once RStudio is installed, you can create a new R script, write your code there, save it as you go, and come back to it later. To run code, click the Run button at the top right of the script pane to run the current line, or highlight a portion of the code to run it all at once. You can also use the following shortcuts:
Ctrl + Enter: Run the current line and jump to the next one.
Alt + Enter: Run the current line without jumping to the next one.
Ctrl + Alt + R: Run whole script.
The data set can be saved anywhere on your computer. To import it, click Import Dataset and browse for the data file. Once your data is imported you can write and run code just like here in Ed.
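If you would rather import the data with code (which also makes your analysis easier to replicate), something like the following sketch may help. It assumes the data is a CSV file, which may not be the case for your download. The file contents and column names below are made up so that the example runs on its own; in practice you would set path to wherever you saved the survey data.

```r
# Stand-in file so this sketch runs by itself. The columns are invented;
# replace 'path' with the location of the real survey file.
path <- tempfile(fileext = ".csv")
writeLines(c("employee_id,department,job_satisfaction",
             "1,Research,3",
             "2,Sales,4",
             "3,Research,NA"), path)

# Read the CSV into a data frame.
survey <- read.csv(path, stringsAsFactors = FALSE)

# First checks after importing: structure, summaries, and missing values.
str(survey)
summary(survey)
missing_per_column <- colSums(is.na(survey))
print(missing_per_column)
```

Checking the structure and the number of missing values straight after importing helps you spot variables that were read with the wrong type or that have gaps you will need to mention in your report.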
In your report you should:
State your three questions.
State your answer to each question, and support it with evidence. If you think the answer is clear from the data then say so and explain why. If you think the answer is not clear then say so and explain why. You should use your visualisations and calculations as evidence. Be clear and honest about any limitations of your analysis.
Draw conclusions. In light of what you have found, how would you advise the firm to address its concerns about staff discontent?
Include an appendix, in which you should explain how you conducted your analysis so that it can be replicated. If you have made any assumptions in your analysis, state all of them in the appendix.
As a consultant you should demonstrate an understanding of the firm’s context. You might need to do some research, such as a review of relevant industry publications and white papers.
There is no single correct answer to this task. There are many questions you might ask, and many ways to explore each question. You might need to try a variety of questions before settling upon your final three.
You might like to look for outliers in the data set, and think about what we might infer from those outliers.
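One common way to flag possible outliers is the 1.5 × IQR rule, the same rule a standard boxplot uses to draw individual points. The sketch below applies it to a made-up vector; the variable name and values are assumptions for illustration only, and other definitions of "outlier" are equally acceptable.

```r
# Toy values invented for illustration; substitute a numeric survey column.
monthly_income <- c(3900, 4200, 4500, 4800, 5100, 5300, 19500)

# Lower and upper quartiles, and the interquartile range (IQR).
quartiles <- unname(quantile(monthly_income, probs = c(0.25, 0.75)))
iqr <- quartiles[2] - quartiles[1]

# Fences at 1.5 * IQR beyond the quartiles.
lower_fence <- quartiles[1] - 1.5 * iqr
upper_fence <- quartiles[2] + 1.5 * iqr

# Values outside the fences are flagged as possible outliers.
flagged <- monthly_income[monthly_income < lower_fence |
                          monthly_income > upper_fence]
print(flagged)
```

A flagged value is not necessarily an error. It may be a genuine but unusual case (for example, a senior executive's pay), so consider what it tells you before deciding whether to keep or exclude it, and record that decision in your appendix.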
When exploring the data you will probably create a large number of graphs. In your report, only include the ones that support your answers.
You should propose research questions and conduct your analysis using what you have learned in the course. We do not expect you to use any specific statistical test (e.g. a chi-square test). As long as your analysis can fully address your proposed question it is valid. See the marking criteria for more details.
Your report should be written in clear and simple language suitable for senior management.
It may include a table of contents, and an abstract/executive summary.
It should be no more than 1200 words, excluding cover pages, table of contents, abstract/executive summary, and R code. In-text references are included in the word count but any reference list you have at the end is not.
Words within figures are not included in the word count.
Figures and tables should be appropriately labelled.
It should be written with Microsoft Word or Google Docs and use correct spelling, grammar and punctuation.
Your language should be free of bias (including but not limited to race, gender, sexual orientation or disability).
Your references should follow the conventions of the Harvard system.