Introduction to statistical analysis
Paul Garthwaite Workshop 3A and Workshop 7B
13 March 2010
Tuesday 23 March, 2.45pm-3.45pm
Central Meeting Room 11
Wednesday 24 March, 3.15-4.15pm
Central Meeting Room 11
If you think your research will have a quantitative component that may involve some statistical analysis then please attend this session. If you are not sure about whether you will be using statistics in your research but would like to know about the role of statistics in research then this workshop is also for you. The workshop will introduce the use of statistics in research as well as the statistical advisory service that is available to support your research at the OU. You will be able to talk about your research and discuss any early concerns you have about any future statistical analysis you may perform.
Paul Garthwaite is Professor of Statistics at the Open University. He obtained a first degree in mathematics from Oxford and a PhD in Statistics from Aberystwyth. His PhD thesis won an international prize. He has a broad range of interests in statistical theory and much practical experience of designing experiments and analysing data in medical applications, psychology, and human and animal nutrition. He has jointly supervised PhD students working in the OU Departments of Earth Sciences and Computing (as well as Mathematics and Statistics) and others working in forestry, fisheries and public health. Paul has co-authored two books and eighty papers. Currently he is Head of the Statistics Group and leads the Statistical Advisory Service.
Paul Garthwaite (Session 7B)
Introduction to Statistical Analysis (Wednesday, 24th March, 2010, 3:45 pm).
Fairly large room with about 12 students.
Paul is introducing himself.
He is not certain how much statistics the students have done.
He is going to link statistics with the scientific method. He says the presentation will go up on the web soon.
He is asking whether all the students did some statistics at some stage. Most students have.
Scientific Method / Deductive Reasoning
He is introducing the scientific method, i.e. going from hypotheses to tests.
You can never test whether your theory is correct, but you can test whether it is wrong. After a while you can build up some idea of where your theory holds. He gives the example of Newton's laws of motion.
He is showing a number of theories (phrased as hypotheses).
You start with the hypothesis, then do the experimental design, followed by the data collection and analysis and then the conclusions which would then inform some other hypotheses.
Hypothesis testing is like a court of law: you start off assuming that your hypothesis is not true, i.e. the null hypothesis.
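A minimal sketch of the "start off assuming it is not true" idea. This is not from the presentation; the coin example and the function name are assumptions made for illustration. The null hypothesis is that a coin is fair, and the p-value is the chance of a result at least as extreme as the one observed if that null hypothesis holds.

```python
from math import comb


def two_sided_binomial_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value for the null hypothesis that p = 0.5."""
    centre = trials / 2
    observed_distance = abs(successes - centre)
    # Add up the probability of every outcome at least as far from the
    # centre as the one observed, assuming the null hypothesis is true.
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - centre) >= observed_distance
    )
    return extreme / 2 ** trials


# Made-up data: 16 heads in 20 tosses.
p_value = two_sided_binomial_p(successes=16, trials=20)
print(f"p-value = {p_value:.4f}")  # prints: p-value = 0.0118
```

A small p-value means the data would be surprising if the null hypothesis were true; it does not prove the alternative theory correct.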
With inductive reasoning, you start with observations, notice a pattern, form a tentative hypothesis and then form a theory.
Indicates that some disciplines like psychology disapprove of observational studies ...
Pilot studies are good:
- If there are clear patterns then you can form theories
- You can find flaws in your questionnaires or methodologies
You’re extrapolating from a small sample to a large population, which is why the sample needs to be representative and random.
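A minimal sketch of drawing a simple random sample, assuming a made-up population of ages (the numbers, the seed and the sample size of 200 are illustrative and not from the talk). Every member of the population has the same chance of being picked, which is what lets the sample mean stand in for the population mean.

```python
import random
from statistics import mean

rng = random.Random(2010)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 people (made-up values).
population = [rng.randint(18, 90) for _ in range(10_000)]

# Simple random sample without replacement: each member is equally
# likely to be selected.
sample = rng.sample(population, k=200)

print(f"population mean: {mean(population):.1f}")
print(f"sample mean:     {mean(sample):.1f}")
```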
Warning of poor designs:
- Inefficient use of collected data
- Difficult data analysis
- Inability to draw meaningful conclusions
When designing experiments use common sense rather than statistical ability:
- What questions could your research answer?
- Could you gather data related to those questions?
- Would the data answer these questions (using common sense)?
Uses the example of diabetes and diet and the data collected:
- use patient notes to get age at death
- age at diagnosis
- weight loss in first year after diagnosis
Statistics just makes common sense rigorous by trying to determine the random variation.
He mentions covariates: the effect you are looking for might be obscured by other factors, such as gender and age, which are called covariates.
He is saying you should gather lots of data, i.e. more data = firmer conclusions. It is better to collect more data initially than to decide later that you want to go back and test something.
How much data?
- For a controlled experiment: perhaps 40 independent observations
- With observational and questionnaire data: 150 data points, with 25 observations in each category
- More data is needed with counts (i.e. frequencies) than measurements
- More data is needed with binary quantities (yes/no, cured/not cured) than with Likert scores
He explains what 5-point Likert scales are, i.e. strongly agree, weakly agree, indifferent, disagree, strongly disagree. You can code them 1, 2, 3, 4, 5 respectively. Open-ended questions are harder to analyse.
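A minimal sketch of the 1 to 5 coding described above. The dictionary and function names are assumptions for illustration, and the responses are made up.

```python
from statistics import mean, median

# Coding from the notes: strongly agree = 1 ... strongly disagree = 5.
LIKERT_CODES = {
    "strongly agree": 1,
    "weakly agree": 2,
    "indifferent": 3,
    "disagree": 4,
    "strongly disagree": 5,
}


def code_responses(responses):
    """Convert text responses to their numeric codes."""
    return [LIKERT_CODES[r.strip().lower()] for r in responses]


# Made-up responses to a single questionnaire item.
responses = ["Strongly agree", "disagree", "Indifferent", "weakly agree", "Disagree"]
codes = code_responses(responses)
print(codes)                       # [1, 4, 3, 2, 4]
print(mean(codes), median(codes))  # 2.8 3
```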
Statistical Data Analysis
- First produce summary statistics such as means, percentages, graphs, bar charts
- Try to get a feel for your data – what does it tell you?
- Form some quantitative hypotheses that you think the data will refute
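A minimal sketch of the first step above: means, percentages and a crude text bar chart. The data are made up (loosely echoing the weight-loss example) and the variable names are assumptions.

```python
from collections import Counter
from statistics import mean, median, stdev

# Made-up data: weight loss (kg) and a yes/no outcome for ten patients.
weight_loss = [2.1, 0.5, 3.4, 1.8, 2.9, 0.0, 4.2, 1.1, 2.5, 3.0]
improved = ["yes", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes"]

# Summary statistics for a measurement.
print(f"mean   = {mean(weight_loss):.2f}")
print(f"median = {median(weight_loss):.2f}")
print(f"sd     = {stdev(weight_loss):.2f}")

# Percentages and a text bar chart for a categorical variable.
counts = Counter(improved)
percentages = {answer: 100 * n / len(improved) for answer, n in counts.items()}
for answer, n in counts.items():
    print(f"{answer:>3} {'#' * n} ({percentages[answer]:.0f}%)")
```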
Fundamental statistical methods
- T-tests – comparing two groups
- Comparison of proportions: comparing the proportion of two groups as in a binomial
- Contingency tables: used for finding associations between two groups – cross-tabulation
- Analysis of variance: he talks about blocking, which was developed in agriculture as a way of deciding how to allocate treatments in experiments
- Tests whether one variable affects another while controlling for other variables such as the covariates
- Used for describing the relationship
- Stepwise methods help you find/ test which variables are important
- Generalised linear models add flexibility
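A rough sketch of the first three methods in the list, using only the Python standard library. These are not the formulas shown in the session: the t statistic here is the Welch (unequal variance) variant, chosen as an assumption, and the function names and data are made up. In practice you would use one of the statistics packages and check that the method suits your data.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-sided p-value for comparing two proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Made-up examples.
t, t_df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
z, p = two_proportion_z(10, 30, 20, 30)      # 10/30 cured vs 20/30 cured
chi2, chi2_df = chi_square([[10, 20], [20, 10]])
print(f"t = {t:.2f} on {t_df:.1f} df")
print(f"z = {z:.2f}, p = {p:.3f}")
print(f"chi-square = {chi2:.2f} on {chi2_df} df")
```

For a 2 x 2 table the chi-square statistic equals the square of the two-proportion z statistic, so the last two calls are the same comparison seen two ways.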
He indicates that there are a few people who you can ask for advice before gathering data – and there is an advisory service he runs about design etc.
- Remember garbage in is garbage out
- The software is just a number cruncher
- Important to choose an appropriate method for your data
- Types of software, e.g. spreadsheets, GenStat, Minitab, SAS, Statistica, SPSS
Statistics Courses at the OU
- M248: Analysing Data – this is a good beginners' course
- M249: Practical Modern Statistics - covers stuff not in M248
- M343: Probability (?)
- M346: Linear Statistical Modelling
Statistics Advisory Service:
Drop in Sessions:
- Mondays 2 to 4 pm (M216)
- Thursdays 10:30 to 12:20 (M214)
Sessions are in the Maths and Computing Building
15:32 on 24 March 2010 (Edited 15:03 on 8 June 2010)