Semester Project - Milestone Tasks

Task: Problem Statement

The problem statement should provide a clear and concise description of the issue that will help guide your research and analysis. It should include the following components:

Based on your topic:

Name the core problem you want to investigate.
Provide context for the problem:
- Who is affected by this problem? This could include specific groups of people, ecosystems, or other stakeholders.
- What are the specific harms that are occurring or could occur as a result of this problem? This could include health impacts, economic impacts, environmental impacts, etc.
- Why is it important to address this problem?
Formulate at least two specific research questions that arise from the problem statement and that you want to answer with your project. These should be specific and focused, and they should be answerable with data. This means there should be a quantitative aspect to the question that can be addressed.
Add your problem statement draft (this just needs to exist; we can iterate on it) to the shared GitHub repository.

Problem statement template

Below is a good template for writing a concise problem statement (adapted from Dr. Papadakis; ISAT 491).

The purpose of this project is to [A]. Because of [B] and [C]. We expect our project to result in [D]. Which will allow [E].

A: Your environmental issue
B: Broad Background
C: Harms/ Impacts to stakeholders/ ecosystems
D: Your specific research questions
E: Any next steps

Identify datasets

We have already learned how to work with tabular datasets. Use your problem statement and research questions to search for potential datasets.

Note: If you don’t find any data, you might want to consider modifying your research questions. This would also be a good time to check in with me. I might know some datasets and/or will be able to help you define some more approachable questions.

For now, you should:

Find and download at least one dataset that could be used to address one of the research questions.
The data should be readable with pandas in Python (this means it should be something that you could in principle open in Excel).
Create a small description document that contains the following information
- Describe what the dataset contains
  - variables, temporal coverage, spatial extent, …
  - available metadata
- Describe the source of the data:
  - how was it collected?
  - who owns/ has published the data?
- How does this relate to the research question?
- Think about FAIR and try to evaluate to what extent the data fulfills FAIR criteria.
  - For the final project report, you will need to discuss this in more detail.
Add your dataset (if <100 MB) to the Data folder of your shared repository
Add the description document to your shared repository (choose a suitable location)
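
As a starting point for the description document, here is a minimal sketch of how you might check that a file is readable with pandas and collect a few basic facts (variables, missing values, temporal coverage). The sample data, the column names (`date`, `site`, `pm25`), and the helper `describe_dataset` are all made up for illustration; replace them with your own file and columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded file. In practice you would pass
# the path of your own file (for example something inside your Data
# folder) to pd.read_csv instead of this in-memory text.
SAMPLE_CSV = """date,site,pm25
2020-01-01,A,12.5
2020-01-02,A,
2020-01-03,B,9.1
"""


def describe_dataset(df, time_col=None):
    """Collect basic facts about a table for the description document."""
    summary = {
        "n_rows": len(df),
        # Variable names with the data type pandas inferred for each one
        "variables": {col: str(dtype) for col, dtype in df.dtypes.items()},
        # Number of missing entries per variable
        "missing_values": df.isna().sum().to_dict(),
    }
    if time_col is not None:
        # Temporal coverage: earliest and latest timestamp in the time column
        times = pd.to_datetime(df[time_col])
        summary["temporal_coverage"] = (times.min(), times.max())
    return summary


# If this line fails on your real file, the data is not yet pandas-readable.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

summary = describe_dataset(df, time_col="date")
for key, value in summary.items():
    print(key, ":", value)
```

The printed summary only covers what can be read from the table itself. Information about how the data was collected, who published it, and how well it meets FAIR criteria still has to come from the data provider's documentation.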

Exploratory Data Analysis

Here are some specific steps and suggestions

With pandas and xarray (will be introduced soon) you should be able to load most datasets.
- If you come across a format you don’t know or don’t know how to open, please let me know.
You should download data ASAP and check whether you can open the datasets
Document what exactly you download for reproducibility.
Try to download only what you need. Gridded data in particular can get big very quickly.
- If datasets are less than ~20 MB each, put them on GitHub in a Data directory.
- If they are larger, put them in cloud storage and provide links in the repository.
Create initial plots and plan your analysis.
- Are there issues with the data?
- What steps do you need to take to conduct your analysis?
- Do you need to modify research questions?
- …
- I will discuss these points with your groups.
Make sure to have all code on GitHub
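
One possible way to document what you downloaded and to check the size guideline is sketched below. It uses only the Python standard library. The function name `download_record`, the record fields, the demo file, and the example URL are assumptions for illustration, not a required format; the 20 MB value is the approximate threshold from the steps above.

```python
import hashlib
import json
import tempfile
from datetime import date
from pathlib import Path

GITHUB_LIMIT_MB = 20  # approximate threshold named in the steps above


def download_record(path, source_url, notes=""):
    """Build a small provenance record for one downloaded file."""
    path = Path(path)
    # Reads the whole file into memory; fine for small files, but for very
    # large files you would hash it in chunks instead.
    data = path.read_bytes()
    size_mb = len(data) / 1_000_000  # decimal megabytes
    return {
        "file": path.name,
        "source_url": source_url,
        "downloaded_on": date.today().isoformat(),
        "size_mb": round(size_mb, 3),
        # The checksum lets others verify they have the identical file
        "sha256": hashlib.sha256(data).hexdigest(),
        "store_on_github": size_mb < GITHUB_LIMIT_MB,
        "notes": notes,
    }


# Demo with a tiny temporary file standing in for a real download.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.csv"
    demo_file.write_text("date,value\n2020-01-01,1.0\n")
    record = download_record(
        demo_file,
        "https://example.org/demo.csv",  # placeholder URL
        notes="Selected variables, region, and time range go here.",
    )
    print(json.dumps(record, indent=2))
```

Saving one such record per file (for example as a JSON file next to the data, or as a table in your description document) gives your group a simple record of where each file came from and whether it belongs in the repository or in cloud storage.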