Semester Project

Timeline & Steps

Step Target Date
  1. Select Initial Environmental Issue (completed)
Week 6
  1. Initial Research and Problem Statement
    • Write a research statement that
      1. Outlines the problem
      2. Identifies affected people, ecosystems, …
      3. Identifies specific harms
    • Define research question
    • Identify questions for background research
    • Begin collecting data sources
    • Deliverable: Have a first draft of the problem statement that
      • identifies two potential research questions
      • links these to a broader question and specific harms/ issues
      • who is affected, what is unknown…
Week 7
  1. Conduct Background Research
    • Using the problem statement as a starting point, identify what is unclear
    • The background research should explain the system, the harms, and how the problem affects
      • stakeholders
      • ecosystems
  2. Datasets
    • Present and submit an initial dataset
    • Describe the data origins, what is contained in the dataset, …
    • Check with Dr. Gerken about your datasets (Week 7-8)
Week 7
  1. Exploratory Data Analysis
    • Acquire and load data
    • Do exploratory data analysis, like we did in the lectures
    • Develop ideas for additional analysis (with Dr. Gerken)
Week 8+
  1. Data Analysis
  2. Report Writing

Requirements

Tip

Let me know as early as possible if you run into technical issues or need help, so that I can help you.

Your semester project should

  • report your results in a Jupyter notebook that combines text, figures, and the code that generates these figures.
  • make use of at least two datasets (observations or models)
    • these datasets should be described and you need to document how the data was acquired
  • be placed in the Semester Project GitHub repository containing
    • a directory structure as discussed in class. At a minimum, there should be folders for

      -Main_Repository|
                      |- Data 
                      |- Code 
                      |- ProjectReport
                      |- OtherMaterials
                      |- Documentation 
                      |- <folders as needed>                    
    • an environment.yml file to document your computational environment. See instructions here for how to create this file with Anaconda Navigator.

    • code for all analysis (including any analysis not contained in the final report, such as exploratory analysis or code to download data), placed in the Code directory

    • a README.md file that explains the contents of the GitHub repository and its purpose

    • data used or a description of the data and how it was acquired/ downloaded (in the Data directory)

    • other files, such as working documents, placed in the OtherMaterials folder
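
The minimum folder layout listed above can be set up in one step. The following is a minimal sketch that uses only Python's standard library. The function name `create_skeleton`, the `.gitkeep` placeholder files, and the placeholder README text are illustrative assumptions, not course requirements.

```python
from pathlib import Path

# Folder names taken from the minimum structure listed above.
FOLDERS = ["Data", "Code", "ProjectReport", "OtherMaterials", "Documentation"]


def create_skeleton(root):
    """Create the minimum project folders and a placeholder README.md under root."""
    root = Path(root)
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        # Git does not track empty folders, so add an empty placeholder file
        # (".gitkeep" is only a naming convention, not a Git feature).
        (folder / ".gitkeep").touch()
    readme = root / "README.md"
    if not readme.exists():
        # Placeholder text only; replace it with a real project description.
        readme.write_text(
            "# Semester Project\n\nDescribe the repository contents and purpose here.\n"
        )
    # Return the names of all folders now present directly under root.
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Calling `create_skeleton(".")` from the top level of a cloned repository would add any missing folders without touching existing files. The environment.yml file still needs to be created separately.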

A sample report structure is found in this document

Grading Rubric

Note

Because you will present the main results in class, the grading rubric emphasizes criteria related to data management, methods, and background information.

  • Formal Requirements (20%)

    • Uploaded/ submitted via GitHub
    • Uses Markdown or .ipynb format
    • GitHub repository has an adequate directory structure
    • GitHub repository contains a README.md file containing information about the project
  • Report Content:

    Note: Consult the Semester Project Outline for details.

    • Problem Introduction (25%):
      • adequately introduces the problem using a problem statement
      • provides sufficient background using external references (with References Cited section) for a scientifically minded person to understand the problem space
    • Data and Methods (30%):
      • describes the datasets including sources, identifiers, and links
      • provides background about datasets
        • Including exploratory analysis
      • outlines your data processing including how and why it was done
    • Results, Discussion, and Conclusion (25%):
      • the results in the report support the presentation results, ideally expanding on the presentation
        • Provide evidence for relationships between variables, groups, etc.
      • the discussion expands on the discussion presented in the presentation, especially regarding data aspects and future implications of results
      • a logical conclusion is presented

Task: Problem Statement

The problem statement should provide a clear and concise description of the issue that will help guide your research and analysis. Based on your topic, it should include the following components:

  1. Name the core problem you want to investigate.
  2. Provide context for the problem:
    • Who is affected by this problem? This could include specific groups of people, ecosystems, or other stakeholders.
    • What are the specific harms that are occurring or could occur as a result of this problem? This could include health impacts, economic impacts, environmental impacts, etc.
    • Why is it important to address this problem?
  3. Formulate at least two specific research questions that arise from the problem statement and that you want to answer with your project. These should be specific and focused, and they should be answerable with data. This means there should be a quantitative aspect to the question that can be addressed.
Tip: Problem statement template

The template below is a good starting point for writing a concise problem statement (adapted from Dr. Papadakis; ISAT 491):

  The purpose of this project is to [A]. Because of [B] and [C]. We expect our project to result in [D]. Which will allow [E].

  • A: Your environmental issue
  • B: Broad Background
  • C: Harms/ Impacts to stakeholders/ ecosystems
  • D: Your specific research questions
  • E: Any next steps
Tip: In-class research activity
  • What data are needed?
    • variables
    • time periods
    • spatial/ temporal resolution
  • Checking initial data sources
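
Once a candidate data source is open in pandas, the questions above (variables, time period, temporal resolution) can be checked quickly. The following is a minimal sketch, assuming a tabular dataset with one time column. The function name `check_coverage`, the column names, and the example values are made up for illustration.

```python
import pandas as pd


def check_coverage(df, time_col, needed_vars, start, end):
    """Summarize whether df has the needed variables and covers start..end."""
    times = pd.to_datetime(df[time_col]).sort_values()
    # Variables we need that are not columns of the dataset.
    missing = [v for v in needed_vars if v not in df.columns]
    first, last = times.iloc[0], times.iloc[-1]
    return {
        "missing_variables": missing,
        "first_record": first,
        "last_record": last,
        "covers_period": bool(first <= pd.Timestamp(start) and last >= pd.Timestamp(end)),
        # Typical spacing between consecutive records (temporal resolution).
        "median_time_step": times.diff().median(),
    }


# Made-up example values, for illustration only.
example = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=5, freq="D"),
    "pm25": [8.1, 9.4, 7.7, 10.2, 8.8],
})
report = check_coverage(example, "date", ["pm25", "ozone"], "2020-01-01", "2020-01-05")
print(report)
```

A non-empty `missing_variables` list or `covers_period` being `False` is a sign that another data source is needed, or that the research question may need adjusting.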

Exploratory Data Analysis

Here are some specific steps and suggestions

  • With pandas and xarray (to be introduced soon), you should be able to load most datasets.
    • If you come across a format you don’t know or don’t know how to open, please let me know.
  • Download data as soon as possible and check that you can open the datasets.
  • Document exactly what you download for reproducibility.
  • Try to download only what you need. Gridded data in particular can get large very quickly.
    • If datasets are less than ~20 MB each, put them on GitHub in a Data directory.
    • If they are larger, put them in cloud storage and provide links in the repository.
  • Create initial plots and plan your analysis.
    • Are there issues with the data?
    • What steps do you need to take to conduct your analysis?
    • Do you need to modify research questions?
    • I will discuss these questions with your groups.
  • Make sure to have all code on GitHub.
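
Two of the steps above, checking file size against the ~20 MB guideline and taking a first look at a dataset, can be sketched with pandas as follows. This is one possible starting point, not a required workflow. The helper names `file_size_mb` and `first_look` and the small inline CSV are assumptions made for illustration.

```python
import io
from pathlib import Path

import pandas as pd

SIZE_LIMIT_MB = 20  # approximate guideline from the list above


def file_size_mb(path):
    """Return the size of a file in megabytes (1 MB = 1,000,000 bytes here)."""
    return Path(path).stat().st_size / 1e6


def first_look(df):
    """Collect basic facts about a DataFrame: shape, column types, missing values."""
    return {
        "n_rows": df.shape[0],
        "n_columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing_per_column": df.isna().sum().to_dict(),
    }


# Made-up two-row CSV standing in for a downloaded file, for illustration only.
csv_text = "date,pm25\n2020-01-01,8.1\n2020-01-02,\n"
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(first_look(df))
```

In a project, the DataFrame would come from a file in the Data directory instead of the inline text, and any file for which `file_size_mb(path)` exceeds `SIZE_LIMIT_MB` would go to cloud storage with a link in the repository.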