Working with Data Science
Metadata & Secondary Data
Secondary Data
When working with someone else's data, ask yourself these questions:
- Where does this data come from?
- Who created it?
- What are the rights and usage terms?
- How recent is the data?
Consulting the metadata can be the best way to find additional information about data.
- Stanford Libraries: Basic Approach to Metadata. This simple guide to metadata includes an example case study of what metadata might look like for a project.
- UMN: Data Documentation. An excellent guide to data documentation and metadata.
File Naming & Data Sustainability
Developing a consistent file naming scheme is one of the most effective ways to maintain data quality and sustainability. When the project is passed to future researchers, they will find it easier to track the location of data.
- MIT: Organizing Your Files. Includes information about file version control, file renaming, and file naming conventions.
- UW Madison: File Naming and Versioning. Naming and versioning conventions from research data services at the University of Wisconsin Madison.
- Data Reuse Checklist. A guide on preparing your data for reuse.
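As an illustration, a naming scheme can be written down as a small function so that every file name is built the same way. The sketch below is in Python and assumes a hypothetical scheme of project, description, date, and version; the function name `build_filename`, its parameters, and the example project are illustrative and do not come from the linked guides.

```python
import re
from datetime import date


def build_filename(project: str, description: str, on: date, version: int, ext: str) -> str:
    """Assemble a name as project_description_YYYYMMDD_vNN.ext (hypothetical scheme)."""

    def slug(text: str) -> str:
        # Lowercase, then collapse anything that is not a letter or digit into a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    stamp = on.strftime("%Y%m%d")  # year-first dates sort chronologically
    extension = ext.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{stamp}_v{version:02d}.{extension}"


print(build_filename("Bird Survey", "Raw Counts", date(2019, 1, 9), 2, "CSV"))
# prints: bird-survey_raw-counts_20190109_v02.csv
```

Whatever scheme you choose, the point is that the parts always appear in the same order and format, so files sort predictably and a future researcher can read the name without opening the file.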
What is a Data Dictionary?
A Data Dictionary is a good resource for deciding how to organize your data and for making sure that the formatting is uniform.
What is important when putting together a Data Dictionary:
- List the data value you want (Example: Date)
- How do you want the data formatted?
- Write an example of what it should look like
- Do this for every value
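The steps above can also be sketched as a small data structure, which is handy if you want to keep the Data Dictionary next to your code. This is a minimal Python sketch under stated assumptions: the class name `FieldSpec`, the list `DATA_DICTIONARY`, and the specific formats chosen for each field are hypothetical and should be replaced with your own decisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str     # the data value you want
    fmt: str      # how you want the value formatted
    example: str  # what a correct value should look like


# Hypothetical entries; write one for every value in your dataset.
DATA_DICTIONARY = [
    FieldSpec("name", "lowercase, underscores instead of spaces", "city_name"),
    FieldSpec("date", "numerical, MM/DD/YYYY", "01/09/2019"),
    FieldSpec("lat", "decimal degrees, 4 decimal places", "-44.4617"),
    FieldSpec("long", "decimal degrees, 4 decimal places", "93.1827"),
]

# Print the dictionary as a simple aligned listing.
for spec in DATA_DICTIONARY:
    print(f"{spec.name:<6} {spec.fmt:<42} e.g. {spec.example}")
```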
This can help you organize/answer these questions:
- What data do I need/want?
- How do I want the data to look?
| Column Headers | Name | Date | Descriptions | Lat | Long |
| --- | --- | --- | --- | --- | --- |
| Notes on how the data should look | Uppercase, lowercase, no spaces, etc. | Numerical or Full Form | Do you need a field to record notes or comments? | How many decimal places? Should include positive and negative values | Separate Lat/Long into 2 separate columns |
| Example data | City Name or city_name or CITYNAME | 01/09/2019 or January 9, 2019 | | -44.4617 or -44.46 | 93.1827 or 93.18 |
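Once a format has been chosen for each column, incoming values can be converted to it automatically. The Python sketch below assumes one particular set of choices (lowercase names with underscores, numerical MM/DD/YYYY dates, four decimal places for coordinates); these choices and the function names are illustrative, not a recommendation from this guide. Parsing the month name with `%B` also assumes an English locale.

```python
from datetime import datetime


def normalize_name(raw: str) -> str:
    # "City Name" -> "city_name": trim, lowercase, spaces to underscores.
    return "_".join(raw.strip().lower().split())


def normalize_date(raw: str) -> str:
    # Accept either date style from the example row and return MM/DD/YYYY.
    for pattern in ("%m/%d/%Y", "%B %d, %Y"):
        try:
            return datetime.strptime(raw.strip(), pattern).strftime("%m/%d/%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def normalize_coordinate(raw: str, places: int = 4) -> str:
    # Keep the sign and fix the number of decimal places.
    return f"{float(raw):.{places}f}"


print(normalize_name("City Name"))
print(normalize_date("January 9, 2019"))
print(normalize_coordinate("-44.46"))
```

Note that a value such as CITYNAME can only be lowercased; the word boundary is already lost, which is one reason to settle the format before data entry begins.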
Example table with Address Information
The challenge with address information is that not all organizations or regions use the same format, so additional data cleaning will most likely be involved.
| Column Headers | Name | Date | Descriptions | Street Number | Street Name | City | State | Zip |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Notes on how the data should look | Uppercase, lowercase, no spaces, etc. | Numerical or Full Form | Do you need a field to record notes or comments? | Prefix number, house number, building number | Name of the street; decide whether to write abbreviations in full | Uppercase, lowercase, no spaces, etc. | Should this be abbreviated, capitalized, or all lowercase? | Decide whether to use the basic zip or the full postal zip |
| Example data | City Name or city_name or CITYNAME | 01/09/2019 or January 9, 2019 | | 1520 | St. Olaf Avenue | Northfield | MN | 55057 |
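To illustrate the kind of cleaning this involves, the Python sketch below splits a single-line address into the columns of the table above. It assumes a US-style layout of "number street, city, state zip" with a two-letter state and a 5-digit zip that may carry a 4-digit extension; the pattern and the function name `split_address` are hypothetical, and addresses in any other layout are flagged for manual cleaning rather than guessed at.

```python
import re

# Assumed layout: "<number> <street>, <city>, <ST> <zip>"
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<street_number>\d+)\s+(?P<street_name>[^,]+?)\s*,\s*"
    r"(?P<city>[^,]+?)\s*,\s*(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def split_address(raw: str) -> dict:
    match = ADDRESS_PATTERN.match(raw)
    if match is None:
        # Do not guess: leave unusual formats for a person to review.
        raise ValueError(f"address needs manual cleaning: {raw!r}")
    fields = match.groupdict()
    fields["state"] = fields["state"].upper()  # store states as uppercase abbreviations
    return fields


print(split_address("1520 St. Olaf Avenue, Northfield, mn 55057"))
```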
- Last Updated: Dec 4, 2024 8:30 AM
- URL: https://libraryguides.stolaf.edu/datascience