Working with Data Science

Librarian

Per the Puffin
they/them
Contact:
302 Rolvaag Memorial Library
1510 St. Olaf Avenue
St. Olaf College
Northfield, MN 55057

Metadata & Secondary Data

Secondary Data

When working with someone else's data, ask yourself these questions:

  • Where does this data come from?
  • Who created it?
  • What are the rights and usage terms?
  • How recent is the data?

Consulting the metadata can be the best way to find additional information about data.

File Naming & Data Sustainability

Developing a file naming schema is one of the most effective ways to maintain data quality and sustainability. As the project is passed to future researchers, consistent file names will make it easier to track the location of data.
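As a minimal sketch of what a file naming schema could look like in practice, the Python function below assembles names from a project, a short description, a date, and a version number. The schema itself (project_description_YYYYMMDD_v01.ext) and the function name build_filename are assumptions made for illustration, not a standard; adapt the parts and their order to your own project.

```python
import re
from datetime import date


def build_filename(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Build a name like 'project_description_YYYYMMDD_v01.ext' (hypothetical schema)."""

    def slug(text: str) -> str:
        # Lowercase, turn each run of non-alphanumeric characters into one hyphen,
        # and trim hyphens left at either end.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    extension = ext.lstrip(".").lower()
    # Zero-padded dates and versions keep files in order when sorted by name.
    return f"{slug(project)}_{slug(description)}_{day:%Y%m%d}_v{version:02d}.{extension}"


print(build_filename("River Survey", "Site Locations", date(2019, 1, 9), 1, "csv"))
# river-survey_site-locations_20190109_v01.csv
```

Generating names from a function rather than typing them by hand is one way to keep the schema consistent as more people contribute files.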

What is a Data Dictionary?

A Data Dictionary is a good resource for deciding how to organize your data and for making sure that the formatting is uniform.

What is important when putting together a Data Dictionary:

  • List each data value you want to record (Example: Date)
  • Decide how you want the data formatted
  • Write an example of what it should look like
  • Do this for every value

This can help you answer these questions:

  • What data do I need/want?
  • How do I want the data to look?
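One possible way to keep a Data Dictionary alongside the data is as a small structure in code. The sketch below uses the column headers and example values from the example table on this page; the specific format choices written in each entry (lowercase with underscores, MM/DD/YYYY, 4 decimal places) and the helper name missing_columns are assumptions for illustration only.

```python
# Each entry records one column: its name, the agreed format, and an example value.
# The format descriptions are illustrative choices, not requirements.
DATA_DICTIONARY = [
    {"column": "name", "format": "lowercase, underscores instead of spaces", "example": "city_name"},
    {"column": "date", "format": "numerical, MM/DD/YYYY", "example": "01/09/2019"},
    {"column": "descriptions", "format": "free text for notes or comments", "example": ""},
    {"column": "lat", "format": "signed decimal degrees, 4 decimal places", "example": "-44.4617"},
    {"column": "long", "format": "signed decimal degrees, 4 decimal places", "example": "93.1827"},
]


def missing_columns(header, dictionary=DATA_DICTIONARY):
    """Return the dictionary columns that are absent from a dataset's header row."""
    present = {h.strip().lower() for h in header}
    return [entry["column"] for entry in dictionary if entry["column"] not in present]


print(missing_columns(["Name", "Date", "Lat"]))
# ['descriptions', 'long']
```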

| Column Header | Notes on how the data should look | Example data |
| --- | --- | --- |
| Name | Uppercase, lowercase, no spaces, etc. | City Name; city_name; CITYNAME |
| Date | Numerical or Full Form | 01/09/2019; January 9, 2019 |
| Descriptions | Do you need a field to record notes or comments? | |
| Lat | How many decimal places? Should include positive and negative values. Separate Lat/Long into 2 separate columns. | -44.4617; -44.46 |
| Long | Same considerations as Lat | 93.1827; 93.18 |
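Once the format for each column is decided, incoming values can be brought into line with it. The following is a rough sketch, assuming the project settled on lowercase names with underscores, MM/DD/YYYY dates, and coordinates rounded to 2 decimal places; those choices and the function names are hypothetical. Parsing month names such as "January" also assumes Python's default English locale.

```python
from datetime import datetime

# The two date styles shown in the example data: 01/09/2019 and January 9, 2019.
DATE_FORMATS = ("%m/%d/%Y", "%B %d, %Y")


def normalize_name(text: str) -> str:
    """Lowercase the name and join its words with underscores."""
    return "_".join(text.split()).lower()


def normalize_date(text: str) -> str:
    """Return the date as MM/DD/YYYY, accepting either style in DATE_FORMATS."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # try the next known style
    raise ValueError(f"Unrecognized date: {text!r}")


def normalize_coordinate(text: str, limit: float, places: int = 2) -> float:
    """Parse a signed coordinate, check its range, and round it.

    Use limit=90 for latitude and limit=180 for longitude.
    """
    value = float(text)
    if abs(value) > limit:
        raise ValueError(f"Coordinate {value} is outside +/-{limit}")
    return round(value, places)


print(normalize_name("City Name"))            # city_name
print(normalize_date("January 9, 2019"))      # 01/09/2019
print(normalize_coordinate("-44.4617", 90))   # -44.46
print(normalize_coordinate("93.1827", 180))   # 93.18
```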

 

Example table with Address Information

The challenge with address information is that not all organizations or regions use the same format, so additional data cleaning will most likely be involved.

| Column Header | Notes on how the data should look | Example data |
| --- | --- | --- |
| Name | Uppercase, lowercase, no spaces, etc. | City Name; city_name; CITYNAME |
| Date | Numerical or Full Form | 01/09/2019; January 9, 2019 |
| Descriptions | Do you need a field to record notes or comments? | |
| Street Number | Prefix number, house number, building number | 1520 |
| Street Name | Name of the street; decide whether to write abbreviations in full | St. Olaf Avenue |
| City | Uppercase, lowercase, no spaces, etc. | Northfield |
| State | Should this be abbreviated, capitalized, or all lowercase? | MN |
| Zip | Decide whether to use the basic zip or the extended postal zip (ZIP+4) | 55057 |
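As an illustration of the kind of cleaning address data may need, the sketch below splits a street line into number and name and checks the zip format. It assumes US-style addresses where the street number comes first, an uppercase two-letter state, and a zip that is either 5 digits or ZIP+4; the function names are hypothetical, and real address data will usually need more cases than this.

```python
import re

# Accepts a basic 5-digit zip (55057) or an extended ZIP+4 (55057-1234).
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def split_street(address: str):
    """Split '1520 St. Olaf Avenue' into ('1520', 'St. Olaf Avenue').

    If the line does not start with a number, the number part is left empty.
    """
    match = re.match(r"\s*(\d+[A-Za-z]?)\s+(.+?)\s*$", address)
    if match is None:
        return "", address.strip()
    return match.group(1), match.group(2)


def clean_address(street: str, city: str, state: str, zip_code: str) -> dict:
    """Return one address as separate, consistently formatted columns."""
    number, name = split_street(street)
    zip_code = zip_code.strip()
    if ZIP_PATTERN.fullmatch(zip_code) is None:
        raise ValueError(f"Unexpected zip format: {zip_code!r}")
    return {
        "street_number": number,
        "street_name": name,
        "city": city.strip(),
        "state": state.strip().upper(),
        "zip": zip_code,
    }


print(clean_address("1520 St. Olaf Avenue", " Northfield", "mn", "55057"))
```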