Step 4: Data Processing

Reverse coding scales

Use reverse coding when a scale contains several questions and some of them are worded in the opposite direction, so that a participant who answers consistently would give answers at opposite ends of the response scale.

For example, suppose one of your scale questions is "Do you feel positively about your career prospects?" and another is "Do you feel negatively about your career prospects?". For both questions, participants answer on a Likert scale from 1 (strongly agree) to 5 (strongly disagree). If someone answers "1" to the first question and is answering consistently, we would expect them to answer "5" to the second question, but these scores would mathematically cancel one another out in our analyses. So we systematically recode people's answers to the second question: "1" becomes "5", "2" becomes "4", "3" stays the same, "4" becomes "2", and "5" becomes "1". After recoding, consistent answers point in the same direction, and we can later check the internal reliability of our scale (next step).
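The recoding rule above can be sketched in R as follows. The data frame and column names (survey, career_pos, career_neg) are made up for illustration; swap in your own. The trick assumes every item uses the same 1 to 5 response scale.

```r
# Hypothetical responses on a 1-5 Likert scale (column names are made up)
survey <- data.frame(
  career_pos = c(1, 2, 3, 4, 5),  # "Do you feel positively ...?"
  career_neg = c(5, 4, 3, 2, 1)   # "Do you feel negatively ...?"
)

# Subtracting each answer from (max + min) flips the order of the scale:
# 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1
scale_min <- 1
scale_max <- 5
survey$career_neg_rev <- (scale_max + scale_min) - survey$career_neg

survey
```

Writing the result to a new column (career_neg_rev) rather than overwriting the original keeps the raw answers available if you need to double-check the recoding.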

Checking internal reliability

Cronbach's alpha measures a scale's internal reliability, that is, how closely related the items in a scale are. It does this by measuring how well each participant's answers to the scale items correlate with one another. So it's very important that you reverse-code any scale items that need it before performing this analysis, and that you use the reverse-coded columns in the code, not the original columns.
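As a minimal sketch, Cronbach's alpha can be computed from its standard formula using only base R. The data and the helper name cronbach_alpha are hypothetical. Your course materials may use a package function instead (for example alpha() from the psych package), in which case follow those.

```r
# Hypothetical scale with three items, already reverse-coded where needed
items <- data.frame(
  item1     = c(1, 2, 3, 4, 5, 2),
  item2     = c(2, 2, 3, 5, 4, 1),
  item3_rev = c(1, 3, 3, 4, 5, 2)
)

# Standard formula:
# alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
cronbach_alpha <- function(items) {
  items <- na.omit(items)            # drop participants with missing answers
  k <- ncol(items)                   # number of items in the scale
  item_vars <- sapply(items, var)    # variance of each item
  total_var <- var(rowSums(items))   # variance of participants' total scores
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

alpha_value <- cronbach_alpha(items)
round(alpha_value, 2)
```

Note that the function is given only the scale's item columns, with the reverse-coded version (item3_rev) standing in for the original item.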

Reporting in APA format:

The [name of scale] was [not] found to be [highly] reliable (x items; α = ##).

Creating a new column with a calculation

If you've used a Likert scale with multiple questions as the operationalization of one of the concepts in your study, or used a "quiz" where you need an accuracy score, you will likely need to summarize that data into a single statistic using some predetermined formula. Remember: R is basically an overpowered calculator. You can do adding (+), subtracting (-), multiplying (*), dividing (/), exponents (base^exponent), and square roots (sqrt(base)), amongst others. We use these operations to create new columns that contain the data we were shooting for, as we established in the Set Goals exercise.

If you used a Likert scale:
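One common way to summarize a multi-item Likert scale is to average each participant's answers across the items. The sketch below assumes that is your predetermined formula; the data frame and column names are hypothetical, and a sum score is shown as an alternative.

```r
# Hypothetical scale data; q3_rev is the reverse-coded version of item 3
scale_data <- data.frame(
  participant = c("p1", "p2", "p3"),
  q1     = c(4, 2, 5),
  q2     = c(5, 1, 4),
  q3_rev = c(4, 2, NA)
)

# List the item columns once so every calculation uses the same set
scale_items <- c("q1", "q2", "q3_rev")

# Mean score per participant; na.rm = TRUE averages over the answered items
scale_data$scale_mean <- rowMeans(scale_data[, scale_items], na.rm = TRUE)

# Sum score per participant; stays NA if any item is missing
scale_data$scale_sum <- rowSums(scale_data[, scale_items])

scale_data
```

Whether to average over the answered items or to treat participants with missing answers as missing is a decision to make up front, ideally the one you planned in your goals.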

If you used the Qualtrics Drag n Drop function for a memory test, this is how you parse the two columns with the list of answers that were dropped into each box into usable data:
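Here is a rough sketch of one way to do this. It assumes each box's column holds a comma-separated list of the items dropped into it, which may not match your export exactly, so check your own file first. The column names, item names, and answer key are all hypothetical.

```r
# Hypothetical export: one column per box, each holding a comma-separated
# list of the items a participant dropped into that box
memory <- data.frame(
  participant = c("p1", "p2"),
  box_seen    = c("apple,chair,river", "apple,cloud"),
  box_unseen  = c("cloud,table", "chair,river,table"),
  stringsAsFactors = FALSE
)

# Hypothetical answer key: the items that truly belong in the "seen" box
seen_key  <- c("apple", "chair", "table")
all_items <- c("apple", "chair", "river", "cloud", "table")

# Split one cell into a vector of individual items, trimming stray spaces
split_items <- function(cell) trimws(strsplit(cell, ",", fixed = TRUE)[[1]])

# For each item, make a 1/0 column: was it dropped into the correct box?
for (item in all_items) {
  correct_box <- if (item %in% seen_key) memory$box_seen else memory$box_unseen
  memory[[paste0(item, "_correct")]] <- as.integer(
    sapply(correct_box, function(cell) item %in% split_items(cell))
  )
}

memory
```

An item that was never dropped into either box is scored 0 here, because it does not appear in its correct box.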

Once you've completed this process, you can make an accuracy score column.

If you structured your memory test as a quiz, you must first make columns that reflect whether each answer was correct or incorrect in order to score accuracy:
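As a minimal sketch, you can compare each answer column against an answer key and store the result as 1 (correct) or 0 (incorrect). The quiz data, column names, and answer key below are hypothetical, and the sketch assumes skipped questions should count as incorrect.

```r
# Hypothetical quiz responses, one column per question
quiz <- data.frame(
  participant = c("p1", "p2", "p3"),
  q1 = c("Apple", "apple ", "pear"),
  q2 = c("4", "5", NA),
  stringsAsFactors = FALSE
)

# Hypothetical answer key, named by question column
answer_key <- c(q1 = "apple", q2 = "4")

for (q in names(answer_key)) {
  # Ignore capitalization and stray spaces when comparing typed answers
  cleaned <- tolower(trimws(quiz[[q]]))
  is_correct <- cleaned == answer_key[[q]]
  is_correct[is.na(is_correct)] <- FALSE   # skipped answers count as incorrect
  quiz[[paste0(q, "_correct")]] <- as.integer(is_correct)
}

quiz
```

Storing correctness as 1/0 rather than TRUE/FALSE makes the columns easy to add up later.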

Once you've completed this process, you can make an accuracy score column.

Creating an Accuracy Score Column
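Once every answer has a 1/0 "correct" column, one simple accuracy score is the proportion of items answered correctly. The sketch below assumes that formula and assumes your correctness columns all end in "_correct"; both the data and the naming pattern are hypothetical.

```r
# Hypothetical scored data: 1 = correct, 0 = incorrect
scored <- data.frame(
  participant = c("p1", "p2", "p3"),
  q1_correct = c(1, 1, 0),
  q2_correct = c(1, 0, 0),
  q3_correct = c(1, 1, 0),
  q4_correct = c(0, 1, 0)
)

# Find every column whose name ends in "_correct"
correct_cols <- grep("_correct$", names(scored), value = TRUE)

# Number correct per participant, then proportion correct
scored$n_correct <- rowSums(scored[, correct_cols])
scored$accuracy  <- scored$n_correct / length(correct_cols)

scored
```

Multiply accuracy by 100 if you would rather report a percentage.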

Creating your Final Analysis Dataframe

Now that you've created all the necessary columns somewhere, let's collect everything into a single, simple, and streamlined data frame with one column per variable, like we planned for in the Setting Goals stage. Rule of thumb: if you're carrying raw data over, take categorical variables from the choice dataframe and continuous variables from the numeric dataframe.
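A minimal sketch of this step, assuming both dataframes share a participant identifier column. The names choice_data, numeric_data, id, and the variable columns are all hypothetical stand-ins for your own.

```r
# Hypothetical "choice" dataframe: answers stored as text labels
choice_data <- data.frame(
  id        = c("p1", "p2", "p3"),
  condition = c("control", "treatment", "control"),
  stringsAsFactors = FALSE
)

# Hypothetical "numeric" dataframe: answers stored as numbers, plus the
# score columns created in the earlier steps
numeric_data <- data.frame(
  id         = c("p1", "p2", "p3"),
  age        = c(19, 22, 20),
  scale_mean = c(4.33, 1.67, 4.50),
  accuracy   = c(0.75, 0.75, 0.00),
  stringsAsFactors = FALSE
)

# Keep only the columns we need from each source and match rows by id
analysis_data <- merge(
  choice_data[, c("id", "condition")],
  numeric_data[, c("id", "age", "scale_mean", "accuracy")],
  by = "id"
)

# Store categorical variables as factors so R treats them as groups
analysis_data$condition <- factor(analysis_data$condition)

str(analysis_data)
```

Matching on an identifier with merge() is safer than pasting columns side by side, because it still lines participants up correctly if the two dataframes are sorted differently.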

Now our data is ready for analysis. Let's start with the descriptives.