From the Not Just Numbers blog:
Although we might do our best to use data validation to ensure text is correctly entered into a spreadsheet, we very often need to deal with data from other sources that might not be well controlled.
Often one of the biggest issues in dealing with data to make it useable is tidying up text, so that names, departments, addresses, etc.are entered consistently.
Once this tidying up exercise is done, these text fields become useful for filtering the data using autofilter, or pivot tables – or for looking up information from other lists.
A simple approach to take is to work on creating a formula that deals with most of the problems, then copying this down alongside the text list. Then, when we are happy, we can copy this column and paste special (as text) where we want the text to be – if this is a one-off exercise – alternatively we can use the calculated column to drive the reports if it needs to handle new data being added in future.
Excel provides a number of tools that can help to clean up the data.
Removing extra spaces
One of the particularly sticky problems with entered text is extra spaces – particularly because they are so hard to spot. Excel provides a nice simple solution to this, the TRIM function.
=TRIM(text) strips all spaces out of a text string, except single spaces between words.
text can be actual text which must be enclosed in quotation marks (i.e.”text”) or, as we would use for this purpose, a cell reference containing the text (e.g. A1).
That means that it will remove any spaces at the beginning or end of the text, as well as ensuring that any spaces between words consist of only one space. If they include more than one space, the extra spaces will be removed.
Correcting the case
Particularly where data is to be presented in some way, one big concern can be what case the text is using. This may be all on capitals (rarely looks good on a report), or worse – it might be an inconsistent mix of capitals and lower case.
Excel has three functions to handle this.
=LOWER(text) presents the text entirely in lower case, irrespective of its original case.
=UPPER(text) presents the text entirely in upper case, irrespective of its original case.
=PROPER(text) presents the text in lower case with a capital letter at the start of each word, again irrespective of its original case.
Comparing to valid data
Where the above functions might remove some relatively standard problems, we will often still have to do some editing to completely tidy the list. The time spent doing this can be considerably reduced if we can get Excel to tell us which ones we need to correct.
One of the simplest ways to do this is to use COUNTIF to compare the text to a list of valid options (almost retrospective data validation).
COUNTIF counts the number of instances in a range that meet certain criteria.e.g.
=COUNTIF(Sheet2!$A:$A,B1) counts the number of cells in column A of sheet 2, that contain the value (or text) contained in cell B1.
If column A of Sheet2 contained our list of valid values and column B of the current worksheet contained the text we wanted to correct, then all of the invalid entries will return a 0, which we can then filter the list by to show all of the rows that need correcting.
Happy tidying!
If you enjoyed this post, go to the top of the blog, where you can subscribe for regular updates and get your free report “The 5 Excel features that you NEED to know”.