Microsoft Office Excel 2007 Performing Magic with Array Formulas

The previous chapter provided an introduction to arrays and array formulas, and also presented some basic examples to whet your appetite. This chapter continues the saga and provides many useful examples that further demonstrate the power of this feature.
I selected the examples in this chapter to provide a good assortment of the various uses for array formulas. Most can be used as-is. You will, of course, need to adjust the range names or references used. Also, you can modify many of the examples easily to work in a slightly different manner.

On the CD 
The examples in this section are demonstrated in workbook named  single-cell array formulas.xlsx, which is available on the companion CD-ROM.

Working with Single-Cell Array Formulas
As I describe in the previous chapter, you enter single-cell array formulas into a single cell (not into a range of cells). These array formulas work with arrays contained in a range or that exist in memory. This section provides some additional examples of such array formulas.
Summing a Range That Contains Errors
You've probably discovered that Excel's SUM function doesn't work if you attempt to sum a range that contains one or more error values (such as #DIV/0! or #N/A). The SUM formula in cell C11 returns an error value because the range that it sums (C4:C10, named Data) contains errors.

The following array formula returns a sum of the values in a range named Data, even if the range contains error values:

=SUM(IFERROR(Data,""))

About the Examples in This Chapter
This chapter contains many examples of array formulas. Keep in mind that you press Ctrl+Shift+Enter to enter an array formula. Excel places curly brackets around the formula to remind you that it's an array formula. The array formula examples shown here are surrounded by curly brackets, but you should not enter the brackets because Excel will do that for you when the formula is entered.
The preceding function uses the IFERROR function, which is introduced in Excel 2007. Following is a formula that's compatible with earlier versions:
{=SUM(IF(ISERROR(Data),"",Data))}
This formula works by creating a new array (in memory, not in a range). This array contains the original values but without the errors. The IFERROR function effectively filters out error values by replacing them with an empty string. The SUM function treats strings as zeros, effectively ignoring those entries in the array. This technique also works with other functions, such as MIN and MAX.
In this example, the SUM function operates on this array:
{8, 20, 12, "", "", 5, 10}

Note: 
You may want to use a function other than ISERROR. The ISERROR function returns TRUE for any error value: #N/A, #VALUE!, #REF!, #DIV/0!, #NUM!, #NAME?, or #NULL!. The ISERR function returns TRUE for any error except #N/A. The ISNA function returns TRUE only if the cell contains #N/A.

Counting the Number of Error Values in a Range
The following array formula is similar to the previous example, but it returns a count of the number of error values in a range named Data:
 
{=SUM(IF(ISERROR(Data),1,0))}
This formula creates an array that consists of 1s (if the corresponding cell contains an error) and 0s (if the corresponding cell does not contain an error value).
You can simplify the formula a bit by removing the third argument for the IF function. If this argument is not specified, the IF function returns FALSE if the condition is not satisfied (that is, the cell does not contain an error value). In this context, Excel treats FALSE as a 0 value. The array formula shown here performs exactly like the previous formula, but it doesn't use the third argument for the IF function:
{=SUM(IF(ISERROR(Data),1))}
Actually, you can simplify the formula even more:
{=SUM(ISERROR(Data)*1)}
This version of the formula relies on the fact that
TRUE * 1 = 1
and
FALSE * 1 = 0
Summing Based on a Condition
Often, you need to sum values based on one or more conditions. The array formula that follows, for example, returns the sum of the positive values (it excludes negative values) in a range named Data:
 
{=SUM(IF(Data>0,Data))}
The IF function creates a new array that consists only of positive values and False values. This array is passed to the SUM function, which ignores the False values and returns the sum of the positive values. The Data range can consist of any number of rows and columns.
You can also use Excel's SUMIF function for this example. The following formula, which is not an array formula, returns the same result as the previous array formula:
=SUMIF(Data,">0")
For multiple conditions, the non-array solution uses the new SUMIFS function. For example, if you want to sum only values that are greater than 0 and less than or equal to 5, you can use this non-array formula:
=SUMIFS(Data,Data,"<=5",Data,">0")
Following is an array formula that performs the same calculation and is compatible with earlier versions of Excel:
{=SUM((Data>0)*(Data<=5)*Data)}
This formula calculates three arrays:
  • (Data>0)
  • (Data<=5)
  • Data
The formula then multiplies the three arrays together and calculates the sum of the products. The first two arrays consist of TRUE (1) or FALSE (0) values.
This formula also has a limitation: It will return an error if the Data range contains one or more non-numeric cells.

>
Caution:
 Contrary to what you might expect, you cannot use the AND function in an array formula. The following array formula, while quite logical, doesn't return the correct result:

{=SUM(IF(AND(Data>0,Data<=5),Data))}
You can also write an array formula that combines criteria using an OR condition. For example, to sum the values that are less than 0 or greater than 5, use the following array formula:
{=SUM(IF((Data<0)+(Data>5),Data))}
To understand how this formula works, keep in mind that the argument for the IF function returns TRUE if either Data<0 or Data>5. Otherwise, the argument returns FALSE. If it's FALSE, the item of the Data range is not included in the sum.


Caution 
As with the AND function, you cannot use the OR function in an array formula. The following formula, for example, does not return the correct result:
{=SUM(IF(OR(Data<0,Data>5),Data))}

For an explanation of the workarounds required for using logical functions in an array formula, see the sidebar, "Illogical Behavior from Logical Functions."
Illogical Behavior from Logical Functions
Excel's AND and OR functions are logical functions that return TRUE or FALSE. Unfortunately, these functions do not perform as expected when used in an array formula.
As shown here, columns A and B contain logical values. The AND function returns TRUE if all its arguments are TRUE. Column C contains non-array formulas that work as expected. For example, cell C3 contains the following function:
=AND(A3,B3)


On the CD 
This workbook, named  logical functions.xlsx, is available on the companion CD-ROM.

The range D3:D6 contains this array formula:
{=AND(A3:A6,B3:B6)}
You might expect this array formula to return the following array:
{TRUE,FALSE,FALSE,FALSE}
Rather, it returns only a single item: FALSE. In fact, both the AND function and the OR function always return a single result (never an array). Even when using array constants, the AND function still returns only a single value. For example, this array formula does not return an array:
{=AND({TRUE,TRUE,FALSE,FALSE},{TRUE,FALSE,TRUE,FALSE})}
I don't know whether this is by design or whether it's a bug. In any case, it certainly is inconsistent with how the other functions operate.
Column E contains another array formula, which follows, that returns an array of 0s and 1s. These 0s and 1s correspond to FALSE and TRUE, respectively.
{=A3:A6*B3:B6}
In array formulas, you must use this syntax in place of the AND function.
The following array formula, which uses the OR function, does not return an array (as you might expect):
=OR(A3:A6,B3:B6)
Rather, you can use a formula such as the following, which does return an array comprising logical OR, using the corresponding elements in the ranges:
{=A3:A6+B3:B6}
Summing the n Largest Values in a Range
The following array formula returns the sum of the ten largest values in a range named Data:
{=SUM(LARGE(Data,ROW(INDIRECT("1:10"))))}
The LARGE function is evaluated ten times, each time with a different second argument (1, 2, 3, and so on up to 10). The results of these calculations are stored in a new array, and that array is used as the argument for the SUM function.
To sum a different number of values, replace the 10 in the argument for the INDIRECT function with another value. To sum the n smallest values in a range, use the SMALL function instead of the LARGE function.

<><><>
Caution 
Both LARGE and SMALL return a #NUM! error if you supply a second argument that's greater than the number of cells in the first argument.

Computing an Average That Excludes Zeros
shows a simple worksheet that calculates average sales. Range B5:B12 is named Data. The formula in cell B14 is as follows:
=AVERAGE(Data)

This formula, of course, calculates the average of the values in the range named Data. Two of the sales staff had the week off, however, so this average doesn't accurately describe the average sales per representative.


Note: 
The AVERAGE function ignores blank cells but does not ignore cells that contain 0.

The following array formula returns the average of the range but excludes the cells containing 0:
{=AVERAGE(IF(Data<>0,Data))}

This formula creates a new array that consists of the nonzero values in the range and the value FALSE for any cells that do contain zero. The AVERAGE function then uses this new array, properly ignoring the FALSE values, as its argument. You can get the same result with a regular (non-array) formula:
=SUM(Data)/COUNTIF(Data,"<>0")
This formula uses the COUNTIF function to count the number of nonzero values in the range. This value is divided into the sum of the values.
Determining Whether a Particular Value Appears in a Range
To determine whether a particular value appears in a range of cells, you can choose the Home Editing Find & Select Find command and do a search of the worksheet. You also can make this determination by using an array formula.
shows a worksheet with a list of names in A5:E24 (named NameList). An array formula in cell D3 checks the name entered into cell C3 (named TheName). If the name exists in the list of names, the formula displays the text Found. Otherwise, it displays Not Found.

The array formula in cell D3 is
{=IF(OR(TheName=NameList),"Found","Not Found")}
This formula compares TheName to each cell in the NameList range. It builds a new array that consists of logical TRUE or FALSE values. The OR function returns TRUE if any one of the values in the new array is TRUE. The IF function uses this result to determine which message to display.
A simpler form of this formula follows. This formula displays TRUE if the name is found and returns FALSE otherwise:
{=OR(TheName=NameList)}
Another approach uses the COUNTIF function in a non-array formula:
=IF(COUNTIF(NameList,TheName)>0,"Found","Not Found")
Counting the Number of Differences in Two Ranges
The following array formula compares the corresponding values in two ranges (named MyData and YourData) and returns the number of differences in the two ranges. If the contents of the two ranges are identical, the formula returns 0 (no differences):
{=SUM(IF(MyData=YourData,0,1))}
The two ranges must be the same size and of the same dimensions.
This formula works by creating a new array of the same size as the ranges being compared. The IF function fills this new array with 0s and 1s (1 if a difference is found; 0 if the corresponding cells are the same). The SUM function then returns the sum of the values in the array.
The following formula, which is simpler, is another way of calculating the same result:
{=SUM(1*(MyData<>YourData))}
This version of the formula relies on the fact that
TRUE * 1 = 1
and
FALSE * 1 = 0

Returning the Location of the Maximum Value in a Range
The following array formula returns the row number of the maximum value in a single-column range named Data:
 
{=MIN(IF(Data=MAX(Data),ROW(Data), ""))}
The IF function creates a new array that corresponds to the Data range. If the corresponding cell contains the maximum value in Data, the array contains the row number; otherwise, it contains an empty string. The MIN function uses this new array as its second argument and returns the smallest value, which corresponds to the row number of the maximum value in Data.
 
If the Data range contains more than one cell that has the maximum value, the row of the first maximum cell is returned.
The following array formula is similar to the previous one, but it returns the actual cell address of the maximum value in the Data range. It uses the ADDRESS function, which takes two arguments: a row number and a column number.
{=ADDRESS(MIN(IF(Data=MAX(Data),ROW(Data), "")),COLUMN(Data))}
If Data spans more than one column, the following array formula will return the correct address.
{=ADDRESS(MIN(IF(Data=MAX(Data),ROW(Data),"")),
MIN(IF(Data=MAX(Data),COLUMN(Data),"")))

Finding the Row of a Value's nth Occurrence in a Range
The following array formula returns the row number within a single-column range named Data that contains the nth occurrence of the value in a cell named Value:
 
{=SMALL(IF(Data=Value,ROW(Data), ""),n)}
The IF function creates a new array that consists of the row number of values from the Data range that are equal to Value. Values from the Data range that are not equal to Value are replaced with an empty string. The SMALL function works on this new array and returns the nth smallest row number.
The formula returns #NUM! if the Value is not found or if n exceeds the number of the values in the range.
Returning the Longest Text in a Range
The following array formula displays the text string in a range (named Data) that has the most characters. If multiple cells contain the longest text string, the first cell is returned.
{=INDEX(Data,MATCH(MAX(LEN(Data)),LEN(Data),FALSE),1)}
This formula works with two arrays, both of which contain the length of each item in the Data range. The MAX function determines the largest value, which corresponds to the longest text item. The MATCH function calculates the offset of the cell that contains the maximum length. The INDEX function returns the contents of the cell containing the most characters. This function works only if the Data range consists of a single column.
Determining Whether a Range Contains Valid Values
You might have a list of items that you need to check against another list. For example, you might import a list of part numbers into a range named MyList, and you want to ensure that all the part numbers are valid. You can do this by comparing the items in the imported list with the items in a master list of part numbers (named Master).
The following array formula returns TRUE if every item in the range named MyList is found in the range named Master. Both of these ranges must consist of a single column, but they don't need to contain the same number of rows.
{=ISNA(MATCH(TRUE,ISNA(MATCH(MyList,Master,0)),0))}


The inside MATCH function compares each element of MyList against Master, returning an array of numbers and #NA! errors for those that don't match. That array is fed into ISNA, which converts it into an array of True and False values. The outer MATCH function attempts to find a True value in the array. If it finds a True, a number is returned, and the outer ISNA returns False. If no True values are found, ISNA returns True.
The array formula that follows returns the number of invalid items. In other words, it returns the number of items in MyList that do not appear in Master.
 
{=SUM(1*ISNA(MATCH(MyList,Master,0)))}
To return the first invalid item in MyList, use the following array formula:
{=INDEX(MyList,MATCH(TRUE,ISNA(MATCH(MyList,Master,0)),0))}

Summing the Digits of an Integer
I can't think of any practical application for the example in this section, but it's a good demonstration of the power of an array formula. The following array formula calculates the sum of the digits in a positive integer, which is stored in cell A1. For example, if cell A1 contains the value 409, the formula returns 13 (the sum of 4, 0, and 9).
{=SUM(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)*1)}
To understand how this formula works, start with the ROW function, shown here:
{=ROW(INDIRECT("1:"&LEN(A1)))}
This function returns an array of consecutive integers beginning with 1 and ending with the number of digits in the value in cell A1. For example, if cell A1 contains the value 409, the LEN function returns 3, and the array generated by the ROW functions is
{1,2,3}
This array is then used as the second argument for the MID function. The MID part of the formula, simplified a bit and expressed as values, is the following:
{=MID(409,{1,2,3},1)*1}
This function generates an array with three elements:
{4,0,9}
By simplifying again and adding the SUM function, the formula looks like this:
{=SUM({4,0,9})}
This produces the result of 13.

<><><>
Note 
The values in the array created by the MID function are multiplied by 1 because the MID function returns a string. Multiplying by 1 forces a numeric value result. Alternatively, you can use the VALUE function to force a numeric string to become a numeric value.

Notice that the formula does not work with a negative value because the negative sign is not a numeric value. The following formula solves this problem by using the ABS function to return the absolute value of the number. shows a worksheet that uses this formula in cell B4.
{=SUM(VALUE(MID(ABS(A4),ROW(INDIRECT("1:"&LEN(ABS(A4)))),1)))}

The formula was copied down to calculate the sum of the digits for other values in column A.
Summing Rounded Values
shows a simple worksheet that demonstrates a common spreadsheet problem: rounding errors. As you can see, the grand total in cell E7 appears to display an incorrect amount; that is, it's off by a penny. The values in column E use a number format that displays two decimal places. The actual values, however, consist of additional decimal places that do not display because of rounding (as a result of the number format). The net effect of these rounding errors is a seemingly incorrect total. The total, which is actually $168.320997, displays as $168.32.

The following array formula creates a new array that consists of values in column E, rounded to two decimal places:
=SUM(ROUND(E4:E6,2))
This formula returns $168.31.
You also can eliminate these types of rounding errors by using the ROUND function in the formula that calculates each row total in column E. This technique does not require an array formula.
Summing Every nth Value in a Range
Suppose you have a range of values and you want to compute the sum of every third value in the list-the first, the fourth, the seventh, and so on. One solution is to hard-code the cell addresses in a formula. A better solution, though, is to use an array formula.

The following array formula returns the sum of every nth value in the range:

{SUM(IF(MOD(ROW(INDIRECT("1:"&COUNT(Data)))-1,n)=0,Data,""))}
This formula generates an array of consecutive integers, and the MOD function uses this array as its first argument. The second argument for the MOD function is the value of n. The MOD function creates another array that consists of the remainders when each row number is divided by n. When the array item is 0 (that is, the row is evenly divisible by n), the corresponding item in the Data range will be included in the sum.
You'll find that this formula fails when n is 0 (that is, sums no items). The modified array formula that follows uses an IF function to handle this case:
{=IF(n=0,0,SUM(IF(MOD(ROW(INDIRECT("1:"&COUNT(data)))-1,n)=0,data,"")))}
This formula works only when the Data range consists of a single column of values. It does not work for a multicolumn range nor for a single row of values.
The preceding formula always includes the first item in the count. For example, when n is 5, the elements summed are 1, 6, 11, and 16. Following is a slightly modified version that begins summing with the nth element. When n is 5, the formula sums the following elements: 5, 10, and 15.
{=IF(n=0,0,SUM(IF(MOD(ROW(INDIRECT("1:"&COUNT(Data)))-n,n)=0,Data,"")))}
To make the formula work with a horizontal range, you need to transpose the array of integers generated by the ROW function. Excel's TRANSPOSE function is just the ticket. The modified array formula that follows works only with a horizontal Data range:
{=IF(n=0,0,SUM(IF(MOD(TRANSPOSE(ROW(INDIRECT
("1:"&COUNT(Data))))-1,n)=0,Data,"")))}
Removing Non-Numeric Characters from a String
The following array formula extracts a number from a string that contains text. For example, consider the string ABC145Z. The formula returns the numeric part-145.
{=MID(A1,MATCH(0,(ISERROR(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)
*1)*1),0),LEN(A1)-SUM((ISERROR(MID(A1,ROW
(INDIRECT("1:"&LEN(A1))),1)*1)*1)))}

Using Excel's Formula Evaluator
If you would like to better understand how some of these complex array formulas work, consider using a handy tool: the Formula Evaluator. Select the cell that contains the formula and then choose Formulas Formula Auditing Evaluate Formula. You'll see the Evaluate Formula dialog box shown in the accompanying figure. Click the Evaluate button repeatedly to see the intermediate results as the formula is being calculated. It's like watching a formula calculate in slow motion.

This formula works only with a single embedded number. For example, it fails with a string such as X45Z99 (returning 45Z9).
Determining the Closest Value in a Range
The array formula that follows returns the value in a range named Data that is closest to another value (named Target):
{=INDEX(Data,MATCH(SMALL(ABS(Target-Data),1),ABS(Target-Data),0))}
If two values in the Data range are equidistant from the Target value, the formula returns the first one in the list. shows an example of this formula. In this case, the Target value is 45. The array formula in cell D5 returns 48-the value closest to 45.

Returning the Last Value in a Column
Suppose you have a worksheet that you update frequently by adding new data to columns. You might need a way to reference the last value in column A (the value most recently entered). If column A contains no empty cells, the solution is relatively simple and doesn't require an array formula:
=OFFSET(A1,COUNTA(A:A)-1,0)
This formula uses the COUNTA function to count the number of nonempty cells in column A. This value (minus 1) is used as the second argument for the OFFSET function. For example, if the last value is in row 100, COUNTA returns 100. The OFFSET function returns the value in the cell 99 rows down from cell A1, in the same column.
If column A has one or more empty cells interspersed, which is frequently the case, the preceding formula won't work because the COUNTA function doesn't count the empty cells.
The following array formula returns the contents of the last nonempty cell in the first 500 rows of column A:
{=INDEX(A1:A500,MAX(ROW(A1:A500)*(A1:A500<>"")))}
You can, of course, modify the formula to work with a column other than column A. To use a different column, change the four column references from A to whatever column you need. If the last nonempty cell occurs in a row beyond row 500, you need to change the two instances of 500 to a larger number. The fewer rows referenced in the formula, the faster the calculation speed.

Caution:
 You cannot use this formula, as written, in the same column with which it's working. Attempting to do so generates a circular reference. You can, however, modify it. For example, to use the function in cell A1, change the references so they begin with row 2.



New:
In versions prior to Excel 2007, you could not use whole column or row references in an array formula, such as A:A or 3:3. In Excel 2007, you are allowed to use such references. The previous formula can be written as
=INDEX(A:A,MAX(ROW(A:A)*(A:A<>"")))
However, with the increase in the number of cells in a worksheet, using whole column or row references can significantly increase calculation time.

Returning the Last Value in a Row
The following array formula is similar to the previous formula, but it returns the last non- empty cell in the first 200 columns of a row (in this case, row 1):
{=INDEX(A1:GR1,MAX(COLUMN(A1:GR1)*(A1:GR1<>"")))}
Column GR is column 200. To use this formula for a different row, change the A1:GR1 reference to correspond to the range to check. A simpler (but slower) version of this formula that checks all 16,384 columns is
{=INDEX(1:1,MAX(COLUMN(1:1)*(1:1<>"")))}
Ranking Data with an Array Formula
Often, computing the rank orders for the values in a range of data is helpful. If you have a worksheet containing the annual sales figures for 20 salespeople, for example, you may want to know how each person ranks, from highest to lowest.
If you've used Excel's RANK function, you may have noticed that the ranks produced by this function don't handle ties the way that you may like. For example, if two values are tied for third place, the RANK function gives both of them a rank of 3. You may prefer a commonly used approach that assigns each an average (or midpoint) of the ranks-in other words, a rank of 3.5 for both values tied for third place.
shows a worksheet that uses two methods to rank a column of values (named Sales). The first method (column C) uses Excel's RANK function. Column D uses array formulas to compute the ranks.

The following is the array formula in cell D5:
=SUM(1*(B5<=Sales))-(SUM(1*(B5=Sales))-1)/2
This formula is copied to the cells below it.

Note:
Each ranking is computed with a separate array formula, not with an array formula entered into multiple cells.

Each array function works by computing the number of higher values and subtracting one half of the number of equal values minus 1.

Creating a Dynamic Crosstab Table
A crosstab table tabulates or summarizes data across two dimensions. Take a look at the data in this worksheet shows a simple expense account listing. Each item consists of the date, the expense category, and the amount spent. Each column of data is a named range, indicated in the first row.


Array formulas summarize this information into a handy table that shows the total expenses-by category-for each day. Cell F6 contains the following array formula, which is copied to the remaining 14 cells in the table:

=SUM(($E6=Date)*(F$5=Category)*Amount)
These array formulas display the totals for each day, by category.
The formula sums the values in the Amount range, but does so only if the row and column names in the summary table match the corresponding entries in the Date and Category ranges. It does so by multiplying two Boolean values by the Amount. If both Boolean values are True, the result is the Amount. If one or both of the Boolean values is False, the result is 0.
You can customize this technique to hold any number of different categories and any number of dates. You can eliminate the dates, in fact, and substitute people's names, departments, regions, and so on.

Note 
You also can use Excel's pivot table feature to summarize data in this way. However, pivot tables do not update automatically when the data changes, so the array formula method described here has at least one advantage.