Deep Dive into Array Basics Part 1

Arrays are among the most important and widely used data structures, and this guide covers the deeper aspects of the C# array data structure.

By Pavneet Singh

Aug 28, 2018 • 17 Minute Read

Introduction

Data management is the backbone of all applications. The performance of an application is significantly dependent upon the choice of data structure to access and manipulate data. Every data structure has its own implementation and implications.

In earlier times, CPUs were specifically optimized to work with instruction sets that operate on one-dimensional data, known as vector processing. Modern GPUs can be seen as a modified form of vector processor built to support rapid computation.

Arrays (also known as vectors) are among the most important and widely used data structures, and this guide covers the deeper aspects of the array data structure.

Introduction to Arrays

An array is defined as a collection of similar data with a fixed length, stored in a linear fashion. Every element in an array is accessed by an index (a numerical value), and the location of every element can be computed from its index with a simple mathematical operation.

In the example below:

book is an array
The chapters are the data stored inside the book array
Every chapter in the book array is accessible with an index

      char[] book = {'A','B','C','D'};

Array Declarations and Initialization

Declaration is the process of defining a placeholder name preceded by its type. Arrays are declared with square brackets as:

type[] name_of_placeholder;
char[] book;

Initialization allocates memory for the array with the help of the new keyword or an array initializer.

name_of_placeholder = new data_type[size_of_array];
book = new char[4];

Array Initializers: Arrays can also be declared using an initialization list {array elements separated by commas}, which is suitable when you already have the data while declaring the array reference. For example, each of the following alternatives creates an array of length 4 with the values A, B, C, D.

char[] array = new char[] {'A','B','C','D'};
char[] array = {'A','B','C','D'};
char[] array = new[] {'A','B','C','D'};

Implicitly typed array: var is a type placeholder (introduced in C# 3) whose actual type is obtained from the right side of the expression using type inference, which lets us omit the explicit data type. The above examples can be declared using var as:

var array = new char[4]; // creates array of length 4
var array = new char[4] {'A','B','C','D'};
var array = new char[] {'A','B','C','D'};
var array = new[] {'A','B','C','D' };

An array initializer cannot be used on its own with the var keyword, so var array = { 'A', 'B' , 'C', 'D' }; is invalid. The design team made this decision to avoid the side effects that a bare array initializer would have on the parser, such as parsing nested blocks of {}. The solution is to use the new keyword with var for implicitly typed arrays.

Memory Allocation

The main memory is divided into separate logical sections, and the heap is known as dynamic memory, meaning that memory requested at runtime is allocated on the heap.

Memory for an array is allocated contiguously. If you create an array of four char elements, the application allocates roughly:

On a 32-bit OS, a char takes 2 bytes and the reference variable takes 4 bytes, so (2 * 4) + 4 = 12 bytes
On a 64-bit OS, a char takes 2 bytes and the reference variable takes 8 bytes, so (2 * 4) + 8 = 16 bytes

This simplified count ignores the object header and length field that the runtime also stores with every array. The four elements may be stored at memory addresses 4000 to 4008 (4008 is exclusive), which looks like this:

index    Data      Memory addresses
        _____
0      |  A  |     4000  <=  name of the array
       |_____|               pointing to initial address
1      |  B  |     4002
       |_____|
2      |  C  |     4004
       |_____|
3      |  D  |     4006
       |_____|

This is a general scenario; the actual allocation might differ slightly according to the OS and runtime implementation.
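
The element storage described above can be checked with a minimal sketch. It assumes the Buffer.ByteLength method from the System namespace, which reports the number of bytes occupied by the elements of a primitive-type array. The class name MemoryDemo and the helper ElementBytes are illustrative names, not part of the original guide, and the result covers only the elements, not the reference or any object header.

```csharp
using System;

public static class MemoryDemo
{
    // Hypothetical helper: bytes used by the elements of a char array.
    public static int ElementBytes(char[] data)
    {
        return Buffer.ByteLength(data);
    }

    public static void Main(string[] args)
    {
        char[] book = { 'A', 'B', 'C', 'D' };
        Console.WriteLine(sizeof(char));        // 2
        Console.WriteLine(ElementBytes(book));  // 8, i.e. 2 bytes * 4 elements
    }
}
```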

Read and Write

The value of an array element is read by using the array name followed by the index within square brackets:

array_name[numeric_index]
e.g.        output
book[2]     C

The memory address of C can be computed using initial address of memory, index and size:

          initial address + (index * size of data type)
4000            + (2 * 2)
4000            +    4
=> 4004
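
The address calculation above can be sketched as a small helper. This is illustrative arithmetic only: the base address 4000 is the example value from the diagram, real addresses are managed by the runtime, and AddressOf is a hypothetical name introduced for this sketch.

```csharp
using System;

public static class AddressDemo
{
    // Illustrative arithmetic only: initial address + (index * size of data type).
    public static long AddressOf(long initialAddress, int index, int elementSize)
    {
        return initialAddress + (long)index * elementSize;
    }

    public static void Main(string[] args)
    {
        // 4000 is the example base address, not a real pointer.
        for (int i = 0; i < 4; i++)
        {
            Console.WriteLine("book[{0}] -> {1}", i, AddressOf(4000, i, sizeof(char)));
        }
        // Prints 4000, 4002, 4004 and 4006 for indexes 0 to 3.
    }
}
```

Because every element has the same size, the cost of this calculation does not depend on the index, which is why reading any element of an array takes constant time.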
    

To modify an array element, the assignment (=) operator is used along with the array name and index:

array_name[numeric_index] = new_element
book[0]  =  'P' // Replace A with P

The modified book array will be:

'P','B','C','D'

Important: There are two types of data in C#:

Value Type: Elements of a value type always have a fixed default value when initialized, e.g. the default value of an int element is 0 and of a bool element is false.
Reference Type: The default value of a reference type is null.
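
The default values described above can be observed with a short sketch. The class name DefaultValueDemo and the generic helper FirstElementOfNewArray are hypothetical names used only for this illustration.

```csharp
using System;

public static class DefaultValueDemo
{
    // Hypothetical helper: creates a one-element array and returns its untouched element.
    public static T FirstElementOfNewArray<T>()
    {
        T[] fresh = new T[1];
        return fresh[0];
    }

    public static void Main(string[] args)
    {
        int[] numbers = new int[2];      // value type elements default to 0
        bool[] flags = new bool[2];      // value type elements default to false
        string[] names = new string[2];  // reference type elements default to null

        Console.WriteLine(numbers[0]);        // 0
        Console.WriteLine(flags[0]);          // False
        Console.WriteLine(names[0] == null);  // True
    }
}
```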

Array fundamentals

Zero-based numbering: An array index always starts at 0 and the value of the last index is length_of_array - 1.
Fixed length: Once an array is created, its size cannot be altered. To store more values, a larger array has to be created and the values from the smaller array copied into it.
By default, every element in an array is initialized to the default value of its type.
Every array object is derived from the Array class and inherits its properties (Length, Rank) and methods (GetLowerBound, GetUpperBound).
An array can also represent matrix data as a two-dimensional, multi-dimensional, or jagged array. In a jagged array, the size of each row can be different and can be defined at run time as required.
The default maximum size of an array is two gigabytes (GB). In a 64-bit environment, the gcAllowVeryLargeObjects configuration element can be enabled to lift that size restriction, although an array is still limited to about four billion elements.
Arrays are not thread safe; multiple threads can modify the data at the same time, which can cause inconsistency issues.
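
Several of the fundamentals listed above can be seen in one minimal sketch. The helper Grow is a hypothetical name showing the copy-into-a-larger-array approach, and it assumes the new length is at least the length of the source array.

```csharp
using System;

public static class FundamentalsDemo
{
    // Hypothetical helper: returns a larger array holding the source elements.
    // Assumes newLength >= source.Length.
    public static char[] Grow(char[] source, int newLength)
    {
        char[] larger = new char[newLength];
        Array.Copy(source, larger, source.Length);
        return larger;
    }

    public static void Main(string[] args)
    {
        char[] book = { 'A', 'B', 'C', 'D' };
        Console.WriteLine(book.Length);            // 4
        Console.WriteLine(book.Rank);              // 1
        Console.WriteLine(book.GetLowerBound(0));  // 0
        Console.WriteLine(book.GetUpperBound(0));  // 3, i.e. Length - 1

        char[] bigger = Grow(book, 6);             // the original array is unchanged
        bigger[4] = 'E';
        Console.WriteLine(bigger.Length);          // 6

        int[,] matrix = new int[2, 3];             // two-dimensional array
        Console.WriteLine(matrix.Rank);            // 2

        int[][] jagged = new int[2][];             // rows are sized at run time
        jagged[0] = new int[1];
        jagged[1] = new int[3];
        Console.WriteLine(jagged[1].Length);       // 3
    }
}
```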

Array Class and Structure of an Array

Every array variable has the Array class as its base type and can access its properties and methods. In C or C++, a raw array does not carry its own length, whereas in C# being an object of a class allows an array to store additional information like Length and IsReadOnly, along with helpful methods like Clone and Equals.

Clone provides a shallow copy, which is sufficient for arrays of value types. For a jagged array or an array of reference types, the copy shares the inner arrays or objects with the source, so you need to copy each and every element from the source array into the new array. This is known as a deep copy.
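
The difference between a shallow and a deep copy can be sketched with a jagged array. DeepCopy is a hypothetical helper written for this illustration, and it assumes that none of the rows is null.

```csharp
using System;

public static class CopyDemo
{
    // Hypothetical helper: copies every row so that no inner array is shared.
    // Assumes the source contains no null rows.
    public static int[][] DeepCopy(int[][] source)
    {
        int[][] copy = new int[source.Length][];
        for (int i = 0; i < source.Length; i++)
        {
            copy[i] = (int[])source[i].Clone();
        }
        return copy;
    }

    public static void Main(string[] args)
    {
        int[][] original = { new[] { 1, 2 }, new[] { 3 } };

        int[][] shallow = (int[][])original.Clone();
        shallow[0][0] = 99;                  // writes into the row shared with original
        Console.WriteLine(original[0][0]);   // 99

        int[][] deep = DeepCopy(original);
        deep[0][0] = 7;                      // writes into a private row
        Console.WriteLine(original[0][0]);   // still 99
    }
}
```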

Array class: The Array class implements several interfaces to provide a fixed set of features for consistency across every supported framework and platform.
IList: Lets an array be used wherever a list-like collection is expected. To get a dynamic list with no fixed size, convert the array to a List<T> with the ToList method from System.Linq.

using System.Collections.Generic;
using System.Linq;

int[] intArr = new [] { 11, 2, 0, 14, 112 };
List<int> intList = intArr.ToList(); // copy the array elements into a List<int>

ICloneable: To support clone method.

          int[] arr = new int[2];
int[] temp = (int[])arr.Clone();
    

IStructuralEquatable: To compare array values index by index (StructuralComparisons is in the System.Collections namespace).

StructuralComparisons.StructuralEqualityComparer.Equals(new int[]{1,2}, new int[]{2,1}); // False

StructuralComparisons.StructuralEqualityComparer.Equals(new int[]{1,2}, new int[]{1,2}); // True

IStructuralComparable: To support ordering operations such as sorting, or to define a customized order.

StructuralComparisons.StructuralComparer.Compare(new int[]{1,2}, new int[]{2,1}); // -1, first comes before second

StructuralComparisons.StructuralComparer.Compare(new int[]{1,2}, new int[]{1,2}); // 0, both are equal

StructuralComparisons.StructuralComparer.Compare(new int[]{2,1}, new int[]{1,2}); // 1, first comes after second

Array Traversal

Traversal is the process of accessing each element of an array to perform a read or update operation. To print the book array, you can write:

Console.WriteLine("["+book[0]+","+book[1]+","+book[2]+","+book[3]+"]"); // direct access with concatenation
Console.WriteLine("[{0},{1},{2},{3}]",book[0],book[1],book[2],book[3]); // composite format string
Console.WriteLine("[{0}]", string.Join(",", book)); // join the elements with a comma as separator
Output:
[A,B,C,D] // for all 3 statements

As you can observe, the only dynamic part to access the array is the index value. So, we just need to execute the book[index_value] statement with an index value in an increasing order. This can be achieved by simply using for loop.

Loops allow us to execute the same set of instructions several times in a defined sequence. A simple for loop syntax is:

for(int i = initial_value ; condition ; increment/decrement){
  // statement
}

Below is a simple program to replace each value of the book array with a random letter:

Random rand = new Random();
string strA_Z = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
for(int i = 0 ; i < book.Length ; i++){
  // print the current value using a composite format string
  Console.Write("book[{0}] = {1}",i,book[i]);
  // random number between 0 - 25
  int randomIndex = rand.Next(strA_Z.Length);
  // replace element with a random letter
  book[i] = strA_Z[randomIndex];
  Console.WriteLine(" : book[{0}] = {1}",i,book[i]);
}
Output: It's random, so the output may differ
book[0] = A : book[0] = H
book[1] = B : book[1] = U
book[2] = C : book[2] = G
book[3] = D : book[3] = O

Note: A string can be read by index like an array because the string class has a built-in (read-only) indexer.

Use of Out, Ref, and Params Keywords

Arrays can be passed from one method to another as a parameter to process array data. When an array is passed as a plain parameter, the callee method can access or manipulate the array data but cannot make the caller's variable refer to a new array object.

Note: Caller is the method that calls another method - e.g. Main is a caller and ChangeReference is a callee method, invoked by Main method.

For example:

          static void Main(string[] args){
  char[] book = new[]{'A','B','C','D'};
  Console.WriteLine("[{0}]",string.Join(",",book)); // [A,B,C,D]
  ChangeReference(book); // no change after this executes
  Console.WriteLine("[{0}]",string.Join(",",book)); // [A,B,C,D]
}
static void ChangeReference(char[] book){
  Console.WriteLine("[{0}]",string.Join(",",book)); // [A,B,C,D]
  book = new[]{'W','X','Y','Z'};
}
    

In the above example, the book array is passed to another method where a new array object is assigned to the book parameter. This does not change the original book reference in Main, which still points to A,B,C,D. The ChangeReference method can, however, access or modify the elements inside the book array.
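
A minimal sketch of both behaviours side by side: a change to an element is visible to the caller, while reassigning the parameter is not. ReplaceFirst is a hypothetical method name used only for this illustration.

```csharp
using System;

public static class PassArrayDemo
{
    // The callee works on the same array object as the caller.
    public static void ReplaceFirst(char[] book, char value)
    {
        book[0] = value;                      // visible to the caller
        book = new[] { 'W', 'X', 'Y', 'Z' };  // only rebinds the local parameter
    }

    public static void Main(string[] args)
    {
        char[] book = { 'A', 'B', 'C', 'D' };
        ReplaceFirst(book, 'P');
        Console.WriteLine("[{0}]", string.Join(",", book)); // [P,B,C,D]
    }
}
```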

There is the possibility that an uninitialized array could be passed, or that a method needs to initialize the array reference before using it and make changes that are reflected in the caller method. Fortunately, C# provides the ref and out keywords to apply constraints on references at compile time.

Out

out indicates to the compiler that the received array reference (in callee method) should be initialized before use and if you try to use the array reference before initialization then it will cause a compilation error.

          static void Main(string[] args){
  char[] book = new[]{'A','B','C','D'};
  Console.WriteLine("[{0}]",string.Join(",",book)); // [A,B,C,D]
  ChangeReference(out book);
  Console.WriteLine("[{0}]",string.Join(",",book)); // [W,X,Y,Z]
  ChangeReference(out null); // compile time error, un-assignable reference
  ChangeReferenceErr(out book);
}
static void ChangeReference(out char[] book){
  book = new[]{'W','X','Y','Z'};
}
static void ChangeReferenceErr(out char[] book){
  Console.WriteLine("[{0}]",string.Join(",",book)); // compilation error, Cannot use before initialization
  book = new[]{'W','X','Y','Z'}; // do this before using book
}
    

Note:

The passed reference should be an assignable reference, meaning you cannot pass null as an argument with out, although you can assign a null value to the reference inside the callee method.
It doesn't matter whether the array reference was initialized before the call or not; the callee method has to initialize the reference before using it.
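
The rules above can be sketched in a version that compiles and runs. TryCreateBook is a hypothetical helper, loosely following the common Try pattern, and is not part of the original example.

```csharp
using System;

public static class OutDemo
{
    // Hypothetical helper: the out parameter must be assigned on every path.
    public static bool TryCreateBook(int length, out char[] book)
    {
        if (length < 0)
        {
            book = null;   // assigning null inside the callee is allowed
            return false;
        }
        book = new char[length];
        return true;
    }

    public static void Main(string[] args)
    {
        char[] book;       // no initialization is needed before an out call
        bool created = TryCreateBook(4, out book);
        Console.WriteLine("{0} {1}", created, book.Length); // True 4
    }
}
```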

Ref

ref acts as a constraint so that the array reference should be initialized before passing it to another method.

          static void Main(string[] args){
  char[] book = new[]{'A','B','C','D'};
  PrintArrayValues(ref book); // [A,B,C,D]
  char[] bookNull;
  PrintArrayValues(ref bookNull); // compile time error, use of unassigned local variable
  char[] upComingBooks = null;
  PrintArrayValues(ref upComingBooks); // Runtime error, ArgumentNullException, because upComingBooks is null
}
static void PrintArrayValues(ref char[] book){
  Console.WriteLine("[{0}]",string.Join(",",book));
}
    

Note:

The caller method can initialize the reference as null, which will crash the application if you perform any operation that requires a non-null reference.
ref is useful when one method wants to send data and receive modified data; it's two-way communication. out, by contrast, is one-way communication: the callee must always initialize the reference before returning.
ref and out are also applicable to value type data.
A method cannot be overloaded on the basis of ref and out alone.

          static void add (ref int a){}
static void add (out int a){} // compilation error, ref and out are not enough to overload a method
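
The two-way communication that ref provides can be sketched as follows. Append is a hypothetical helper that reads the caller's array and then replaces it with a longer one.

```csharp
using System;

public static class RefDemo
{
    // Hypothetical helper: reads the caller's array and replaces it with a longer one.
    public static void Append(ref char[] book, char value)
    {
        char[] larger = new char[book.Length + 1];
        Array.Copy(book, larger, book.Length);
        larger[book.Length] = value;
        book = larger;     // the caller's variable now refers to the new array
    }

    public static void Main(string[] args)
    {
        char[] book = { 'A', 'B' };
        Append(ref book, 'C');
        Console.WriteLine("[{0}]", string.Join(",", book)); // [A,B,C]
    }
}
```

The framework's own Array.Resize method takes its array argument by ref for the same reason: it has to hand a new array object back to the caller.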
    

Params

The params keyword allows us to receive any arbitrary number of parameters as an array.

          static void Main(string[] args){
  TotalAnyLengthArray(1,2,3);          // 6
  TotalAnyLengthArray(1,2,3,3,4,5,5);  // 23
  TotalAnyLengthArray();               // 0
}
static void TotalAnyLengthArray(params int[] ints){
 long sum = 0;
 for(int i = 0 ; i < ints.Length ; i++){
     sum += ints[i];
 }
 Console.WriteLine("Total is {0}",sum);
}
    

You can pass separate int parameters by keeping params as the last parameter:

          static void Main(string[] args){
  TotalAnyLengthArrayAndVerify(6,1,2,3);           // True
  TotalAnyLengthArrayAndVerify(20,1,2,3,3,4,5,5);  // False, total is 23 not 20
}
static void TotalAnyLengthArrayAndVerify(int expectedTot, params int[] ints){
 long sum = 0;
 for(int i = 0 ; i < ints.Length ; i++){
     sum += ints[i];
 }
 Console.WriteLine("Match is {0}", sum == expectedTot);
}
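
A params parameter is still an ordinary array, so an existing array can be passed in its place. The sketch below uses a hypothetical Total method that returns the sum instead of printing it.

```csharp
using System;

public static class ParamsDemo
{
    // Hypothetical helper: sums any number of int arguments.
    public static long Total(params int[] ints)
    {
        long sum = 0;
        for (int i = 0; i < ints.Length; i++)
        {
            sum += ints[i];
        }
        return sum;
    }

    public static void Main(string[] args)
    {
        int[] values = { 1, 2, 3 };
        Console.WriteLine(Total(values));   // 6, the array is passed directly
        Console.WriteLine(Total(1, 2, 3));  // 6, the compiler builds the array
        Console.WriteLine(Total());         // 0, an empty array is passed
    }
}
```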
    

Key Points

Array indexes start from 0, and the type of data an array can hold is restricted by the type of the array.

object[] all = new object[]{1,true,""}; // object can store any type
object[] listInt = new int[] {1, 2}; // not possible, compilation error: int[] cannot be converted to object[]

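Looping over an array can be almost twice as fast as looping over a list, although the exact difference depends on the runtime and the workload.
Never compare the contents of arrays with ==, because for reference types it compares references rather than elements. Instead, use SequenceEqual (from System.Linq) with arrays as: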

          char[] nName = new[]{'P', 'A', 'V' };
Console.WriteLine(nName == new char[]{'P','A','V'}); // False
Console.WriteLine(nName.SequenceEqual(new char[]{'P','A','V'})); // True
    

It's good practice to return an empty array instead of null (to avoid null checks) in case there is an issue with data availability or consistency.

      char[] a = new char[]{}; // create empty array
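
A minimal sketch of this practice, assuming a runtime where Array.Empty<T>() is available (.NET Framework 4.6 or later, or .NET Core). FindChapters is a hypothetical lookup method invented for this illustration.

```csharp
using System;

public static class EmptyArrayDemo
{
    // Hypothetical lookup: returns an empty array when nothing is available.
    public static char[] FindChapters(string title)
    {
        if (string.IsNullOrEmpty(title))
        {
            return Array.Empty<char>();
        }
        return title.ToCharArray();
    }

    public static void Main(string[] args)
    {
        // The loop body simply does not run; no null check is needed.
        foreach (char c in FindChapters(null))
        {
            Console.WriteLine(c);
        }
        Console.WriteLine(FindChapters(null).Length); // 0
    }
}
```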

You can convert char array to string to perform string manipulations.

          char[] chars = {'P','A','V','N','E','E','T'};
string name = new string(chars); //PAVNEET
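
The conversion also works in the other direction through ToCharArray, which makes array operations available to strings. The sketch below reverses a string this way; Reverse is a hypothetical helper and assumes simple text without surrogate pairs.

```csharp
using System;

public static class StringConversionDemo
{
    // Hypothetical helper: reverses text by going through a char array.
    public static string Reverse(string text)
    {
        char[] chars = text.ToCharArray();  // string -> char[]
        Array.Reverse(chars);               // in-place array operation
        return new string(chars);           // char[] -> string
    }

    public static void Main(string[] args)
    {
        Console.WriteLine(Reverse("PAVNEET")); // TEENVAP
    }
}
```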
    

Share your appreciation and press like on this guide. Thank you for reading!

Pavneet S.

Pavneet is a software engineer with 5+ years of experience in mobile, web, and application development. He has developed solutions for AOSP, IoT, OS ROM, services, tools, and dev servers using native and hybrid technologies. He is proficient in architecture & API design, TDD, debugging, and analysis.
