Randomizing rows in a DataTable

Onion Blog

Syndication

I've been working on a project recently where I had the need to randomly shuffle all of the rows in a DataTable. I wanted to do it with the DataTable itself instead of in the act of populating the DataTable for a couple of reasons: 1) I wanted to keep the DataTable in memory and shuffle it in place multiple times without going back to the source, and 2) I had multiple sources where data was coming from (SQL and XML) so I preferred to keep the randomization logic in one place. I also didn't want to copy all of the data (even though it was not a large amount) each time I shuffled, so I decided to use a DataView to display the data shuffled each time I needed it.

Here's the utility function I came up with - each time you call RandomizeDataTable it will return a newly shuffled DataView of all the data passed in through the DataTable. Note that because I reuse the added column "rndSortId" each time, any DataViews retrieved from previous calls to the method will have the new shuffle order. You could change this behavior by adding a new column each time with its own unique sort sequence.

As always, comments/improvements welcome – enjoy!

public static class DataSetUtilities

{

static Random _rand = new Random();

 

public static DataView RandomizeDataTable(DataTable dt)

{

// Create array of indices and populate with ordinal values

int[] indices = new int[dt.Rows.Count];

for (int i = 0; i < indices.Length; i++)

indices[i] = i;

 

// Knuth-Fisher-Yates shuffle indices randomly

for (int i = indices.Length - 1; i > 0; i--)

{

int n = _rand.Next(i + 1);

int tmp = indices[i];

indices[i] = indices[n];

indices[n] = tmp;

}

 

// Add new column to data table (if it's not there already)

// to store shuffle index

if (dt.Columns["rndSortId"] == null)

dt.Columns.Add(new DataColumn("rndSortId", typeof(int)));

int rndSortColIdx = dt.Columns["rndSortId"].Ordinal;

for (int i = 0; i < dt.Rows.Count; i++)

dt.Rows[i][rndSortColIdx] = indices[i];

 

DataView dv = new DataView(dt);

dv.Sort = "rndSortId";

return dv;

}

}


Posted Apr 16 2008, 07:38 AM by fritz-onion

Comments

Mike wrote re: Randomizing rows in a DataTable
on 04-16-2008 6:49 AM
Did you consider just assing a GUID to the rndSortId column and Sorting on that column?

Because GUIDs are random you should get a fast, simple, and random sort.
Fritz Onion wrote re: Randomizing rows in a DataTable
on 04-16-2008 6:57 AM
Ha - great point Mike - I don't actually need to shuffle at all, I just need a decent distribution of random numbers in the shuffle column. That's why I post these things :)
Here's what I'll do instead I think:
dt.Rows[i][rndSortIdx] = _rand.Next(int.MaxValue);

No need to waste the space of a full GUID really.
Jason Coyne wrote re: Randomizing rows in a DataTable
on 04-16-2008 1:37 PM
Coding Horror made an excellent post on this deceptively simple problem. Depending on why you want numbers to be random, the formula used to generate the random numbers is very important.

http://codinghorror.com/blog/archives/001015.html

The GUID might be better than just the rand.Next, especially since the GUIDs will never have the same number show up twice, but your solution I bet will end up with at least one duplicate number in a run of any given size, and then the sorting would default to some other order, that is probably deterministic. If the randomness is something like a gambling site, where having anything deterministic could be a security flaw, that would be a big issue.
Fritz Onion wrote re: Randomizing rows in a DataTable
on 04-16-2008 4:28 PM
Jason - that's a good point, and if you're using this technique for something where true randomness really matters, be advised. However, for my purposes the simplest solution suffices (I'm just re-ordering UI elements for presentation).
Bob wrote re: Randomizing rows in a DataTable
on 06-12-2008 2:27 AM
DOESN'T WORL

Add a Comment

(required)  
(optional)
(required)  
Remember Me?