"DRY" is one of those things which comes up in "how to be a better programmer" searches, or often in replies to "here's my program, I need help!". New coders are going to see it. It's an acronym, but luckily for us, a single phrase: "Don't Repeat Yourself". That was clearly made to be cool-sounding, so let's go to the original: "a piece of knowledge should only be in one place". Once we're done laughing over programs containing pieces of knowledge, that version actually makes sense. Many common, useful features basically do that. Don't get me wrong -- DRY is useless. But it's a fun excuse to talk about those features, and convince you I know what I'm talking about when I say not to waste time reading up on DRY.
A simple, straight-on example is constants. Here we decide the margin is some number -- 0.15 in this case -- and we use it in a bunch of places:
xPos = 0.15 + c*6; // measuring 6 letters from left margin
...
useableWidth = width - 0.15*2; // subtract left and right margins
... pretend there are more, all spread out
This seems fine, until we need to change the margin to 0.12. Then we'll have to search for every 0.15 and might miss a few. Or we can Find&Replace, but that will change unrelated 0.15's. "Repeating ourselves" by using the margin in lots of places can't be avoided, but repeating the "knowledge" of the actual size can be, using a constant:
// "knowledge" of the margin size: const double MARGIN_SIZE = 0.15; xPos = MARGIN_SIZE + c*6; ... useableWidth = width - MARGIN_SIZE*2;
That makes margins hugely easier to change -- one clearly marked spot instead of tracking down who knows how many. That also lowers the chance of bugs. This is what I mean by saying DRY is a real thing -- we avoided repeating something and got a better program. But constants also make the program easier to read, even when we weren't repeating anything (they cut out the mental step "what's 0.15? Oh, it's the margin!"). So we can't completely say constants are a DRY thing.
Another simple repeating-numbers problem occurs when dealing with cut-off values in tables. The code below converts scores into grades. The cut-offs for B's, C's and D's are repeated:
// ewww, repetition:
if(score>=87) grade='A';
if(score>=75 && score<87) grade='B';
if(score>=65 && score<75) grade='C';
if(score>=52 && score<65) grade='D';
...
I'll just jump to the improved version, where each cut-off occurs once. The trick is to make one big if-else, known as a Cascading-If:
if(score>=87) grade='A';
else if(score>=75) grade='B';
else if(score>=65) grade='C';
else if(score>=52) grade='D';
...
Each cut-off is in one spot, making the code shorter, easier to update, and less error-prone; it also seems easier to read. We could definitely call this a DRY trick, if we cared to.
Besides repeating numbers, we often repeat lines of code, which is bad for the usual reasons. Functions fix that. Suppose it takes several lines to find the best pumpkin in a list, and we need to do that for a few different lists. The trick is having one copy of "find the best pumpkin". Label it findBestPumpkin so everyone can refer back to it -- bp1=findBestPumpkin(P1); runs it for the pumpkin list P1.
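Here's a rough sketch of what that one copy could look like, assuming a made-up Pumpkin type and a made-up rule for "best" (heaviest wins):

#include <vector>

// hypothetical pumpkin type, just for illustration:
struct Pumpkin { float weight; };

// the single copy of "how to find the best pumpkin":
Pumpkin findBestPumpkin(const std::vector<Pumpkin>& list) {
    Pumpkin best = list[0];
    for (const Pumpkin& p : list)
        if (p.weight > best.weight) best = p; // "best" here just means heaviest
    return best;
}

// every pumpkin list refers back to that one copy:
//   bp1 = findBestPumpkin(P1);
//   bp2 = findBestPumpkin(P2);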
The "piece of knowledge" is how to find the best pumpkin. That makes sense, right? Putting it in one place is good for the same old reasons (changes, fewer errors, readability), but we also say functions are a useful abstraction. That's a fancy way of saying that functions feel like built-in features. We don't need or want to think about how findBestPumpkin works -- it just finds the best pumpkin. My point here is that functions are considered as their own thing. It's weird to cover functions as part of DRY, because we cover functions as ... functions.
Moving on, objects in object-oriented programming definitely come from wanting knowledge to be in one place. Suppose our program uses dots, which have an x,y position and a size. We define what a Dot is in one place and have everyone else refer back:
// define a Dot:
struct Dot { float x, y, radius; };
// two dots (the computer looks up the definition above):
Dot d1, d2;
// takes a dot as input:
void useDot(Dot d) { // looks up Dot, above
  ...
}
Not needing to type float x1,y1,radius1; for every dot gets us the usual DRY stuff (especially, it's easier to change what makes a Dot), but abstraction is the big deal here. Dot feels like a built-in type. When we see Dot d1; we don't think about 3 numbers -- we think "d1 is a dot". Structs (classes) are another major feature, explained on their own, where you could say "and they also help with DRY" but don't really need to.
Next are Arrays, which let us reuse lines in certain special cases. People who don't know arrays often write code like this:
// create 20 numbers which are sort of a list:
int a1, a2, a3, a4, ... a20;
...
// none can be more than ten:
if(a1>10) a1=10;
if(a2>10) a2=10;
if(a3>10) a3=10;
...
if(a20>10) a20=10;
All that typing is nuts, and we won't even have to wait for changes to get errors -- we'll have them right away. This is common, too. We often reset a list of variables to 0, or count how many are negative, and so on. There should be an easy "do this thing to a1 through a20" trick, and there is, using arrays and a loop:
int A[20]; // makes all 20 at once
...
for(int i=0; i<20; i++) { if(A[i]>10) A[i]=10; }
We avoided repeating ourselves, with all the benefits -- totally DRY stuff. But arrays are their own thing. The first step here was to not write a1 through a20, and instead make one 20-part list. That's how we teach arrays, not as a DRY thing.
This next one is odd since it solves a problem we shouldn't have. Suppose we write a function to find the largest item in a list, using >. It should work on whole numbers, decimals, words and so on; but it doesn't. We have to write four near-identical copies, one for each type. Bleh. And this happens all the time.
The fix is polymorphism. It's presented as this big complicated thing, and the details are complicated, but it accomplishes one simple thing: polymorphism lets you reuse one function with different types. Totally DRY.
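For a taste, here's a minimal sketch of the idea using C++ templates (one flavor of polymorphism); the largest function and the sample lists are made up for illustration:

#include <iostream>
#include <string>
#include <vector>

// one copy of "find the largest", reusable with any type that supports >:
template <typename T>
T largest(const std::vector<T>& items) {
    T best = items[0];
    for (const T& item : items)
        if (item > best) best = item;
    return best;
}

int main() {
    std::cout << largest(std::vector<int>{3, 9, 4}) << "\n";               // whole numbers
    std::cout << largest(std::vector<double>{1.5, 0.2}) << "\n";           // decimals
    std::cout << largest(std::vector<std::string>{"ant", "bee"}) << "\n";  // words
}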
This next one is getting pretty advanced, but not too bad. Often we want versions of a function with one little change. For example, generic sorting, which could be cats by age, or dogs by longest name first, and so on. We can do that by passing in a function. Built-in sorts do this by letting us send a "how-to-sort" function:
sort(Cats, (c1,c2)=>c1.age<c2.age); // cats by age

var dogsByNameLen = (d1,d2)=>d1.name.len()>d2.name.len();
sort(Dogs, dogsByNameLen); // dogs, longest name 1st
The explanation is full of technical stuff (sometimes they call it a delegate), but the purpose is simple: reuse most of a function. More DRY. But when we teach it, we focus on being able to treat functions as variables. DRY is just a distraction.
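In a real language it comes out something like this; a C++ sketch with made-up Cat and Dog types:

#include <algorithm>
#include <string>
#include <vector>

// hypothetical types, just for illustration:
struct Cat { std::string name; int age; };
struct Dog { std::string name; };

int main() {
    std::vector<Cat> Cats = { {"Mia", 3}, {"Tom", 1} };
    std::vector<Dog> Dogs = { {"Rex"}, {"Buttercup"} };

    // std::sort is one reusable sorting routine; we only pass in the "how-to-sort" part:
    std::sort(Cats.begin(), Cats.end(),
              [](const Cat& c1, const Cat& c2) { return c1.age < c2.age; }); // cats by age

    auto dogsByNameLen = [](const Dog& d1, const Dog& d2) {
        return d1.name.size() > d2.name.size(); // longest name first
    };
    std::sort(Dogs.begin(), Dogs.end(), dogsByNameLen);
}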
Plug-ins are a variation. Instead of supplying a function every time, we'll supply it once at the start, to a class, and save it. We could make a class with a list and a permanent sort-how function to get a list of cats-by-age or dogs-by-name.
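A rough sketch of that idea, with a made-up SortedList class that saves the "sort-how" function once, up front:

#include <algorithm>
#include <functional>
#include <vector>

// hypothetical class: the "sort-how" plug-in is supplied once and remembered:
template <typename T>
class SortedList {
public:
    SortedList(std::function<bool(const T&, const T&)> sortHow) : sortHow_(sortHow) {}

    void add(const T& item) {
        items_.push_back(item);
        std::sort(items_.begin(), items_.end(), sortHow_); // always uses the saved plug-in
    }

    const std::vector<T>& items() const { return items_; }

private:
    std::vector<T> items_;
    std::function<bool(const T&, const T&)> sortHow_;
};

// usage (with the Cat type from before):
//   SortedList<Cat> catsByAge([](const Cat& a, const Cat& b) { return a.age < b.age; });
//   catsByAge.add(Cat{"Mia", 3});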
The plug-in trick usually goes way beyond that. We might make an "events" class with nice features and all, but the thing it checks for is a plug-in, and what it does in response is another plug-in. It's very, very reusable. We'd say it accomplishes DRY, buuut, by the time we teach this stuff no one needs to be told reusability is good.
And last, it's possible to break things into parts even if we don't really need to, and to add plug-ins where we don't really need them. Basically, make things modular just in case. That hopefully lets different features share some sub-parts. That's very hand-wavy, but it's a thing.
So that's my list of ways to be DRY, but it felt more like The DRY Challenge. It was very unnatural. For one thing, we wouldn't group these topics together. Cascading-IF's are simply an advanced IF; constants are covered later, as a software engineering trick; functions should be covered early, too early for DRY; and so on. To study DRY we'd have to study those topics, which don't go together, so studying DRY is stupid.
But wait, maybe we don't use DRY to group topics. Maybe we teach in a normal order and mention DRY as needed. Doing it that way, DRY is still bad. Because it's way easier to be direct and say when a trick makes things reusable, and/or easier to change, and/or less error-prone. Or we don't say anything, since if we're doing it right, students will figure out the advantages for themselves. It's a waste of time to stop and say "OK, remember DRY? Well here's how this new thing relates to it".
But double-wait, maybe DRY's an advanced software engineering principle? Maybe it should be ignored until you know all the tricks? I fully agree with ignoring it while you learn, but then keep ignoring it. Because at this point it's merely about how far to go. We all agree we can use standard tricks to combine parts of just about any objects. There's the easy obvious stuff, then it gets to be more work and more complicated for less and less reusability. How much is worth it? This version of DRY is just a fancy way of saying to really go for it.
But let's get to the important part. There are names for not following DRY, and you know exactly what they are: Moist and Damp. Yeah. Yet somehow, WET caught on. That's no good. WET means completely not DRY -- we repeat everything all the time -- which never happens. So if you ever hear someone say a program is WET, you tell them "no". You tell them to own their metaphor. "A piece of knowledge should..." was perfectly good, but they just had to make it sound cooler and now they have to live with it. Moist, moist, moist.
Comments? Or email adminATtaxesforcatses.com