Adding multiple values to key in perl hash

aki2all · Sep 2, 2013

I need to create a multi-dimensional hash.

For example, I have done:

$hash{gene} = $mrna;
if (exists ($exon)){
  $hash{gene}{$mrna} = $exon;
}
if (exists ($cds)){
  $hash{gene}{$mrna} = $cds;
}

where $gene, $mrna, $exon, $cds are unique ids.

My issue is that I also want some properties of $gene and $mrna to be included in the hash. For example:

$hash{$gene}{'start_loc'} = $start;
$hash{gene}{mrna}{'start_loc'} = $start;

and so on. But is that a feasible way of declaring a hash? If I call $hash{$gene}, both $mrna and start_loc will be printed. What could be the solution?

How would I add multiple values for the same key, with $gene and $mrna being the keys in this case?

Any suggestions will be appreciated.

Answer

David W. · Sep 3, 2013

What you need to do is to read the Perl Reference Tutorial.

Simple answer to your question:

A Perl hash can hold only a single value per key. However, that single value can be a reference to another hash.

my %hash1 = ( foo => "bar", fu => "bur" );  # First hash
my %hash2;
$hash2{some_key} = \%hash1;   # The value is a reference to %hash1

And there's nothing stopping that first hash from containing a reference to yet another hash. It's turtles all the way down!

So yes, you can have as complex and convoluted a structure as you like, with as many sub-hashes as you want. Or mix in some arrays too.

For various reasons, I prefer the -> syntax when using these complex structures. I find that it makes more complex structures easier to read. However, the main thing is that it reminds you these are references and not actual multidimensional structures.
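As a minimal sketch of the idea (the names %inner, %outer, other_key and mixed are made up for illustration), here is one hash storing references to other structures, and what following those references looks like:

```perl
use strict;
use warnings;

my %inner = ( foo => "bar", fu => "bur" );
my %outer;

# Store a reference to %inner as the single value of one key.
$outer{some_key} = \%inner;

# An anonymous hash reference works the same way; arrays can be mixed in too.
$outer{other_key} = { mixed => [ 1, 2, 3 ] };

print $outer{some_key}->{foo}, "\n";           # follows the reference into %inner
print ref( $outer{some_key} ), "\n";           # HASH
print $outer{other_key}->{mixed}->[2], "\n";   # third element of the array ref

# It is a reference, not a copy: changes to %inner are visible through %outer.
$inner{foo} = "changed";
print $outer{some_key}->{foo}, "\n";
```

Because the value is a reference rather than a copy, the last line prints the updated string.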

For example:

$hash{gene}->{mrna}->{start_loc} = $start;  # Quotes aren't needed around a key that is a simple identifier.

The best thing to do is to think of your hash as a structure. For example:

my $person_ref = {};   # $person_ref is a hash reference.
$person_ref->{NAME}->{FIRST} = "Bob";
$person_ref->{NAME}->{LAST} = "Rogers";
$person_ref->{PHONE}->{WORK}->[0] = "555-1234";   # An array ref; there might be more than one number.
$person_ref->{PHONE}->{WORK}->[1] = "555-4444";
$person_ref->{PHONE}->{CELL}->[0] = "555-4321";
...

my @people;
push @people, $person_ref;

Now, I can load up my @people array with all my people, or maybe use a hash:

my %person;
$person{$bobs_ssn} = $person_ref;   # Now all of Bob's info is indexed by his SSN.
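To read such a structure back out, you dereference at each level. The following is only a sketch: the %people hash and the 'id-0001' key are hypothetical stand-ins for whatever unique ID you index by.

```perl
use strict;
use warnings;

# Hypothetical record, written as one nested literal instead of line by line.
my $person_ref = {
    NAME  => { FIRST => "Bob", LAST => "Rogers" },
    PHONE => {
        WORK => [ "555-1234", "555-4444" ],
        CELL => [ "555-4321" ],
    },
};

my %people = ( 'id-0001' => $person_ref );

for my $id ( sort keys %people ) {
    my $p = $people{$id};
    print "$id: $p->{NAME}->{FIRST} $p->{NAME}->{LAST}\n";

    for my $type ( sort keys %{ $p->{PHONE} } ) {
        # @{ ... } dereferences the array ref holding this type's numbers.
        my @numbers = @{ $p->{PHONE}->{$type} };
        print "  $type: @numbers\n";
    }
}
```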

So, the first thing you need to do is to think about what your structure should look like. What are the fields in your structure? What are the sub-fields? Figure that out, and then set up your hash of hashes to match. Decide exactly how the data will be stored and keyed.

Remember, this hash contains references to your genes (or whatever), so you want to choose your keys wisely.
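Applied to the question, one possible layout (an assumption on my part, not the only reasonable design) keeps a gene's properties under fixed key names and puts its mRNAs under a separate mrna key, so IDs and property names never share a level. The IDs and coordinates below are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical IDs, made up for this sketch.
my ( $gene, $mrna, $exon, $cds ) = ( 'gene1', 'mrna1', 'exon1', 'cds1' );

my %hash;

# Properties of the gene live under fixed key names ...
$hash{$gene}->{start_loc} = 100;
$hash{$gene}->{end_loc}   = 900;

# ... and its mRNAs live under a dedicated 'mrna' key, keyed by ID.
$hash{$gene}->{mrna}->{$mrna}->{start_loc} = 150;
$hash{$gene}->{mrna}->{$mrna}->{exon}->{$exon}->{start_loc} = 150;
$hash{$gene}->{mrna}->{$mrna}->{cds}->{$cds}->{start_loc}   = 200;

# Walk the mRNAs of one gene without tripping over start_loc/end_loc.
for my $mrna_id ( sort keys %{ $hash{$gene}->{mrna} } ) {
    my $m = $hash{$gene}->{mrna}->{$mrna_id};
    print "$gene / $mrna_id starts at $m->{start_loc}\n";
}
```

With this layout, keys %{ $hash{$gene}->{mrna} } returns only mRNA IDs, which avoids the original problem of $mrna and start_loc showing up side by side.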

Read the tutorial. Then, try your hand at it. It's not all that complicated to understand. However, it can be a bear to maintain.

When you say use strict;, you give yourself some protection:

my $foo = "bar";
say $Foo;    #This won't work!

This won't work because you didn't declare $Foo; you declared $foo. The use strict; pragma can catch mistyped variable names, but:

my %var;
$var{foo} = "bar";
say $var{Foo};    #Whoops!

This will not be caught (except perhaps by a warning that $var{Foo} is uninitialized, if warnings are enabled). The use strict; pragma can't detect typos in your hash keys.

The next step, after you've grown comfortable with references, is to move on to object-oriented Perl. There's a tutorial for that too.
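A small sketch of that behavior: the script below compiles cleanly under strict, and the mistyped key simply yields undef at run time. The $value variable is only there for the demonstration.

```perl
use strict;
use warnings;

my %var;
$var{foo} = "bar";

# No compile-time error here: to Perl, a hash key is just a string.
my $value = $var{Foo};

print( defined($value)    ? "defined\n"    : "undef\n" );
print( exists( $var{Foo} ) ? "key exists\n" : "no such key\n" );
```

Checking with exists before reading a key is one way to notice the typo yourself.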

All object-oriented Perl does is take your hash references and turn them into objects. Then you create subroutines that help you keep track of how those objects are manipulated. For example:

sub last_name {
    my $person    = shift;   # Don't worry about this for now.
    my $last_name = shift;

    # Only set the name when one was actually passed in.
    if ( defined $last_name ) {
        $person->{NAME}->{LAST} = $last_name;
    }
    return $person->{NAME}->{LAST};
}

When I set my last name using this subroutine (I mean, method), I guarantee that the key will be $person->{NAME}->{LAST} and not $person->{LAST}->{NAME}, $person->{LAST}->{NMAE}, or $person->{last}->{name}.

The main problem isn't learning the mechanisms, but learning to apply them. So, think about exactly how you want to represent your items. Think about what fields you want, and how you're going to pull up that information.
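For context, here is a minimal sketch of how such a method might sit inside a class. The Person package name and its new constructor are assumptions for illustration; the tutorial covers the details.

```perl
use strict;
use warnings;

package Person;

# Constructor: bless an empty hash reference into the class.
sub new {
    my $class = shift;
    my $self  = {};
    return bless $self, $class;
}

# Combined getter/setter; the key path is spelled out in exactly one place.
sub last_name {
    my $self      = shift;
    my $last_name = shift;
    $self->{NAME}->{LAST} = $last_name if defined $last_name;
    return $self->{NAME}->{LAST};
}

package main;

my $bob = Person->new;
$bob->last_name("Rogers");       # set
print $bob->last_name, "\n";     # get
```

Callers never type the hash keys themselves, so a misspelled key can only happen inside the method, where it is easy to find.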