I use PostgreSQL 9.1.2 and I have a basic table as below, where I have the Survival status of an entry as a boolean (Survival)
and also in number of days (Survival(Days))
.
I have manually added a new column named 1-yr Survival
and now I want to fill in the values of this column for each entry in the table, conditioned on that entry's Survival
and Survival (Days)
column values. Once , completed the database table would look something like this:
Survival Survival(Days) 1-yr Survival
---------- -------------- -------------
Dead 200 NO
Alive - YES
Dead 1200 YES
The pseudo code to input the conditioned values of 1-yr Survival
would be something like:
ALTER TABLE mytable ADD COLUMN "1-yr Survival" text
for each row
if ("Survival" = Dead & "Survival(Days)" < 365) then Update "1-yr Survival" = NO
else Update "1-yr Survival" = YES
end
I believe this is a basic operation however I failed to find the postgresql syntax to execute it. Some search results return "adding a trigger", but I am not sure that is what I neeed. I think my situation here is a lot simpler. Any help/advice would be greatly appreciated.
The one-time operation can be achieved with a plain UPDATE
:
UPDATE tbl
SET one_year_survival = (survival OR survival_days >= 365);
I would advise not to use camel-case, white-space and parenthesis in your names. While allowed between double-quotes, it often leads to complications and confusion. Consider the chapter about identifiers and key words in the manual.
Are you aware that you can export the results of a query as CSV with COPY
?
Example:
COPY (SELECT *, (survival OR survival_days >= 365) AS one_year_survival FROM tbl)
TO '/path/to/file.csv';
You wouldn't need the redundant column this way to begin with.
To avoid empty updates:
UPDATE tbl
SET "Dead after 1-yr" = (dead AND my_survival_col < 365)
,"Dead after 2-yrs" = (dead AND my_survival_col < 730)
....
WHERE "Dead after 1-yr" IS DISTINCT FROM (dead AND my_survival_col < 365)
OR "Dead after 2-yrs" IS DISTINCT FROM (dead AND my_survival_col < 730)
...
Personally, I would only add such redundant columns if I had a compelling reason. Normally I wouldn't. If it's about performance: are you aware of indexes on expressions and partial indexes?