*Updated 7/20/15 after finding more ways people write CASRNs, or create fake CASRNs, that would sneak through my original macro.*
I wrote a simple and fairly short SAS macro to validate CAS Registry Numbers (CAS RNs). I have gotten plenty of free SAS advice and a few macros from various internet sources, so I thought it only fair to share this in case it is of use to anyone. Hopefully the comments give ample information about what input is needed and what the output is. The macro will flag a CAS RN as invalid if it:
- is too long (more than 10 digits)
- is too short (fewer than 5 digits)
- is all 0s
- does not have the correct check digit according to the CAS calculation
Information about proper CAS RNs is available from CAS, the division of the American Chemical Society (ACS) that assigns them. Contact me if you have questions about the macro or find an error in it.
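The check digit rule is easy to try by hand. As I understand the CAS calculation, each digit before the check digit is multiplied by its position counted from the right, and the sum modulo 10 must equal the check digit. Here is a minimal sketch for 7732-18-5 (water); the variable names are only for illustration and are not part of the macro.

```sas
/* worked example of the check digit rule for 7732-18-5 (water) */
data _null_;
  digits = "773218";   /* every digit except the final check digit */
  total = 0;
  do j = 1 to length(digits);
    /* j = 1 is the rightmost digit, j = 2 the next one to its left, and so on */
    total = total + j * input(substr(digits, length(digits) - j + 1, 1), 1.);
  end;
  check = mod(total, 10);   /* should match the last digit of the CAS RN */
  put total= check=;
run;
```

This should write total=105 check=5 to the log (1*8 + 2*1 + 3*2 + 4*3 + 5*7 + 6*7 = 105), which matches the final 5 in 7732-18-5.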
*macro to determine if a CAS number is a valid CAS number;
*input is name of dataset to be examined where CAS numbers have variable name CAS_number;
*returns valid = 1 if CAS is valid and valid = 0 if invalid CAS;
*returns character variable CAS which will be CAS number with hyphens and no leading 0s;
%macro CASnumber_check(CAS_dataset);
data &CAS_dataset (drop = CAS_num CASlength R N1-N9 QR Q Rcheck j);
*length is longer than 10 so an over-long CAS number is not truncated before the length check;
length CAS_num $ 32;
set &CAS_dataset;
*give CAS numbers with alphabet characters or that are blank a 00-00 CAS number;
if CAS_number = "" then CAS_number = "00-00";
if anyalpha(CAS_number) ne 0 then CAS_number = "00-00";
*determine if CAS is numeric or character variable;
CAS_vartype = vtype(CAS_number);
*if CAS is numeric, converts it to character;
if CAS_vartype = "N" then CAS_num = STRIP(PUT(CAS_number, 12.));
*if CAS is character, removes all non-numeric characters;
if CAS_vartype = "C" then CAS_num = compress(CAS_number,,"kd");
*breaks CAS number apart into digits;
CASlength = length(CAS_num);
R = input(substr(CAS_num,length(CAS_num)),8.);
QR = 0;
array N_(9) N1-N9;
do j = 1 to 9;
if CASlength > j then N_(j) = input(substr(CAS_num,CASlength-j,1),8.);
else N_(j) = 0;
QR = QR + N_(j)*j;
end;
Q = int(QR/10);
Rcheck = QR - Q*10;
*checks on validity of CAS based on check digit and length;
if Rcheck = R then valid = 1; else valid = 0;
if N9 = 0 and N8 = 0 and N7 = 0 and N6 = 0 and N5 = 0 and N4 = 0 then valid = 0;
if CASlength < 5 then valid = 0;
if CASlength > 10 then valid = 0;
*builds character variable called CAS with no leading 0s;
if N9 ~= 0 then CAS = cats(N9,N8,N7,N6,N5,N4,N3,"-",N2,N1,"-",R);
else if N8 ~= 0 then CAS = cats(N8,N7,N6,N5,N4,N3,"-",N2,N1,"-",R);
else if N7 ~= 0 then CAS = cats(N7,N6,N5,N4,N3,"-",N2,N1,"-",R);
else if N6 ~= 0 then CAS = cats(N6,N5,N4,N3,"-",N2,N1,"-",R);
else if N5 ~= 0 then CAS = cats(N5,N4,N3,"-",N2,N1,"-",R);
else CAS = cats(N4,N3,"-",N2,N1,"-",R);
run;
%mend CASnumber_check;
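To show how the macro might be called, here is a small sketch. The dataset name cas_test and the sample values are my own assumptions for illustration; the only real requirement is that the CAS numbers sit in a variable named CAS_number.

```sas
/* hypothetical test data, including one CAS RN padded with leading zeros */
data cas_test;
  length CAS_number $ 20;
  input CAS_number $;
  datalines;
7732-18-5
0000050-00-0
7732-18-4
;
run;

/* the macro overwrites cas_test, adding the variables CAS and valid */
%CASnumber_check(cas_test);

proc print data=cas_test;
  var CAS_number CAS valid;
run;
```

With the macro as written above, the first two rows should come back with valid = 1 and CAS values of 7732-18-5 and 50-00-0, while the third row, whose check digit has been altered, should come back with valid = 0. Note that the macro replaces the input dataset, so keep a copy if you need the original untouched.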