Domain Name Abuse Detection Liming Wang
Outline
1 Domain Name Abuse Work Overview
2 Anti-phishing Research Work
3 Chinese Domain Similarity Detection
4 Other Abuse Detection
5 System Information
Why?
Responsibility: As China's domain name registry, CNNIC is responsible for operating and administering .CN, .中国, .公司 and .网络.
Necessity: Domain name abuse has caused heavy losses. Phishing attacks in China are reported to cause losses of more than 30 billion RMB every year.
[Figure: work overview. Big data collection from IP Whois, email, IM info, DNS logs, registration info and social networks, followed by detection, assessment and optimization. Targets: phishing, spam, botnets, porn sites and other illegal sites.]
Phishing Definition Phishing is the act of tricking someone into giving confidential information (like passwords and credit card information) on a fake web page or email form pretending to come from a legitimate company (like their bank).
Why do we do anti-phishing?
Phishing attacks are rampant (data source: APWG).
CNNIC is committed to wiping out phishing attacks that use .cn domains.
CNNIC also tries to combat phishing in China under other TLDs, for the public welfare.
Anti-Phishing Research Background
Features of phishing attacks nowadays: Phishing URLs are typically complicated multilevel strings made of a domain name and a path, separated by "." and "/". The lifetime of phishing URLs is short.
Shortcomings of traditional anti-phishing methods: Blacklist-based methods cannot deal with fresh phishing sites. Popular Web browser plug-in methods work passively, waiting for clients to provide URLs for detection.
Our research goal: Detect phishing sites quickly and actively.
Phishing Data Analysis Results in China
Analysis results: Over 84% of phishing URLs have domain names similar to their target brands. For each target, the paths of phishing URLs show a large number of identical forms.
Research start point: Our anti-phishing research starts from domain name similarity analysis.
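Our Detection Process
[Figure: detection pipeline. DNS logs feed suspicious host retrieval, which yields phishing hosts. The phishing repository feeds path frequency computation, which yields the top N phishing paths. Hosts and paths go into URL construction, and the resulting phishing URLs pass through content-based phishing recognition and detection optimization.]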
URL Construction Phishing URL = Phishing Host + Phishing Path
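The two inputs to URL construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `top_paths` and `construct_urls`, the `http` scheme and the `*.example` hosts are hypothetical, and the slides do not say how the real system stores its phishing repository.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_paths(known_phishing_urls, n):
    """Return the n most frequent paths among previously confirmed phishing URLs."""
    counts = Counter(urlsplit(url).path for url in known_phishing_urls)
    return [path for path, _ in counts.most_common(n)]


def construct_urls(suspicious_hosts, paths, scheme="http"):
    """Candidate phishing URL = suspicious host + frequent phishing path."""
    return [f"{scheme}://{host}{path}" for host in suspicious_hosts for path in paths]


# Hypothetical repository of confirmed phishing URLs (placeholder data).
repository = [
    "http://a.example/login/index.asp",
    "http://b.example/login/index.asp",
    "http://c.example/pay/confirm.php",
]

paths = top_paths(repository, 1)
candidates = construct_urls(["taoba0.example"], paths)
```

Each candidate URL would then be fetched and passed to the content-based recognition step.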
Content Based Phishing Recognition
Recognition features: landing box, title, copyright, Web out-links.
Decision flow for each suspicious phishing URL:
Is there a landing box in the Web page? If no, the page is non-phishing.
If yes, score three checks with 1 or 0 each: the title contains the target brand; the copyright contains the target brand; the out-links point to the target sites.
If the summation is greater than 0, the URL is a very suspicious phishing URL; otherwise it is non-phishing.
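The decision flow on this slide can be written as a small function. The sketch below assumes the page features have already been extracted. The function name, its parameters and the host-suffix match for out-links are illustrative choices, not details given in the slides.

```python
from urllib.parse import urlsplit


def content_based_verdict(has_landing_box, title, copyright_text,
                          out_links, brand, brand_sites):
    """Apply the slide's flow: landing box gate, then three 1/0 brand checks."""
    if not has_landing_box:
        return "non-phishing"

    brand_lower = brand.lower()
    title_hit = int(brand_lower in title.lower())
    copyright_hit = int(brand_lower in copyright_text.lower())

    def links_to_brand(link):
        # Assumption: an out-link "points to the target site" when its host
        # equals a brand site or is a subdomain of it.
        host = urlsplit(link).hostname or ""
        return any(host == site or host.endswith("." + site) for site in brand_sites)

    link_hit = int(any(links_to_brand(link) for link in out_links))

    summation = title_hit + copyright_hit + link_hit
    return "very suspicious" if summation > 0 else "non-phishing"
```

For example, a page with a landing box, a neutral title and an out-link to `http://www.taobao.com/help` is flagged as very suspicious for the brand `taobao` with brand site `taobao.com`.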
Further Detection Optimization
Why further optimization? False alarms in phishing detection have serious consequences, so keeping the false alarm rate low is essential.
How? We use Web content quality to filter out non-phishing Web pages. Web quality assessment is an open issue, which will be discussed in the next section.
Optimization rule: IF Quality < 5 THEN the suspicious URL is phishing, ELSE the URL is non-phishing. The evaluated quality lies in the interval [0,10].
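The optimization rule is a single threshold test. A minimal sketch follows, assuming the quality score has already been computed by some assessment step the slides do not specify; `final_verdict` and its `threshold` parameter are hypothetical names.

```python
def final_verdict(quality, threshold=5.0):
    """Rule from the slide: quality below the threshold means phishing."""
    if not 0.0 <= quality <= 10.0:
        raise ValueError("quality is expected to lie in [0, 10]")
    return "phishing" if quality < threshold else "non-phishing"
```

A score exactly at the threshold is treated as non-phishing, which matches the strict inequality in the rule.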
Detection Mechanism in Phishing
Phishing Detection System, Jan. 2011 to Dec. 2012:
9537 phishing URLs detected.
This accounts for 15.6% of the total reported data from APAC.
Second-largest contributor, after taobao.com.
Detection Mechanism for Spoof Attack in CDN
Detection Approaches for Spoof Attack in Chinese Domain Names (CDNs)
Calculating the similarity of single characters: A character's dot matrix accurately describes the visual features of a Chinese character. Based on the dot matrix, visual comparison can be converted into vector computation, which measures the similarity between single characters.
Calculating the similarity of single characters (continued): Convert each character dot matrix into a multi-dimensional vector. Compute the correlation between the vectors under the vector space model to obtain the similarity between single characters.
Reasons for using the vector space model: It is verified and accepted in other fields. With the cosine of the angle between vectors as the measure, the similarity value lies in [0,1], so no extra normalization is required. Further character features can conveniently be added as extra dimensions of the vectors.
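As an illustration of the single-character step, the sketch below flattens two small binary dot matrices into vectors and takes their cosine. The 3x3 glyphs are toy placeholders, not real font bitmaps, and the function names are hypothetical. Because all entries are non-negative, the cosine stays within [0,1].

```python
import math


def to_vector(dot_matrix):
    """Flatten a 2-D dot matrix (rows of 0/1 pixels) into one vector."""
    return [cell for row in dot_matrix for cell in row]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty glyph is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Toy 3x3 "glyphs" that differ in a single pixel.
GLYPH_A = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
GLYPH_B = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]

similarity = cosine_similarity(to_vector(GLYPH_A), to_vector(GLYPH_B))
```

Real character bitmaps would be larger, for example rendered from a font at a fixed size, but the computation is the same.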
Similarity calculation for strings: Based on the similarity between single characters.
Modeling: A and B are domain names of length n. The similarity between each pair of corresponding characters in A and B is known.
Formula for the similarity calculation: [formula image not recoverable from the source text]
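The formula itself is not present in this text. One form that fits the Bayes-style combination described on the following slides, and that reproduces their worked example (0.9 and 0.8 with T = 1.1 giving 0.992), is shown here as an assumption rather than as the authors' confirmed formula. The symbol s_i, denoting the similarity between the i-th characters of A and B, is introduced for this sketch.

```latex
\mathrm{Sim}(A,B) \;=\;
\frac{\prod_{i=1}^{n} s_i}
     {\prod_{i=1}^{n} s_i \;+\; \prod_{i=1}^{n} \left(T - s_i\right)},
\qquad 0 \le s_i \le 1,\; T > 1
```

With T slightly above 1, an identical character pair (s_i = 1) still contributes a small dissimilarity factor T - 1, so each additional matching character shrinks the second term of the denominator and raises the score.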
Similarity calculation for strings (continued): The similarity between A and B is based on the Bayes conditional probability formula. It combines the similarity between corresponding characters with their dissimilarity, which depends on the parameter T.
The longer the string and the more similar characters it contains, the more similar the two strings are. For example: 茅台.中国 vs 茅合.中国; 贵州茅台集团.中国 vs 贵州茅合集团.中国
Similarity calculation for strings, example: 康师傅 vs 康帅博. Assume sim(师, 帅) = 0.9 and sim(傅, 博) = 0.8, and set T = 1.1. The resulting string similarity is 0.992.
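The example can be checked numerically. The sketch below assumes a Bayes-style combination in which the product of the per-character similarities is divided by that product plus the product of the per-character dissimilarities (T minus similarity). This form is an assumption chosen because it reproduces the 0.992 on this slide; the slides do not spell it out. The identical first character 康 is given similarity 1.0.

```python
import math


def string_similarity(char_sims, t=1.1):
    """Combine per-character similarities into one string similarity.

    Assumed form: prod(s) / (prod(s) + prod(t - s)), with t > 1 so that
    identical characters still contribute a small dissimilarity factor.
    """
    if t <= 1.0:
        raise ValueError("t is expected to be greater than 1")
    similar = math.prod(char_sims)
    dissimilar = math.prod(t - s for s in char_sims)
    return similar / (similar + dissimilar)


# 康师傅 vs 康帅博: identical first character, then 0.9 and 0.8.
score = string_similarity([1.0, 0.9, 0.8], t=1.1)
print(round(score, 3))  # 0.992
```

Under the same assumption, adding an identical character pair to a string raises the score, which agrees with the slide's remark that longer strings with more similar characters come out more similar.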
Experiments Evaluation
Total similarity calculation test results for CDNs (target name, then each look-alike name with its similarity to the target):
淘宝网: 啕宝网 0.9969, 陶宝网 0.9963, 掏宝网 0.9955
工商银行: 工商根行 0.9997, 工商垠行 0.9995, 工商很行 0.9994
腾讯: 幐讯 0.9673, 腾汛 0.9624, 滕讯 0.9586
Other Domain Name Abuse Detection Work
Analysis of large-scale DNS authoritative binding data; active server page extraction and analysis; phishing page detection.
Techniques: large-scale log analysis, page content detection, jumping cheating detection, hiding cheating detection, word co-occurrence analysis, automatic page grabbing.
An Overview of the Domain Name Abuse Detection System
With CNNIC's own data resources as input, CNNIC has designed and developed a Phishing Detection System that detects and judges phishing Web pages automatically.
Big data analysis: Daily DNS queries can reach 100 million at peak time, which produces a massive volume of recursive DNS resolution data. The data mining and analysis task covers 10 million independent computers.
Automation: The whole process is performed by machines, from source data extraction and data preprocessing to the malicious Web page decision and evidence storage.
Express interface: There is an express interface for reporting.
Domain Name Abuse Detection System
Automatic bad data analysis
Daily automatic log acquisition and processing
Bad host server detection
Evidence grabbing before reporting
System Running Frequency and the Processing Data Amount
Running frequency: Once every day. The detection results are reported directly.
Data source: The query log of the recursive DNS servers.
Data volume: 400 million queries to the recursive DNS servers per day, from 30 million distinct independent computers per day.
2012 Domain Name Abuse Detection Result
7181 malicious domain names were detected, involving 23 TLDs: 3393 pornography websites, 2214 phishing websites, 1439 gambling websites, 119 drug websites, and 16 guns and explosives websites.
The system accounts for 90% of all websites reported by CNNIC.
The reported phishing websites account for 7.02% of all reports in the Anti-Phishing Alliance.
Some registrants use the same template for
Reported websites in .cn, 2012: 503 pornography websites in .cn. Compared with 4267 websites in 2011, this is a reduction of 88.2%.
[Bar chart: websites reported in .cn by month, Jan to Dec 2012.]
[Pie chart: malicious information reported from CNNIC in 2012, with shares of 90%, 7% and 3%. Legend: reports by the National Domain Name Security Center, reports by other parties, manual search.]
CAS Software Park, No. 4, South 4th Street, Zhongguancun, Haidian District, Beijing. Postal code: 100190. www.cnnic.cn