Hadoop Elephant in Active Directory Forest
Marek Gawiński, Arkadiusz Osiński
Allegro Group
Agenda
- Goals and motivations
- Technology stack
- Architecture evolution
- Automating the integration of new servers
- Making AD users and groups visible to Linux
- Making the architecture resilient to AD service inaccessibility
- Auto-deployment of client software on desktops
Allegro Hadoop cluster in numbers
- 4 terabytes RAM
- 2 petabytes disk space
- 47 datanodes
- 79 projects
- 612 users
Goals and motivations
- Secured cluster
- Central authentication and authorisation
- Compliance for real and project users and groups
- Cluster resources available from the desktop
- Integrating new servers automatically
- Making the whole architecture resilient to AD failures and timeouts
- Auto-deployment and autoconfiguration of Hadoop client software on users' desktops
Technology stack
- Cloudera CDH5
- MIT Kerberos
- Microsoft Active Directory
- FreeIPA
- sssd
- puppet
- msktutil
- Hadoop desktop client
History - FreeIPA + FreeIPA Kerberos
[Diagram] The client sends user/pass to the FreeIPA Kerberos KDC and receives a Kerberos service ticket for the secured Hadoop cluster. FreeIPA holds the users (including internal hadoop creds) and handles local groups management; the cluster checks groups against FreeIPA.
History - FreeIPA + own Kerberos
[Diagram] The client sends user/pass to a dedicated MIT Kerberos KDC and receives a Kerberos service ticket; the KDC verifies user/pass against FreeIPA. FreeIPA still holds the users (including internal hadoop creds) and local groups management; the secured Hadoop cluster checks groups against FreeIPA.
History - FreeIPA + own Kerberos + AD
[Diagram] AD is added as a source of users and groups. The client sends user/pass to the MIT Kerberos KDC, which forwards the user/pass check to AD Kerberos; FreeIPA still holds internal hadoop creds and local groups management. The secured Hadoop cluster checks groups against both FreeIPA and AD.
Final - own Kerberos + AD
[Diagram] FreeIPA is removed. The client obtains a Kerberos service ticket from the MIT Kerberos KDC, which checks user/pass against AD Kerberos; AD holds the users and groups (including internal hadoop creds), and the secured Hadoop cluster checks groups against AD.
Integrating new Linux servers automatically with AD
[Diagram] msktutil creates a user account in AD (User&Groups), creates the corresponding principal in AD Kerberos, and writes the resulting Kerberos keytab on the new server.
Integrating new Linux servers automatically with AD

define get_ad_keytab ($path = '', ...) {
  ...
  $realm     = 'SOME_REALM'
  $pass      = hiera('hadoop_prod/ad/krb_manager_pass')
  $principal = "${title}/${host}@${realm}"
  $command   = "echo ${pass} | kinit _hadoop_manager@${realm}; \
                /usr/local/bin/add_ad_princ.sh ${title} ${host} ${path}; \
                kdestroy"
  ...
}

msktutil -c -s $PRINCIPAL --upn $PRINCIPAL -k $KEYTAB \
    --computer-name $COMPUTER_NAME \
    --server $SERVER_KRB \
    --realm $REALM \
    -b $USER_LDAP_ROOT \
    --dont-expire-password \
    --description "\"$DESCRIPTION\"" \
    --user-creds-only
Integrating new Linux servers automatically with AD

root@nn1:~# klist -ket
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 08/17/2015 13:26:45 host/nn1.local@ipa.realm (aes256-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/nn1.local@ipa.realm (aes128-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/nn1.local@ipa.realm (des3-cbc-sha1)
   1 08/17/2015 13:26:45 host/nn1.local@ipa.realm (arcfour-hmac)
   1 08/17/2015 13:26:45 host/nn1.local@ipa.realm (camellia128-cts-cmac)
   1 08/17/2015 13:26:45 host/nn1.local@ipa.realm (camellia256-cts-cmac)
   4 08/17/2015 13:30:23 91c76848bc458b62e67$@AD.REALM (arcfour-hmac)
   4 08/17/2015 13:30:23 91c76848bc458b62e67$@AD.REALM (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 91c76848bc458b62e67$@AD.REALM (aes256-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/nn1.local@ad.realm (arcfour-hmac)
   4 08/17/2015 13:30:23 host/nn1.local@ad.realm (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/nn1.local@ad.realm (aes256-cts-hmac-sha1-96)
Integrating new Linux servers automatically with AD
A separate subtree in the AD structure
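The separated subtree can be pictured as an ordinary OU hierarchy. The DNs and names below are purely illustrative (they are not Allegro's actual tree); only the idea of isolating the Hadoop machine accounts under one branch, pointed at by msktutil's -b $USER_LDAP_ROOT, comes from the slides.

dc=ad,dc=realm
└── ou=Hadoop                        (the separated subtree, $USER_LDAP_ROOT)
    ├── ou=Hosts
    │   └── cn=91c76848bc458b62e67   (machine-style account created by msktutil)
    └── ou=ServiceAccounts
        └── cn=_hadoop_manager       (account used to create new principals)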
System Security Services Daemon
- Identity and authentication
- Multiple providers (FreeIPA, LDAP, AD)
- High availability for backends
- Provides PAM and NSS modules
- Caching
- Versions above 1.11.x: stable support for AD forest authentication
System Security Services Daemon

/etc/sssd/sssd.conf:

[domain/ad.realm]
id_provider = ad
ad_server = h1, h2, h3
ad_backup_server = hb1, hb2, hb3
auth_provider = ad
chpass_provider = ad
access_provider = ad
enumerate = False
krb5_realm = AD.REALM
ldap_schema = ad
ldap_id_mapping = True
cache_credentials = True
ldap_access_order = expire
ldap_account_expire_policy = ad
ldap_force_upper_case_realm = true
fallback_homedir = /home/ad.realm/%u
default_shell = /bin/false
ldap_referrals = false

AD schema with no modifications:

root@nn1:~# id _hc_tech_prod | tr "," "\n"
uid=1827653611(_hc_tech_prod) gid=1827600513(domain users) groups=1827600513(domain users)
1827652945(_gr_hc_users_common)
1827647474(_gr_hc_hadoop_prod)
1827652940(_gr_hc_project1_prod)
1827652919(_gr_hc_project2_prod)
Making the whole architecture resilient to failures
- Active: domain controllers in the closest DC
- Fallback: servers in a remote DC
- Local filesystem NSS cache

/etc/sssd/sssd.conf:

[nss]
memcache_timeout = 3600
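Putting the pieces together, the failover behaviour can be sketched as one sssd.conf fragment; the hostnames are the placeholders from the earlier slide, and the comments describe the intended effect rather than exact sssd internals.

[domain/ad.realm]
# Primary servers: domain controllers in the closest DC.
ad_server = h1, h2, h3
# Fallback: domain controllers in a remote DC, tried when the primaries fail.
ad_backup_server = hb1, hb2, hb3
# Keep hashed credentials locally so logins survive a full AD outage.
cache_credentials = True

[nss]
# Serve passwd/group answers from a fast local memory cache for an hour.
memcache_timeout = 3600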
Auto-deployment and autoconfiguration on desktops
- Install script for the Hadoop client on desktops
- Refreshes configs to the current production environment
- Support for HDFS/YARN/Hive/Spark

[marek.gawinski:~/allehadoop] $ sh env.sh
Password for marek.gawinski@ad.realm: **************
[marek.gawinski:~/allehadoop] $ klist
Ticket cache: FILE:/tmp/krb5cc_1511317717
Default principal: marek.gawinski@ad.realm
Valid starting     Expires            Service principal
09/04/15 23:31:35  09/05/15 09:31:35  krbtgt/ad.realm@ad.realm
        renew until 09/11/15 23:31:33
Auto-deployment and autoconfiguration on desktops

[marek.gawinski:~/allehadoop] $ hdfs dfs -ls
Found 8 items
drwxr-xr-x   - marek.gawinski hadoop    0 2015-08-06 02:00 .Trash
drwxr-xr-x   - marek.gawinski hadoop    0 2015-07-28 21:01 .hiveJars
drwxr-xr-x   - marek.gawinski hadoop    0 2015-07-09 10:43 .sparkStaging
drwx------   - marek.gawinski hadoop    0 2015-05-22 02:35 .staging
drwxr-xr-x   - marek.gawinski hadoop    0 2015-08-31 13:11 oozie1
-rw-r--r--   3 marek.gawinski hadoop   43 2015-05-26 15:26 ozzietest1.hql
-rw-r--r--   3 marek.gawinski hadoop   13 2015-08-31 12:30 pwd.txt
drwxr-xr-x   - marek.gawinski hadoop    0 2015-04-16 16:21 tables

[marek.gawinski:~/allehadoop] $ hive
hive (default)> show databases;
OK
database_name
tpch_benchmarks
...
xwing_poc
Time taken: 0.816 seconds, Fetched: 72 row(s)
hive (default)> set hive.execution.engine = tez;
hive (default)> select count(*) from table1;
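An env.sh-style bootstrap could be sketched roughly as below. This is a minimal sketch, not Allegro's actual script: the config URL, directory paths, and realm spelling are assumptions, and the network/Kerberos steps are left commented out so that only the environment setup itself runs.

```shell
#!/bin/sh
# Hypothetical sketch of a desktop client bootstrap (names are assumptions).
REALM="ad.realm"
CONF_DIR="${HOME}/allehadoop/conf"

mkdir -p "${CONF_DIR}"

# Refresh client configs from the current production environment
# (the URL is illustrative, so the download stays commented out):
# curl -fsSL "https://hadoop-config.example/client-conf.tar.gz" | tar xz -C "${CONF_DIR}"

# Point the Hadoop/Hive clients at the fetched configuration.
HADOOP_CONF_DIR="${CONF_DIR}"
HIVE_CONF_DIR="${CONF_DIR}"
export HADOOP_CONF_DIR HIVE_CONF_DIR

# Obtain a Kerberos TGT from AD (prompts for the AD password):
# kinit "${USER}@${REALM}"

echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR}"
```

With the configs and the TGT in place, plain `hdfs dfs`, `hive`, and `spark-submit` invocations work against the secured cluster without any per-tool setup, which is the effect shown in the listings above.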
Benefits
- One standard for access control to all company resources
- Every new employee can automatically start working with Hadoop, with no additional effort
- One password for all systems
Thank you! Questions?