Unleashing the Power of Apache Atlas with Apache Ranger

Virtual Data Connector Project
NIGEL JONES
JONESN@UK.IBM.COM
DATAWORKS, MUNICH, APRIL 2017
Apache®, Apache Atlas, Apache Ranger & other Apache project names referenced are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

About Me – Nigel Jones
• https://www.linkedin.com/in/nigelljones/
• jonesn@uk.ibm.com (Anyone still use email?)
• @planetf1 – noisy, f1, electric vehicles, food & drink…. A split of work/life accounts didn’t work for me!
• And of course the Apache Atlas & Ranger mailing lists & JIRA!
• Science fan at school/uni. It was cloud chambers back then… now just the cloud :-)
• IBM Hursley, UK since 1990
• Last 3 years focus on Data Lake, Information Governance, Open Metadata

The Problem…..
WHY ARE WE HERE…..

Data?
• What data do I have?
• What does it mean?
• Where is it?
• Who has access to it?
• Who owns it?
• What quality is it?
• How does it relate to other data?
• How do I control, audit & understand access?

Regulatory needs
• Adhere to regulations like BCBS-239 and GDPR
• Need to know meaning, value of the data
• Demonstrate processes in place to govern access
• Audit
• Significant fines if rules breached
• Whilst ensuring easy, ready access to appropriate data for data professionals to support an agile business

Metadata..
• Metadata enables data to be used outside of the application that created it.
• Analytics and decision making
• New business applications
• Reporting and compliance
• Metadata describes the format and content of data allowing people to judge which data set to use for a new project
• Structure
• Meaning
• Origin
• Valid values and quality
• Usage and ownership
• Regulations and classifications that apply

Which can support…
• An enterprise data catalogue that lists all data including where it is, what it is, who owns it, its meaning, quality, where it came from, and can fully describe its business context & how the data should be governed….
• Subject matter experts searching, collaborating, feeding back about their data needs and use
• Automated governance actions to protect and manage including auditing, monitoring, quality control, rights management

But easily…
• Open frameworks & APIs
• Automatic collection & discovery of metadata in a dynamic heterogeneous environment
• Using predefined standards for glossaries, schemas, rules, regulations to reduce cost
• Cheap to integrate new tools
• No proprietary lock-in & assumptions that all tools are from one suite or vendor
• Avoiding silos
• Distributed and Open

The vision

Open and Unified Metadata

Virtualization Data Connector project

Data virtualization project
• Collaboration – IBM, several banks & open community
• A Data Lake environment
• Not just Hadoop, but other sources too
• Business Terms, Classifications, Metadata rich
• Offer virtualized views. Expose relational data with business terms
• Manage access to resources – permit, deny, log, filter/mask…. THROUGH METADATA
• Open, pluggable
• Working through use cases, design, initial MVP (this year)
• Critique and feedback are welcomed. We’re looking for guidance and support from the Atlas & Ranger communities, as well as to contribute our ideas
• Proposed changes all go through mailing list and JIRA for feedback

Apache Atlas
• “Atlas is a scalable and extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.” …. http://www.apache.org
• Open Community -- Apache Incubator since May 2015
• Type agnostic metadata store
• REST API & UI
• Supports many Hadoop components including HBase, Hive, Sqoop, Storm & others

Apache Ranger
• Centralized security administration to manage all security related tasks in a central UI or using REST APIs.
• Fine grained authorization to do a specific action and/or operation with Hadoop component/tool and managed through a central administration tool
• Standardize authorization method across all Hadoop components.
• Enhanced support for different authorization methods - Role based access control, attribute based access control etc.
• Centralize auditing of user access and administrative actions (security related) within all the components of Hadoop.
• … from http://ranger.apache.org

Project Interactions
[Diagram: interactions between Search/Report, GaianDB, Apache Atlas, Apache Ranger, Apache Solr and the underlying data stores]
• Search/Report: search for list of assets by metadata; search for data; reporting tool obtains data to draw report
• GaianDB + Ranger plugin to permit/deny, mask etc
• Underlying data: SQL, Hive, HDFS, Oracle, Netezza etc (RDBMS, Hadoop)
• Flow labels: “Manages logical views”; “Deploys rules, pushes classifications, source for user roles (not users)”; “Pulls rules, classifications”

Why Atlas and Ranger?
• Open Source essential to forming an active ecosystem
• Vision, active community & evolving – ability to contribute & work with others to provide the best solution
• Already have good core capabilities
• Atlas type system is very flexible
• Ranger offers a range of policy types and provides a pluggable framework
• Already cross project integration
• Use of tag based policies in Ranger sourced from Atlas
• Can be used independently of full Hadoop stack

Refined virtual connector scope

[Architecture diagram. Components: GaianDB with Ranger plugin; Virtualizer; Atlas (OMAS, OMRS) with Titan (graph DB, metadata repository); Ranger server with Ranger config (eg policies, audit log location), audit log and LDAP; IGC; mapper; Navigator; Datameer; data lake repositories (Oracle, Netezza, Hive tables); data lake virtualization. Flow labels: extract physical metadata; manage logical tables; pre/post create view metadata; retrieve metadata; push metadata; push and query metadata; poll policies; tag-sync; rule-sync; search for data/reporting.]

GaianDB & Virtualizer
• GaianDB
• Open Source
• Federated, self learning, dynamic configuration
• Based on Apache Derby
• Already had “policy” support – we’re plugging in Ranger for this project
• Virtualizer
• Listens to event notifications on assets etc
• Creates view definitions in GaianDB, and new Atlas APIs to store metadata. Could use different virtual engine..
• Designed to be open to other virtualization technologies.
[Diagram: GaianDB with a policy plugin (Ranger), the Virtualizer and Atlas; labels LT1, LT2 and DS1, DS2, DS3. GaianDB supports federation – not used for MVP.]
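The Virtualizer's job of turning an asset notification into a view that exposes business terms can be sketched roughly as below. This is only an illustration under stated assumptions: the event dictionary shape, the `ColumnMapping` class, the `_V` view suffix and the generic `CREATE VIEW` statement are all hypothetical, and none of it reflects GaianDB's or Atlas's actual interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """Hypothetical pairing of a physical column with its business term."""
    physical_column: str
    business_term: str


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_view_sql(view_name, source_table, mappings):
    """Build a generic CREATE VIEW statement that renames columns to terms."""
    if not mappings:
        raise ValueError("at least one column mapping is required")
    select_list = ", ".join(
        f"{quote_ident(m.physical_column)} AS {quote_ident(m.business_term)}"
        for m in mappings
    )
    return (
        f"CREATE VIEW {quote_ident(view_name)} AS "
        f"SELECT {select_list} FROM {quote_ident(source_table)}"
    )


def on_asset_event(event):
    """Handle a (hypothetical) asset notification.

    Only columns that carry a business term are exposed in the view.
    """
    mappings = [
        ColumnMapping(col["name"], col["term"])
        for col in event["columns"]
        if col.get("term")
    ]
    return build_view_sql(event["asset"] + "_V", event["asset"], mappings)
```

Exposing only columns that have a business term is one possible design choice; a real virtualizer might instead fall back to the physical column name.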

Atlas – glossary enhancements
• Get Atlas closer to parity with commercial offerings
• Business Terms – categories, category hierarchies
• Has-a, is-a, type-of, synonym, antonym, arbitrary relationships
• Assets mapped to Business Terms
• Classifications
• Hierarchy
• Navigable mappings to retain ability to flatten tags to Ranger
• Instead of Hive column EMP_SALARY -> SPI, now can be EMP_SALARY -> SALARY -> SPI…
• Used to drive governance
• ATLAS-1410
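The idea of flattening a navigable chain such as EMP_SALARY -> SALARY -> SPI back into plain tags can be illustrated with a small graph walk. This is a sketch only: the `edges` dictionary and the `flatten_classifications` function are hypothetical stand-ins, not the API proposed in ATLAS-1410.

```python
from collections import deque


def flatten_classifications(start, edges, classifications):
    """Collect every classification reachable from `start`.

    `edges` maps a node (asset, business term or classification) to the
    nodes it points at; `classifications` is the set of node names that
    count as tags. Both structures are assumptions made for this sketch.
    """
    seen = {start}
    queue = deque([start])
    tags = set()
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt in seen:
                continue  # guards against cycles such as synonym links
            seen.add(nxt)
            if nxt in classifications:
                tags.add(nxt)
            # Keep walking so classification hierarchies are followed too.
            queue.append(nxt)
    return sorted(tags)
```

With `edges = {"EMP_SALARY": ["SALARY"], "SALARY": ["SPI"]}` and `classifications = {"SPI"}`, the column EMP_SALARY flattens to the single tag SPI, which is the form a tag-based policy engine can consume.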

Atlas – other enhancements
• Consumer Centric APIs
• Open Metadata Access Services (OMAS)
• REST & more Kafka notifications
• Asset, Catalog, Connector, Glossary, Governance Action, Governance Definitions, Information View, Roles and Access
• Repository level APIs
• Open Metadata Repository Services (OMRS)
• REST & more Kafka notifications
• Pluggability through an Open Connector Framework to other metadata repositories – distributed and Open
• Standard data model/core
• Enhancement to core model – versioning, external linkage etc
• More standard types, i.e. for all relational databases, to ease sharing

Ranger areas being looked at
• Building a plugin for GaianDB
• Access control, simple masking. More later
• User synchronization (large # users, role of Atlas)
• Changes to tag sync process for new glossary proposal
• As more metadata goes into Atlas, it becomes source for generation of some kinds of policies. Where is the master?
• Generating Ranger rules from governance definitions
• How about control of access to Atlas itself?
• Aside: Interfaces used by enforcement engines (such as to get classification data) need to be efficient – these should work for projects like Apache Sentry as well as Atlas

Beyond the MVP
• Open Discovery Framework
• Consider other security enforcement engines – such as Apache Sentry & driving more capability around rules & governance actions from Atlas metadata
• Work on standard models to support different domains
• Lineage
• From high level design lineage through to operational detail. Logs vs graph….
• API metadata
• Infrastructure – JanusGraph…
• Abstraction added by IBM in last few months for Titan 1

The vision
• An enterprise data catalog that lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
• Spanning systems both on premise and cloud providers
• Hosted locally to your data platforms but integrated to provide the enterprise view
• New data tools (from any vendor) connect to your data catalog out of the box
• No vendor lock-in; nor expensive population of yet another proprietary siloed metadata repository
• Metadata is added automatically to the catalog as new data is created
• Extensible discovery processes characterise and classify the data
• Interested parties and processes are notified
• Subject matter experts collaborating around the data
• Locate the data they need, quickly and efficiently
• Feed back their knowledge about the data and the uses they have made of it to help others and support economic evaluation of data
• Automated governance processes protect and manage your data
• Metadata-driven access control

Summary
• Atlas can help us have an industry wide common metadata platform around which a vibrant ecosystem can evolve
• Not only in Hadoop but more broadly
• Metadata driven governance can be scalable & enable us to manage our data better, and be compliant with regulations
• The ideas presented here resonate with many people we’ve spoken to
• Get involved! I’d love to hear the feedback on this approach!
• Comment on the JIRAs, ask questions, contribute, disagree… ;-)
• Look at JIRA tag “VirtualDataConnector” or start at ATLAS-1689
• Atlas wiki
• “Innovation happens best not in isolation but in collaboration” (keynote)
• THANKS!

Questions
After this talk
jonesn@uk.ibm.com
17:50 Room 4 – Security & Governance BOF
Questions?

Architecture
[Architecture diagram. Labels: Atlas; Virtualizer; graph DB; “gaiandb”; IGC; IGC REST API; P-JDBC connections to Oracle data, HDFS data and Netezza data; GAF OMAS; Virtual; Asset OMAS; Catalog OMAS; Search; Search/Explore UI; OMRS; GAF Pre; GAF Post; Connector Framework. Legend: Atlas boundaries; developed in POC; may not be in POC initially; * may be hardcoded at first.]

Metadata areas and types

[Diagram: metadata areas numbered 1–7, with the following labels]
• Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics)
• Governance Actions and Processes
• Augmentation; Mapping; Implementation; Access; Instrument; Association
• Connector Directories
• Roles: Information Auditor, Integration Developer, Business Analyst, Data Scientist, Information Worker, Information Owner, Information Governor, Information Steward, Data Quality Analyst, Information Curator
• Business Objects and Relationships, Taxonomies and Ontologies
• Business Attributes
• Organization
• Teaming Metadata (people profiles, communities, projects, notebooks, …)
• Models and Schemas
• Physical Asset Descriptions (Data stores, APIs, models and components)
• Asset Collections (Sets, Typed Sets, Type Organized Sets)
• Information Views
• Rights Management
• Reference Data
• Feedback Metadata (tags, comments, ratings, …)
• Classification Schemes; Classification
• Strategy; Subject Area Definition; Campaigns and Projects; Infrastructure and systems; Rollout
• Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …)
• Information Process Instrumentation (design lineage)

User & Group/Role synchronization
UserSync2
• Corporate LDAP has a huge number of users/groups
• Ranger currently needs to sync all
• In future perhaps we establish group/role membership during authentication
• Capability for alternative source could be merged into base UserSync
[Diagram: Apache Atlas, Apache Ranger and LDAP. LDAP holds role-membership (LDAP groups) – could also be Active Directory; LDAP lookup -> group:member. ATLAS manages definitive list of roles <that are used for atlas managed sources>, obtained through Governance Action OMAS - getRoles.]
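One way to read the UserSync2 idea is that only the LDAP groups matching roles known to Atlas need to be synchronised, rather than the whole corporate directory. The sketch below illustrates that filtering with in-memory dictionaries standing in for LDAP and for the role list; the function names are hypothetical and nothing here reflects the real Ranger UserSync code or the signature of getRoles.

```python
def select_memberships(ldap_groups, atlas_roles):
    """Keep only the LDAP groups whose name matches an Atlas-managed role.

    `ldap_groups` maps group name -> iterable of member ids (a stand-in
    for an LDAP lookup); `atlas_roles` is an iterable of role names (a
    stand-in for whatever a getRoles call would return).
    """
    wanted = set(atlas_roles)
    return {
        group: sorted(set(members))
        for group, members in ldap_groups.items()
        if group in wanted
    }


def users_to_sync(memberships):
    """Return the distinct users that appear in the selected groups."""
    return sorted({user for members in memberships.values() for user in members})
```

For a directory with thousands of groups but a handful of governed roles, the set of users returned by `users_to_sync` is the only part that would need to reach the policy engine under this assumption.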

Atlas Glossary v2: Tag Sync to Ranger
TagSync2
• ATLAS glossary manages a sophisticated enterprise glossary structure
• Atlas Glossary v2 proposed in ATLAS-1410 (David Radley)
• Sync builds on existing tag sync approach
• New API in Atlas will flatten classification structure
• No changes to Ranger – but exposing richer classification could be area of future work
[Diagram: in Apache Atlas, Hive column emp_renum, business term Salary, business term Confidential; in Apache Ranger, Hive column emp_renum with tag Confidential.]

Policy (Rule) synchronization
RuleSync
• Generate policies in Ranger based off entities in Atlas
• Currently designing how this works
• Scoped by policy service so existing Ranger UI approach still works
[Diagram: Apache Atlas holds Role, Classifications, Asset and Action; Apache Ranger obtains them via getRules to form a Ranger rule.]
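Since the slide says the rule sync design is still in progress, the following is only a speculative sketch of the general shape: governance definitions (role, classification, action) are turned into policy records, and only policies not already present are created. `GovernanceRule`, the `vdc_tags` service name and the dictionary layout are hypothetical; the dictionary is deliberately simplified and is not Ranger's policy JSON schema.

```python
from dataclasses import dataclass

ALLOWED_ACTIONS = {"permit", "deny", "mask"}


@dataclass(frozen=True)
class GovernanceRule:
    """Hypothetical governance definition held in the metadata store."""
    role: str
    classification: str
    action: str  # one of ALLOWED_ACTIONS


def to_policy(rule, service="vdc_tags"):
    """Translate one rule into a simplified, illustrative policy record."""
    if rule.action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {rule.action}")
    return {
        "service": service,  # scoping by service keeps other policies untouched
        "name": f"{rule.classification}-{rule.role}-{rule.action}",
        "tag": rule.classification,
        "groups": [rule.role],
        "effect": rule.action,
    }


def sync_rules(rules, existing_names):
    """Return the policies that would need creating, skipping known names."""
    seen = set(existing_names)
    to_create = []
    for rule in rules:
        policy = to_policy(rule)
        if policy["name"] not in seen:
            seen.add(policy["name"])
            to_create.append(policy)
    return to_create
```

Deriving the policy name from the rule makes the sync idempotent in this sketch: running it twice with the same rules creates nothing new the second time.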

Virtual Data Connector JIRAS 20170402
• RANGER-1488
• RANGER-1487
• RANGER-1486
• RANGER-1485
• RANGER-1464
• RANGER-1454
• RANGER-1234
• RANGER-
• Create Ranger plugin for gaiandb
• Generate rules from Governance definitions in Atlas
• New usersync alternative for Atlas (vdc)
• Ranger support for Virtual Data Connector Project (ATLAS)
• Support Atlas v2 glossary in Atlas plugin (for access control to terms etc)
• Support of Atlas v2 glossary API proposal for tagsource
• Post-evaluation phase user extensions
• Ranger Source: eclipse
• Add data masking for tag based policies
• Governance Action Framework OMAS
• Sample assets to support Virtual Connector Project
• OMAS Interfaces for Atlas
• Build ATLAS using Docker

References
• Apache Atlas - http://atlas.apache.org/
• Top level JIRA for this activity - https://issues.apache.org/jira/browse/ATLAS-1689
• Apache Ranger - http://ranger.apache.org/
• GaianDB
• https://github.com/gaiandb/gaiandb
• https://developer.ibm.com/open/openprojects/gaian-database/
• The case for open metadata – A. M. Chessell
• http://www.ibmbigdatahub.com/blog/case-open-metadata
