Hive Security • Yongqiang He • Software Engineer • Facebook Data Infrastructure Team
User/Group/Role and Privilege
• A user can belong to one or more groups. User and group information is provided by the authenticator.
• Each user or group can hold privileges and roles. A role can be a member of another role, but role membership must not be circular. Hive manages roles and the mapping between users/groups and roles.
• Privileges are associated with users, groups, and roles. A privilege can be granted to an individual user. A privilege granted to a group applies to all users in the group. A privilege granted to a role applies to all users who have the role.
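The role model above can be illustrated with a small in-memory sketch. This is not Hive's implementation: the `RoleGraph` class, its method names, and the role names in the usage note are all hypothetical. It also assumes that a role which is a member of another role inherits that role, and that user and group names share one namespace.

```python
from collections import defaultdict


class RoleGraph:
    """Hypothetical in-memory model of roles; not Hive's actual implementation."""

    def __init__(self):
        self.parents = defaultdict(set)       # member role -> roles it is a member of
        self.direct_roles = defaultdict(set)  # user or group name -> roles granted directly

    def _ancestors(self, role):
        """Return every role reachable from `role` by following membership edges."""
        seen, stack = set(), [role]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_role_to_role(self, member, parent):
        """Make `member` a member of `parent`, rejecting circular membership."""
        if member == parent or member in self._ancestors(parent):
            raise ValueError(f"circular role membership: {member} -> {parent}")
        self.parents[member].add(parent)

    def grant_role(self, principal, role):
        """Grant a role directly to a user or a group."""
        self.direct_roles[principal].add(role)

    def effective_roles(self, user, groups=()):
        """Roles held directly by the user or its groups, plus inherited ones."""
        roles = set()
        for principal in (user, *groups):
            for role in self.direct_roles.get(principal, ()):
                roles.add(role)
                roles |= self._ancestors(role)
        return roles
```

For example, after `add_role_to_role("analyst", "reader")` and `grant_role("group1", "analyst")`, a user in `group1` holds both roles, and a later `add_role_to_role("reader", "analyst")` is rejected because it would close a cycle.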
No Deny, Only Grant
• Grant all on db_name.tbl_name to group1;
• Revoke all on db_name.tbl_name from group1;
• Revoke all on db_name.tbl_name from usr1_in_group1; this should fail, because the grant was made to the group and not to the individual user.
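The grant-only behavior can be sketched as a set of explicit grants with no deny entries. The `GrantTable` class and its method names are hypothetical, and the choice of `LookupError` for a failed revoke is an assumption made for illustration, not Hive's actual error.

```python
class GrantTable:
    """Hypothetical grant-only store: access exists only where a grant exists."""

    def __init__(self):
        self.grants = set()  # (principal, privilege, object_name)

    def grant(self, principal, privilege, obj):
        self.grants.add((principal, privilege, obj))

    def revoke(self, principal, privilege, obj):
        """Remove a grant; fail if this exact principal never received it."""
        key = (principal, privilege, obj)
        if key not in self.grants:
            raise LookupError(f"no grant of {privilege} on {obj} to {principal}")
        self.grants.remove(key)

    def is_allowed(self, user, groups, privilege, obj):
        """A user is allowed if the user or any of its groups holds the grant."""
        return any((p, privilege, obj) in self.grants for p in (user, *groups))


table = GrantTable()
table.grant("group1", "all", "db_name.tbl_name")
# usr1_in_group1 gets access through the group, not through a personal grant.
allowed_via_group = table.is_allowed(
    "usr1_in_group1", ["group1"], "all", "db_name.tbl_name")
```

Because the only record is the group grant, `table.revoke("usr1_in_group1", "all", "db_name.tbl_name")` raises, while revoking from `group1` succeeds and removes access for every member.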
4 Levels of Privileges
• User level: applies to all objects in all databases; it is global.
• DB level: applies to all objects in one database.
• Table/Partition level: applies to all objects in that table or partition. If the table is partitioned, the partition level is checked and the table level is ignored; otherwise the table level is checked. Partition-level privileges are automatically inherited from the table level at partition creation time.
• Column level
Rule:
• First, check the user-level privilege. If it passes, access is granted.
• Second, check the DB level. If it passes, access is granted.
• Third, check the table/partition level. If it passes, access is granted.
• Last, check the column level. If it passes, access is granted.
• Otherwise, deny.
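The check order above can be sketched as a single function. This is a minimal illustration, not Hive's code: the function name, the tuple encoding of scopes, and the exact-match treatment of privilege names are assumptions. It also assumes that a column-level pass requires every requested column to be granted.

```python
def is_authorized(grants, principals, privilege, db, table,
                  partition=None, columns=()):
    """Walk the four levels in order; the first level that passes grants access.

    `grants` is a set of (principal, privilege, scope) tuples, where scope is
    one of: ("user",), ("db", db), ("table", db, table),
    ("partition", db, table, partition), ("column", db, table, column).
    `principals` lists the user plus its groups and roles.
    """
    def granted(scope):
        return any((p, privilege, scope) in grants for p in principals)

    # 1. User level: global, covers all objects in all databases.
    if granted(("user",)):
        return True
    # 2. DB level: covers all objects in one database.
    if granted(("db", db)):
        return True
    # 3. Table/partition level: a partitioned access checks the partition
    #    and ignores the table level; otherwise the table level is checked.
    if partition is not None:
        if granted(("partition", db, table, partition)):
            return True
    elif granted(("table", db, table)):
        return True
    # 4. Column level: assumed to require a grant on every requested column.
    if columns and all(granted(("column", db, table, c)) for c in columns):
        return True
    # No level passed: deny.
    return False
```

Note that a table-level grant alone does not authorize access to a partition in this sketch; it relies on partition-level privileges having been copied from the table when the partition was created, as the slide describes.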
Use case:
• 1. In a database, most tables are accessible to everyone. New tables are created all the time, and they should be visible to everyone. But there is one table (or a few), ‘secret_tbl’, that is accessible only to a small group of people. After a certain amount of time, only a few columns (c1, c2, c3) in that table become visible to everyone.
• 1) Partition the table based on date;
• 2) Add the small group of people to ‘s_group’;
• 3) Create 3 roles: ‘everyone_role’, ‘s_group_role’, and ‘s_column_role’;
• 4) Grant role ‘everyone_role’ to ‘everyone’; grant ‘s_group_role’ to ‘s_group’; grant ‘c1,c2,c3’ to ‘s_column_role’;
• 5) Grant all on ‘secret_tbl’ to ‘s_group_role’; grant ‘select(c1, c2, c3)’ on ‘secret_tbl’ to ‘s_column_role’;
• 6) Whenever a new table is created, grant all on that table to ‘everyone_role’ by default;
• 7) After a certain amount of time, revoke all on ‘secret_tbl/ds=partition’ from ‘s_group_role’.
HDFS Permission
• Without HDFS support, there is no real security. If a user has direct access to the files, the user can do anything.
• For a highly secured table, set the group permission on the files of that table. Hive should pass the correct Unix group information to HDFS.
• For column-level privileges, the most secure approach is file-level isolation: a file format that supports column groups, such as Zebra.
• Another option is to use Hive Server, with all queries submitted through the Hive server.
First version of Hive authorization
• Goal: protect a well-intentioned user from making a mistake.
• A malicious user can attack the system in many different ways. More protection only complicates the attack; a determined attacker can always find a way around it.